Big Data Forensics: Hadoop Distributed File Systems as a Case Study

Author(s):  
Mohammed Asim ◽  
Dean Richard McKinnel ◽  
Ali Dehghantanha ◽  
Reza M. Parizi ◽  
Mohammad Hammoudeh ◽  
...  

Digital technologies and information systems (e.g., cloud computing and the Internet of Things) now generate terabytes of data from which end users extract knowledge to make better decisions. However, analyzing such massive data for decision making requires considerable research effort at multiple levels. Researchers have therefore concentrated on Big Data Analysis (BDA), but traditional databases, data techniques, and platforms suffer from limited storage, imbalanced data, poor scalability, insufficient accuracy, and slow responsiveness, which leads to very low efficiency in the Big Data (BD) context. The main objective of this research is therefore to present a generalized view of a complete BD system, covering its stages and the major components each stage uses to process BD. Specifically, the data management process is described in terms of NoSQL databases and different Parallel Distributed File Systems (PDFS); the impact of BD challenges is then analyzed alongside recent developments, giving a better understanding of how different tools and technologies are applied to real-life applications.


2018 ◽  
Vol 210 ◽  
pp. 04042
Author(s):  
Ammar Alhaj Ali ◽  
Pavel Varacha ◽  
Said Krayem ◽  
Roman Jasek ◽  
Petr Zacek ◽  
...  

A wide range of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the amount of data grows enormously, providing efficient, scalable, and reliable storage solutions has become one of the major issues in scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which builds a hierarchical and unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS in big data systems, and we present Event-B as a formal method that can be used in modeling. Event-B is a mature formal method that has been widely used in industry projects across domains such as automotive, transportation, space, business information, and medical devices. We also propose Rodin as the modeling tool for Event-B: it integrates modeling and proving, and because the Rodin platform is open source, it supports a large number of plug-in tools.


Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and wide acceptance of data-driven applications in many aspects of our daily lives are generating vast volumes of diverse data, which can be collected and analyzed to support various valuable decisions. Managing and processing this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered. New data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as its complexity and data analytics demands indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.


Big Data ◽  
2016 ◽  
pp. 2074-2097 ◽  
Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g., distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as its complexity and data analytics demands indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.


Author(s):  
Przemysław Lisowski ◽  
Adam Piórkowski ◽  
Andrzej Lesniak

Storing large amounts of spatial data in GIS systems is problematic, and the problem is growing due to ever-increasing data production from a variety of data sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data, and they also introduce new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets; they operate efficiently only when combined with distributed file systems and algorithms able to reduce tasks. This review focuses on the characteristics of large spatial data and discusses the opportunities offered by spatial big data systems. The work also draws attention to the problems of indexing and data access, and to solutions proposed in this area.


Author(s):  
Rupali Ahuja ◽  
Jigyasa Malik ◽  
Ronak Tyagi ◽  
R. Brinda

Today, the world revolves around Big Data. Every organization is trying hard to find ways of deriving value from the huge pile of data generated each moment. Open source software is widely adopted by academics, researchers, and industry practitioners to handle various Big Data needs because of its easy availability, flexibility, affordability, and interoperability. As a result, several open source Big Data tools have been developed. This chapter discusses the role of open source software in Big Data storage and how various organizations have benefited from its use. It provides an overview of popular open source Big Data storage technologies existing today. Distributed file systems and NoSQL databases meant for storing Big Data are discussed along with their features and applications, and are compared.


2018 ◽  
Vol 7 (4.15) ◽  
pp. 16
Author(s):  
Mohammed Fakherldin ◽  
Ibrahim Aaker Targio Hashem ◽  
Abdullah Alzuabi ◽  
Faiz Alotaibi

Recent trends in big data have shown that the amount of data continues to increase at an exponential rate. This trend has inspired many researchers over the past few years to explore new research directions across multiple areas of big data. Hadoop is one of the most popular platforms for big data; Hadoop MapReduce is used to process data stored in the Hadoop Distributed File System (HDFS). Cloud computing, meanwhile, is considered an excellent candidate for storing and processing big data. However, processing big data across multiple nodes is a challenging task, and the problem is even more complex when virtualized clusters in a cloud environment are used to execute a large number of tasks. This paper provides a review and analysis of the impact of using physical versus cloud clusters when processing a large amount of data. The choice affects processing in terms of execution time and the cost of using each option. The result indicates that the use of cloud virtual machines helped better utilize the resources of the host computer.


Author(s):  
Dawn E. Holmes

The amount of data generated is approximately doubling every two years. How do we store and manage these colossal amounts of data? ‘Storing big data’ considers database storage and the idea of distributing tasks across clusters of computers. Relational database management systems are used to create, maintain, access, and manipulate structured data, whereas distributed file systems provide effective and reliable storage for unstructured data across many servers. NoSQL databases and their architecture are discussed along with the CAP Theorem and Cloud storage. The difference between lossless compression for text files and lossy data compression for sound and image files is also explained.



