Big Data Forensics: Hadoop Distributed File Systems as a Case Study

Digital technologies and information systems (e.g., cloud computing and the Internet of Things) now generate terabytes of data from which end users can extract knowledge to make better decisions. However, analyzing these massive data for decision making requires considerable effort from researchers at multiple levels. To address this, researchers have concentrated on Big Data Analysis (BDA), but traditional databases, data techniques, and platforms suffer from limited storage, imbalanced data, poor scalability, insufficient accuracy, and slow responsiveness, which leads to very low efficiency in the Big Data (BD) context. The main objective of this research is therefore to present a generalized view of a complete BD system, covering its various stages and the major components of each stage. Specifically, the data management process describes NoSQL databases and different Parallel Distributed File Systems (PDFS); the impact of the challenges for BD is then analyzed, and recent developments are reviewed to provide a better understanding of how different tools and technologies are applied to solve real-life problems.

Modeling of Distributed File System in Big Data Storage by Event-B

MATEC Web of Conferences, Vol 210, pp. 04042, 2018. DOI: 10.1051/matecconf/201821004042

Author(s): Ammar Alhaj Ali, Pavel Varacha, Said Krayem, Roman Jasek, Petr Zacek, ...

Keyword(s): Big Data, Data Storage, High Performance, File System, Formal Method, File Systems, Distributed File System, Distributed File Systems, Data Systems, Big Data Systems

A wide range of systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the amount of data increases enormously, providing and developing efficient, scalable, and reliable storage solutions has become one of the major issues in scientific computing. Big data systems use Distributed File Systems (DFSs) as their storage solution; a DFS builds a hierarchical, unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS for big data systems, and we present Event-B as a formal method that can be used for modeling it. Event-B is a mature formal method that has been widely used in industry projects across many domains, such as automotive, transportation, space, business information, and medical devices. We also propose Rodin as the modeling tool for Event-B: Rodin integrates modeling and proving, and because the platform is open source, it supports a large number of plug-in tools.
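
The abstract describes a DFS as one unified, hierarchical view over many file servers and proposes Event-B for modeling it. The following is a hypothetical Python illustration of that idea, not the paper's Event-B model: a single namespace maps each file path to fixed-size blocks, each block is replicated on several data nodes (loosely in the style of the HDFS NameNode), and a replication invariant is checked after the state-changing event, much as Event-B requires every event to preserve the machine's invariants. The class name `Namespace`, the toy constants, and the round-robin placement policy are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

# Toy constants (assumptions); HDFS defaults are 128 MB blocks and 3 replicas.
BLOCK_SIZE = 4    # bytes per block
REPLICATION = 2   # replicas per block


@dataclass
class Namespace:
    """Hypothetical DFS metadata: one logical tree over many data nodes."""
    nodes: list
    files: dict = field(default_factory=dict)     # path -> [block ids]
    replicas: dict = field(default_factory=dict)  # block id -> {node names}
    next_block: int = 0

    def invariant(self) -> bool:
        # Every block has exactly REPLICATION replicas, all on known nodes.
        known = set(self.nodes)
        return all(len(r) == REPLICATION and r <= known
                   for r in self.replicas.values())

    def create(self, path: str, size: int) -> None:
        # Guard: the path is new and there are enough nodes for the replicas.
        # (In Event-B a false guard simply disables the event; here we raise.)
        if path in self.files or len(self.nodes) < REPLICATION:
            raise ValueError(f"create({path!r}) is not enabled")
        block_ids = []
        for _ in range(-(-size // BLOCK_SIZE)):  # ceiling division
            b = self.next_block
            self.next_block += 1
            # Round-robin placement on REPLICATION distinct nodes.
            start = b % len(self.nodes)
            self.replicas[b] = {self.nodes[(start + k) % len(self.nodes)]
                                for k in range(REPLICATION)}
            block_ids.append(b)
        self.files[path] = block_ids
        assert self.invariant(), "event broke the replication invariant"


ns = Namespace(nodes=["dn1", "dn2", "dn3"])
ns.create("/logs/day1.txt", size=10)   # 10 bytes -> 3 blocks of 4 bytes
print(ns.files)                        # {'/logs/day1.txt': [0, 1, 2]}
print(sorted(ns.replicas[2]))          # ['dn1', 'dn3']
```

Clients see only the path `/logs/day1.txt`; which nodes hold which blocks stays hidden behind the namespace, which is the "unified view" the abstract refers to.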

Big Data Processing and Big Analytics

Advances in Data Mining and Database Management - Emerging Technologies and Applications in Data Processing and Management, pp. 285-315, 2019. DOI: 10.4018/978-1-5225-8446-9.ch014

Author(s): Jaroslav Pokorny, Bela Stantic

Keyword(s): Big Data, Data Processing, Data Management, File Systems, Distributed File Systems, Big Data Processing, Daily Lives, Scalable Systems, Diverse Data, Special Category

The development and wide acceptance of data-driven applications in many aspects of our daily lives is generating a vast volume of diverse data, which can be collected and analyzed to support various valuable decisions. Managing and processing this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered, and new data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as their complexity and the demands of data analytics indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.

Challenges and Opportunities in Big Data Processing

Big Data, pp. 2074-2097, 2016 (cited by ~1). DOI: 10.4018/978-1-4666-9840-6.ch096

Author(s): Jaroslav Pokorny, Bela Stantic

Keyword(s): Big Data, Data Processing, Data Management, File Systems, Distributed File Systems, Big Data Processing, Data Management Systems, Scalable Systems, Challenges And Opportunities, Special Category

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and the demands of data analytics indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.

Tools for the Storage and Analysis of Spatial Big Data

Proceedings of the 10th International Conference "Environmental Engineering", 2017. DOI: 10.3846/enviro.2017.216

Author(s): Przemysław Lisowski, Adam Piórkowski, Andrzej Lesniak

Keyword(s): Big Data, Spatial Data, File Systems, Large Datasets, Distributed File Systems, Data Systems, Data Production, Spatial Big Data, Big Data Systems, Access To Data

Storing large amounts of spatial data in GIS is problematic, and the problem is growing because of ever-increasing data production from a variety of sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data, and they also introduce new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets; they operate efficiently only when combined with distributed file systems and with algorithms capable of reducing tasks. This review focuses on the characteristics of large spatial data and discusses the opportunities offered by spatial big data systems. The work also draws attention to the problems of indexing and data access, and to the solutions proposed in this area.
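
The review highlights indexing and data access as open problems for spatial big data. One simple technique, shown here as a hypothetical Python sketch rather than any solution proposed in the review, is a uniform grid index: points are bucketed by the grid cell that contains them, so a bounding-box query scans only the overlapping cells instead of the whole dataset. The `GridIndex` class and the chosen cell size are illustrative assumptions; production systems more often use R-trees, quadtrees, or geohash-style keys.

```python
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform-grid index for 2-D points (e.g. projected x/y)."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(x, y, label), ...]

    def _cell(self, x: float, y: float) -> tuple:
        # Floor division maps a coordinate to its integer grid cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x: float, y: float, label: str) -> None:
        self.cells[self._cell(x, y)].append((x, y, label))

    def query(self, xmin: float, ymin: float, xmax: float, ymax: float) -> list:
        """Return labels of points inside the box, scanning only overlapping cells."""
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        hits = []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                # .get() avoids inserting empty cells into the defaultdict.
                for x, y, label in self.cells.get((col, row), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(label)
        return sorted(hits)


idx = GridIndex(cell_size=10.0)
idx.insert(1.0, 1.0, "a")
idx.insert(15.0, 3.0, "b")
idx.insert(55.0, 55.0, "c")
print(idx.query(0.0, 0.0, 20.0, 20.0))  # ['a', 'b']
```

The cell key also gives a natural unit for partitioning data across machines, although choosing the cell size is a trade-off between many nearly empty cells and a few overloaded ones.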

Role of Open Source Software in Big Data Storage

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques, pp. 123-150, 2018 (cited by ~1). DOI: 10.4018/978-1-5225-3142-5.ch005

Author(s): Rupali Ahuja, Jigyasa Malik, Ronak Tyagi, R. Brinda

Keyword(s): Big Data, Open Source, Data Storage, Open Source Software, File Systems, Distributed File Systems, The World, Storage Technologies, Big Data Storage

Today, the world revolves around Big Data. Every organization is working hard to find ways of deriving value from the huge pile of data we generate every moment. Open source software is widely adopted by academics, researchers, and industry practitioners to handle various Big Data needs because of its easy availability, flexibility, affordability, and interoperability. As a result, several open source Big Data tools have been developed. This chapter discusses the role of open source software in Big Data storage and how various organizations have benefited from its use. It provides an overview of popular open source Big Data storage technologies existing today. Distributed file systems and NoSQL databases meant for storing Big Data are discussed, along with their features, applications, and a comparison.

Performance Evaluation of Hadoop in Cloud for Big Data

International Journal of Engineering & Technology, Vol 7 (4.15), pp. 16, 2018. DOI: 10.14419/ijet.v7i4.15.21363

Author(s): Mohammed Fakherldin, Ibrahim Aaker Targio Hashem, Abdullah Alzuabi, Faiz Alotaibi

Keyword(s): Cloud Computing, Big Data, Virtual Machines, File Systems, Research Direction, Distributed File Systems, Hadoop MapReduce, Recent Trends, New Research, The Impact

Recent trends in big data show that the amount of data continues to increase at an exponential rate. This trend has inspired many researchers over the past few years to explore new research directions in multiple areas of big data. Hadoop is one of the most popular platforms for big data: data are stored in the Hadoop Distributed File System (HDFS) and processed with Hadoop MapReduce. Cloud computing, meanwhile, is considered an excellent candidate for storing and processing big data. However, processing big data across multiple nodes is a challenging task, and the problem becomes even more complex when virtualized clusters in a cloud are used to execute a large number of tasks. This paper reviews and analyzes the impact of using a physical versus a cloud cluster to process a large amount of data, in terms of the execution time and cost of each. The results indicate that using cloud virtual machines helped make better use of the host computer's resources.
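
The abstract refers to Hadoop MapReduce. As a rough illustration of the programming model only, not of the paper's experimental setup, the following single-process Python sketch runs the classic word-count job: a map function emits `(word, 1)` pairs, a shuffle step groups the values by key, and a reduce function sums each group. In Hadoop these phases run as parallel tasks across the cluster, with map tasks typically scheduled close to the HDFS blocks they read; the function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Map: emit an intermediate (key, value) pair for every word.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle: group all values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    # Reduce: combine all values for one key into a final result.
    return key, sum(values)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data on HDFS", "big clusters process big data"]))
# {'big': 3, 'data': 2, 'on': 1, 'hdfs': 1, 'clusters': 1, 'process': 1}
```

Because each reduce call sees only one key's values, the reduce work can be split across machines by key, which is what lets the model scale out on a physical or virtualized cluster.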

3. Storing big data

Big Data: A Very Short Introduction, pp. 26-43, 2017. DOI: 10.1093/actrade/9780198779575.003.0003

Author(s): Dawn E. Holmes

Keyword(s): Big Data, Data Compression, File Systems, Lossless Compression, Unstructured Data, Distributed File Systems, NoSQL Databases, The Difference, Relational Database Management, Relational Database Management Systems

The amount of data generated is approximately doubling every two years. How do we store and manage these colossal amounts of data? ‘Storing big data’ considers database storage and the idea of distributing tasks across clusters of computers. Relational database management systems are used to create, maintain, access, and manipulate structured data, whereas distributed file systems provide effective and reliable storage for unstructured data across many servers. NoSQL databases and their architecture are discussed along with the CAP Theorem and Cloud storage. The difference between lossless compression for text files and lossy data compression for sound and image files is also explained.
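
The chapter contrasts lossless compression for text files with lossy compression for sound and image files. The hypothetical Python sketch below illustrates the difference: the standard-library `zlib` module (DEFLATE) stands in for the lossless case and restores the input byte for byte, while plain quantization (rounding samples to a coarser step) stands in for the lossy case, where the discarded detail cannot be recovered. Real audio and image codecs such as MP3 or JPEG combine transforms, quantization, and entropy coding; only the quantization idea is shown here.

```python
import zlib


def lossless_roundtrip(data: bytes):
    """Compress with zlib (DEFLATE) and decompress; returns (compressed size, restored bytes)."""
    packed = zlib.compress(data)
    return len(packed), zlib.decompress(packed)


def quantize(samples, step: float):
    """Lossy step: snap each sample to the nearest multiple of `step`."""
    return [round(s / step) * step for s in samples]


text = b"big data " * 100                  # 900 bytes of highly repetitive text
size, restored = lossless_roundtrip(text)
print(size < len(text), restored == text)  # True True: smaller and exact

signal = [0.12, 0.49, 0.51, 0.98]
print(quantize(signal, 0.5))               # [0.0, 0.5, 0.5, 1.0]: originals lost
```

After quantization, 0.49 and 0.51 both become 0.5, so no decoder can tell them apart again; that irreversibility is what distinguishes lossy from lossless compression.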

Challenges and Opportunities in Big Data Processing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments, pp. 1-24, 2016 (cited by ~4). DOI: 10.4018/978-1-4666-9834-5.ch001

Author(s): Jaroslav Pokorny, Bela Stantic

Keyword(s): Big Data, Data Processing, Data Management, File Systems, Distributed File Systems, Big Data Processing, Data Management Systems, Scalable Systems, Challenges And Opportunities, Special Category

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and the demands of data analytics indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.

Efficient Handling of Big Data Volume Using Heterogeneous Distributed File Systems

A Big Data Analysis on Distributed File Storage System
