3. Storing big data

Big Data: A Very Short Introduction, 2017, pp. 26-43
DOI: 10.1093/actrade/9780198779575.003.0003

Author(s): Dawn E. Holmes

Keyword(s): Big Data; Data Compression; File Systems; Lossless Compression; Unstructured Data; Distributed File Systems; NoSQL Databases; The Difference; Relational Database Management; Relational Database Management Systems

The amount of data generated is approximately doubling every two years. How do we store and manage these colossal amounts of data? ‘Storing big data’ considers database storage and the idea of distributing tasks across clusters of computers. Relational database management systems are used to create, maintain, access, and manipulate structured data, whereas distributed file systems provide effective and reliable storage for unstructured data across many servers. NoSQL databases and their architecture are discussed along with the CAP Theorem and Cloud storage. The difference between lossless compression for text files and lossy data compression for sound and image files is also explained.
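The difference between lossless and lossy compression can be made concrete with a short Python sketch. It uses the standard-library `zlib` module (DEFLATE) as one example of a lossless codec for text. The lossy half is only a toy quantizer (the `lossy_quantize` helper and its `step` parameter are illustrative assumptions), not a real audio or image codec such as MP3 or JPEG.

```python
import zlib

def lossless_roundtrip(text: str) -> bool:
    """Compress and decompress text with zlib (DEFLATE); lossless means an exact match."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, 9)
    return zlib.decompress(packed) == raw

def lossy_quantize(samples: list[float], step: float) -> list[float]:
    """Toy lossy scheme (hypothetical): round each sample to the nearest multiple
    of `step`. Fewer distinct values compress better, but the originals are gone."""
    return [round(s / step) * step for s in samples]

text = "big data " * 1000
raw = text.encode("utf-8")
print(len(raw), len(zlib.compress(raw, 9)))  # repetitive text shrinks a lot
print(lossless_roundtrip(text))              # True

signal = [0.12, 0.49, 0.51, 0.98]
approx = lossy_quantize(signal, 0.5)
print(approx)  # [0.0, 0.5, 0.5, 1.0]: the exact input values cannot be recovered
```

A lossless codec is the only safe choice when every byte matters, as with text; a lossy scheme trades exactness for size, which is acceptable for sound and images where small errors are hard to perceive.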


Efficient Handling of Big Data Volume Using Heterogeneous Distributed File Systems

International Journal of Computer Trends and Technology, Vol 15 (4), 2014, pp. 151-154
DOI: 10.14445/22312803/ijctt-v15p132

Cited By ~ 1

Author(s): Radha krishnan R; Karthik S

Keyword(s): Big Data; File Systems; Distributed File Systems; Data Volume


A Big Data Analysis on Distributed File Storage System

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 9 (2), 2019, pp. 2383-2388
DOI: 10.35940/ijitee.b6427.129219

Keyword(s): Big Data; Data Analysis; Storage System; Real Life; File Systems; Big Data Analysis; Distributed File Systems; File Storage; Recent Developments; The Impact

Digital technologies and information systems (e.g., cloud computing and the Internet of Things) now generate terabytes of data, from which knowledge can be extracted to help end users make better decisions. However, analyzing such massive data for decision making requires substantial effort from researchers at multiple levels. Researchers have therefore concentrated on Big Data Analysis (BDA), but traditional databases, data techniques, and platforms suffer from problems with storage, imbalanced data, scalability, insufficient accuracy, and slow responsiveness, which leads to very low efficiency in the Big Data (BD) context. The main objective of this research is therefore to present a generalized view of a complete BD system, covering its stages and the major components of each stage used to process BD. Specifically, the data management process describes NoSQL databases and different Parallel Distributed File Systems (PDFS); the analysis of the challenges BD faces, together with recent developments, then gives a better understanding of how different tools and technologies can be applied in real-life applications.


Modeling of distributed file system in big data storage by Event-B

MATEC Web of Conferences, Vol 210, 2018, pp. 04042
DOI: 10.1051/matecconf/201821004042

Author(s): Ammar Alhaj Ali; Pavel Varacha; Said Krayem; Roman Jasek; Petr Zacek; ...

Keyword(s): Big Data; Data Storage; High Performance; File System; Formal Method; File Systems; Distributed File System; Distributed File Systems; Data Systems; Big Data Systems

Many systems and applications, especially in high-performance computing, depend on distributed environments to process and analyze huge amounts of data. As the amount of data grows enormously, developing efficient, scalable, and reliable storage solutions has become one of the major issues in scientific computing. The storage solution used by big data systems is the Distributed File System (DFS), which builds a hierarchical, unified view of multiple file servers and shares on the network. In this paper we present the Hadoop Distributed File System (HDFS) as the DFS in big data systems, and we present Event-B as a formal method that can be used to model it. Event-B is a mature formal method that has been widely used in industry projects across a number of domains, such as automotive, transportation, space, business information, and medical devices. We also propose Rodin as the modeling tool for Event-B: it integrates modeling and proving, and because the Rodin platform is open source, it supports a large number of plug-in tools.
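As a rough illustration of how a DFS such as HDFS spreads one file across several servers, the sketch below splits a file into fixed-size blocks and assigns each block to several distinct servers. The 128 MB block size and replication factor of 3 mirror HDFS defaults, but the round-robin placement, the `place_blocks` function, and the server names are simplifying assumptions; HDFS itself uses a rack-aware placement policy chosen by the NameNode.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # mirrors the HDFS default block size (Hadoop 2.x and later)
REPLICATION = 3                 # mirrors the HDFS default replication factor

@dataclass
class BlockPlacement:
    block_id: int
    offset: int   # byte offset of the block within the file
    length: int   # block length in bytes (the last block may be shorter)
    servers: list[str] = field(default_factory=list)

def place_blocks(file_size: int, servers: list[str],
                 block_size: int = BLOCK_SIZE,
                 replication: int = REPLICATION) -> list[BlockPlacement]:
    """Split a file into fixed-size blocks and put each block on `replication`
    distinct servers, chosen round-robin (a simplification, not HDFS's policy)."""
    if replication > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placements = []
    offset, block_id = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        chosen = [servers[(block_id + r) % len(servers)] for r in range(replication)]
        placements.append(BlockPlacement(block_id, offset, length, chosen))
        offset += length
        block_id += 1
    return placements

for b in place_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]):
    print(b.block_id, b.length // (1024 * 1024), "MB on", b.servers)
```

For a 300 MB file on four servers this yields two full 128 MB blocks and one 44 MB block, each stored on three different servers, so the loss of any single server still leaves every block readable.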


Big Data Processing and Big Analytics

Advances in Data Mining and Database Management - Emerging Technologies and Applications in Data Processing and Management, 2019, pp. 285-315
DOI: 10.4018/978-1-5225-8446-9.ch014

Author(s): Jaroslav Pokorny; Bela Stantic

Keyword(s): Big Data; Data Processing; Data Management; File Systems; Distributed File Systems; Big Data Processing; Daily Lives; Scalable Systems; Diverse Data; Special Category

The development and wide acceptance of data-driven applications in many aspects of our daily lives is generating vast volumes of diverse data, which can be collected and analyzed to support various valuable decisions. Managing and processing this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered, and new data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as their complexity and the demands of data analytics indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.


Challenges and Opportunities in Big Data Processing

Big Data, 2016, pp. 2074-2097
DOI: 10.4018/978-1-4666-9840-6.ch096

Cited By ~ 1

Author(s): Jaroslav Pokorny; Bela Stantic

Keyword(s): Big Data; Data Processing; Data Management; File Systems; Distributed File Systems; Big Data Processing; Data Management Systems; Scalable Systems; Challenges And Opportunities; Special Category

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and the demands of data analytics indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.


Performance-efficient Recommendation and Prediction Service for Big Data frameworks focusing on Data Compression and In-memory Data Storage Indicators

Scalable Computing Practice and Experience, Vol 22 (4), 2021, pp. 401-412
DOI: 10.12694/scpe.v22i4.1945

Author(s): Hrachya Astsatryan; Arthur Lalayan; Aram Kocharyan; Daniel Hagimont

Keyword(s): Big Data; Data Compression; Data Storage; File Systems; Large Datasets; Data Sets; MapReduce Framework; Data Intensive; Parallel Data; Data Intensive Applications

The MapReduce framework manages Big Data sets by splitting large datasets into a set of distributed blocks and processing them in parallel. Data compression and in-memory file systems are widely used in Big Data processing to reduce resource-intensive I/O operations and to improve the I/O rate, respectively. The article presents a performance-efficient, modular, configurable, and robust decision-making service that relies on data compression and in-memory data storage indicators. The service consists of Recommendation and Prediction modules: it predicts the execution time of a given job from metrics and recommends the best configuration parameters to improve the performance of the Hadoop and Spark frameworks. Several CPU- and data-intensive applications and micro-benchmarks, including Log Analyzer, WordCount, and K-Means, have been evaluated to improve performance.
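WordCount, one of the benchmarks mentioned above, is the canonical MapReduce example. The following Python sketch simulates the map, shuffle, and reduce phases in a single process to show how data flows between them. The function names are illustrative assumptions, and this is not how Hadoop or Spark actually schedule work across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(block: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)

def word_count(blocks: list[str]) -> dict[str, int]:
    # On a cluster each block would be mapped by a different worker in parallel;
    # here the blocks are simply processed one after another.
    intermediate = chain.from_iterable(map_phase(b) for b in blocks)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

blocks = ["big data is big", "data is stored in blocks"]
print(word_count(blocks))
# {'big': 2, 'data': 2, 'is': 2, 'stored': 1, 'in': 1, 'blocks': 1}
```

Because each mapper only needs its own block and each reducer only needs the values for its own keys, both phases can be spread across many machines; the shuffle is the step where data has to move over the network.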


Use of Big Data in Aviation

Automated Systems in the Aviation and Aerospace Industries - Advances in Mechatronics and Mechanical Engineering, 2019, pp. 436-452
DOI: 10.4018/978-1-5225-7709-6.ch017

Author(s): Roman Odarchenko; Zohaib Hassan; Abnash Zaman

Keyword(s): Big Data; Relational Database; Management System; Database Management System; Relational Models; NoSQL Databases; Major Shift; Logical Modeling; Common Problems; Relational Database Management

The growth of data and the need to handle it efficiently have become increasingly prominent in recent times, bringing new difficulties and opening new avenues to explore. Data analytics can be performed more efficiently with the distributed architecture of not-only-SQL (NoSQL) databases. Technology is changing rapidly, and a major shift is under way: a switch from the relational to the non-relational world. When moving from relational to non-relational models, database administrators face common problems because NoSQL databases are schema-less. The purpose of this research is to propose a mechanism by which the schema of a relational database management system and its data can be transformed into big data by following standardized guidelines. This model can be useful for relational database administrators, enabling them to focus on logical modeling rather than writing procedures for each and every SQL-to-NoSQL transition.
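One common way to move relational data into a document-oriented NoSQL store is to denormalize a one-to-many relationship by embedding the child rows inside their parent record. The Python sketch below shows that idea on two hypothetical tables, `customers` and `orders`; the `to_documents` helper is an illustrative assumption, not the transformation mechanism or guidelines proposed in the chapter.

```python
# Hypothetical relational rows: a `customers` table and an `orders` table
# linked by the foreign key orders.customer_id -> customers.id.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.5},
]

def to_documents(parents: list[dict], children: list[dict],
                 fk: str, embed_as: str) -> list[dict]:
    """Denormalize a one-to-many relationship: embed each parent's child rows
    (minus the now-redundant foreign key) inside the parent document."""
    by_parent: dict[object, list[dict]] = {}
    for child in children:
        nested = {k: v for k, v in child.items() if k != fk}
        by_parent.setdefault(child[fk], []).append(nested)
    # Parents without children still get an empty embedded list.
    return [{**parent, embed_as: by_parent.get(parent["id"], [])} for parent in parents]

docs = to_documents(customers, orders, fk="customer_id", embed_as="orders")
print(docs[0])
# {'id': 1, 'name': 'Ada', 'orders': [{'id': 10, 'total': 25.0}, {'id': 11, 'total': 40.0}]}
```

Embedding suits data that is usually read together with its parent; when child rows are shared by many parents or grow without bound, keeping them in a separate collection with references is often the better design.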


Tools for the Storage and Analysis of Spatial Big Data

Proceedings of 10th International Conference "Environmental Engineering", 2017
DOI: 10.3846/enviro.2017.216

Author(s): Przemysław Lisowski; Adam Piórkowski; Andrzej Lesniak

Keyword(s): Big Data; Spatial Data; File Systems; Large Datasets; Distributed File Systems; Data Systems; Data Production; Spatial Big Data; Big Data Systems; Access To Data

Storing large amounts of spatial data in GIS systems is problematic, and the problem is growing due to ever-increasing data production from a variety of sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data, and they also show new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets; they work efficiently only when connected to distributed file systems and to algorithms capable of reducing tasks. This review focuses on the characteristics of large spatial data and discusses the opportunities offered by spatial big data systems. It also draws attention to the problems of indexing and data access, and to the solutions proposed in this area.
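To make the indexing problem concrete, the sketch below implements one of the simplest spatial indexes, a uniform grid over longitude/latitude, in Python. This is a swapped-in illustrative technique (the `GridIndex` class and its `cell_deg` parameter are hypothetical); production spatial systems more often rely on structures such as R-trees, quadtrees, or geohash-based keys, and the solutions discussed in the review may differ.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform grid over longitude/latitude: each point is stored in the cell that
    contains it, so a bounding-box query scans only the overlapping cells."""

    def __init__(self, cell_deg: float = 1.0):
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)  # (col, row) -> [(lon, lat, payload), ...]

    def _cell(self, lon: float, lat: float) -> tuple[int, int]:
        return math.floor(lon / self.cell_deg), math.floor(lat / self.cell_deg)

    def insert(self, lon: float, lat: float, payload) -> None:
        self.cells[self._cell(lon, lat)].append((lon, lat, payload))

    def query(self, min_lon: float, min_lat: float,
              max_lon: float, max_lat: float) -> list:
        c0, r0 = self._cell(min_lon, min_lat)
        c1, r1 = self._cell(max_lon, max_lat)
        hits = []
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                # Cells are only a coarse filter; each candidate is checked exactly.
                for lon, lat, payload in self.cells.get((c, r), []):
                    if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                        hits.append(payload)
        return hits

idx = GridIndex(cell_deg=1.0)
idx.insert(19.94, 50.06, "Krakow")
idx.insert(21.01, 52.23, "Warsaw")
idx.insert(2.35, 48.86, "Paris")
print(idx.query(19.0, 49.0, 22.0, 53.0))  # ['Krakow', 'Warsaw']
```

The Paris point is never examined by that query because its cell lies outside the searched range; this pruning is what makes indexed access scale, although a fixed grid copes poorly with very unevenly distributed data.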


Role of Open Source Software in Big Data Storage

Advances in Data Mining and Database Management - Handbook of Research on Big Data Storage and Visualization Techniques, 2018, pp. 123-150
DOI: 10.4018/978-1-5225-3142-5.ch005

Cited By ~ 1

Author(s): Rupali Ahuja; Jigyasa Malik; Ronak Tyagi; R. Brinda

Keyword(s): Big Data; Open Source; Data Storage; Open Source Software; File Systems; Distributed File Systems; The World; Storage Technologies; Big Data Storage

Today, the world revolves around Big Data. Every organization is trying hard to find ways to derive value from the huge pile of data being generated every moment. Open Source Software is widely adopted by academics, researchers, and industry practitioners to handle various Big Data needs because of its easy availability, flexibility, affordability, and interoperability. As a result, several open source Big Data tools have been developed. This chapter discusses the role of Open Source Software in Big Data storage and how various organizations have benefited from its use. It provides an overview of popular open source Big Data storage technologies in use today, and discusses the features, applications, and a comparison of the Distributed File Systems and NoSQL databases designed for storing Big Data.


Two-level fusion big data compression and reconstruction framework combining second-generation wavelet and lossless compression

Complex & Intelligent Systems, Vol 6 (3), 2020, pp. 607-620
DOI: 10.1007/s40747-020-00158-z

Author(s): Zhang Chuanchao

Keyword(s): Big Data; Data Compression; Second Generation; Lossless Compression; Second Generation Wavelet; Level Fusion
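The title combines a second-generation wavelet, that is, a wavelet constructed with the lifting scheme, with lossless compression. As a minimal sketch of that general combination only (the paper's two-level fusion framework is not described here), the Python code below applies one level of the integer Haar transform via lifting and then compresses the coefficients with the standard-library `zlib`. Integer-only lifting steps make the transform exactly invertible, which is what keeps the pipeline lossless; the function names and the toy signal are assumptions.

```python
import struct
import zlib

def haar_lift_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the integer Haar (S-)transform via lifting; len(x) must be even.
    Predict step: detail = odd - even. Update step: approx = even + floor(detail / 2)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]
    approx = [e + d // 2 for e, d in zip(even, detail)]
    return approx, detail

def haar_lift_inverse(approx: list[int], detail: list[int]) -> list[int]:
    """Undo the lifting steps in reverse order; integer arithmetic makes this exact."""
    even = [s - d // 2 for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    out: list[int] = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out

def compress_signal(x: list[int]) -> bytes:
    approx, detail = haar_lift_forward(x)
    coeffs = approx + detail
    # Serialize as little-endian signed 32-bit integers, then entropy-code with DEFLATE.
    return zlib.compress(struct.pack(f"<{len(coeffs)}i", *coeffs), 9)

def decompress_signal(blob: bytes) -> list[int]:
    raw = zlib.decompress(blob)
    coeffs = list(struct.unpack(f"<{len(raw) // 4}i", raw))
    half = len(coeffs) // 2
    return haar_lift_inverse(coeffs[:half], coeffs[half:])

signal = [100, 102, 101, 103, 250, 252, 251, 249] * 64  # smooth, repetitive toy data
blob = compress_signal(signal)
assert decompress_signal(blob) == signal  # the original is recovered exactly
```

On smooth data most detail coefficients are small, which tends to help the entropy coder; how much the transform helps over plain `zlib` depends on the data.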
