BIG DATA PROCESSING: BIG CHALLENGES AND OPPORTUNITIES

With the rapid growth of emerging applications such as social networks, the semantic web, sensor networks, and LBS (Location-Based Service) applications, the variety and volume of data to be processed continue to increase quickly. Effective management and processing of large-scale data pose an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry, and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms, and distributed applications. Following the MapReduce parallel processing framework, we introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss open issues and challenges, and explore future research directions for big data processing in cloud computing environments.
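
To illustrate the MapReduce parallel processing framework the abstract refers to, here is a minimal single-process sketch of the map, shuffle, and reduce phases for the classic word-count task. It is not code from the paper, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. A real framework such as Hadoop runs each phase in parallel across many nodes and also handles partitioning, sorting, and fault tolerance.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(split: str) -> Iterator[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in one input split.
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Aggregate all intermediate values that share one key.
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big challenges", "big data opportunities"]))
```

Many of the MapReduce optimizations discussed in the literature target the shuffle step. For example, a combiner that pre-aggregates `(word, 1)` pairs on the map side reduces the volume of intermediate data sent over the network.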

Challenges and Opportunities in Big Data Processing

Big Data ◽

10.4018/978-1-4666-9840-6.ch096 ◽

2016 ◽

pp. 2074-2097 ◽

Cited By ~ 1

Author(s):

Jaroslav Pokorny ◽

Bela Stantic

Keyword(s):

Big Data ◽

Data Processing ◽

Data Management ◽

File Systems ◽

Distributed File Systems ◽

Big Data Processing ◽

Data Management Systems ◽

Scalable Systems ◽

Challenges And Opportunities ◽

Special Category

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and data analytics demands indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work, we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.

Challenges and Opportunities in Big Data Processing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Managing Big Data in Cloud Computing Environments ◽

10.4018/978-1-4666-9834-5.ch001 ◽

2016 ◽

pp. 1-24 ◽

Cited By ~ 4

Author(s):

Jaroslav Pokorny ◽

Bela Stantic

Keyword(s):

Big Data ◽

Data Processing ◽

Data Management ◽

File Systems ◽

Distributed File Systems ◽

Big Data Processing ◽

Data Management Systems ◽

Scalable Systems ◽

Challenges And Opportunities ◽

Special Category

Computer Data Storage and Management Platform Based on Big Data

Journal of Physics Conference Series ◽

10.1088/1742-6596/2066/1/012022 ◽

2021 ◽

Vol 2066 (1) ◽

pp. 012022

Author(s):

Cheng Luo

Keyword(s):

Big Data ◽

Data Management ◽

Performance Improvement ◽

Data Storage ◽

Nonvolatile Memory ◽

Linear Structure ◽

Ring Structure ◽

Computer Data ◽

Management Platform ◽

A Performance

Due to the continuous development of information technology, data has increasingly become the core of the daily operation of enterprises and institutions and the main basis for decision-making. At the same time, with the development of networks, the storage and management of computer data has attracted more and more attention. Aiming at the common problems of computer data storage and management in practical work, this paper analyzes the objects and content of data management, investigates the state of computer data storage and management in China over the past two years, and conducts interviews and tests with programming data on the designed platform. The research results are then applied in practice to address these problems, and a storage and management platform based on big data is designed. The design adopts the CIRC tree, a special B+ tree in which the linear node structure is changed into a ring structure, which greatly reduces the number of data persistence instructions and the associated performance overhead. The results show that, compared with the most advanced B+ tree design for nonvolatile memory, crab tree achieves 3.1 times and 2.5 times performance improvement in reading and writing, respectively. Compared with NV-Tree, an earlier design for nonvolatile memory, it achieves a 1.5 times performance improvement, and compared with the latest FAST-FAIR, an 8.4 times improvement. In later stages, expanding the platform's functions will support the analysis and construction of data storage and management functions and further improve data management capability.
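
The abstract does not describe the CIRC tree's node layout in detail. As an assumption-labeled illustration of why a ring (circular) node can need fewer writes than a linear one, the sketch below keeps a node's sorted keys in a circular array and, on each insert, shifts whichever side of the insertion point holds fewer keys. On nonvolatile memory each shifted slot would typically have to be persisted, so fewer shifts roughly means fewer persistence instructions. `RingNode` and `linear_moves` are hypothetical names, and this is not the paper's implementation.

```python
import bisect
from typing import List, Optional


class RingNode:
    """Hypothetical fixed-capacity sorted node stored in a circular array."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[int]] = [None] * capacity
        self.head = 0   # physical slot holding the smallest key
        self.size = 0
        self.moves = 0  # slot writes caused by shifting existing keys

    def _phys(self, i: int) -> int:
        # Map logical position i (0 = smallest key) to a physical slot.
        return (self.head + i) % len(self.slots)

    def keys(self) -> List[int]:
        return [self.slots[self._phys(i)] for i in range(self.size)]

    def insert(self, key: int) -> None:
        cap = len(self.slots)
        if self.size == cap:
            raise OverflowError("node is full; a real tree would split it")
        # Find the logical insert position (linear scan for clarity).
        pos = 0
        while pos < self.size and self.slots[self._phys(pos)] < key:
            pos += 1
        if pos < self.size - pos:
            # Fewer keys to the left: step the head back one slot and
            # slide the left part down, leaving a gap at logical pos.
            self.head = (self.head - 1) % cap
            for i in range(pos):
                self.slots[self._phys(i)] = self.slots[self._phys(i + 1)]
                self.moves += 1
        else:
            # Fewer (or equal) keys to the right: slide the right part up.
            for i in range(self.size, pos, -1):
                self.slots[self._phys(i)] = self.slots[self._phys(i - 1)]
                self.moves += 1
        self.slots[self._phys(pos)] = key
        self.size += 1


def linear_moves(keys_in_order: List[int]) -> int:
    """Slot writes a plain sorted array would need for the same inserts."""
    arr: List[int] = []
    moves = 0
    for k in keys_in_order:
        pos = bisect.bisect_left(arr, k)
        moves += len(arr) - pos  # every key right of pos shifts by one
        arr.insert(pos, k)
    return moves


if __name__ == "__main__":
    node = RingNode(capacity=8)
    for k in [5, 4, 3, 2, 1]:
        node.insert(k)
    print(node.keys(), node.moves, linear_moves([5, 4, 3, 2, 1]))
```

For a descending insert sequence such as 5, 4, 3, 2, 1, the linear layout shifts every existing key on each insert, while the ring layout only moves its head pointer.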

NoSQL Databases

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch008 ◽

2014 ◽

pp. 186-215 ◽

Cited By ~ 2

Author(s):

Ganesh Chandra Deka

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Data Storage ◽

Big Data Processing ◽

Nosql Databases ◽

Data Intensive ◽

Huge Data ◽

Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases offer many advanced features in addition to conventional RDBMS features; hence, “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases with different features for dealing with exponentially growing data-intensive applications are available as open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in the light of the CAP theorem.
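
As a small illustration of how CAP trade-offs surface in practice, many Dynamo-style NoSQL stores let the client choose how many of the N replicas must acknowledge a write (W) and answer a read (R). If R + W > N, every read quorum overlaps every write quorum, so (ignoring sloppy quorums and hinted handoff) a read contacts at least one replica holding the latest acknowledged write; smaller R and W favor availability and latency over consistency. The brute-force check below confirms the condition for small N. It is a sketch rather than material from the chapter, and `quorums_always_overlap` is a hypothetical name.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every r-replica read set intersects every w-replica write set."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        written = set(write_set)
        for read_set in combinations(replicas, r):
            if written.isdisjoint(read_set):
                return False  # this read could miss the latest write entirely
    return True


if __name__ == "__main__":
    n = 3
    for r, w in [(1, 1), (1, 3), (2, 2), (3, 1)]:
        print(f"N={n} R={r} W={w}: overlap={quorums_always_overlap(n, r, w)}, R+W>N={r + w > n}")
```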

Extreme big data processing in large-scale graph analytics and billion-scale social simulation

Proceedings of the 5th ACM/SPEC international conference on Performance engineering - ICPE '14 ◽

10.1145/2568088.2576096 ◽

2014 ◽

Cited By ~ 1

Author(s):

Toyotaro Suzumura

Keyword(s):

Big Data ◽

Data Processing ◽

Large Scale ◽

Social Simulation ◽

Big Data Processing ◽

Graph Analytics

Semantic Enriched Category Recommendation System for Large-Scale Emails Exploiting Big Data Processing Technologies

2015 IEEE 39th Annual Computer Software and Applications Conference ◽

10.1109/compsac.2015.96 ◽

2015 ◽

Author(s):

Jae-Ik Kim ◽

Kyung-Wook Park ◽

Hyung-Rak Jo ◽

Dong-Ho Lee

Keyword(s):

Big Data ◽

Data Processing ◽

Large Scale ◽

Recommendation System ◽

Big Data Processing ◽

Processing Technologies ◽

Enriched Category

Big Data Processing and Big Analytics

Advances in Data Mining and Database Management - Emerging Technologies and Applications in Data Processing and Management ◽

10.4018/978-1-5225-8446-9.ch014 ◽

2019 ◽

pp. 285-315

Author(s):

Jaroslav Pokorny ◽

Bela Stantic

Keyword(s):

Big Data ◽

Data Processing ◽

Data Management ◽

File Systems ◽

Distributed File Systems ◽

Big Data Processing ◽

Daily Lives ◽

Scalable Systems ◽

Diverse Data ◽

Special Category

The development and wide acceptance of data-driven applications in many aspects of our daily lives are generating a vast volume of diverse data, which can be collected and analyzed to support various valuable decisions. Management and processing of this big data are a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered. New data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as their complexity and data analytics demands indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.

Comparison Study of Different NoSQL and Cloud Paradigm for Better Data Storage Technology

Handbook of Research on Cloud and Fog Computing Infrastructures for Data Science - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-5225-5972-6.ch015 ◽

2018 ◽

pp. 312-343

Author(s):

Pankaj Lathar ◽

K. G. Srinivasa ◽

Abhishek Kumar ◽

Nabeel Siddiqui

Keyword(s):

Cloud Computing ◽

Data Management ◽

Data Storage ◽

High Performance ◽

Large Scale ◽

Web Based ◽

Storage Technology ◽

Data Store ◽

Challenges And Opportunities ◽

User Data

Advancements in web-based technology and the proliferation of sensors and mobile devices interacting with the internet have resulted in immense data management requirements. These data management activities include storage, processing, and high-performance read-write operations on big data. Large-scale, high-concurrency applications such as SNS (social networking services) and search engines face challenges in using relational databases to store and query dynamic user data. NoSQL and cloud computing have emerged as paradigms that could meet these requirements. The diversity of existing NoSQL and cloud computing solutions makes it difficult to comprehend the domain and choose an appropriate solution for a specific business task. Therefore, this chapter reviews NoSQL and cloud-system-based solutions with the goal of providing a perspective on data storage technologies and algorithms, offering guidance to researchers and practitioners in selecting the best-fit data store, and identifying the challenges and opportunities of the paradigm.

“Saksham Model” Performance Improvisation Using Node Capability Evaluation in Apache Hadoop

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-9750-6.ch012 ◽

2020 ◽

pp. 206-230

Author(s):

Ankit Shah ◽

Mamta C. Padole

Keyword(s):

Big Data ◽

Distributed Computing ◽

Data Processing ◽

Data Storage ◽

Model Performance ◽

Big Data Processing ◽

Apache Hadoop ◽

Processing Capability ◽

Proposed Model ◽

Capability Evaluation

Big Data processing and analysis require tremendous processing capability. Distributed computing brings many commodity systems under a common platform to meet the need for Big Data processing and analysis. Apache Hadoop is the most suitable set of tools for Big Data storage, processing, and analysis. However, Hadoop has been found to be inefficient on heterogeneous sets of computers with different processing capabilities. In this research, we propose the Saksham model, which optimizes processing time through efficient use of node processing capability and file management. The proposed model improves performance for Big Data processing. To achieve better performance, the Saksham model uses two vital aspects of heterogeneous distributed computing: an effective block rearrangement policy and the use of node processing capability. The results demonstrate that the proposed model achieves better job execution time and improves data locality.
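
The abstract does not spell out Saksham's block rearrangement policy. As a hypothetical illustration of using node processing capability, the sketch below distributes a file's blocks across heterogeneous nodes in proportion to a per-node capability score, using the largest-remainder method so the counts always sum to the number of blocks. The function `assign_blocks` and the capability scores are assumptions, not the authors' model.

```python
from typing import Dict


def assign_blocks(num_blocks: int, capability: Dict[str, float]) -> Dict[str, int]:
    """Split num_blocks across nodes in proportion to capability (largest-remainder method)."""
    total = sum(capability.values())
    if total <= 0:
        raise ValueError("capabilities must sum to a positive value")
    # Ideal fractional share of blocks for each node.
    exact = {node: num_blocks * c / total for node, c in capability.items()}
    counts = {node: int(share) for node, share in exact.items()}
    leftover = num_blocks - sum(counts.values())
    # Hand the remaining blocks to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    print(assign_blocks(10, {"fast": 3.0, "medium": 2.0, "slow": 1.0}))
```

With 10 blocks and capabilities 3 : 2 : 1, the nodes receive 5, 3, and 2 blocks. Placing more blocks on faster nodes lets more map tasks run where their data already resides, which is one way capability-aware placement can improve data locality.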

Design of Big Data Processing System Architecture Based on Hadoop under the Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.6302 ◽

2014 ◽

Vol 556-562 ◽

pp. 6302-6306 ◽

Cited By ~ 3

Author(s):

Chun Mei Duan

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Data Storage ◽

System Architecture ◽

Processing System ◽

System Stability ◽

Security Model ◽

Data Processing System ◽

Big Data Processing

To address the limitations of traditional data processing technology in big data processing, a big data processing system architecture based on Hadoop is designed, exploiting the quantifiable, unstructured, and dynamic characteristics of cloud computing. It uses HDFS for big data storage, MapReduce for big data computation, and HBase as the database for unstructured data storage. A storage system and a cloud computing security model are also designed to implement efficient storage, management, and retrieval of data, thus saving construction cost and guaranteeing system stability, reliability, and security.
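
To make the HDFS storage layer of this architecture concrete, the sketch below splits a file into fixed-size blocks and assigns each block to several distinct DataNodes. HDFS does store files as large blocks (128 MB by default in Hadoop 2 and later) replicated on multiple DataNodes (three by default), but the round-robin placement here is a simplifying assumption: real HDFS uses a rack-aware placement policy. `split_into_blocks` and `place_replicas` are hypothetical names.

```python
from typing import Dict, List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the byte length of each block; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes round-robin (not HDFS's rack-aware policy)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Consecutive nodes starting at a per-block offset are always distinct.
        placement[block_id] = [datanodes[(block_id + i) % len(datanodes)] for i in range(replication)]
    return placement


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(300 * MB, 128 * MB)
    print([b // MB for b in blocks])
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=3))
```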
