BIG DATA PROCESSING: BIG CHALLENGES AND OPPORTUNITIES

2012 ◽  
Vol 13 (03n04) ◽  
pp. 1250009 ◽  
Author(s):  
CHANGQING JI ◽  
YU LI ◽  
WENMING QIU ◽  
YINGWEI JIN ◽  
YUJIE XU ◽  
...  

With the rapid growth of emerging applications such as social networks, the semantic web, sensor networks and LBS (Location Based Service) applications, the variety of data to be processed continues to increase quickly. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry, and government. This paper introduces several big data processing techniques from the system and application perspectives. First, from the viewpoint of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including the definition of big data, big data management platforms, big data service models, distributed file systems, data storage, data virtualization platforms and distributed applications. Following the MapReduce parallel processing framework, we introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges, and explore in depth the future research directions for big data processing in cloud computing environments.
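As a rough illustration of the MapReduce programming model that the abstract refers to, the following single-process sketch counts words in three phases. It is not taken from the paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework such as Hadoop would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a list of text documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["big data", "Big challenges"])` counts "big" twice, because the map step lowercases each word before emitting it.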

Big Data ◽  
2016 ◽  
pp. 2074-2097 ◽  
Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and extensive use of highly distributed and scalable systems to process Big Data have been widely considered. New data management architectures, e.g. distributed file systems and NoSQL databases, are used in this context. However, features of Big Data such as their complexity and data analytics demands indicate that these concepts solve Big Data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of Big Data Management Systems is being considered. In this work we discuss these trends, evaluate some current approaches to Big Data processing, identify the current challenges, and suggest possible research directions.


2021 ◽  
Vol 2066 (1) ◽  
pp. 012022
Author(s):  
Cheng Luo

Due to the continuous development of information technology, data has increasingly become the core of the daily operation of enterprises and institutions and the main basis for decision-making. At the same time, with the development of networks, the storage and management of computer data have attracted more and more attention. Addressing the common problems of computer data storage and management in practical work, this paper analyzes the objects and content of data management, investigates the state of computer data storage and management in China over the past two years, and interviews and tests the data of programming in the designed platform. The research results are then applied in practice to the related problems: a storage and management platform is designed on the basis of big data. The design adopts the CIRC tree, a special B+ tree node structure in which the linear node structure is changed into a ring structure, which greatly reduces the number of data persistence instructions and the performance overhead. The results show that, compared with the most advanced B+ tree design for nonvolatile memory, the CIRC tree achieves 3.1 times and 2.5 times better performance in reading and writing, respectively. Compared with the earlier NV tree designed for nonvolatile memory, it improves performance by 1.5 times, and by 8.4 times compared with the more recent fast-fair. At a later stage, expanding the platform's functions will support the analysis and construction of data storage and management functions and further improve data management capability.
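The abstract does not describe the node layout in detail. The sketch below is therefore only one plausible reading of how a ring-shaped node can reduce writes: sorted keys live in a circular buffer, so an insertion can shift whichever side of the insertion point holds fewer keys. The class `CircularNode` and its `moves` counter are hypothetical, and `moves` is only a stand-in for the writes that would need to be persisted on nonvolatile memory.

```python
class CircularNode:
    """Sorted keys stored in a fixed-size ring buffer (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0    # physical index of the smallest key
        self.count = 0   # number of keys currently stored
        self.moves = 0   # keys shifted so far (proxy for persisted writes)

    def _phys(self, logical):
        """Translate a logical position into a physical slot index."""
        return (self.head + logical) % self.capacity

    def keys(self):
        """Return the stored keys in sorted (logical) order."""
        return [self.slots[self._phys(i)] for i in range(self.count)]

    def insert(self, key):
        if self.count == self.capacity:
            raise OverflowError("node is full; a real tree would split here")
        # Find the logical position of the first key that is not smaller.
        pos = 0
        while pos < self.count and self.slots[self._phys(pos)] < key:
            pos += 1
        if pos < self.count - pos:
            # Fewer keys on the left: grow the ring backwards by one slot
            # and move the left-hand keys into it.
            self.head = (self.head - 1) % self.capacity
            for i in range(pos):
                self.slots[self._phys(i)] = self.slots[self._phys(i + 1)]
            self.moves += pos
        else:
            # Fewer (or equally many) keys on the right: shift them forward.
            for i in range(self.count, pos, -1):
                self.slots[self._phys(i)] = self.slots[self._phys(i - 1)]
            self.moves += self.count - pos
        self.slots[self._phys(pos)] = key
        self.count += 1
```

In this sketch, inserting keys in descending order never shifts an existing key, because each new key is placed in the free slot just before the head. A linear array would instead shift every existing key on each such insertion.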


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases have many advanced features in addition to the conventional RDBMS features. Hence, “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases with different features for handling exponentially growing data-intensive applications are available in open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.


Author(s):  
Jaroslav Pokorny ◽  
Bela Stantic

The development and wide acceptance of data-driven applications in many aspects of our daily lives are generating vast volumes of diverse data, which can be collected and analyzed to support various valuable decisions. Management and processing of this big data is a challenge. The development and extensive use of highly distributed and scalable systems to process big data have been widely considered. New data management architectures (e.g., distributed file systems and NoSQL databases) are used in this context. However, features of big data such as their complexity and data analytics demands indicate that these concepts solve big data problems only partially. The development of so-called NewSQL databases is highly relevant, and even a special category of big data management systems is being considered. In this chapter, the authors discuss these trends, evaluate some current approaches to big data processing and analytics, identify the current challenges, and suggest possible research directions.


Author(s):  
Pankaj Lathar ◽  
K. G. Srinivasa ◽  
Abhishek Kumar ◽  
Nabeel Siddiqui

Advancements in web-based technology and the proliferation of sensors and mobile devices interacting with the internet have resulted in immense data management requirements. These data management activities include storage, processing, and the demand for high-performance read-write operations on big data. Large-scale, high-concurrency applications such as SNS and search engines face challenges in using relational databases to store and query dynamic user data. NoSQL and cloud computing have emerged as paradigms that could meet these requirements. The diversity of existing NoSQL and cloud computing solutions makes it difficult to comprehend the domain and choose an appropriate solution for a specific business task. Therefore, this chapter reviews NoSQL and cloud-system-based solutions with the goal of providing a perspective on data storage technologies and algorithms, offering guidance to researchers and practitioners in selecting the best-fit data store, and identifying the challenges and opportunities of the paradigm.


Author(s):  
Ankit Shah ◽  
Mamta C. Padole

Big Data processing and analysis require tremendous processing capability. Distributed computing brings many commodity systems under a common platform to meet the need for Big Data processing and analysis. Apache Hadoop is the most suitable set of tools for Big Data storage, processing, and analysis. However, Hadoop has been found to be inefficient on heterogeneous sets of computers that have different processing capabilities. In this research, we propose the Saksham model, which optimizes processing time through efficient use of node processing capability and file management. The proposed model shows a performance improvement for Big Data processing. To achieve better performance, the Saksham model uses two vital aspects of heterogeneous distributed computing: an effective block rearrangement policy and the use of node processing capability. The results demonstrate that the proposed model successfully achieves better job execution time and improves data locality.
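The abstract does not specify how the Saksham model rearranges blocks. To make the general idea of capability-aware placement concrete, the sketch below assigns data blocks to nodes in proportion to a relative capability score, using the largest-remainder method to round. This is an assumed illustration and not the authors' algorithm; the function `allocate_blocks` and the capability scores are hypothetical.

```python
def allocate_blocks(total_blocks, node_capability):
    """Split total_blocks across nodes in proportion to their capability.

    node_capability maps a node name to a positive relative score
    (an assumed input; how such a score is measured is not covered here).
    """
    total_capability = sum(node_capability.values())
    # Ideal fractional share of blocks for each node.
    exact = {
        node: total_blocks * score / total_capability
        for node, score in node_capability.items()
    }
    # Start from the integer part of each share.
    shares = {node: int(quota) for node, quota in exact.items()}
    leftover = total_blocks - sum(shares.values())
    # Hand the remaining blocks to the nodes with the largest remainders.
    by_remainder = sorted(
        exact, key=lambda node: exact[node] - shares[node], reverse=True
    )
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares
```

For example, `allocate_blocks(12, {"a": 3, "b": 2, "c": 1})` gives node "a" six blocks, "b" four and "c" two, so a node with three times the capability receives three times the data.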


2014 ◽  
Vol 556-562 ◽  
pp. 6302-6306 ◽  
Author(s):  
Chun Mei Duan

To address the limitations of traditional data processing technology in big data processing, a big data processing system architecture based on Hadoop is designed, drawing on the quantification, unstructured, and dynamic characteristics of cloud computing. It uses HDFS for big data storage, MapReduce for big data computation, and HBase as the database for unstructured data. A storage system and a cloud computing security model are also designed in order to achieve efficient storage, management, and retrieval of data, thereby saving construction costs and guaranteeing system stability, reliability, and security.

