MAPREDUCE: INSIGHT ANALYSIS OF BIG DATA VIA PARALLEL DATA PROCESSING USING JAVA PROGRAMMING, HIVE AND APACHE PIG

With the development of Internet technology and cloud computing, more and more applications must confront the challenges of big data. NoSQL databases are well suited to managing big data because of their high scalability, high availability, and high fault tolerance, which makes them one of the technologies used for big data management. The performance of massive data processing in a NoSQL database can be improved through large-scale parallel data processing and data-local computation, so how to allocate the data becomes a major challenge. In this paper we propose a data allocation strategy based on node load, which adjusts the allocation according to the execution status of the system and keeps data allocation balanced at a small cost. Experiments verify the effectiveness of the proposed strategy and show that it improves system performance compared with other allocation strategies.
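The abstract does not spell out the allocation rule, so the following is only a minimal sketch of one common load-based approach: each new block goes to whichever node currently carries the least load. The function name `allocate_blocks`, the node names, and the use of block size as the load measure are all assumptions made for illustration, not the authors' method.

```python
import heapq


def allocate_blocks(block_sizes, node_loads):
    """Greedily place each block on the currently least-loaded node.

    block_sizes: dict mapping block id -> size (hypothetical load unit)
    node_loads:  dict mapping node name -> current load
    Returns (placement, final_loads).
    """
    # Min-heap keyed by load, so the least-loaded node is always on top.
    heap = [(load, name) for name, load in node_loads.items()]
    heapq.heapify(heap)

    placement = {}
    for block_id, size in block_sizes.items():
        load, name = heapq.heappop(heap)
        placement[block_id] = name
        # The chosen node's load grows by the size of the block it received.
        heapq.heappush(heap, (load + size, name))

    final_loads = {name: load for load, name in heap}
    return placement, final_loads


if __name__ == "__main__":
    placement, loads = allocate_blocks(
        {"b1": 4, "b2": 3, "b3": 2},
        {"n1": 0, "n2": 5},
    )
    print(placement)
    print(loads)
```

A strategy that reacts to the execution status of the system, as the abstract describes, would refresh `node_loads` from live measurements before each allocation round rather than relying on a static snapshot.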


Block Storage Optimization and Parallel Data Processing and Analysis of Product Big Data Based on the Hadoop Platform

Mathematical Problems in Engineering, 2021, Vol 2021, pp. 1-14. DOI: 10.1155/2021/3839800

Author(s): Yajun Wang, Shengming Cheng, Xinchen Zhang, Junyu Leng, Jun Liu

Keyword(s): Big Data, Data Processing, Analysis Method, Parallel Data, Extraction Algorithm, Seafood Products, Hadoop Platform, Storage Optimization, Block Storage, Fusion Feature

The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing the data resources of seafood products. We review various storage and retrieval technologies for big data resources and propose a block storage layout optimization method based on the Hadoop platform, together with a parallel data processing and analysis method based on the MapReduce model. The parallel data processing and analysis method uses a multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties. The data distribution strategy and block size adjustment are studied on the Hadoop platform. A multi-data-source parallel join query algorithm and a multichannel data fusion feature extraction algorithm, both based on data-optimized storage, are designed for the big data resources of seafood products according to the MapReduce parallel framework. Practical verification shows that the storage optimization and data retrieval methods support the construction of a big data resource management platform for seafood products and enable efficient organization and management of these resources. The execution time of multi-data-source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
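To make the idea of multireplica consistent hashing concrete, here is a minimal sketch of a plain hash ring with virtual nodes that returns several distinct nodes per key. It does not reproduce the paper's correlation-aware or spatio-temporal placement, which the abstract does not detail. The class name `HashRing`, the node names, the key format, and the default of 64 virtual nodes are assumptions chosen for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Plain consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node appears at `vnodes` positions to smooth the spread.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key, n=3):
        """Return up to n distinct nodes, walking clockwise from the key."""
        n = min(n, self._node_count)
        i = bisect.bisect_right(self._points, _hash(key))
        chosen = []
        while len(chosen) < n:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


if __name__ == "__main__":
    ring = HashRing(["dn1", "dn2", "dn3", "dn4"])
    print(ring.replicas("block-0001", n=3))
```

Because the walk always starts at the key's own position, adding or removing a node only moves the keys adjacent to that node's ring positions, which is the property that makes consistent hashing attractive for block placement.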
