MAPREDUCE: INSIGHT ANALYSIS OF BIG DATA VIA PARALLEL DATA PROCESSING USING JAVA PROGRAMMING, HIVE AND APACHE PIG

2018 ◽  
Vol 9 (1) ◽  
pp. 536-540 ◽  
Author(s):  
Dr. Ujjwal Agarwal ◽  
AIP Advances ◽  
2018 ◽  
Vol 8 (7) ◽  
pp. 075019
Author(s):  
Wanshan Zhu ◽  
Junfeng Jiang ◽  
Jin Wang ◽  
Xinggang Liu ◽  
Tiegen Liu

2014 ◽  
Vol 513-517 ◽  
pp. 1464-1469 ◽  
Author(s):  
Zhi Kun Chen ◽  
Shu Qiang Yang ◽  
Shuang Tan ◽  
Hui Zhao ◽  
Li He ◽  
...  

With the development of Internet technology and cloud computing, more and more applications face the challenges of big data. NoSQL databases are well suited to managing big data because of their high scalability, high availability, and high fault tolerance, and they are one of the key technologies for big data management. The performance of massive data processing in a NoSQL database can be improved through large-scale parallel data processing and data-local computation, so how to allocate the data becomes a major challenge. In this paper we propose a data allocation strategy based on node load, which adjusts the allocation according to the execution status of the system and keeps the data allocation balanced at a small cost. Finally, we use experiments to verify the effectiveness of the proposed strategy. The experiments show that it improves system performance compared with other allocation strategies.
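The abstract does not give the allocation algorithm itself, so the following is only a minimal sketch of one common load-aware placement idea: each data block goes to whichever node currently carries the least load. The function name `allocate_blocks`, the block and node identifiers, and the use of block size as the load unit are all assumptions made for illustration, not the authors' method.

```python
import heapq


def allocate_blocks(block_sizes, node_loads):
    """Greedy load-aware placement (illustrative assumption, not the paper's algorithm).

    block_sizes: dict mapping block id -> size (hypothetical load unit)
    node_loads:  dict mapping node name -> current load in the same unit
    Returns (placement, final_loads).
    """
    # Min-heap keyed on load, so the least-loaded node is always popped first.
    heap = [(load, name) for name, load in node_loads.items()]
    heapq.heapify(heap)

    placement = {}
    # Placing the largest blocks first tends to give a more even final balance.
    for block_id, size in sorted(block_sizes.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        placement[block_id] = name
        heapq.heappush(heap, (load + size, name))

    final_loads = {name: load for load, name in heap}
    return placement, final_loads


if __name__ == "__main__":
    blocks = {"b1": 4, "b2": 3, "b3": 2, "b4": 1}
    nodes = {"n1": 0, "n2": 5}
    placement, loads = allocate_blocks(blocks, nodes)
    print(placement)
    print(loads)
```

A strategy that reacts to the execution status of the system, as the abstract describes, would presumably re-run a step like this with refreshed load measurements; how the authors measure load and limit rebalancing cost is not stated here.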


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Yajun Wang ◽  
Shengming Cheng ◽  
Xinchen Zhang ◽  
Junyu Leng ◽  
Jun Liu

The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing the data resources of seafood products. We reviewed various storage and retrieval technologies for big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. The parallel data processing and analysis method uses a multireplica consistent hashing algorithm based on data correlation and on spatial and temporal properties. The data distribution strategy and block size adjustment are studied on the Hadoop platform. A multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm, both based on data-optimized storage, are designed for the big data resources of seafood products according to the MapReduce parallel framework. Practical verification shows that the storage optimization and data-retrieval methods support the construction of a big data resource-management platform for seafood products and enable efficient organization and management of these resources. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
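To illustrate the general idea behind multireplica consistent hashing, here is a minimal sketch of a plain consistent-hash ring that returns several distinct nodes per key. It does not model the data correlation or the spatial and temporal properties the abstract mentions; the class name `HashRing`, the `vnodes` parameter, and the choice of MD5 as the ring hash are assumptions for illustration only.

```python
import bisect
import hashlib


class HashRing:
    """Plain consistent-hash ring with virtual nodes (illustrative assumption)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring at `vnodes` hashed positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [position for position, _ in self._ring]
        self._node_count = len(set(nodes))

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as an integer ring position (not used for security).
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replicas(self, key, count=3):
        """Return up to `count` distinct nodes, walking clockwise from the key."""
        count = min(count, self._node_count)
        if count <= 0:
            return []
        start = bisect.bisect_right(self._keys, self._hash(key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("block-0001", count=3))
```

The property this kind of ring is usually chosen for is that adding or removing a node only remaps the keys adjacent to that node's positions, which keeps data movement small when the cluster changes.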


2019 ◽  
Vol 12 (1) ◽  
pp. 42 ◽  
Author(s):  
Andrey I. Vlasov ◽  
Konstantin A. Muraviev ◽  
Alexandra A. Prudius ◽  
Demid A. Uzenkov
