Storage Optimization of Product Big Data Based on Hadoop Platform

2021 ◽  
Vol 11 (05) ◽  
pp. 1503-1511
Author(s):  
耐东 王
2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Yajun Wang ◽  
Shengming Cheng ◽  
Xinchen Zhang ◽  
Junyu Leng ◽  
Jun Liu

The traditional distributed database storage architecture has the problems of low efficiency and storage capacity in managing data resources of seafood products. We reviewed various storage and retrieval technologies for the big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. A multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties is used in the parallel data processing and analysis method. The data distribution strategy and block size adjustment are studied based on the Hadoop platform. A multidata source parallel join query algorithm and a multi-channel data fusion feature extraction algorithm based on data-optimized storage are designed for the big data resources of seafood products according to the MapReduce parallel frame work. Practical verification shows that the storage optimization and data-retrieval methods provide supports for constructing a big data resource-management platform for seafood products and realize efficient organization and management of the big data resources of seafood products. The execution time of multidata source parallel retrieval is only 32% of the time of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of the time of the standard Hadoop scheme.


Author(s):  
P. Venkateswara Rao ◽  
A. Ramamohan Reddy ◽  
V. Sucharita

In the field of Aquaculture with the help of digital advancements huge amount of data is constantly produced for which the data of the aquaculture has entered in the big data world. The requirement for data management and analytics model is increased as the development progresses. Therefore, all the data cannot be stored on single machine. There is need for solution that stores and analyzes huge amounts of data which is nothing but Big Data. In this chapter a framework is developed that provides a solution for shrimp disease by using historical data based on Hive and Hadoop. The data regarding shrimps is acquired from different sources like aquaculture websites, various reports of laboratory etc. The noise is removed after the collection of data from various sources. Data is to be uploaded on HDFS after normalization is done and is to be put in a file that supports Hive. Finally classified data will be located in particular place. Based on the features extracted from aquaculture data, HiveQL can be used to analyze shrimp diseases symptoms.


2018 ◽  
Vol 7 (2) ◽  
pp. 85-96
Author(s):  
Grzegorz Arkit ◽  
Aleksandra Arkit ◽  
Silva Robak

Processing massive data amounts and Big Data became nowadays one of the most significant problems in computer science. The difficulties with education on this field arise, the appropriate teaching methods and tools are needed. The processing of vast amounts of data arriving quickly requires the choice and arrangement of extended hardware platforms.In the paper we will show an approach for teaching students in Big Data and also the choice and arrangement of an appropriate programming platform for Big Data laboratories. Usage of an e-learning platform Moodle, a dedicated platform for teaching, could allow the teaching staff and students an improved contact with by enhancing mutually communication possibilities. We will show the preparation of Hadoop platform tools and Big Data cluster based on Cloudera and Ambari. The both solutions together could enable to cope with the problems in education of students in the field of Big Data.


Sensors ◽  
2019 ◽  
Vol 19 (15) ◽  
pp. 3438 ◽  
Author(s):  
Xia ◽  
Huang ◽  
Li ◽  
Zhou ◽  
Zhang

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we proposed a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of the existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing imageries with different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability with increased computing nodes added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.


Sign in / Sign up

Export Citation Format

Share Document