Storage Optimization of Product Big Data Based on Hadoop Platform

The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing the data resources of seafood products. We review various storage and retrieval technologies for big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. The parallel data processing and analysis method uses a multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties. The data distribution strategy and block size adjustment are studied on the Hadoop platform. A multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm, both based on data-optimized storage, are designed for the big data resources of seafood products within the MapReduce parallel framework. Practical verification shows that the storage optimization and data retrieval methods support the construction of a big data resource management platform for seafood products and enable efficient organization and management of these resources. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
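The abstract names a multireplica consistent hashing algorithm but does not give its details. The following is only a minimal sketch of plain consistent hashing with several replicas per key; it does not model the data correlation or the spatial and temporal properties the authors describe. The class name `HashRing`, the node names, the block key, and the choice of MD5 and 64 virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring `vnodes` times.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key, count=3):
        """Return up to `count` distinct nodes, walking clockwise from the key."""
        count = min(count, self._node_count)
        if count <= 0:
            return []
        start = bisect.bisect_right(self._points, ring_position(key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == count:
                    break
        return chosen


# Hypothetical node names and block key, used only to exercise the sketch.
nodes = ["datanode-1", "datanode-2", "datanode-3", "datanode-4"]
ring = HashRing(nodes)
placement = ring.replicas("catch-records/2020-06/block-0001", count=3)
print(placement)
```

Because positions depend only on the hash of each name, adding or removing one node moves only the keys adjacent to its virtual nodes, which is the usual reason for choosing this scheme over modulo-based placement.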

Big Data Analytics in Aquaculture Using Hive and Hadoop Platform

Exploring the Convergence of Big Data and the Internet of Things - Advances in Data Mining and Database Management ◽ 10.4018/978-1-5225-2947-7.ch002 ◽ 2018 ◽ pp. 29-35

Author(s): P. Venkateswara Rao ◽ A. Ramamohan Reddy ◽ V. Sucharita

Keyword(s): Big Data ◽ Data Management ◽ Single Machine ◽ Data Analytics ◽ Historical Data ◽ Big Data Analytics ◽ Huge Amount ◽ Hadoop Platform ◽ Different Sources

In aquaculture, digital advancements constantly produce huge amounts of data, which has brought aquaculture data into the big data world. The need for data management and analytics models grows as development progresses, and all of this data cannot be stored on a single machine. A solution is needed that stores and analyzes these huge amounts of data, that is, Big Data. In this chapter, a framework based on Hive and Hadoop is developed that addresses shrimp disease using historical data. The shrimp data is acquired from different sources, such as aquaculture websites and various laboratory reports. After the data is collected from these sources, noise is removed. Once normalized, the data is uploaded to HDFS and stored in a file format that Hive supports. Finally, the classified data is placed in a particular location. Based on the features extracted from the aquaculture data, HiveQL can be used to analyze shrimp disease symptoms.
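The preparation steps described in this abstract (noise removal, normalization, and writing a Hive-readable file) could look roughly like the sketch below. The chapter does not publish its schema or cleaning rules, so the field names `pond_id`, `symptom`, and `water_temp_c`, the rule of dropping rows with a non-numeric temperature, and the min-max normalization are all assumptions. The output is tab-delimited text, which a Hive table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'` can read once the file has been copied to HDFS.

```python
import csv
import io


def clean(records):
    """Drop rows whose temperature is missing or non-numeric (stand-in for noise removal)."""
    kept = []
    for row in records:
        try:
            temp = float(row["water_temp_c"])
        except (KeyError, TypeError, ValueError):
            continue
        kept.append(
            {
                "pond_id": row["pond_id"],
                "symptom": row["symptom"].strip().lower(),
                "water_temp_c": temp,
            }
        )
    return kept


def min_max(rows, field):
    """Rescale one numeric field to the range [0, 1]."""
    if not rows:
        return []
    values = [r[field] for r in rows]
    low, high = min(values), max(values)
    span = high - low
    return [
        {**r, field: 0.0 if span == 0 else (r[field] - low) / span} for r in rows
    ]


def to_tsv(rows, fields):
    """Serialize rows as tab-delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for r in rows:
        writer.writerow([r[f] for f in fields])
    return buffer.getvalue()


# Hypothetical raw records, invented only to exercise the sketch.
raw = [
    {"pond_id": "P1", "symptom": " White Spots ", "water_temp_c": "28.0"},
    {"pond_id": "P2", "symptom": "lethargy", "water_temp_c": "n/a"},
    {"pond_id": "P3", "symptom": "Red Gills", "water_temp_c": "32.0"},
    {"pond_id": "P4", "symptom": "lethargy", "water_temp_c": "30.0"},
]
rows = min_max(clean(raw), "water_temp_c")
tsv = to_tsv(rows, ["pond_id", "symptom", "water_temp_c"])
print(tsv)
```

The resulting text would then be written to a local file and copied to the cluster, for example with `hdfs dfs -put`, before being queried from Hive.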

Dimension Reduction and Storage Optimization Techniques for Distributed and Big Data Cluster Environment

Soft Computing and Medical Bioinformatics - SpringerBriefs in Applied Sciences and Technology ◽ 10.1007/978-981-13-0059-2_6 ◽ 2018 ◽ pp. 47-54

Author(s): S. Kalyan Chakravarthy ◽ N. Sudhakar ◽ E. Srinivasa Reddy ◽ D. Venkata Subramanian ◽ P. Shankar

Keyword(s): Big Data ◽ Dimension Reduction ◽ Optimization Techniques ◽ Cluster Environment ◽ Storage Optimization ◽ And Storage

Research on adaptive recommendation algorithm for big data mining based on Hadoop platform

International Journal of Internet Protocol Technology ◽ 10.1504/ijipt.2019.103712 ◽ 2019 ◽ Vol 12 (4) ◽ pp. 213

Author(s): Jinming Zhang

Keyword(s): Data Mining ◽ Big Data ◽ Recommendation Algorithm ◽ Big Data Mining ◽ Hadoop Platform

Design and implementation of Hadoop platform for processing big data of logistics which is based on IoT

International Journal of Services Technology and Management ◽ 10.1504/ijstm.2017.081883 ◽ 2017 ◽ Vol 23 (1/2) ◽ pp. 131 ◽ Cited By ~ 1

Author(s): Nam Ho Kim

Keyword(s): Big Data ◽ Design And Implementation ◽ Hadoop Platform

APPLYING E-LEARNING SYSTEMS FOR BIG DATA EDUCATION

Information System in Management ◽ 10.22630/isim.2018.7.2.8 ◽ 2018 ◽ Vol 7 (2) ◽ pp. 85-96

Author(s): Grzegorz Arkit ◽ Aleksandra Arkit ◽ Silva Robak

Keyword(s): Big Data ◽ Computer Science ◽ Teaching Methods ◽ Learning Systems ◽ Teaching Staff ◽ Massive Data ◽ Learning Platform ◽ E Learning ◽ Hadoop Platform ◽ Education Of Students

Processing massive amounts of data, or Big Data, has become one of the most significant problems in computer science. Difficulties arise in education in this field, and appropriate teaching methods and tools are needed. Processing vast amounts of quickly arriving data requires the choice and arrangement of extended hardware platforms. In this paper we show an approach for teaching students Big Data, along with the choice and arrangement of an appropriate programming platform for Big Data laboratories. Using the e-learning platform Moodle, a dedicated platform for teaching, could improve contact between teaching staff and students by enhancing their mutual communication possibilities. We show the preparation of Hadoop platform tools and a Big Data cluster based on Cloudera and Ambari. Together, the two solutions could help address the problems in educating students in the field of Big Data.

Management and Visualization of Geoscience Big Data based on Hadoop Platform

Scientific Journal of Research & Reviews ◽ 10.33552/sjrr.2021.03.000554 ◽ 2021 ◽ Vol 3 (1)

Author(s): Zhiqiang Zeng

Keyword(s): Big Data ◽ Hadoop Platform

Research Challenges in Big Data Security with Hadoop Platform

Communications in Computer and Information Science - Recent Trends in Image Processing and Pattern Recognition ◽ 10.1007/978-981-13-9187-3_49 ◽ 2019 ◽ pp. 550-560

Author(s): M. R. Shrihari ◽ T. N. Manjunath ◽ R. A. Archana ◽ Ravindra S. Hegadi

Keyword(s): Big Data ◽ Data Security ◽ Research Challenges ◽ Hadoop Platform

PARSUC: A Parallel Subsampling-Based Method for Clustering Remote Sensing Big Data

Sensors ◽ 10.3390/s19153438 ◽ 2019 ◽ Vol 19 (15) ◽ pp. 3438 ◽ Cited By ~ 3

Author(s): Xia ◽ Huang ◽ Li ◽ Zhou ◽ Zhang

Keyword(s): Remote Sensing ◽ Big Data ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Image Data ◽ Data Partitioning ◽ Data Mining Technique ◽ Mining Technique ◽ Hadoop Platform ◽ Parallel Clustering

Remote sensing big data (RSBD) is generally characterized by huge volumes, diversity, and high dimensionality. Mining hidden information from RSBD for different applications imposes significant computational challenges. Clustering is an important data mining technique widely used in processing and analyzing remote sensing imagery. However, conventional clustering algorithms are designed for relatively small datasets. When applied to problems with RSBD, they are, in general, too slow or inefficient for practical use. In this paper, we propose a parallel subsampling-based clustering (PARSUC) method for improving the performance of RSBD clustering in terms of both efficiency and accuracy. PARSUC leverages a novel subsampling-based data partitioning (SubDP) method to realize three-step parallel clustering, effectively solving the notable performance bottleneck of existing parallel clustering algorithms; that is, they must cope with numerous repeated calculations to get a reasonable result. Furthermore, we propose a centroid filtering algorithm (CFA) to eliminate subsampling errors and to guarantee the accuracy of the clustering results. PARSUC was implemented on a Hadoop platform by using the MapReduce parallel model. Experiments conducted on massive remote sensing images of different sizes showed that PARSUC (1) provided much better accuracy than conventional remote sensing clustering algorithms in handling larger image data; (2) achieved notable scalability as computing nodes were added; and (3) spent much less time than the existing parallel clustering algorithm in handling RSBD.
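The general idea of clustering a subsample and then labeling the full dataset can be illustrated with the single-machine sketch below. This is not PARSUC: it omits the SubDP partitioning, the centroid filtering algorithm, and the MapReduce execution described in the abstract, and it substitutes plain Lloyd's k-means. The function names, the synthetic two-band "pixel" values, and the sample size are assumptions chosen for illustration.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a small in-memory list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for point in points:
            groups[nearest(point, centroids)].append(point)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else old
            for group, old in zip(groups, centroids)
        ]
    return centroids


def cluster_with_subsample(points, k, sample_size, seed=0):
    """Fit centroids on a random subsample, then label every point."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centroids = kmeans(sample, k, seed=seed)
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels


# Synthetic two-band pixel values forming two well-separated groups.
low = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)]
high = [(50 + x * 0.1, 50 + y * 0.1) for x in range(10) for y in range(10)]
pixels = low + high
centroids, labels = cluster_with_subsample(pixels, k=2, sample_size=40)
print(centroids)
```

The expensive iterative step runs only on the subsample, while the labeling pass over the full data is a single independent lookup per point, which is the part that parallelizes naturally across map tasks.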

Design and implementation of Hadoop platform for processing big data of logistics which is based on IoT

International Journal of Services Technology and Management ◽ 10.1504/ijstm.2017.10002714 ◽ 2017 ◽ Vol 23 (1/2) ◽ pp. 131

Author(s): Nam Ho Kim

Keyword(s): Big Data ◽ Design And Implementation ◽ Hadoop Platform
