Block Storage Optimization and Parallel Data Processing and Analysis of Product Big Data Based on the Hadoop Platform

Traditional distributed database storage architectures suffer from low efficiency and limited storage capacity when managing the data resources of seafood products. We review various storage and retrieval technologies for these big data resources and propose a block storage layout optimization method based on the Hadoop platform, together with a parallel data processing and analysis method based on the MapReduce model. The parallel data processing and analysis method uses a multireplica consistent hashing algorithm based on data correlation and on spatial and temporal properties. The data distribution strategy and block size adjustment are studied on the Hadoop platform. Building on the optimized data storage, a multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm are designed for the big data resources of seafood products within the MapReduce parallel framework. Practical verification shows that the storage optimization and data retrieval methods support the construction of a big data resource-management platform for seafood products and enable efficient organization and management of these resources. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
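To make the multireplica consistent hashing idea in the abstract above more concrete, here is a minimal sketch of generic consistent hashing with replica placement. It is not the authors' algorithm: the abstract does not give its details, so the class name `ConsistentHashRing`, the virtual-node count, the node names, and the idea of building the block key from a product identifier plus a time bucket (so that correlated blocks land on the same nodes) are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring using MD5 (stable across runs)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring; each node is placed at `vnodes` virtual points."""

    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, block_key, n=3):
        """Return up to n distinct nodes, walking clockwise from the key."""
        n = min(n, self._node_count)
        i = bisect.bisect_right(self._points, _hash(block_key))
        chosen = []
        while len(chosen) < n:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


# Assumed key scheme: blocks sharing a product and time bucket share a key,
# so they are placed on the same replica set.
ring = ConsistentHashRing(["dn1", "dn2", "dn3", "dn4"])
placement = ring.replicas("product-42|2021-05", n=3)
```

Because the key, not the individual block, determines the position on the ring, any correlation or spatiotemporal grouping can be expressed simply by choosing which fields go into the key.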

MAPREDUCE: INSIGHT ANALYSIS OF BIG DATA VIA PARALLEL DATA PROCESSING USING JAVA PROGRAMMING, HIVE AND APACHE PIG

International Journal of Advanced Research in Computer Science ◽

10.26483/ijarcs.v9i1.5414 ◽

2018 ◽

Vol 9 (1) ◽

pp. 536-540 ◽

Cited By ~ 1

Author(s):

Dr. Ujjwal Agarwal ◽

Keyword(s):

Big Data ◽

Data Processing ◽

Java Programming ◽

Parallel Data ◽

Apache Pig

Parallel Data Mining and Applications in Hospital Big Data Processing

Big Data Management and Processing ◽

10.1201/9781315154008-20 ◽

2017 ◽

pp. 403-424

Author(s):

Jianguo Chen ◽

Zhuo Tang ◽

Kenli Li ◽

Keqin Li

Keyword(s):

Data Mining ◽

Big Data ◽

Data Processing ◽

Big Data Processing ◽

Parallel Data ◽

Parallel Data Mining

Optimize Parallel Data Access in Big Data Processing

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing ◽

10.1109/ccgrid.2015.168 ◽

2015 ◽

Cited By ~ 2

Author(s):

Jiangling Yin ◽

Jun Wang

Keyword(s):

Big Data ◽

Data Processing ◽

Data Access ◽

Big Data Processing ◽

Parallel Data

A real-time parallel data acquisition and big data processing method for four-in-one optical fiber sensor network

AIP Advances ◽

10.1063/1.5029815 ◽

2018 ◽

Vol 8 (7) ◽

pp. 075019

Author(s):

Wanshan Zhu ◽

Junfeng Jiang ◽

Jin Wang ◽

Xinggang Liu ◽

Tiegen Liu

Keyword(s):

Big Data ◽

Optical Fiber ◽

Data Processing ◽

Real Time ◽

Sensor Network ◽

Optical Fiber Sensor ◽

Processing Method ◽

Fiber Sensor ◽

Data Processing Method ◽

Parallel Data

Gene Sequences Parallel Alignment Model Based on Multiple Inputs and Outputs

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2019.2.3539 ◽

2019 ◽

Vol 14 (2) ◽

pp. 141-153

Author(s):

Xiaolong Feng ◽

Jing Gao

Keyword(s):

Big Data ◽

Data Processing ◽

Sequence Alignment ◽

Gene Sequence ◽

Computational Time ◽

Gene Sequences ◽

Model Based ◽

Hadoop Platform ◽

Alignment Model ◽

Inputs And Outputs

Bioinformatics computing is a big data processing problem that is typically characterized by large data scale, heavy computational load, and long computation time. The use of big data technology in bioinformatics computing has therefore gradually become a research hotspot, and using Hadoop for gene sequence alignment is one such direction. In biocomputing it is common to combine several tools to complete a single job, and most studies of parallel gene sequence alignment with Hadoop also rely on third-party tools. Few methods use Hadoop on its own to complete gene sequence alignment. Adding data processing steps from other tools to a Hadoop workflow not only limits the gains in computing performance but also complicates the application. This paper proposes a parallel alignment model for gene sequences based on multiple inputs and outputs, which can complete parallel alignment of gene sequences on the Hadoop platform independently, without other tools. The model simplifies the gene sequence alignment workflow and improves performance compared with other methods. The paper describes in detail how gene sequences are manipulated with multiple-input and multiple-output modes on the Hadoop platform, presents the design of a computing model based on this method, and demonstrates the advantages of the model through experiments.

A Dietary Nutrition Analysis Method Leveraging Big Data Processing and Fuzzy Clustering

Health Information Science - Lecture Notes in Computer Science ◽

10.1007/978-3-319-48335-1_14 ◽

2016 ◽

pp. 129-135

Author(s):

Lihui Lei ◽

Yuan Cai

Keyword(s):

Big Data ◽

Data Processing ◽

Fuzzy Clustering ◽

Analysis Method ◽

Big Data Processing ◽

Dietary Nutrition

Storage Optimization of Product Big Data Based on Hadoop Platform

Computer Science and Application ◽

10.12677/csa.2021.115154 ◽

2021 ◽

Vol 11 (05) ◽

pp. 1503-1511

Author(s):

耐东王

Keyword(s):

Big Data ◽

Hadoop Platform ◽

Storage Optimization

The Data Allocation Strategy Based on Load in NoSQL Database

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1464 ◽

2014 ◽

Vol 513-517 ◽

pp. 1464-1469 ◽

Cited By ~ 3

Author(s):

Zhi Kun Chen ◽

Shu Qiang Yang ◽

Shuang Tan ◽

Hui Zhao ◽

Li He ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Large Scale ◽

Internet Technology ◽

Data Allocation ◽

Allocation Strategy ◽

Data Parallel ◽

Large Scale Data ◽

Nosql Database ◽

Parallel Data

With the development of Internet technology and cloud computing, more and more applications face the challenges of big data. NoSQL databases are well suited to managing big data because of their high scalability, high availability, and high fault tolerance, and they are one of the technologies used for big data management. The performance of massive data processing in a NoSQL database can be improved through large-scale data-parallel processing and data-local computation, so how to allocate the data is a major challenge. In this paper we propose a data allocation strategy based on node load, which adjusts the allocation according to the execution status of the system and keeps the data allocation balanced at a small cost. Experiments verify the effectiveness of the proposed strategy and show that it improves system performance compared with other allocation strategies.
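As a rough illustration of what load-based allocation can look like, the sketch below assigns each data shard to whichever node currently has the lowest load. It is a simple greedy heuristic written for this listing, not the strategy from the paper, whose details the abstract does not give. The function name `allocate`, the shard and node names, and the use of shard size as the load unit are all assumptions.

```python
import heapq


def allocate(shard_sizes, node_loads):
    """Greedy sketch: place each shard on the currently least-loaded node.

    shard_sizes: dict mapping shard name to size (assumed load unit).
    node_loads:  dict mapping node name to its current load.
    Returns a dict mapping shard name to the chosen node name.
    """
    if not node_loads:
        raise ValueError("at least one node is required")

    # Min-heap of (load, node) so the least-loaded node is always on top.
    heap = [(load, name) for name, load in node_loads.items()]
    heapq.heapify(heap)

    placement = {}
    # Placing the largest shards first tends to give a better balance.
    for shard, size in sorted(shard_sizes.items(), key=lambda kv: -kv[1]):
        load, name = heapq.heappop(heap)
        placement[shard] = name
        heapq.heappush(heap, (load + size, name))
    return placement


# Hypothetical example: n2 starts with some existing load.
plan = allocate({"a": 5, "b": 3, "c": 2}, {"n1": 0, "n2": 4})
```

In a running system the `node_loads` input would be refreshed from monitoring data before each allocation round, which is one way to let the execution status of the system steer the placement.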

Study of CDR Real-Time Query Based on Big Data Technologies

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.462-463.845 ◽

2013 ◽

Vol 462-463 ◽

pp. 845-848

Author(s):

Zhi Heng Gao ◽

Kang Chen ◽

Ling Yan Bi

Keyword(s):

Big Data ◽

Data Processing ◽

Open Source ◽

Real Time ◽

Performance Test ◽

Real Dataset ◽

Big Data Technologies ◽

Hadoop Platform ◽

Query System ◽

Big Data Technology

This paper describes the layers of big data technology, analyses the real-time query scenario for CDRs (Call Data Records) in telecommunications, and proposes a fast indexing and query solution based on the open-source Hadoop platform. A CDR real-time query system was built according to this solution, and a performance test was conducted with the real dataset of a city with 3 million subscribers. Compared with the existing system, the big data solution greatly improves data processing performance and supports real-time queries with lower hardware and software investment.

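Implementasi Sistem Informasi Monitoring Pengolahan Data Inventory Gudang Pada PT.Talaga Mulya Indah [Implementation of a Monitoring Information System for Warehouse Inventory Data Processing at PT. Talaga Mulya Indah]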

Journal CERITA ◽

10.33050/cerita.v6i2.1158 ◽

2020 ◽

Vol 6 (2) ◽

pp. 187-197

Author(s):

Nurlaila Suci Rahayu Rais ◽

Dedeh Apriyani ◽

Gito Gardjito

Keyword(s):

Data Processing ◽

Integrated Control ◽

Unified Modeling Language ◽

Review Literature ◽

Analysis Method ◽

Inventory Data ◽

Literature Study ◽

Unified Modeling ◽

The Difference ◽

The Right

Monitoring warehouse inventory data processing is important for companies. PT Talaga Mulya Indah still processes this data manually on paper, which causes problems with the resulting information: errors in processing data on incoming and outgoing goods, discrepancies between the recorded stock quantity and the physical count, the same item frequently being entered more than once, and difficulty searching for available data and preparing reports, all of which impede the company in monitoring its stock of goods. This study aims to create a system that provides up-to-date information to help the warehouse administrator prepare inventory reports and that reduces input errors by means of integrated control. Data were collected through observation, interviews, and a literature review. The analysis used the PIECES method, and the system was designed with UML (Unified Modeling Language). The results of this study are expected to yield accurate data for monitoring inventory data processing, provide accurate information, and make it easier to control the overall availability of goods.
