Block Storage Optimization and Parallel Data Processing and Analysis of Product Big Data Based on the Hadoop Platform

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Yajun Wang ◽  
Shengming Cheng ◽  
Xinchen Zhang ◽  
Junyu Leng ◽  
Jun Liu

The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing data resources of seafood products. We reviewed various storage and retrieval technologies for big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. A multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties is used in the parallel data processing and analysis method. The data distribution strategy and block size adjustment are studied on the Hadoop platform. A multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm based on data-optimized storage are designed for the big data resources of seafood products according to the MapReduce parallel framework. Practical verification shows that the storage optimization and data-retrieval methods support the construction of a big data resource-management platform for seafood products and enable efficient organization and management of these resources. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
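As a rough illustration of the multireplica consistent hashing idea mentioned in this abstract, the following Python sketch places each key on a hash ring and picks several distinct nodes for its replicas. It is not the authors' algorithm: the class name `ConsistentHashRing`, the virtual-node count, the node names, and the idea of using a combined product-and-period string as the key (so that correlated records land on the same nodes) are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is used for spreading only)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring with virtual nodes; not taken from the paper."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node contributes `vnodes` points to smooth the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, key, n=3):
        """Return up to n distinct nodes, walking clockwise from the key's point."""
        n = min(n, self._node_count)
        i = bisect.bisect_right(self._points, _hash(key))
        chosen = []
        while len(chosen) < n:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


# Assumed key scheme: records sharing a product and period share a key,
# so they are stored on the same replica set.
ring = ConsistentHashRing(["dn1", "dn2", "dn3", "dn4"])
placement = ring.replicas("salmon:2021-03", n=3)
```

Because the placement depends only on the key and the ring, adding or removing one node moves only the keys adjacent to that node's points, which is the usual motivation for consistent hashing over modulo-based placement.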

AIP Advances ◽  
2018 ◽  
Vol 8 (7) ◽  
pp. 075019
Author(s):  
Wanshan Zhu ◽  
Junfeng Jiang ◽  
Jin Wang ◽  
Xinggang Liu ◽  
Tiegen Liu

2019 ◽  
Vol 14 (2) ◽  
pp. 141-153
Author(s):  
Xiaolong Feng ◽  
Jing Gao

Bioinformatics computing is a kind of big data processing problem, typically characterized by large data scale, heavy computational load, and long computation time. The use of big data technology in bioinformatics computing has therefore gradually become a research hotspot, and using Hadoop for gene sequence alignment is one such application. In the field of biocomputing it is common to combine various tools to complete a job, and most studies of parallel gene sequence alignment using Hadoop also rely on third-party tools. Few methods use Hadoop alone to complete gene sequence alignment. Adding data processing by other tools to the Hadoop workflow not only limits gains in computing performance but also complicates the application. In this paper, a parallel alignment model of gene sequences based on multiple inputs and outputs is proposed, which can independently complete parallel alignment of gene sequences on the Hadoop platform without using other tools. This model not only simplifies the process flow of gene sequence alignment but also improves performance compared with other methods. This paper describes in detail the method of manipulating gene sequences with multiple input and output modes on the Hadoop platform and the design of a computing model based on this method, and demonstrates the superiority of this model through experiments.


2014 ◽  
Vol 513-517 ◽  
pp. 1464-1469 ◽  
Author(s):  
Zhi Kun Chen ◽  
Shu Qiang Yang ◽  
Shuang Tan ◽  
Hui Zhao ◽  
Li He ◽  
...  

With the development of Internet technology and cloud computing, more and more applications are confronted with the challenges of big data. NoSQL databases are well suited to managing big data because of their high scalability, high availability, and high fault tolerance, and they are one of the key technologies for big data management. The performance of massive data processing in a NoSQL database can be improved through large-scale parallel data processing and data-local computation, so how to allocate the data becomes a major challenge. In this paper we propose a data allocation strategy based on node load, which adjusts the allocation according to the execution status of the system and keeps the data allocation balanced at a small cost. Finally, experiments verify the effectiveness of the proposed strategy and show that it improves system performance compared with other allocation strategies.
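To make the notion of load-based data allocation concrete, here is a minimal Python sketch that always assigns the next data block to the node with the lowest current load. The abstract does not specify the strategy's details, so this greedy rule, the function name `allocate`, and the use of block size as the load measure are assumptions rather than the authors' method.

```python
import heapq


def allocate(blocks, node_loads):
    """Greedy sketch: place each (block_id, size) on the least-loaded node.

    `node_loads` maps node name -> current load (an assumed stand-in for the
    "execution status of the system" mentioned in the abstract).
    Returns (placement, final_loads).
    """
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placement = {}
    # Placing larger blocks first tends to give a more even final balance.
    for block_id, size in sorted(blocks, key=lambda b: -b[1]):
        load, node = heapq.heappop(heap)
        placement[block_id] = node
        heapq.heappush(heap, (load + size, node))
    return placement, {node: load for load, node in heap}


placement, loads = allocate(
    [("a", 5), ("b", 3), ("c", 2)],
    {"n1": 0, "n2": 4},
)
```

In this example node `n1` starts empty and `n2` already carries a load of 4, so the largest block goes to `n1` first and both nodes end with a load of 7.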


2013 ◽  
Vol 462-463 ◽  
pp. 845-848
Author(s):  
Zhi Heng Gao ◽  
Kang Chen ◽  
Ling Yan Bi

This paper describes big data technology layers, analyses the CDR (Call Data Records) real-time query scenario of telecommunications and brings forward a fast indexing and query solution based on the open source Hadoop platform. A CDR real-time query system was built according to the solution. A performance test was conducted with the real dataset of a city with 3 million subscribers. Compared with the existing system, the big data solution can greatly improve data processing performance and support real-time query with lower hardware and software investment.
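The abstract does not describe the index structure, so the following Python sketch only illustrates the general idea of indexing call records for fast lookup: records are grouped by subscriber and sorted by start time, and a time-range query uses binary search. The field names `subscriber` and `start` and the functions `build_index` and `query` are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(cdrs):
    """Group CDRs by subscriber and sort each group by start time.

    Returns subscriber -> (sorted start times, records in the same order).
    """
    grouped = defaultdict(list)
    for rec in cdrs:
        grouped[rec["subscriber"]].append(rec)
    index = {}
    for subscriber, recs in grouped.items():
        recs.sort(key=lambda r: r["start"])
        index[subscriber] = ([r["start"] for r in recs], recs)
    return index


def query(index, subscriber, t_from, t_to):
    """Return the subscriber's records with t_from <= start <= t_to."""
    starts, recs = index.get(subscriber, ([], []))
    lo = bisect.bisect_left(starts, t_from)
    hi = bisect.bisect_right(starts, t_to)
    return recs[lo:hi]


cdrs = [
    {"subscriber": "A", "start": 30, "callee": "C"},
    {"subscriber": "A", "start": 10, "callee": "B"},
    {"subscriber": "B", "start": 15, "callee": "A"},
    {"subscriber": "A", "start": 20, "callee": "D"},
]
index = build_index(cdrs)
recent = query(index, "A", 15, 30)
```

Each query costs two binary searches over one subscriber's records rather than a scan of the whole dataset, which is the kind of saving an index is meant to provide.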


2020 ◽  
Vol 6 (2) ◽  
pp. 187-197
Author(s):  
Nurlaila Suci Rahayu Rais ◽  
Dedeh Apriyani ◽  
Gito Gardjito

Monitoring warehouse inventory data processing is important for companies. PT Talaga mulya indah still processes this data manually on paper, which causes problems with the resulting information: errors in processing data on incoming and outgoing goods; differences between the recorded stock and the physical stock; the same item often being entered more than once; and slow data searches and report preparation, all of which impede the company in monitoring its inventory. This study aims to create a system that provides up-to-date information to help the warehouse admin prepare inventory reports and that reduces input errors by means of integrated control. Data were collected through observation, interviews, and a literature review. The analysis uses the PIECES method, and the system is designed using UML (Unified Modeling Language). The results of this study are expected to produce accurate data in the monitoring of inventory data processing, provide the right information, and make it easier to control the overall availability of goods.

