hadoop platform
Recently Published Documents


TOTAL DOCUMENTS

162
(FIVE YEARS 46)

H-INDEX

9
(FIVE YEARS 3)

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Yajun Wang ◽  
Shengming Cheng ◽  
Xinchen Zhang ◽  
Junyu Leng ◽  
Jun Liu

The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing the data resources of seafood products. We reviewed various storage and retrieval technologies for big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. A multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties is used in the parallel data processing and analysis method. The data distribution strategy and block size adjustment are studied based on the Hadoop platform. A multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm based on data-optimized storage are designed for the big data resources of seafood products according to the MapReduce parallel framework. Practical verification shows that the storage optimization and data-retrieval methods support the construction of a big data resource-management platform for seafood products and enable efficient organization and management of these resources. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
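As a rough illustration of the replica-placement idea, the sketch below implements plain consistent hashing with virtual nodes and picks several distinct nodes per block. It is not the authors' algorithm: the data-correlation and spatio-temporal weighting described in the abstract are not modeled, and the names `HashRing`, `replicas`, and the node labels are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring using MD5 (stable across runs)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Plain consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node is placed on the ring `vnodes` times.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas(self, block_id: str, count: int = 3):
        """Return up to `count` distinct nodes for a block, walking clockwise."""
        count = min(count, self._node_count)
        i = bisect.bisect_right(self._points, _hash(block_id))
        chosen = []
        while len(chosen) < count:
            node = self._ring[i % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


if __name__ == "__main__":
    ring = HashRing(["dn1", "dn2", "dn3", "dn4"])  # hypothetical data nodes
    print(ring.replicas("block-0001"))
```

A useful property of this layout is that removing a node that holds no replica of a block leaves that block's replica list unchanged, which limits data movement when the cluster changes.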


2021 ◽  
Vol 9 (2) ◽  
pp. 30-37
Author(s):  
Nahla Aljojo

While Big Data analytics can provide a variety of benefits, processing heterogeneous data comes with its own set of limitations. Because a transaction pattern must be studied independently when working with Bitcoin data, this study examines Twitter data related to Bitcoin and investigates communication patterns in Bitcoin transaction tweets. Using the hashtags #Bitcoin or #BTC on Twitter, a vast amount of data was gathered and mined to uncover the patterns that everyone (speculators, teachers, or stakeholders) uses on Twitter to discuss Bitcoin transactions. The aim is to determine the direction of Bitcoin transaction tweets based on historical data. To that end, this research proposes using Big Data analytics to track Bitcoin transaction communications in tweets in order to discover a pattern. MapReduce on the Hadoop platform was used. The findings indicate that, in the map step of the procedure, Hadoop tokenizes the dataset and passes the tokens to the mapper, where thirteen patterns were established and then reduced to three patterns using the attributes of data previously stored in the Hadoop context. One of these is the emoji data that was left out of previous research discussions. The text is only one piece of the puzzle of Bitcoin transaction interaction, and its key part is “No certainty, only possibilities” in Bitcoin transactions.
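The tokenize-then-map step can be sketched in a few lines. The example below is a minimal in-memory imitation of a map and a reduce phase that counts hashtags, words, and emoji per tweet; it does not reproduce the thirteen patterns from the study, which the abstract does not specify. The function names `map_tweet` and `reduce_counts` and the sample tweets are made up for illustration.

```python
import re
import unicodedata
from collections import Counter

# One alternative per token kind; the named group tells us which one matched.
TOKEN_RE = re.compile(r"(?P<hashtag>#\w+)|(?P<word>\w+)|(?P<symbol>[^\w\s])")


def map_tweet(tweet):
    """Map step: emit ((kind, token), 1) for every token in one tweet."""
    for match in TOKEN_RE.finditer(tweet.lower()):
        kind, token = match.lastgroup, match.group()
        if kind == "symbol":
            # Unicode category "So" (Symbol, other) covers most emoji.
            kind = "emoji" if unicodedata.category(token) == "So" else "punct"
        yield (kind, token), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per (kind, token) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


if __name__ == "__main__":
    tweets = ["#Bitcoin to the moon 🚀", "Buy #BTC now 🚀 #bitcoin"]  # made-up samples
    counts = reduce_counts(kv for t in tweets for kv in map_tweet(t))
    print(counts.most_common(3))
```

Keeping emoji as their own token kind mirrors the abstract's point that emoji data carries signal that text-only analyses miss.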


2021 ◽  
Vol 2083 (3) ◽  
pp. 032059
Author(s):  
Qiang Chen ◽  
Meiling Deng

Regression algorithms are commonly used in machine learning. Based on encryption and privacy protection methods, regression algorithms, currently a key hot technology, and the same encryption technology are studied. This paper proposes a PPLAR-based algorithm. The correlation between data items is obtained by the logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve the computing speed of the cluster while maintaining the average absolute error of the algorithm.
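To make the parallelization idea concrete, here is a minimal sketch of data-parallel logistic regression in the map/reduce style: each partition computes a partial gradient, and a reduce step sums them. It assumes ordinary batch gradient descent and includes no encryption or privacy protection, so it is not the PPLAR-based algorithm itself. All function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def map_partial_gradient(partition, weights):
    """Map step: log-loss gradient summed over one partition, plus row count."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad, len(partition)


def reduce_gradients(partials):
    """Reduce step: add partial gradients and row counts from all partitions."""
    partials = list(partials)
    total = [sum(column) for column in zip(*(g for g, _ in partials))]
    return total, sum(n for _, n in partials)


def train(partitions, n_features, lr=0.5, epochs=200):
    """Batch gradient descent; each epoch is one map/reduce round."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad, count = reduce_gradients(
            map_partial_gradient(p, weights) for p in partitions
        )
        weights = [w - lr * g / count for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Toy data: the first feature is a constant bias term.
    partitions = [
        [([1.0, 0.0], 0), ([1.0, 1.0], 0)],
        [([1.0, 3.0], 1), ([1.0, 4.0], 1)],
    ]
    print(train(partitions, n_features=2))
```

Because the gradient is a sum over rows, splitting the rows across partitions changes only where the work runs, not the result, which is why accuracy can be held constant while the cluster speeds up the computation.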


2021 ◽  
Vol 11 (18) ◽  
pp. 8651
Author(s):  
Vladimir Belov ◽  
Alexander N. Kosenkov ◽  
Evgeny Nikulchev

One of the most popular methods for building analytical platforms involves the use of the concept of data lakes. A data lake is a storage system in which the data are presented in their original format, making it difficult to conduct analytics or present aggregated data. To solve this issue, data marts are used: environments that store highly specialized information focused on the requests of employees of a certain department or area of an organization’s work. This article presents a study of big data storage formats in the Apache Hadoop platform when used to build data marts.


2021 ◽  
Author(s):  
Xudong Wei ◽  
Qingzhen Sun ◽  
Xianli Liu ◽  
Caixu Yue ◽  
Steven Y. Liang ◽  
...  

In the big data era, traditional data mining technology cannot meet the requirements of massive data processing in intelligent manufacturing. To address insufficient computing power and low efficiency in the mining process, this paper proposes an improved K-means clustering algorithm based on the concept of distributed clustering in a cloud computing environment. The improved algorithm (T.K-means) is combined with the MapReduce computing framework of the Hadoop platform to realize parallel computing and thereby process massive data. To verify the practical performance of the T.K-means algorithm, machining data from milling Ti-6Al-4V alloy are taken as the mining object. The mapping relationship among milling parameters, surface roughness, and material removal rate is mined, and optimized values for the milling parameters are obtained. The results show that the T.K-means algorithm can be used to mine the optimal milling parameters, so that the best surface roughness can be obtained when milling Ti-6Al-4V titanium alloy.
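For readers unfamiliar with how K-means maps onto MapReduce, the sketch below shows one common formulation: the map step assigns each point to its nearest centroid, and the reduce step averages the points per centroid. This is standard K-means only; the abstract does not describe what T.K-means changes, so none of those improvements are modeled. The function names and the sample points are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )
    return nearest, point


def kmeans_reduce(pairs, centroids):
    """Reduce step: average the points assigned to each centroid index."""
    sums, counts = {}, {}
    for index, point in pairs:
        if index in sums:
            sums[index] = [s + x for s, x in zip(sums[index], point)]
        else:
            sums[index] = list(point)
        counts[index] = counts.get(index, 0) + 1
    # A centroid that received no points keeps its previous position.
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds from the given start centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        centroids = kmeans_reduce(
            (kmeans_map(p, centroids) for p in points), centroids
        )
    return centroids


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]  # made-up data
    print(kmeans(points, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real Hadoop job the current centroids would be broadcast to every mapper, and each iteration would run as a separate job, which is the main cost that MapReduce-based K-means variants try to reduce.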


CONVERTER ◽  
2021 ◽  
pp. 373-390
Author(s):  
Wei Zhan ◽  
Jinhui She ◽  
Yangyang Zhang ◽  
Chenfan Sun

The rapid increase in the sales scale of e-commerce platforms is accompanied by rapid growth in consumer evaluation data on commodities. How to use big data analysis and visualization technology to mine the valuable information in massive consumer evaluation data is an urgent issue in promoting the development of e-commerce platforms. However, e-commerce evaluation data is huge in volume, fast growing, and mostly unstructured, which makes it typical big data. To visualize e-commerce evaluation big data efficiently, this paper proposes an end-to-end four-layer framework for a data visualization system. The data acquisition layer uses the WebCollector crawler to crawl a total of 420,000 mobile phone sales evaluation records from the JD website and stores them in a MySQL database. The data import layer uses the Sqoop tool to import the MySQL data into the Hadoop platform. The data processing layer uses HDFS and MapReduce to process and analyze the big data. The visualization implementation layer uses integrated JSP + Servlet + JavaScript + ECharts technology to visualize the distribution of mobile phone sales, user purchase impressions, and user mobile phone portraits. This helps consumers choose their favorite mobile phones conveniently and provides decision-making support for e-commerce companies to launch products more accurately, benefiting both parties.

