hadoop platform Latest Research Papers

Design of Internet Opinion Analysis System for Emergencies in Big Data Environment Based on Hadoop Platform

2021 International Conference on Big Data Analytics for Cyber-Physical System in Smart City - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-981-16-7469-3_10 ◽

2022 ◽

pp. 95-101

Author(s):

Yongchang Ren ◽

Lihua Han ◽

Jun Li

Keyword(s):

Big Data ◽

Opinion Analysis ◽

Hadoop Platform ◽

Data Environment ◽

Analysis System

Block Storage Optimization and Parallel Data Processing and Analysis of Product Big Data Based on the Hadoop Platform

Mathematical Problems in Engineering ◽

10.1155/2021/3839800 ◽

2021 ◽

Vol 2021 ◽

pp. 1-14

Author(s):

Yajun Wang ◽

Shengming Cheng ◽

Xinchen Zhang ◽

Junyu Leng ◽

Jun Liu

Keyword(s):

Big Data ◽

Data Processing ◽

Analysis Method ◽

Parallel Data ◽

Extraction Algorithm ◽

Seafood Products ◽

Hadoop Platform ◽

Storage Optimization ◽

Block Storage ◽

Fusion Feature

The traditional distributed database storage architecture suffers from low efficiency and limited storage capacity when managing the data resources of seafood products. We reviewed various storage and retrieval technologies for big data resources. A block storage layout optimization method based on the Hadoop platform and a parallel data processing and analysis method based on the MapReduce model are proposed. A multireplica consistent hashing algorithm based on data correlation and spatial and temporal properties is used in the parallel data processing and analysis method. The data distribution strategy and block size adjustment are studied based on the Hadoop platform. A multidata source parallel join query algorithm and a multichannel data fusion feature extraction algorithm based on data-optimized storage are designed for the big data resources of seafood products according to the MapReduce parallel framework. Practical verification shows that the storage optimization and data-retrieval methods support the construction of a big data resource-management platform for seafood products and enable efficient organization and management of these resources. The execution time of multidata source parallel retrieval is only 32% of that of the standard Hadoop scheme, and the execution time of the multichannel data fusion feature extraction algorithm is only 35% of that of the standard Hadoop scheme.
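The abstract above names a multireplica consistent hashing algorithm but gives no details. As a rough illustration only, the following Python sketch shows plain consistent hashing with replica placement. It does not model the data-correlation or spatial and temporal weighting the authors describe, and the node names, block key, and `virtual_nodes` parameter are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring using MD5 (stable across runs)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Plain consistent hashing with virtual nodes (illustrative, not the paper's variant)."""

    def __init__(self, nodes, virtual_nodes=64):
        # Each physical node is placed on the ring at several points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]
        self._node_count = len(set(nodes))

    def replicas_for(self, block_key, replica_count=3):
        """Walk clockwise from the key's point, collecting distinct nodes."""
        wanted = min(replica_count, self._node_count)
        start = bisect.bisect_right(self._points, _hash(block_key))
        chosen = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == wanted:
                    break
        return chosen


# Hypothetical node names and block key, for illustration only.
ring = ConsistentHashRing(["datanode-1", "datanode-2", "datanode-3", "datanode-4"])
placement = ring.replicas_for("catch-records/2021-03/block-0007")
print(placement)
```

Because placement depends only on the hash of the key and the set of nodes, adding or removing one node moves only the blocks adjacent to that node's ring points, which is the usual reason for choosing this scheme over modulo-based placement.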

Examining Heterogeneity Structured on a Large Data Volume with Minimal Incompleteness

ARO-The Scientific Journal of Koya University ◽

10.14500/aro.10857 ◽

2021 ◽

Vol 9 (2) ◽

pp. 30-37

Author(s):

Nahla Aljojo

Keyword(s):

Big Data ◽

Data Analytics ◽

Historical Data ◽

Big Data Analytics ◽

Large Data ◽

Heterogeneous Data ◽

Twitter Data ◽

Data Volume ◽

Hadoop Platform ◽

Transaction Pattern

While Big Data analytics can provide a variety of benefits, processing heterogeneous data comes with its own set of limitations. Because a transaction pattern must be studied independently when working with Bitcoin data, this study examines Twitter data related to Bitcoin and investigates communication patterns in Bitcoin transaction tweets. Using the hashtags #Bitcoin or #BTC on Twitter, a vast amount of data was gathered and mined to uncover a pattern that everyone (speculators, teachers, or stakeholders) uses on Twitter to discuss Bitcoin transactions. The aim is to determine the direction of Bitcoin transaction tweets based on historical data. As a result, this research proposes using Big Data analytics to track Bitcoin transaction communications in tweets in order to discover a pattern. Hadoop platform MapReduce was used. The findings indicate that, in the map step of the procedure, Hadoop tokenizes the dataset and passes the tokens to the mapper, where thirteen patterns were established and reduced to three patterns using the attributes of data previously stored in the Hadoop context. One of these is the emoji data that was left out of previous research discussions. The text, however, is only one piece of the puzzle of Bitcoin transaction interaction, and the key part of it is “No certainty, only possibilities” in Bitcoin transactions.
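The map and reduce steps mentioned in the abstract above can be pictured with a small in-memory Python sketch. It is not the study's actual job: the sample tweets, the token regular expression, and the function names are assumptions made for illustration, and this simple tokenizer drops emoji, which the study treats as a pattern of its own.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical tokenizer: hashtags are kept whole, other tokens are plain words.
TOKEN_RE = re.compile(r"#\w+|\w+")


def map_tweet(tweet: str):
    """Map step: emit (token, 1) for every lower-cased token in one tweet."""
    return [(token.lower(), 1) for token in TOKEN_RE.findall(tweet)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each token."""
    totals = Counter()
    for token, count in pairs:
        totals[token] += count
    return dict(totals)


# Made-up sample tweets, not data from the study.
tweets = [
    "Sold half my #BTC today",
    "#Bitcoin up again, buying more #BTC",
]
counts = reduce_counts(chain.from_iterable(map_tweet(t) for t in tweets))
print(counts["#btc"], counts["#bitcoin"])
```

On a real cluster the map calls would run on separate input splits and the framework would group pairs by token before the reduce step; the sketch collapses both into one process.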

Study of a Privacy Preserving Logistic Regression Algorithm (PPLRA) For Data Privacy in the Context of Big Data

Journal of Physics Conference Series ◽

10.1088/1742-6596/2083/3/032059 ◽

2021 ◽

Vol 2083 (3) ◽

pp. 032059

Author(s):

Qiang Chen ◽

Meiling Deng

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Privacy Protection ◽

Data Privacy ◽

Absolute Error ◽

Average Absolute Error ◽

Regression Algorithms ◽

Hadoop Platform ◽

Logistic Regression Algorithm ◽

Computing Speed

Regression algorithms are commonly used in machine learning. Based on encryption and privacy protection methods, the current key hot technology regression algorithm and the same encryption technology are studied. This paper proposes a PPLRA-based algorithm. The correlation between data items is obtained by the logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve the computing speed of the cluster while maintaining the algorithm's average absolute error.
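The abstract above does not say how the algorithm is parallelized. One common way to distribute logistic regression, shown here as a minimal sketch and not as the authors' PPLRA, is to compute partial gradients per data partition and sum them. The sketch omits the encryption and privacy layer entirely; the toy data, learning rate, and function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def partial_gradient(partition, weights):
    """Map step: gradient sum of the log-loss over one partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = predict(weights, features) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad, len(partition)


def train(partitions, n_features, lr=0.5, epochs=200):
    weights = [0.0] * n_features
    for _ in range(epochs):
        results = [partial_gradient(p, weights) for p in partitions]
        total = sum(n for _, n in results)
        # Reduce step: sum the partial gradients, then take one averaged step.
        summed = [sum(g[j] for g, _ in results) for j in range(n_features)]
        weights = [w - lr * s / total for w, s in zip(weights, summed)]
    return weights


def mean_absolute_error(partitions, weights):
    rows = [row for p in partitions for row in p]
    return sum(abs(predict(weights, f) - y) for f, y in rows) / len(rows)


# Toy data (made up): each row is ([bias, x], label), split into two partitions.
partitions = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]
weights = train(partitions, n_features=2)
mae = mean_absolute_error(partitions, weights)
print(weights, mae)
```

Since the gradient of the log-loss is a sum over rows, splitting the rows across partitions changes only where the arithmetic happens, not the result, which is why the error metric can stay the same while the work is spread over a cluster.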

Optimization of Relevance Weighting Algorithm Based on Hadoop Platform in Human Resource Information System

10.1109/icosec51865.2021.9591765 ◽

2021 ◽

Author(s):

Zhihua Yan

Keyword(s):

Information System ◽

Human Resource ◽

Human Resource Information System ◽

Resource Information ◽

Hadoop Platform

Routing algorithm of real-time multicast communication based on Hadoop platform

10.1145/3482632.3482756 ◽

2021 ◽

Author(s):

Zhengnan Wu

Keyword(s):

Real Time ◽

Routing Algorithm ◽

Multicast Communication ◽

Hadoop Platform

Experimental Characteristics Study of Data Storage Formats for Data Marts Development within Data Lakes

Applied Sciences ◽

10.3390/app11188651 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8651

Author(s):

Vladimir Belov ◽

Alexander N. Kosenkov ◽

Evgeny Nikulchev

Keyword(s):

Big Data ◽

Data Storage ◽

Storage System ◽

Apache Hadoop ◽

Aggregated Data ◽

Data Marts ◽

Hadoop Platform ◽

Analytical Platforms ◽

Big Data Storage

One of the most popular methods for building analytical platforms involves the use of the concept of data lakes. A data lake is a storage system in which the data are presented in their original format, making it difficult to conduct analytics or present aggregated data. To solve this issue, data marts are used: stores of highly specialized information focused on the requests of employees of a certain department or a particular line of an organization’s work. This article presents a study of big data storage formats in the Apache Hadoop platform when used to build data marts.

Design and implementation strategy of data migration system based on Hadoop platform

Journal of Physics Conference Series ◽

10.1088/1742-6596/2010/1/012082 ◽

2021 ◽

Vol 2010 (1) ◽

pp. 012082

Author(s):

Xiaoshu Wang

Keyword(s):

Implementation Strategy ◽

Data Migration ◽

Migration System ◽

Design And Implementation ◽

Hadoop Platform

Research on Parallel Distributed Clustering Algorithm Applied to Milling Parameter Optimization

10.21203/rs.3.rs-808329/v1 ◽

2021 ◽

Author(s):

Xudong Wei ◽

Qingzhen Sun ◽

Xianli Liu ◽

Caixu Yue ◽

Steven Y. Liang ◽

...

Keyword(s):

Surface Roughness ◽

Clustering Algorithm ◽

Removal Rate ◽

Intelligent Manufacturing ◽

Massive Data ◽

Distributed Clustering ◽

Milling Parameters ◽

Hadoop Platform ◽

Practical Performance ◽

Low Efficiency

In the big data era, traditional data mining technology cannot meet the requirements of massive data processing in the context of intelligent manufacturing. To address insufficient computing power and low efficiency in the mining process, this paper proposes an improved K-means clustering algorithm based on the concept of distributed clustering in a cloud computing environment. The improved algorithm (T.K-means) is combined with the MapReduce computing framework of the Hadoop platform to realize parallel computing and thus process massive data. To verify the practical performance of the T.K-means algorithm, machining data from milling Ti-6Al-4V alloy are taken as the mining object. The mapping relationship among milling parameters, surface roughness and material removal rate is mined, and optimized values for the milling parameters are obtained. The results show that the T.K-means algorithm can be used to mine the optimal milling parameters, so that the best surface roughness can be obtained in milling Ti-6Al-4V titanium alloy.
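The abstract above does not describe what T.K-means changes, so the following Python sketch shows only standard K-means written as a map step (assign each point to its nearest centroid) and a reduce step (average each cluster). The points, initial centroids, and function names are hypothetical and are not milling data from the paper.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_assign(points, centroids):
    """Map step: emit (cluster index, point) for every point."""
    return [(nearest(p, centroids), p) for p in points]


def reduce_centroids(assignments, centroids):
    """Reduce step: average each cluster; an empty cluster keeps its old centroid."""
    groups = {}
    for index, point in assignments:
        groups.setdefault(index, []).append(point)
    updated = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if not members:
            updated.append(old)
        else:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
    return updated


def kmeans(points, centroids, max_iter=50):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_centroids(map_assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Made-up two-dimensional points and starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(result)  # [(1.0, 1.5), (9.5, 9.0)]
```

In a MapReduce job the assignment step would run in parallel over input splits and each reducer would receive all points of one cluster, with one job per iteration.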

Statistics Analysis and Visualization for Big Data of E-commerce Platform Sales Evaluation

CONVERTER ◽

10.17762/converter.136 ◽

2021 ◽

pp. 373-390

Author(s):

Wei Zhan ◽

Jinhui She ◽

Yangyang Zhang ◽

Chenfan Sun

Keyword(s):

Big Data ◽

Mobile Phone ◽

Big Data Analysis ◽

Unstructured Data ◽

Evaluation Data ◽

Visualization System ◽

Consumer Evaluation ◽

Integrated Technology ◽

Hadoop Platform ◽

Visualization Technology

The rapid increase in the sales scale of e-commerce platforms is accompanied by rapid growth in consumer evaluation data on commodities. How to use big data analysis and visualization technology to mine the valuable information in massive consumer evaluation data is an urgent issue in promoting the development of e-commerce platforms. However, e-commerce evaluation data are huge in volume, fast-growing, and mostly unstructured, which makes them typical big data. In order to efficiently visualize e-commerce evaluation big data, this paper proposes an end-to-end four-layer framework for a data visualization system. The data acquisition layer uses the Webcollector crawler to crawl a total of 420,000 mobile phone sales evaluation records from the JD website and stores them in a MySQL database; the data import layer uses the Sqoop tool to import the MySQL data into the Hadoop platform; the data processing layer uses HDFS and MapReduce to process and analyze the big data; the visualization implementation layer uses integrated JSP+Servlet+JavaScript+ECharts technology to visualize the distribution of mobile phone sales, user purchase impressions, and user mobile phone portraits. This helps consumers choose their favorite mobile phones conveniently and provides decision-making support for e-commerce companies to launch products more accurately, benefiting both parties.
