A New Algorithm of Grading and Classification for Massive Data Processing Based on Decision Tree

The big data problem has drawn widespread concern from both industry and academia in recent years. As the amount of data produced by various industries and sectors grows rapidly, the demands on data processing and analysis capabilities increase, and the question of how to face the challenges of data and discover new opportunities has received wide attention. As a traditional industry, the oil drilling or refinery enterprise faces large amounts of data produced by the operational status of its systems. This paper introduces an approach to massive data processing for oil enterprises based on cloud computing and the Internet of Things.

A Test Data Generation for Performance Testing in Massive Data Processing Systems

Lecture Notes in Electrical Engineering - Advanced Multimedia and Ubiquitous Engineering ◽

10.1007/978-981-13-1328-8_26 ◽

2018 ◽

pp. 207-213

Author(s):

Sunkyung Kim ◽

JiSu Park ◽

Kang Hyoun Kim ◽

Jin Gon Shon

Keyword(s):

Data Processing ◽

Test Data ◽

Performance Testing ◽

Massive Data ◽

Test Data Generation ◽

Data Generation ◽

Massive Data Processing

GPU Computations on Hadoop Clusters for Massive Data Processing

Lecture Notes in Electrical Engineering - Proceedings of the 3rd International Conference on Intelligent Technologies and Engineering Systems (ICITES2014) ◽

10.1007/978-3-319-17314-6_66 ◽

2016 ◽

pp. 515-521

Author(s):

Wenbo Chen ◽

Shungou Xu ◽

Hai Jiang ◽

Tien-Hsiung Weng ◽

Mario Donato Marino ◽

...

Keyword(s):

Data Processing ◽

Massive Data ◽

Hadoop Clusters ◽

Massive Data Processing

Upgrading a high performance computing environment for massive data processing

Journal of Internet Services and Applications ◽

10.1186/s13174-019-0118-7 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Lucas M. Ponce ◽

Walter dos Santos ◽

Wagner Meira ◽

Dorgival Guedes ◽

Daniele Lezzi ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Data Access ◽

Massive Data ◽

Analysis Tool ◽

Data Framework ◽

Performance Computing ◽

Massive Data Processing

Abstract: High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is the development of an integrated environment that combines (i) COMPSs, a programming framework for the development and execution of parallel applications for distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS, and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfer, reducing execution time. The integration with Lemonade facilitates the use of COMPSs and may help its popularization in the Data Science community, by providing efficient algorithm implementations for experts from the data domain who want to develop applications with a higher-level abstraction.
