An Analytical Approach for Optimizing the Performance of Hadoop Map Reduce Over RoCE

Data-intensive systems aim to process “big” data efficiently. Several data processing engines have evolved over the past decade, and they are modeled around the MapReduce paradigm. This article explores Hadoop's MapReduce engine and proposes techniques that achieve a higher level of optimization by borrowing concepts from the world of High Performance Computing; as a consequence, power consumption and heat generation are lowered. The article designs a system with three features: a pipelined dataflow, in contrast to the existing unregulated, “bursty” flow of network traffic; the ability to carry out Map and Reduce tasks in parallel; and the incorporation of modern high-performance computing concepts through Remote Direct Memory Access (RDMA). To support the claim that the proposed system performs better, the authors provide an algorithm for RoCE-enabled (RDMA over Converged Ethernet) MapReduce and a mathematical derivation contrasting its runtime with that of vanilla Hadoop. The article proves mathematically that the proposed system runs 1.67 times faster than the vanilla version of Hadoop.
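
To illustrate why a pipelined dataflow can beat a barrier-synchronised Map-then-Reduce flow, here is a minimal sketch of the generic textbook pipeline model. It assumes the job is split into n equal chunks, that each chunk passes through three stages (map, shuffle, reduce) with fixed per-chunk costs, and that each stage is served by one resource handling chunks one at a time. This is not the authors' derivation and does not reproduce their 1.67x figure; `StageCosts`, the function names and the cost values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCosts:
    """Hypothetical per-chunk cost of each stage, in arbitrary time units."""
    map: float
    shuffle: float
    reduce: float


def barrier_runtime(n_chunks: int, c: StageCosts) -> float:
    # Barrier model: a stage must finish every chunk before the next stage starts.
    return n_chunks * (c.map + c.shuffle + c.reduce)


def pipelined_runtime(n_chunks: int, c: StageCosts) -> float:
    # Pipelined model: the first chunk fills the pipeline, after which one
    # chunk completes per interval of the slowest (bottleneck) stage.
    fill = c.map + c.shuffle + c.reduce
    bottleneck = max(c.map, c.shuffle, c.reduce)
    return fill + (n_chunks - 1) * bottleneck


costs = StageCosts(map=4.0, shuffle=3.0, reduce=2.0)
tb = barrier_runtime(10, costs)
tp = pipelined_runtime(10, costs)
print(f"barrier={tb}, pipelined={tp}, speedup={tb / tp:.2f}")
```

With these assumed costs the sketch prints `barrier=90.0, pipelined=45.0, speedup=2.00`. For a single chunk both models coincide, and as the number of chunks grows the speedup approaches the sum of the stage costs divided by the bottleneck cost (9/4 = 2.25 here), so in this simplified model the gain depends on how balanced the stages are.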


High-Performance Computing for Big Data Processing

Future Generation Computer Systems ◽

10.1016/j.future.2018.07.054 ◽

2018 ◽

Vol 88 ◽

pp. 693-695 ◽

Cited By ~ 1

Author(s):

Yulei Wu ◽

Yang Xiang ◽

Jingguo Ge ◽

Peter Muller

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Big Data Processing ◽

Performance Computing


An Overview on the Convergence of High Performance Computing and Big Data Processing

2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS) ◽

10.1109/padsw.2018.8644997 ◽

2018 ◽

Cited By ~ 1

Author(s):

Songzhu Mei ◽

Hongtao Guan ◽

Qinglin Wang

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Big Data Processing ◽

Performance Computing


Using the World Wide Web to provide a platform independent interface to high performance computing

Digest of Papers COMPCON 95 Technologies for the Information Superhighway CMPCON-95 ◽

10.1109/cmpcon.1995.512355 ◽

2002 ◽

Author(s):

D.W. Robertson ◽

W.E. Johnston

Keyword(s):

World Wide Web ◽

High Performance Computing ◽

High Performance ◽

World Wide ◽

The World ◽

Performance Computing


Application of parallel calculations in the calculation of quality maintenance of application queues

Electronics and Communications ◽

10.20535/2312-1807.2011.16.4.246766 ◽

2011 ◽

Vol 16 (4) ◽

pp. 177-181

Author(s):

M.O. Alieksieiev ◽

L.S. Hloba ◽

K.O. Yermakova ◽

V.V. Kushnir

Keyword(s):

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Quality Maintenance ◽

Parallel Calculations ◽

Computing Technologies ◽

Qos Parameters ◽

Performance Computing

The use of high-performance computing technologies in scientific and engineering research is considered. A method for effectively parallelizing data processing is described. The use of high-performance computing based on the OpenMP library to solve problems in telecommunications, such as computing the QoS parameters of queues, is also analyzed.
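
The abstract does not say which queueing model is used, so the following is a minimal sketch assuming independent M/M/1 queues (Poisson arrivals at rate λ, exponential service at rate μ). It shows the data-parallel pattern of evaluating QoS parameters for many queues at once. The paper uses OpenMP in a compiled language; this sketch swaps in Python's `concurrent.futures.ThreadPoolExecutor` purely to show the structure, and because of CPython's global interpreter lock it will not speed up this CPU-bound arithmetic (a process pool or OpenMP would be needed for that). The names `mm1_qos` and `qos_for_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mm1_qos(arrival_rate: float, service_rate: float) -> dict:
    """QoS parameters of a stable M/M/1 queue (requires 0 <= lambda < mu)."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("M/M/1 queue is unstable unless 0 <= lambda < mu")
    rho = arrival_rate / service_rate                 # server utilisation
    return {
        "rho": rho,
        "L": rho / (1 - rho),                         # mean number in system
        "W": 1 / (service_rate - arrival_rate),       # mean time in system
        "Wq": rho / (service_rate - arrival_rate),    # mean waiting time in queue
    }


def qos_for_all(queues, workers=4):
    """Evaluate mm1_qos for every (lambda, mu) pair using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input pairs.
        return list(pool.map(lambda q: mm1_qos(*q), queues))


queues = [(2.0, 4.0), (3.0, 4.0), (1.0, 5.0)]
for (lam, mu), qos in zip(queues, qos_for_all(queues)):
    print(lam, mu, qos)
```

Each queue is independent of the others, which is what makes the computation embarrassingly parallel; the results also satisfy Little's law, L = λW, which is a convenient sanity check.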


CONTENT OF A SPECIAL COURSE ON HIGH PERFORMANCE COMPUTING

Bulletin Series of Physics & Mathematical Sciences ◽

10.51889/2020-3.1728-7901.40 ◽

2020 ◽

Vol 71 (3) ◽

pp. 263-267

Author(s):

M. Serik ◽


G. Zh. Yerlanova ◽

Keyword(s):

Information Technology ◽

High Performance Computing ◽

Computer Technology ◽

High Performance ◽

Practical Importance ◽

Modern Society ◽

Educational Process ◽

Dynamic Development ◽

The World ◽

Performance Computing

At present, along with the dynamic development of computer technology worldwide, the most effective ways of solving problems of practical importance are being considered, and high performance computing takes the lead in this. The development of modern society is therefore closely tied to training experienced, up-to-date specialists in information technology, which in turn depends on including new courses in the curriculum and covering these issues fully in the content of the courses taught. This article analyzes courses on high performance computing taught at experimental bases and abroad; on this basis, it determines the topics of a special course and the content recommended for implementation in the educational process. During the training, students' competencies in high performance computing were identified.


On Construction of Cluster and Grid Computing Platforms for Parallel Bioinformatics Applications

Grid and Cloud Computing ◽

10.4018/978-1-4666-0879-5.ch405 ◽

2012 ◽

pp. 841-861

Author(s):

Chao-Tung Yang ◽

Wen-Chung Shih

Keyword(s):

Grid Computing ◽

High Performance Computing ◽

High Speed ◽

High Performance ◽

Cluster Computing ◽

Structural Features ◽

Performance Technology ◽

Data Intensive ◽

Computing Platforms ◽

Performance Computing

Biology databases are diverse and massive. As a result, researchers must compare each sequence with vast numbers of other sequences. Comparison, whether of structural features or of protein sequences, is vital in bioinformatics. These activities require high-speed, high-performance computing power to search through and analyze large amounts of data, as well as industrial-strength databases to perform a range of data-intensive computing functions. Grid computing and cluster computing meet these requirements. Biological data reside in various web services that help biologists search for and extract useful information. The data formats produced are heterogeneous, and powerful tools are needed to handle the complex and difficult task of integrating the data. This paper reviews the relevant technologies and presents an approach to this problem using cluster and grid computing. The authors implement an experimental distributed computing application for bioinformatics consisting of basic high-performance computing environments (Grid and PC cluster systems), multiple user-portal interfaces that provide graphical front ends so that biologists can benefit directly from high-performance technology, and a translation tool for converting biology data into XML format.
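
The "compare one sequence against vast numbers of others" workload is what makes cluster and grid platforms a good fit: every comparison is independent. The abstract does not name a comparison algorithm, so this sketch swaps in the standard Smith-Waterman local-alignment score with a linear gap penalty; the scoring values (match=2, mismatch=-1, gap=-2), the function names and the tiny database are hypothetical. A thread pool stands in for the nodes of a cluster or grid only to show the one-query-versus-many pattern; it does not give real CPU speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor


def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local-alignment score of a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query: str, database: dict, workers=4):
    """Score query against every sequence; return (score, id) pairs, best first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda seq: smith_waterman(query, seq),
                               database.values()))
    return sorted(zip(scores, database.keys()), reverse=True)


db = {"seq1": "TTTT", "seq2": "TACGTT", "seq3": "ACGA"}
print(rank_database("ACGT", db))
```

Because each database entry is scored independently, the same structure maps directly onto distributing database partitions across cluster or grid nodes and merging the ranked results afterwards.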


Decoupled I/O for Data-Intensive High Performance Computing

2014 43rd International Conference on Parallel Processing Workshops ◽

10.1109/icppw.2014.48 ◽

2014 ◽

Cited By ~ 2

Author(s):

Chao Chen ◽

Yong Chen ◽

Kun Feng ◽

Yanlong Yin ◽

Hassan Eslami ◽

...

Keyword(s):

High Performance Computing ◽

High Performance ◽

Data Intensive ◽

Performance Computing


Hierarchical computing: A high performance computing architecture for data-processing in IoT era

2017 4th International Conference on Systems and Informatics (ICSAI) ◽

10.1109/icsai.2017.8248557 ◽

2017 ◽

Author(s):

Zhihe Yang

Keyword(s):

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Computing Architecture ◽

Performance Computing


Upgrading a high performance computing environment for massive data processing

Journal of Internet Services and Applications ◽

10.1186/s13174-019-0118-7 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

Lucas M. Ponce ◽

Walter dos Santos ◽

Wagner Meira ◽

Dorgival Guedes ◽

Daniele Lezzi ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

High Performance Computing ◽

High Performance ◽

Data Access ◽

Massive Data ◽

Analysis Tool ◽

Data Framework ◽

Performance Computing ◽

Massive Data Processing

High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In this process, aspects of hardware architectures, systems support and programming paradigms are being revisited from both perspectives. This paper presents our experience along this path of convergence through a framework that addresses some of the programming issues arising from such integration. Our contribution is an integrated environment that combines (i) COMPSs, a programming framework for developing and executing parallel applications on distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate our framework, we used Lemonade to create COMPSs applications that access data through HDFS and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and rearranging data transfers, which reduces execution time. The integration with Lemonade makes COMPSs easier to use and may help popularize it in the Data Science community by providing efficient algorithm implementations for data-domain experts who want to develop applications at a higher level of abstraction.
