An Analytical Approach for Optimizing the Performance of Hadoop Map Reduce Over RoCE

Author(s):
Geetha J., Uday Bhaskar N, Chenna Reddy P.

Data-intensive systems aim to process "big" data efficiently. Several data processing engines, modeled around the MapReduce paradigm, have evolved over the past decade. This article explores Hadoop's MapReduce engine and proposes techniques to obtain a higher level of optimization by borrowing concepts from the world of High Performance Computing; as a consequence, power consumption and heat generation are lowered. The article designs a system with a pipelined dataflow, in contrast to the existing unregulated "bursty" flow of network traffic; the ability to carry out Map and Reduce tasks in parallel; and the incorporation of modern high-performance computing concepts through Remote Direct Memory Access (RDMA). To support the claim of increased performance, the authors provide an algorithm for RoCE-enabled MapReduce and a mathematical derivation contrasting its runtime with that of vanilla Hadoop, proving mathematically that the proposed system runs 1.67 times faster than the vanilla version of Hadoop.
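
The abstract does not reproduce the derivation, but the claimed gain follows a phase-overlap argument. Below is a minimal sketch in LaTeX, assuming a simple model in which vanilla Hadoop runs its phases back to back while the proposed system overlaps them; the authors' actual derivation is more detailed.

% Illustrative phase-overlap model (an assumption, not the paper's exact derivation).
% Vanilla Hadoop runs the three phases sequentially:
T_{\mathrm{vanilla}} = T_{\mathrm{map}} + T_{\mathrm{shuffle}} + T_{\mathrm{reduce}}
% A pipelined dataflow overlaps the phases, so the runtime is bounded by the
% slowest stage plus a pipeline fill/drain overhead \epsilon:
T_{\mathrm{pipelined}} \approx \max\bigl(T_{\mathrm{map}},\, T_{\mathrm{shuffle}},\, T_{\mathrm{reduce}}\bigr) + \epsilon
% The resulting speedup:
S = \frac{T_{\mathrm{vanilla}}}{T_{\mathrm{pipelined}}}

For perfectly balanced phases and negligible overhead this ratio approaches 3; the paper's own model, which also accounts for RDMA's lower copy and CPU costs, arrives at the reported factor of 1.67 over vanilla Hadoop.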

2018, Vol. 88, pp. 693-695
Author(s):
Yulei Wu, Yang Xiang, Jingguo Ge, Peter Muller

2011, Vol. 16 (4), pp. 177-181
Author(s):
M.O. Alieksieiev, L.S. Hloba, K.O. Yermakova, V.V. Kushnir

The usage of high-performance computing technologies in scientific and engineering research is considered. A method for effectively parallelizing data processing is described. The use of high-performance computing based on the OpenMP library to solve problems in the field of telecommunications, e.g., the computation of queue QoS parameters, is also analyzed.
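
The abstract names OpenMP, which targets C/C++/Fortran; as a language-neutral illustration of the same data-parallel pattern, here is a minimal Python sketch that evaluates the standard M/M/1 QoS formulas for many queues in parallel. The process pool is a stand-in for an OpenMP parallel-for, and the rate values are invented for illustration.

# Minimal sketch of data-parallel QoS computation for M/M/1 queues.
# The article uses OpenMP (C/C++); a process pool is used here as a
# language-neutral stand-in for the same parallel-for pattern.
from concurrent.futures import ProcessPoolExecutor

def mm1_qos(rates):
    """Standard M/M/1 metrics for arrival rate lam and service rate mu."""
    lam, mu = rates
    rho = lam / mu                 # server utilization (must be < 1)
    l = rho / (1 - rho)            # mean number of customers in system
    w = 1 / (mu - lam)             # mean time in system
    lq = rho ** 2 / (1 - rho)      # mean queue length
    return {"rho": rho, "L": l, "W": w, "Lq": lq}

if __name__ == "__main__":
    # Hypothetical per-queue (arrival, service) rate pairs.
    queues = [(0.5, 1.0), (0.8, 1.0), (3.0, 4.0), (7.0, 10.0)]
    with ProcessPoolExecutor() as pool:
        for q, qos in zip(queues, pool.map(mm1_qos, queues)):
            print(q, qos)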


2020, Vol. 71 (3), pp. 263-267
Author(s):
M. Serik, G. Zh. Yerlanova
At present, along with the dynamic development of computer technology around the world, the most effective ways of solving problems of practical importance are being sought, and high performance computing takes the lead here. The development of modern society is therefore closely linked to the training of experienced, up-to-date specialists in the field of information technology, which in turn depends on the inclusion of new courses in the curriculum and on full coverage of these issues in the content of the taught courses. This article analyzes courses on high performance computing taught at the experimental bases and abroad and, on that basis, determines the topics of a special course and the content recommended for introduction into the educational process. During the training, students' competencies in high performance computing were identified.


2012, pp. 841-861
Author(s):
Chao-Tung Yang, Wen-Chung Shih

Biology databases are diverse and massive; as a result, researchers must compare each sequence with vast numbers of other sequences. Comparison, whether of structural features or protein sequences, is vital in bioinformatics. These activities require high-speed, high-performance computing power to search through and analyze large amounts of data, and industrial-strength databases to perform a range of data-intensive computing functions. Grid computing and cluster computing meet these requirements. Biological data exist in various web services that help biologists search for and extract useful information, but the data formats produced are heterogeneous, and powerful tools are needed to handle the complex and difficult task of integrating them. This paper presents a review of the relevant technologies and an approach to the problem using cluster and grid computing. The authors implement an experimental distributed computing application for bioinformatics, consisting of basic high-performance computing environments (Grid and PC Cluster systems), user portals with graphical interfaces that enable biologists to benefit directly from high-performance technology, and a translation tool for converting biology data into XML format.
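
The paper's translation tool is not described in detail; as an illustration of the idea, the following is a minimal Python sketch, assuming FASTA input, that converts sequence records into a simple XML document with the standard library. The element and attribute names are invented for illustration, not the schema used by the paper's actual tool.

# Minimal sketch: convert FASTA-style sequence records into XML.
# The element names (<sequences>, <sequence>, <residues>) are
# illustrative, not the schema of the paper's actual tool.
import xml.etree.ElementTree as ET

def fasta_to_xml(fasta_text: str) -> str:
    root = ET.Element("sequences")
    header, chunks = None, []
    for line in fasta_text.splitlines() + [">"]:   # sentinel flushes last record
        if line.startswith(">"):
            if header is not None:
                rec = ET.SubElement(root, "sequence", id=header)
                ET.SubElement(rec, "residues").text = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line.strip():
            chunks.append(line.strip())
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    sample = ">seq1 example protein\nMKV\nLTA\n>seq2\nGATTACA"
    print(fasta_to_xml(sample))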


Author(s):
Lucas M. Ponce, Walter dos Santos, Wagner Meira, Dorgival Guedes, Daniele Lezzi, ...

High-performance computing (HPC) and massive data processing (Big Data) are two trends that are beginning to converge. In that process, aspects of hardware architectures, systems support, and programming paradigms are being revisited from both perspectives. This paper presents our experience on this path of convergence, with the proposal of a framework that addresses some of the programming issues derived from such integration. Our contribution is an integrated environment that combines (i) COMPSs, a programming framework for the development and execution of parallel applications on distributed infrastructures; (ii) Lemonade, a data mining and analysis tool; and (iii) HDFS, the most widely used distributed file system for Big Data systems. To validate the framework, we used Lemonade to create COMPSs applications that access data through HDFS and compared them with equivalent applications built with Spark, a popular Big Data framework. The results show that the HDFS integration benefits COMPSs by simplifying data access and by rearranging data transfers, reducing execution time. The integration with Lemonade eases the use of COMPSs and may help popularize it in the Data Science community by providing efficient algorithm implementations for experts from the data domain who want to develop applications at a higher level of abstraction.
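
The paper's COMPSs and Lemonade code is not shown here; for flavor, below is a minimal PySpark sketch of the kind of HDFS-backed application used as the comparison baseline. The namenode address, input path, and application name are assumptions, not values from the paper.

# Minimal sketch of an HDFS-backed Spark word count, analogous to the
# baseline applications the paper compares against. The namenode
# address and input path are placeholders.
from operator import add
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-wordcount-baseline").getOrCreate()

# Read the input file directly from HDFS and count word occurrences.
lines = spark.read.text("hdfs://namenode:9000/data/input.txt").rdd.map(lambda r: r[0])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(add))

for word, n in counts.take(10):
    print(word, n)

spark.stop()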

