Data Movement in Data-Intensive High Performance Computing

Author(s):  
Pietro Cicotti ◽  
Sarp Oral ◽  
Gokcen Kestor ◽  
Roberto Gioiosa ◽  
Shawn Strande ◽  
...  
2012 ◽  
pp. 841-861
Author(s):  
Chao-Tung Yang ◽  
Wen-Chung Shih

Biological databases are diverse and massive. As a result, researchers must compare each sequence with vast numbers of other sequences. Comparison, whether of structural features or protein sequences, is vital in bioinformatics. These activities require high-performance computing power to search through and analyze large amounts of data, and industrial-strength databases to perform a range of data-intensive computing functions. Grid computing and cluster computing meet these requirements. Biological data exist in various web services that help biologists search for and extract useful information. The data formats produced are heterogeneous, and powerful tools are needed to handle the complex and difficult task of integrating the data. This paper presents a review of the technologies and an approach to solving this problem using cluster and grid computing technologies. The authors implement an experimental distributed computing application for bioinformatics, consisting of basic high-performance computing environments (Grid and PC Cluster systems), multiple user portals whose graphical interfaces let biologists benefit directly from high-performance technology, and a translation tool for converting biology data into XML format.
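To make the idea of a translation tool concrete, here is a minimal sketch in Python. It is not the authors' tool: the abstract does not name an input format or an XML schema, so the FASTA input, the function name `fasta_to_xml`, and the element and attribute names (`sequences`, `sequence`, `id`, `description`) are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def fasta_to_xml(fasta_text):
    """Convert FASTA-formatted text to an XML string.

    Assumption: the element and attribute names below are hypothetical,
    not a schema taken from the paper.
    """
    root = ET.Element("sequences")
    record = None   # the <sequence> element currently being filled
    chunks = []     # residue lines collected for the current record

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if record is not None:
                record.text = "".join(chunks)
            header = line[1:].split(None, 1)
            record = ET.SubElement(root, "sequence", id=header[0] if header else "")
            if len(header) > 1:
                record.set("description", header[1])
            chunks = []
        elif record is not None:
            chunks.append(line)

    if record is not None:
        record.text = "".join(chunks)
    return ET.tostring(root, encoding="unicode")
```

Calling `fasta_to_xml(">sp1 test protein\nMKV\nLAA\n")` yields one `sequence` element whose text is the two residue lines joined together. Representing heterogeneous records in one XML shape is what lets downstream tools integrate data from different services.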


Author(s):  
Geetha J. ◽  
Uday Bhaskar N ◽  
Chenna Reddy P.

Data-intensive systems aim to process “big” data efficiently. Several data processing engines have evolved over the past decade. These engines are modeled around the MapReduce paradigm. This article explores Hadoop's MapReduce engine and proposes techniques to obtain a higher level of optimization by borrowing concepts from the world of high-performance computing. As a consequence, the power consumed and the heat generated are lowered. The article designs a system with a pipelined dataflow, in contrast to the existing unregulated “bursty” flow of network traffic; the ability to carry out Map and Reduce tasks in parallel; and modern high-performance computing concepts based on Remote Direct Memory Access (RDMA). To support the claim of increased performance, the authors provide an algorithm for RoCE-enabled MapReduce and a mathematical derivation comparing its runtime with that of vanilla Hadoop. The article shows mathematically that the proposed system runs 1.67 times faster than vanilla Hadoop.
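The contrast between a pipelined dataflow and a bursty one can be sketched in a few lines of Python. This is an illustration only, not the authors' RoCE-enabled algorithm: it uses no RDMA and no Hadoop, two in-process threads stand in for map and reduce nodes, and the names `map_words`, `reduce_counts`, `pipelined_word_count`, and `buffer_size` are hypothetical. The bounded queue is an assumed stand-in for a regulated network link.

```python
import queue
import threading
from collections import Counter

_DONE = object()  # sentinel marking the end of the map output stream


def map_words(lines, out_q):
    """Map stage: emit (word, 1) pairs one at a time instead of in one burst."""
    for line in lines:
        for word in line.split():
            out_q.put((word.lower(), 1))  # blocks while the bounded queue is full
    out_q.put(_DONE)


def reduce_counts(in_q, totals):
    """Reduce stage: fold pairs into running totals as soon as they arrive."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        word, count = item
        totals[word] += count


def pipelined_word_count(lines, buffer_size=4):
    """Run map and reduce concurrently, linked by a small bounded buffer."""
    q = queue.Queue(maxsize=buffer_size)
    totals = Counter()
    mapper = threading.Thread(target=map_words, args=(lines, q))
    reducer = threading.Thread(target=reduce_counts, args=(q, totals))
    mapper.start()
    reducer.start()
    mapper.join()
    reducer.join()
    return dict(totals)
```

Because the reducer starts consuming before the mapper finishes, there is no barrier between the two stages, and the bounded queue caps how much intermediate data is in flight at once. A conventional batch design would instead materialize all map output first and transfer it in one burst.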


2021 ◽  
Vol 13 (23) ◽  
pp. 4756
Author(s):  
Pasquale Imperatore ◽  
Antonio Pepe ◽  
Eugenio Sansosti

Synthetic aperture radar (SAR) interferometry has evolved rapidly in the last decade and can be considered today a mature technology, which incorporates computationally intensive and data-intensive tasks. In this paper, a perspective on the state of the art of high performance computing (HPC) methodologies applied to spaceborne SAR interferometry (InSAR) is presented, and the different parallel algorithms for interferometric processing of SAR data are critically discussed at different levels. Emphasis is placed on the key processing steps that typically occur in interferometric techniques, categorized according to their computational relevance. Existing implementations of the different InSAR stages using diverse parallel strategies and architectures are examined and their performance discussed. Furthermore, some InSAR computational schemes selected from the literature are analyzed at the level of the entire processing chain, emphasizing their potential and limitations. The survey thus focuses on the computational approaches that enable large-scale interferometric SAR processing, offering insight into some open issues and outlining future trends in the field.

