Data Movement in Data-Intensive High Performance Computing

Decoupled I/O for Data-Intensive High Performance Computing

2014 43rd International Conference on Parallel Processing Workshops

10.1109/icppw.2014.48

2014

Cited By ~ 2

Author(s): Chao Chen, Yong Chen, Kun Feng, Yanlong Yin, Hassan Eslami, ...

Keyword(s): High Performance, Data Intensive

An Analytical Approach for Optimizing the Performance of Hadoop Map Reduce Over RoCE

International Journal of Information Communication Technologies and Human Development

10.4018/ijicthd.2018040101

2018

Vol 10 (2)

pp. 1-14

Cited By ~ 1

Author(s): Geetha J., Uday Bhaskar N, Chenna Reddy P.

Keyword(s): Data Processing, High Performance, Direct Memory Access, Performance Measure, Data Intensive, The World, MapReduce Paradigm, Mathematical Derivation

Data-intensive systems aim to process “big” data efficiently. Several data processing engines have evolved over the past decade, and these engines are modeled around the MapReduce paradigm. This article explores Hadoop's MapReduce engine and proposes techniques to obtain a higher level of optimization by borrowing concepts from the world of High Performance Computing. Consequently, the power consumed and the heat generated are lowered. The article designs a system with a pipelined dataflow, in contrast to the existing unregulated “bursty” flow of network traffic; the ability to carry out both Map and Reduce tasks in parallel; and modern high-performance computing concepts based on Remote Direct Memory Access (RDMA). To support the claim of increased performance for the proposed system, the authors provide an algorithm for MapReduce enabled with RoCE (RDMA over Converged Ethernet) and a mathematical derivation contrasting its runtime with that of vanilla Hadoop. The article proves mathematically that the proposed system runs 1.67 times faster than vanilla Hadoop.
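To put the 1.67x figure in terms of wall-clock time, the short sketch below converts a speedup factor into the fraction of the baseline runtime that remains. It is plain arithmetic on the number quoted in the abstract, not the authors' derivation; the function name `runtime_fraction` is a hypothetical helper introduced here for illustration.

```python
def runtime_fraction(speedup: float) -> float:
    """Return the fraction of the baseline runtime left after a given speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# 1.67 is the speedup quoted in the abstract; everything else is illustrative.
reported_speedup = 1.67
fraction = runtime_fraction(reported_speedup)

print(f"remaining runtime: {fraction:.1%}")      # about 59.9% of the baseline
print(f"runtime saved:     {1 - fraction:.1%}")  # about 40.1% of the baseline
```

Under this reading, a job that takes 100 minutes on vanilla Hadoop would take roughly 60 minutes on the proposed system, assuming the reported factor applies uniformly to the whole job.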

FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems

2014 IEEE International Conference on Big Data (Big Data)

10.1109/bigdata.2014.7004214

2014

Cited By ~ 55

Author(s): Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, ...

Keyword(s): High Performance, Scientific Applications, Computing Systems, Data Intensive, Extreme Scale

Optimization of data-intensive next generation sequencing in high performance computing

2015 IEEE 15th International Conference on Bioinformatics and Bioengineering (BIBE)

10.1109/bibe.2015.7367654

2015

Cited By ~ 1

Author(s): Nagarajan Kathiresan, Rashid Al-Ali, Puthen V. Jithesh, Tariq AbuZaid, Ramzi Temanni, ...

Keyword(s): Next Generation Sequencing, High Performance, Next Generation, Data Intensive, Performance Computing, Generation Sequencing

On Construction of Cluster and Grid Computing Platforms for Parallel Bioinformatics Applications

Applications and Developments in Grid, Cloud, and High Performance Computing

10.4018/978-1-4666-2065-0.ch019

2013

pp. 286-306

Author(s): Chao-Tung Yang, Wen-Chung Shih

Keyword(s): Grid Computing, High Speed, High Performance, Cluster Computing, Structural Features, Performance Technology, Data Intensive, Computing Platforms

Biology databases are diverse and massive. As a result, researchers must compare each sequence with vast numbers of other sequences. Comparison, whether of structural features or protein sequences, is vital in bioinformatics. These activities require high-speed, high-performance computing power to search through and analyze large amounts of data, and industrial-strength databases to perform a range of data-intensive computing functions. Grid computing and cluster computing meet these requirements. Biological data exist in various web services that help biologists search for and extract useful information. The data formats produced are heterogeneous, and powerful tools are needed to handle the complex and difficult task of integrating the data. This paper reviews the relevant technologies and presents an approach to solving this problem with cluster and grid computing. The authors implement an experimental distributed computing application for bioinformatics, consisting of basic high-performance computing environments (Grid and PC Cluster systems), multiple user-portal interfaces that provide graphical front ends so biologists can benefit directly from high-performance technology, and a translation tool for converting biology data into XML format.
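As a rough illustration of what a translation tool for converting biology data into XML might do, the sketch below turns FASTA-style sequence records into an XML tree using Python's standard library. The abstract does not say which input formats or XML schema the authors use, so the FASTA input, the `sequences`/`sequence` element names, and the function name `fasta_to_xml` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def fasta_to_xml(fasta_text: str) -> ET.Element:
    """Convert FASTA-style text into an XML tree (hypothetical schema)."""
    root = ET.Element("sequences")
    record = None  # the <sequence> element currently being filled
    chunks = []    # residue lines collected for the current record

    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if record is not None:
                record.text = "".join(chunks)
            parts = line[1:].split()
            seq_id = parts[0] if parts else "unknown"
            record = ET.SubElement(root, "sequence", id=seq_id)
            chunks = []
        elif record is not None:
            chunks.append(line)

    # Flush the final record, if any.
    if record is not None:
        record.text = "".join(chunks)
    return root


sample = ">seq1 example protein\nMKT\nAYI\n>seq2\nGATTACA\n"
tree = fasta_to_xml(sample)
print(ET.tostring(tree, encoding="unicode"))
```

A uniform XML representation like this is one way heterogeneous source formats could be brought into a common structure before integration; a real tool would also need to carry descriptions, annotations, and database cross-references.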

High Performance Computing in Satellite SAR Interferometry: A Critical Perspective

Remote Sensing

10.3390/rs13234756

2021

Vol 13 (23)

pp. 4756

Author(s): Pasquale Imperatore, Antonio Pepe, Eugenio Sansosti

Keyword(s): High Performance, Large Scale, SAR Interferometry, Critical Perspective, Data Intensive, Computationally Intensive, Interferometric SAR, Performance Computing, SAR Processing

Synthetic aperture radar (SAR) interferometry has evolved rapidly in the last decade and can today be considered a mature technology, one that incorporates computationally intensive and data-intensive tasks. This paper presents a perspective on the state of the art of high performance computing (HPC) methodologies applied to spaceborne SAR interferometry (InSAR), and critically discusses the different parallel algorithms for interferometric processing of SAR data at different levels. Emphasis is placed on the key processing steps that typically occur in interferometric techniques, categorized according to their computational relevance. Existing implementations of the different InSAR stages using diverse parallel strategies and architectures are examined and their performance discussed. Furthermore, some InSAR computational schemes selected from the literature are analyzed at the level of the entire processing chain, emphasizing their potential and limitations. The survey therefore focuses on the inherent computational approaches enabling large-scale interferometric SAR processing, offering insight into some open issues and outlining future trends in the field.