Multiscale Hemodynamics Using GPU Clusters

2012 ◽  
Vol 11 (1) ◽  
pp. 48-64 ◽  
Author(s):  
Mauro Bisson ◽  
Massimo Bernaschi ◽  
Simone Melchionna ◽  
Sauro Succi ◽  
Efthimios Kaxiras

Abstract The parallel implementation of MUPHY, a concurrent multiscale code for large-scale hemodynamic simulations in anatomically realistic geometries, for multi-GPU platforms is presented. Performance tests show excellent results, with a nearly linear parallel speed-up on up to 32 GPUs and a more than tenfold GPU/CPU acceleration across the whole range of GPU counts. The basic MUPHY scheme combines a hydrokinetic (Lattice Boltzmann) representation of the blood plasma with a Particle Dynamics treatment of suspended biological bodies, such as red blood cells. To the best of our knowledge, this represents the first effort toward laying down general design principles for multiscale/multiphysics parallel Particle Dynamics applications in non-ideal geometries. This makes the present multi-GPU version of MUPHY one of the first examples of a high-performance parallel code for multiscale/multiphysics biofluidic applications in realistically complex geometries.
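
To make the hydrokinetic component concrete, below is a minimal single-node sketch of a D2Q9 lattice Boltzmann collide-and-stream step in NumPy. This is not the MUPHY code: the grid size, relaxation time, and 2D geometry are illustrative assumptions, and a real multi-GPU implementation would partition the lattice across devices and exchange halo populations after each streaming step.

```python
# Minimal D2Q9 lattice Boltzmann BGK step (illustrative sketch only).
import numpy as np

NX, NY, TAU = 64, 32, 0.8
# D2Q9 lattice velocities and weights
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

f = np.ones((9, NX, NY)) * w[:, None, None]  # distributions at rest

def lb_step(f):
    # Collision: relax each population toward its local equilibrium (BGK)
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    usq = ux**2 + uy**2
    feq = np.empty_like(f)
    for i in range(9):
        cu = c[i, 0]*ux + c[i, 1]*uy
        feq[i] = w[i]*rho*(1 + 3*cu + 4.5*cu**2 - 1.5*usq)
    f += -(f - feq) / TAU
    # Streaming: shift each population along its lattice velocity
    for i in range(9):
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f

for _ in range(100):
    f = lb_step(f)
```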

2022 ◽  
Vol 16 (4) ◽  
pp. 1-33
Author(s):  
Danlu Liu ◽  
Yu Li ◽  
William Baskett ◽  
Dan Lin ◽  
Chi-Ren Shyu

Risk patterns are crucial in biomedical research and serve as an important factor in precision health and disease prevention. Despite recent developments in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post-mining pattern filtering. In this article, we propose a novel dynamic tree structure, the Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are capable of efficiently analyzing a large volume of data and overcoming the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction during the iterative search process and dataset updates. We also introduce two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of items of interest. Experiments on both UCI machine learning datasets and sampled datasets from the Simons Foundation Autism Research Initiative (SFARI) Simons Simplex Collection (SSC) demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing works. Moreover, the proposed tree structure is generic and applicable to other pattern mining problems.
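
As a purely hypothetical illustration of a top-down search over a hierarchical pattern tree, the sketch below prunes any subtree whose optimistic risk bound cannot reach the significance threshold, which is the general mechanism that lets such searches avoid redundant candidate generation. The node fields, risk measure, and bound are invented for the example and are not the authors' definitions.

```python
# Hypothetical top-down search over a hierarchical pattern tree.
from dataclasses import dataclass, field

@dataclass
class HNode:
    pattern: frozenset          # item set represented by this node
    risk_upper_bound: float     # optimistic risk estimate for the subtree
    risk: float                 # actual risk of this pattern
    children: list = field(default_factory=list)

def top_down_search(root, threshold):
    """Yield significant patterns, skipping any subtree whose optimistic
    bound cannot reach the threshold (no candidate enumeration needed)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.risk_upper_bound < threshold:
            continue                    # entire subtree pruned
        if node.risk >= threshold:
            yield node.pattern
        stack.extend(node.children)
```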


2018 ◽  
Vol 35 (3) ◽  
pp. 380-388 ◽  
Author(s):  
Wei Zheng ◽  
Qi Mao ◽  
Robert J Genco ◽  
Jean Wactawski-Wende ◽  
Michael Buck ◽  
...  

Abstract
Motivation: The rapid development of sequencing technology has led to an explosive accumulation of genomic data. Clustering is often the first step performed in sequence analysis, yet existing methods scale poorly with the unprecedented growth of input data. As high-performance computing systems become widely accessible, it is highly desirable that a clustering method scale easily to large sequence datasets by leveraging the power of parallel computing.
Results: In this paper, we introduce SLAD (Separation via Landmark-based Active Divisive clustering), a generic computational framework that can be used to parallelize various de novo operational taxonomic unit (OTU) picking methods and comes with theoretical guarantees on both accuracy and efficiency. The proposed framework was implemented on Apache Spark, which allows for easy and efficient utilization of parallel computing resources. Experiments on various datasets demonstrated that SLAD significantly speeds up a number of popular de novo OTU picking methods while maintaining the same level of accuracy. In particular, an experiment on the Earth Microbiome Project dataset (∼2.2B reads, 437 GB) demonstrated the excellent scalability of the proposed method.
Availability and implementation: Open-source software for the proposed method is freely available at https://www.acsu.buffalo.edu/~yijunsun/lab/SLAD.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
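
The following sketch illustrates the general landmark-based divisive idea under stated assumptions: points are embedded by their distances to a few random landmarks, bisected with 2-means, and the recursion bottoms out in partitions small enough for any serial OTU picker. The Euclidean distance, parameter values, and scikit-learn k-means are placeholders; SLAD itself operates on sequences and runs on Apache Spark.

```python
# Schematic landmark-based divisive splitting (not the SLAD implementation).
import numpy as np
from sklearn.cluster import KMeans

def landmark_embed(X, n_landmarks=8, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    # Distance of every point to each landmark (Euclidean stand-in for
    # the sequence distance a real OTU pipeline would use).
    return np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)

def divisive_split(X, max_leaf=500):
    """Recursively bisect the data; each leaf is small enough to hand
    to any existing serial de novo OTU picking method."""
    if len(X) <= max_leaf:
        return [X]
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(landmark_embed(X))
    if min((labels == 0).sum(), (labels == 1).sum()) == 0:
        return [X]                     # degenerate split: stop recursing
    return (divisive_split(X[labels == 0], max_leaf)
            + divisive_split(X[labels == 1], max_leaf))
```

On Spark, each recursive split would be a distributed job and the leaves would be processed in parallel, which is where the near-linear speed-up comes from.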


2009 ◽  
Vol 01 (04) ◽  
pp. 737-763 ◽  
Author(s):  
E. MOEENDARBARY ◽  
T. Y. NG ◽  
M. ZANGENEH

The dissipative particle dynamics (DPD) technique is a relatively new mesoscale technique which was initially developed to simulate hydrodynamic behavior in mesoscopic complex fluids. It is essentially a particle technique in which molecules are clustered into coarse-grained particles, and this coarse graining is a very important aspect of DPD, as it allows significant computational speed-up. This increased computational efficiency, coupled with the recent advent of high-performance computing, has enabled researchers to numerically study a host of complex fluid applications at a refined level. In this review, we trace the development of various important aspects of the DPD methodology since it was first proposed in the early 1990s. In addition, we review notable published works which employed DPD simulation for complex fluid applications.
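
For readers unfamiliar with the method, the sketch below computes the three standard DPD pairwise forces (conservative, dissipative, and random) for a single particle pair; the parameter values are illustrative defaults. The fluctuation-dissipation relation σ² = 2γkT ties the random term to the dissipative one, which is what gives DPD its built-in thermostat.

```python
# Standard DPD pairwise forces for one particle pair (illustrative values).
import numpy as np

def dpd_pair_force(ri, rj, vi, vj, a=25.0, gamma=4.5, kT=1.0,
                   rc=1.0, dt=0.01, rng=np.random.default_rng()):
    rij = ri - rj
    r = np.linalg.norm(rij)
    if r >= rc:
        return np.zeros(3)              # beyond cutoff: no interaction
    e = rij / r                         # unit vector from j to i
    w = 1.0 - r / rc                    # common weight function
    sigma = np.sqrt(2.0 * gamma * kT)   # fluctuation-dissipation relation
    f_c = a * w * e                                  # soft conservative repulsion
    f_d = -gamma * w**2 * np.dot(e, vi - vj) * e     # friction on relative motion
    f_r = sigma * w * rng.standard_normal() * e / np.sqrt(dt)  # thermal noise
    return f_c + f_d + f_r
```

The soft (finite at r = 0) conservative force is what permits the large time steps that make DPD so much cheaper than molecular dynamics.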


2020 ◽  
Vol 32 (1) ◽  
pp. 182-204 ◽  
Author(s):  
Xiping Ju ◽  
Biao Fang ◽  
Rui Yan ◽  
Xiaoliang Xu ◽  
Huajin Tang

A spiking neural network (SNN) is a biologically plausible model that performs information processing based on spikes. Training a deep SNN effectively is challenging due to the nondifferentiability of spike signals. Recent advances have shown that high-performance SNNs can be obtained by converting convolutional neural networks (CNNs). However, large-scale SNNs are poorly served by conventional architectures due to the dynamic nature of spiking neurons. In this letter, we propose a hardware architecture to enable efficient implementation of SNNs. All layers in the network are mapped onto one chip so that the computation of different time steps can be done in parallel to reduce latency. We propose a new spiking max-pooling method to reduce computation complexity. In addition, we apply approaches based on shift registers and coarse-grained parallelism to accelerate the convolution operation. We also investigate the effect of different encoding methods on SNN accuracy. Finally, we validate the hardware architecture on the Xilinx Zynq ZCU102. Experimental results on the MNIST data set show that it achieves an accuracy of 98.94% with eight-bit quantized weights. Furthermore, it achieves 164 frames per second (FPS) at a 150 MHz clock frequency, a 41× speed-up over a CPU implementation, and 22 times lower power than a GPU implementation.
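
As one common, generic way to approximate max-pooling with spikes, the sketch below forwards, within each pooling window, only the spike of the unit with the largest accumulated firing count. This is an illustrative technique from the conversion literature, not necessarily the letter's proposed method.

```python
# Generic spiking max-pooling sketch: the most active unit per window wins.
import numpy as np

def spiking_max_pool(spikes, counts, k=2):
    """spikes, counts: (H, W) binary spikes and running spike counts for
    one time step. Returns pooled (H//k, W//k) spikes and updated counts;
    counts would be carried across time steps by the caller."""
    H, W = spikes.shape
    counts = counts + spikes                       # accumulate firing history
    s = spikes.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3).reshape(-1, k * k)
    c = counts.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3).reshape(-1, k * k)
    winner = c.argmax(axis=1)                      # most active unit per window
    pooled = s[np.arange(len(s)), winner].reshape(H // k, W // k)
    return pooled, counts
```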


2015 ◽  
Vol 24 (05) ◽  
pp. 1550074 ◽  
Author(s):  
Ali A. El-Moursy ◽  
Wael S. Afifi ◽  
Fadi N. Sibai ◽  
Salwa M. Nassar

STRIKE is an algorithm which predicts protein–protein interactions (PPIs) by determining that proteins interact if they contain similar substrings of amino acids. STRIKE achieves a reasonable improvement over existing PPI prediction methods. Despite its high accuracy, however, STRIKE has a long execution time and is therefore considered a compute-intensive application. In this paper, we develop and implement a parallel STRIKE algorithm for high-performance computing (HPC) systems. Using a large-scale cluster, the execution time of this bioinformatics algorithm was reduced from about a week on a serial uniprocessor machine to about 16.5 h on 16 computing nodes, and down to about 2 h on 128 parallel nodes. Communication overheads between nodes are thoroughly studied.
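
The computation being parallelized is an all-pairs comparison, so a natural decomposition is to partition the pairs across workers. The toy sketch below does this with a Python process pool and a stand-in k-mer overlap score; the actual STRIKE scoring function and the cluster-level inter-node communication scheme are not reproduced here.

```python
# Toy parallel all-pairs substring-similarity scoring (stand-in for STRIKE).
from itertools import combinations
from multiprocessing import Pool

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def pair_score(args):
    (i, a), (j, b) = args
    shared = len(kmers(a) & kmers(b))      # shared substrings of length k
    return i, j, shared

def all_pairs_parallel(proteins, workers=4):
    """Distribute the O(n^2) pairwise comparisons over a process pool,
    mirroring how a cluster implementation would partition the pairs."""
    pairs = combinations(enumerate(proteins), 2)
    with Pool(workers) as pool:
        return pool.map(pair_score, pairs)

if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYLAKQR", "GAVLIPFMW"]
    print(all_pairs_parallel(seqs))
```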


2019 ◽  
Vol 11 (24) ◽  
pp. 3056 ◽  
Author(s):  
Rocco Sedona ◽  
Gabriele Cavallaro ◽  
Jenia Jitsev ◽  
Alexandre Strube ◽  
Morris Riedel ◽  
...  

High-Performance Computing (HPC) has recently been attracting more attention in remote sensing applications due to the challenges posed by the increasing amount of open data produced daily by Earth Observation (EO) programs. The unique parallel computing environments and programming techniques integrated into HPC systems are able to solve large-scale problems such as the training of classification algorithms with large amounts of Remote Sensing (RS) data. This paper shows that the training of state-of-the-art deep Convolutional Neural Networks (CNNs) can be efficiently performed in a distributed fashion using parallel implementation techniques on HPC machines containing a large number of Graphics Processing Units (GPUs). The experimental results confirm that distributed training can drastically reduce the amount of time needed to perform full training, resulting in near-linear scaling without loss of test accuracy.
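
A minimal sketch of what such distributed training looks like in practice, assuming PyTorch DistributedDataParallel on a single multi-GPU node launched with torchrun; the model, dataset, and hyperparameters are placeholders, and the paper's own software stack may differ.

```python
# Minimal data-parallel training sketch with PyTorch DDP (placeholder model/data).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def train():
    dist.init_process_group("nccl")             # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    model = DDP(torch.nn.Linear(128, 10).cuda(rank), device_ids=[rank])
    data = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(data)          # shards the data across ranks
    loader = DataLoader(data, batch_size=64, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(3):
        sampler.set_epoch(epoch)                # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()                     # gradients all-reduced here
            opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    train()                                     # launch with: torchrun --nproc_per_node=N
```

Because the all-reduce overlaps with the backward pass, per-epoch time shrinks nearly linearly with the number of GPUs, matching the scaling behavior the paper reports.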


Author(s):  
Seshu B. Nimmala ◽  
Solomon C. Yim ◽  
Stephan T. Grilli

This paper presents a parallel implementation and validation of an accurate and efficient three-dimensional computational model (3D numerical wave tank), based on fully nonlinear potential flow (FNPF) theory, and its extension to incorporate the motion of a laboratory snake piston wavemaker, as well as an absorbing beach, to simulate experiments in a large-scale 3D wave basin. This work is part of a long-term effort to develop a “virtual” computational wave basin to facilitate and complement large-scale physical wave-basin experiments. The code is based on a higher-order boundary-element method combined with a fast multipole algorithm (FMA). Particular effort was devoted to making the code efficient for large-scale simulations on high-performance computing platforms. The numerical simulation capability can be tailored to serve as an optimization tool at the planning and detailed design stages of large-scale experiments at a specific basin by duplicating its exact physical and algorithmic features. To date, the waves that can be generated in the numerical wave tank (NWT) include solitary, cnoidal, and Airy waves. In this paper we detail the wave-basin model, mathematical formulation, and wave generation, and analyze the performance of the parallelized FNPF-BEM-FMA code as a function of numerical parameters. Experimental or analytical comparisons with NWT results are provided for several cases to assess the accuracy and applicability of the numerical model to practical engineering problems.
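
As a small illustration of one wave type the NWT supports, the sketch below evaluates the textbook first-order solitary-wave surface elevation; this is the generic Boussinesq form, not the model's exact wavemaker boundary condition.

```python
# First-order solitary-wave free-surface elevation (textbook form).
import numpy as np

def solitary_eta(x, t, H=0.5, h=1.0, g=9.81):
    """Elevation eta(x, t) of a solitary wave of height H over depth h,
    propagating with celerity c = sqrt(g*(h + H))."""
    c = np.sqrt(g * (h + H))
    k = np.sqrt(3.0 * H / (4.0 * h**3))
    return H / np.cosh(k * (x - c * t))**2

x = np.linspace(-20.0, 20.0, 401)
eta0 = solitary_eta(x, t=0.0)     # initial surface profile along the tank
```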


2018 ◽  
Vol 46 (6) ◽  
pp. e33-e33 ◽  
Author(s):  
Ariful Azad ◽  
Georgios A Pavlopoulos ◽  
Christos A Ouzounis ◽  
Nikos C Kyrpides ◽  
Aydin Buluç
