Multiscale Hemodynamics Using GPU Clusters

2012 ◽  
Vol 11 (1) ◽  
pp. 48-64 ◽  
Author(s):  
Mauro Bisson ◽  
Massimo Bernaschi ◽  
Simone Melchionna ◽  
Sauro Succi ◽  
Efthimios Kaxiras

Abstract The parallel implementation of MUPHY, a concurrent multiscale code for large-scale hemodynamic simulations in anatomically realistic geometries, for multi-GPU platforms is presented. Performance tests show excellent results, with a nearly linear parallel speed-up on up to 32 GPUs and a more than tenfold GPU/CPU acceleration across the whole range of GPU counts. The basic MUPHY scheme combines a hydrokinetic (Lattice Boltzmann) representation of the blood plasma with a Particle Dynamics treatment of suspended biological bodies, such as red blood cells. To the best of our knowledge, this represents the first effort toward laying down general design principles for multiscale/multiphysics parallel Particle Dynamics applications in non-ideal geometries. This makes the present multi-GPU version of MUPHY one of the first examples of a high-performance parallel code for multiscale/multiphysics biofluidic applications in realistically complex geometries.
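
To make the hydrokinetic component concrete, below is a minimal single-node sketch of a D2Q9 lattice Boltzmann collide-and-stream step in NumPy. This is not the MUPHY code: the grid size, relaxation time, and 2D geometry are illustrative assumptions, and a real multi-GPU implementation would partition the lattice across devices and exchange halo populations after each streaming step.

```python
# Minimal D2Q9 lattice Boltzmann BGK step (illustrative sketch only).
import numpy as np

NX, NY, TAU = 64, 32, 0.8
# D2Q9 lattice velocities and weights
c = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

f = np.ones((9, NX, NY)) * w[:, None, None]  # distributions at rest

def lb_step(f):
    # Collision: relax each population toward its local equilibrium (BGK)
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    usq = ux**2 + uy**2
    feq = np.empty_like(f)
    for i in range(9):
        cu = c[i, 0]*ux + c[i, 1]*uy
        feq[i] = w[i]*rho*(1 + 3*cu + 4.5*cu**2 - 1.5*usq)
    f += -(f - feq) / TAU
    # Streaming: shift each population along its lattice velocity
    for i in range(9):
        f[i] = np.roll(f[i], shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f

for _ in range(100):
    f = lb_step(f)
```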

2022 ◽  
Vol 16 (4) ◽  
pp. 1-33
Author(s):  
Danlu Liu ◽  
Yu Li ◽  
William Baskett ◽  
Dan Lin ◽  
Chi-Ren Shyu

Risk patterns are crucial in biomedical research and serve as an important factor in precision health and disease prevention. Despite recent developments in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post-mining pattern filtering. In this article, we propose a novel dynamic tree structure, the Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are capable of efficiently analyzing a large volume of data and overcoming the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction during the iterative search process and dataset updates. We also introduce two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of items of interest. Experiments on both UCI machine learning datasets and sampled datasets from the Simons Foundation Autism Research Initiative (SFARI) Simons Simplex Collection (SSC) demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing works. Moreover, the proposed tree structure is generic and applicable to other pattern mining problems.
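
As a purely hypothetical illustration of a top-down search over a hierarchical pattern tree, the sketch below prunes any subtree whose optimistic risk bound cannot reach the significance threshold, which is the general mechanism that lets such searches avoid redundant candidate generation. The node fields, risk measure, and bound are invented for the example and are not the authors' definitions.

```python
# Hypothetical top-down search over a hierarchical pattern tree.
from dataclasses import dataclass, field

@dataclass
class HNode:
    pattern: frozenset          # item set represented by this node
    risk_upper_bound: float     # optimistic risk estimate for the subtree
    risk: float                 # actual risk of this pattern
    children: list = field(default_factory=list)

def top_down_search(root, threshold):
    """Yield significant patterns, skipping any subtree whose optimistic
    bound cannot reach the threshold (no candidate enumeration needed)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.risk_upper_bound < threshold:
            continue                    # entire subtree pruned
        if node.risk >= threshold:
            yield node.pattern
        stack.extend(node.children)
```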


2018 ◽  
Vol 35 (3) ◽  
pp. 380-388 ◽  
Author(s):  
Wei Zheng ◽  
Qi Mao ◽  
Robert J Genco ◽  
Jean Wactawski-Wende ◽  
Michael Buck ◽  
...  

Abstract
Motivation: The rapid development of sequencing technology has led to an explosive accumulation of genomic data. Clustering is often the first step performed in sequence analysis, yet existing methods scale poorly with the unprecedented growth of input data. As high-performance computing systems become widely accessible, it is highly desirable that a clustering method scale easily to large sequence datasets by leveraging the power of parallel computing.
Results: In this paper, we introduce SLAD (Separation via Landmark-based Active Divisive clustering), a generic computational framework that can be used to parallelize various de novo operational taxonomic unit (OTU) picking methods and comes with theoretical guarantees on both accuracy and efficiency. The proposed framework was implemented on Apache Spark, which allows for easy and efficient utilization of parallel computing resources. Experiments on various datasets demonstrated that SLAD significantly speeds up a number of popular de novo OTU picking methods while maintaining the same level of accuracy. In particular, an experiment on the Earth Microbiome Project dataset (∼2.2B reads, 437 GB) demonstrated the excellent scalability of the proposed method.
Availability and implementation: Open-source software for the proposed method is freely available at https://www.acsu.buffalo.edu/~yijunsun/lab/SLAD.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
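
The following sketch illustrates the general landmark-based divisive idea under stated assumptions: points are embedded by their distances to a few random landmarks, bisected with 2-means, and the recursion bottoms out in partitions small enough for any serial OTU picker. The Euclidean distance, parameter values, and scikit-learn k-means are placeholders; SLAD itself operates on sequences and runs on Apache Spark.

```python
# Schematic landmark-based divisive splitting (not the SLAD implementation).
import numpy as np
from sklearn.cluster import KMeans

def landmark_embed(X, n_landmarks=8, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=n_landmarks, replace=False)
    # Distance of every point to each landmark (Euclidean stand-in for
    # the sequence distance a real OTU pipeline would use).
    return np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)

def divisive_split(X, max_leaf=500):
    """Recursively bisect the data; each leaf is small enough to hand
    to any existing serial de novo OTU picking method."""
    if len(X) <= max_leaf:
        return [X]
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(landmark_embed(X))
    if min((labels == 0).sum(), (labels == 1).sum()) == 0:
        return [X]                     # degenerate split: stop recursing
    return (divisive_split(X[labels == 0], max_leaf)
            + divisive_split(X[labels == 1], max_leaf))
```

On Spark, each recursive split would be a distributed job and the leaves would be processed in parallel, which is where the near-linear speed-up comes from.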


2009 ◽  
Vol 01 (04) ◽  
pp. 737-763 ◽  
Author(s):  
E. MOEENDARBARY ◽  
T. Y. NG ◽  
M. ZANGENEH

The dissipative particle dynamics (DPD) technique is a relatively new mesoscale technique which was initially developed to simulate hydrodynamic behavior in mesoscopic complex fluids. It is essentially a particle technique in which molecules are clustered into coarse-grained particles, and this coarse graining is a very important aspect of DPD, as it allows significant computational speed-up. This increased computational efficiency, coupled with the recent advent of high-performance computing, has enabled researchers to numerically study a host of complex fluid applications at a refined level. In this review, we trace the development of various important aspects of the DPD methodology since it was first proposed in the early 1990s. In addition, we review notable published works which employed DPD simulation for complex fluid applications.
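
For readers unfamiliar with the method, the sketch below computes the three standard DPD pairwise forces (conservative, dissipative, and random) for a single particle pair; the parameter values are illustrative defaults. The fluctuation-dissipation relation σ² = 2γkT ties the random term to the dissipative one, which is what gives DPD its built-in thermostat.

```python
# Standard DPD pairwise forces for one particle pair (illustrative values).
import numpy as np

def dpd_pair_force(ri, rj, vi, vj, a=25.0, gamma=4.5, kT=1.0,
                   rc=1.0, dt=0.01, rng=np.random.default_rng()):
    rij = ri - rj
    r = np.linalg.norm(rij)
    if r >= rc:
        return np.zeros(3)              # beyond cutoff: no interaction
    e = rij / r                         # unit vector from j to i
    w = 1.0 - r / rc                    # common weight function
    sigma = np.sqrt(2.0 * gamma * kT)   # fluctuation-dissipation relation
    f_c = a * w * e                                  # soft conservative repulsion
    f_d = -gamma * w**2 * np.dot(e, vi - vj) * e     # friction on relative motion
    f_r = sigma * w * rng.standard_normal() * e / np.sqrt(dt)  # thermal noise
    return f_c + f_d + f_r
```

The soft (finite at r = 0) conservative force is what permits the large time steps that make DPD so much cheaper than molecular dynamics.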


2020 ◽  
Vol 32 (1) ◽  
pp. 182-204 ◽  
Author(s):  
Xiping Ju ◽  
Biao Fang ◽  
Rui Yan ◽  
Xiaoliang Xu ◽  
Huajin Tang

A spiking neural network (SNN) is a biologically plausible model that performs information processing based on spikes. Training a deep SNN effectively is challenging due to the nondifferentiability of spike signals. Recent advances have shown that high-performance SNNs can be obtained by converting convolutional neural networks (CNNs). However, large-scale SNNs are poorly served by conventional architectures due to the dynamic nature of spiking neurons. In this letter, we propose a hardware architecture to enable efficient implementation of SNNs. All layers in the network are mapped onto one chip so that the computation of different time steps can be done in parallel to reduce latency. We propose a new spiking max-pooling method to reduce computation complexity. In addition, we apply approaches based on shift registers and coarse-grained parallelism to accelerate the convolution operation. We also investigate the effect of different encoding methods on SNN accuracy. Finally, we validate the hardware architecture on the Xilinx Zynq ZCU102. Experimental results on the MNIST data set show that it achieves an accuracy of 98.94% with eight-bit quantized weights. Furthermore, it achieves 164 frames per second (FPS) at a 150 MHz clock frequency, a 41× speed-up over a CPU implementation, and 22 times lower power than a GPU implementation.
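
As one common, generic way to approximate max-pooling with spikes, the sketch below forwards, within each pooling window, only the spike of the unit with the largest accumulated firing count. This is an illustrative technique from the conversion literature, not necessarily the letter's proposed method.

```python
# Generic spiking max-pooling sketch: the most active unit per window wins.
import numpy as np

def spiking_max_pool(spikes, counts, k=2):
    """spikes, counts: (H, W) binary spikes and running spike counts for
    one time step. Returns pooled (H//k, W//k) spikes and updated counts;
    counts would be carried across time steps by the caller."""
    H, W = spikes.shape
    counts = counts + spikes                       # accumulate firing history
    s = spikes.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3).reshape(-1, k * k)
    c = counts.reshape(H // k, k, W // k, k).transpose(0, 2, 1, 3).reshape(-1, k * k)
    winner = c.argmax(axis=1)                      # most active unit per window
    pooled = s[np.arange(len(s)), winner].reshape(H // k, W // k)
    return pooled, counts
```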


2015 ◽  
Vol 24 (05) ◽  
pp. 1550074 ◽  
Author(s):  
Ali A. El-Moursy ◽  
Wael S. Afifi ◽  
Fadi N. Sibai ◽  
Salwa M. Nassar

STRIKE is an algorithm which predicts protein–protein interactions (PPIs) by determining that proteins interact if they contain similar substrings of amino acids. STRIKE achieves a reasonable improvement over existing PPI prediction methods. Despite its high accuracy, however, STRIKE has a long execution time and is therefore considered a compute-intensive application. In this paper, we develop and implement a parallel STRIKE algorithm for high-performance computing (HPC) systems. Using a large-scale cluster, the execution time of this bioinformatics algorithm was reduced from about a week on a serial uniprocessor machine to about 16.5 h on 16 computing nodes, and down to about 2 h on 128 parallel nodes. Communication overheads between nodes are thoroughly studied.
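
The computation being parallelized is an all-pairs comparison, so a natural decomposition is to partition the pairs across workers. The toy sketch below does this with a Python process pool and a stand-in k-mer overlap score; the actual STRIKE scoring function and the cluster-level inter-node communication scheme are not reproduced here.

```python
# Toy parallel all-pairs substring-similarity scoring (stand-in for STRIKE).
from itertools import combinations
from multiprocessing import Pool

def kmers(seq, k=3):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def pair_score(args):
    (i, a), (j, b) = args
    shared = len(kmers(a) & kmers(b))      # shared substrings of length k
    return i, j, shared

def all_pairs_parallel(proteins, workers=4):
    """Distribute the O(n^2) pairwise comparisons over a process pool,
    mirroring how a cluster implementation would partition the pairs."""
    pairs = combinations(enumerate(proteins), 2)
    with Pool(workers) as pool:
        return pool.map(pair_score, pairs)

if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYLAKQR", "GAVLIPFMW"]
    print(all_pairs_parallel(seqs))
```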


2019 ◽  
Vol 11 (24) ◽  
pp. 3056 ◽  
Author(s):  
Rocco Sedona ◽  
Gabriele Cavallaro ◽  
Jenia Jitsev ◽  
Alexandre Strube ◽  
Morris Riedel ◽  
...  

High-Performance Computing (HPC) has recently been attracting more attention in remote sensing applications due to the challenges posed by the increasing amount of open data produced daily by Earth Observation (EO) programs. The unique parallel computing environments and programming techniques integrated into HPC systems are able to solve large-scale problems such as the training of classification algorithms with large amounts of Remote Sensing (RS) data. This paper shows that the training of state-of-the-art deep Convolutional Neural Networks (CNNs) can be efficiently performed in a distributed fashion using parallel implementation techniques on HPC machines containing a large number of Graphics Processing Units (GPUs). The experimental results confirm that distributed training can drastically reduce the amount of time needed to perform full training, resulting in near-linear scaling without loss of test accuracy.
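
A minimal sketch of what such distributed training looks like in practice, assuming PyTorch DistributedDataParallel on a single multi-GPU node launched with torchrun; the model, dataset, and hyperparameters are placeholders, and the paper's own software stack may differ.

```python
# Minimal data-parallel training sketch with PyTorch DDP (placeholder model/data).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def train():
    dist.init_process_group("nccl")             # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    model = DDP(torch.nn.Linear(128, 10).cuda(rank), device_ids=[rank])
    data = TensorDataset(torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(data)          # shards the data across ranks
    loader = DataLoader(data, batch_size=64, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(3):
        sampler.set_epoch(epoch)                # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()                     # gradients all-reduced here
            opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    train()                                     # launch with: torchrun --nproc_per_node=N
```

Because the all-reduce overlaps with the backward pass, per-epoch time shrinks nearly linearly with the number of GPUs, matching the scaling behavior the paper reports.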


Author(s):  
Seshu B. Nimmala ◽  
Solomon C. Yim ◽  
Stephan T. Grilli

This paper presents a parallel implementation and validation of an accurate and efficient three-dimensional computational model (3D numerical wave tank), based on fully nonlinear potential flow (FNPF) theory, and its extension to incorporate the motion of a laboratory snake piston wavemaker, as well as an absorbing beach, to simulate experiments in a large-scale 3D wave basin. This work is part of a long-term effort to develop a “virtual” computational wave basin to facilitate and complement large-scale physical wave-basin experiments. The code is based on a higher-order boundary-element method combined with a fast multipole algorithm (FMA). Particular effort was devoted to making the code efficient for large-scale simulations on high-performance computing platforms. The numerical simulation capability can be tailored to serve as an optimization tool at the planning and detailed design stages of large-scale experiments at a specific basin by duplicating its exact physical and algorithmic features. To date, the waves that can be generated in the numerical wave tank (NWT) include solitary, cnoidal, and Airy waves. In this paper we detail the wave-basin model, mathematical formulation, and wave generation, and analyze the performance of the parallelized FNPF-BEM-FMA code as a function of numerical parameters. Experimental or analytical comparisons with NWT results are provided for several cases to assess the accuracy and applicability of the numerical model to practical engineering problems.
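
As a small illustration of one wave type the NWT supports, the sketch below evaluates the textbook first-order solitary-wave surface elevation; this is the generic Boussinesq form, not the model's exact wavemaker boundary condition.

```python
# First-order solitary-wave free-surface elevation (textbook form).
import numpy as np

def solitary_eta(x, t, H=0.5, h=1.0, g=9.81):
    """Elevation eta(x, t) of a solitary wave of height H over depth h,
    propagating with celerity c = sqrt(g*(h + H))."""
    c = np.sqrt(g * (h + H))
    k = np.sqrt(3.0 * H / (4.0 * h**3))
    return H / np.cosh(k * (x - c * t))**2

x = np.linspace(-20.0, 20.0, 401)
eta0 = solitary_eta(x, t=0.0)     # initial surface profile along the tank
```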


2018 ◽  
Vol 46 (6) ◽  
pp. e33-e33 ◽  
Author(s):  
Ariful Azad ◽  
Georgios A Pavlopoulos ◽  
Christos A Ouzounis ◽  
Nikos C Kyrpides ◽  
Aydin Buluç
