Arioc: High-concurrency short-read alignment on multiple GPUs

2020 ◽  
Vol 16 (11) ◽  
pp. e1008383
Author(s):  
Richard Wilton ◽  
Alexander S. Szalay

In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software. When two or more GPUs are available, Arioc's speed increases proportionately because the software executes concurrently on each available GPU device. We have adapted Arioc to recent multi-GPU hardware architectures that support high-bandwidth peer-to-peer memory accesses among multiple GPUs. By modifying Arioc's implementation to exploit this GPU memory architecture, we obtained a further 1.8x-2.9x increase in overall alignment speeds. With this additional acceleration, Arioc computes two million short-read alignments per second in a four-GPU system; it can align the reads from a human WGS sequencer run (over 500 million 150 nt paired-end reads) in less than 15 minutes. As WGS data accumulates exponentially and high-concurrency computational resources become widespread, Arioc addresses a growing need for timely computation in the short-read data analysis toolchain.

2017 ◽  
Author(s):  
Richard Wilton ◽  
Xin Li ◽  
Andrew P. Feinberg ◽  
Alexander S. Szalay

The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can potentially be addressed by appropriate software-engineering and algorithmic improvements. One strategy is to integrate this additional programming logic into the read-alignment implementation in a way that the software becomes amenable to optimizations that lead to both higher speed and greater sensitivity than can be achieved without this integration. We have evaluated this approach using Arioc, a short-read aligner that uses GPU (general-purpose graphics processing unit) hardware to accelerate computationally expensive programming logic. We integrated the BS-seq computational logic into both GPU and CPU code throughout the Arioc implementation. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments reported by the most widely used BS-seq read aligners. With simulated reads, Arioc's accuracy is equal to or better than the other read aligners we evaluated. With human sequencing reads, Arioc's throughput is at least 10 times faster than existing BS-seq aligners across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.


2011 ◽  
Vol 27 (10) ◽  
pp. 1351-1358 ◽  
Author(s):  
Jochen Blom ◽  
Tobias Jakobi ◽  
Daniel Doppmeier ◽  
Sebastian Jaenicke ◽  
Jörn Kalinowski ◽  
...  

2019 ◽  
Vol 23 (2) ◽  
pp. 1505-1516 ◽  
Author(s):  
Mohammad Hossein Shafiabadi ◽  
Hossein Pedram ◽  
Midia Reshadi ◽  
Akram Reza

Information ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 193 ◽  
Author(s):  
Sebastian Raschka ◽  
Joshua Patterson ◽  
Corey Nolet

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.


2011 ◽  
Vol 21 (01) ◽  
pp. 31-47 ◽  
Author(s):  
NOEL LOPES ◽  
BERNARDETE RIBEIRO

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. Thus, it is an ideal candidate to accelerate a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become more and more demanding, parallel implementations of learning algorithms are crucial for practical application. In particular, the implementation of Neural Networks (NNs) on GPUs can significantly reduce the long training times during the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performance compared to the implementation on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims at automatically finding high-quality NN-based solutions for a given problem.


Author(s):  
Jucele França de Alencar Vasconcellos ◽  
Edson Norberto Cáceres ◽  
Henrique Mongelli ◽  
Siang Wun Song ◽  
Frank Dehne ◽  
...  

Computing a spanning tree (ST) and a minimum ST (MST) of a graph are fundamental problems in graph theory and arise as subproblems in many applications. In this article, we propose parallel algorithms for these problems. One of the steps of previous parallel MST algorithms relies on the heavy use of parallel list ranking which, though efficient in theory, is very time-consuming in practice. Using a different approach with a graph decomposition, we devised new parallel algorithms that do not make use of the list ranking procedure. We proved that our algorithms are correct, and for a graph [Formula: see text], [Formula: see text], and [Formula: see text], the algorithms can be executed on a Bulk Synchronous Parallel/Coarse Grained Multicomputer (BSP/CGM) model using [Formula: see text] communication rounds with [Formula: see text] computation time for each round. To show that our algorithms have good performance on real parallel machines, we have implemented them on a graphics processing unit. The obtained speedups are competitive and show that the BSP/CGM model is suitable for designing general-purpose parallel algorithms.


Author(s):  
Driss En-Nejjary ◽  
Francois Pinet ◽  
Myoung-Ah Kang

Recently, in the field of information systems, the acquisition of geo-referenced data has made a huge technological leap forward. Optimizing the processing of these data is a real issue, and different research works have been proposed to analyze large geo-referenced datasets based on multi-core approaches. In this article, different methods based on general-purpose computing on graphics processing units (GPGPU) are modelled and compared to parallelize overlapping aggregations of raster sequences. Our methods are tested on a sequence of rasters representing the evolution of temperature over time for the same region. Each raster corresponds to a different data acquisition time period, and each geo-referenced raster cell is associated with a temperature value. This article proposes optimized methods to calculate the average temperature for the region for all possible raster subsequences of a determined length, i.e., to calculate overlapping aggregated data summaries. In these aggregations, the same subsets of values are aggregated several times. This type of aggregation can be useful in different environmental data analyses, e.g., to pre-calculate all the average temperatures in a database. The present article highlights a significant increase in performance and shows that GPGPU parallel processing enabled us to run the aggregations more than 50 times faster than the sequential method when data transfer cost is included, and more than 200 times faster without data transfer cost.
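
The overlapping aggregation described in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' GPGPU method: it assumes each raster is a small in-memory grid of temperatures, and the function names (`raster_sum`, `overlapping_window_means`) are hypothetical. It uses a prefix-sum array so that each raster is summed only once, even though it contributes to several overlapping windows.

```python
from typing import List

Raster = List[List[float]]  # assumed layout: rows of per-cell temperature values


def raster_sum(raster: Raster) -> float:
    """Sum of all cell values in one raster."""
    return sum(sum(row) for row in raster)


def overlapping_window_means(rasters: List[Raster], window: int) -> List[float]:
    """Mean temperature over every run of `window` consecutive rasters.

    Assumes all rasters share the same dimensions. Consecutive windows
    overlap, so a prefix-sum array avoids re-adding the shared rasters.
    """
    if window <= 0 or window > len(rasters):
        raise ValueError("window must be between 1 and len(rasters)")
    cells_per_raster = len(rasters[0]) * len(rasters[0][0])

    # prefix[i] holds the total of all cell values in rasters[0:i].
    prefix = [0.0]
    for raster in rasters:
        prefix.append(prefix[-1] + raster_sum(raster))

    cells_per_window = window * cells_per_raster
    return [
        (prefix[start + window] - prefix[start]) / cells_per_window
        for start in range(len(rasters) - window + 1)
    ]


# Three 2x2 rasters, aggregated over every window of two consecutive rasters.
example_rasters = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
print(overlapping_window_means(example_rasters, 2))
```

In a parallel setting, the per-raster sums and the per-window differences are independent of one another, which is one reason this kind of aggregation is a plausible fit for GPU execution; how the cited article actually partitions the work is not stated in the abstract.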


2010 ◽  
Vol 19 (01) ◽  
pp. 173-189
Author(s):  
SEUNG-HUN YOO ◽  
CHANG-SUNG JEONG

The graphics processing unit (GPU) has emerged as a high-quality platform for computer vision systems. In this paper, we propose a straightforward system consisting of a registration method and a fusion method running on the GPU, which generates good results at high speed compared to non-GPU-based systems. Our GPU-accelerated system utilizes existing methods by converting them to the GPU-based platform. The registration method uses point correspondences to find a registering transformation estimated with the incremental parameters in a coarse-to-fine way, while the fusion algorithm uses multi-scale methods to fuse the results from the registration stage. We evaluate performance by executing the same methods in both CPU-only and GPU-equipped environments. The experimental results present convincing evidence of the efficiency of our system, which is tested on a few pairs of aerial images taken by electro-optical and infrared sensors to provide visual information of a scene for environmental observatories.

