Arioc: High-concurrency short-read alignment on multiple GPUs

In large DNA sequence repositories, archival data storage is often coupled with computers that provide 40 or more CPU threads and multiple GPU (general-purpose graphics processing unit) devices. This presents an opportunity for DNA sequence alignment software to exploit high-concurrency hardware to generate short-read alignments at high speed. Arioc, a GPU-accelerated short-read aligner, can compute WGS (whole-genome sequencing) alignments ten times faster than comparable CPU-only alignment software. When two or more GPUs are available, Arioc's speed increases proportionately because the software executes concurrently on each available GPU device. We have adapted Arioc to recent multi-GPU hardware architectures that support high-bandwidth peer-to-peer memory accesses among multiple GPUs. By modifying Arioc's implementation to exploit this GPU memory architecture, we obtained a further 1.8x to 2.9x increase in overall alignment speed. With this additional acceleration, Arioc computes two million short-read alignments per second in a four-GPU system; it can align the reads from a human WGS sequencer run (over 500 million 150 nt paired-end reads) in less than 15 minutes. As WGS data accumulate exponentially and high-concurrency computational resources become widespread, Arioc addresses a growing need for timely computation in the short-read data analysis toolchain.
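As a rough consistency check on these figures, the wall-clock time implied by a constant rate of two million alignments per second can be computed directly. This is a back-of-the-envelope sketch, not a benchmark: it assumes throughput stays constant for the whole run and ignores start-up and I/O time. Because the abstract does not say whether an alignment counts one mate or one pair, both readings of "500 million paired-end reads" are shown; the function name is hypothetical.

```python
def alignment_minutes(n_alignments: float, per_second: float) -> float:
    """Wall-clock minutes to compute n_alignments at a constant rate.

    Assumes the rate is constant for the whole run and ignores start-up
    and I/O time, so the result is a lower bound, not a measurement.
    """
    return n_alignments / per_second / 60.0


throughput = 2_000_000  # alignments per second on four GPUs (figure from the abstract)

# Reading 1: one alignment per pair -> 500 million alignments.
as_pairs = alignment_minutes(500e6, throughput)
# Reading 2: one alignment per mate -> 1 billion alignments.
as_mates = alignment_minutes(2 * 500e6, throughput)

print(f"{as_pairs:.1f} min, {as_mates:.1f} min")  # 4.2 min, 8.3 min
```

Both estimates fall under the reported 15-minute bound.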

GPU-accelerated alignment of bisulfite-treated short-read sequences

10.1101/175729 ◽ 2017

Author(s): Richard Wilton ◽ Xin Li ◽ Andrew P. Feinberg ◽ Alexander S. Szalay

Keyword(s): DNA Sequences ◽ Graphics Processing Unit ◽ General Purpose ◽ Processing Unit ◽ Short Read ◽ Wide Range ◽ Programming Logic ◽ Short Read Aligner ◽ Graphics Processing ◽ Better Than

The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can potentially be addressed by appropriate software-engineering and algorithmic improvements. One strategy is to integrate this additional programming logic into the read-alignment implementation so that the software becomes amenable to optimizations that yield both higher speed and greater sensitivity than can be achieved without this integration.

We have evaluated this approach using Arioc, a short-read aligner that uses GPU (general-purpose graphics processing unit) hardware to accelerate computationally expensive programming logic. We integrated the BS-seq computational logic into both GPU and CPU code throughout the Arioc implementation. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments reported by the most widely used BS-seq read aligners. With simulated reads, Arioc's accuracy is equal to or better than that of the other read aligners we evaluated. With human sequencing reads, Arioc's throughput is at least 10 times that of existing BS-seq aligners across a wide range of sensitivity settings.

The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.
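The extra logic that BS-seq alignment requires comes from the chemistry: bisulfite treatment converts unmethylated cytosines to uracil, which sequences as T, while methylated cytosines remain C. The following Python sketch shows two common ways aligners account for this: an in-silico conversion of a sequence to a three-letter alphabet, and an asymmetric mismatch count that accepts a read T opposite a reference C. It illustrates the general technique only, for ungapped forward-strand comparisons; it is not Arioc's implementation, and the function names are hypothetical.

```python
def bs_convert(seq: str, strand: str = "+") -> str:
    """In-silico bisulfite conversion to a three-letter alphabet.

    Forward strand: every C becomes T. Reverse (complementary) strand,
    expressed on the forward sequence: every G becomes A.
    """
    if strand == "+":
        return seq.upper().replace("C", "T")
    return seq.upper().replace("G", "A")


def bs_mismatches(read: str, ref: str) -> int:
    """Mismatch count for an ungapped forward-strand BS-seq alignment.

    A read T opposite a reference C is not counted, because it is the
    expected outcome for an unmethylated cytosine. The rule is
    asymmetric: a read C opposite a reference T is still a mismatch.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference window must have equal length")
    mismatches = 0
    for r, g in zip(read.upper(), ref.upper()):
        if r == g or (r == "T" and g == "C"):
            continue
        mismatches += 1
    return mismatches


ref = "ACGTTCGACC"
# Three reference Cs read as T (unmethylated); the C at index 5 is kept (methylated).
read = "ATGTTCGATT"
print(bs_convert(ref))           # ATGTTTGATT
print(bs_mismatches(read, ref))  # 0
```

Three-letter conversion makes reads and reference directly comparable with an ordinary aligner but discards the C/T distinction; the asymmetric comparison keeps it, which is the kind of logic that must be carried into seeding and scoring when BS-seq handling is integrated into an aligner.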

Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming

Bioinformatics ◽ 10.1093/bioinformatics/btr151 ◽ 2011 ◽ Vol 27 (10) ◽ pp. 1351-1358 ◽ Cited By ~ 67

Author(s): Jochen Blom ◽ Tobias Jakobi ◽ Daniel Doppmeier ◽ Sebastian Jaenicke ◽ Jörn Kalinowski ◽ ...

Keyword(s): Graphics Processing Unit ◽ Processing Unit ◽ Short Read ◽ Microbial Genomes ◽ Read Alignment ◽ Short Read Alignment ◽ Graphics Processing

Practical Implementation of Prestack Kirchhoff Time Migration on a General Purpose Graphics Processing Unit

Acta Geophysica ◽ 10.1515/acgeo-2016-0033 ◽ 2016 ◽ Vol 64 (4) ◽ pp. 1051-1063 ◽ Cited By ~ 2

Author(s): Guofeng Liu ◽ Chun Li

Keyword(s): Graphics Processing Unit ◽ General Purpose ◽ Practical Implementation ◽ Processing Unit ◽ Time Migration ◽ Graphics Processing

Comprehensive regression-based model to predict performance of general-purpose graphics processing unit

Cluster Computing ◽ 10.1007/s10586-019-03011-2 ◽ 2019 ◽ Vol 23 (2) ◽ pp. 1505-1516 ◽ Cited By ~ 2

Author(s): Mohammad Hossein Shafiabadi ◽ Hossein Pedram ◽ Midia Reshadi ◽ Akram Reza

Keyword(s): Graphics Processing Unit ◽ General Purpose ◽ Processing Unit ◽ Graphics Processing

Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence

Information ◽ 10.3390/info11040193 ◽ 2020 ◽ Vol 11 (4) ◽ pp. 193 ◽ Cited By ~ 7

Author(s): Sebastian Raschka ◽ Joshua Patterson ◽ Corey Nolet

Keyword(s): Artificial Intelligence ◽ Machine Learning ◽ Data Science ◽ GPU Computing ◽ Graphics Processing Unit ◽ General Purpose ◽ Processing Unit ◽ The Core ◽ Critical Components ◽ High Level

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and methods that are driving it, from processing the massive piles of data generated each day to learning from them and taking useful action. Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.

AN EVALUATION OF MULTIPLE FEED-FORWARD NETWORKS ON GPUs

International Journal of Neural Systems ◽ 10.1142/s0129065711002638 ◽ 2011 ◽ Vol 21 (01) ◽ pp. 31-47 ◽ Cited By ~ 14

Author(s): Noel Lopes ◽ Bernardete Ribeiro

Keyword(s): Graphics Processing Unit ◽ Parallel Implementation ◽ Low Cost ◽ Back Propagation ◽ General Purpose ◽ Training System ◽ Graphics Hardware ◽ Processing Unit ◽ Data Parallel ◽ Graphics Processing

The Graphics Processing Unit (GPU), originally designed for rendering graphics and difficult to program for other tasks, has since evolved into a device suitable for general-purpose computations. As a result, graphics hardware has become progressively more attractive, yielding unprecedented performance at a relatively low cost. This makes it an ideal candidate to accelerate a wide variety of data-parallel tasks in many fields, such as Machine Learning (ML). As problems become more demanding, parallel implementations of learning algorithms become crucial for practical applications. In particular, implementing Neural Networks (NNs) on GPUs can significantly reduce the long training times of the learning process. In this paper we present a GPU parallel implementation of the Back-Propagation (BP) and Multiple Back-Propagation (MBP) algorithms, and describe the GPU kernels needed for this task. The results obtained on well-known benchmarks show faster training times and improved performance compared with the implementation on traditional hardware, due to maximized floating-point throughput and memory bandwidth. Moreover, a preliminary GPU-based Autonomous Training System (ATS) is developed, which aims to automatically find high-quality NN-based solutions for a given problem.
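For reference, the standard Back-Propagation computation for a single-hidden-layer network with sigmoid units and per-sample squared error is sketched below in plain Python, together with a central-difference check that the analytic gradients are correct. The per-neuron delta terms and weight gradients are the data-parallel quantities that GPU kernels typically compute across many neurons and training samples at once. This is a textbook BP sketch, not the paper's GPU kernels, and it does not implement Multiple Back-Propagation; the network size, weight layout (bias stored as the last weight of each list), and function names are illustrative assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(w1, w2, x):
    """Forward pass. w1[j] holds hidden unit j's input weights and w2 the
    output unit's weights; the last weight of each list multiplies a
    constant bias input of 1."""
    xb = x + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w1]
    hb = h + [1.0]
    y = sigmoid(sum(w * v for w, v in zip(w2, hb)))
    return xb, h, hb, y


def loss(w1, w2, x, t):
    """Per-sample squared error E = 0.5 * (y - t)^2."""
    y = forward(w1, w2, x)[3]
    return 0.5 * (y - t) ** 2


def backprop(w1, w2, x, t):
    """Gradients dE/dw1 and dE/dw2 computed by back-propagation."""
    xb, h, hb, y = forward(w1, w2, x)
    delta_out = (y - t) * y * (1.0 - y)  # dE / d(net input of output unit)
    g2 = [delta_out * v for v in hb]
    g1 = []
    for j in range(len(w1)):
        # Error signal propagated back to hidden unit j.
        delta_h = delta_out * w2[j] * h[j] * (1.0 - h[j])
        g1.append([delta_h * v for v in xb])
    return g1, g2


def sgd_step(w1, w2, x, t, lr=0.1):
    """One in-place gradient-descent update on a single training sample."""
    g1, g2 = backprop(w1, w2, x, t)
    for row, grads in zip(w1, g1):
        for i, g in enumerate(grads):
            row[i] -= lr * g
    for i, g in enumerate(g2):
        w2[i] -= lr * g


def max_gradient_error(w1, w2, x, t, eps=1e-6):
    """Largest gap between back-prop and central-difference gradients."""
    g1, g2 = backprop(w1, w2, x, t)
    checks = [(row, i, g1[j][i]) for j, row in enumerate(w1) for i in range(len(row))]
    checks += [(w2, i, g2[i]) for i in range(len(w2))]
    worst = 0.0
    for weights, i, analytic in checks:
        old = weights[i]
        weights[i] = old + eps
        plus = loss(w1, w2, x, t)
        weights[i] = old - eps
        minus = loss(w1, w2, x, t)
        weights[i] = old  # restore the original weight
        worst = max(worst, abs((plus - minus) / (2 * eps) - analytic))
    return worst


random.seed(0)
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 3 hidden units, 2 inputs + bias
w2 = [random.uniform(-1, 1) for _ in range(4)]                      # 3 hidden outputs + bias
x, t = [0.5, -0.2], 1.0

print(f"max gradient error: {max_gradient_error(w1, w2, x, t):.1e}")
before = loss(w1, w2, x, t)
sgd_step(w1, w2, x, t)
print(loss(w1, w2, x, t) < before)  # True
```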

New BSP/CGM algorithms for spanning trees

The International Journal of High Performance Computing Applications ◽ 10.1177/1094342018803672 ◽ 2018 ◽ Vol 33 (3) ◽ pp. 444-461

Author(s): Jucele França de Alencar Vasconcellos ◽ Edson Norberto Cáceres ◽ Henrique Mongelli ◽ Siang Wun Song ◽ Frank Dehne ◽ ...

Keyword(s): Parallel Algorithms ◽ Parallel Machines ◽ Spanning Trees ◽ Graphics Processing Unit ◽ Computation Time ◽ General Purpose ◽ Coarse Grained ◽ Processing Unit ◽ List Ranking ◽ Bulk Synchronous Parallel

Computing a spanning tree (ST) and a minimum spanning tree (MST) of a graph are fundamental problems in graph theory and arise as subproblems in many applications. In this article, we propose parallel algorithms for these problems. One of the steps of previous parallel MST algorithms relies heavily on parallel list ranking, which, though efficient in theory, is very time-consuming in practice. Using a different approach based on graph decomposition, we devised new parallel algorithms that do not use the list ranking procedure. We proved that our algorithms are correct, and for a graph [Formula: see text], [Formula: see text], and [Formula: see text], the algorithms can be executed on a Bulk Synchronous Parallel/Coarse Grained Multicomputer (BSP/CGM) model using [Formula: see text] communication rounds with [Formula: see text] computation time for each round. To show that our algorithms perform well on real parallel machines, we implemented them on a graphics processing unit. The speedups obtained are competitive and show that the BSP/CGM model is suitable for designing general-purpose parallel algorithms.
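The abstract does not give the authors' decomposition-based algorithm, so as a point of reference the following sketch shows Borůvka's algorithm, a classic round-based MST method often used as the basis of parallel MST implementations: in each round, every component selects its lightest outgoing edge, and all selected edges are added at once. It is a sequential Python illustration in which a union-find structure stands in for the distributed component-contraction step; it is not the BSP/CGM algorithm from the paper, and the function name is hypothetical.

```python
def boruvka_mst(n, edges):
    """Minimum spanning forest by Borůvka's algorithm.

    n: number of vertices, labelled 0..n-1
    edges: list of (u, v, weight) tuples
    Returns the indices of the edges in the forest.
    """
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    chosen = []
    while True:
        # One round: cheapest[c] = index of the lightest edge leaving component c.
        cheapest = {}
        for k, (u, v, w) in enumerate(edges):
            cu, cv = find(u), find(v)
            if cu == cv:
                continue
            key = (w, k)  # break weight ties by index so choices stay consistent
            for c in (cu, cv):
                if c not in cheapest or key < (edges[cheapest[c]][2], cheapest[c]):
                    cheapest[c] = k
        if not cheapest:
            break  # no edge joins two components: the forest is complete
        # Add every selected edge, merging the components it joins.
        for k in set(cheapest.values()):
            u, v, _ = edges[k]
            cu, cv = find(u), find(v)
            if cu != cv:
                parent[cu] = cv
                chosen.append(k)
    return chosen


edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (0, 3, 3.0), (0, 2, 4.0)]
mst = boruvka_mst(4, edges)
print(sorted(mst), sum(edges[k][2] for k in mst))  # [0, 1, 2] 4.0
```

Each round at least halves the number of components that still have outgoing edges, so there are at most about log2(n) rounds; the per-round edge scan is the part that parallelizes naturally.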

Modeling and Computing Overlapping Aggregation of Large Data Sequences in Geographic Information Systems

International Journal of Information System Modeling and Design ◽ 10.4018/ijismd.2019010102 ◽ 2019 ◽ Vol 10 (1) ◽ pp. 20-41

Author(s): Driss En-Nejjary ◽ Francois Pinet ◽ Myoung-Ah Kang

Keyword(s): Information Systems ◽ Data Transfer ◽ Graphics Processing Unit ◽ Large Data ◽ General Purpose ◽ Environmental Data ◽ Acquisition Time ◽ Processing Unit ◽ Sequential Method ◽ Transfer Cost

Recently, in the field of information systems, the acquisition of geo-referenced data has made a huge technological leap forward. Optimizing the processing of these data is a real issue, and various research works have proposed analyzing large geo-referenced datasets with multi-core approaches. In this article, different methods based on general-purpose computing on graphics processing units (GPGPU) are modelled and compared for parallelizing overlapping aggregations of raster sequences. Our methods are tested on a sequence of rasters representing the evolution of temperature over time for the same region. Each raster corresponds to a different data acquisition time period, and each geo-referenced raster cell is associated with a temperature value. This article proposes optimized methods to calculate the average temperature for the region for all possible raster subsequences of a given length, i.e., to calculate overlapping aggregated data summaries. In these aggregations, the same subsets of values are aggregated several times. This type of aggregation can be useful in various environmental data analyses, e.g., to pre-calculate all the average temperatures in a database. The article demonstrates a significant increase in performance: GPGPU parallel processing enabled us to run the aggregations up to more than 50 times faster than the sequential method when data transfer cost is included, and more than 200 times faster when it is excluded.
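Because overlapping windows share most of their values, a running prefix sum lets every window of k consecutive rasters be averaged with one subtraction per cell rather than k additions. The following is a minimal sequential Python sketch of that idea, assuming each raster is a rectangular grid stored as nested lists; on a GPU, each cell would typically map to a thread and the prefix sum to a parallel scan. It illustrates one standard approach to overlapping aggregation, not necessarily any of the methods compared in the article, and the function names are hypothetical.

```python
from typing import List

Raster = List[List[float]]  # rows x cols of cell values


def sliding_window_means(rasters: List[Raster], k: int) -> List[Raster]:
    """Per-cell mean over every run of k consecutive rasters.

    A running prefix sum means values shared by overlapping windows are
    added once, so each window costs O(cells) instead of O(k * cells).
    """
    if not 1 <= k <= len(rasters):
        raise ValueError("window length must be between 1 and the number of rasters")
    rows, cols = len(rasters[0]), len(rasters[0][0])
    # prefix[t][i][j] = sum of rasters[0..t-1][i][j]
    prefix = [[[0.0] * cols for _ in range(rows)]]
    for r in rasters:
        last = prefix[-1]
        prefix.append([[last[i][j] + r[i][j] for j in range(cols)] for i in range(rows)])
    out = []
    for s in range(len(rasters) - k + 1):
        hi, lo = prefix[s + k], prefix[s]
        out.append([[(hi[i][j] - lo[i][j]) / k for j in range(cols)] for i in range(rows)])
    return out


def region_mean(raster: Raster) -> float:
    """Average over all cells of one raster."""
    cells = [v for row in raster for v in row]
    return sum(cells) / len(cells)


rasters = [[[10.0, 20.0]], [[12.0, 22.0]], [[14.0, 24.0]]]
windows = sliding_window_means(rasters, k=2)
print(windows)                            # [[[11.0, 21.0]], [[13.0, 23.0]]]
print([region_mean(w) for w in windows])  # [16.0, 18.0]
```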

IMAGE REGISTRATION AND FUSION SYSTEM BASED ON GPU

Journal of Circuits, Systems and Computers ◽ 10.1142/s0218126610006049 ◽ 2010 ◽ Vol 19 (01) ◽ pp. 173-189

Author(s): Seung-Hun Yoo ◽ Chang-Sung Jeong

Keyword(s): Visual Information ◽ High Speed ◽ Graphics Processing Unit ◽ Aerial Images ◽ Processing Unit ◽ Infrared Sensors ◽ Fusion Algorithm ◽ Registration Method ◽ Multi Scale ◽ Graphics Processing

The graphics processing unit (GPU) has emerged as a high-quality platform for computer vision systems. In this paper, we propose a straightforward system consisting of a registration method and a fusion method on the GPU, which produces good results at higher speed than non-GPU-based systems. Our GPU-accelerated system uses existing methods by porting them to the GPU platform. The registration method uses point correspondences to find a registering transformation, estimated with incremental parameters in a coarse-to-fine manner, while the fusion algorithm uses multi-scale methods to fuse the results from the registration stage. We evaluate performance with the same methods executed in both CPU-only and GPU-equipped environments. The experimental results present convincing evidence of the efficiency of our system, which was tested on a few pairs of aerial images taken by electro-optical and infrared sensors to provide visual information of a scene for environmental observatories.
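The abstract states that registration estimates a transformation from point correspondences but does not specify the transformation model. As an illustration only, the following Python sketch fits a 2-D similarity transform (rotation, uniform scale, translation) to matched points in closed form by least squares, treating points as complex numbers so that dst ≈ a·src + b. This is a swapped-in standard technique, not the paper's incremental coarse-to-fine estimator; in a coarse-to-fine scheme such a fit would typically be repeated from low to high resolution. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_similarity(src: List[Point], dst: List[Point]) -> Tuple[complex, complex]:
    """Least-squares 2-D similarity transform dst ≈ a * src + b.

    Points are treated as complex numbers z = x + iy, so a = s * exp(i*theta)
    encodes scale s and rotation theta, and b is the translation.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    zm = sum(z) / len(z)  # centroids
    wm = sum(w) / len(w)
    # With b = wm - a*zm, minimizing sum |w - a*z - b|^2 over complex a
    # gives a = sum (w - wm) * conj(z - zm) / sum |z - zm|^2.
    num = sum((wi - wm) * (zi - zm).conjugate() for zi, wi in zip(z, w))
    den = sum(abs(zi - zm) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = wm - a * zm
    return a, b


def transform_point(a: complex, b: complex, p: Point) -> Point:
    """Apply the fitted transform to one point."""
    q = a * complex(*p) + b
    return (q.real, q.imag)


src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(1.0, 0.0), (1.0, 2.0), (-1.0, 0.0)]  # rotate 90 degrees, scale by 2, shift by (1, 0)
a, b = fit_similarity(src, dst)
print(f"{abs(a):.3f} {math.degrees(cmath.phase(a)):.1f}")  # 2.000 90.0
```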
