Accelerating Large-Scale Genome-Wide Association Studies with Graphics Processors
Large-scale Genome-Wide Association Studies (GWAS) are a Big Data application because of the large volume of data to process and the high computational intensity. Furthermore, numerical issues (e.g., floating-point underflow) limit the data scale in some applications. Graphics Processors (GPUs) have been used to accelerate genomic data analytics, such as sequence alignment, Single-Nucleotide Polymorphism (SNP) detection, and Minor Allele Frequency (MAF) computation. As MAF computation is the most time-consuming task in GWAS, the authors discuss in detail their techniques for accelerating this task on the GPU. They first present a reduction-based algorithm that matches the GPU’s data-parallel execution model better than the original algorithm implemented in the CPU-based tool. They then implement this algorithm efficiently on the GPU by carefully optimizing local memory utilization and avoiding user-level synchronization. Because the MAF computation suffers from floating-point underflow, the authors transform the computation into logarithm space. In addition to the MAF computation, they briefly introduce GPU-accelerated sequence alignment and SNP detection. The experimental results show that the GPU-based GWAS implementations outperform state-of-the-art CPU-based tools by up to an order of magnitude.
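The two numerical ideas named in the abstract, moving a computation into logarithm space to avoid floating-point underflow and organizing it as a reduction, can be illustrated with a minimal CPU-side sketch in Python. It is not the authors' MAF kernel or GPU code. The function names (`naive_product`, `log_product`, `log_add`, `tree_reduce`) and the sample probabilities are hypothetical, chosen only for illustration, and the pairwise loop merely mimics the shape of a reduction that a GPU would run in parallel.

```python
import math


def naive_product(probs):
    """Multiply probabilities directly; long inputs underflow to 0.0."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def log_product(probs):
    """Return log(prod(probs)) by summing logarithms instead of multiplying."""
    return sum(math.log(p) for p in probs)


def log_add(log_a, log_b):
    """Return log(exp(log_a) + exp(log_b)) without leaving log space."""
    if log_a == -math.inf:
        return log_b
    if log_b == -math.inf:
        return log_a
    hi, lo = max(log_a, log_b), min(log_a, log_b)
    # exp(lo - hi) is at most 1, so this step cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))


def tree_reduce(values, combine):
    """Combine values pairwise, level by level (reduction-tree shape)."""
    values = list(values)
    if not values:
        raise ValueError("tree_reduce needs at least one value")
    while len(values) > 1:
        paired = [combine(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # carry an unpaired last element to the next level
            paired.append(values[-1])
        values = paired
    return values[0]


if __name__ == "__main__":
    probs = [1e-5] * 100                # hypothetical per-site probabilities
    print(naive_product(probs))         # underflows in double precision
    print(log_product(probs))           # stays finite in log space
    print(tree_reduce([-1000.0, -1000.0], log_add))  # sum of two tiny terms
```

Here the product of one hundred values of 1e-5 cannot be represented in double precision, while its logarithm is an ordinary finite number. `log_add` lets a sum of such terms be accumulated as a reduction without converting back to linear space.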