Integrating GPU-Accelerated Sequence Alignment and SNP Detection for Genome Resequencing Analysis

Large-scale Genome-Wide Association Studies (GWAS) are a Big Data application because of the large volume of data to process and the high computational intensity. Furthermore, numerical issues (e.g., floating-point underflow) limit the data scale in some applications. Graphics Processors (GPUs) have been used to accelerate genomic data analytics, such as sequence alignment, Single-Nucleotide Polymorphism (SNP) detection, and Minor Allele Frequency (MAF) computation. Because MAF computation is the most time-consuming task in GWAS, the authors discuss in detail their techniques for accelerating it on the GPU. They first present a reduction-based algorithm that matches the GPU's data parallelism better than the original algorithm implemented in the CPU-based tool. They then implement this algorithm efficiently on the GPU by carefully optimizing local memory utilization and avoiding user-level synchronization. Because the MAF computation suffers from floating-point underflow, the authors transform the computation to logarithm space. In addition to the MAF computation, they briefly introduce GPU-accelerated sequence alignment and SNP detection. The experimental results show that the GPU-based GWAS implementations can accelerate state-of-the-art CPU-based tools by up to an order of magnitude.
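
The two ideas named in the abstract, a reduction and a move to logarithm space, can be illustrated with a minimal CPU-side sketch. This is not the authors' GPU implementation. The function names (`log_add`, `tree_reduce`, `naive_product`) and the sample probabilities are assumptions made only for illustration. The sketch shows why a product of many small probabilities underflows in double precision, and how a pairwise (tree-shaped) reduction over log values avoids it.

```python
import math
import operator

def log_add(a, b):
    """Return log(exp(a) + exp(b)) without leaving log space."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def tree_reduce(values, combine):
    """Combine values pairwise, level by level, the access pattern
    that a data-parallel reduction typically follows."""
    values = list(values)
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    while len(values) > 1:
        paired = [combine(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # carry the unpaired last element
            paired.append(values[-1])
        values = paired
    return values[0]

def naive_product(probs):
    """Multiply probabilities directly (prone to underflow)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

# Hypothetical input: 100 small likelihood terms (made-up values).
probs = [1e-5] * 100
logs = [math.log(p) for p in probs]

direct = naive_product(probs)                  # underflows to 0.0
log_prod = tree_reduce(logs, operator.add)     # product becomes a sum of logs
log_total = tree_reduce(logs, log_add)         # sum of the terms, in log space
```

In log space a product becomes a sum and a sum becomes a `log_add`, so both fit the same reduction skeleton. Whether the original work structures its kernels this way is not stated in the abstract.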

SNP detection for massively parallel whole-genome resequencing

Genome Research ◽

10.1101/gr.088013.108 ◽

2009 ◽

Vol 19 (6) ◽

pp. 1124-1132 ◽

Cited By ~ 659

Author(s):

R. Li ◽

Y. Li ◽

X. Fang ◽

H. Yang ◽

J. Wang ◽

...

Keyword(s):

Massively Parallel ◽

Whole Genome ◽

Genome Resequencing ◽

Snp Detection ◽

Whole Genome Resequencing

Whole-genome resequencing: changing the paradigms of SNP detection, molecular mapping and gene discovery

Molecular Breeding ◽

10.1007/s11032-015-0240-6 ◽

2015 ◽

Vol 35 (1) ◽

Cited By ~ 16

Author(s):

Xiangyang Xu ◽

Guihua Bai

Keyword(s):

Molecular Mapping ◽

Gene Discovery ◽

Whole Genome ◽

Genome Resequencing ◽

Snp Detection ◽

Whole Genome Resequencing

Accelerating Large-Scale Genome-Wide Association Studies With Graphics Processors

Biotechnology ◽

10.4018/978-1-5225-8903-7.ch017 ◽

2019 ◽

pp. 428-461

Author(s):

Mian Lu ◽

Qiong Luo

Keyword(s):

Sequence Alignment ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association ◽

Floating Point ◽

Genome Wide Association Studies ◽

Graphics Processors ◽

Snp Detection ◽

Genome Wide ◽

Order Of Magnitude

SNP detection and prediction of variability between chicken lines using genome resequencing of DNA pools

BMC Genomics ◽

10.1186/1471-2164-11-665 ◽

2010 ◽

Vol 11 (1) ◽

pp. 665 ◽

Cited By ~ 11

Author(s):

Stefan Marklund ◽

Örjan Carlborg

Keyword(s):

Genome Resequencing ◽

Snp Detection ◽

Dna Pools

Sequence Alignment

10.1525/9780520943742 ◽

2009 ◽

Cited By ~ 6

Keyword(s):

Sequence Alignment

Multiple Sequence Alignment and Profile Analysis of Protein Family Using Hidden Markov Model

International Journal of Scientific Research ◽

10.15373/22778179/june2013/66 ◽

2012 ◽

Vol 2 (6) ◽

pp. 208-211

Author(s):

Navjot Kaur ◽

Rajbir Singh Cheema ◽

Harmandeep Singh

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Profile Analysis ◽

Hidden Markov ◽

Protein Family ◽

Multiple Sequence

Faculty Opinions recommendation of MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.731078852.793536612 ◽

2017 ◽

Author(s):

Feng Gao

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Online Service ◽

Multiple Sequence

SNP Detection and Analysis of Genes for Flavonoid Pathway in Yellow- and Black-Seeded Brassica napus L.

ACTA AGRONOMICA SINICA ◽

10.3724/sp.j.1006.2014.01914 ◽

2014 ◽

Vol 40 (11) ◽

pp. 1914

Author(s):

Cun-Min QU ◽

Kun LU ◽

Shui-Yan LIU ◽

Hai-Dong BU ◽

Fu-You FU ◽

...

Keyword(s):

Flavonoid Pathway ◽

Snp Detection

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions that result from the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that reduces the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are widely available from several sources, such as bacteria, fungi, yeast, plants, and animals.

Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis.

Methods: A total of sixty amino acid sequences of uricase from different sources were obtained from NCBI, and analyses including Multiple Sequence Alignment (MSA), homology search, phylogenetic relation, motif search, domain architecture, and physicochemical properties (including pI, EC, Ai, and Ii) were performed.

Results: Multiple sequence alignment of all the selected protein sequences showed distinct differences between bacterial, fungal, plant, and animal sources, based on the position-specific presence of conserved amino acid residues. The maximum homology across all the selected protein sequences lies between 51 and 388. Within individual categories, homology lies between 16 and 337 for bacterial uricase, 14 and 339 for fungal uricase, 12 and 317 for plant uricase, and 37 and 361 for animal uricase. The phylogenetic tree constructed from the amino acid sequences revealed clusters indicating that the uricases come from different sources. The physicochemical features showed that the uricases contain between 300 and 338 amino acid residues, with a molecular weight of 33-39 kDa and a theoretical pI ranging from 4.95 to 8.88. The amino acid composition results showed that valine has a high average frequency of 8.79 percent compared with the other amino acids in all analyzed species.

Conclusion: In the field of bioinformatics, this work may be informative and a stepping-stone for other researchers seeking to understand the physicochemical features, evolutionary history, and structural motifs of uricase, which can be widely used in the biotechnological and pharmaceutical industries. The proposed in silico analysis can therefore be considered for protein engineering work, as well as for gout therapy.
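
The amino acid composition figure reported above (an average valine frequency across sequences) is a simple per-sequence percentage averaged over the data set. The sketch below shows one way such a number could be computed. It is an illustration only, not the tool used in the study; the function names and the toy sequences are assumptions.

```python
from collections import Counter

def composition_percent(sequence):
    """Return {residue: percentage of the sequence} for one protein."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}

def mean_percent(sequences, residue):
    """Average the percentage of one residue over several sequences."""
    if not sequences:
        raise ValueError("no sequences given")
    per_seq = [composition_percent(s).get(residue, 0.0) for s in sequences]
    return sum(per_seq) / len(per_seq)

# Toy sequences (made up, far shorter than real uricases).
toy = ["MVVA", "AAAV"]
valine_avg = mean_percent(toy, "V")
```

Averaging per-sequence percentages weights every sequence equally regardless of length. Pooling all residues before dividing would weight longer sequences more. The abstract does not say which convention was used.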