Fast and SNP-aware short read alignment with SALT

2021 ◽  
Vol 22 (S9) ◽  
Author(s):  
Wei Quan ◽  
Bo Liu ◽  
Yadong Wang

Abstract
Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative transcriptome analysis; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variation in the population. This limitation can introduce bias and reduce the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome and a large number of genetic variants. However, compared to linear reference aligners, an aligner that stores and indexes all genetic variants has a high memory (RAM) cost and extremely long run times. Aligning reads to a graph-model-based index that includes all types of variants is, in theory, an NP-hard problem. By contrast, considering only single nucleotide polymorphism (SNP) information reduces the complexity of the index and improves the speed of sequence alignment.
Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) and incorporates 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has a similar speed but higher accuracy.
Conclusions: Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
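To illustrate what "SNP-aware" can mean in practice, here is a minimal Python sketch. It is not SALT's algorithm or index, which the abstract does not describe; the function names, the dictionary of known alternate alleles, and the brute-force scan over offsets are all assumptions made for illustration. The idea shown is only that a read base matching a known alternate allele at a SNP site is not penalized as a mismatch.

```python
from typing import Dict, Set


def snp_aware_mismatches(read: str, reference: str, offset: int,
                         snps: Dict[int, Set[str]]) -> int:
    """Count mismatches of `read` placed at reference[offset:].

    `snps` maps a 0-based reference position to the set of known
    alternate alleles there (a hypothetical, simplified SNP database).
    """
    if offset < 0 or offset + len(read) > len(reference):
        raise ValueError("read does not fit at this offset")
    mismatches = 0
    for i, base in enumerate(read):
        pos = offset + i
        if base == reference[pos]:
            continue  # matches the reference allele
        if base in snps.get(pos, set()):
            continue  # matches a known alternate allele: not penalized
        mismatches += 1
    return mismatches


def best_offset(read: str, reference: str, snps: Dict[int, Set[str]]) -> int:
    """Brute-force scan for the offset with the fewest mismatches.

    Ties are broken in favor of the leftmost offset. A real aligner
    would use an index instead of scanning every offset.
    """
    candidates = range(len(reference) - len(read) + 1)
    return min(
        candidates,
        key=lambda off: (snp_aware_mismatches(read, reference, off, snps), off),
    )


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    known_snps = {5: {"T"}}  # position 5: reference C, known alternate T
    read = "ATGT"
    print(best_offset(read, reference, {}))          # SNP-unaware placement
    print(best_offset(read, reference, known_snps))  # SNP-aware placement
```

In this toy case the read is ambiguous between two placements when SNPs are ignored, and the known SNP resolves the ambiguity, which is the kind of bias reduction the abstract refers to.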

PLoS ONE ◽  
2013 ◽  
Vol 8 (4) ◽  
pp. e61033 ◽  
Author(s):  
Yaoliang Chen ◽  
Ji Hong ◽  
Wanyun Cui ◽  
Jacques Zaneveld ◽  
Wei Wang ◽  
...  

2021 ◽  
Author(s):  
Chiann-Ling Cindy Yeh ◽  
Clara J. Amorosi ◽  
Soyeon Showman ◽  
Maitreya J. Dunham

Motivation: Use of PacBio sequencing for characterizing barcoded libraries of genetic variants is on the rise. PacBio sequencing is useful in linking variant alleles in a library with their associated barcode tag. However, current approaches to resolving PacBio sequencing artifacts can result in a high number of incorrectly identified or unusable reads. Results: We developed a PacBio Read Alignment Tool (PacRAT) that improves the accuracy of barcode-variant mapping through several steps of read alignment and consensus calling. To quantify the performance of our approach, we simulated PacBio reads from eight variant libraries of various lengths and showed that PacRAT improves the accuracy in pairing barcodes and variants across these libraries. Analysis of real (non-simulated) libraries also showed an increase in the number of reads that can be used for downstream analyses when using PacRAT. Availability and Implementation: PacRAT is written in Python and is freely available on GitHub (https://github.com/dunhamlab/PacRAT).
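The general idea of consensus calling per barcode can be sketched as follows. This is a simplified illustration and not PacRAT's code: the function name, the `min_reads` threshold, and the assumption that all reads sharing a barcode are already aligned to equal length are hypothetical. The abstract says only that the tool combines read alignment with consensus calling.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def consensus_by_barcode(reads: Iterable[Tuple[str, str]],
                         min_reads: int = 2) -> Dict[str, str]:
    """Return a majority-vote consensus variant sequence per barcode.

    `reads` yields (barcode, variant_sequence) pairs. Barcodes with
    fewer than `min_reads` reads, or with reads of unequal length
    (which would need a real alignment step), are skipped.
    """
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too little evidence to call a consensus
        if len({len(s) for s in seqs}) != 1:
            continue  # unequal lengths: alignment needed, out of scope here
        # Take the most common base in each column of the stacked reads.
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


if __name__ == "__main__":
    toy_reads = [
        ("AAA", "ACGT"),
        ("AAA", "ACGT"),
        ("AAA", "ACTT"),  # one read carries a sequencing error
        ("CCC", "GGGG"),  # single read: skipped with min_reads=2
    ]
    print(consensus_by_barcode(toy_reads))
```

Majority voting over several reads of the same barcode is what lets a single erroneous read be outvoted rather than discarded along with the whole barcode.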


2021 ◽  
Author(s):  
William J Bolosky ◽  
Arun Subramaniyan ◽  
Matei Zaharia ◽  
Ravi Pandya ◽  
Taylor Sittler ◽  
...  

Much genomic data comes in the form of paired-end reads: two reads that represent genetic material with a small gap between them. We present a new algorithm for aligning both reads in a pair simultaneously by fuzzily intersecting the sets of candidate alignment locations for each read. This algorithm is often much faster and produces alignments that result in variant calls having roughly the same concordance as the best competing aligners.
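One way to read "fuzzily intersecting" two sets of candidate locations is to keep only pairs of locations that lie within a maximum distance of each other. The sketch below makes that assumption; the function name, the `max_gap` parameter, and the use of plain integer positions are illustrative and are not taken from the paper.

```python
from typing import List, Tuple


def fuzzy_intersect(locs1: List[int], locs2: List[int],
                    max_gap: int) -> List[Tuple[int, int]]:
    """Return all (x, y) with x from locs1, y from locs2, |x - y| <= max_gap.

    Both lists are sorted first, then scanned with a sliding window so
    that each candidate in locs1 only inspects nearby candidates in locs2.
    """
    a = sorted(locs1)
    b = sorted(locs2)
    pairs = []
    start = 0  # first index in b that can still be close enough to x
    for x in a:
        # Drop candidates in b that are too far to the left of x.
        while start < len(b) and b[start] < x - max_gap:
            start += 1
        # Collect every candidate in b within max_gap of x.
        j = start
        while j < len(b) and b[j] <= x + max_gap:
            pairs.append((x, b[j]))
            j += 1
    return pairs


if __name__ == "__main__":
    read1_candidates = [100, 5000, 90000]
    read2_candidates = [350, 5200, 40000]
    print(fuzzy_intersect(read1_candidates, read2_candidates, max_gap=500))
```

Candidates that have no partner within the window (here 90000 and 40000) are discarded without being scored, which is where the speed advantage of handling both reads together would come from under this interpretation.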


2018 ◽  
Vol 17 (2) ◽  
pp. 237-240 ◽  
Author(s):  
Farzaneh Zokaee ◽  
Hamid R. Zarandi ◽  
Lei Jiang

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Maryam AlJame ◽  
Imtiaz Ahmad

The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching, and in-memory computations. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.


2011 ◽  
Vol 27 (10) ◽  
pp. 1351-1358 ◽  
Author(s):  
Jochen Blom ◽  
Tobias Jakobi ◽  
Daniel Doppmeier ◽  
Sebastian Jaenicke ◽  
Jörn Kalinowski ◽  
...  

Author(s):  
James Arram ◽  
Thomas Kaplan ◽  
Wayne Luk ◽  
Peiyong Jiang
