Fast and SNP-aware short read alignment with SALT

Abstract

Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignment directly affects the accuracy of downstream analyses, such as variant calling and quantitative analysis of the transcriptome; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variation in the population. This limitation can introduce reference bias and reduce the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome plus a large number of genetic variants. However, compared with linear-reference aligners, an aligner that stores and indexes all genetic variants has a high memory (RAM) cost and extremely long run times. In theory, aligning reads to a graph-based index that includes all types of variants is NP-hard. By contrast, considering only single nucleotide polymorphism (SNP) information reduces the complexity of the index and improves the speed of sequence alignment.

Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index the human reference genome (GRCh38) together with 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has similar speed but higher accuracy.

Conclusions: Herein, we present SALT, an SNP-aware alignment tool that aligns reads to a reference genome incorporating an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
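To make the idea of SNP awareness concrete, the following Python sketch scores an ungapped placement of a read against a reference, treating any known SNP allele at a reference position as a match rather than a mismatch. It is a toy illustration under stated assumptions, not SALT's index or alignment algorithm: the function names `snp_aware_mismatches` and `best_ungapped_hits`, the brute-force scan over every position, and the `max_mm` threshold are all hypothetical, and practical aligners use a compressed index and allow gaps.

```python
from typing import Dict, List, Set, Tuple


def snp_aware_mismatches(ref: str, snps: Dict[int, Set[str]], read: str, pos: int) -> int:
    """Count mismatches for `read` placed without gaps at 0-based `pos` on `ref`.

    A read base counts as a match if it equals the reference base or any
    known alternate allele recorded for that reference position in `snps`.
    """
    mismatches = 0
    for offset, base in enumerate(read):
        ref_pos = pos + offset
        allowed = {ref[ref_pos]} | snps.get(ref_pos, set())
        if base not in allowed:
            mismatches += 1
    return mismatches


def best_ungapped_hits(
    ref: str, snps: Dict[int, Set[str]], read: str, max_mm: int = 2
) -> List[Tuple[int, int]]:
    """Hypothetical brute-force scan of every ungapped placement.

    Returns (pos, mismatches) for placements within `max_mm`,
    ordered by fewest mismatches, then by position.
    """
    hits = []
    for pos in range(len(ref) - len(read) + 1):
        mm = snp_aware_mismatches(ref, snps, read, pos)
        if mm <= max_mm:
            hits.append((pos, mm))
    hits.sort(key=lambda hit: (hit[1], hit[0]))
    return hits


ref = "ACGTACGTTAGC"
known_snps = {4: {"G"}}  # hypothetical: alternate allele G at position 4 (reference base A)
print(best_ungapped_hits(ref, known_snps, "GCGT", max_mm=1))  # [(4, 0), (0, 1)]
print(best_ungapped_hits(ref, {}, "GCGT", max_mm=1))          # [(0, 1), (4, 1)]
```

Against the plain linear reference, positions 0 and 4 tie at one mismatch each; with the SNP allele recorded, position 4 becomes an exact match and the tie is broken. This is the kind of ambiguity that SNP information can resolve.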


CGAP-Align: A High Performance DNA Short Read Alignment Tool

PLoS ONE, Vol 8 (4), pp. e61033, 2013. DOI: 10.1371/journal.pone.0061033. Cited by ~4.

Author(s): Yaoliang Chen, Ji Hong, Wanyun Cui, Jacques Zaneveld, Wei Wang, ...

Keyword(s): High Performance, Short Read, Read Alignment, Short Read Alignment, Alignment Tool


SALT: a fast, memory-efficient and SNP-aware short read alignment tool

2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2019. DOI: 10.1109/bibm47256.2019.8983162.

Author(s): Wei Quan, Bo Liu, Yadong Wang

Keyword(s): Short Read, Read Alignment, Short Read Alignment, Alignment Tool, Fast Memory, Memory Efficient


PacRAT: A program to improve barcode-variant mapping from PacBio long reads using multiple sequence alignment

2021. DOI: 10.1101/2021.11.06.467314.

Author(s): Chiann-Ling Cindy Yeh, Clara J. Amorosi, Soyeon Showman, Maitreya J. Dunham

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Genetic Variants, Pacbio Sequencing, Multiple Sequence, Read Alignment, Long Reads, Alignment Tool, Variant Alleles

Motivation: The use of PacBio sequencing to characterize barcoded libraries of genetic variants is on the rise. PacBio sequencing is useful for linking variant alleles in a library with their associated barcode tags. However, current approaches to resolving PacBio sequencing artifacts can result in a high number of incorrectly identified or unusable reads. Results: We developed the PacBio Read Alignment Tool (PacRAT), which improves the accuracy of barcode-variant mapping through several steps of read alignment and consensus calling. To quantify the performance of our approach, we simulated PacBio reads from eight variant libraries of various lengths and showed that PacRAT improves the accuracy of pairing barcodes with variants across these libraries. Analysis of real (non-simulated) libraries also showed an increase in the number of reads usable for downstream analyses when using PacRAT. Availability and Implementation: PacRAT is written in Python and is freely available on GitHub (https://github.com/dunhamlab/PacRAT).
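The consensus-calling step can be illustrated with a simple per-column majority vote over reads that share a barcode. This is a hypothetical sketch, not PacRAT's implementation: it assumes the reads for each barcode have already been multiple-sequence-aligned to equal length with `-` as the gap symbol, and the function names and the `min_frac` threshold are illustrative choices.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def column_consensus(aligned: List[str], min_frac: float = 0.6) -> str:
    """Majority-vote consensus over equal-length aligned sequences.

    A column whose most common symbol has a frequency below `min_frac`
    becomes 'N'; a column won by the gap symbol '-' is dropped.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) < min_frac:
            consensus.append("N")      # no clear majority at this column
        elif symbol != "-":
            consensus.append(symbol)   # keep the majority base
        # a majority gap emits nothing for this column
    return "".join(consensus)


def consensus_by_barcode(
    reads: Iterable[Tuple[str, str]], min_frac: float = 0.6
) -> Dict[str, str]:
    """Group (barcode, aligned_read) pairs by barcode and call one consensus per barcode."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return {bc: column_consensus(seqs, min_frac) for bc, seqs in groups.items()}


reads = [("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"),
         ("CCC", "AC-T"), ("CCC", "AC-T"), ("CCC", "ACGT")]
print(consensus_by_barcode(reads))  # {'AAA': 'ACGT', 'CCC': 'ACT'}
```

A single sequencing error (the `T` in the third `AAA` read) is outvoted, and a gap supported by most reads for a barcode is removed from that barcode's consensus.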


Fuzzy set intersection based paired-end short-read alignment

2021. DOI: 10.1101/2021.11.23.469039.

Author(s): William J Bolosky, Arun Subramaniyan, Matei Zaharia, Ravi Pandya, Taylor Sittler, ...

Keyword(s): Fuzzy Set, Genetic Material, Genomic Data, Short Read, Read Alignment, Short Read Alignment, Set Intersection

Much genomic data comes in the form of paired-end reads: two reads that represent genetic material separated by a small gap. We present a new algorithm that aligns both reads in a pair simultaneously by fuzzily intersecting the sets of candidate alignment locations for each read. This algorithm is often much faster, and its alignments yield variant calls with roughly the same concordance as those of the best competing aligners.
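The "fuzzy" part of the intersection is that a candidate location for the first read and one for the second read count as a match when they lie within an expected fragment distance of each other, not at identical coordinates. A minimal sketch, assuming integer candidate positions on a single strand and hypothetical `min_dist`/`max_dist` bounds; it illustrates the set operation only and is not the authors' algorithm, which the abstract does not detail.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def fuzzy_intersect(
    cands1: List[int], cands2: List[int], min_dist: int, max_dist: int
) -> List[Tuple[int, int]]:
    """Return (p1, p2) pairs with min_dist <= p2 - p1 <= max_dist.

    Sorting the second candidate set lets each lookup use binary search,
    so only locations inside the allowed window are examined.
    """
    sorted2 = sorted(cands2)
    pairs = []
    for p1 in sorted(cands1):
        lo = bisect_left(sorted2, p1 + min_dist)
        hi = bisect_right(sorted2, p1 + max_dist)
        for p2 in sorted2[lo:hi]:
            pairs.append((p1, p2))
    return pairs


print(fuzzy_intersect([100, 5000, 90000], [350, 7000, 90100], 50, 600))
# [(100, 350), (90000, 90100)]
```

Candidates with no partner inside the window (here 5000 and 7000) are discarded, which is how pairing can prune locations that would be ambiguous for either read alone.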


AligneR: A Process-in-Memory Architecture for Short Read Alignment in ReRAMs

IEEE Computer Architecture Letters, Vol 17 (2), pp. 237-240, 2018. DOI: 10.1109/lca.2018.2854700. Cited by ~7.

Author(s): Farzaneh Zokaee, Hamid R. Zarandi, Lei Jiang

Keyword(s): Memory Architecture, Short Read, Read Alignment, Short Read Alignment


A SURVEY ON NGS - SHORT READ ALIGNMENT IN HIGH PERFORMANCE COMPUTING

International Journal of Research in Engineering and Technology, Vol 03 (27), pp. 84-88, 2014. DOI: 10.15623/ijret.2014.0327016.

Author(s): G. Raja

Keyword(s): High Performance Computing, High Performance, Short Read, Read Alignment, Short Read Alignment, Performance Computing


Evolution of Methods for NGS Short Read Alignment and Analysis of the NGS Sequences for Medical Applications

Computer Aided Intervention and Diagnostics in Clinical and Medical Images (Lecture Notes in Computational Vision and Biomechanics), pp. 135-142, 2019. DOI: 10.1007/978-3-030-04061-1_13.

Author(s): J. A. M. Rexie, Kumudha Raimond

Keyword(s): Medical Applications, Short Read, Read Alignment, Short Read Alignment


DNA short read alignment on Apache Spark

Applied Computing and Informatics, ahead of print, 2020. DOI: 10.1016/j.aci.2019.04.002.

Author(s): Maryam AlJame, Imtiaz Ahmad

Keyword(s): Cluster Computing, Empirical Evaluation, Biological Data, Apache Spark, Short Read, Read Alignment, Short Reads, Short Read Alignment, Alignment Problem, Amazon Web Services

The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, introducing several computational challenges. DNA short read alignment is an important problem in bioinformatics, and the exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations, such as broadcast variables, join after partitioning, caching, and in-memory computation. The performance of Spark-DNAligning is evaluated by comparing it with the SparkBWA tool and with CloudBurst, a MapReduce-based algorithm. All experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools, providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short read alignment problem.
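The benefit of a broadcast variable is that a read-only object, such as a reference index, is shipped to each worker once and then reused for every partition of reads that worker processes. The single-machine Python sketch below only mimics that data flow and does not use Spark. It builds a hypothetical k-mer index once, passes the same object to each partition, and aligns by exact seed-and-verify matching. It is not Spark-DNAligning's algorithm, and all names and the parameter `k` are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[str, List[int]]
Read = Tuple[str, str]  # (read_id, sequence)


def build_kmer_index(ref: str, k: int) -> Index:
    """Map every k-mer in the reference to its 0-based start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return dict(index)


def partition(reads: List[Read], n: int) -> List[List[Read]]:
    """Split reads round-robin into n partitions, as a cluster would distribute them."""
    return [reads[i::n] for i in range(n)]


def align_partition(reads: List[Read], index: Index, ref: str, k: int) -> List[Tuple[str, List[int]]]:
    """Seed each read with its first k-mer, then verify the full read exactly."""
    results = []
    for read_id, seq in reads:
        hits = [p for p in index.get(seq[:k], []) if ref[p:p + len(seq)] == seq]
        results.append((read_id, hits))
    return results


def align_all(reads: List[Read], ref: str, k: int = 3, n_partitions: int = 2):
    index = build_kmer_index(ref, k)  # built once; in Spark this is what would be broadcast
    out = []
    for part in partition(reads, n_partitions):
        out.extend(align_partition(part, index, ref, k))  # same index object reused
    return sorted(out)


reads = [("r1", "ACGT"), ("r2", "TTAG"), ("r3", "GGGG")]
print(align_all(reads, "ACGTACGTTAGC"))  # [('r1', [0, 4]), ('r2', [7]), ('r3', [])]
```

Because the index is read-only, sharing one copy among all partitions is safe. That property is what makes broadcasting it cheaper than joining every read against a distributed copy of the reference.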
