BWA-MEME: BWA-MEM emulated with a machine learning approach

10.1101/2021.09.01.457579 ◽

2021 ◽

Author(s):

Youngmok Jung ◽

Dongsu Han

Keyword(s):

Search Algorithm ◽

Search Problem ◽

Learning Approach ◽

Exact Match ◽

Short Read ◽

Read Alignment ◽

Short Read Alignment ◽

Machine Learning Approach ◽

Memory Accesses ◽

Generation Sequencing

The growing use of next-generation sequencing and rising sequencing throughput demand efficient short-read alignment, in which seeding is one of the major performance bottlenecks. The key challenge in the seeding phase is finding exact matches of substrings of short reads in the reference DNA sequence. Existing algorithms, however, are limited in performance by their frequent memory accesses. This paper presents BWA-MEME, the first full-fledged short-read alignment software that leverages learned indices to solve the exact match search problem for efficient seeding. BWA-MEME is a practical and efficient seeding algorithm based on a suffix array search algorithm; it addresses the challenges of using learned indices for SMEM (super-maximal exact match) search, which is used extensively in the seeding phase. Our evaluation shows that BWA-MEME achieves up to a 3.45x speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60x, memory accesses by 8.77x, and LLC misses by 2.21x, while producing SAM output identical to that of BWA-MEM2.
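The idea of combining a suffix array with a learned index can be illustrated with a small Python sketch. This is not BWA-MEME's implementation: the class name `ToyLearnedIndex`, the prefix length `K`, and the single linear model are assumptions made for illustration, and the text is assumed to contain only A, C, G, and T. The model predicts where a pattern should fall in the suffix array, a binary search runs only inside the predicted window, and a full binary search is used as a fallback if the window turns out not to bracket the answer.

```python
# Toy sketch: exact-match search on a suffix array, narrowed by a "learned" position guess.
# Assumption: a single least-squares line stands in for a real learned index.

CODE = {"A": 1, "C": 2, "G": 3, "T": 4}  # 0 is reserved for padding
K = 6  # hypothetical number of leading characters used as the model's input


def prefix_key(s):
    """Encode the first K characters as a base-5 integer (order-preserving)."""
    key = 0
    for j in range(K):
        key = key * 5 + (CODE[s[j]] if j < len(s) else 0)
    return key


class ToyLearnedIndex:
    def __init__(self, text):
        self.text = text
        # Naive suffix array construction; fine for a short demo string.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        n = len(self.sa)
        keys = [prefix_key(text[p:p + K]) for p in self.sa]
        # Fit rank ~ slope * key + intercept by ordinary least squares.
        mean_k, mean_r = sum(keys) / n, (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (r - mean_r) for r, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_r - self.slope * mean_k
        # Largest prediction error seen on the indexed suffixes.
        self.max_err = max(abs(self._predict(k) - r) for r, k in enumerate(keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def _prefix(self, rank, m):
        p = self.sa[rank]
        return self.text[p:p + m]

    def _lower_bound(self, pattern, lo, hi):
        """First rank in [lo, hi) whose suffix prefix is >= pattern, else hi."""
        m = len(pattern)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._prefix(mid, m) < pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def find(self, pattern):
        """Return sorted text positions where pattern occurs exactly."""
        n, m = len(self.sa), len(pattern)
        guess = self._predict(prefix_key(pattern))
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        start = self._lower_bound(pattern, lo, hi)
        # Check that the window really bracketed the global lower bound.
        ok_left = start == 0 or self._prefix(start - 1, m) < pattern
        ok_right = start == n or self._prefix(start, m) >= pattern
        if not (ok_left and ok_right):
            start = self._lower_bound(pattern, 0, n)  # fallback: full search
        end = start
        while end < n and self._prefix(end, m) == pattern:
            end += 1
        return sorted(self.sa[start:end])


index = ToyLearnedIndex("ACGTACGTTACG")
print(index.find("ACG"))  # positions of every exact occurrence of "ACG"
```

The narrower the predicted window, the fewer suffix array entries the search has to touch, which is the kind of memory-access saving the abstract describes; how BWA-MEME actually structures and trains its index is not covered by this sketch.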

Fuzzy set intersection based paired-end short-read alignment

10.1101/2021.11.23.469039 ◽

2021 ◽

Author(s):

William J Bolosky ◽

Arun Subramaniyan ◽

Matei Zaharia ◽

Ravi Pandya ◽

Taylor Sittler ◽

...

Keyword(s):

Fuzzy Set ◽

Genetic Material ◽

Genomic Data ◽

Short Read ◽

Read Alignment ◽

Short Read Alignment ◽

Set Intersection

Much genomic data comes in the form of paired-end reads: two reads that represent genetic material with a small gap between them. We present a new algorithm that aligns both reads in a pair simultaneously by fuzzily intersecting the sets of candidate alignment locations for each read. This algorithm is often much faster and produces alignments that result in variant calls having roughly the same concordance as the best competing aligners.
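One way to picture a fuzzy intersection of candidate locations is the sketch below. It is an illustration only, not the authors' algorithm: the function name `fuzzy_intersect` and the `max_gap` parameter are hypothetical, and candidates are reduced to plain integer reference positions. Two locations count as "intersecting" when they lie within `max_gap` bases of each other.

```python
def fuzzy_intersect(locs_a, locs_b, max_gap):
    """Pair up candidate locations of two reads that lie within max_gap of each other.

    locs_a, locs_b: candidate reference positions for read 1 and read 2.
    max_gap: hypothetical maximum distance allowed between the mates.
    Returns a list of (position_a, position_b) tuples.
    """
    a, b = sorted(locs_a), sorted(locs_b)
    pairs = []
    start = 0  # first index in b that can still be close enough to the current a
    for x in a:
        # Skip b candidates that are too far to the left of x.
        while start < len(b) and b[start] < x - max_gap:
            start += 1
        # Collect every b candidate inside [x - max_gap, x + max_gap].
        j = start
        while j < len(b) and b[j] <= x + max_gap:
            pairs.append((x, b[j]))
            j += 1
    return pairs


candidates = fuzzy_intersect([100, 5000, 9000], [350, 5200, 20000], max_gap=500)
print(candidates)  # [(100, 350), (5000, 5200)]
```

Because both lists are swept in sorted order, locations that have no nearby mate are discarded cheaply before any expensive alignment scoring would be needed.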

AligneR: A Process-in-Memory Architecture for Short Read Alignment in ReRAMs

IEEE Computer Architecture Letters ◽

10.1109/lca.2018.2854700 ◽

2018 ◽

Vol 17 (2) ◽

pp. 237-240 ◽

Cited By ~ 7

Author(s):

Farzaneh Zokaee ◽

Hamid R. Zarandi ◽

Lei Jiang

Keyword(s):

Memory Architecture ◽

Short Read ◽

Read Alignment ◽

Short Read Alignment

A SURVEY ON NGS - SHORT READ ALIGNMENT IN HIGH PERFORMANCE COMPUTING

International Journal of Research in Engineering and Technology ◽

10.15623/ijret.2014.0327016 ◽

2014 ◽

Vol 03 (27) ◽

pp. 84-88

Author(s):

G. Raja

Keyword(s):

High Performance Computing ◽

High Performance ◽

Short Read ◽

Read Alignment ◽

Short Read Alignment ◽

Performance Computing

Evolution of Methods for NGS Short Read Alignment and Analysis of the NGS Sequences for Medical Applications

Computer Aided Intervention and Diagnostics in Clinical and Medical Images - Lecture Notes in Computational Vision and Biomechanics ◽

10.1007/978-3-030-04061-1_13 ◽

2019 ◽

pp. 135-142

Author(s):

J. A. M. Rexie ◽

Kumudha Raimond

Keyword(s):

Medical Applications ◽

Short Read ◽

Read Alignment ◽

Short Read Alignment

DNA short read alignment on apache spark

Applied Computing and Informatics ◽

10.1016/j.aci.2019.04.002 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Maryam AlJame ◽

Imtiaz Ahmad

Keyword(s):

Cluster Computing ◽

Empirical Evaluation ◽

Biological Data ◽

Apache Spark ◽

Short Read ◽

Read Alignment ◽

Short Reads ◽

Short Read Alignment ◽

Alignment Problem ◽

Amazon Web Services

The evolution of technologies has unleashed a wealth of challenges by generating massive amounts of data. Recently, biological data has increased exponentially, which has introduced several computational challenges. DNA short-read alignment is an important problem in bioinformatics. The exponential growth in the number of short reads has increased the need for an ideal platform to accelerate the alignment process. Apache Spark is a cluster-computing framework that provides data parallelism and fault tolerance. In this article, we propose a Spark-based algorithm, called Spark-DNAligning, to accelerate DNA short-read alignment. Spark-DNAligning exploits Apache Spark's performance optimizations such as broadcast variables, join after partitioning, caching, and in-memory computation. Spark-DNAligning is evaluated in terms of performance by comparing it with the SparkBWA tool and a MapReduce-based algorithm called CloudBurst. All the experiments are conducted on Amazon Web Services (AWS). Results demonstrate that Spark-DNAligning outperforms both tools by providing a speedup in the range of 101–702 in aligning gigabytes of short reads to the human genome. Empirical evaluation reveals that Apache Spark offers promising solutions to the DNA short-read alignment problem.
