Read Alignment: Latest Research Papers

Fuzzy set intersection based paired-end short-read alignment

2021. DOI: 10.1101/2021.11.23.469039

Author(s): William J Bolosky, Arun Subramaniyan, Matei Zaharia, Ravi Pandya, Taylor Sittler, ...

Keyword(s): Fuzzy Set, Genetic Material, Genomic Data, Short Read, Read Alignment, Short Read Alignment, Set Intersection

Much genomic data comes in the form of paired-end reads: two reads that represent genetic material with a small gap between them. We present a new algorithm that aligns both reads in a pair simultaneously by fuzzily intersecting the sets of candidate alignment locations for each read. This algorithm is often much faster than competing aligners, and its alignments yield variant calls with roughly the same concordance as those of the best competing aligners.
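The abstract does not spell out the intersection procedure. As a rough illustration only, the sketch below assumes each read's candidate alignment locations are plain integer reference coordinates and treats two candidates as fuzzily matching when they lie within a hypothetical `max_gap` bases of each other (a stand-in for the expected insert size). Strand, orientation, and alignment scoring are ignored; this is not the authors' implementation.

```python
from bisect import bisect_left


def fuzzy_intersect(locs1, locs2, max_gap):
    """Return (p1, p2) pairs where a candidate location for read 1 and one
    for read 2 lie within max_gap bases of each other (hypothetical rule)."""
    locs2 = sorted(locs2)
    pairs = []
    for p1 in sorted(locs1):
        # Jump straight to the first read-2 candidate that could be close enough.
        i = bisect_left(locs2, p1 - max_gap)
        # Collect every read-2 candidate inside the window [p1 - max_gap, p1 + max_gap].
        while i < len(locs2) and locs2[i] <= p1 + max_gap:
            pairs.append((p1, locs2[i]))
            i += 1
    return pairs


# Only candidate pairs that are near each other survive the intersection.
print(fuzzy_intersect([100, 5000, 90000], [350, 42000, 90480], 500))
```

Sorting one list and binary-searching into it keeps the cost close to O((m + n) log n) for m and n candidates, rather than comparing every pair.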

PacRAT: A program to improve barcode-variant mapping from PacBio long reads using multiple sequence alignment

2021. DOI: 10.1101/2021.11.06.467314

Author(s): Chiann-Ling Cindy Yeh, Clara J. Amorosi, Soyeon Showman, Maitreya J. Dunham

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Genetic Variants, PacBio Sequencing, Multiple Sequence, Read Alignment, Long Reads, Alignment Tool, Variant Alleles

Motivation: Use of PacBio sequencing for characterizing barcoded libraries of genetic variants is on the rise. PacBio sequencing is useful for linking variant alleles in a library with their associated barcode tags. However, current approaches to resolving PacBio sequencing artifacts can produce a high number of incorrectly identified or unusable reads. Results: We developed a PacBio Read Alignment Tool (PacRAT) that improves the accuracy of barcode-variant mapping through several steps of read alignment and consensus calling. To quantify the performance of our approach, we simulated PacBio reads from eight variant libraries of various lengths and showed that PacRAT improves the accuracy of pairing barcodes with variants across these libraries. Analysis of real (non-simulated) libraries also showed an increase in the number of reads usable for downstream analyses when using PacRAT. Availability and Implementation: PacRAT is written in Python and is freely available on GitHub (https://github.com/dunhamlab/PacRAT).
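PacRAT's exact pipeline is not described here. To show what consensus calling per barcode can mean, the sketch below assumes reads have already been grouped by barcode and multiply aligned, so that each barcode's variant sequences are equal-length rows with `-` marking gaps; it then takes a column-wise majority vote. The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(records):
    """records: iterable of (barcode, aligned_variant_seq) pairs.
    Rows sharing a barcode are assumed to be equal-length MSA rows ('-' = gap)."""
    groups = defaultdict(list)
    for barcode, seq in records:
        groups[barcode].append(seq)

    result = {}
    for barcode, seqs in groups.items():
        consensus = []
        # zip(*seqs) walks the alignment column by column.
        for column in zip(*seqs):
            base, _ = Counter(column).most_common(1)[0]
            if base != "-":  # a majority gap means the column is dropped
                consensus.append(base)
        result[barcode] = "".join(consensus)
    return result


reads = [("AAT", "ACG-T"), ("AAT", "ACGGT"), ("AAT", "ACG-T"), ("CCG", "TTA")]
print(consensus_by_barcode(reads))
```

A single erroneous read (here the inserted `G` in one `AAT` row) is outvoted, which is the basic reason consensus calling can rescue barcode-variant pairs from noisy long reads.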

Polypolish: short-read polishing of long-read bacterial genome assemblies

2021. DOI: 10.1101/2021.10.14.464465

Author(s): Ryan R Wick, Kathryn E Holt

Keyword(s): Bacterial Genome, Short Read, Read Alignment, Short Reads, Repeat Sequences, Short Read Alignment, Long Read, Genome Assemblies, Residual Errors

Long-read-only bacterial genome assemblies usually contain residual errors, most commonly homopolymer-length errors. Short-read polishing tools can use short reads to fix these errors, but most rely on short-read alignment, which is unreliable in repeat regions. Errors in such regions are therefore challenging to fix and often remain after short-read polishing. Here we introduce Polypolish, a new short-read polisher that uses all-per-read alignments to repair errors in repeat sequences that other polishers cannot. In benchmarking tests using both simulated and real reads, we find that Polypolish performs well, and the best results are achieved by using Polypolish in combination with other short-read polishers.
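As a simplified illustration of why keeping every alignment per read helps in repeats: if a read aligns equally well to two copies of a repeat, contributing it to both copies lets each copy accumulate its true depth of support. The sketch below assumes gapless alignments given as `(start, sequence)` pairs, handles substitutions only (not the homopolymer-length indels mentioned above), and uses hypothetical `min_depth` and `min_fraction` thresholds; it is not Polypolish's actual decision rule.

```python
from collections import Counter


def polish(assembly, alignments, min_depth=2, min_fraction=0.5):
    """Substitution-only polishing from a pileup of ALL alignments of each read.

    alignments: list of (start, aligned_seq), gapless, 0-based start.
    A read placed in two repeat copies appears twice, once per placement.
    """
    pileup = [Counter() for _ in assembly]
    for start, seq in alignments:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(assembly):
                pileup[pos][base] += 1

    polished = []
    for ref_base, counts in zip(assembly, pileup):
        depth = sum(counts.values())
        if depth >= min_depth:
            base, n = counts.most_common(1)[0]
            if n / depth > min_fraction:  # strict majority of supporting alignments
                polished.append(base)
                continue
        polished.append(ref_base)  # too little evidence: keep the assembly base
    return "".join(polished)


print(polish("AAAACAAAA", [(0, "AAAAG"), (0, "AAAAG"), (4, "GAAAA")]))
```

Positions with too little coverage are left unchanged, which keeps the sketch conservative.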

Prophage Tracer: precisely tracing prophages in prokaryotic genomes using overlapping split-read alignment

Nucleic Acids Research, 2021. DOI: 10.1093/nar/gkab824

Author(s): Kaihao Tang, Weiquan Wang, Yamin Sun, Yiqing Zhou, Pengxia Wang, ...

Keyword(s): Performance Testing, Sequencing Data, Associated Bacteria, Read Alignment, Phage Gene, Short Read Sequencing, Split Read, Prokaryotic Genomes, Mining Tool, Gene Similarity

The life cycle of temperate phages includes a lysogenic stage in which the phage integrates into the host genome and becomes a prophage. However, identifying prophages that are highly divergent from known phages remains challenging. In this study, taking advantage of the lysis-lysogeny switch of temperate phages, we designed Prophage Tracer, a tool that recognizes active prophages in prokaryotic genomes from short-read sequencing data, independently of phage gene similarity searches. Prophage Tracer uses the criterion of overlapping split-read alignment to recognize discriminative reads that contain bacterial (attB) and phage (attP) att sites, which represent prophage excision signals. Performance testing showed that Prophage Tracer could predict known prophages with precise boundaries, as well as novel prophages. Two novel prophages, one dsDNA and one ssDNA, each encoding highly divergent major capsid proteins, were identified in coral-associated bacteria. Prophage Tracer is a reliable data mining tool for identifying novel temperate phages and mobile genetic elements. The code for Prophage Tracer is publicly available at https://github.com/WangLab-SCSIO/Prophage_Tracer.
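The geometric core of an overlapping split-read alignment can be shown with read coordinates alone. In the sketch below, a read spanning an excision junction is assumed to have two partial alignments, given as hypothetical half-open `(start, end)` intervals on the read; the stretch that both alignments claim is reported as the candidate att core. A real implementation would derive these intervals from aligner output and apply further filters not shown here.

```python
def overlap_core(read, aln_a, aln_b, min_len=1):
    """Return the read substring covered by BOTH partial alignments.

    aln_a, aln_b: half-open (start, end) intervals on the read
    (hypothetical input format). Returns None if the overlap is
    shorter than min_len, i.e. the split alignments merely abut.
    """
    start = max(aln_a[0], aln_b[0])
    end = min(aln_a[1], aln_b[1])
    if end - start < min_len:
        return None
    return read[start:end]


read = "GGGGGTTCAGCCCCC"
# Left part aligns to one flank, right part to the other; both claim TTCAG.
print(overlap_core(read, (0, 10), (5, 15)))
```

The overlap matters because the att core is present on both sides of the junction, so a read crossing it can legitimately align through the same bases to either flank.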

BWA-MEME: BWA-MEM emulated with a machine learning approach

2021. DOI: 10.1101/2021.09.01.457579

Author(s): Youngmok Jung, Dongsu Han

Keyword(s): Search Algorithm, Search Problem, Learning Approach, Exact Match, Short Read, Read Alignment, Short Read Alignment, Machine Learning Approach, Memory Accesses, Generation Sequencing

The growing use of next-generation sequencing and increasing sequencing throughput demand efficient short-read alignment, in which seeding is one of the major performance bottlenecks. The key challenge in the seeding phase is searching the reference DNA sequence for exact matches of substrings of short reads. Existing algorithms, however, are limited in performance by their frequent memory accesses. This paper presents BWA-MEME, the first full-fledged short-read alignment software that leverages learned indices to solve the exact-match search problem for efficient seeding. BWA-MEME is a practical and efficient seeding algorithm, based on a suffix array search algorithm, that overcomes the challenges of using learned indices for super-maximal exact match (SMEM) search, which is used extensively in the seeding phase. Our evaluation shows that BWA-MEME achieves up to a 3.45x speedup in seeding throughput over BWA-MEM2 by reducing the number of instructions by 4.60x, memory accesses by 8.77x, and last-level cache (LLC) misses by 2.21x, while producing SAM output identical to that of BWA-MEM2.
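As a toy illustration of the learned-index idea (predict where a key sits in a sorted array, then search only a small error window around that prediction), the sketch below indexes a tiny suffix array by integer-encoded k-mer prefixes and fits a single least-squares line from key to suffix-array rank. This is a deliberate simplification: BWA-MEME's learned index and its SMEM search are considerably more involved, and the class and function names here are hypothetical. Keys absent from the index can fall outside the error window, so the sketch verifies the windowed result and falls back to a full binary search.

```python
import bisect

# '$' sorts before A < C < G < T, matching Python string order.
CODE = {"$": 0, "A": 1, "C": 2, "G": 3, "T": 4}


def kmer_key(s, k):
    """Encode the first k characters as a base-5 integer (order-preserving)."""
    key = 0
    for ch in s[:k].ljust(k, "$"):
        key = key * 5 + CODE[ch]
    return key


class LearnedSuffixIndex:
    def __init__(self, text, k=3):
        assert text.endswith("$"), "text must end with the '$' sentinel"
        self.text, self.k = text, k
        # Plain suffix array: suffix start positions in sorted suffix order.
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])
        self.keys = [kmer_key(text[i:], k) for i in self.sa]  # non-decreasing
        n = len(self.keys)
        # The "learned" model: least-squares line mapping key -> rank.
        mx, my = sum(self.keys) / n, (n - 1) / 2
        var = sum((x - mx) ** 2 for x in self.keys)
        cov = sum((x - mx) * (i - my) for i, x in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = my - self.slope * mx
        # Worst-case prediction error over every indexed key.
        self.err = max(abs(self._predict(x) - i) for i, x in enumerate(self.keys))

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lower_bound(self, q):
        n, pred = len(self.keys), self._predict(q)
        lo = max(0, int(pred - self.err))
        hi = max(lo, min(n, int(pred + self.err) + 2))
        i = bisect.bisect_left(self.keys, q, lo, hi)  # search only the error window
        ok = (i == 0 or self.keys[i - 1] < q) and (i == n or self.keys[i] >= q)
        return i if ok else bisect.bisect_left(self.keys, q)

    def find(self, pattern):
        """Sorted start positions of exact occurrences of pattern (len >= k)."""
        if len(pattern) < self.k:
            raise ValueError("pattern must be at least k characters long")
        q = kmer_key(pattern, self.k)
        hits, j = [], self.lower_bound(q)
        while j < len(self.keys) and self.keys[j] == q:
            if self.text.startswith(pattern, self.sa[j]):
                hits.append(self.sa[j])
            j += 1
        return sorted(hits)


idx = LearnedSuffixIndex("ACGTACGTTACG$", k=3)
print(idx.find("ACG"), idx.find("TAC"))
```

The appeal of the approach is that a good model turns a full binary search (many dependent memory accesses) into one arithmetic prediction plus a search over a few neighbouring entries.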

Technology dictates algorithms: recent developments in read alignment

Genome Biology, Vol 22 (1), 2021. DOI: 10.1186/s13059-021-02443-7

Author(s): Mohammed Alser, Jeremy Rotman, Dhrithi Deshpande, Kodi Taraszka, Huwenbo Shi, ...

Keyword(s): Experimental Evaluation, Genomic Analysis, Computational Algorithms, Read Alignment, Systematic Survey, Essential Step, Technological Advances, Alignment Algorithms, Long Reads, Recent Developments

Aligning sequencing reads to a reference is an essential step in most genomic analysis pipelines. Computational algorithms for read alignment have evolved alongside technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of the algorithmic foundations and methodologies of 107 alignment methods for both short and long reads. We also provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read alignment. Finally, we discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.

Fast and SNP-aware short read alignment with SALT

BMC Bioinformatics, Vol 22 (S9), 2021. DOI: 10.1186/s12859-021-04088-6

Author(s): Wei Quan, Bo Liu, Yadong Wang

Keyword(s): Sequence Alignment, Genetic Variants, High Throughput Sequencing, Reference Genome, Graph Model, Sequence Alignments, Short Read, Read Alignment, Short Read Alignment, Alignment Tool

Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative analysis of the transcriptome; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variation in the population. This limitation can introduce bias and reduce the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome plus a large number of genetic variants. However, compared with linear reference aligners, an aligner that stores and indexes all genetic variants has a high memory (RAM) cost and extremely long run times. In theory, aligning reads to a graph-model-based index that includes all types of variants is an NP-hard problem. By contrast, considering only single nucleotide polymorphism (SNP) information reduces the complexity of the index and improves the speed of sequence alignment. Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short-read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) that incorporates 12.8 million UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has similar speed but higher accuracy. Conclusions: Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
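The abstract does not detail SALT's index, so the following is only a sketch of the scoring idea behind SNP-aware alignment: when a read is compared against a candidate reference position, a base that matches either the reference allele or a known SNP allele is not counted as a mismatch. The `snps` dictionary (0-based reference position mapped to a set of alternative alleles) and the function name are assumptions for illustration.

```python
def snp_aware_mismatches(read, ref, start, snps):
    """Count mismatches of `read` placed at ref[start:], treating known
    SNP alleles as matches.

    snps: hypothetical dict {0-based ref position: set of alt alleles}.
    """
    if start < 0 or start + len(read) > len(ref):
        raise ValueError("read does not fit inside the reference at this start")
    mismatches = 0
    for offset, base in enumerate(read):
        pos = start + offset
        allowed = {ref[pos]} | snps.get(pos, set())
        if base not in allowed:
            mismatches += 1
    return mismatches


ref = "ACGTACGT"
snps = {2: {"T"}}  # known G>T SNP at position 2
print(snp_aware_mismatches("ACTT", ref, 0, snps), snp_aware_mismatches("ACTT", ref, 0, {}))
```

A read carrying a common alternative allele is scored as a perfect match instead of being penalised, which is the reference-bias reduction the abstract describes, at the cost of storing only SNPs rather than all variant types.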

PAC: Highly accurate quantification of allelic gene expression for population and disease genetics

2021. DOI: 10.1101/2021.07.13.452202

Author(s): Anna Saukkonen, Helena Kilpinen, Alan Hodgkinson

Keyword(s): Gene Expression, Gene Regulation, RNA Sequencing, Specific Gene, Read Alignment, Specific Gene Expression, Powerful Approach, Allelic Gene, Allele Specific, Accurate Quantification

Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation. However, detection of ASE events relies on accurate alignment of RNA-sequencing reads, where challenges still remain. We have developed PAC, a method that combines multiple steps to improve the quantification of allelic reads, including personalised (i.e. diploid) read alignment with improved allocation of multi-mapping reads. We show that PAC outperforms standard alignment approaches for ASE detection in both accuracy and in the number of sites it can reliably quantify.
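PAC's exact procedure is not given here. A minimal sketch of the idea behind personalised (diploid) alignment, assuming each read overlapping a heterozygous site has been aligned to both haplotype sequences and has one alignment score per haplotype: each read is assigned to the better-scoring haplotype, near-ties are left unassigned, and the allelic ratio is computed from the assigned reads. The `min_margin` threshold and the function name are hypothetical, and the sketch does not model PAC's handling of multi-mapping reads.

```python
def allelic_counts(read_scores, min_margin=1):
    """read_scores: iterable of (score_hap1, score_hap2) per read.
    Returns (counts, hap1_ratio); ratio is None if no read is assigned."""
    counts = {"hap1": 0, "hap2": 0, "ambiguous": 0}
    for s1, s2 in read_scores:
        if s1 - s2 >= min_margin:
            counts["hap1"] += 1
        elif s2 - s1 >= min_margin:
            counts["hap2"] += 1
        else:
            counts["ambiguous"] += 1  # aligns equally well to both haplotypes
    assigned = counts["hap1"] + counts["hap2"]
    ratio = counts["hap1"] / assigned if assigned else None
    return counts, ratio


print(allelic_counts([(60, 54), (60, 60), (52, 60), (60, 55)]))
```

Aligning against both haplotypes, rather than a single reference, avoids systematically favouring the reference allele, which would otherwise bias allelic ratios.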

SRAMM: Short Read Alignment Mapping Metrics

International Journal on Bioinformatics & Biosciences, Vol 11 (02), pp. 01-07, 2021. DOI: 10.5121/ijbb.2021.11201

Author(s): Alvin Chon, Xiaoqiu Huang

Keyword(s): Third Party, Read Mapping, Short Read, Read Alignment, Short Read Mapping, Short Read Alignment, Command Line Tool, Short Read Aligner, Quality Programs, Computationally Intensive

Short Read Alignment Mapping Metrics (SRAMM) is an efficient and versatile command-line tool that provides additional short-read mapping metrics, filtering, and graphs. Short-read aligners report MAPping Quality (MAPQ), but their methods are generally neither standardized nor well described in the literature or in software manuals. Additionally, third-party mapping quality programs are typically computationally intensive or designed for specific applications. SRAMM efficiently generates multiple concept-based mapping scores to support an informative post-alignment examination and filtering of aligned short reads for various downstream applications. SRAMM is compatible with Python 2.6+ and Python 3.6+ on all operating systems. It works with any short-read aligner that produces SAM/BAM/CRAM output and reports 'AS' tags. It is freely available under the MIT license at http://github.com/achon/sramm.
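SRAMM's own scores are not specified in this summary. The sketch below only shows the raw ingredients such a tool works from: it parses one SAM line with the standard library, converts MAPQ to an error probability using the SAM specification's definition (MAPQ = -10 log10 of the probability that the mapping position is wrong, with 255 meaning unavailable), and computes one hypothetical score, the AS tag divided by read length. None of these function names come from SRAMM.

```python
def parse_sam_line(line):
    """Split one SAM alignment line into QNAME, MAPQ, SEQ and optional tags."""
    fields = line.rstrip("\n").split("\t")
    qname, mapq, seq = fields[0], int(fields[4]), fields[9]
    tags = {}
    # Optional TAG:TYPE:VALUE fields follow the 11 mandatory columns.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return qname, mapq, seq, tags


def mapq_to_error_probability(mapq):
    """Invert MAPQ = -10 * log10(P(position wrong)); 255 means unavailable."""
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


def as_per_base(line):
    """Hypothetical score: AS tag divided by read length (None if unavailable)."""
    _, _, seq, tags = parse_sam_line(line)
    if "AS" not in tags or seq == "*":
        return None
    return tags["AS"] / len(seq)


sam = "read1\t0\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:8\tNM:i:0"
print(mapq_to_error_probability(parse_sam_line(sam)[1]), as_per_base(sam))
```

Because aligners differ in how they derive MAPQ, scores computed directly from the AS tag give a comparable signal across aligners, which is the gap the tool description points to.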
