short read aligner Latest Research Papers

Short Read Alignment Mapping Metrics (SRAMM) is an efficient and versatile command-line tool that provides additional short-read mapping metrics, filtering, and graphs. Short read aligners report a MAPping Quality (MAPQ), but the methods used to compute it are generally neither standardized nor well described in the literature or in software manuals. Additionally, third-party mapping quality programs are typically computationally intensive or designed for specific applications. SRAMM efficiently generates several concept-based mapping scores that support an informative post-alignment examination and filtering of aligned short reads for various downstream applications. SRAMM is compatible with Python 2.6+ and Python 3.6+ on all operating systems. It works with any short read aligner that generates SAM/BAM/CRAM output files and reports 'AS' tags. It is freely available under the MIT license at http://github.com/achon/sramm.
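
The dependence on 'AS' tags can be illustrated with a minimal sketch that reads the alignment score from plain-text SAM records and filters on it. This is not SRAMM's own scoring code; the function names `alignment_score` and `filter_by_score` and the threshold are hypothetical, and only the SAM convention that optional tags begin at column 12 is assumed.

```python
def alignment_score(sam_line):
    """Return the integer value of the AS tag in a SAM record, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for tag in fields[11:]:
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    return None


def filter_by_score(sam_lines, min_score):
    """Yield header lines unchanged and records whose AS tag is >= min_score."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        score = alignment_score(line)
        if score is not None and score >= min_score:
            yield line


# Hypothetical records used only for illustration.
records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:40\tNM:i:0",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGA\tIIII\tNM:i:1\tAS:i:12",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
kept = list(filter_by_score(records, min_score=30))
```

Records without an AS tag are dropped here; a real pipeline might prefer to keep or flag them.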


Faster short-read mapping with strobemer seeds in syncmer space

10.1101/2021.06.18.449070, 2021

Author(s): Kristoffer Sahlin

Keyword(s): High Speed, Mapping Accuracy, Original Sequence, Short Read, Short Read Mapping, Alignment Algorithms, Reverse Complement, Short Read Aligner, Candidate Regions, Burrows Wheeler Transform

Short-read genome alignment is a fundamental computational step used in many bioinformatic analyses. It is therefore desirable to align such data as fast as possible. Most alignment algorithms use a seed-and-extend approach. Several popular programs perform the seeding step based on the Burrows-Wheeler Transform with a low memory footprint, but they are relatively slow compared to more recent approaches that use a minimizer-based seeding-and-chaining strategy. Recently, syncmers and strobemers were proposed for sequence comparison. Both protocols were designed for improved conservation of matches between sequences under mutations. Syncmers is a thinning protocol proposed as an alternative to minimizers, while strobemers is a linking protocol for gapped sequences and was proposed as an alternative to k-mers. The main contribution of this work is a new seeding approach that combines syncmers and strobemers. We use a strobemer protocol (randstrobes) to link together syncmers (i.e., in syncmer-space) instead of over the original sequence. Our protocol allows us to create longer seeds while preserving mapping accuracy. A longer seed length reduces the number of candidate regions, which allows faster mapping and alignment. We also contribute the insight that, speed-wise, this protocol is particularly effective when syncmers are canonical. Canonical syncmers can be created for specific parameter combinations and reduce the computational burden of computing the non-canonical randstrobes in reverse complement. We implement our idea in a proof-of-concept short-read aligner, strobealign, that aligns short reads 3-4x faster than minimap2 and 15-23x faster than BWA and Bowtie2. Many implementation versions of, e.g., BWA achieve high speed on specific hardware. Our contribution is algorithmic and requires no hardware architecture or system-specific instructions. Strobealign is available at https://github.com/ksahlin/StrobeAlign.
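
The idea of linking syncmers rather than raw k-mers can be sketched roughly as follows. This is a simplified illustration, not strobealign's implementation: it assumes open syncmers (a k-mer is kept when its smallest s-mer sits at offset `t`), uses `zlib.crc32` as a stand-in hash, and picks the second strobe by minimizing a hash of the concatenated pair. The function names and all parameter values are hypothetical.

```python
import zlib


def h(s):
    """Deterministic stand-in hash (an assumption; real tools use other hashes)."""
    return zlib.crc32(s.encode("ascii"))


def open_syncmers(seq, k, s, t=0):
    """Return (position, k-mer) pairs whose minimal s-mer starts at offset t."""
    out = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smer_hashes = [h(kmer[j:j + s]) for j in range(k - s + 1)]
        if smer_hashes.index(min(smer_hashes)) == t:
            out.append((i, kmer))
    return out


def randstrobes_over_syncmers(syncmers, w_min, w_max):
    """Link each syncmer to one downstream syncmer chosen from a window.

    The window is counted in syncmers (syncmer space), not in bases.
    Assumes 1 <= w_min <= w_max.
    """
    seeds = []
    for i, (pos1, first) in enumerate(syncmers):
        window = syncmers[i + w_min:i + w_max + 1]
        if not window:
            break  # later syncmers have no downstream window either
        pos2, second = min(window, key=lambda item: h(first + item[1]))
        seeds.append((pos1, pos2, first + second))
    return seeds


seq = "ACGTTGCATGCCGATAGCTAGGCTTAACGGATCCGATTACAGGCATCG"
k, s = 6, 3
syncs = open_syncmers(seq, k, s)
seeds = randstrobes_over_syncmers(syncs, w_min=1, w_max=3)
```

Each seed spans two syncmers, so it covers 2k bases of sequence content while the pair can stretch over a longer and variable region of the reference.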


BWTaligner: a genome short-read aligner

Vietnam Journal of Science Technology and Engineering, 10.31276/vjste.60(2).73, 2018, Vol 60 (2), pp. 73-77

Author(s): Lam Nguyen, Xuan Thi Trinh, Hien Trinh, Dang Hung Tran, Cuong Nguyen, ...

Keyword(s): Short Read, A Genome, Short Read Aligner


Highly accurate and sensitive short read aligner

Turkish Journal of Electrical Engineering & Computer Sciences, 10.3906/elk-1703-251, 2018, Vol 26 (2), pp. 721-731

Author(s): Mehmet Yağmur Gök, Sezer Gören Uğurdağ, Cem Ünsalan, Mahmut Şamil Sağıroğlu

Keyword(s): Short Read, Short Read Aligner


GPU-accelerated alignment of bisulfite-treated short-read sequences

10.1101/175729, 2017

Author(s): Richard Wilton, Xin Li, Andrew P. Feinberg, Alexander S. Szalay

Keyword(s): DNA Sequences, Graphics Processing Unit, General Purpose, Processing Unit, Short Read, Wide Range, Programming Logic, Short Read Aligner, Graphics Processing, Better Than

The alignment of bisulfite-treated DNA sequences (BS-seq reads) to a large genome involves a significant computational burden beyond that required to align non-bisulfite-treated reads. In the analysis of BS-seq data, this can present an important performance bottleneck that can potentially be addressed by appropriate software-engineering and algorithmic improvements. One strategy is to integrate this additional programming logic into the read-alignment implementation in a way that the software becomes amenable to optimizations that lead to both higher speed and greater sensitivity than can be achieved without this integration. We have evaluated this approach using Arioc, a short-read aligner that uses GPU (general-purpose graphics processing unit) hardware to accelerate computationally expensive programming logic. We integrated the BS-seq computational logic into both GPU and CPU code throughout the Arioc implementation. We then carried out a read-by-read comparison of Arioc's reported alignments with the alignments reported by the most widely used BS-seq read aligners. With simulated reads, Arioc's accuracy is equal to or better than the other read aligners we evaluated. With human sequencing reads, Arioc's throughput is at least 10 times faster than existing BS-seq aligners across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license.
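
The extra logic that BS-seq alignment needs can be illustrated with the in-silico C-to-T conversion used by several BS-seq aligners in general. The abstract does not describe Arioc's internals, so the sketch below is an assumption-laden illustration rather than Arioc's method: it handles only the forward strand, uses exact matching, and all function names are hypothetical.

```python
def c_to_t(seq):
    """Collapse C into T so that bisulfite-converted and unconverted C compare equal."""
    return seq.upper().replace("C", "T")


def find_bisulfite_hits(read, reference):
    """Return every start where the converted read exactly matches the converted reference."""
    conv_read = c_to_t(read)
    conv_ref = c_to_t(reference)
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


def methylation_calls(read, reference, start):
    """For each reference C under the read, report (position, is_methylated).

    A C that survives in the read is called methylated; a T is called unmethylated.
    """
    calls = []
    for offset, base in enumerate(read.upper()):
        if reference[start + offset].upper() == "C":
            calls.append((start + offset, base == "C"))
    return calls


reference = "AACGTTCGA"
read = "ATGTTCG"  # first C converted to T (unmethylated), second C retained (methylated)
hits = find_bisulfite_hits(read, reference)
calls = methylation_calls(read, reference, hits[0])
```

Matching in the reduced three-letter alphabet is what adds work: reads and reference must be converted, and candidate hits must be re-examined against the original bases.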


CHIC: a short read aligner for pan-genomic references

10.1101/178129, 2017, Cited By ~ 5

Author(s): Daniel Valenzuela, Veli Mäkinen

Keyword(s): Computer Science, Open Source, Bioinformatic Analysis, Hybrid Technique, Short Read, Pan Genome, Link Type, Short Read Aligner, Science Community

Recently the topic of computational pan-genomics has gained increasing attention, and particularly the problem of moving from a single-reference paradigm to a pan-genomic one. Perhaps the simplest way to represent a pan-genome is to represent it as a set of sequences. While indexing highly repetitive collections has been intensively studied in the computer science community, the research has focused on efficient indexing and exact pattern matching, making most solutions not yet suitable to be used in bioinformatic analysis pipelines. Results: We present CHIC, a short-read aligner that indexes very large and repetitive references using a hybrid technique that combines Lempel-Ziv compression with Burrows-Wheeler read aligners. Availability: Our tool is open source and available online at https://gitlab.com/dvalenzu/CHIC
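
Why Lempel-Ziv compression suits repetitive references can be seen from a naive greedy LZ77-style factorization. This sketch is quadratic-time and only illustrates the compression idea; it is not CHIC's hybrid index, and the function names and the `$` separator are hypothetical choices.

```python
def lz77_factorize(text):
    """Greedy parse into literals (str) and copy factors (source, length)."""
    factors = []
    i = 0
    while i < len(text):
        best_len, best_src = 0, -1
        for src in range(i):
            length = 0
            # A copy may overlap the position being encoded (self-referential match).
            while i + length < len(text) and text[src + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, src
        if best_len == 0:
            factors.append(text[i])
            i += 1
        else:
            factors.append((best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the text from literals and copy factors."""
    out = []
    for f in factors:
        if isinstance(f, str):
            out.append(f)
        else:
            src, length = f
            for offset in range(length):
                out.append(out[src + offset])
    return "".join(out)


# Two near-identical "genomes" joined by a separator, as a toy pan-genome.
pan_genome = "ACGTACGGTCA" + "$" + "ACGTACGCTCA"
factors = lz77_factorize(pan_genome)
```

The second sequence is encoded almost entirely as copies of the first, so the number of factors grows with the amount of variation rather than with the total length of the collection.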


Simple scalable nucleotic FPGA based short read aligner for exhaustive search of substitution errors

Acta Universitatis Sapientiae Informatica, 10.1515/ausi-2015-0017, 2015, Vol 7 (2), pp. 151-185

Author(s): Péter Fehér, Ágnes Fülöp, Gergely Debreczeni, Máté Nagy-Egri, György Vesztergombi

Keyword(s): Simple Algorithm, Exhaustive Search, Test Results, Gate Arrays, DNA Sequence Alignment, Field Programmable, Short Read Aligner, Programmable Gate Arrays, Efficient Alternative, Predetermined Number

With the advent of new and continuously improving technologies, in a couple of years DNA sequencing can be as commonplace as a simple blood test. The growth of sequencing efficiency has a larger exponent than Moore's law for standard processors, hence alignment and further processing of sequenced data is the bottleneck. The use of FPGA (Field Programmable Gate Array) technology may provide an efficient alternative. We propose a simple algorithm for DNA sequence alignment, which can be realized efficiently by nucleotic principal agents of non-Neumann nature. The prototype FPGA implementation runs on a small Terasic DE1-SoC demo board with a Cyclone V chip. We present test results and furthermore analyse the theoretical scalability of this system, showing that the execution time is independent of the length of reference genome sequences. A special advantage of this parallel algorithm is that it performs an exhaustive search, producing all match variants up to a predetermined number of point (mutation) errors.
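
The search problem described here, reporting every placement of a read with at most a fixed number of substitution errors, can be stated in software as a sequential sketch. This is not the paper's FPGA design: it checks positions one after another, so its running time grows with the reference length, whereas the abstract attributes length-independent time to the parallel hardware. The function name and inputs are hypothetical.

```python
def hamming_matches(read, reference, max_mismatches):
    """Return (start, mismatches) for every placement with <= max_mismatches substitutions."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, reference[start:start + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this placement
        else:
            # Loop finished without exceeding the limit, so this placement is a hit.
            hits.append((start, mismatches))
    return hits


reference = "ACGTACGTTT"
hits = hamming_matches("ACGA", reference, max_mismatches=1)
```

Only substitutions are considered, matching the abstract's "point (mutation) errors"; insertions and deletions would need a different comparison.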
