A dynamic hashing approach to build the de Bruijn graph for genome assembly

Abstract. Motivation. Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in a post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single-molecule optical maps (called Rmaps) or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the use of optical maps in the sequence assembly step itself. Results. We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. Availability and implementation. The software for aligning optical maps to de Bruijn graphs, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph. Supplementary information. Supplementary data are available at Bioinformatics online.
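To make the "numeric representation" in the abstract above concrete, the following is a minimal sketch of how a sequence could be turned into an ordered list of fragment lengths, which is the kind of data an Rmap records. It is not taken from omGraph. The function name `in_silico_digest`, the example sequence, and the simplification that the cut falls at the start of each recognition site are all assumptions made for illustration.

```python
def in_silico_digest(sequence, site):
    """Return the ordered fragment lengths obtained by cutting `sequence`
    at every occurrence of the recognition `site`.

    Assumption for illustration: the cut is placed at the first base of
    the site. Real enzymes cut at an enzyme-specific offset.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        # Continue searching one base further so overlapping sites are found.
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (e.g. a site at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence with two GAATTC sites.
fragments = in_silico_digest("AAGAATTCTTTTGAATTCCC", "GAATTC")
print(fragments)
```

Measured Rmaps carry sizing errors and missing or spurious cut sites, so comparing them with sequence-derived fragment lists calls for an error-tolerant alignment rather than an exact match.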

Assembly of Long Error-Prone Reads Using de Bruijn Graphs

10.1101/048413 ◽

2016 ◽

Cited By ~ 6

Author(s):

Yu Lin ◽

Jeffrey Yuan ◽

Mikhail Kolmogorov ◽

Max W. Shen ◽

Pavel A. Pevzner

Keyword(s):

Real Time ◽

Single Molecule ◽

Genome Assembly ◽

State Of The Art ◽

De Bruijn Graph ◽

Consensus Approach ◽

De Bruijn Graphs ◽

De Bruijn

Abstract. The recent breakthroughs in assembling long error-prone reads (such as reads generated by Single Molecule Real Time technology) were based on the overlap-layout-consensus approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the overlap-layout-consensus approach is the only practical paradigm for assembling long error-prone reads. Below we show how to generalize de Bruijn graphs to assemble long error-prone reads and describe the ABruijn assembler, which results in more accurate genome reconstructions than the existing state-of-the-art algorithms.

De Bruijn Graph based De novo Genome Assembly

Journal of Software ◽

10.4304/jsw.9.8.2160-2168 ◽

2014 ◽

Vol 9 (8) ◽

Author(s):

Mohammad Ibrahim Khan ◽

Md Sarwar Kamal

Keyword(s):

Genome Assembly ◽

De Novo ◽

De Bruijn Graph ◽

De Novo Genome Assembly ◽

De Bruijn

HaVec: An Efficient de Bruijn Graph Construction Algorithm for Genome Assembly

International Journal of Genomics ◽

10.1155/2017/6120980 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Mahfuzer Rahman Limon ◽

Ratul Sharker ◽

Sajib Biswas ◽

M. Sohel Rahman

Keyword(s):

Data Structure ◽

Genome Assembly ◽

Hash Table ◽

De Bruijn Graph ◽

False Positive Error ◽

Running Time ◽

Sequencing Technologies ◽

Noteworthy Feature ◽

De Bruijn ◽

Main Barrier

Background. The rapid advancement of sequencing technologies has made it possible for sequencing laboratories to routinely produce millions of high-quality reads from DNA samples. The de Bruijn graph is a popular data structure in the genome assembly literature for representing and processing these data efficiently. Because a de Bruijn graph can contain a very large number of nodes, memory consumption and running time are the main barriers, and this area has therefore received significant attention in contemporary literature. Results. In this paper, we present an approach called HaVec that attempts to achieve a balance between memory consumption and running time. HaVec uses a hash table along with an auxiliary vector data structure to store the de Bruijn graph, thereby improving the total memory usage and the running time. A critical and noteworthy feature of HaVec is that it exhibits no false positive error. Conclusions. In general, the graph construction procedure takes the major share of the time involved in an assembly process. HaVec can be seen as a significant advancement in this respect. We anticipate that HaVec will be extremely useful in de Bruijn graph-based genome assembly.
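For readers unfamiliar with hash-table-based construction, the sketch below shows the general idea of building a de Bruijn graph from reads with an ordinary hash table. It is a plain illustration only and does not reproduce HaVec's hash table plus auxiliary vector layout, which the abstract does not describe in detail. The function name `build_de_bruijn` and the toy reads are assumptions.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph keyed by (k-1)-mer.

    Each k-mer in a read contributes one edge from its (k-1)-length
    prefix to its (k-1)-length suffix. A Python dict (a hash table)
    maps every node to the set of its successor nodes.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Hypothetical toy reads; real inputs contain millions of reads.
reads = ["ACGTAC", "CGTACG"]
graph = build_de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

An exact hash table of this kind never reports a k-mer that was not inserted, at the cost of storing every key. Probabilistic structures such as Bloom filters reduce memory but can admit false positives, which is the trade-off the abstract alludes to.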

GPU-Accelerated Bidirected De Bruijn Graph Construction for Genome Assembly

Web Technologies and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-642-37401-2_8 ◽

2013 ◽

pp. 51-62 ◽

Cited By ~ 4

Author(s):

Mian Lu ◽

Qiong Luo ◽

Bingqiang Wang ◽

Junkai Wu ◽

Jiuxin Zhao

Keyword(s):

Genome Assembly ◽

De Bruijn Graph ◽

De Bruijn

Assembly of long error-prone reads using de Bruijn graphs

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1604560113 ◽

2016 ◽

Vol 113 (52) ◽

pp. E8396-E8405 ◽

Cited By ~ 85

Author(s):

Yu Lin ◽

Jeffrey Yuan ◽

Mikhail Kolmogorov ◽

Max W. Shen ◽

Mark Chaisson ◽

...

Keyword(s):

Genome Assembly ◽

De Bruijn Graph ◽

De Bruijn Graphs ◽

De Bruijn

The recent breakthroughs in assembling long error-prone reads were based on the overlap-layout-consensus (OLC) approach and did not utilize the strengths of the alternative de Bruijn graph approach to genome assembly. Moreover, these studies often assume that applications of the de Bruijn graph approach are limited to short and accurate reads and that the OLC approach is the only practical paradigm for assembling long error-prone reads. We show how to generalize de Bruijn graphs for assembling long error-prone reads and describe the ABruijn assembler, which combines the de Bruijn graph and the OLC approaches and results in accurate genome reconstructions.

