Efficient Indexed Alignment of Contigs to Optical Maps

Author(s):  
Martin D. Muggli ◽  
Simon J. Puglisi ◽  
Christina Boucher
2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Leena Salmela ◽  
Christina Boucher

Abstract: Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: only one publicly available non-proprietary method and one proprietary program that is distributed as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and therefore does not scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. ran successfully only on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.
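
For readers unfamiliar with the de Bruijn graph mentioned in the abstract above, the following is a minimal sketch of the generic construction only: nodes are (k-1)-mers and each k-mer contributes one edge. It is not rmapper's bi-label construction, which the abstract does not describe in detail. The function name `de_bruijn_graph` and the toy input are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a plain de Bruijn graph (hypothetical helper, not rmapper's code).

    Each read is any sequence (a string, or a list of quantized fragment sizes).
    Every window of k symbols adds an edge from its first k-1 symbols
    to its last k-1 symbols.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = tuple(read[i:i + k])
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads share the 3-mer CGT.
    for node, successors in de_bruijn_graph(["ACGT", "CGTA"], 3).items():
        print(node, "->", sorted(successors))
```

Because the helper works on tuples of arbitrary symbols, the same sketch applies to sequences of numbers, but handling the sizing errors of real Rmap data requires more than exact k-mer matching.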


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohammadali Faraji ◽  
Andrea Borsato ◽  
Silvia Frisia ◽  
John C. Hellstrom ◽  
Andrew Lorrey ◽  
...  

Abstract: Tropical Pacific stalagmites are commonly affected by dating uncertainties because of their low U concentration and/or elevated initial 230Th content. This poses problems in establishing reliable trends and periodicities for droughts and pluvial episodes in a region vulnerable to climate change. Here we constrain the chronology of a Cook Islands stalagmite using synchrotron µXRF two-dimensional mapping of Sr concentrations coupled with growth laminae optical imaging constrained by in situ monitoring. Unidimensional LA-ICP-MS-generated Mg, Sr, Ba and Na variability series were anchored to the 2D Sr and optical maps. The annual hydrological significance of Mg, Sr, Ba and Na was tested by principal component analysis, which revealed that Mg and Na are related to dry-season, wind-transported marine aerosols, similar to the host-rock derived Sr and Ba signatures. Trace element annual banding was then used to generate a calendar-year master chronology with a maximum dating uncertainty of ± 15 years over 336 years. Our approach demonstrates that accurate chronologies and coupled hydroclimate proxies can be obtained from speleothems formed in tropical settings where low seasonality and problematic U–Th dating would discourage the use of high-resolution climate proxy datasets.


Genome ◽  
2018 ◽  
Vol 61 (8) ◽  
pp. 559-565 ◽  
Author(s):  
Tingting Zhu ◽  
Zhaorong Hu ◽  
Juan C. Rodriguez ◽  
Karin R. Deal ◽  
Jan Dvorak ◽  
...  

Brachypodium distachyon (n = 5) is a diploid and has been widely used as a genetic model. Brachypodium stacei (n = 10) and B. hybridum (n = 15) are species related to B. distachyon, leading to a hypothesis that they are part of a polyploid series based on x = 5. Several lines of evidence suggest that this hypothesis is incorrect and that the genomes of the three taxa may have evolved by a more complex process. We constructed an optical whole-genome BioNano genome (BNG) map for each species and performed pairwise alignment of the BNG maps. The maps showed that B. distachyon and B. stacei are both diploid, in spite of B. stacei having twice as many chromosomes as B. distachyon, and that B. hybridum is an allopolyploid formed from hybridization between B. distachyon and B. stacei. This study also demonstrated the use of BNG maps in the detection and quantification of structural variants among the genomes.


2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Martin D. Muggli ◽  
Simon J. Puglisi ◽  
Christina Boucher

Abstract

Background: Genome-wide optical maps are ordered high-resolution restriction maps that give the positions of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled with an overlap-layout-consensus approach from raw optical map data, which are referred to as Rmaps. Due to the high error rate of Rmap data, finding the overlap between Rmaps remains challenging.

Results: We present Kohdista, an index-based algorithm for finding pairwise alignments between single-molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree to build Kohdista. We validate Kohdista on simulated E. coli data, showing that the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions.

Conclusion: We demonstrate that Kohdista is the only method capable of finding a significant number of high-quality pairwise Rmap alignments for large eukaryotic organisms in reasonable time.
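
To make the idea of a restriction map as an ordered list of cut-site positions concrete, here is a minimal sketch of an in silico digest. It is an illustration only and is unrelated to Kohdista's index or alignment algorithm. The helper names `cut_sites` and `fragment_sizes`, the toy sequence, and the simplification of placing each cut at the start of the recognition motif (real enzymes cut at a fixed offset within it) are all assumptions.

```python
def cut_sites(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start position of every occurrence of motif.

    Simplification (assumption): the cut is placed at the motif start.
    """
    sites = []
    pos = sequence.find(motif)
    while pos != -1:
        sites.append(pos)
        pos = sequence.find(motif, pos + 1)  # step by 1 so overlapping hits are kept
    return sites


def fragment_sizes(sites: list[int], length: int) -> list[int]:
    """Turn ordered cut positions into the ordered fragment lengths between them."""
    bounds = [0] + sites + [length]
    return [right - left for left, right in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    toy = "AAGAATTCAAAAGAATTCTT"        # toy sequence, not real data
    sites = cut_sites(toy, "GAATTC")     # EcoRI recognition motif
    print(sites, fragment_sizes(sites, len(toy)))
```

The fragment-size list is the numeric form in which Rmaps are usually compared; real Rmaps additionally contain missing and spurious cut sites and sizing errors, which this sketch ignores.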


2019 ◽  
pp. g3.200902.2018 ◽  
Author(s):  
Tingting Zhu ◽  
Le Wang ◽  
Juan C. Rodriguez ◽  
Karin R. Deal ◽  
Raz Avni ◽  
...  

Microbiology ◽  
2007 ◽  
Vol 153 (6) ◽  
pp. 1720-1733 ◽  
Author(s):  
Michael L. Kotewicz ◽  
Scott A. Jackson ◽  
J. Eugene LeClerc ◽  
Thomas A. Cebula

2018 ◽  
Author(s):  
Stéphane Deschamps ◽  
Yun Zhang ◽  
Victor Llaca ◽  
Liang Ye ◽  
Gregory May ◽  
...  

The advent of long-read sequencing technologies has greatly facilitated assemblies of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer were combined with BioNano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final hybrid assembly consists of 29 scaffolds, encompassing in most cases entire chromosome arms. It has a scaffold N50 value of 33.28 Mbp and covers >90% of the expected Sorghum bicolor genome length. A sequence accuracy of 99.67% was obtained in unique regions after aligning contigs against Illumina Tx430 data. Alignments showed that 99.4% of the 34,211 public gene models are present in the assembly, including 94.2% mapping end-to-end. Comparisons of the DLS optical maps against the public Sorghum bicolor v3.0.1 BTx623 genome assembly suggest the presence of substantial genomic rearrangements whose origin remains to be determined.
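
The scaffold N50 quoted above is a standard contiguity statistic: the length L such that scaffolds of length at least L together cover at least half of the total assembly length. A minimal sketch of that definition follows; the function name `n50` and the toy lengths are assumptions for illustration and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Walk the lengths from longest to shortest and return the first length
    at which the running total reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("n50 is undefined for an empty list")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # toy lengths, not real scaffold sizes
```

For the toy input the total is 20; the two longest scaffolds (6 and 5) sum to 11, which first reaches half the total, so the N50 is 5.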


2019 ◽  
Author(s):  
Prashant S. Hosmani ◽  
Mirella Flores-Gonzalez ◽  
Henri van de Geest ◽  
Florian Maumus ◽  
Linda V. Bakker ◽  
...  

Abstract: The original Heinz 1706 reference genome was produced by a large team of scientists from across the globe from a variety of input sources that included 454 sequences in addition to full-length BACs, and BAC and fosmid ends sequenced with Sanger technology. We present here the latest tomato reference genome (SL4.0), assembled de novo from PacBio long reads and scaffolded using Hi-C contact maps. The assembly was validated using Bionano optical maps and 10X linked-read sequences. This assembly is highly contiguous, with fewer gaps than previous genome builds, and almost all scaffolds have been anchored and oriented to the 12 tomato chromosomes. We have found more repeats than in previous versions, and one of the largest repeat classes identified is the LTR retrotransposons. We also describe updates to the reference genome and annotation since the last publication. The corresponding ITAG4.0 annotation has 4,794 novel genes along with 29,281 genes preserved from ITAG2.4. Most of the updated genes have extensions in the 5’ and 3’ UTRs, resulting in a doubling of annotated UTRs per gene. The genome and annotation can be accessed through SGN via the BLAST database, Pathway database (SolCyc), Apollo, JBrowse genome browser and FTP, available at https://solgenomics.net.


2019 ◽  
Author(s):  
Weihua Pan ◽  
Tao Jiang ◽  
Stefano Lonardi

Abstract: Due to the current limitations of sequencing technologies, de novo genome assembly is typically carried out in two stages, namely contig (sequence) assembly and scaffolding. While scaffolding is computationally easier than sequence assembly, the scaffolding problem can be challenging due to the high repetitive content of eukaryotic genomes, possible mis-joins in assembled contigs and inaccuracies in the linkage information. Genome scaffolding tools use either paired-end/mate-pair/linked/Hi-C reads or genome-wide maps (optical, physical or genetic) as linkage information. Optical maps (in particular Bionano Genomics maps) have been extensively used in many recent large-scale genome assembly projects (e.g., goat, apple, barley, maize, quinoa, sea bass, among others). However, the most commonly used scaffolding tools have a serious limitation: they can only deal with one optical map at a time, forcing users to alternate or iterate over multiple maps. In this paper, we introduce a novel scaffolding algorithm called OMGS that for the first time can take advantage of multiple optical maps. OMGS solves several optimization problems to generate scaffolds with optimal contiguity and correctness. Extensive experimental results demonstrate that our tool outperforms existing methods when multiple optical maps are available, and produces comparable scaffolds using a single optical map. OMGS can be obtained from https://github.com/ucrbioinfo/OMGS

