FaNDOM: Fast Nested Distance-Based Seeding of Optical Maps

Patterns ◽ 2021 ◽ pp. 100248
Author(s):  
Siavash Raeisi Dehkordi ◽  
Jens Luebeck ◽  
Vineet Bafna

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Leena Salmela ◽  
Christina Boucher

Abstract: Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data: there is only one publicly available, non-proprietary assembly method, and one proprietary tool that is available as an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and therefore does not scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph-based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes, whereas the method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome, rmapper was at least 130 times faster than Bionano Solve, used one-fifth of the memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.
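To make the de Bruijn graph idea concrete, here is a minimal Python sketch, not rmapper's implementation: each Rmap is treated as a sequence of fragment lengths, lengths are rounded into bins of a hypothetical size `bin_size` so that small sizing errors collapse to the same symbol, and every k-mer of binned lengths becomes an edge from its (k-1)-prefix to its (k-1)-suffix. It builds a plain de Bruijn graph only; the bi-labels described above, Rmap orientation, and missing or spurious cut sites are deliberately ignored, and the function names are hypothetical.

```python
from collections import defaultdict


def quantize(rmap, bin_size):
    """Map each fragment length (bp) to an integer bin so small sizing errors collapse."""
    return tuple(round(length / bin_size) for length in rmap)


def build_dbg(rmaps, k, bin_size):
    """Plain de Bruijn graph over binned fragment lengths.

    Nodes are (k-1)-mers of bins; each k-mer adds one to the count of the edge
    prefix -> suffix. Returns {node: {successor: multiplicity}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for rmap in rmaps:
        q = quantize(rmap, bin_size)
        for i in range(len(q) - k + 1):
            kmer = q[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


rmaps = [[10100, 5020, 7990, 3010], [5040, 8010, 2980, 6000]]
graph = build_dbg(rmaps, k=3, bin_size=1000)
for node, succs in graph.items():
    for succ, count in succs.items():
        print(node, "->", succ, count)
# (10, 5) -> (5, 8) 1
# (5, 8) -> (8, 3) 2
# (8, 3) -> (3, 6) 1
```

The two toy Rmaps share three consecutive fragments, so the edge (5, 8) -> (8, 3) is seen twice. Fixed-width binning is the simplest choice; because optical sizing error tends to grow with fragment length, a length-dependent tolerance would likely be needed on real data.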



2020 ◽  
Vol 178 (3-4) ◽  
pp. 1125-1172
Author(s):  
Julio Backhoff-Veraguas ◽  
Daniel Bartl ◽  
Mathias Beiglböck ◽  
Manu Eder

Abstract: A number of researchers have introduced topological structures on the set of laws of stochastic processes. A unifying goal of these authors is to strengthen the usual weak topology in order to adequately capture the temporal structure of stochastic processes. Aldous defines an extended weak topology based on the weak convergence of prediction processes. In the economic literature, Hellwig introduced the information topology to study the stability of equilibrium problems. Bion–Nadal and Talay introduce a version of the Wasserstein distance between the laws of diffusion processes. Pflug and Pichler consider the nested distance (and the weak nested topology) to obtain continuity of stochastic multistage programming problems. These distances can be seen as a symmetrization of Lassalle’s causal transport problem, but there are also further natural ways to derive a topology from causal transport. Our main result is that all of these seemingly independent approaches define the same topology in finite discrete time. Moreover, we show that this ‘weak adapted topology’ is characterized as the coarsest topology that guarantees continuity of optimal stopping problems for continuous bounded reward functions.
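Why the weak topology is too coarse can be seen on a small example. The sketch below is an illustration, not code from the paper: for two-period processes with finitely many equally weighted atoms it computes (a) the ordinary 1-Wasserstein distance between path laws with cost |x1 - y1| + |x2 - y2|, and (b) a nested distance in the spirit of Pflug and Pichler, where the time-1 coupling is charged |x1 - y1| plus the 1-Wasserstein distance between the conditional laws of the second step. The brute-force permutation search relies on the Birkhoff-von Neumann theorem and only works for tiny, uniform discrete measures; all function names are hypothetical.

```python
from itertools import permutations


def w1_line(p, q):
    """1-Wasserstein distance between two discrete laws on the real line.

    p, q: lists of (point, weight) with weights summing to 1.
    Uses W1 = integral over the line of |F_p(t) - F_q(t)|.
    """
    pts = sorted({x for x, _ in p} | {y for y, _ in q})
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        f = sum(w for x, w in p if x <= a)  # CDF of p on [a, b)
        g = sum(w for y, w in q if y <= a)  # CDF of q on [a, b)
        total += abs(f - g) * (b - a)
    return total


def uniform_ot(xs, ys, cost):
    """Optimal transport cost between two uniform measures with n atoms each.

    An optimal coupling can be taken to be a permutation (Birkhoff-von Neumann),
    so brute force over permutations is exact but only feasible for tiny n.
    """
    n = len(xs)
    return min(sum(cost(xs[i], ys[s[i]]) for i in range(n))
               for s in permutations(range(n))) / n


def wasserstein_paths(mu, nu):
    """Plain W1 between path laws; atoms are paths (x1, x2), cost |x1-y1| + |x2-y2|."""
    return uniform_ot(mu, nu, lambda x, y: abs(x[0] - y[0]) + abs(x[1] - y[1]))


def nested_distance(mu, nu):
    """Two-period nested distance; atoms are (x1, conditional law of X2 given x1)."""
    return uniform_ot(mu, nu, lambda x, y: abs(x[0] - y[0]) + w1_line(x[1], y[1]))


eps = 0.1
# mu: the sign of X2 is already revealed at time 1 by X1 = +eps or -eps.
mu_paths = [(eps, 1.0), (-eps, -1.0)]
mu_tree = [(eps, [(1.0, 1.0)]), (-eps, [(-1.0, 1.0)])]
# nu: X1 = 0, then X2 = +1 or -1 with probability 1/2 (atom split into two halves).
cond = [(1.0, 0.5), (-1.0, 0.5)]
nu_paths = [(0.0, 1.0), (0.0, -1.0)]
nu_tree = [(0.0, cond), (0.0, cond)]

print(wasserstein_paths(mu_paths, nu_paths))  # 0.1
print(nested_distance(mu_tree, nu_tree))      # 1.1
```

Here the path laws are within eps of each other, yet the nested distance stays near 1 as eps goes to 0, because mu reveals the outcome at time 1 and nu does not. The time-1 atom of nu is split into two identical halves only so that both measures have the same number of equally weighted atoms.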



2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohammadali Faraji ◽  
Andrea Borsato ◽  
Silvia Frisia ◽  
John C. Hellstrom ◽  
Andrew Lorrey ◽  
...  

Abstract: Tropical Pacific stalagmites are commonly affected by dating uncertainties because of their low U concentration and/or elevated initial ²³⁰Th content. This poses problems in establishing reliable trends and periodicities for droughts and pluvial episodes in a region vulnerable to climate change. Here we constrain the chronology of a Cook Islands stalagmite using synchrotron µXRF two-dimensional mapping of Sr concentrations, coupled with optical imaging of growth laminae constrained by in situ monitoring. One-dimensional LA-ICP-MS-generated Mg, Sr, Ba and Na variability series were anchored to the 2D Sr and optical maps. The annual hydrological significance of Mg, Sr, Ba and Na was tested by principal component analysis, which revealed that Mg and Na are related to dry-season, wind-transported marine aerosols, similar to the host-rock-derived Sr and Ba signatures. Trace element annual banding was then used to generate a calendar-year master chronology with a maximum dating uncertainty of ±15 years over 336 years. Our approach demonstrates that accurate chronologies and coupled hydroclimate proxies can be obtained from speleothems formed in tropical settings, where low seasonality and problematic U–Th dating would discourage the use of high-resolution climate proxy datasets.
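As a rough illustration of how principal component analysis groups co-varying trace-element series, the sketch below standardizes each series, forms the correlation matrix, and extracts the loadings of the first component by power iteration. It is a hypothetical sketch: the series are synthetic, the function names are ours, and the abstract does not state how the study preprocessed the data or how many components it retained.

```python
import math


def standardize(series):
    """Center a series and scale it to unit sample standard deviation (needs >= 2 distinct values)."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / (len(series) - 1))
    return [(v - mean) / sd for v in series]


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long series (one list per element)."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def leading_component(matrix, iters=500):
    """Loadings (unit vector) of the first principal component, by power iteration.

    Assumes the largest eigenvalue is strictly dominant and the start vector is not
    orthogonal to its eigenvector.
    """
    v = [1.0] * len(matrix)
    for _ in range(iters):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical synthetic series, not data from the study: Mg and Na co-vary,
# while Sr and Ba are uncorrelated with them and with each other.
mg = [1.0, 2.0, 3.0, 4.0]
sr = [1.0, -1.0, -1.0, 1.0]
ba = [-1.0, 3.0, -3.0, 1.0]
na = [2.0, 4.0, 6.0, 8.0]
loadings = leading_component(correlation_matrix([mg, sr, ba, na]))
print(loadings)  # Mg and Na load near 0.707; Sr and Ba near 0
```

Elements with large loadings of the same sign on a component are read as sharing a common driver; on real speleothem data this is an interpretive step, not something the code decides.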



Genome ◽  
2018 ◽  
Vol 61 (8) ◽  
pp. 559-565 ◽  
Author(s):  
Tingting Zhu ◽  
Zhaorong Hu ◽  
Juan C. Rodriguez ◽  
Karin R. Deal ◽  
Jan Dvorak ◽  
...  

Brachypodium distachyon (n = 5) is diploid and has been widely used as a genetic model. Brachypodium stacei (n = 10) and B. hybridum (n = 15) are related to B. distachyon, leading to the hypothesis that they are part of a polyploid series based on x = 5. Several lines of evidence suggest that this hypothesis is incorrect and that the genomes of the three taxa may have evolved by a more complex process. We constructed an optical whole-genome BioNano genome (BNG) map for each species and performed pairwise alignments of the BNG maps. The maps showed that B. distachyon and B. stacei are both diploid, despite B. stacei having twice as many chromosomes as B. distachyon, and that B. hybridum is an allopolyploid formed by hybridization between B. distachyon and B. stacei. This study also demonstrated the use of BNG maps in detecting and quantifying structural variants among the genomes.
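Once two BNG maps are aligned, one simple way to flag candidate structural variants is to compare the distances between consecutive matched labels in the two maps. The sketch below is hypothetical (the function name, input format and `min_diff` threshold are assumptions) and is not the BioNano pipeline used in the study; relative to map A, it reports intervals that are longer in map B as putative insertions and shorter ones as putative deletions.

```python
def call_indels(aligned_labels, min_diff):
    """Flag putative indels between two aligned optical maps.

    aligned_labels: ordered (position_in_A, position_in_B) pairs, in bp, for labels
    matched by a prior alignment. For each interval between consecutive matched
    labels, the size difference (B minus A) is reported if |difference| >= min_diff.
    """
    calls = []
    for (a0, b0), (a1, b1) in zip(aligned_labels, aligned_labels[1:]):
        diff = (b1 - b0) - (a1 - a0)
        if abs(diff) >= min_diff:
            kind = "insertion" if diff > 0 else "deletion"
            calls.append((a0, a1, kind, diff))
    return calls


pairs = [(0, 0), (10000, 10050), (25000, 30050), (40000, 42000)]
print(call_indels(pairs, min_diff=1000))
# [(10000, 25000, 'insertion', 5000), (25000, 40000, 'deletion', -3050)]
```

The 50 bp difference in the first interval stays below the threshold and is treated as sizing noise; choosing that threshold is the main judgment call in such a caller.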



2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Martin D. Muggli ◽  
Simon J. Puglisi ◽  
Christina Boucher

Abstract: Background: Genome-wide optical maps are ordered high-resolution restriction maps that give the positions of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled from raw optical map data, referred to as Rmaps, using an overlap-layout-consensus approach. Due to the high error rate of Rmap data, finding the overlaps between Rmaps remains challenging. Results: We present Kohdista, an index-based algorithm for finding pairwise alignments between single-molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree to build Kohdista. We validate Kohdista on simulated E. coli data, showing that the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. Conclusion: We demonstrate that Kohdista is the only method capable of finding a significant number of high-quality pairwise Rmap alignments for large eukaryotic organisms in reasonable time.
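The idea of matching a path through a tolerant automaton can be illustrated with a deliberately naive matcher. This is a hypothetical sketch that uses no GCSA or wavelet tree: a query Rmap (a list of fragment lengths) matches a target at some position if each query fragment equals the sum of 1 to `max_merge` consecutive target fragments within a relative tolerance `tol`, which models cut sites missing from the query. The tolerance model, parameter names and brute-force scan are assumptions for illustration; real aligners typically also handle extra cut sites and other errors, and Kohdista answers such queries through its index rather than by scanning.

```python
def matches_at(target, start, query, tol, max_merge=2):
    """Return the end index in `target` if `query` matches starting at `start`, else None.

    Each query fragment must equal the sum of 1..max_merge consecutive target
    fragments within relative tolerance `tol`; summing models cut sites that are
    missing from the query. Depth-first search over the possible merges.
    """
    if not query:
        return start
    for m in range(1, max_merge + 1):
        if start + m > len(target):
            break
        merged = sum(target[start:start + m])
        if abs(merged - query[0]) <= tol * query[0]:
            end = matches_at(target, start + m, query[1:], tol, max_merge)
            if end is not None:
                return end
    return None


def find_hits(target, query, tol=0.1, max_merge=2):
    """Scan every start position; return (start, end) target index ranges that match."""
    hits = []
    for i in range(len(target)):
        end = matches_at(target, i, query, tol, max_merge)
        if end is not None:
            hits.append((i, end))
    return hits


target = [3000, 5000, 2000, 4100, 6000, 2500]
query = [5100, 6000, 6100]  # 6000 spans target fragments 2000 + 4100
print(find_hits(target, query))  # [(1, 5)]
```

Each start position explores at most `max_merge` branches per query fragment, so this scan is exponential in the worst case; the point of an index-based design is to avoid exactly this kind of exhaustive search.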



2019 ◽  
pp. g3.200902.2018 ◽  
Author(s):  
Tingting Zhu ◽  
Le Wang ◽  
Juan C. Rodriguez ◽  
Karin R. Deal ◽  
Raz Avni ◽  
...  


Microbiology ◽  
2007 ◽  
Vol 153 (6) ◽  
pp. 1720-1733 ◽  
Author(s):  
Michael L. Kotewicz ◽  
Scott A. Jackson ◽  
J. Eugene LeClerc ◽  
Thomas A. Cebula


2018 ◽  
Author(s):  
Stéphane Deschamps ◽  
Yun Zhang ◽  
Victor Llaca ◽  
Liang Ye ◽  
Gregory May ◽  
...  

The advent of long-read sequencing technologies has greatly facilitated the assembly of large eukaryotic genomes. In this paper, Oxford Nanopore sequences generated on a MinION sequencer were combined with BioNano Genomics Direct Label and Stain (DLS) optical maps to generate a chromosome-scale de novo assembly of the repeat-rich Sorghum bicolor Tx430 genome. The final hybrid assembly consists of 29 scaffolds, in most cases encompassing entire chromosome arms. It has a scaffold N50 of 33.28 Mbp and covers >90% of the expected Sorghum bicolor genome length. A sequence accuracy of 99.67% was obtained in unique regions after aligning contigs against Illumina Tx430 data. Alignments showed that 99.4% of the 34,211 public gene models are present in the assembly, including 94.2% mapping end-to-end. Comparisons of the DLS optical maps against the public Sorghum bicolor v3.0.1 BTx623 genome assembly suggest the presence of substantial genomic rearrangements whose origin remains to be determined.
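A scaffold N50 such as the one quoted above is usually defined as the length L such that scaffolds of length L or longer together contain at least half of the assembled sequence. A minimal sketch of that standard definition follows; the function name is ours, and the abstract does not say whether the authors normalized by assembly length or by expected genome size (the latter variant is usually called NG50).

```python
def n50(lengths):
    """Return N50: the largest length L such that sequences of length >= L
    contain at least half of the total sequence length."""
    if not lengths:
        raise ValueError("n50 of an empty collection is undefined")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


print(n50([4, 10, 3, 8, 5]))  # 8: 10 + 8 = 18 of 30 total
```

Comparing `2 * running` with `total` keeps the check in integers, avoiding a floating-point division when lengths are whole base-pair counts.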


