optical maps
Recently Published Documents


TOTAL DOCUMENTS

56
(FIVE YEARS 22)

H-INDEX

13
(FIVE YEARS 3)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bin Huang ◽  
Guozheng Wei ◽  
Bing Wang ◽  
Fusong Ju ◽  
Yi Zhong ◽  
...  

Abstract. Background: Optical maps record the locations of specific enzyme recognition sites within long genome fragments. This long-distance information makes it possible to align genome assembly contigs onto optical maps and order the contigs into scaffolds. The resulting scaffolds, however, often contain a large number of gaps. One feasible way to fill these gaps is to search the genome assembly graph for the best-matching contig paths that connect the boundary contigs of each gap. The searching and evaluation procedures can be combined as “searching followed by evaluation”, which is infeasible for long gaps, or as “searching by evaluation”, which relies heavily on heuristics and thus usually yields unreliable contig paths. Results: We report an accurate and efficient approach to filling gaps in genome scaffolds with the aid of optical maps. Using simulated data from 12 species and real data from 3 species, we demonstrate the successful application of our approach to gap filling, with improved accuracy and completeness of the genome scaffolds. Conclusion: Our approach applies a sequential Bayesian updating technique to measure the similarity between optical maps and candidate contig paths. By using this similarity to guide path searching, our approach achieves higher accuracy than the existing “searching by evaluation” strategy, which relies on heuristics. Furthermore, unlike the “searching followed by evaluation” strategy, which enumerates all possible paths, our approach prunes unlikely sub-paths and extends only the highly probable ones, thus significantly increasing search efficiency.
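The idea of scoring partial contig paths incrementally and pruning the unlikely ones can be illustrated with a small beam search. This is only a sketch under stated assumptions, not the authors' algorithm: the Gaussian sizing-error model, the `sigma` and `beam` parameters, the function names, and the simplification that every contig contributes whole fragments are all hypothetical choices made for illustration.

```python
import math

def log_gauss(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x (assumed sizing-error model)."""
    return -0.5 * math.log(2 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2 * sigma * sigma)

def fill_gap(graph, frags, start, end, observed, sigma=500.0, beam=2, max_steps=50):
    """Beam search for a contig path from `start` to `end`.

    graph:    dict mapping a contig to its successor contigs (assembly graph)
    frags:    dict mapping a contig to its predicted fragment lengths in bp
    observed: fragment lengths in bp read from the optical map across the gap
    """
    # Each state is (log score, path so far, number of observed fragments explained).
    states = [(0.0, [start], 0)]
    complete = []
    for _ in range(max_steps):  # hard bound so cycles cannot loop forever
        if not states:
            break
        candidates = []
        for score, path, used in states:
            for nxt in graph.get(path[-1], []):
                if nxt == end:
                    # Accept only paths that explain every observed fragment.
                    if used == len(observed):
                        complete.append((score, path + [nxt]))
                    continue
                new = frags.get(nxt, [])
                if used + len(new) > len(observed):
                    continue  # path predicts more fragments than were observed
                # Sequential update: add the log-likelihood of the new fragments.
                gain = sum(log_gauss(o, p, sigma) for o, p in zip(observed[used:], new))
                candidates.append((score + gain, path + [nxt], used + len(new)))
        # Pruning: keep only the `beam` most probable partial paths.
        candidates.sort(key=lambda s: s[0], reverse=True)
        states = candidates[:beam]
    return max(complete, key=lambda c: c[0])[1] if complete else None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
frags = {"B": [1000, 2000], "C": [5000, 2000], "D": [3000]}
best = fill_gap(graph, frags, "A", "E", observed=[1020, 1980, 3050])
print(best)
```

In this toy graph the path through contig B predicts fragments close to the observed ones, so it outscores the path through contig C at the first extension step.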


2021 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Daniel Dole-Muinos ◽  
Ayomide Ajayi ◽  
Mattia Prosperi ◽  
...  

Optical mapping is a method for creating high-resolution restriction maps of an entire genome. Optical mapping has been largely automated, and first produces single-molecule restriction maps, called Rmaps, which are assembled to generate genome-wide optical maps. Since the location and orientation of each Rmap are unknown, the first problem in the analysis of these data is finding related Rmaps, i.e., pairs of Rmaps that share the same orientation and have significant overlap in their genomic location. Although heuristics for identifying related Rmaps exist, they all require quantization of the data, which leads to a loss of precision. In this paper, we propose a clustering method based on Gaussian mixture modelling, which we refer to as OMclust, that finds overlapping Rmaps without quantization. Using both simulated and real datasets, we show that OMclust substantially improves the precision (from 48.3% to 73.3%) over state-of-the-art methods while also reducing CPU time and memory consumption. Further, we integrated OMclust into the error correction methods Elmeri and cOMet to demonstrate the increase in the performance of these methods. When OMclust was combined with cOMet to error correct Rmap data generated from human DNA, it was able to error correct close to 3x more Rmaps and reduced the CPU time by more than 35x. Our software is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/OMclust
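Why quantization costs precision can be seen in a toy comparison. The sketch below is not OMclust and does not use Gaussian mixture modelling; it only contrasts fixed-width binning with a direct comparison of fragment lengths. The bin size, the 5% relative tolerance, the function names, and the example Rmaps are assumptions made up for illustration.

```python
def quantize(rmap, bin_size=1000):
    """Map each fragment length (bp) to the index of a fixed-width bin."""
    return [int(f // bin_size) for f in rmap]

def similar(a, b, rel_tol=0.05):
    """Compare fragment lengths directly, allowing a relative sizing error."""
    return len(a) == len(b) and all(
        abs(x - y) <= rel_tol * max(x, y) for x, y in zip(a, b)
    )

# Two hypothetical Rmaps of the same region with small sizing errors.
rmap_a = [3990, 12500, 7040]
rmap_b = [4010, 12450, 6980]

print(quantize(rmap_a), quantize(rmap_b))  # bin indices differ at bin edges
print(similar(rmap_a, rmap_b))             # direct comparison still matches
```

Fragments that fall just on either side of a bin edge (3990 bp and 4010 bp here) receive different quantized values even though they differ by only 20 bp, which is the kind of loss a quantization-free method avoids.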


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Leena Salmela ◽  
Christina Boucher

Abstract. Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary software package that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) ran successfully only on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.
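The "numeric representation" mentioned above is simply the ordered list of distances between recognition sites. A minimal in silico sketch, assuming an error-free digest; the function name, the toy sequence, and the choice of `CTTAAG` as the recognition site are illustrative assumptions, and real Rmaps additionally contain sizing errors and missing or spurious sites.

```python
def in_silico_digest(seq, site):
    """Return the fragment lengths between consecutive occurrences of `site`.

    Fragments are measured from the start of one site to the start of the
    next; the two end fragments of the sequence are included.
    """
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    cuts = [0] + positions + [len(seq)]
    return [b - a for a, b in zip(cuts, cuts[1:])]

toy_genome = "AA" + "CTTAAG" + "GGGG" + "CTTAAG" + "T"
fragments = in_silico_digest(toy_genome, "CTTAAG")
print(fragments)
```

Reading the same molecule from the opposite end would give the reversed list, which is why the orientation of each Rmap has to be resolved during assembly.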


Patterns ◽  
2021 ◽  
pp. 100248
Author(s):  
Siavash Raeisi Dehkordi ◽  
Jens Luebeck ◽  
Vineet Bafna

2021 ◽  
Author(s):  
Tingting Zhu ◽  
Le Wang ◽  
Hélène Rimbert ◽  
Juan C. Rodriguez ◽  
Karin R. Deal ◽  
...  

2021 ◽  
Author(s):  
Aurélie Canaguier ◽  
Romane Guilbaud ◽  
Erwan Denis ◽  
Ghislaine Magdelenat ◽  
Caroline Belser ◽  
...  

Abstract. Background: Structural variations (SVs) are very diverse genomic rearrangements. In the past, their detection was restricted to cytological approaches, and later limited by NGS read size and partitioned assemblies. Thanks to the current capabilities of technologies such as long-read sequencing and optical mapping, the detection of larger SVs is becoming more and more accessible. This study compares SV detection and characterization from long-read sequencing obtained with the MinION device developed by Oxford Nanopore Technologies (ONT) and from optical mapping produced by the Saphyr device commercialized by Bionano Genomics. The genomes of the two Arabidopsis thaliana ecotypes Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1) were chosen to guide the use of one or the other technology. Results: We described the SVs detected from the alignment of the best ONT assembly and DLE-1 optical maps of A. thaliana Ler-1 on the public reference Col-0 TAIR10.1. After filtering, 1,184 and 591 Ler-1 SVs were retained from the ONT and Bionano technologies, respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%), leading to 563 common locations in both technologies. The technology-specific locations were scrutinized to assess the improvement in SV detection by either technology. The ONT SVs were mostly detected near TE and gene features, and resistance genes seemed particularly impacted. Conclusions: Structural variations linked to ONT sequencing errors were removed and false positives limited, with high-quality Bionano SVs being conserved. When compared with the Col-0 TAIR10.1 reference, most of the detected SVs were found in the same locations. The ONT assembly sequence yields more specific SVs than the Bionano optical maps do, the latter being more efficient at characterizing large SVs. Although the two technologies are clearly complementary, ONT data appear to be better suited to large-scale population studies, while Bionano performs better at improving assemblies and describing the specificity of a genome compared to a reference.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohammadali Faraji ◽  
Andrea Borsato ◽  
Silvia Frisia ◽  
John C. Hellstrom ◽  
Andrew Lorrey ◽  
...  

Abstract. Tropical Pacific stalagmites are commonly affected by dating uncertainties because of their low U concentration and/or elevated initial 230Th content. This poses problems in establishing reliable trends and periodicities for droughts and pluvial episodes in a region vulnerable to climate change. Here we constrain the chronology of a Cook Islands stalagmite using synchrotron µXRF two-dimensional mapping of Sr concentrations coupled with optical imaging of growth laminae, constrained by in situ monitoring. Unidimensional LA-ICP-MS-generated Mg, Sr, Ba and Na variability series were anchored to the 2D Sr and optical maps. The annual hydrological significance of Mg, Sr, Ba and Na was tested by principal component analysis, which revealed that Mg and Na are related to dry-season, wind-transported marine aerosols, similar to the host-rock derived Sr and Ba signatures. Trace element annual banding was then used to generate a calendar-year master chronology with a maximum dating uncertainty of ± 15 years over 336 years. Our approach demonstrates that accurate chronologies and coupled hydroclimate proxies can be obtained from speleothems formed in tropical settings where low seasonality and problematic U–Th dating would discourage the use of high-resolution climate proxy datasets.


2021 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Leena Salmela ◽  
Christina Boucher

Abstract. Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary method that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics' Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as Rmapper, and compare its performance against the assembler of Valouev et al. (2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method ran successfully on all three genomes. The method of Valouev et al. (2006) ran successfully only on E. coli. Moreover, on the human genome Rmapper was at least 130 times faster than Bionano Solve, used five times less memory, and produced the highest genome fraction with zero mis-assemblies. Our software, Rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.


2021 ◽  
Author(s):  
Siavash Raeisi Dehkordi ◽  
Jens Luebeck ◽  
Vineet Bafna