Large-scale structural variation detection in subterranean clover subtypes using optical mapping validated at nucleotide level

2017 ◽  
Author(s):  
Yuxuan Yuan ◽  
Zbyněk Milec ◽  
Philipp E. Bayer ◽  
Jan Vrána ◽  
Jaroslav Doležel ◽  
...  

Abstract: Whole genome sequencing has been widely used to detect structural variations (SVs). However, the limited single molecule size makes it difficult to characterize large-scale SVs in a genome because individual molecules cannot fully cover such vast and complex regions. Recently, optical mapping in nanochannels has provided novel resolution to detect large-scale SVs by comparing the physical location of the nickase recognition sequence in genomes. Unlike in humans, SVs discovered in plants by optical mapping have not been validated. To assess the accuracy of SV calling in plants by optical mapping, we selected two genetically diverse subspecies of the Trifolium model species, subterranean clover cvs. Daliak and Yarloop. The SVs discovered by BioNano optical mapping (BOM) were validated using Illumina short reads. In the analysis, BOM identified 12 large-scale regions containing deletions and 19 containing insertions in Yarloop. The 12 large-scale regions contained 71 small deletions when validated by Illumina short reads. The results suggest that BOM could detect the total size of deletions and insertions, but it could not precisely report the location and actual quantity of SVs in the genome. Nucleotide-level validation is crucial to confirm and characterize SVs reported by optical mapping. The accuracy of SV detection by BOM is highly dependent on the quality of reference genomes and the density of selected nickases.
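The idea of comparing label positions between genomes can be illustrated with a minimal Python sketch. It assumes that nickase label positions have already been matched one-to-one between the reference and the sample, which a real aligner has to infer. The function name, the 500 bp threshold and the toy coordinates are hypothetical and are not taken from the BioNano software or from the study.

```python
from typing import List, Tuple

def interval_size_changes(
    ref_labels: List[int],
    sample_labels: List[int],
    min_diff: int = 500,  # hypothetical noise threshold in bp
) -> List[Tuple[int, int, str, int]]:
    """Flag intervals between consecutive matched labels whose length differs.

    Returns tuples of (ref_start, ref_end, kind, size_difference).
    """
    if len(ref_labels) != len(sample_labels):
        raise ValueError("labels must be matched one-to-one")
    calls = []
    for i in range(len(ref_labels) - 1):
        ref_len = ref_labels[i + 1] - ref_labels[i]
        sample_len = sample_labels[i + 1] - sample_labels[i]
        diff = sample_len - ref_len
        if abs(diff) >= min_diff:
            kind = "insertion" if diff > 0 else "deletion"
            calls.append((ref_labels[i], ref_labels[i + 1], kind, abs(diff)))
    return calls

# Toy coordinates: the second interval is 3 kb shorter in the sample.
ref = [0, 10_000, 25_000, 40_000]
sample = [0, 10_000, 22_000, 37_000]
calls = interval_size_changes(ref, sample)
print(calls)
```

In this sketch the 3 kb loss is reported only as a net size change between two labels. Whether it is one deletion or several smaller ones cannot be told from label spacing alone, which is in line with the abstract's point that nucleotide-level validation is needed to establish the location and number of SVs.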

2017 ◽  
Author(s):  
Patrick Marks ◽  
Sarah Garcia ◽  
Alvaro Martinez Barrio ◽  
Kamila Belhocine ◽  
Jorge Bernate ◽  
...  

Abstract: Large-scale population based analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short read whole genome sequencing. However, standard short-read approaches, used primarily due to accuracy, throughput and costs, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long range information while harnessing the advantages of short reads. Starting from only ∼1 ng of DNA, we produce barcoded short read libraries. The use of novel informatic approaches allows for the barcoded short reads to be associated with the long molecules of origin producing a novel datatype known as ‘Linked-Reads’. This approach allows for simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes (Weisenfeld et al. 2017). In this manuscript, we show the advantages of Linked-Reads over standard short read approaches for reference based analysis. We demonstrate the ability of Linked-Reads to reconstruct megabase scale haplotypes and to recover parts of the genome that are typically inaccessible to short reads, including phenotypically important genes such as STRC, SMN1 and SMN2. We demonstrate the ability of both lrWGS and Linked-Read Whole Exome Sequencing (lrWES) to identify complex structural variations, including balanced events, single exon deletions, and single exon duplications. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.


2021 ◽  
Author(s):  
Aurélie Canaguier ◽  
Romane Guilbaud ◽  
Erwan Denis ◽  
Ghislaine Magdelenat ◽  
Caroline Belser ◽  
...  

Abstract Background: Structural variations (SVs) are very diverse genomic rearrangements. In the past, their detection was restricted to cytological approaches, then to NGS read size and partitioned assemblies. Owing to the current capabilities of technologies such as long-read sequencing and optical mapping, detection of larger SVs is becoming more and more accessible. This study proposes a comparison of SV detection and characterization from long-read sequencing obtained with the MinION device developed by Oxford Nanopore Technologies and from optical mapping produced by the Saphyr device commercialized by Bionano Genomics. The genomes of the two Arabidopsis thaliana ecotypes Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1) were chosen to guide the use of one or the other technology. Results: We described the SVs detected from the alignment of the best ONT assembly and DLE-1 optical maps of A. thaliana Ler-1 on the public reference Col-0 TAIR10.1. After filtering, 1,184 and 591 Ler-1 SVs were retained from ONT and Bionano technologies, respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%), leading to 563 common locations in both technologies. The specific locations were scrutinized to assess improvement in SV detection by either technology. The ONT SVs were mostly detected near TE and gene features, and resistance genes seemed particularly impacted. Conclusions: Structural variations linked to ONT sequencing error were removed and false positives limited, with high-quality Bionano SVs being conserved. When compared with the Col-0 TAIR10.1 reference, most of the detected SVs were found in the same locations. The ONT assembly sequence leads to more specific SVs than the Bionano one, the latter being more efficient at characterizing large SVs. Although the two technologies are clearly complementary, ONT data appear better suited to large-scale population studies, while Bionano performs better in improving assembly and describing the specificity of a genome compared to a reference.
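The overlap percentages quoted in this abstract can be rechecked from the reported counts with a short Python calculation. The variable names are illustrative. The technology-specific counts are derived here by subtraction and are not figures stated in the abstract.

```python
# Counts as reported in the abstract (after filtering).
ont_total = 1184       # Ler-1 SVs retained from ONT
bionano_total = 591    # Ler-1 SVs retained from Bionano
ont_shared = 948       # ONT SVs that correspond to a Bionano SV
bionano_shared = 563   # Bionano SVs that correspond to ONT SVs

# Share of each call set that falls in a common location.
ont_pct = round(100 * ont_shared / ont_total, 1)
bionano_pct = round(100 * bionano_shared / bionano_total, 1)

# Derived (not reported) counts of technology-specific SVs.
ont_specific = ont_total - ont_shared
bionano_specific = bionano_total - bionano_shared

print(ont_pct, bionano_pct, ont_specific, bionano_specific)
```

The calculation reproduces the reported 80.1% and 95.3%. That 948 ONT SVs map onto 563 Bionano SVs means several ONT calls can fall within a single Bionano call.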


2021 ◽  
Author(s):  
Cai Chen ◽  
Enrico D'Alessandro ◽  
Eduard Murani ◽  
Yao Zheng ◽  
Domenico Giosa ◽  
...  

Abstract Background: Molecular markers based on retrotransposon insertion polymorphisms (RIPs) have been developed and are widely used in plants and animals. Short interspersed nuclear elements (SINEs) exert wide impacts on gene activity and even on phenotypes. However, SINE RIP profiles in livestock remain largely unknown and have not been revealed in pigs. Results: Our data revealed that SINEA1 displayed the most polymorphic insertions (22.5% intragenic and 26.5% intergenic), followed by SINEA2 (10.5% intragenic and 9% intergenic) and SINEA3 (12.5% intragenic and 5.0% intergenic). We developed a genome-wide SINE RIP mining protocol and obtained a large number of SINE RIPs (36,284), with over 80% accuracy and an even distribution in chromosomes (14.5/Mb), and with 74.34% of SINE RIPs generated by the SINEA1 element. Over 65% of pig SINE RIPs overlap with genes, with significant enrichment in the first and second introns of protein-coding and long non-coding RNA genes. Nearly half of the RIPs are common in these pig breeds. Sixteen SINE RIPs were applied for population genetic analysis in 23 pig breeds; the phylogenetic tree and cluster analysis were generally consistent with the geographical distributions of native pig breeds in China. Conclusions: Our analysis revealed that SINEA1–3 elements, particularly SINEA1, are highly polymorphic across different pig breeds and generate large-scale structural variations in the pig genomes. Over 35,000 SINE RIP markers were obtained. These data indicate that young SINE elements play important roles in creating new genetic variations and shaping the evolution of the pig genome, and also provide strong evidence to support the great potential of SINE RIPs as genetic markers, which can be used for population genetic analysis and quantitative trait locus (QTL) mapping in pigs.


2015 ◽  
Vol 112 (25) ◽  
pp. 7689-7694 ◽  
Author(s):  
Aditya Gupta ◽  
Michael Place ◽  
Steven Goldstein ◽  
Deepayan Sarkar ◽  
Shiguo Zhou ◽  
...  

Multiple myeloma (MM), a malignancy of plasma cells, is characterized by widespread genomic heterogeneity and, consequently, differences in disease progression and drug response. Although recent large-scale sequencing studies have greatly improved our understanding of MM genomes, our knowledge about genomic structural variation in MM is attenuated due to the limitations of commonly used sequencing approaches. In this study, we present the application of optical mapping, a single-molecule, whole-genome analysis system, to discover new structural variants in a primary MM genome. Through our analysis, we have identified and characterized widespread structural variation in this tumor genome. Additionally, we describe our efforts toward comprehensive characterization of genome structure and variation by integrating our findings from optical mapping with those from DNA sequencing-based genomic analysis. Finally, by studying this MM genome at two time points during tumor progression, we have demonstrated an increase in mutational burden with tumor progression at all length scales of variation.


Genes ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1444
Author(s):  
Nazeefa Fatima ◽  
Anna Petri ◽  
Ulf Gyllensten ◽  
Lars Feuk ◽  
Adam Ameur

Long-read single molecule sequencing is increasingly used in human genomics research, as it allows accurate detection of large-scale DNA rearrangements such as structural variations (SVs) at high resolution. However, few studies have evaluated the performance of different single molecule sequencing platforms for SV detection in human samples. Here we performed Oxford Nanopore Technologies (ONT) whole-genome sequencing of two Swedish human samples (average 32× coverage) and compared the results to previously generated Pacific Biosciences (PacBio) data for the same individuals (average 66× coverage). Our analysis inferred an average of 17k and 23k SVs from the ONT and PacBio data, respectively, with a majority of them overlapping with an available multi-platform SV dataset. When comparing the SV calls in the two Swedish individuals, we find a higher concordance between ONT and PacBio SVs detected in the same individual than between SVs detected by the same technology in different individuals. Downsampling of PacBio reads, performed to obtain similar coverage levels for all datasets, resulted in 17k SVs per individual and improved overlap with the ONT SVs. Our results suggest that ONT and PacBio have a similar performance for SV detection in human whole genome sequencing data, and that both technologies are feasible for population-scale studies.


2018 ◽  
Vol 9 ◽  
Author(s):  
Yuxuan Yuan ◽  
Zbyněk Milec ◽  
Philipp E. Bayer ◽  
Jan Vrána ◽  
Jaroslav Doležel ◽  
...  

2018 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Darshan Washimkar ◽  
Martin D. Muggli ◽  
Leena Salmela ◽  
Christina Boucher

Abstract: Optical mapping is a unique system that is capable of producing high-resolution, high-throughput genomic map data that gives information about the structure of a genome [21]. Recently it has been used for scaffolding contigs and assembly validation for large-scale sequencing projects, including the maize [32], goat [6], and amborella [4] genomes. However, a major impediment in the use of this data is the variety and quantity of errors in the raw optical mapping data, which are called Rmaps. The challenges associated with using Rmap data are analogous to dealing with insertions and deletions in the alignment of long reads. Moreover, they are arguably harder to tackle since the data is numerical and susceptible to inaccuracy. We develop cOMet to error correct Rmap data, which to the best of our knowledge is the only optical mapping error correction method. Our experimental results demonstrate that cOMet has high precision and corrects 82.49% of insertion errors and 77.38% of deletion errors in Rmap data generated from the E. coli K-12 reference genome. Out of the deletion errors corrected, 98.26% are true errors. Similarly, out of the insertion errors corrected, 82.19% are true errors. It also successfully scales to large genomes, improving the quality of 78% and 99% of the Rmaps in the plum and goat genomes, respectively. Lastly, we show the utility of error correction by demonstrating how it improves the assembly of Rmap data. Error corrected Rmap data results in an assembly that is more contiguous, and covers a larger fraction of the genome.


2022 ◽  
Author(s):  
Mehmet Akdel ◽  
Dick de Ridder

Detecting structural variation (SV) in eukaryotic genomes is of broad interest due to its often dramatic phenotypic effects, but remains a major, costly challenge based on DNA sequencing data. A cost-effective alternative in detecting large-scale SV has become available with advances in optical mapping technology. However, the algorithmic approaches to identifying SVs from optical mapping data are limited. Here, we propose a novel, open-source SV detection tool, OptiDiff, which employs a single molecule based approach to detect and classify homozygous and heterozygous SVs at coverages as low as 20x, showing better performance than the state of the art.


2019 ◽  
Vol 35 (18) ◽  
pp. 3250-3256 ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Bahar Alipanahi ◽  
Tamer Kahveci ◽  
Leena Salmela ◽  
Christina Boucher

Abstract Motivation: Optical maps are high-resolution restriction maps (Rmaps) that give a unique numeric representation to a genome. Used in concert with sequence reads, they provide a useful tool for genome assembly and for discovering structural variations and rearrangements. Although they have been a regular feature of modern genome assembly projects, optical maps have mainly been used in the post-processing step and not in the genome assembly process itself. Several methods have been proposed for pairwise alignment of single molecule optical maps (called Rmaps) or for aligning optical maps to assembled reads. However, the problem of aligning an Rmap to a graph representing the sequence data of the same genome has not been studied before. Such an alignment provides a mapping between two sets of data, optical maps and sequence data, which will facilitate the usage of optical maps in the sequence assembly step itself. Results: We define the problem of aligning an Rmap to a de Bruijn graph and present the first algorithm for solving this problem, which is based on a seed-and-extend approach. We demonstrate that our method is capable of aligning 73% of Rmaps generated from the Escherichia coli genome to the de Bruijn graph constructed from short reads generated from the same genome. We validate the alignments and show that our method achieves an accuracy of 99.6%. We also show that our method scales to larger genomes. In particular, we show that 76% of Rmaps can be aligned to the de Bruijn graph in the case of human data. Availability and implementation: The software for aligning optical maps to a de Bruijn graph, omGraph, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/omGraph. Supplementary information: Supplementary data are available at Bioinformatics online.
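To make the "numeric representation" of an Rmap concrete, the following Python sketch performs an idealized in silico digest. It finds every occurrence of a recognition site in a sequence and returns the ordered fragment lengths. This is an illustration only and is not the omGraph algorithm. The function name, the toy sequence and the choice to cut at the first base of the site are assumptions, and real Rmaps carry sizing errors and missing or spurious cut sites that are ignored here.

```python
from typing import List

def in_silico_rmap(sequence: str, site: str) -> List[int]:
    """Return ordered fragment lengths from an error-free digest.

    Simplification: each cut is placed at the first base of the site.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    # Fragment boundaries are the sequence ends plus every cut position.
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]

# Toy sequence with two GAATTC sites (hypothetical example data).
toy_genome = "A" * 5 + "GAATTC" + "C" * 10 + "GAATTC" + "T" * 4
rmap = in_silico_rmap(toy_genome, "GAATTC")
print(rmap)  # [5, 16, 10]
```

The sequence is thus reduced to an ordered list of fragment sizes. Aligning such a list to sequence data, as the abstract describes, amounts to finding a path in the de Bruijn graph whose own site spacing matches these sizes within some tolerance.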


2021 ◽  
Vol 65 (1) ◽  
pp. 51-66 ◽  
Author(s):  
Jonathan Jeffet ◽  
Sapir Margalit ◽  
Yael Michaeli ◽  
Yuval Ebenstein

Abstract The human genome contains multiple layers of information that extend beyond the genetic sequence. In fact, identical genetics do not necessarily yield identical phenotypes as evident for the case of two different cell types in the human body. The great variation in structure and function displayed by cells with identical genetic background is attributed to additional genomic information content. This includes large-scale genetic aberrations, as well as diverse epigenetic patterns that are crucial for regulating specific cell functions. These genetic and epigenetic patterns operate in concert in order to maintain specific cellular functions in health and disease. Single-molecule optical genome mapping is a high-throughput genome analysis method that is based on imaging long chromosomal fragments stretched in nanochannel arrays. The access to long DNA molecules coupled with fluorescent tagging of various genomic information presents a unique opportunity to study genetic and epigenetic patterns in the genome at a single-molecule level over large genomic distances. Optical mapping entwines synergistically chemical, physical, and computational advancements, to uncover invaluable biological insights, inaccessible by sequencing technologies. Here we describe the method’s basic principles of operation, and review the various available mechanisms to fluorescently tag genomic information. We present some of the recent biological and clinical impact enabled by optical mapping and present recent approaches for increasing the method’s resolution and accuracy. Finally, we discuss how multiple layers of genomic information may be mapped simultaneously on the same DNA molecule, thus paving the way for characterizing multiple genomic observables on individual DNA molecules.

