Validation of Genomic Structural Variants Through Long Sequencing Technologies

Abstract Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, genotyping these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, uses changes in k-mer counts to predict the genotype of structural variants. We show that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has accuracy comparable to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
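The idea of genotyping from k-mer counts can be illustrated with a minimal sketch. This is not Nebula's implementation: the function names, the sets of allele-specific k-mers, and the 0.2/0.8 thresholds are all assumptions chosen for illustration. It counts k-mers directly in the reads, with no mapping step, and calls a genotype from the share of counts that support the alternate allele.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (no mapping involved)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def genotype(counts, ref_kmers, alt_kmers, low=0.2, high=0.8):
    """Call a genotype from k-mers assumed unique to each allele.

    ref_kmers / alt_kmers are hypothetical inputs: k-mers seen only with the
    reference allele or only with the variant allele. The thresholds are
    illustrative, not taken from the paper.
    """
    ref = sum(counts[kmer] for kmer in ref_kmers)
    alt = sum(counts[kmer] for kmer in alt_kmers)
    total = ref + alt
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_fraction = alt / total
    if alt_fraction < low:
        return "0/0"
    if alt_fraction > high:
        return "1/1"
    return "0/1"
```

A real genotyper would also need to correct for sequencing depth, sequencing errors, and k-mers shared with other loci, none of which this sketch attempts.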


AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision

Bioinformatics ◽ 10.1093/bioinformatics/btq713 ◽ 2011 ◽ Vol 27 (5) ◽ pp. 595-603 ◽ Cited By ~ 62

Author(s): Alexej Abyzov ◽ Mark Gerstein

Keyword(s): Structural Variants ◽ Single Nucleotide ◽ Optimal Alignments ◽ Nucleotide Resolution ◽ Single Nucleotide Resolution ◽ Genomic Structural Variants


Mapping and phasing of structural variation in patient genomes using nanopore sequencing

10.1101/129379 ◽ 2017 ◽ Cited By ~ 4

Author(s): Mircea Cretu Stancu ◽ Markus J. van Roosmalen ◽ Ivo Renkens ◽ Marleen Nieboer ◽ Sjors Middelkamp ◽ ...

Keyword(s): Single Molecule ◽ De Novo ◽ Structural Variants ◽ Human Genetic Disease ◽ Structural Genomic ◽ Short Read ◽ Sequencing Technologies ◽ Genome Wide ◽ Long Read ◽ Complex Structural

Abstract Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline, NanoSV, to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes, and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.


Identification of Structural Variants in Two Novel Genomes of Maize Inbred Lines Possibly Related to Glyphosate Tolerance

Plants ◽ 10.3390/plants9040523 ◽ 2020 ◽ Vol 9 (4) ◽ pp. 523

Author(s): Medhat Mahmoud ◽ Joanna Gracz-Bernaciak ◽ Marek Żywicki ◽ Wojciech Karłowski ◽ Tomasz Twardowski ◽ ...

Keyword(s): Gene Expression ◽ Single Molecule ◽ Zea Mays L ◽ Shikimate Pathway ◽ High Impact ◽ Maize Genome ◽ Structural Variants ◽ Shikimate Dehydrogenase ◽ Epsps Gene ◽ Sequencing Technologies

To study genetic variations between genomes of plants that are naturally tolerant and sensitive to glyphosate, we used two Zea mays L. lines traditionally bred in Poland. To overcome the complexity of the maize genome, two sequencing technologies were employed: Illumina and Single Molecule Real-Time (SMRT) PacBio. Eleven thousand structural variants, 4 million SNPs and approximately 800 thousand indels differentiating the two genomes were identified. Detailed analyses identified 20 variations within the EPSPS gene, but all of them were predicted to have moderate or unknown effects on gene expression. Other genes of the shikimate pathway, encoding bifunctional 3-dehydroquinate dehydratase/shikimate dehydrogenase and chorismate synthase, were altered by variants predicted to have a high impact on gene expression. Additionally, high-impact variants were identified within genes involved in the active transport of glyphosate through the cell membrane, encoding phosphate transporters as well as multidrug and toxic compound extrusion.


The design and construction of reference pangenome graphs with minigraph

Genome Biology ◽ 10.1186/s13059-020-02168-z ◽ 2020 ◽ Vol 21 (1) ◽ Cited By ~ 7

Author(s): Heng Li ◽ Xiaowen Feng ◽ Chong Chu

Keyword(s): Data Model ◽ Reference Genome ◽ Structural Variants ◽ Current Reference ◽ Sequencing Technologies ◽ Recent Advances ◽ Multiple Genomes ◽ Design And Construction

Abstract The recent advances in sequencing technologies enable the assembly of individual genomes to the quality of the reference genome. How to integrate multiple genomes from the same species and make the integrated representation accessible to biologists remains an open challenge. Here, we propose a graph-based data model and associated formats to represent multiple genomes while preserving the coordinate of the linear reference genome. We implement our ideas in the minigraph toolkit and demonstrate that we can efficiently construct a pangenome graph and compactly encode tens of thousands of structural variants missing from the current reference genome.
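One way to picture a graph data model that keeps linear reference coordinates is sketched below. It is an illustration only, not minigraph's actual format or API: the `Segment` class, its field names, and the helper functions are hypothetical. Each graph segment records the sequence it came from, its offset on that sequence, and a rank that is 0 for segments lying on the linear reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A graph node that remembers where its sequence originated (hypothetical model)."""
    seg_id: str
    sequence: str
    stable_name: str  # name of the source sequence, e.g. a chromosome or contig
    offset: int       # 0-based start of this segment on the source sequence
    rank: int         # 0 = on the linear reference, >0 = added from another genome


def reference_interval(segment):
    """Return (name, start, end) on the linear reference, or None if off-reference."""
    if segment.rank != 0:
        return None
    return (segment.stable_name, segment.offset, segment.offset + len(segment.sequence))


def non_reference_bases(segments):
    """Total length of sequence in the graph that is missing from the reference."""
    return sum(len(s.sequence) for s in segments if s.rank > 0)
```

Because reference segments carry their own coordinates, annotations on the linear reference remain usable, while segments with rank above 0 represent sequence, such as insertions, that the reference lacks.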


Rare Genomic Structural Variants in Complex Disease: Lessons from the Replication of Associations with Obesity

PLoS ONE ◽ 10.1371/journal.pone.0058048 ◽ 2013 ◽ Vol 8 (3) ◽ pp. e58048 ◽ Cited By ~ 24

Author(s): Robin G. Walters ◽ Lachlan J. M. Coin ◽ Aimo Ruokonen ◽ Adam J. de Smith ◽ Julia S. El-Sayed Moustafa ◽ ...

Keyword(s): Complex Disease ◽ Structural Variants ◽ Genomic Structural Variants


Genomic structural variants involved in local adaptation of the European plaice

Peer Community in Evolutionary Biology ◽ 10.24072/pci.evolbiol.100095 ◽ 2020 ◽ pp. 100095

Author(s): Maren Wellenreuther

Keyword(s): Local Adaptation ◽ Structural Variants ◽ Genomic Structural Variants


Atypical face shape and genomic structural variants in epilepsy

Brain ◽ 10.1093/brain/aws232 ◽ 2012 ◽ Vol 135 (10) ◽ pp. 3101-3114 ◽ Cited By ~ 12

Author(s): Krishna Chinthapalli ◽ Emanuele Bartolini ◽ Jan Novy ◽ Michael Suttie ◽ Carla Marini ◽ ...

Keyword(s): Structural Variants ◽ Face Shape ◽ Genomic Structural Variants


Highly-accurate long-read sequencing improves variant detection and assembly of a human genome

10.1101/519025 ◽ 2019 ◽ Cited By ~ 27

Author(s): Aaron M. Wenger ◽ Paul Peluso ◽ William J. Rowell ◽ Pi-Chuan Chang ◽ Richard J. Hall ◽ ...

Keyword(s): Single Molecule ◽ De Novo ◽ Structural Variants ◽ Short Reads ◽ Sequencing Technologies ◽ Long Reads ◽ Long Read ◽ Variant Detection ◽ High Quality Genome ◽ Circular Consensus Sequencing

Abstract The major DNA sequencing technologies in use today produce either highly-accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly-accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
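For readers unfamiliar with the metrics quoted in this abstract, the sketch below shows how contig N50, precision, and recall are conventionally computed. The function names are illustrative and the code is not taken from the paper's pipeline. N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half of the assembly size.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def precision_recall(true_pos, false_pos, false_neg):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall
```

In a variant-calling benchmark, true positives are calls that match the truth set, false positives are calls absent from it, and false negatives are truth-set variants that were not called.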


Network-based analysis of allele frequency distribution among multiple populations identifies adaptive genomic structural variants

10.1101/2021.01.25.428140 ◽ 2021

Author(s): Marie Saitou ◽ Naoki Masuda ◽ Omer Gokcumen

Keyword(s): Negative Selection ◽ Evolutionary History ◽ Population Distribution ◽ Genomic Diversity ◽ Allele Frequency Distribution ◽ Evolutionary Models ◽ Structural Variants ◽ Considerable Impact ◽ Human Genomic ◽ Genomic Structural Variants

Abstract Structural variants have a considerable impact on human genomic diversity. However, their evolutionary history remains mostly unexplored. Here, we developed a new method to identify potentially adaptive structural variants based on a network-based analysis that incorporates genotype frequency data from 26 populations simultaneously. Using this method, we analyzed 57,629 structural variants and identified 577 structural variants that show high population distribution. We further showed that 39 and 20 of these putatively adaptive structural variants overlap with coding sequences or are significantly associated with GWAS traits, respectively. Closer inspection of the haplotypic variation associated with these putatively adaptive and functional structural variants reveals deviations from neutral expectations due to (i) population differentiation of rapidly evolving multi-allelic variants, (ii) incomplete sweeps, and (iii) recent population-specific negative selection. Overall, our study provides new methodological insights, documents hundreds of putatively adaptive variants, and introduces evolutionary models that may better explain the complex evolution of structural variants.
