Identification of Structural Variants in Two Novel Genomes of Maize Inbred Lines Possibly Related to Glyphosate Tolerance

To study genetic variation between the genomes of plants that are naturally tolerant and sensitive to glyphosate, we used two Zea mays L. lines traditionally bred in Poland. To overcome the complexity of the maize genome, two sequencing technologies were employed: Illumina and Single Molecule Real-Time (SMRT) PacBio. Eleven thousand structural variants, 4 million SNPs and approximately 800 thousand indels differentiating the two genomes were identified. Detailed analyses identified 20 variations within the EPSPS gene, but all of them were predicted to have moderate or unknown effects on gene expression. Other genes of the shikimate pathway, encoding bifunctional 3-dehydroquinate dehydratase/shikimate dehydrogenase and chorismate synthase, were altered by variants predicted to have a high impact on gene expression. Additionally, high-impact variants were identified within genes involved in the active transport of glyphosate through the cell membrane, encoding phosphate transporters as well as multidrug and toxic compound extrusion transporters.

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

10.1101/129379 ◽

2017 ◽

Cited By ~ 4

Author(s):

Mircea Cretu Stancu ◽

Markus J. van Roosmalen ◽

Ivo Renkens ◽

Marleen Nieboer ◽

Sjors Middelkamp ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Structural Variants ◽

Human Genetic Disease ◽

Structural Genomic ◽

Short Read ◽

Sequencing Technologies ◽

Genome Wide ◽

Long Read ◽

Complex Structural

Abstract Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline - NanoSV - to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.

Highly-accurate long-read sequencing improves variant detection and assembly of a human genome

10.1101/519025 ◽

2019 ◽

Cited By ~ 27

Author(s):

Aaron M. Wenger ◽

Paul Peluso ◽

William J. Rowell ◽

Pi-Chuan Chang ◽

Richard J. Hall ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Structural Variants ◽

Short Reads ◽

Sequencing Technologies ◽

Long Reads ◽

Long Read ◽

Variant Detection ◽

High Quality Genome ◽

Circular Consensus Sequencing

Abstract The major DNA sequencing technologies in use today produce either highly-accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly-accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
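
For readers unfamiliar with the contig N50 statistic quoted in this abstract, the following is a minimal sketch of how it is commonly computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the total assembly length. The function name `n50` and the toy contig lengths are illustrative assumptions and do not come from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly size defines N50.
        if running >= half_total:
            return length


# Toy assembly, not real data: total length 20, half is 10.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long contigs, which is why the statistic is used as a contiguity measure.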

SVIM: structural variant identification using mapped long reads

Bioinformatics ◽

10.1093/bioinformatics/btz041 ◽

2019 ◽

Vol 35 (17) ◽

pp. 2907-2915 ◽

Cited By ~ 32

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Single Molecule ◽

Simulated Data ◽

Supplementary Information ◽

Nucleotide Polymorphisms ◽

Structural Variants ◽

Human Phenotype ◽

Structural Variant ◽

Pacific Biosciences ◽

Sequencing Technologies ◽

Long Read

Abstract Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index. Supplementary information: Supplementary data are available at Bioinformatics online.
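
The abstract describes clustering structural variant signatures collected from read alignments. As a rough illustration of what such a step can look like, here is a toy sketch that greedily groups signatures of the same type and chromosome whose start positions lie close together. This is not SVIM's clustering method; the `Signature` class, the `cluster_signatures` function and the `max_distance` threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """One SV signature observed in a single read (hypothetical structure)."""
    chrom: str
    start: int
    end: int
    sv_type: str


def cluster_signatures(signatures, max_distance=100):
    """Greedily group signatures of the same type and chromosome by start position."""
    ordered = sorted(signatures, key=lambda s: (s.sv_type, s.chrom, s.start))
    clusters = []
    for sig in ordered:
        last = clusters[-1][-1] if clusters else None
        if (last is not None
                and last.sv_type == sig.sv_type
                and last.chrom == sig.chrom
                and sig.start - last.start <= max_distance):
            clusters[-1].append(sig)  # close to the previous signature
        else:
            clusters.append([sig])  # start a new cluster
    return clusters


# Made-up signatures, not real data.
toy_signatures = [
    Signature("chr1", 1000, 1300, "DEL"),
    Signature("chr1", 1020, 1310, "DEL"),
    Signature("chr1", 5000, 5200, "DEL"),
    Signature("chr1", 1010, 1400, "INS"),
]
for cluster in cluster_signatures(toy_signatures):
    print(cluster[0].sv_type, cluster[0].chrom, [s.start for s in cluster])
```

Each resulting cluster collects the reads that support one candidate variant, so the cluster size can serve as a simple measure of read support.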

SVIM: Structural Variant Identification using Mapped Long Reads

10.1101/494096 ◽

2018 ◽

Cited By ~ 2

Author(s):

David Heller ◽

Martin Vingron

Keyword(s):

Single Molecule ◽

Simulated Data ◽

Structural Variants ◽

Human Phenotype ◽

Structural Variant ◽

Small Indels ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read

Abstract Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than SNPs or small indels. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities. Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from PacBio and Nanopore sequencing machines. Availability and implementation: The source code and executables of SVIM are available on GitHub: github.com/eldariont/svim. SVIM has been implemented in Python 3 and published on bioconda and the Python Package Index.

Identification of genetic characteristics of maize (Zea mays L) using genetic markers

Zbornik Matice srpske za prirodne nauke ◽

10.2298/zmspn0201047z ◽

2002 ◽

pp. 47-56 ◽

Cited By ~ 1

Author(s):

Marija Zlokolica ◽

Mirjana Milosevic ◽

Zorica Nikolic ◽

Vladislava Galovic

Keyword(s):

Gene Expression ◽

Genetic Markers ◽

Zea Mays L ◽

Agronomic Traits ◽

Maize Genome ◽

Seed Traits ◽

Genetic Characteristics ◽

Seed Technology ◽

Breeding Material

Different genetic markers are used to evaluate breeding material, its characteristics and its potential for the ultimate aim: heterosis of hybrids. They also point to qualitative seed traits through linkage with genes responsible for desirable agronomic traits. This program encompasses testing methodologies for the new seed technology. Genetic analysis of breeding material during certain phases comprises isozymic gene expression and its degree of variability, and it continues (in order to be evaluated) until the presence or absence of genes existing or introduced for certain traits is determined. By combining different molecular methods such as PCR, RAPD and AFLP, which are based on polymorphism of DNA fragments, the definite aim is achieved: identification of newly created products of improvement. Testing the traits of breeding material, its genetic variability and its diversity is the first stage in the analysis of the maize genome. It is also the precondition for determining the presence of certain genes, which serves the ultimate aim: attesting the identity of the genotype.

Unusual patterns of genetic diversity and gene expression in the maize genome

10.31274/etd-180810-1013 ◽

2009 ◽

Author(s):

Li Li

Keyword(s):

Gene Expression ◽

Genetic Diversity ◽

Maize Genome

Faculty Opinions recommendation of Single-molecule analysis of gene expression using two-color RNA labeling in live yeast.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.717974583.793472027 ◽

2013 ◽

Author(s):

Rosemary Clyne

Keyword(s):

Gene Expression ◽

Single Molecule ◽

Rna Labeling ◽

Single Molecule Analysis ◽

Live Yeast

Nebula: ultra-efficient mapping-free structural variant genotyper

Nucleic Acids Research ◽

10.1093/nar/gkab025 ◽

2021 ◽

Author(s):

Parsoa Khorsand ◽

Fereydoun Hormozdiari

Keyword(s):

Large Scale ◽

Structural Variants ◽

Sequencing Technologies ◽

Generic Framework ◽

Common Genetic Variants ◽

Order Of Magnitude ◽

Complex Events ◽

Comparable Accuracy ◽

Using Data ◽

Computational Resources

Abstract Large-scale catalogs of common genetic variants (including indels and structural variants) are being created using data from second- and third-generation whole-genome sequencing technologies. However, the genotyping of these variants in newly sequenced samples is a nontrivial task that requires extensive computational resources. Furthermore, current approaches are mostly limited to specific types of variants and are generally prone to various errors and ambiguities when genotyping complex events. We propose an ultra-efficient approach for genotyping any type of structural variation that is not limited by the shortcomings and complexities of current mapping-based approaches. Our method, Nebula, utilizes the changes in the count of k-mers to predict the genotype of structural variants. We have shown that Nebula is not only an order of magnitude faster than mapping-based approaches for genotyping structural variants, but also has comparable accuracy to state-of-the-art approaches. Furthermore, Nebula is a generic framework not limited to any specific type of event. Nebula is publicly available at https://github.com/Parsoa/Nebula.
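
To make the idea of genotyping from k-mer counts concrete, here is a deliberately simplified sketch. It counts k-mers in a set of reads and compares the support for k-mers assumed to be specific to the reference allele against those assumed to be specific to the variant allele. This is not Nebula's algorithm or API; the function names, the thresholds `low` and `high`, and the toy sequences are assumptions for illustration only, and reverse complements and sequencing errors are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the given reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def genotype(reads, ref_kmers, alt_kmers, k, low=0.2, high=0.8):
    """Call a genotype from the fraction of allele-specific k-mers supporting the variant."""
    counts = count_kmers(reads, k)
    ref_support = sum(counts[kmer] for kmer in ref_kmers)
    alt_support = sum(counts[kmer] for kmer in alt_kmers)
    total = ref_support + alt_support
    if total == 0:
        return "./."  # no informative k-mers observed
    alt_fraction = alt_support / total
    if alt_fraction < low:
        return "0/0"
    if alt_fraction > high:
        return "1/1"
    return "0/1"


# Toy reads, not real data: one read carries the reference-specific k-mer
# "ACGT" and one carries the variant-specific k-mer "TTTT".
toy_reads = ["ACGTA", "GGTTTTC"]
print(genotype(toy_reads, ref_kmers={"ACGT"}, alt_kmers={"TTTT"}, k=4))
```

Because this approach only needs k-mer counts, it avoids aligning reads to a reference, which is the source of the speed advantage that mapping-free genotypers aim for.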

Amplification-free gene expression analysis of formalin-fixed paraffin-embedded samples using scanning single-molecule counting

Analytical Biochemistry ◽

10.1016/j.ab.2021.114220 ◽

2021 ◽

Vol 625 ◽

pp. 114220

Author(s):

Hidetaka Nakata ◽

Mitsushiro Yamaguchi ◽

Takuya Hanashi ◽

Seiji Kondo ◽

Tetsuya Tanabe

Keyword(s):

Gene Expression ◽

Expression Analysis ◽

Single Molecule ◽

Gene Expression Analysis ◽

Formalin Fixed Paraffin ◽

Formalin Fixed Paraffin Embedded ◽

Free Gene ◽

Formalin Fixed

Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding

Molecular Biology and Evolution ◽

10.1093/molbev/msab037 ◽

2021 ◽

Cited By ~ 1

Author(s):

Chen Cao ◽

Jingni He ◽

Lauren Mak ◽

Deshan Perera ◽

Devin Kwok ◽

...

Keyword(s):

Single Molecule ◽

Human Genetics ◽

Real Data ◽

Sequencing Technologies ◽

Bacterial Genomics ◽

Physical Linkage ◽

Pooled Sequencing ◽

Computational Reconstruction ◽

Host Genetic ◽

Host Evolution

Abstract DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or “haplotypes.” However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
