Pedigree-based estimation of human mobile element retrotransposition rates

Abstract: Germline mutation rates in humans have been estimated for a variety of mutation types, including single nucleotide and large structural variants. Here we directly measure the germline retrotransposition rate for the three active retrotransposon elements: L1, Alu, and SVA. We used three tools for calling Mobile Element Insertions (MEIs) (MELT, RUFUS, and TranSurVeyor) on blood-derived whole genome sequence (WGS) data from 603 CEPH individuals, comprising 33 three-generation pedigrees. We identified 27 de novo MEIs in 440 births. The retrotransposition rate estimate for Alu elements, one in 40 births, is roughly half the rate estimated using phylogenetic analyses, a difference in magnitude similar to that observed for single nucleotide variants. The L1 retrotransposition rate is one in 62 births and is within the range of previous estimates (1:20-1:200 births). The SVA retrotransposition rate, one in 55 births, is much higher than the previous estimate of one in 900 births. Our large, three-generation pedigrees allowed us to assess parent-of-origin effects and the timing of insertion events in either gametogenesis or early embryonic development. We find a statistically significant paternal bias in Alu retrotransposition. Our study represents the first in-depth analysis of the rate and dynamics of human retrotransposition from WGS data in three-generation human pedigrees.
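As a rough illustration of the arithmetic behind a "one in N births" figure, the sketch below converts an event count and a birth count into a rate and attaches a crude normal-approximation interval. Only the totals (27 de novo MEIs, 440 births) come from the abstract; the function names and the choice of interval are assumptions made for illustration and are not the authors' method.

```python
import math


def births_per_event(events: int, births: int) -> float:
    """Return N such that the rate reads 'one in N births'."""
    if events <= 0:
        raise ValueError("need at least one observed event")
    return births / events


def approx_rate_interval(events: int, births: int, z: float = 1.96):
    """Approximate interval for events per birth, treating the count as Poisson.

    Uses events +/- z * sqrt(events), which is crude for small counts;
    an exact Poisson interval would be preferable in a real analysis.
    """
    half_width = z * math.sqrt(events)
    low = max(events - half_width, 0.0) / births
    high = (events + half_width) / births
    return low, high


# Totals reported in the abstract: 27 de novo MEIs in 440 births.
n = births_per_event(27, 440)
low, high = approx_rate_interval(27, 440)
print(f"about one in {n:.1f} births")
print(f"approximate 95% interval: {low:.3f} to {high:.3f} events per birth")
```

The same two functions could be applied to per-family counts (Alu, L1, SVA), which the abstract does not report.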

Structural variants shape driver combinations and outcomes in pediatric high-grade glioma

10.21203/rs.3.rs-389596/v1 ◽

2021 ◽

Author(s):

Frank Dubois ◽

Ofer Shapira ◽

Noah Greenwald ◽

Travis Zack ◽

...

Keyword(s):

Tyrosine Kinases ◽

De Novo ◽

High Grade Glioma ◽

Tumor Evolution ◽

High Grade ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Pediatric Glioma ◽

Treatment Naïve

Abstract: Pediatric high-grade gliomas (pHGGs), encompassing hemispheric and diffuse midline gliomas (DMGs), remain a devastating disease. The last decade has revealed oncogenic drivers including single nucleotide variants (SNVs) in histones. However, the contribution of structural variants (SVs) to gliomagenesis has not been systematically explored due to limitations in early SV analysis approaches. Using SV algorithms we recently created, we analyzed SVs in whole-genome sequences of 179 pHGGs, including a novel cohort of treatment-naïve samples, the largest WGS cohort assembled in adult or pediatric glioma. The most recurrent SVs targeted MYC isoforms and receptor tyrosine kinases, including a novel SV amplifying a MYC enhancer in the lncRNA CCDC26 in 12% of DMGs and revealing a more central role for MYC in these cancers than previously known. Applying de novo SV signature discovery, we identified five signatures, including three (SVsig1-3) involving primarily simple SVs and two (SVsig4-5) involving complex, clustered SVs. These SV signatures were associated with genetic variants that differed from those observed for SV signatures in other cancers, suggesting different links to underlying biology. Tumors with simple SV signatures were TP53 wild-type but were enriched with alterations in TP53 pathway members PPM1D and MDM4. Complex signatures were associated with direct aberrations in TP53, CDKN2A, and RB1 early in tumor evolution, and with extrachromosomal amplicons that likely occurred later. All pHGGs exhibited at least one simple SV signature, but complex SV signatures were primarily restricted to subsets of H3.3K27M DMGs and hemispheric pHGGs. Importantly, DMGs with the complex SV signatures SVsig4-5 were associated with shorter overall survival independent of histone type and TP53 status. These data inform the role and impact of SVs in gliomagenesis and the mechanisms that shape them.

A reference dataset of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree

10.1101/055541 ◽

2016 ◽

Cited By ~ 18

Author(s):

Michael A. Eberle ◽

Epameinondas Fritzilas ◽

Peter Krusche ◽

Morten Källberg ◽

Benjamin L. Moore ◽

...

Keyword(s):

De Novo ◽

Sequence Data ◽

Objective Assessment ◽

Variant Calling ◽

Whole Genome Sequence ◽

Reference Dataset ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genome Wide ◽

Transmission Information

Abstract: Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalogue of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of seventeen individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased “platinum” variant catalogue of 4.7 million single nucleotide variants (SNVs) plus 0.7 million small (1-50bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and eleven children of this pedigree. Platinum genotypes are highly concordant with the current catalogue of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%), and add a validated truth catalogue that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission (“non-platinum”) revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.
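To make the idea of "consistent with the pattern of inheritance" concrete, here is a minimal sketch of a per-site Mendelian consistency check for a single parent-child trio. It is a simplification, not the haplotype-transmission method the abstract describes: it assumes unphased, diploid, autosomal genotypes, and the function name and tuple encoding (0 = reference allele, 1 = alternate allele) are hypothetical.

```python
from itertools import product


def mendelian_consistent(child, mother, father):
    """Return True if the child's unphased genotype can be formed by
    taking one allele from the mother and one from the father."""
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


# Genotypes are tuples of allele indices: 0 = reference, 1 = alternate.
print(mendelian_consistent((0, 1), (0, 0), (0, 1)))  # True
print(mendelian_consistent((1, 1), (0, 0), (0, 1)))  # False
```

A site that fails this check is not necessarily a de novo mutation; it may equally reflect a genotyping error, which is why the abstract's approach leans on transmission across the whole pedigree rather than on single trios.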

Re-evaluation of single nucleotide variants and identification of structural variants in a cohort of 45 sudden unexplained death cases

International Journal of Legal Medicine ◽

10.1007/s00414-021-02580-5 ◽

2021 ◽

Author(s):

Jacqueline Neubauer ◽

Shouyu Wang ◽

Giancarlo Russo ◽

Cordula Haas

Keyword(s):

Sudden Death ◽

Cardiac Diseases ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Sudden Unexplained Death ◽

Unexplained Death ◽

Pathogenic Variants ◽

The Impact ◽

Death Cases

Abstract: Sudden unexplained death (SUD) accounts for a considerable proportion of overall sudden death cases, especially in adolescents and young adults. During the past decade, many channelopathy- and cardiomyopathy-associated single nucleotide variants (SNVs) have been identified in SUD studies by means of postmortem molecular autopsy, yet the number of cases that remain inconclusive is still high. Recent studies have suggested that structural variants (SVs) might play an important role in SUD, but there is no consensus on the impact of SVs on inherited cardiac diseases. In this study, we searched for potentially pathogenic SVs in 244 genes associated with cardiac diseases. Whole-exome sequencing and appropriate data analysis were performed in 45 SUD cases. Re-analysis of the exome data according to the current ACMG guidelines identified 14 pathogenic or likely pathogenic variants in 10 (22.2%) out of the 45 SUD cases, whereof 2 (4.4%) individuals had variants with likely functional effects in the channelopathy-associated genes SCN5A and TRDN and 1 (2.2%) individual in the cardiomyopathy-associated gene DTNA. In addition, 18 SVs were identified in 15 out of the 45 individuals. Two SVs with likely functional impairment were found in the coding regions of PDSS2 and TRPM4 in 2 SUD cases (4.4%). Both were identified as heterozygous deletions, which were confirmed by multiplex ligation-dependent probe amplification. In conclusion, our findings support the view that SVs could contribute to the pathology of the sudden death event in some cases and therefore should be investigated on a routine basis in suspected SUD cases.

An integrative approach to investigate the respective roles of single-nucleotide variants and copy-number variants in Attention-Deficit/Hyperactivity Disorder

Scientific Reports ◽

10.1038/srep22851 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 9

Author(s):

Leandro de Araújo Lima ◽

Ana Cecília Feio-dos-Santos ◽

Sintia Iole Belangero ◽

Ary Gadelha ◽

Rodrigo Affonseca Bressan ◽

...

Keyword(s):

Attention Deficit Hyperactivity Disorder ◽

Attention Deficit ◽

Copy Number ◽

De Novo ◽

Copy Number Variants ◽

Integrative Approach ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Hyperactivity Disorder ◽

New Genes

Abstract: Many studies have attempted to investigate the genetic susceptibility to Attention-Deficit/Hyperactivity Disorder (ADHD), but without much success. The present study aimed to analyze both single-nucleotide and copy-number variants contributing to the genetic architecture of ADHD. We generated exome data from 30 Brazilian trios with sporadic ADHD. We also analyzed a Brazilian sample of 503 child/adolescent controls from a High Risk Cohort Study for the Development of Childhood Psychiatric Disorders, as well as previously published results of five CNV studies and one GWAS meta-analysis of ADHD involving children/adolescents. The results from the Brazilian trios showed that cases with de novo SNVs tend not to have de novo CNVs and vice versa. Although the sample size is small, we could also see that various comorbidities are more frequent in cases with only inherited variants. Moreover, using only genes expressed in brain, we constructed two “in silico” protein-protein interaction networks, one with genes from any analysis and another with genes with hits in two analyses. Topological and functional analyses of genes in these networks uncovered genes related to synapse, cell adhesion, glutamatergic and serotoninergic pathways, both confirming findings of previous studies and capturing new genes and genetic variants in these pathways.

Implications of Genetic Distance to Reference and De Novo Genome Assembly for Clinical Genomics in Africans

10.1101/2020.09.25.20201780 ◽

2020 ◽

Author(s):

Daniel Shriner ◽

Adebowale Adeyemo ◽

Charles Rotimi

Keyword(s):

Genetic Distance ◽

De Novo ◽

Reference Sequence ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

De Novo Genome Assembly ◽

Single Nucleotide ◽

Clinical Genomics ◽

Advantages And Disadvantages ◽

False Discovery

In clinical genomics, variant calling from short-read sequencing data typically relies on a pan-genomic, universal human reference sequence. A major limitation of this approach is that the number of reads that incorrectly map or fail to map increases as the reads diverge from the reference sequence. In the context of genome sequencing of genetically diverse Africans, we investigate the advantages and disadvantages of using a de novo assembly of the read data as the reference sequence in single sample calling. Conditional on sufficient read depth, the alignment-based and assembly-based approaches yielded comparable sensitivity and false discovery rates for single nucleotide variants when benchmarked against a gold standard call set. The alignment-based approach yielded coverage of an additional 270.8 Mb over which sensitivity was lower and the false discovery rate was higher. Although both approaches detected and missed clinically relevant variants, the assembly-based approach identified more such variants than the alignment-based approach. Of particular relevance to individuals of African descent, the assembly-based approach identified four heterozygous genotypes containing the sickle allele whereas the alignment-based approach identified no occurrences of the sickle allele. Variant annotation using dbSNP and gnomAD identified systematic biases in these databases due to underrepresentation of Africans. Using the counts of homozygous alternate genotypes from the alignment-based approach as a measure of genetic distance to the reference sequence GRCh38.p12, we found that the numbers of misassemblies, total variant sites, potentially novel single nucleotide variants (SNVs), and certain variant classes (e.g., splice acceptor variants, stop loss variants, missense variants, synonymous variants, and variants absent from gnomAD) were significantly correlated with genetic distance. In contrast, genomic coverage and other variant classes (e.g., ClinVar pathogenic or likely pathogenic variants, start loss variants, stop gain variants, splice donor variants, incomplete terminal codons, variants with CADD score ≥20) were not correlated with genetic distance. With improvement in coverage, the assembly-based approach can offer a viable alternative to the alignment-based approach, with the advantage that it can obviate the need to generate diverse human reference sequences or collections of alternate scaffolds.

HGG-41. STRUCTURAL VARIANT DRIVERS IN PEDIATRIC HIGH-GRADE GLIOMA

Neuro-Oncology ◽

10.1093/neuonc/noaa222.322 ◽

2020 ◽

Vol 22 (Supplement_3) ◽

pp. iii351-iii351

Author(s):

Frank Dubois ◽

Ofer Shapira ◽

Noah Greenwald ◽

Travis Zack ◽

Jessica W Tsai ◽

...

Keyword(s):

Copy Number ◽

High Grade Glioma ◽

High Grade ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Effector Domains ◽

Topologically Associating Domains ◽

Genome Wide ◽

Pediatric High Grade Glioma

Abstract: BACKGROUND: Driver single nucleotide variants (SNVs) and somatic copy number aberrations (SCNAs) of pediatric high-grade gliomas (pHGGs), including diffuse midline gliomas (DMGs), have been characterized. However, structural variants (SVs) in pHGGs and the mechanisms through which they contribute to glioma formation have not been systematically analyzed genome-wide. METHODS: Using SvABA for SVs as well as the latest pipelines for SCNAs and SNVs, we analyzed whole-genome sequencing from 174 patients. This includes 60 previously unpublished samples, 43 of which are DMGs. Signature analysis allowed us to define pHGG groups with shared SV characteristics. Significantly recurring SV breakpoints and juxtapositions were identified with algorithms we recently developed, and the findings were correlated with RNAseq and H3K27ac ChIPseq. RESULTS: The SV characteristics in pHGG showed three groups defined by either complex, intermediate or simple signature activities. These were associated with distinct combinations of known driver oncogenes. Our statistical analysis revealed recurring SVs in the topologically associating domains of MYCN, MYC, EGFR, PDGFRA & MET. These correlated with increased mRNA expression and amplification of H3K27ac peaks. Complex recurring amplifications showed characteristics of extrachromosomal amplicons and were enriched in coding SVs splitting protein regulatory from effector domains. Integrative analysis of all SCNAs, SNVs & SVs revealed patterns of characteristic combinations between potential drivers and signatures. This included two distinct groups of H3K27M DMGs with either complex or simple signatures and different combinations of associated variants. CONCLUSION: Recurrent SVs associate with signatures shaped by an underlying process, which can lead to distinct mechanisms to activate the same oncogene.

A large deletion in the COL2A1 gene expands the spectrum of pathogenic variants causing bulldog calf syndrome in cattle

Acta Veterinaria Scandinavica ◽

10.1186/s13028-020-00548-w ◽

2020 ◽

Vol 62 (1) ◽

Cited By ~ 1

Author(s):

Joana Gonçalves Pontes Jacinto ◽

Irene Monika Häfliger ◽

Anna Letko ◽

Cord Drögemüller ◽

Jørgen Steen Agerholm

Keyword(s):

De Novo ◽

Genetic Disorders ◽

Polymerase Chain Reaction Analysis ◽

Large Deletion ◽

Whole Genome Sequence ◽

De Novo Mutation ◽

Loss Of Function ◽

Single Nucleotide Variants ◽

Pathogenic Variants ◽

Base Pair Deletion

Abstract: Background: Congenital bovine chondrodysplasia, also known as bulldog calf syndrome, is characterized by disproportionate growth of bones resulting in a shortened and compressed body, mainly due to reduced length of the spine and the long bones of the limbs. In addition, severe facial dysmorphisms including palatoschisis and shortening of the viscerocranium are present. Abnormalities in the gene collagen type II alpha 1 chain (COL2A1) have been associated with some cases of the bulldog calf syndrome. Until now, six pathogenic single-nucleotide variants have been found in COL2A1. Here we present a novel variant in COL2A1 of a Holstein calf and provide an overview of the phenotypic and allelic heterogeneity of the COL2A1-related bulldog calf syndrome in cattle. Case presentation: The calf was aborted at gestation day 264 and showed generalized disproportionate dwarfism, with a shortened compressed body and limbs, and dysplasia of the viscerocranium; a phenotype resembling bulldog calf syndrome due to an abnormality in COL2A1. Whole-genome sequence (WGS) data was obtained and revealed a heterozygous 3513 base pair deletion encompassing 10 of the 54 coding exons of COL2A1. Polymerase chain reaction analysis and Sanger sequencing confirmed the breakpoints of the deletion and its absence in the genomes of both parents. Conclusions: The pathological and genetic findings were consistent with a case of “bulldog calf syndrome”. The identified variant causing the syndrome was the result of a de novo mutation event that either occurred post-zygotically in the developing embryo or was inherited because of low-level mosaicism in one of the parents. The identified loss-of-function variant is pathogenic due to COL2A1 haploinsufficiency and represents the first structural variant causing bulldog calf syndrome in cattle. Furthermore, this case report highlights the utility of WGS-based precise diagnostics for understanding congenital disorders in cattle and the need for continued surveillance for genetic disorders in cattle.

Recent Advances in Understanding the Genetic Architecture of Autism

Annual Review of Genomics and Human Genetics ◽

10.1146/annurev-genom-121219-082309 ◽

2020 ◽

Vol 21 (1) ◽

pp. 289-304 ◽

Cited By ~ 1

Author(s):

Caroline M. Dias ◽

Christopher A. Walsh

Keyword(s):

Genetic Architecture ◽

De Novo ◽

Clinical Care ◽

Copy Number Variants ◽

Autism Spectrum ◽

Loss Of Function ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Recent Advances ◽

Insight Into

Recent advances in understanding the genetic architecture of autism spectrum disorder have allowed for unprecedented insight into its biological underpinnings. New studies have elucidated the contributions of a variety of forms of genetic variation to autism susceptibility. While the roles of de novo copy number variants and single-nucleotide variants—causing loss-of-function or missense changes—have been increasingly recognized and refined, mosaic single-nucleotide variants have been implicated more recently in some cases. Moreover, inherited variants (including common variants) and, more recently, rare recessive inherited variants have come into greater focus. Finally, noncoding variants—both inherited and de novo—have been implicated in the last few years. This work has revealed a convergence of diverse genetic drivers on common biological pathways and has highlighted the ongoing importance of increasing sample size and experimental innovation. Continuing to synthesize these genetic findings with functional and phenotypic evidence and translating these discoveries to clinical care remain considerable challenges for the field.

Targeted long-read sequencing resolves complex structural variants and identifies missing disease-causing variants

10.1101/2020.11.03.365395 ◽

2020 ◽

Author(s):

Danny E. Miller ◽

Arvis Sulovari ◽

Tianyun Wang ◽

Hailey Loucks ◽

Kendra Hoekzema ◽

...

Keyword(s):

Copy Number ◽

Genetic Diagnosis ◽

Clinical Testing ◽

Structural Variants ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Pathogenic Variants ◽

Long Read ◽

Repeat Expansions ◽

Complex Structural

Abstract: BACKGROUND: Despite widespread availability of clinical genetic testing, many individuals with suspected genetic conditions do not have a precise diagnosis. This limits their opportunity to take advantage of state-of-the-art treatments. In such instances, testing sometimes reveals difficult-to-evaluate complex structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in specific genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. METHODS: Targeted long-read sequencing (T-LRS) was performed on 33 individuals using Read Until on the Oxford Nanopore platform. This method allowed us to computationally target up to 100 Mbp of sequence per experiment, resulting in an average of 20x coverage of target regions, a 500% increase over background. We analyzed patient DNA for pathogenic substitutions, structural variants, and methylation differences using a single data source. RESULTS: The effectiveness of T-LRS was validated by detecting all genomic aberrations, including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences, identified by prior clinical testing. In 6/7 individuals who had complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, which led, in one case, to a change in clinical management. In nine individuals with suspected Mendelian conditions who lacked a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in five and variants of uncertain significance in two others. CONCLUSIONS: T-LRS can accurately predict pathogenic copy number variants and triplet repeat expansions, resolve complex rearrangements, and identify single-nucleotide variants not detected by other technologies, including short-read sequencing. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority candidate genes and regions or to further evaluate complex clinical testing results. The application of T-LRS will likely increase the diagnostic rate of rare disorders.

Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks

10.1101/2021.03.04.433952 ◽

2021 ◽

Author(s):

Kishwar Shafin ◽

Trevor Pesout ◽

Pi-Chuan Chang ◽

Maria Nattestad ◽

Alexey Kolesnikov ◽

...

Keyword(s):

De Novo ◽

Sequence Data ◽

Variant Calling ◽

High Accuracy ◽

Superior Performance ◽

Read Length ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Short Read ◽

Long Read

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data has demonstrated a long read length, but current interpretation methods for its novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single nucleotide variant identification method at the whole-genome scale and produces high-quality single nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with performance superior to the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio-HiFi-polished).
