4478 Not just GLUT1: genome sequencing reveals genetic heterogeneity in Doose syndrome

OBJECTIVES/GOALS: Epilepsy with myoclonic-atonic seizures (EMAS) is a childhood-onset epilepsy disorder characterized by seizures with sudden loss of posture, or drop seizures. Our objective was to use short-read genome sequencing in 40 EMAS trios to better understand the variants contributing to the development of EMAS. METHODS/STUDY POPULATION: Eligibility for the cohort required a potential diagnosis of EMAS by child neurology faculty at Children’s Hospital Colorado. Exclusion criteria included lack of drop seizures upon chart review or structural abnormality on MRI. Some individuals had prior genetic testing, and priority for genome sequencing was given to individuals without a clear genetic diagnosis based on previous testing. We analyzed single nucleotide variants (SNVs), small insertions and deletions (INDELs), and larger structural variants (SVs) from trio genomes and determined those that were likely contributory based on standardized American College of Medical Genetics (ACMG) criteria. RESULTS/ANTICIPATED RESULTS: Our initial analysis focused on variants in coding regions of known epilepsy-associated genes. We identified pathogenic or likely pathogenic variants in 6 different individuals involving 6 unique genes. Of these, 5 are de novo SNVs or INDELs and 1 is a de novo SV. One of these involves a de novo heterozygous variant in an X-linked gene (ARHGEF9) in a female individual. We hypothesize that skewed X-inactivation may result in predominant expression of the pathogenic variant. We anticipate identifying additional candidate variants in coding regions of genes not previously associated with EMAS or pediatric epilepsies, as well as in noncoding regions of the genome. DISCUSSION/SIGNIFICANCE OF IMPACT: Despite the genetic heterogeneity of EMAS, our initial analysis identified de novo pathogenic or likely pathogenic variants in 15% (6/40) of our cohort. As the cost continues to decline, short-read genome sequencing represents a promising diagnostic tool for EMAS and other pediatric-onset epilepsy syndromes. CONFLICT OF INTEREST DESCRIPTION: The authors have no conflicts of interest to disclose. SD has consulted for Upsher-Smith, Biomarin and Neurogene on an unrelated subject matter. GLC holds a research collaborative grant with Stoke Therapeutics on unrelated subject matter.

A combined RNA-seq and whole genome sequencing approach for identification of non-coding pathogenic variants in single families

Human Molecular Genetics ◽

10.1093/hmg/ddaa016 ◽

2020 ◽

Vol 29 (6) ◽

pp. 967-979 ◽

Cited By ~ 6

Author(s):

Revital Bronstein ◽

Elizabeth E Capowski ◽

Sudeep Mehrotra ◽

Alex D Jansen ◽

Daniel Navarro-Gomez ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genetic Diagnosis ◽

Genetic Diagnostics ◽

Whole Genome ◽

Unique Challenge ◽

Coding Regions ◽

Pathogenic Variants ◽

Number Of Patients ◽

Coding Variants

Abstract Inherited retinal degenerations (IRDs) are at the focus of current genetic therapeutic advancements. For a genetic treatment such as gene therapy to be successful, an accurate genetic diagnosis is required. Genetic diagnostics relies on assessing the probability that a given DNA variant is pathogenic. Non-coding variants present a unique challenge for such assessments compared with coding variants. For one, non-coding variants are far more numerous in the genome than coding variants. In addition, our understanding of the rules that govern the non-coding regions of the genome is less complete than our understanding of the coding regions. Methods that allow for both the identification of candidate non-coding pathogenic variants and their functional validation may help overcome these obstacles, allowing a greater number of patients to benefit from advancements in genetic therapeutics. We present here an unbiased approach combining whole genome sequencing (WGS) with transcriptome analysis of patient-induced pluripotent stem cell (iPSC)-derived retinal organoids (ROs). With this approach, we identified and functionally validated a novel pathogenic non-coding variant in a small family with a previously unresolved genetic diagnosis.

New genes involved in Angelman syndrome-like: expanding the genetic spectrum

10.1101/2020.07.16.206052 ◽

2020 ◽

Author(s):

Cinthia Aguilera ◽

Elisabeth Gabau ◽

Ariadna Ramirez-Mallafré ◽

Carme Brun-Gasca ◽

Jana Dominguez-Carral ◽

...

Keyword(s):

Differential Diagnosis ◽

Genetic Heterogeneity ◽

Angelman Syndrome ◽

De Novo ◽

System Development ◽

Neuron System ◽

Pathogenic Variants ◽

New Genes ◽

Whole Exome ◽

EEG Abnormalities

Abstract Angelman syndrome (AS) is a neurogenetic disorder characterized by severe developmental delay with absence of speech, happy disposition, frequent laughter, hyperactivity, stereotypies, ataxia and seizures with specific EEG abnormalities. In 10-15% of patients with an AS phenotype, the genetic cause remains unknown (Angelman-like syndrome, AS-like). Whole-exome sequencing (WES) was performed on a cohort of 14 patients with clinical features of AS and no molecular diagnosis. As a result, we identified 10 de novo and 1 X-linked pathogenic/likely pathogenic variants in 10 neurodevelopmental genes (SYNGAP1, VAMP2, TBL1XR1, ASXL3, SATB2, SMARCE1, SPTAN1, KCNQ3, SLC6A1 and LAS1L) and one deleterious de novo variant in a candidate gene (HSF2). Our results highlight the wide genetic heterogeneity in AS-like patients and expand the differential diagnosis. The new AS-like genes do not interact directly with the UBE3A gene product but are involved in synapsis and neuron system development.

Long-read genome sequencing for the diagnosis of neurodevelopmental disorders

10.1101/2020.07.02.185447 ◽

2020 ◽

Author(s):

Susan M. Hiatt ◽

James M.J. Lawlor ◽

Lori H. Handley ◽

Ryne C. Ramaker ◽

Brianne B. Rogers ◽

...

Keyword(s):

Genome Sequencing ◽

Neurodevelopmental Disorders ◽

De Novo ◽

Genomic Analysis ◽

Data Sets ◽

Short Read ◽

Long Read ◽

Variant Detection ◽

Circular Consensus Sequencing

Abstract Purpose Exome and genome sequencing have proven to be effective tools for the diagnosis of neurodevelopmental disorders (NDDs), but large fractions of NDDs cannot be attributed to currently detectable genetic variation. This is likely, at least in part, because many genetic variants are difficult or impossible to detect through typical short-read sequencing approaches. Methods Here, we describe a genomic analysis using Pacific Biosciences circular consensus sequencing (CCS) reads, which are both long (>10 kb) and accurate (>99% bp accuracy). We used CCS on six proband-parent trios with NDDs that were unexplained despite extensive testing, including genome sequencing with short reads. Results We identified variants and created de novo assemblies in each trio, with global metrics indicating these data sets are more accurate and comprehensive than those provided by short-read data. In one proband, we identified a likely pathogenic (LP), de novo L1-mediated insertion in CDKL5 that results in duplication of exon 3, leading to a frameshift. In a second proband, we identified multiple large de novo structural variants, including insertion-translocations affecting DGKB and MLLT3, which we show disrupt MLLT3 transcript levels. We consider this extensive structural variation likely pathogenic. Conclusion The breadth and quality of variant detection, coupled with finding variants of clinical and research interest in two of six probands with unexplained NDDs, strongly support the value of long-read genome sequencing for understanding rare disease.

Whole-genome Sequencing Reveals De-novo Mutations Associated with Nonsyndromic Cleft Lip/Palate

10.21203/rs.3.rs-1064924/v1 ◽

2021 ◽

Author(s):

Waheed Awotoye ◽

Peter A. Mossey ◽

Jacqueline B. Hetmanski ◽

Lord Jephthah Joojo Gowans ◽

Mekonen A. Eshete ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Cleft Lip ◽

De Novo ◽

Association Studies ◽

Whole Genome ◽

Loss Of Function ◽

De Novo Mutations ◽

Pathogenic Variants ◽

Cleft Lip Palate

Abstract The majority (85%) of nonsyndromic cleft lip with or without cleft palate (nsCL/P) cases occur sporadically, suggesting a role for de novo mutations (DNMs) in the etiology of nsCL/P. To identify high-impact DNMs that contribute to the risk of nsCL/P, we conducted whole genome sequencing (WGS) analyses in 130 African case-parent trios (affected probands and unaffected parents). We identified 162 high-confidence protein-altering DNMs that contribute to the risk of nsCL/P. These include novel loss-of-function DNMs in the ACTL6A, ARHGAP10, MINK1, TMEM5 and TTN genes, as well as missense variants in ACAN, DHRS3, DLX6, EPHB2, FKBP10, KMT2D, RECQL4, SEMA3C, SEMA4D, SHH, TP63, and TULP4. Experimental evidence showed that the ACAN, DHRS3, DLX6, EPHB2, FKBP10, KMT2D, MINK1, RECQL4, SEMA3C, SEMA4D, SHH, TP63, and TTN genes contribute to facial development, and mutations in these genes could contribute to CL/P. Association studies have identified TULP4 as a potential cleft candidate gene, while ARHGAP10 interacts with CTNNB1 to control WNT signaling. DLX6, EPHB2, SEMA3C and SEMA4D harbor novel damaging DNMs that may affect their role in neural crest migration and palatal development. This discovery of pathogenic DNMs also confirms the power of WGS analysis of trios in the discovery of potential pathogenic variants.

Evaluating the performance of a clinical genome sequencing program for diagnosis of rare genetic disease, seen through the lens of craniosynostosis

Genetics in Medicine ◽

10.1038/s41436-021-01297-5 ◽

2021 ◽

Author(s):

Zerin Hyder ◽

Eduardo Calpena ◽

Yang Pei ◽

Rebecca S. Tooze ◽

Helen Brittain ◽

...

Keyword(s):

Genome Sequencing ◽

Genetic Disease ◽

Research Team ◽

De Novo ◽

Diagnostic Sensitivity ◽

Structural Variants ◽

Rare Genetic Disease ◽

Pathogenic Variants ◽

Clinical Genome Sequencing ◽

Research Analysis

Abstract Purpose Genome sequencing (GS) for diagnosis of rare genetic disease is being introduced into the clinic, but the complexity of the data poses challenges for developing pipelines with high diagnostic sensitivity. We evaluated the performance of the Genomics England 100,000 Genomes Project (100kGP) panel-based pipelines, using craniosynostosis as a test disease. Methods GS data from 114 probands with craniosynostosis and their relatives (314 samples), negative on routine genetic testing, were scrutinized by a specialized research team, and diagnoses compared with those made by 100kGP. Results Sixteen likely pathogenic/pathogenic variants were identified by 100kGP. Eighteen additional likely pathogenic/pathogenic variants were identified by the research team, indicating that for craniosynostosis, 100kGP panels had a diagnostic sensitivity of only 47%. Measures that could have augmented diagnoses were improved calling of existing panel genes (+18% sensitivity), review of updated panels (+12%), comprehensive analysis of de novo small variants (+29%), and copy-number/structural variants (+9%). Recent NHS England recommendations that partially incorporate these measures should achieve 85% overall sensitivity (+38%). Conclusion GS identified likely pathogenic/pathogenic variants in 29.8% of previously undiagnosed patients with craniosynostosis. This demonstrates the value of research analysis and the importance of continually improving algorithms to maximize the potential of clinical GS.
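
The two headline percentages in this abstract follow from simple ratios of the quoted counts. The sketch below reproduces that arithmetic using only the numbers stated above; the variable names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above.
probands = 114               # probands negative on routine genetic testing
found_by_100kgp = 16         # variants identified by the 100kGP pipelines
found_by_research_only = 18  # additional variants found by the research team

total_diagnoses = found_by_100kgp + found_by_research_only

# Sensitivity of the 100kGP pipelines relative to all diagnoses found.
pipeline_sensitivity = found_by_100kgp / total_diagnoses
# Overall diagnostic yield among previously undiagnosed probands.
diagnostic_yield = total_diagnoses / probands

print(f"100kGP sensitivity: {pipeline_sensitivity:.0%}")  # 47%
print(f"Diagnostic yield: {diagnostic_yield:.1%}")        # 29.8%
```

The projected 85% sensitivity under the NHS England recommendations is the 47% baseline plus the stated 38 percentage-point gain.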

PlasmidSeeker: identification of known plasmids from bacterial whole genome sequencing reads

PeerJ ◽

10.7717/peerj.4588 ◽

2018 ◽

Vol 6 ◽

pp. e4588 ◽

Cited By ~ 26

Author(s):

Märt Roosaare ◽

Mikk Puustusmaa ◽

Märt Möls ◽

Mihkel Vaher ◽

Maido Remm

Keyword(s):

Antibiotic Resistance ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Sequence Data ◽

Source Code ◽

Complex Problem ◽

Whole Genome ◽

Short Read ◽

Plasmid Sequence

Background Plasmids play an important role in the dissemination of antibiotic resistance, making their detection an important task. Using whole genome sequencing (WGS), it is possible to capture both bacterial and plasmid sequence data, but short read lengths make plasmid detection a complex problem. Results We developed a tool named PlasmidSeeker that enables the detection of plasmids from bacterial WGS data without read assembly. The PlasmidSeeker algorithm is based on k-mers and uses k-mer abundance to distinguish between plasmid and bacterial sequences. We tested the performance of PlasmidSeeker on a set of simulated and real bacterial WGS samples, resulting in 100% sensitivity and 99.98% specificity. Conclusion PlasmidSeeker enables quick detection of known plasmids and complements existing tools that assemble plasmids de novo. The PlasmidSeeker source code is stored on GitHub: https://github.com/bioinfo-ut/PlasmidSeeker.
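
As a rough illustration of the general idea of comparing k-mer abundance between sequences, the toy sketch below counts k-mers in a set of reads and reports the median read-derived count over each reference's k-mers. This is not PlasmidSeeker's actual algorithm or code: the function names, the toy sequences, the choice of k = 4, and the use of a median are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_abundance(reference, read_counts, k):
    """Median read-derived count over the k-mers of a reference sequence."""
    return median(read_counts.get(km, 0) for km in kmers(reference, k))


# Toy data (hypothetical): the "plasmid" is present at three times the
# copy number of the "chromosome" in the simulated reads.
K = 4
chromosome = "ATGCGTACGTTAGC"
plasmid = "GGATCCTTAAGGCC"
reads = [chromosome] * 2 + [plasmid] * 6

counts = count_read_kmers(reads, K)
chrom_cov = median_abundance(chromosome, counts, K)
plasmid_cov = median_abundance(plasmid, counts, K)
print(chrom_cov, plasmid_cov, plasmid_cov / chrom_cov)  # 2 6 3.0
```

In this toy setup, a sequence whose k-mers are markedly more abundant than the chromosomal baseline stands out without any read assembly, which is the intuition the abstract describes.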

Complex Structural Variants Resolved by Short-Read and Long-Read Whole Genome Sequencing in Mendelian Disorders

10.1101/281683 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alba Sanchis-Juan ◽

Jonathan Stephens ◽

Courtney E French ◽

Nicholas Gleadall ◽

Karyn Mégy ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Genomic Variation ◽

Mendelian Disease ◽

Whole Genome ◽

Structural Variants ◽

Short Read ◽

Long Read ◽

Complex Structural

Abstract Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication-based and homology-based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show cxSVs are an important although rare cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.

A statistical framework for mapping risk genes from de novo mutations in whole-genome sequencing studies

10.1101/077578 ◽

2016 ◽

Author(s):

Yuwen Liu ◽

Yanyu Liang ◽

A. Ercument Cicek ◽

Zhongshan Li ◽

Jinchen Li ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

De Novo ◽

Genome Wide Association Studies ◽

Whole Genome ◽

Major Advance ◽

Risk Genes ◽

Coding Sequences ◽

Coding Regions ◽

Statistical Framework

Abstract Analysis of de novo mutations (DNMs) from sequencing data of nuclear families has identified risk genes for many complex diseases, including multiple neurodevelopmental and psychiatric disorders. Most of these efforts have focused on mutations in protein-coding sequences. Evidence from genome-wide association studies (GWAS) strongly suggests that variants important to human diseases often lie in non-coding regions. Extending DNM-based approaches to non-coding sequences is, however, challenging because the functional significance of non-coding mutations is difficult to predict. We propose a new statistical framework for analyzing DNMs from whole-genome sequencing (WGS) data. This method, TADA-Annotations (TADA-A), is a major advance of the TADA method we developed earlier for DNM analysis in coding regions. TADA-A is able to incorporate many functional annotations such as conservation and enhancer marks, learn from data which annotations are informative of pathogenic mutations and combine both coding and non-coding mutations at the gene level to detect risk genes. It also supports meta-analysis of multiple DNM studies, while adjusting for study-specific technical effects. We applied TADA-A to WGS data of ∼300 autism family trios across five studies, and discovered several new autism risk genes. The software is freely available for all research uses.

De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data

Genes ◽

10.3390/genes9100486 ◽

2018 ◽

Vol 9 (10) ◽

pp. 486 ◽

Cited By ~ 22

Author(s):

Adam Ameur ◽

Huiwen Che ◽

Marcel Martin ◽

Ignas Bunikis ◽

Johan Dahlberg ◽

...

Keyword(s):

Genome Sequencing ◽

De Novo Assembly ◽

De Novo ◽

Variant Calling ◽

Whole Genome Sequencing Data ◽

Personal Genome ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Population Scale

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

Chloroplast Genomes of Two Species of Cypripedium: Expanded Genome Size and Proliferation of AT-Biased Repeat Sequences

Frontiers in Plant Science ◽

10.3389/fpls.2021.609729 ◽

2021 ◽

Vol 12 ◽

Author(s):

Yan-Yan Guo ◽

Jia-Xing Yang ◽

Hong-Kun Li ◽

Hu-Sheng Zhao

Keyword(s):

Genome Size ◽

De Novo ◽

GC Content ◽

Single Copy ◽

Sequencing Data ◽

Short Read ◽

Coding Regions ◽

Repeat Sequences ◽

Sequencing Technologies ◽

Chloroplast Genomes

The size of the chloroplast genome (plastome) of autotrophic angiosperms is generally conserved. However, the chloroplast genomes of some lineages are greatly expanded, which may render assembling these genomes from short-read sequencing data more challenging. Here, we present the sequencing, assembly, and annotation of the chloroplast genomes of Cypripedium tibeticum and Cypripedium subtropicum. We assembled the chloroplast genomes of the two species de novo with a combination of short-read Illumina data and long-read PacBio data. The plastomes of the two species are characterized by expanded genome size, proliferated AT-rich repeat sequences, low GC content and gene density, as well as low substitution rates of the coding genes. The plastomes of C. tibeticum (197,815 bp) and C. subtropicum (212,668 bp) are substantially larger than those of the three species sequenced in previous studies. The plastome of C. subtropicum is the longest reported in Orchidaceae to date. Despite the increase in genome size, the gene order and gene number of the plastomes are conserved, with the exception of an ∼75 kb inversion in the large single copy (LSC) region shared by the two species. Most striking is the record-setting low GC content in C. subtropicum (28.2%). Moreover, the plastome expansion of the two species is strongly correlated with the proliferation of AT-biased non-coding regions: the non-coding content of C. subtropicum is in excess of 57%. The genus provides a typical example of plastome expansion induced by the expansion of non-coding regions. Considering the pros and cons of different sequencing technologies, we recommend hybrid assembly based on long and short reads for sequencing plastomes with AT-biased base composition.
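
GC content, as quoted above, is the fraction of bases in a sequence that are G or C. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions, not data or code from the study.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Toy AT-biased fragment (hypothetical, not taken from either plastome).
fragment = "ATATTAAGCATTTAAATCGA"
print(f"GC content: {gc_content(fragment):.1%}")  # GC content: 20.0%
```

Applied to a full assembled plastome sequence, the same ratio would yield a genome-wide figure comparable to the percentage reported in the abstract.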
