Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease

Abstract

Background: Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9.

Results: Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8x coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800x coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions.

Conclusions: Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions and to discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
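The two quantities reported in the Results (number of repeat units and share of the repeat region that is G4C2) can be illustrated with a small sketch. This is not the authors' pipeline; the function names and the toy sequences below are invented for illustration, and real long reads would additionally need error-tolerant matching because of sequencing errors.

```python
import re

MOTIF = "GGGGCC"  # the G4C2 hexanucleotide unit named in the abstract


def longest_repeat_run(seq: str, motif: str = MOTIF) -> int:
    """Return the largest number of back-to-back exact copies of `motif` in `seq`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def motif_fraction(region: str, motif: str = MOTIF) -> float:
    """Fraction of bases in `region` covered by non-overlapping exact motif copies."""
    if not region:
        return 0.0
    copies = len(re.findall(re.escape(motif), region.upper()))
    return copies * len(motif) / len(region)


# Toy read (invented): short flank, eight repeat units, short flank.
toy_read = "ACGT" + MOTIF * 8 + "TTGA"
print(longest_repeat_run(toy_read))

# Toy repeat region (invented) with a single interrupted unit, GGGTCC.
toy_region = MOTIF * 5 + "GGGTCC" + MOTIF * 4
print(round(motif_fraction(toy_region), 2))
```

Exact string matching is the simplest possible choice here; it undercounts whenever a read contains a substitution or indel inside the repeat, which is one reason an estimate such as ">99% G4C2 content" cannot by itself rule out small interruptions.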


Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases

Journal of Human Genetics, 2018, Vol 64 (3), pp. 191-197. DOI: 10.1038/s10038-018-0551-7. Cited By ~ 12

Author(s): Takeshi Mizuguchi, Tomoko Toyota, Hiroaki Adachi, Noriko Miyake, Naomichi Matsumoto, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Repeat Expansion, Whole Genome, Smrt Sequencing, Long Read


Whole-Genome Sequencing of a Human Clinical Isolate of the Novel Species Klebsiella quasivariicola sp. nov

Genome Announcements, 2017, Vol 5 (42). DOI: 10.1128/genomea.01057-17. Cited By ~ 15

Author(s): S. Wesley Long, Sarah E. Linson, Matthew Ojeda Saavedra, Concepcion Cantu, James J. Davis, ...

Keyword(s): Whole Genome Sequencing, Clinical Isolate, Genome Sequencing, Novel Species, Whole Genome, The Novel, Short Read, Oxford Nanopore, Long Read

ABSTRACT In a study of 1,777 Klebsiella strains, we discovered KPN1705, which was distinct from all recognized Klebsiella spp. We closed the genome of strain KPN1705 using a hybrid of Illumina short-read and Oxford Nanopore long-read technologies. For this novel species, we propose the name Klebsiella quasivariicola sp. nov.


Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences

Genome Biology, 2021, Vol 22 (1). DOI: 10.1186/s13059-021-02447-3

Author(s): Readman Chiu, Indhu-Shree Rajan-Babu, Jan M. Friedman, Inanc Birol

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Tandem Repeat, Neurological Disorders, Software Tool, Whole Genome Sequencing Data, Whole Genome, Sequencing Data, Long Read, Repeat Expansions

Abstract: Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.


Computational Methods for Chromosome-Scale Haplotype Reconstruction

2021. DOI: 10.20944/preprints202101.0116.v1

Author(s): Shilpa Garg

Keyword(s): Genetic Variation, Whole Genome, Haplotype Reconstruction, High Quality, Short Read, Short Read Sequencing, Sequencing Technologies, Evolutionary Studies, Long Read, Haplotype Information

High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information that spans whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review recent methodological progress in these areas and discuss perspectives that could enable routine high-quality haplotype reconstruction in clinical and evolutionary studies.


Computational methods for chromosome-scale haplotype reconstruction

Genome Biology, 2021, Vol 22 (1). DOI: 10.1186/s13059-021-02328-9

Author(s): Shilpa Garg

Keyword(s): Genetic Variation, Computational Methods, Whole Genome, Haplotype Reconstruction, High Quality, Short Read, Short Read Sequencing, Sequencing Technologies, Long Read, Haplotype Information

Abstract: High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.


Deciphering complex genome rearrangements in C. elegans using short-read whole genome sequencing

Scientific Reports, 2021, Vol 11 (1). DOI: 10.1038/s41598-021-97764-9

Author(s): Tatiana Maroilley, Xiao Li, Matthew Oldach, Francesca Jean, Susan J. Stasiuk, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Chromosomal Rearrangements, Large Deletion, Genomic Rearrangements, Model Organisms, Whole Genome, Short Read, C Elegans, Long Read

Abstract: Genomic rearrangements cause congenital disorders, cancer, and complex diseases in humans. Yet, they are still understudied in rare diseases because their detection is challenging, despite the advent of whole genome sequencing (WGS) technologies. Short-read (srWGS) and long-read WGS approaches are regularly compared, and the latter is commonly recommended in studies focusing on genomic rearrangements. However, srWGS is currently the most economical, accurate, and widely supported technology. In Caenorhabditis elegans (C. elegans), such variants, induced by various mutagenesis processes, have been used for decades to balance large genomic regions by preventing chromosomal crossover events and allowing the maintenance of lethal mutations. Interestingly, those chromosomal rearrangements have rarely been characterized on a molecular level. To evaluate the ability of srWGS to detect various types of complex genomic rearrangements, we sequenced three balancer strains using short-read Illumina technology. As we experimentally validated the breakpoints uncovered by srWGS, we showed that, by combining several types of analyses, srWGS enables the detection of a reciprocal translocation (eT1), a free duplication (sDp3), a large deletion (sC4), and chromoanagenesis events. Thus, applying srWGS to decipher real complex genomic rearrangements in model organisms may help in designing efficient bioinformatics pipelines with systematic detection of complex rearrangements in human genomes.


Complex Structural Variants Resolved by Short-Read and Long-Read Whole Genome Sequencing in Mendelian Disorders

2018. DOI: 10.1101/281683. Cited By ~ 2

Author(s): Alba Sanchis-Juan, Jonathan Stephens, Courtney E French, Nicholas Gleadall, Karyn Mégy, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, De Novo, Genomic Variation, Mendelian Disease, Whole Genome, Structural Variants, Short Read, Long Read, Complex Structural

Abstract: Complex structural variants (cxSVs) are genomic rearrangements comprising multiple structural variants, typically involving three or more breakpoint junctions. They contribute to human genomic variation and can cause Mendelian disease; however, they are not typically considered during genetic testing. Here, we investigate the role of cxSVs in Mendelian disease using short-read whole genome sequencing (WGS) data from 1,324 individuals with neurodevelopmental or retinal disorders from the NIHR BioResource project. We present four cases of individuals with a cxSV affecting Mendelian disease-associated genes. Three of the cxSVs are pathogenic: a de novo duplication-inversion-inversion-deletion affecting ARID1B in an individual with Coffin-Siris syndrome, a deletion-inversion-duplication affecting HNRNPU in an individual with intellectual disability and seizures, and a homozygous deletion-inversion-deletion affecting CEP78 in an individual with cone-rod dystrophy. Additionally, we identified a de novo duplication-inversion-duplication overlapping CDKL5 in an individual with neonatal hypoxic-ischaemic encephalopathy. Long-read sequencing technology used to resolve the breakpoints demonstrated the presence of both a disrupted and an intact copy of CDKL5 on the same allele; therefore, it was classified as a variant of uncertain significance. Analysis of sequence flanking all breakpoint junctions in all the cxSVs revealed both microhomology and longer repetitive sequences, suggesting both replication and homology based processes. Accurate resolution of cxSVs is essential for clinical interpretation, and here we demonstrate that long-read WGS is a powerful technology by which to achieve this. Our results show cxSVs are an important although rare cause of Mendelian disease, and we therefore recommend their consideration during research and clinical investigations.


Whole genome sequencing for diagnosis of neurological repeat expansion disorders

2020. DOI: 10.1101/2020.11.06.371716

Author(s): Kristina Ibanez, James Polke, Tanner Hagelstrom, Egor Dolzhenko, Dorota Pasko, ...

Keyword(s): Family History, Whole Genome Sequencing, Genome Sequencing, Neurological Disorders, Repeat Expansion, Whole Genome, Dna Repeats, Paediatric Patients, Health And Social Care, Repeat Expansions

Abstract

Background: Repeat expansion (RE) disorders affect ~1 in 3000 individuals and are clinically heterogeneous diseases caused by expansions of short tandem DNA repeats. Genetic testing is often locus-specific, resulting in underdiagnosis of atypical clinical presentations, especially in paediatric patients without a prior positive family history. Whole genome sequencing (WGS) is emerging as a first-line test for rare genetic disorders, but until recently REs were thought to be undetectable by this approach.

Methods: WGS pipelines for RE disorder detection were deployed by the 100,000 Genomes Project and Illumina Clinical Services Laboratory. Performance was retrospectively assessed across the 13 most common neurological RE loci using 793 samples with prior orthogonal testing (182 with expanded alleles and 611 with alleles within normal size) and prospectively interrogated in 13,331 patients with suspected genetic neurological disorders.

Findings: WGS RE detection showed minimum 97.3% sensitivity and 99.6% specificity across all 13 disease-associated loci. Applying the pipeline to patients from the 100,000 Genomes Project identified pathogenic repeat expansions which were confirmed in 69 patients, including seven paediatric patients with no reported family history of RE disorders, with a 0.09% false positive rate.

Interpretation: We show here for the first time that WGS enables the detection of causative repeat expansions with high sensitivity and specificity, and that it can be used to resolve previously undiagnosed neurological disorders. This includes children with no prior suspicion of an RE disorder. These findings are leading to diagnostic implementation of this analytical pipeline in the NHS Genomic Medicine Centres in England.

Funding: Medical Research Council, Department of Health and Social Care, National Health Service England, National Institute for Health Research, Illumina Inc.
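For readers less familiar with the performance measures quoted in the Findings, the sketch below shows how sensitivity, specificity and false positive rate follow from a confusion matrix. The counts used in the example are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the study's data, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly expanded alleles that the test calls expanded."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive samples")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly normal alleles that the test calls normal."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative samples")
    return true_neg / total


def false_positive_rate(true_neg: int, false_pos: int) -> float:
    """Complement of specificity."""
    return 1.0 - specificity(true_neg, false_pos)


# Hypothetical counts for a single locus (NOT taken from the study).
tp, fn, tn, fp = 90, 10, 195, 5
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
print(f"false positive rate = {false_positive_rate(tn, fp):.3f}")
```

When such measures are computed per locus, a statement like "minimum sensitivity across all loci" corresponds to taking the smallest per-locus value.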


Comprehensive analysis of GBA using a novel algorithm for Illumina whole-genome sequence data or targeted Nanopore sequencing

2021. DOI: 10.1101/2021.11.12.21266253

Author(s): Marco Toffoli, Xiao Chen, Fritz J Sedlazeck, Chiao-Yin Lee, Stephen Mullin, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Copy Number, Sequence Data, Whole Genome Sequence, Whole Genome Sequencing Data, Whole Genome, Short Read, Increased Risk, Long Read

GBA variants cause the autosomal recessive Gaucher disease, and carriers are at increased risk of Parkinson disease (PD) and Lewy body dementia (LBD). The presence of a highly homologous nearby pseudogene (GBAP1) predisposes to a range of structural variants arising from either gene conversion or reciprocal recombination, the latter resulting in copy number gains or losses, complicating genetic testing and analysis. To date, short-read sequencing has not been able to fully resolve these or other variants in the key homology region, and targeted long-read sequencing has not previously resolved reciprocal recombinants. We present and validate two independent methods to resolve recombinant alleles and other variants in GBA: Gauchian, a novel bioinformatics tool for short-read, whole-genome sequencing data analysis, and Oxford Nanopore long-read sequencing after enrichment with appropriate PCR. The methods were concordant for 42 samples including 30 with a range of recombinants and GBAP1-related mutations, and Gauchian outperforms the GATK Best Practices pipeline. Applying Gauchian to Illumina sequencing of over 10,000 individuals from publicly available cohorts shows that copy number variants (CNVs) spanning GBAP1 are relatively common in Africans. CNV frequencies in PD and LBD are similar to controls, but gains may coexist with other mutations in patients, and a modifying effect cannot be excluded. Gauchian detects a higher frequency of GBA variants in LBD than PD, especially severe ones. These findings highlight the importance of accurate GBA mutation detection in these patients, which is possible by either Gauchian analysis of short-read whole genome sequencing, or targeted long-read sequencing.


Potential of whole-genome sequencing-based pharmacogenetic profiling

Pharmacogenomics, 2021. DOI: 10.2217/pgs-2020-0155

Author(s): Sylvan Manuel Caspar, Timo Schneider, Patricia Stoll, Janine Meienberg, Gabor Matyas

Keyword(s): Next Generation Sequencing, Whole Genome Sequencing, Genome Sequencing, Whole Genome, Next Generation, Short Read, Long Read, Improving Patient Care, Gene Diagnostics, Generation Sequencing

Pharmacogenetics represents a major driver of precision medicine, promising individualized drug selection and dosing. Traditionally, pharmacogenetic profiling has been performed using targeted genotyping that focuses on common/known variants. Recently, whole-genome sequencing (WGS) has been emerging as a more comprehensive short-read next-generation sequencing approach, enabling both gene diagnostics and pharmacogenetic profiling, including rare/novel variants, in a single assay. Using the example of the pharmacogene CYP2D6, we demonstrate the potential of WGS-based pharmacogenetic profiling and emphasize the limitations of short-read next-generation sequencing. In the near future, we envision a shift toward long-read sequencing as the predominant method for gene diagnostics and pharmacogenetic profiling, providing unprecedented data quality and improving patient care.
