Accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION

2018 ◽  
Author(s):  
Arne De Roeck ◽  
Wouter De Coster ◽  
Liene Bossaerts ◽  
Rita Cacace ◽  
Tim De Pooter ◽  
...  

Abstract

Tandem repeats (TRs) can cause disease through their length, sequence motif interruptions, and nucleotide modifications. For many TRs, however, these features are very difficult, if not impossible, to assess, requiring low-throughput and labor-intensive assays. One example is a VNTR in ABCA7 for which we recently discovered that expanded alleles strongly increase risk of Alzheimer’s disease. Here, we investigated the potential of long-read whole genome sequencing to surmount these challenges, using the high-throughput PromethION platform from Oxford Nanopore Technologies. To overcome the limitations of conventional base calling and alignment, we developed an algorithm to study the TR size and sequence directly on raw PromethION current data.

We report the long-read sequencing of multiple human genomes (n = 11) using only a single sequencing run and flow cell per individual. With the use of fresh DNA extractions, DNA shearing to approximately 20 kb and size selection, we obtained an average output of 70 gigabases (Gb) per flow cell, corresponding to a 21x genome coverage, and a maximum yield of 98 Gb (30x genome coverage). All ABCA7 VNTR alleles, including expansions up to 10,000 bases, were spanned by long sequencing reads, validated by Southern blotting. Classical approaches of TR length estimation suffered from low accuracy, low precision, DNA strand effects and/or inability to call pathogenic repeat expansions. In contrast, our novel NanoSatellite algorithm, which circumvents base calling by using dynamic time warping on raw PromethION current data, achieved more than 90% accuracy and high precision (5.6% relative standard deviation) of TR length estimation, and detected all clinically relevant repeat expansions. In addition, we identified alternative TR sequence motifs with high consistency, allowing determination of TR sequence and distinction of VNTR alleles with homozygous length.

In conclusion, we validated the robustness of single-experiment whole genome long-read sequencing on PromethION, a prerequisite for application of long-read sequencing in the clinic. In addition, we outperformed Southern blotting, enabling improved characterization of the role of expanded ABCA7 VNTR alleles in Alzheimer’s disease, and opening new opportunities for TR research.
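The abstract above names dynamic time warping (DTW) on raw current data as the core idea. The following is a minimal sketch of that general technique only, not the NanoSatellite implementation: the function names `dtw_distance` and `estimate_repeat_units`, the toy signal values, and the "try k copies of a motif template and keep the best fit" strategy are all illustrative assumptions.

```python
from math import inf


def dtw_distance(signal, reference):
    """Classic DTW cost between two numeric sequences (absolute difference)."""
    n, m = len(signal), len(reference)
    # cost[i][j] = best cumulative cost aligning signal[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - reference[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # signal sample dwells on the same reference level
                cost[i][j - 1],      # reference level is skipped over quickly
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


def estimate_repeat_units(signal, motif_levels, max_units):
    """Hypothetical helper: pick the unit count whose tiled template fits best."""
    best_units, best_cost = None, inf
    for units in range(1, max_units + 1):
        template = list(motif_levels) * units
        c = dtw_distance(signal, template)
        if c < best_cost:
            best_units, best_cost = units, c
    return best_units


# Toy example: three motif units with uneven dwell times per current level.
# The numbers are made up and are not real picoampere measurements.
motif = [1, 5, 2]
raw = [1, 1, 5, 5, 5, 2, 1, 5, 2, 2, 1, 1, 5, 2]
print(estimate_repeat_units(raw, motif, max_units=6))
```

Because DTW tolerates variable dwell time at each current level, the sketch recovers the unit count without first converting the signal to bases, which is the property the abstract relies on. A real tool would additionally need signal normalization, flank detection, and a current-level model per k-mer, none of which is shown here.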

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Weili Cai ◽  
Schyler Nunziata ◽  
John Rascoe ◽  
Michael J. Stulberg

Abstract

Huanglongbing (HLB) is a worldwide deadly citrus disease caused by the phloem-limited bacterium ‘Candidatus Liberibacter asiaticus’ (CLas) vectored by Asian citrus psyllids. In order to effectively manage this disease, it is crucial to understand the relationship among the bacterial isolates from different geographical locations. Whole genome sequencing approaches will provide more precise molecular characterization of the diversity among populations. Due to the lack of in vitro culture, obtaining the whole genome sequence of CLas is still a challenge, especially for medium to low titer samples. Hundreds of millions of sequencing reads are needed to get good coverage of CLas from an HLB positive citrus sample. In order to overcome this limitation, we present here a new method, Agilent SureSelect XT HS target enrichment, which can specifically enrich CLas from a metagenomic sample while greatly reducing cost and increasing whole genome coverage of the pathogen. In this study, the CLas genome was successfully sequenced with 99.3% genome coverage and over 72X sequencing coverage from low titer tissue samples (equivalent to 28.52 Cq using Li 16S qPCR). More importantly, this method also effectively captures regions of diversity in the CLas genome, which provides precise molecular characterization of different strains.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Arne De Roeck ◽  
Wouter De Coster ◽  
Liene Bossaerts ◽  
Rita Cacace ◽  
Tim De Pooter ◽  
...  

Abstract

Technological limitations have hindered the large-scale genetic investigation of tandem repeats in disease. We show that long-read sequencing with a single Oxford Nanopore Technologies PromethION flow cell per individual achieves 30× human genome coverage and enables accurate assessment of tandem repeats including the 10,000-bp Alzheimer’s disease-associated ABCA7 VNTR. The Guppy “flip-flop” base caller and tandem-genotypes tandem repeat caller are efficient for large-scale tandem repeat assessment, but base calling and alignment challenges persist. We present NanoSatellite, which analyzes tandem repeats directly on electric current data and improves calling of GC-rich tandem repeats, expanded alleles, and motif interruptions.


2018 ◽  
Author(s):  
Mark T. W. Ebbert ◽  
Stefan Farrugia ◽  
Jonathon Sens ◽  
Karen Jansen-West ◽  
Tania F. Gendron ◽  
...  

Abstract

Background: Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9.

Results: Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8x coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800x coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions.

Conclusions: Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.


Foods ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 2637
Author(s):  
Jolien D’aes ◽  
Marie-Alice Fraiture ◽  
Bert Bogaerts ◽  
Sigrid C. J. De Keersmaecker ◽  
Nancy H. C. Roosens ◽  
...  

Despite their presence being unauthorized on the European market, contaminations with genetically modified (GM) microorganisms have repeatedly been reported in diverse commercial microbial fermentation produce types. Several of these contaminations are related to a GM Bacillus velezensis used to synthesize a food enzyme protease, for which genomic characterization remains currently incomplete, and it is unknown whether these contaminations have a common origin. In this study, GM B. velezensis isolates from multiple food enzyme products were characterized by short- and long-read whole-genome sequencing (WGS), demonstrating that they harbor a free recombinant pUB110-derived plasmid carrying antimicrobial resistance genes. Additionally, single-nucleotide polymorphism (SNP) and whole-genome based comparative analyses showed that the isolates likely originate from the same parental GM strain. This study highlights the added value of a hybrid WGS approach for accurate genomic characterization of GMM (e.g., genomic location of the transgenic construct), and of SNP-based phylogenomic analysis for source-tracking of GMM.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Readman Chiu ◽  
Indhu-Shree Rajan-Babu ◽  
Jan M. Friedman ◽  
Inanc Birol

Abstract

Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.


2021 ◽  
Author(s):  
Xinzhuang Yang ◽  
Dingding Zhang ◽  
Pidong Li ◽  
Jingwen Niu ◽  
Dan Xu ◽  
...  

Abstract

Oculopharyngodistal myopathy (OPDM) is an adult-onset degenerative muscle disorder characterized by ptosis, ophthalmoplegia and weakness of the facial, pharyngeal and limb muscles. Trinucleotide repeat expansions in non-coding regions of LRP12, GIPC1 and NOTCH2NLC were recently reported to be the etiologies for OPDM. However, a significant portion of OPDM patients still have unknown genetic causes. In this study, we performed long-read whole-genome sequencing in a large five-generation family of 156 individuals, including 22 patients diagnosed with typical OPDM, and identified CGG repeat expansions in the RILPL1 gene in all patients we tested but not in unaffected family members. Methylation analysis indicated that methylation levels of the RILPL1 gene were unaltered in OPDM patients, which was consistent with previous reports. Our findings provide the first evidence that RILPL1 is associated with OPDM, which we suggest designating OPDM type 4.


Neurology ◽  
2021 ◽  
DOI: 10.1212/WNL.0000000000012466
Author(s):  
Yun Su ◽  
Liyuan Fan ◽  
Changhe Shi ◽  
Tai Wang ◽  
Huimin Zheng ◽  
...  

Neurodegenerative diseases exhibit chronic progressive lesions in the central and peripheral nervous systems with unclear causes. The search for pathogenic mutations in human neurodegenerative diseases has benefited from massively parallel short-read sequencers. However, some genomic regions, including repetitive elements and especially those with high or low GC content, are far beyond the capability of conventional approaches. Recently, long-read single-molecule DNA sequencing technologies have emerged and enabled researchers to study genomes, transcriptomes, and metagenomes at unprecedented resolutions. The identification of novel mutations in unresolved neurodegenerative disorders, the characterization of causative repeat expansions, and the direct detection of epigenetic modifications on native DNA by virtue of long-read sequencers will further expand our understanding of neurodegenerative diseases. In this paper, we review and compare two prevailing long-read sequencing technologies, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), and discuss their applications in neurodegenerative diseases.


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0260837
Author(s):  
Eric D. Wieben ◽  
Ross A. Aleff ◽  
Tommy A. Rinkoski ◽  
Keith H. Baratz ◽  
Shubham Basu ◽  
...  

Expansion of CTG trinucleotide repeats (TNR) in the transcription factor 4 (TCF4) gene is highly associated with Fuchs Endothelial Corneal Dystrophy (FECD). Due to limitations in the availability of DNA from diseased corneal endothelium, sizing of CTG repeats in FECD patients has typically been determined using DNA samples isolated from peripheral blood leukocytes. However, it is not feasible to extract enough DNA from surgically isolated FECD corneal endothelial tissue to determine repeat length based on current technology. To circumvent this issue, total RNA was isolated from FECD corneal endothelium and sequenced using long-read sequencing. DNA samples isolated from primary cultures of corneal endothelium from these same affected individuals were also assessed by Southern blotting. Both long-read sequencing and Southern blot analysis showed significantly longer CTG TNR expansion (>1000 repeats) in the corneal endothelium from FECD patients than those characterized in leukocytes from the same individuals (<90 repeats). Our findings suggest that the TCF4 CTG repeat expansions in the FECD corneal endothelium are much longer than those found in leukocytes.


Author(s):  
Giulia Ascari ◽  
Nanna D. Rendtorff ◽  
Marieke De Bruyne ◽  
Julie De Zaeytijd ◽  
Michel Van Lint ◽  
...  

Inactivating variants as well as a missense variant in the centrosomal CEP78 gene have been identified in autosomal recessive cone-rod dystrophy with hearing loss (CRDHL), a rare syndromic inherited retinal disease distinct from Usher syndrome. Apart from this, a complex structural variant (SV) implicating CEP78 has been reported in CRDHL. Here we aimed to expand the genetic architecture of typical CRDHL by the identification of complex SVs of the CEP78 region and characterization of their underlying mechanisms. Approaches used for the identification of the SVs are shallow whole-genome sequencing (sWGS) combined with quantitative polymerase chain reaction (PCR) and long-range PCR, or ExomeDepth analysis on whole-exome sequencing (WES) data. Targeted or whole-genome nanopore long-read sequencing (LRS) was used to delineate breakpoint junctions at the nucleotide level. For all SV cases, the effect of the SVs on CEP78 expression was assessed using quantitative PCR on patient-derived RNA. Apart from two novel canonical CEP78 splice variants and a frameshifting single-nucleotide variant (SNV), two SVs affecting CEP78 were identified in three unrelated individuals with CRDHL: a heterozygous total gene deletion of 235 kb and a partial gene deletion of 15 kb in a heterozygous and homozygous state, respectively. Assessment of the molecular consequences of the SVs on patient material displayed a loss-of-function effect. Delineation and characterization of the 15-kb deletion using targeted LRS revealed the previously described complex CEP78 SV, suggestive of a recurrent genomic rearrangement. A founder haplotype was demonstrated for the latter SV in cases of Belgian and British origin, respectively. The novel 235-kb deletion was delineated using whole-genome LRS. Breakpoint analysis showed microhomology and pointed to a replication-based underlying mechanism. 
Moreover, data mining of bulk and single-cell human and mouse transcriptional datasets, together with CEP78 immunostaining on human retina, linked the CEP78 expression domain with its phenotypic manifestations. Overall, this study supports that the CEP78 locus is prone to distinct SVs and that SV analysis should be considered in a genetic workup of CRDHL. Finally, it demonstrated the power of sWGS and both targeted and whole-genome LRS in identifying and characterizing complex SVs in patients with ocular diseases.


2021 ◽  
Vol 12 ◽  
Author(s):  
Valentina Grosso ◽  
Luca Marcolungo ◽  
Simone Maestri ◽  
Massimiliano Alfano ◽  
Denise Lavezzari ◽  
...  

Traditional methods for the analysis of repeat expansions, which underlie genetic disorders, such as fragile X syndrome (FXS), lack single-nucleotide resolution in repeat analysis and the ability to characterize causative variants outside the repeat array. These drawbacks can be overcome by long-read and short-read sequencing, respectively. However, the routine application of next-generation sequencing in the clinic requires target enrichment, and none of the available methods allows parallel analysis of long DNA fragments using both sequencing technologies. In this study, we investigated the use of indirect sequence capture (Xdrop technology) coupled to Nanopore and Illumina sequencing to characterize FMR1, the gene responsible for FXS. We achieved the efficient enrichment (> 200×) of large target DNA fragments (~60–80 kbp) encompassing the entire FMR1 gene. The analysis of Xdrop-enriched samples by Nanopore long-read sequencing allowed the complete characterization of repeat lengths in samples with normal, pre-mutation, and full mutation status (> 1 kbp), and correctly identified repeat interruptions relevant for disease prognosis and transmission. Single-nucleotide variants (SNVs) and small insertions/deletions (indels) could be detected in the same samples by Illumina short-read sequencing, completing the mutational testing through the identification of pathogenic variants within the FMR1 gene, when no typical CGG repeat expansion is detected. The study successfully demonstrated the parallel analysis of repeat expansions and SNVs/indels in the FMR1 gene at single-nucleotide resolution by combining Xdrop enrichment with two next-generation sequencing approaches. With the appropriate optimization necessary for the clinical settings, the system could both facilitate the study of genotype–phenotype correlation in FXS and enable more efficient diagnosis and genetic counseling for patients and their relatives.

