Nanopore sequencing reveals endogenous NMD-targeted isoforms in human cells

Background: Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. Results: To identify and analyze endogenous targets of NMD, we applied cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identified, most derive from alternative exon usage. The isoform-aware analysis revealed many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3΄UTR and, for those mRNAs with a termination codon in the last exon, the length of the 3΄UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, though the main function of NMD still seems to be ridding the transcriptome of isoforms resulting from spurious splicing events. Conclusions: Long-read sequencing enabled the identification of many novel NMD-sensitive mRNAs and revealed both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.

Download Full-text

Nanopore sequencing reveals endogenous NMD-targeted isoforms in human cells

Genome Biology ◽

10.1186/s13059-021-02439-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Evangelos D. Karousis ◽

Foivos Gypas ◽

Mihaela Zavolan ◽

Oliver Mühlemann

Keyword(s):

Regulation Of Gene Expression ◽

Degradation Pathway ◽

Human Cells ◽

Full Length ◽

Main Function ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Short Read ◽

Expression Levels ◽

Short Read Sequencing

Abstract Background Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. Results To identify and analyze endogenous targets of NMD, we apply cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identifies, most derive from alternative exon usage. The isoform-aware analysis reveals many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3΄UTR and, for those mRNAs with a termination codon in the last exon, the length of the 3΄UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, though the main function of NMD seems to be ridding the transcriptome of isoforms resulting from spurious splicing events. Conclusions Long-read sequencing enables the identification of many novel NMD-sensitive mRNAs and reveals both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.

Download Full-text

High resolution copy number inference in cancer using short-molecule nanopore sequencing

10.1101/2020.12.28.424602 ◽

2020 ◽

Author(s):

Timour Baslan ◽

Sam Kovaka ◽

Fritz J. Sedlazeck ◽

Yanming Zhang ◽

Robert Wappel ◽

...

Keyword(s):

Copy Number ◽

Cost Effective ◽

Chromosome Analysis ◽

Ease Of Use ◽

Precision Oncology ◽

Nanopore Sequencing ◽

Dna Molecules ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing

ABSTRACTGenome copy number is an important source of genetic variation in health and disease. In cancer, clinically actionable Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in terms of the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms. This represents a challenge for applications that require high read counts such as CNA inference. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that sequencing short DNA molecules reproducibly returns high read counts and allows high quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data shows that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost effective and expeditious manner, including for multiplex samples. Our results provide a framework for the sequencing of relatively short DNA molecules on nanopore devices with applications in research and medicine, that include but are not limited to, CNAs.

Download Full-text

Detection of Clinically Relevant Molecular Alterations in Chronic Lymphocytic Leukemia (CLL) By Nanopore Sequencing

Blood ◽

10.1182/blood-2018-99-110948 ◽

2018 ◽

Vol 132 (Supplement 1) ◽

pp. 1847-1847 ◽

Cited By ~ 1

Author(s):

Adam Burns ◽

David Robert Bruce ◽

Pauline Robbe ◽

Adele Timbs ◽

Basile Stamatopoulos ◽

...

Keyword(s):

Error Correction ◽

Low Cost ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Mutation Status ◽

Short Read ◽

Short Read Sequencing ◽

Oxford Nanopore ◽

Low Coverage ◽

Oxford Nanopore Technologies

Abstract Introduction Chronic Lymphocytic Leukaemia (CLL) is the most prevalent leukaemia in the Western world and characterised by clinical heterogeneity. IgHV mutation status, mutations in the TP53 gene and deletions of the p-arm of chromosome 17 are currently used to predict an individual patient's response to therapy and give an indication as to their long-term prognosis. Current clinical guidelines recommend screening patients prior to initial, and any subsequent, treatment. Routine clinical laboratory practices for CLL involve three separate assays, each of which are time-consuming and require significant investment in equipment. Nanopore sequencing offers a rapid, low-cost alternative, generating a full prognostic dataset on a single platform. In addition, Nanopore sequencing also promises low failure rates on degraded material such as FFPE and excellent detection of structural variants due to long read length of sequencing. Importantly, Nanopore technology does not require expensive equipment, is low-maintenance and ideal for patient-near testing, making it an attractive DNA sequencing device for low-to-middle-income countries. Methods Eleven untreated CLL samples were selected for the analysis, harbouring both mutated (n=5) and unmutated (n=6) IgHV genes, seven TP53 mutations (five missense, one stop gain and one frameshift) and two del(17p) events. Primers were designed to amplify all exons of TP53, along with the IgHV locus, and each primer included universal tails for individual sample barcoding. The resulting PCR amplicons were prepared for sequencing using a ligation sequencing kit (SQK-LSK108, Oxford Nanopore Technologies, Oxford, UK). All IgHV libraries were pooled and sequenced on one R9.4 flowcell, with the TP53 libraries pooled and sequenced on a second R9.4 flowcell. Whole genome libraries were prepared from 400ng genomic DNA for each sample using a rapid sequencing kit (SQK-RAD004, Oxford Nanopore Technologies, Oxford, UK), and each sample sequenced on individual flowcells on a MinION mk1b instrument (Oxford Nanopore Technologies, Oxford, UK). We developed a bespoke bioinformatics pipeline to detect copy-number changes, TP53 mutations and IgHV mutation status from the Nanopore sequencing data. Results were compared to short-read sequencing data obtained earlier by targeted deep sequencing (MiSeq, Illumina Inc, San Diego, CA, USA) and whole genome sequencing (HiSeq 2500, Illumina Inc, San Diego CA, USA). Results Following basecalling and adaptor trimming, the raw data were submitted to the IMGT database. In the absence of error correction, it was possible to identify the correct VH family for each sample; however the germline homology was not sufficient to differentiate between IgHVmut and IgHVunmut CLL cases. Following bio-informatic error correction and consensus building, the percentage to germline homology was the same as that obtained from short-read sequencing and nanopore sequencing also called the same productive rearrangements in all cases. A total of 77 TP53 variants were identified, including 68 in non-coding regions, and three synonymous SNVs. The remaining 6 were predicted to be functional variants (eight missense and two stop-gains) and had all been identified in early MiSeq targeted sequencing. However, the frameshift mutation was not called by the analysis pipeline, although it is present in the aligned reads. Using the low-coverage WGS data, we were able to identify del(17p) events, of 19Mb and 20Mb length, in both patients with high confidence. Conclusions Here we demonstrate that characterization of the IgHV locus in CLL cases is possible using the MinION platform, provided sufficient downstream analysis, including error correction, is applied. Furthermore, somatic SNVs in TP53 can be identified, although similar to second generation sequencing, variant calling of small insertions and deletions is more problematic. Identification of del(17p) is possible from low-coverage WGS on the MinION and is inexpensive. Our data demonstrates that Nanopore sequencing can be a viable, patient-near, low-cost alternative to established screening methods, with the potential of diagnostic implementation in resource-poor regions of the world. Disclosures Schuh: Giles, Roche, Janssen, AbbVie: Honoraria.

Download Full-text

EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data

Genome Biology ◽

10.1186/gb-2011-12-5-r44 ◽

2011 ◽

Vol 12 (5) ◽

Cited By ~ 232

Author(s):

Christopher S Miller ◽

Brett J Baker ◽

Brian C Thomas ◽

Steven W Singer ◽

Jillian F Banfield

Keyword(s):

Microbial Community ◽

Ribosomal Genes ◽

Full Length ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing

Download Full-text

High resolution copy number inference in cancer using short-molecule nanopore sequencing

Nucleic Acids Research ◽

10.1093/nar/gkab812 ◽

2021 ◽

Author(s):

Timour Baslan ◽

Sam Kovaka ◽

Fritz J Sedlazeck ◽

Yanming Zhang ◽

Robert Wappel ◽

...

Keyword(s):

Copy Number ◽

Cost Effective ◽

Chromosome Analysis ◽

Ease Of Use ◽

Precision Oncology ◽

Nanopore Sequencing ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Platforms

Abstract Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference accuracy. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data shows that, compared to traditional approaches such as chromosome analysis/cytogenetics, short molecule nanopore sequencing returns more sensitive, accurate copy number information in a cost effective and expeditious manner, including for multiplex samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, which includes but is not limited to, CNAs.

Download Full-text

Biosynthetic potential of uncultured Antarctic soil bacteria revealed through long-read metagenomic sequencing

The ISME Journal ◽

10.1038/s41396-021-01052-3 ◽

2021 ◽

Author(s):

Valentin Waschulin ◽

Chiara Borsetto ◽

Robert James ◽

Kevin K. Newsham ◽

Stefano Donadio ◽

...

Keyword(s):

Genome Mining ◽

Gene Clusters ◽

Biosynthetic Gene Cluster ◽

Full Length ◽

Metagenomic Sequencing ◽

Short Read ◽

Short Read Sequencing ◽

Rich Diversity ◽

Long Read ◽

The Rich

AbstractThe growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological restraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were not only found in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.

Download Full-text

REscan: inferring repeat expansions and structural variation in paired-end short read sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa753 ◽

2020 ◽

Author(s):

Russell Lewis McLaughlin

Keyword(s):

Structural Variation ◽

Sequence Data ◽

Neurological Diseases ◽

Repeat Expansion ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Repeat Expansions ◽

Paired End Sequencing

Abstract Motivation Repeat expansions are an important class of genetic variation in neurological diseases. However, the identification of novel repeat expansions using conventional sequencing methods is a challenge due to their typical lengths relative to short sequence reads and difficulty in producing accurate and unique alignments for repetitive sequence. However, this latter property can be harnessed in paired-end sequencing data to infer the possible locations of repeat expansions and other structural variation. Results This article presents REscan, a command-line utility that infers repeat expansion loci from paired-end short read sequencing data by reporting the proportion of reads orientated towards a locus that do not have an adequately mapped mate. A high REscan statistic relative to a population of data suggests a repeat expansion locus for experimental follow-up. This approach is validated using genome sequence data for 259 cases of amyotrophic lateral sclerosis, of which 24 are positive for a large repeat expansion in C9orf72, showing that REscan statistics readily discriminate repeat expansion carriers from non-carriers. Availabilityand implementation C source code at https://github.com/rlmcl/rescan (GNU General Public Licence v3).

Download Full-text

Rapid Mycobacterium tuberculosis spoligotyping from uncorrected long reads using Galru

10.1101/2020.05.31.126490 ◽

2020 ◽

Author(s):

Andrew J. Page ◽

Nabil-Fareed Alikhan ◽

Michael Strinden ◽

Thanh Le Viet ◽

Timofey Skvortsov

Keyword(s):

Mycobacterium Tuberculosis ◽

State Of The Art ◽

Sequence Data ◽

Human Pathogen ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Long Reads ◽

Long Read

AbstractSpoligotyping of Mycobacterium tuberculosis provides a subspecies classification of this major human pathogen. Spoligotypes can be predicted from short read genome sequencing data; however, no methods exist for long read sequence data such as from Nanopore or PacBio. We present a novel software package Galru, which can rapidly detect the spoligotype of a Mycobacterium tuberculosis sample from as little as a single uncorrected long read. It allows for near real-time spoligotyping from long read data as it is being sequenced, giving rapid sample typing. We compare it to the existing state of the art software and find it performs identically to the results obtained from short read sequencing data. Galru is freely available from https://github.com/quadram-institute-bioscience/galru under the GPLv3 open source licence.

Download Full-text

A Transposon Story: From TE Content to TE Dynamic Invasion of Drosophila Genomes Using the Single-Molecule Sequencing Technology from Oxford Nanopore

Cells ◽

10.3390/cells9081776 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1776

Author(s):

Mourdas Mohamed ◽

Nguyet Thi-Minh Dang ◽

Yuki Ogyama ◽

Nelly Burlet ◽

Bruno Mugat ◽

...

Keyword(s):

Single Molecule ◽

Wild Type ◽

Sequencing Data ◽

Short Read ◽

Short Read Sequencing ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

In The Wild ◽

Successive Generations ◽

Type Strains

Transposable elements (TEs) are the main components of genomes. However, due to their repetitive nature, they are very difficult to study using data obtained with short-read sequencing technologies. Here, we describe an efficient pipeline to accurately recover TE insertion (TEI) sites and sequences from long reads obtained by Oxford Nanopore Technology (ONT) sequencing. With this pipeline, we could precisely describe the landscapes of the most recent TEIs in wild-type strains of Drosophila melanogaster and Drosophila simulans. Their comparison suggests that this subset of TE sequences is more similar than previously thought in these two species. The chromosome assemblies obtained using this pipeline also allowed recovering piRNA cluster sequences, which was impossible using short-read sequencing. Finally, we used our pipeline to analyze ONT sequencing data from a D. melanogaster unstable line in which LTR transposition was derepressed for 73 successive generations. We could rely on single reads to identify new insertions with intact target site duplications. Moreover, the detailed analysis of TEIs in the wild-type strains and the unstable line did not support the trap model claiming that piRNA clusters are hotspots of TE insertions.

Download Full-text

Closed Genome Sequence of Salmonella enterica Serovar Richmond Strain CFSAN000191, Obtained with Nanopore Sequencing

Microbiology Resource Announcements ◽

10.1128/mra.01472-18 ◽

2018 ◽

Vol 7 (23) ◽

Cited By ~ 3

Author(s):

Narjol González-Escalona ◽

Kuan Yao ◽

Maria Hoffmann

Keyword(s):

Genome Sequence ◽

Salmonella Enterica ◽

Nanopore Sequencing ◽

Short Read ◽

Content Type ◽

Short Read Sequencing ◽

Long Read

Here we report the genome sequence of Salmonella enterica serovar Richmond strain CFSAN000191, isolated from tilapia from Thailand in 2005. The genome was determined by a combination of long-read and short-read sequencing.

Download Full-text