Direct Nanopore Sequencing of mRNA Reveals Landscape of Transcript Isoforms in Apicomplexan Parasites

mSystems ◽  
2021 ◽  
Vol 6 (2) ◽  
Author(s):  
V. Vern Lee ◽  
Louise M. Judd ◽  
Aaron R. Jex ◽  
Kathryn E. Holt ◽  
Christopher J. Tonkin ◽  
...  

ABSTRACT Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterized in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AU content of Plasmodium RNA, but also the limitations of short-read sequencing in deciphering complex splicing events. In this study, we utilized the long-read direct RNA sequencing platform developed by Oxford Nanopore Technologies to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or nearly full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This study highlights a strategy in using long-read sequencing to understand splicing events at the whole-transcript level and has implications in the future interpretation of transcriptome sequencing studies.
IMPORTANCE We have used a novel nanopore sequencing technology to directly analyze parasite transcriptomes. The very long reads of this technology reveal the full-length genes of the parasites that cause malaria and toxoplasmosis. Gene transcripts must undergo a process called splicing before they can be translated to protein. Our analysis reveals that these parasites very frequently only partially process their gene products, in a manner that departs dramatically from their human hosts.
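
The abstract above reports intron retention as the dominant alternative splicing event when long reads are compared with available gene models. As a rough illustration of that comparison, and not the authors' actual pipeline, the sketch below flags an annotated intron as retained when the aligned blocks of a single long read cover most of it. The function name `retained_introns`, the half-open coordinate convention, and the 0.9 coverage threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Half-open genomic interval [start, end); an assumption of this sketch.
Interval = Tuple[int, int]


def retained_introns(
    read_blocks: List[Interval],
    introns: List[Interval],
    min_cover: float = 0.9,  # hypothetical threshold, not taken from the paper
) -> List[Interval]:
    """Return the annotated introns that one read appears to retain.

    read_blocks: aligned (non-overlapping) segments of a single long read.
    introns: annotated introns of the gene model the read maps to.
    An intron counts as retained when at least `min_cover` of its length
    is covered by aligned read sequence instead of being spliced out.
    """
    retained = []
    for i_start, i_end in introns:
        covered = 0
        for b_start, b_end in read_blocks:
            # Length of the overlap between this block and the intron.
            overlap = min(i_end, b_end) - max(i_start, b_start)
            if overlap > 0:
                covered += overlap
        if covered / (i_end - i_start) >= min_cover:
            retained.append((i_start, i_end))
    return retained
```

A read aligned as blocks `[(0, 100), (200, 500)]` against introns `[(100, 200), (300, 400)]` has spliced out the first intron and reads straight through the second, so only the second is reported. A real analysis would also need to handle alignment noise and strand, which this sketch ignores.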

2020 ◽  
Author(s):  
V Vern Lee ◽  
Louise M. Judd ◽  
Aaron R. Jex ◽  
Kathryn E. Holt ◽  
Christopher J. Tonkin ◽  
...  

Abstract Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterised in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AU content of Plasmodium RNA, but also the limitations of short read sequencing in deciphering complex splicing events. In this study, we utilised the long read direct RNA sequencing platform developed by Oxford Nanopore Technologies (ONT) to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or near full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This work highlights a strategy in using long read sequencing to understand splicing events at the whole transcript level, and has implications in future interpretation of RNA-seq studies.


2021 ◽  
Vol 12 ◽  
Author(s):  
Jingli Yang ◽  
Wanqiu Lv ◽  
Liying Shao ◽  
Yanrui Fu ◽  
Haimei Liu ◽  
...  

In eukaryotes, alternative splicing (AS) is a crucial regulatory mechanism that modulates mRNA diversity and stability. The contribution of AS to stress responses is known in many species, but the posttranscriptional mechanism in poplar under cold stress is still unclear. Recent studies have utilized the advantages of single-molecule real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to identify full-length transcripts. We, therefore, used a combination of single-molecule long-read sequencing and Illumina RNA sequencing (RNA-Seq) for a global analysis of AS in two poplar species (Populus trichocarpa and P. ussuriensis) under cold stress. We further identified 1,261 AS events in P. trichocarpa and 2,101 in P. ussuriensis, among which intron retention, with a frequency of more than 30%, was the most prominent type under cold stress. RNA-Seq data analysis and annotation revealed the importance of calcium, abscisic acid, and reactive oxygen species signaling in cold stress response. Besides, the low temperature rapidly induced multiple splicing factors, transcription factors, and differentially expressed genes through AS. In P. ussuriensis, there was a rapid occurrence of AS events, which provided a new insight into the complexity and regulation of AS during cold stress response in different poplar species for the first time.


Author(s):  
Jingli Yang ◽  
Wanqiu Lv ◽  
Minzhen Zeng ◽  
Yanrui Fu ◽  
Chenghao Li

In eukaryotes, alternative splicing (AS) is a crucial regulatory mechanism that modulates mRNA diversity and stability. The contribution of AS to stress responses is known in many species, but the post-transcriptional mechanism in poplar under cold stress is still unclear. Recent studies have utilized the advantages of Single-Molecule Real-Time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to identify full-length transcripts. We, therefore, used a combination of single-molecule long-read sequencing and Illumina RNA sequencing (RNA-Seq) for a global analysis of AS in two poplar species (Populus trichocarpa and P. ussuriensis) under cold stress. We further identified 1261 AS events in P. trichocarpa and 2101 in P. ussuriensis, among which intron retention, with a frequency of more than 30%, was the most prominent type under cold stress. RNA-Seq data analysis and annotation revealed the importance of calcium, abscisic acid, and reactive oxygen species signaling in cold stress response. Besides, the low temperature rapidly induced multiple splicing factors, transcription factors, and differentially expressed genes through AS. In P. ussuriensis, there was a rapid occurrence of AS events. This study provides new insight into the complexity and regulation of AS during cold stress response in two poplar species.


2020 ◽  
Author(s):  
Alice S. Naftaly ◽  
Shana Pau ◽  
Michael A. White

Abstract Alternate isoforms contribute immensely to phenotypic diversity across eukaryotes. While short-read RNA sequencing has increased our understanding of isoform diversity, it is challenging to accurately detect full-length transcripts, preventing the identification of many alternate isoforms. Long-read sequencing technologies have made it possible to sequence full-length alternative transcripts, accurately characterizing alternative splicing events, alternate transcription start and end sites, and differences in UTR regions. Here, we utilize PacBio long-read RNA sequencing (Iso-Seq) to examine the transcriptomes of five tissues in threespine stickleback fish (Gasterosteus aculeatus), a widely used genetic model species. The threespine stickleback fish has a refined genome assembly with gene annotations that are based on short-read RNA sequencing and predictions from coding sequence of other species. This suggests some of the existing annotations may be inaccurate or alternative transcripts may not be fully characterized. Using Iso-Seq we detected thousands of novel isoforms, indicating many isoforms are absent in the current Ensembl gene annotations. In addition, we refined many of the existing annotations within the genome. We noted many improperly positioned transcription start sites that were refined with long-read sequencing. The Iso-Seq predicted transcription start sites were more accurate, verified through ATAC-seq. We were also able to detect many alternative splicing events between sexes and across tissues. We found a substantial number of genes in both somatic and gonad tissue that had sex-specific isoforms. Our study highlights the power of long-read sequencing to study the complexity of transcriptomes, greatly improving genomic resources for the threespine stickleback fish.


DNA Research ◽  
2019 ◽  
Vol 26 (4) ◽  
pp. 301-311 ◽  
Author(s):  
Yue Zhang ◽  
Tonny Maraga Nyong'A ◽  
Tao Shi ◽  
Pingfang Yang

Abstract Alternative splicing (AS) plays a critical role in regulating different physiological and developmental processes in eukaryotes, by dramatically increasing the diversity of the transcriptome and the proteome. However, the saturation and complexity of AS remain unclear in lotus because full-length multiple-splice isoforms have rarely been obtained. In this study, we apply a hybrid assembly strategy by combining single-molecule real-time sequencing and Illumina RNA-seq to get a comprehensive insight into the lotus transcriptomic landscape. We identified 211,802 high-quality full-length non-chimeric reads, with 192,690 non-redundant isoforms, and updated the lotus reference gene model. Moreover, our analysis identified a total of 104,288 AS events from 16,543 genes, with alternative 3ʹ splice-site being the predominant model, followed by intron retention. By exploring tissue datasets, 370 tissue-specific AS events were identified among 12 tissues. Both the tissue-specific genes and isoforms might play important roles in tissue or organ development, and partly fit the ‘ABCE’ model in floral tissues. The large number of AS events and isoform variants identified in our study enhance the understanding of transcriptional diversity in lotus, and provide a valuable resource for further functional genomic studies.


2018 ◽  
Vol 7 (23) ◽  
Author(s):  
Narjol González-Escalona ◽  
Kuan Yao ◽  
Maria Hoffmann

Here we report the genome sequence of Salmonella enterica serovar Richmond strain CFSAN000191, isolated from tilapia from Thailand in 2005. The genome was determined by a combination of long-read and short-read sequencing.


Author(s):  
Fairlie Reese ◽  
Ali Mortazavi

Abstract Motivation Long-read RNA-sequencing technologies such as PacBio and Oxford Nanopore have discovered an explosion of new transcript isoforms that are difficult to visually analyze using currently available tools. We introduce the Swan Python library, which is designed to analyze and visualize transcript models. Results Swan finds 4909 differentially expressed transcripts between cell lines HepG2 and HFFc6, including 279 that are differentially expressed even though the parent gene is not. Additionally, Swan discovers 285 reproducible exon skipping and 47 intron retention events not recorded in the GENCODE v29 annotation. Availability and implementation The Swan library for Python 3 is available on PyPi at https://pypi.org/project/swan-vis/ and on GitHub at https://github.com/mortazavilab/swan_vis.


2021 ◽  
Vol 12 ◽  
Author(s):  
Davide Bolognini ◽  
Alberto Magi

Structural variants (SVs) are genomic rearrangements that involve at least 50 nucleotides and are known to have a serious impact on human health. While prior short-read sequencing technologies have often proved inadequate for a comprehensive assessment of structural variation, more recent long reads from Oxford Nanopore Technologies have already been proven invaluable for the discovery of large SVs and hold the potential to facilitate the resolution of the full SV spectrum. With many long-read sequencing studies to follow, it is crucial to assess factors affecting current SV calling pipelines for nanopore sequencing data. In this brief research report, we evaluate and compare the performances of five long-read SV callers across four long-read aligners using both real and synthetic nanopore datasets. In particular, we focus on the effects of read alignment, sequencing coverage, and variant allele depth on the detection and genotyping of SVs of different types and size ranges and provide insights into precision and recall of SV callsets generated by integrating the various long-read aligners and SV callers. The computational pipeline we propose is publicly available at https://github.com/davidebolo1993/EViNCe and can be adjusted to further evaluate future nanopore sequencing datasets.
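
The report above compares SV callsets by precision and recall. As a minimal sketch of how those two metrics are computed against a truth set, and not a description of the EViNCe pipeline itself, the example below matches each call to at most one truth variant of the same type on the same chromosome within a breakpoint tolerance. The function name `precision_recall`, the `(chrom, pos, svtype)` tuple layout, and the 50 bp tolerance are assumptions made for illustration.

```python
from typing import List, Tuple

# (chromosome, breakpoint position, SV type); layout assumed for this sketch.
SV = Tuple[str, int, str]


def precision_recall(
    calls: List[SV],
    truth: List[SV],
    tol: int = 50,  # hypothetical breakpoint tolerance in base pairs
) -> Tuple[float, float]:
    """Compute precision and recall of an SV callset against a truth set.

    A call is a true positive when an unmatched truth variant has the same
    chromosome and type and a breakpoint within `tol` bases. Each truth
    variant can be matched by at most one call (greedy, first match wins).
    """
    matched = set()
    true_pos = 0
    for chrom, pos, svtype in calls:
        for idx, (t_chrom, t_pos, t_type) in enumerate(truth):
            if idx in matched:
                continue
            if chrom == t_chrom and svtype == t_type and abs(pos - t_pos) <= tol:
                matched.add(idx)
                true_pos += 1
                break
    # precision = TP / (TP + FP); recall = TP / (TP + FN)
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall
```

Greedy first-match assignment keeps the sketch short. Dedicated benchmarking tools typically also compare SV size and sequence similarity, which this example leaves out.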


2018 ◽  
Author(s):  
Adrian Viehweger ◽  
Sebastian Krautwurst ◽  
Kevin Lamkiewicz ◽  
Ramakanth Madhugiri ◽  
John Ziebuhr ◽  
...  

Sequence analyses of RNA virus genomes remain challenging due to the exceptional genetic plasticity of these viruses. Because of high mutation and recombination rates, genome replication by viral RNA-dependent RNA polymerases leads to populations of closely related viruses, so-called 'quasispecies'. Standard (short-read) sequencing technologies are ill-suited to reconstruct large numbers of full-length haplotypes of (i) RNA virus genomes and (ii) subgenome-length (sg) RNAs comprised of noncontiguous genome regions. Here, we used a full-length, direct RNA sequencing (DRS) approach based on nanopores to characterize viral RNAs produced in cells infected with a human coronavirus. Using DRS, we were able to map the longest (~26 kb) contiguous read to the viral reference genome. By combining Illumina and nanopore sequencing, we reconstructed a highly accurate consensus sequence of the human coronavirus (HCoV) 229E genome (27.3 kb). Furthermore, using long reads that did not require an assembly step, we were able to identify, in infected cells, diverse and novel HCoV-229E sg RNAs that remain to be characterized. Also, the DRS approach, which circumvents reverse transcription and amplification of RNA, allowed us to detect methylation sites in viral RNAs. Our work paves the way for haplotype-based analyses of viral quasispecies by demonstrating the feasibility of intra-sample haplotype separation. Even though several technical challenges remain to be addressed to exploit the potential of the nanopore technology fully, our work illustrates that direct RNA sequencing may significantly advance genomic studies of complex virus populations, including predictions on long-range interactions in individual full-length viral RNA haplotypes.


2019 ◽  
Author(s):  
Karim Rahimi ◽  
Morten T. Venø ◽  
Daniel M. Dupont ◽  
Jørgen Kjems

Abstract Circular RNA (circRNA) is a poorly understood class of non-coding RNAs, some of which have been shown to be functionally important for cell proliferation and development. CircRNAs mainly derive from back-splicing events of coding mRNAs, making it difficult to distinguish the internal exon composition of circRNA from the linearly spliced mRNA. To examine the global exon composition of circRNAs, we performed long-read sequencing of single molecules using nanopore technology for human and mouse brain-derived RNA. By applying an optimized circRNA enrichment protocol prior to sequencing, we were able to detect 7,834 and 10,975 circRNAs in human and mouse brain, respectively, of which 2,945 and 7,052 are not currently found in circBase. Alternative splicing was more prevalent in circRNAs than in linear spliced transcripts, and notably >200 not previously annotated exons were used in circRNAs. This suggests that properties associated with circRNA-specific features, e.g. the unusual back-splicing step during biogenesis, increased stability and/or their lack of translation, alter the general exon usage at steady state. We conclude that the nanopore sequencing technology provides a fast and reliable method to map the specific exon composition of circRNA.

