Long-read RNA sequencing reveals widespread sex-specific alternative splicing in threespine stickleback fish

2020
Author(s): Alice S. Naftaly, Shana Pau, Michael A. White

Abstract Alternate isoforms contribute immensely to phenotypic diversity across eukaryotes. While short-read RNA sequencing has increased our understanding of isoform diversity, it struggles to accurately detect full-length transcripts, preventing the identification of many alternate isoforms. Long-read sequencing technologies have made it possible to sequence full-length alternative transcripts, accurately characterizing alternative splicing events, alternate transcription start and end sites, and differences in UTRs. Here, we utilize PacBio long-read RNA sequencing (Iso-Seq) to examine the transcriptomes of five tissues in threespine stickleback fish (Gasterosteus aculeatus), a widely used genetic model species. The threespine stickleback fish has a refined genome assembly with gene annotations that are based on short-read RNA sequencing and predictions from coding sequence of other species. This suggests some of the existing annotations may be inaccurate or alternative transcripts may not be fully characterized. Using Iso-Seq we detected thousands of novel isoforms, indicating many isoforms are absent in the current Ensembl gene annotations. In addition, we refined many of the existing annotations within the genome. We noted many improperly positioned transcription start sites that were refined with long-read sequencing. The Iso-Seq-predicted transcription start sites were more accurate, as verified through ATAC-seq. We were also able to detect many alternative splicing events between sexes and across tissues. We found a substantial number of genes in both somatic and gonad tissue that had sex-specific isoforms. Our study highlights the power of long-read sequencing to study the complexity of transcriptomes, greatly improving genomic resources for the threespine stickleback fish.

2021, Vol 22 (1)
Author(s): Kie Kyon Huang, Jiawen Huang, Jeanie Kar Leng Wu, Minghui Lee, Su Ting Tay, ...

Abstract Background Deregulated gene expression is a hallmark of cancer; however, most studies to date have analyzed short-read RNA sequencing data with inherent limitations. Here, we combine PacBio long-read isoform sequencing (Iso-Seq) and Illumina paired-end short-read RNA sequencing to comprehensively survey the transcriptome of gastric cancer (GC), a leading cause of global cancer mortality. Results We performed full-length transcriptome analysis across 10 GC cell lines covering four major GC molecular subtypes (chromosomal unstable, Epstein-Barr positive, genome stable and microsatellite unstable). We identify 60,239 non-redundant full-length transcripts, of which >66% are novel compared to current transcriptome databases. Novel isoforms are more likely to be cell line and subtype specific, expressed at lower levels with a larger number of exons, and with longer isoform/coding sequence lengths. Most novel isoforms utilize an alternate first exon, and compared to other alternative splicing categories, are expressed at higher levels and exhibit higher variability. Collectively, we observe alternate promoter usage in 25% of detected genes, with the majority (84.2%) of known/novel promoter pairs exhibiting potential changes in their coding sequences. Mapping these alternate promoters to TCGA GC samples, we identify several cancer-associated isoforms, including novel variants of oncogenes. Tumor-specific transcript isoforms tend to alter protein coding sequences to a larger extent than other isoforms. Analysis of outcome data suggests that novel isoforms may impart additional prognostic information. Conclusions Our results provide a rich resource of full-length transcriptome data for deeper studies of GC and other gastrointestinal malignancies.


2020
Author(s): V Vern Lee, Louise M. Judd, Aaron R. Jex, Kathryn E. Holt, Christopher J. Tonkin, ...

Abstract Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterised in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AT content of Plasmodium RNA, but also the limitations of short-read sequencing in deciphering complex splicing events. In this study, we utilised the long-read direct RNA sequencing platform developed by Oxford Nanopore Technologies (ONT) to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or near full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This work highlights a strategy in using long-read sequencing to understand splicing events at the whole transcript level, and has implications in future interpretation of RNA-seq studies.


2020
Author(s): Wei Zhou, Yaxing Zhou, Guoli Zhu, Yun Wang, Zhibiao He, ...

Abstract Background and Objectives Castor (Ricinus communis L.) is an important non-edible oilseed crop. Lm type female strains and normal amphiprotic strains are important castor cultivars that differ mainly in inflorescence structure and leaf shape. To better understand the mechanisms underlying these differences at the molecular level, we performed comparative transcriptional analysis. Materials and Methods Full-length transcriptome sequencing and short-read RNA sequencing were employed. Results A total of 76,068 and 44,223 non-redundant transcripts were obtained from high-quality transcripts of Lm type female strains and normal amphiprotic strains, respectively. In Lm type female strains and normal amphiprotic strains, 51,613 and 20,152 alternative splicing events were found, respectively. There were 13,239 transcription factors identified from the full-length transcriptomes. Comparative analysis showed substantial differences in the expression of common and unique transcription factors between the two cultivars. Functional analysis of isoforms was also conducted. Full-length sequences were used as a reference genome, and short-read RNA sequencing analysis was performed to identify differentially expressed genes (DEGs). Furthermore, functional annotation analysis of the DEGs was performed. Conclusions The results revealed considerable differences and expression diversity between the two cultivars, well beyond what was reported in previous studies, likely reflecting the differences in architecture between these two cultivars. Highlight Using full-length transcriptome sequencing technology, we performed comparative analysis of transcription factors of two castor cultivars, analyzed alternative splicing events, and identified their lncRNAs.


mSystems, 2021, Vol 6 (2)
Author(s): V. Vern Lee, Louise M. Judd, Aaron R. Jex, Kathryn E. Holt, Christopher J. Tonkin, ...

ABSTRACT Alternative splicing is a widespread phenomenon in metazoans by which single genes are able to produce multiple isoforms of the gene product. However, this has been poorly characterized in apicomplexans, a major phylum of some of the most important global parasites. Efforts have been hampered by atypical transcriptomic features, such as the high AU content of Plasmodium RNA, but also the limitations of short-read sequencing in deciphering complex splicing events. In this study, we utilized the long-read direct RNA sequencing platform developed by Oxford Nanopore Technologies to survey the alternative splicing landscape of Toxoplasma gondii and Plasmodium falciparum. We find that while native RNA sequencing has a reduced throughput, it allows us to obtain full-length or nearly full-length transcripts with comparable quantification to Illumina sequencing. By comparing these data with available gene models, we find widespread alternative splicing, particularly intron retention, in these parasites. Most of these transcripts contain premature stop codons, suggesting that in these parasites, alternative splicing represents a pathway to transcriptomic diversity, rather than expanding proteomic diversity. Moreover, alternative splicing rates are comparable between parasites, suggesting a shared splicing machinery, despite notable transcriptomic differences between the parasites. This study highlights a strategy in using long-read sequencing to understand splicing events at the whole-transcript level and has implications in the future interpretation of transcriptome sequencing studies. IMPORTANCE We have used a novel nanopore sequencing technology to directly analyze parasite transcriptomes. The very long reads of this technology reveal the full-length genes of the parasites that cause malaria and toxoplasmosis. Gene transcripts must undergo a process called splicing before they can be translated to protein. Our analysis reveals that these parasites very frequently only partially process their gene products, in a manner that departs dramatically from their human hosts.


2021
Author(s): Valentin Waschulin, Chiara Borsetto, Robert James, Kevin K. Newsham, Stefano Donadio, ...

Abstract The growing problem of antibiotic resistance has led to the exploration of uncultured bacteria as potential sources of new antimicrobials. PCR amplicon analyses and short-read sequencing studies of samples from different environments have reported evidence of high biosynthetic gene cluster (BGC) diversity in metagenomes, indicating their potential for producing novel and useful compounds. However, recovering full-length BGC sequences from uncultivated bacteria remains a challenge due to the technological restraints of short-read sequencing, thus making assessment of BGC diversity difficult. Here, long-read sequencing and genome mining were used to recover >1400 mostly full-length BGCs that demonstrate the rich diversity of BGCs from uncultivated lineages present in soil from Mars Oasis, Antarctica. A large number of highly divergent BGCs were not only found in the phyla Acidobacteriota, Verrucomicrobiota and Gemmatimonadota but also in the actinobacterial classes Acidimicrobiia and Thermoleophilia and the gammaproteobacterial order UBA7966. The latter furthermore contained a potential novel family of RiPPs. Our findings underline the biosynthetic potential of underexplored phyla as well as unexplored lineages within seemingly well-studied producer phyla. They also showcase long-read metagenomic sequencing as a promising way to access the untapped genetic reservoir of specialised metabolite gene clusters of the uncultured majority of microbes.


eLife, 2020, Vol 9
Author(s): Matthew T Parker, Katarzyna Knop, Anna V Sherwood, Nicholas J Schurch, Katarzyna Mackinnon, ...

Understanding genome organization and gene regulation requires insight into RNA transcription, processing and modification. We adapted nanopore direct RNA sequencing to examine RNA from a wild-type accession of the model plant Arabidopsis thaliana and a mutant defective in mRNA methylation (m6A). Here we show that m6A can be mapped in full-length mRNAs transcriptome-wide and reveal the combinatorial diversity of cap-associated transcription start sites, splicing events, poly(A) site choice and poly(A) tail length. Loss of m6A from 3′ untranslated regions is associated with decreased relative transcript abundance and defective RNA 3′ end formation. A functional consequence of disrupted m6A is a lengthening of the circadian period. We conclude that nanopore direct RNA sequencing can reveal the complexity of mRNA processing and modification in full-length single-molecule reads. These findings can refine Arabidopsis genome annotation. Further, applying this approach to less well-studied species could transform our understanding of what their genomes encode.


2019
Author(s): Andrew T. Ludlow, Mohammed E. Sayed, Aaron L. Slusher, Mark Ribick, Anisha Pancholi, ...

2020, Vol 3 (1)
Author(s): Yueming Hu, Xing-Sheng Shu, Jiaxian Yu, Ming-an Sun, Zewei Chen, ...

Abstract Human genes form a large variety of isoforms after transcription, encoding distinct transcripts to exert different functions. Single-molecule RNA sequencing facilitates accurate identification of the isoforms by extending nucleotide read length significantly. However, gene and isoform diversity is poorly represented by the mRNA molecules captured by single-molecule RNA sequencing. Here, we show that a cDNA normalization procedure before the library preparation for PacBio RS II sequencing captures 3.2–6.0-fold more full-length high-quality isoform species for different human samples, as compared to the non-normalized capture procedure. Many lowly expressed, functionally important isoforms can be detected. In addition, normalized PacBio RNA sequencing also resolves more allele-specific haplotype transcripts. Finally, we apply the cDNA normalization-based long-read RNA sequencing method to profile the transcriptome of human gastric signet-ring cell carcinomas, identify new cancer-specific transcriptome signatures, and thus demonstrate the utility of the improved protocols in gene expression studies.

