Single-molecule, full-length transcript isoform sequencing reveals disease-associated RNA isoforms in cardiomyocytes

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chenchen Zhu ◽  
Jingyan Wu ◽  
Han Sun ◽  
Francesca Briganti ◽  
Benjamin Meder ◽  
...  

Abstract Alternative splicing generates differing RNA isoforms that govern phenotypic complexity of eukaryotes. Its malfunction underlies many diseases, including cancer and cardiovascular diseases. Comparative analysis of RNA isoforms at the genome-wide scale has been difficult. Here, we establish an experimental and computational pipeline that performs de novo transcript annotation and accurately quantifies transcript isoforms from cDNA sequences with a full-length isoform detection accuracy of 97.6%. We generate a searchable, quantitative human transcriptome annotation with 31,025 known and 5,740 novel transcript isoforms (http://steinmetzlab.embl.de/iBrowser/). By analyzing the isoforms in the presence of RNA Binding Motif Protein 20 (RBM20) mutations associated with aggressive dilated cardiomyopathy (DCM), we identify 121 differentially expressed transcript isoforms in 107 cardiac genes. Our approach enables quantitative dissection of complex transcript architecture instead of mere identification of inclusion or exclusion of individual exons, as exemplified by the discovery of IMMT isoforms mis-spliced by RBM20 mutations. Thereby we achieve a path to direct differential expression testing independent of an existing annotation of transcript isoforms, providing more immediate biological interpretation and higher resolution transcriptome comparisons.

2021 ◽  
Author(s):  
Chenchen Zhu ◽  
Jingyan Wu ◽  
Han Sun ◽  
Francesca Briganti ◽  
Benjamin Meder ◽  
...  

Alternative splicing generates differing RNA isoforms that govern phenotypic complexity of eukaryotes. Its malfunction underlies many diseases, including cancer and cardiovascular diseases. Comparative analysis of RNA isoforms at the genome-wide scale has been difficult. Here, we established an experimental and computational pipeline that performs de novo transcript annotation and accurately quantifies transcript isoforms from cDNA sequences with a full-length isoform detection accuracy of 97.6%. We generated a searchable, quantitative human transcriptome annotation with 31,025 known and 5,740 novel transcript isoforms (http://steinmetzlab.embl.de/iBrowser/). By analyzing the isoforms in the presence of RNA Binding Motif Protein 20 (RBM20) mutations associated with aggressive dilated cardiomyopathy (DCM), we identified 121 differentially expressed transcript isoforms in 107 cardiac genes. Our approach enables quantitative dissection of complex transcript architecture instead of mere identification of inclusion or exclusion of individual exons, as exemplified by novel IMMT isoforms. Thereby we achieve a path to direct differential expression testing independent of an existing annotation of transcript isoforms as the functional unit, instead of genes or exons, providing more immediate biological interpretation and higher resolution transcriptome comparisons.


Forests ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 866
Author(s):  
Lei Kan ◽  
Qicong Liao ◽  
Zhiyao Su ◽  
Yushan Tan ◽  
Shuyu Wang ◽  
...  

Madhuca pasquieri (Dubard) Lam. is a tree on the International Union for Conservation of Nature Red List and a national key protected wild plant (II) of China, known for its seed oil and timber. However, the lack of genomic and transcriptomic data for this species hampers the study of its reproduction, utilization, and conservation. Here, single-molecule long-read sequencing (PacBio) and next-generation sequencing (Illumina) were combined to obtain the transcriptome from five developmental stages of M. pasquieri. Overall, 25,339 transcript isoforms were detected by PacBio, including 24,492 coding sequences (CDSs), 9440 simple sequence repeats (SSRs), 149 long non-coding RNAs (lncRNAs), and 182 alternative splicing (AS) events, the majority of which were retained introns (RI). A further 1058 transcripts were identified as transcription factors (TFs) from 51 TF families. PacBio recovered more full-length transcript isoforms, with greater length and higher expression levels, whereas a larger number of transcripts (124,405) was captured by de novo assembly from Illumina data. Using the Nr, Swissprot, KOG, and KEGG databases, 24,405 PacBio transcripts (96.31%) were annotated. Functional annotation revealed a role for the auxin, abscisic acid, gibberellin, and cytokinin metabolic pathways in seed germination and post-germination. These findings support further studies on the seed germination mechanism and genome of M. pasquieri, and better protection of this endangered species.


2019 ◽  
Author(s):  
Anne Deslattes Mays ◽  
Marcel O. Schmidt ◽  
Garrett T. Graham ◽  
Elizabeth Tseng ◽  
Primo Baybayan ◽  
...  

Abstract Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single molecule, real time (SMRT) full length RNA sequencing. This analysis revealed a ∼5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic for bone marrow subpopulations. A detailed analysis of mRNA isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was validated by mass spectrometry, which also confirmed previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g. CFD, GATA2, HLA-A, B & C) also distinguished cell subpopulations but were only detectable by full length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.


2020 ◽  
Vol 117 (11) ◽  
pp. 6145-6155 ◽  
Author(s):  
Jianbo Chen ◽  
Yang Liu ◽  
Bin Wu ◽  
Olga A. Nikolaitchik ◽  
Preeti R. Mohan ◽  
...  

HIV-1 full-length RNA (HIV-1 RNA) plays a central role in viral replication, serving as a template for Gag/Gag-Pol translation and as a genome for the progeny virion. To gain a better understanding of the regulatory mechanisms of HIV-1 replication, we adapted a recently described system to visualize and track translation from individual HIV-1 RNA molecules in living cells. We found that, on average, half of the cytoplasmic HIV-1 RNAs are being actively translated at a given time. Furthermore, translating and nontranslating RNAs are well mixed in the cytoplasm; thus, Gag biogenesis occurs throughout the cytoplasm without being constrained to particular subcellular locations. Gag is an RNA binding protein that selects and packages HIV-1 RNA during virus assembly. A long-standing question in HIV-1 gene expression is whether Gag modulates HIV-1 RNA translation. We observed that despite its RNA-binding ability, Gag expression does not alter the proportion of translating HIV-1 RNA. Using single-molecule tracking, we found that both translating and nontranslating RNAs exhibit dynamic cytoplasmic movement and can reach the plasma membrane, the major HIV-1 assembly site. However, Gag selectively packages nontranslating RNA into the assembly complex. These studies illustrate that although HIV-1 RNA serves two functions, as a translation template and as a viral genome, individual RNA molecules carry out only one function at a time. These studies shed light on previously unknown aspects of HIV-1 gene expression and regulation.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Lulu Deng ◽  
Long Li ◽  
Cheng Zou ◽  
Chengchi Fang ◽  
Changchun Li

A growing number of studies have shown that alternative polyadenylation (APA) events with different polyadenylation sites (PASs) contribute to posttranscriptional regulation. However, little is known about the detailed molecular features of PASs and their role in porcine fast and slow skeletal muscles through microRNAs (miRNAs) and RNA binding proteins (RBPs). In this study, we combined single-molecule real-time sequencing and Illumina RNA-seq datasets to comprehensively analyze polyadenylation in pigs. We identified a total of 10,334 PASs, of which 8734 were characterized by reference genome annotation. Of the PAS-associated genes, 32.86% were determined to have more than one PAS. Further analysis demonstrated that tissue-specific PASs between fast and slow muscles were enriched in skeletal muscle development pathways. In addition, we obtained 1407 target genes regulated by APA events through potential binding of 69 miRNAs and 28 RBPs in variable 3′ UTR regions, some of which are involved in myofiber transformation. Furthermore, the de novo motif search confirmed that the canonical motif AAUAAA is the most commonly used and suggested that the three types of PASs may be related to motif strength. In summary, our results provide a useful annotation of PASs for the pig transcriptome and suggest that APA may play a role in fast and slow muscle development under the regulation of miRNAs and RBPs.


2017 ◽  
Vol 114 (46) ◽  
pp. E9873-E9882 ◽  
Author(s):  
Gal Haimovich ◽  
Christopher M. Ecker ◽  
Margaret C. Dunagin ◽  
Elliott Eggan ◽  
Arjun Raj ◽  
...  

RNAs have been shown to undergo transfer between mammalian cells, although the mechanism behind this phenomenon and its overall importance to cell physiology are not well understood. Numerous publications have suggested that RNAs (microRNAs and incomplete mRNAs) undergo transfer via extracellular vesicles (e.g., exosomes). However, in contrast to a diffusion-based transfer mechanism, we find that full-length mRNAs undergo direct cell–cell transfer via cytoplasmic extensions characteristic of membrane nanotubes (mNTs), which connect donor and acceptor cells. By employing a simple coculture experimental model and using single-molecule imaging, we provide quantitative data showing that mRNAs are transferred between cells in contact. Examples of mRNAs that undergo transfer include those encoding GFP, mouse β-actin, and human Cyclin D1, BRCA1, MT2A, and HER2. We show that intercellular mRNA transfer occurs in all coculture models tested (e.g., between primary cells, immortalized cells, and in cocultures of immortalized human and murine cells). Rapid mRNA transfer is dependent upon actin but is independent of de novo protein synthesis and is modulated by stress conditions and gene-expression levels. Hence, this work supports the hypothesis that full-length mRNAs undergo transfer between cells through a refined structural connection. Importantly, unlike the transfer of miRNA or RNA fragments, this process of communication transfers genetic information that could potentially alter the acceptor cell proteome. This phenomenon may prove important for the proper development and functioning of tissues as well as for host–parasite or symbiotic interactions.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yupeng Cui ◽  
Xinqiang Gao ◽  
Jianshe Wang ◽  
Zengzhen Shang ◽  
Zhibin Zhang ◽  
...  

Artemisia argyi is an important medicinal plant widely utilized for moxibustion heat therapy in China. The terpenoid biosynthesis process in A. argyi is speculated to play a key role in conferring its medicinal value. However, the molecular mechanism underlying terpenoid biosynthesis remains unclear, in part because the reference genome of A. argyi is unavailable. Moreover, the full-length transcriptome of A. argyi has not yet been sequenced. Therefore, in this study, de novo transcriptome sequencing of A. argyi root, stem, and leaf tissues was performed to identify candidate genes related to terpenoid biosynthesis, by combining the PacBio single-molecule real-time (SMRT) and Illumina NGS sequencing platforms. More than 55.4 Gb of sequencing data and 108,846 full-length non-chimeric reads were generated by the Illumina and PacBio platforms, respectively. Then, 53,043 consensus isoforms were clustered and used to represent 36,820 non-redundant transcripts, of which 34,839 (94.62%) were annotated in public databases. In the comparison sets of leaves vs roots, and leaves vs stems, 13,850 (7,566 up-regulated, 6,284 down-regulated) and 9,502 (5,284 up-regulated, 4,218 down-regulated) differentially expressed transcripts (DETs) were obtained, respectively. Specifically, the expression profile and KEGG functional enrichment analysis of these DETs indicated that they were significantly enriched in the biosynthesis of amino acids, carotenoids, diterpenoids and flavonoids, as well as the metabolism processes of glycine, serine and threonine. Moreover, multiple genes encoding significant enzymes or transcription factors related to diterpenoid biosynthesis were highly expressed in the A. argyi leaves. Additionally, several transcription factor families, such as RLK-Pelle_LRR-L-1 and RLK-Pelle_DLSV, were also identified. In conclusion, this study offers a valuable resource for transcriptome information, and provides a functional genomic foundation for further research on molecular mechanisms underlying the medicinal use of A. argyi leaves.


2021 ◽  
Author(s):  
Jing Song ◽  
Ping Li ◽  
De-Long Guan ◽  
Yan Sun

Abstract Although leeches are of great medical and economic value in anticoagulant therapy, full-length transcriptomes for leeches remain scarce. Here, we generated the first full-length transcriptome for the paddy leech Whitmania pigra (the most widely utilized medical leech in Chinese traditional medicine) through Pacific Biosciences (PacBio) single-molecule long-read sequencing. A total of 191,676 full-length non-chimeric (FLNC) reads were obtained, of which 30,660 were high-quality unique full-length transcripts. The BUSCO (Benchmarking Universal Single-Copy Orthologs) assessment of completeness demonstrated that 74.8% of BUSCOs were complete. We functionally annotated 28,144 transcripts in public databases, including NR, gene ontology (GO), Pfam, etc. Furthermore, 1,314 long non-coding RNAs (lncRNAs), 2,574 alternative splicing (AS) events, 932 transcription factors (TFs), and 33,258 simple sequence repeats (SSRs) were identified across all transcripts. From the generated data, a total of 426 anticoagulant genes, including 122 Antistasins, 124 with the Fibrinogen beta and gamma chains, and 62 Kazal-type serine protease inhibitors, were screened out. Twenty-five novel proteins were revealed following the evaluation of the annotations and products of these anticoagulant transcripts. The regulation network between lncRNAs and corresponding coding transcripts showed the typical many-to-many pattern, which was especially obvious for a specific type of protein, Guamerin. Collectively, the present findings provide a rich set of full-length cDNA sequences for W. pigra, which will greatly facilitate research on the transcriptomics and genetics of this species and other leeches.


2018 ◽  
Vol 2018 ◽  
pp. 1-6 ◽  
Author(s):  
Shang-Qian Xie ◽  
Yue Han ◽  
Xiao-Zhou Chen ◽  
Tai-Yu Cao ◽  
Kai-Kai Ji ◽  
...  

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.


2017 ◽  
Author(s):  
Julien Lagarde ◽  
Barbara Uszczynska-Ratajczak ◽  
Silvia Carbonell ◽  
Sílvia Pérez-Lluch ◽  
Amaya Abad ◽  
...  

Abstract Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, resulting in novel transcript models for 3574 and 561 gene loci, respectively. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales. Abbreviations: bp, base pair; FL, full length; nt, nucleotide; ROI, read of insert, i.e. PacBio read; SJ, splice junction; SMRT, single-molecule real-time; TM, transcript model.

