novel transcript
Recently Published Documents


Total documents: 140 (five years: 38)
H-index: 20 (five years: 3)

BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Rajinder Gupta ◽  
Jos Kleinjans ◽  
Florian Caiment

Abstract
Background: Hepatocellular carcinoma (HCC) is one of the leading causes of cancer death worldwide, owing to limitations in its prognosis. Current prognostic approaches include radiological examination and detection of serum biomarkers; however, both have limited efficiency and are ineffective for early prognosis. Given these limitations, we propose using RNA-Seq data to evaluate putative higher-accuracy biomarkers at the transcript level that could help in early prognosis.
Methods: To identify such potential transcript biomarkers, RNA-Seq data for healthy liver and various HCC cell models were subjected to five machine learning algorithms: random forest, K-nearest neighbor, Naïve Bayes, support vector machine, and neural networks. Several metrics, namely sensitivity, specificity, MCC, informedness, and AUC-ROC (except for support vector machine), were evaluated. The algorithms that produced the highest values for all metrics were chosen to extract the top features, which were then subjected to recursive feature elimination. Recursive feature elimination yielded the smallest number of features needed to differentiate between the healthy and HCC cell models.
Results: The metrics demonstrate that the efficiency of the known protein biomarkers for HCC is comparatively lower than that of complete transcriptomics data. Among the machine learning algorithms, random forest and support vector machine demonstrated the best performance. Using recursive feature elimination on the top features of random forest and support vector machine, three transcripts were selected that had an accuracy of 0.97 and a kappa of 0.93. Of the three transcripts, two were protein-coding (PARP2-202 and SPON2-203) and one was non-coding (CYREN-211). Lastly, we demonstrated that these three selected transcripts outperformed three randomly selected transcripts (15,000 combinations); hence, they were not chance findings and could be interesting candidates for new HCC biomarker development.
Conclusion: Combining RNA-Seq data with machine learning approaches can aid in finding novel transcript biomarkers. The three biomarkers identified, PARP2-202, SPON2-203, and CYREN-211, presented the highest accuracy among all transcripts in differentiating the healthy and HCC cell models. The machine learning pipeline developed in this study can be used on any RNA-Seq dataset to find novel transcript biomarkers. Code: www.github.com/rajinder4489/ML_biomarkers
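The evaluation metrics named in the abstract above (sensitivity, specificity, MCC, informedness, accuracy, and kappa) can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' pipeline; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: HCC samples classified correctly/incorrectly (illustrative labeling).
    tn/fp: healthy samples classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def safe_div(num, den):
        # Return 0.0 when a ratio is undefined (zero denominator).
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)            # true positive rate
    specificity = safe_div(tn, tn + fp)            # true negative rate
    informedness = sensitivity + specificity - 1.0  # Youden's J

    # Matthews correlation coefficient.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    accuracy = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = safe_div(accuracy - expected, 1.0 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "informedness": informedness,
        "mcc": mcc,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, not taken from the study.
example = binary_metrics(tp=45, fp=5, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

On a balanced dataset such as this hypothetical one, informedness, MCC, and kappa coincide; they diverge when the classes are imbalanced, which is one reason to report several of them together.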


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Balázs Kakuk ◽  
Dóra Tombácz ◽  
Zsolt Balázs ◽  
Norbert Moldován ◽  
Zsolt Csabai ◽  
...  

Abstract: Long-read sequencing (LRS), a powerful novel approach, is able to read full-length transcripts and confers a major advantage over the earlier gold standard, short-read sequencing, in identifying, for example, polycistronic transcripts and transcript isoforms, including transcript length and splice variants. In this work, we profile the human cytomegalovirus transcriptome using two third-generation LRS platforms: the Sequel from Pacific Biosciences and the MinION from Oxford Nanopore Technologies. We carried out both cDNA and direct RNA sequencing, and applied the LoRTIA software, developed in our laboratory, for transcript annotation. This study identified a large number of novel transcript variants, including splice isoforms and transcript start and end site isoforms, as well as putative mRNAs with truncated in-frame ORFs (located within the larger ORFs of the canonical mRNAs), which potentially encode N-terminally truncated polypeptides. Our work also disclosed a highly complex meshwork of transcriptional read-throughs and overlaps.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Chenchen Zhu ◽  
Jingyan Wu ◽  
Han Sun ◽  
Francesca Briganti ◽  
Benjamin Meder ◽  
...  

Abstract: Alternative splicing generates differing RNA isoforms that govern phenotypic complexity of eukaryotes. Its malfunction underlies many diseases, including cancer and cardiovascular diseases. Comparative analysis of RNA isoforms at the genome-wide scale has been difficult. Here, we establish an experimental and computational pipeline that performs de novo transcript annotation and accurately quantifies transcript isoforms from cDNA sequences with a full-length isoform detection accuracy of 97.6%. We generate a searchable, quantitative human transcriptome annotation with 31,025 known and 5,740 novel transcript isoforms (http://steinmetzlab.embl.de/iBrowser/). By analyzing the isoforms in the presence of RNA Binding Motif Protein 20 (RBM20) mutations associated with aggressive dilated cardiomyopathy (DCM), we identify 121 differentially expressed transcript isoforms in 107 cardiac genes. Our approach enables quantitative dissection of complex transcript architecture instead of mere identification of inclusion or exclusion of individual exons, as exemplified by the discovery of IMMT isoforms mis-spliced by RBM20 mutations. Thereby we achieve a path to direct differential expression testing independent of an existing annotation of transcript isoforms, providing more immediate biological interpretation and higher resolution transcriptome comparisons.


2021 ◽  
Author(s):  
Chenchen Zhu ◽  
Jingyan Wu ◽  
Han Sun ◽  
Francesca Briganti ◽  
Benjamin Meder ◽  
...  

Alternative splicing generates differing RNA isoforms that govern phenotypic complexity of eukaryotes. Its malfunction underlies many diseases, including cancer and cardiovascular diseases. Comparative analysis of RNA isoforms at the genome-wide scale has been difficult. Here, we established an experimental and computational pipeline that performs de novo transcript annotation and accurately quantifies transcript isoforms from cDNA sequences with a full-length isoform detection accuracy of 97.6%. We generated a searchable, quantitative human transcriptome annotation with 31,025 known and 5,740 novel transcript isoforms (http://steinmetzlab.embl.de/iBrowser/). By analyzing the isoforms in the presence of RNA Binding Motif Protein 20 (RBM20) mutations associated with aggressive dilated cardiomyopathy (DCM), we identified 121 differentially expressed transcript isoforms in 107 cardiac genes. Our approach enables quantitative dissection of complex transcript architecture instead of mere identification of inclusion or exclusion of individual exons, as exemplified by novel IMMT isoforms. Thereby we achieve a path to direct differential expression testing independent of an existing annotation of transcript isoforms as the functional unit, instead of genes or exons, providing more immediate biological interpretation and higher resolution transcriptome comparisons.


2021 ◽  
Author(s):  
Suman Lata ◽  
Ramesh Kumar Yadav ◽  
B.S. Tomar

Okra (Abelmoschus esculentus L. Moench) is an important vegetable crop with limited studies on genomics. It is considered an essential constituent of a balanced diet owing to its dietary fiber, amino acids, and vitamins. It is most widely cultivated for its pods throughout Asia and Africa. Okra is cultivated mostly in the developing countries of Asia and Africa, with very poor productivity. India ranks first in the world with a production of 6.3 million MT (72% of total world production). Cultivated okra is largely susceptible to a large number of begomoviruses. Yellow vein mosaic disease (YVMD), caused by Yellow vein mosaic virus (YVMV) of the genus Begomovirus (family Geminiviridae), results in serious losses in okra cultivation. Symptoms of YVMD are chlorosis and yellowing of veins and veinlets at various levels, small leaves, fewer and smaller fruits, and stunted growth. Yield loss due to YVMD in okra has been found to range from 30 to 100%, depending on the age of the plant at the time of infection. Exploitation of biotechnological tools in okra improvement programmes is often restricted by the non-availability of abundant polymorphic molecular markers and defined genetic maps. Moreover, the okra genome is allopolyploid and possesses a large number of chromosomes (2n = 56–196), which makes it more complicated. Genomics tools such as RNA-seq for transcriptome analysis have emerged as powerful means of identifying novel transcript/gene sequences in non-model plants like okra.


2021 ◽  
Vol 89 (9) ◽  
pp. S305
Author(s):  
Nicola Hall ◽  
Wilfried Haerty ◽  
Elizabeth M. Tunbridge

2021 ◽  
Author(s):  
David J Wright ◽  
Nicola Hall ◽  
Naomi Irish ◽  
Angela L Man ◽  
Will Glynn ◽  
...  

Abstract: Alternative splicing (AS) is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, which isoforms are differentially used or expressed, and how this affects cellular differentiation, remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of AS processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line and to characterise isoform expression and usage across differentiation. We identify many previously unannotated features, including a novel transcript of the voltage-gated calcium channel subunit gene, CACNA2D2. We show differential expression and usage of transcripts during differentiation, and identify a putative molecular regulator underlying this state change. Our work highlights the potential of long read sequencing to uncover previously unknown transcript diversity and mechanisms influencing alternative splicing.


2021 ◽  
Author(s):  
Kamila Kwiecien ◽  
Pawel Majewski ◽  
Maciej Bak ◽  
Piotr Brzoza ◽  
Urszula Godlewska ◽  
...  

Abstract: Chemerin is a chemoattractant protein with adipokine and antimicrobial properties encoded by the retinoic acid receptor responder 2 (RARRES2) gene. Chemerin bioactivity is largely dependent on carboxyl-terminal proteolytic processing that generates chemerin isoforms with different chemotactic, regulatory, and antimicrobial potential. While these mechanisms are relatively well known, the role of alternative splicing in generating isoform diversity remains obscure. Using rapid amplification of cDNA ends (RACE) PCR, we have identified novel transcript variant 4 of mouse RARRES2 encoding mChem153K. Moreover, RT-qPCR results and analysis of publicly available RNA-seq datasets showed that different alternatively spliced variants of mouse RARRES2 are present in mouse tissues, and that their expression pattern was not affected by inflammatory or infectious stimuli. Finally, we demonstrated that chemerin isoform mChem157S exhibits higher bactericidal but not chemotactic activity compared to mChem156S. Together, our findings highlight the importance of alternative splicing in generating chemerin isoform diversity and activity.


2021 ◽  
Author(s):  
Balazs Kakuk ◽  
Dora Tombacz ◽  
Zsolt Balazs ◽  
Norbert Moldovan ◽  
Zsolt Csabai ◽  
...  

Long-read sequencing (LRS), a powerful novel approach, is able to read full-length transcripts and confers a major advantage over the earlier gold standard, short-read sequencing, in identifying, for example, polycistronic transcripts and transcript isoforms, including transcript length and splice variants. In this work, we profile the human cytomegalovirus transcriptome using two third-generation LRS platforms: the Sequel from Pacific Biosciences and the MinION from Oxford Nanopore Technologies. We carried out both cDNA and direct RNA sequencing, and applied the LoRTIA software, developed in our laboratory, for transcript annotation. This study identified a large number of novel transcript variants, including splice isoforms and transcript start and end site isoforms, as well as putative mRNAs with truncated in-frame ORFs (located within the larger ORFs of the canonical mRNAs), which potentially encode N-terminally truncated polypeptides. Our work also disclosed a highly complex meshwork of transcriptional read-throughs and overlaps.


Author(s):  
Gyan Ranjan ◽  
Paras Sehgal ◽  
Disha Sharma ◽  
Vinod Scaria ◽  
Sridhar Sivasubbu

Abstract: The use of model organisms to understand the function of novel transcripts/genes has allowed us to delineate their molecular mechanisms in maintaining cellular homeostasis. Organisms such as zebrafish have contributed greatly to the fields of developmental and disease biology. Owing to technological advances and deep transcriptomics, many new transcript isoforms and non-coding RNAs, such as long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs), have been identified and cataloged in multiple databases, and many more are yet to be identified. Various methods and tools have been used to identify lncRNAs/circRNAs in zebrafish using deep sequencing of transcriptomes as templates. Functional analysis of a few candidates such as tie1-AS, ECAL1 and CDR1as in zebrafish provides a prospective outline for approaching other known or novel lncRNAs/circRNAs. New genetic alteration tools such as TALENs and CRISPR have helped in probing the molecular function of lncRNAs/circRNAs in zebrafish. Furthermore, recent improvements in experimental and computational techniques enable the identification of lncRNA/circRNA counterparts in humans and zebrafish, thereby allowing easy modeling and analysis of function at the cellular level.

