RNAsamba: coding potential assessment using ORF and whole transcript sequence information

2019 ◽  
Author(s):  
Antonio P. Camargo ◽  
Vsevolod Sourkov ◽  
Marcelo F. Carazzolle

Abstract Motivation The advent of high-throughput sequencing technologies made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, one of the main tasks being the identification of protein-coding and long non-coding RNAs. Results We describe RNAsamba, a tool to predict the coding potential of RNA molecules from sequence information using a deep-learning model that processes both the whole sequence and the ORF to look for patterns that distinguish coding and non-coding RNAs. We evaluated the model in the classification of coding and non-coding transcripts of humans and five other model organisms and show that RNAsamba mostly outperforms other state-of-the-art methods. We also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, evidencing that its model is not dependent on the presence of complete coding regions. RNAsamba is a fast and easy-to-use tool that can provide valuable contributions to genome annotation pipelines. Availability and implementation The source code of RNAsamba is freely available at: https://github.com/apcamargo/RNAsamba.

2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Antonio P Camargo ◽  
Vsevolod Sourkov ◽  
Gonçalo A G Pereira ◽  
Marcelo F Carazzolle

Abstract The advent of high-throughput sequencing technologies made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, with the distinction between protein-coding and long non-coding RNAs being one of the most important tasks. We describe RNAsamba, a tool to predict the coding potential of RNA molecules from sequence information using a neural network-based approach that models both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba’s classification performance using transcripts from humans and several other model organisms and show that it recurrently outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, evidencing that its algorithm is not dependent on complete transcript sequences. Furthermore, RNAsamba can also predict small ORFs, traditionally identified with ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, the documentation containing instructions for local installation and usage, and the source code of RNAsamba can be found at https://rnasamba.lge.ibi.unicamp.br/.
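
The abstract above says the model uses both the whole transcript and its ORF. As a rough illustration of what "the ORF" of a transcript means, here is a minimal Python sketch that extracts the longest forward-strand ORF from a sequence. It is not RNAsamba's implementation: the function name `longest_orf`, the requirement of an ATG start and an in-frame stop codon, and the forward-strand-only scan are all assumptions made for this example (the abstract notes that RNAsamba also handles partial-length ORFs, which this sketch does not).

```python
# Hypothetical helper for illustration only; not part of RNAsamba.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def longest_orf(seq: str) -> str:
    """Return the longest ATG-to-stop ORF on the forward strand, or "" if none."""
    # Accept RNA or DNA input by mapping U to T.
    seq = seq.upper().replace("U", "T")
    best = ""
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


if __name__ == "__main__":
    print(longest_orf("CCAUGAAAUAGCC"))
```

A tool that feeds both inputs to a classifier would pass the full transcript alongside an ORF extracted in some comparable way; how RNAsamba itself selects the ORF is described in its documentation, not in this sketch.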


Genome ◽  
2016 ◽  
Vol 59 (4) ◽  
pp. 263-275 ◽  
Author(s):  
Mohammad Reza Bakhtiarizadeh ◽  
Batool Hosseinpour ◽  
Babak Arefnezhad ◽  
Narges Shamabadi ◽  
Seyed Alireza Salami

Long non-coding RNAs (lncRNAs) are transcribed RNA molecules >200 nucleotides in length that do not encode proteins and serve as key regulators of diverse biological processes. Recently, thousands of long intergenic non-coding RNAs (lincRNAs), a type of lncRNA, have been identified in mammals using massively parallel sequencing technologies. The availability of the genome sequence of sheep (Ovis aries) has allowed genomic prediction of non-coding RNAs. This is the first study to identify lincRNAs using RNA-seq data of eight different tissues of sheep, including brain, heart, kidney, liver, lung, ovary, skin, and white adipose. A computational pipeline was employed to characterize 325 putative lincRNAs with high confidence from eight important tissues of sheep using different criteria such as GC content, exon number, gene length, co-expression analysis, stability, and tissue-specific scores. Sixty-four putative lincRNAs displayed tissue-specific expression. The highest number of tissue-specific lincRNAs was found in skin and brain. All novel lincRNAs that aligned to the human and mouse lincRNAs had conserved synteny. The protein-coding genes closest to these lincRNAs were enriched in 11 significant GO terms such as limb development, appendage development, striated muscle tissue development, and multicellular organismal development. The findings reported here have important implications for the study of the sheep genome.


2021 ◽  
Vol 22 (16) ◽  
pp. 8719
Author(s):  
Muhammad Nabeel Asim ◽  
Muhammad Ali Ibrahim ◽  
Muhammad Imran Malik ◽  
Andreas Dengel ◽  
Sheraz Ahmed

Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have largely promoted the exploration of ncRNAs, which revealed their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding-RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will assist Artificial Intelligence researchers with knowing state-of-the-art performance, model selection for various tasks on one platform, dominantly used sequence descriptors, neural architectures, and interpreting inter-species and intra-species performance deviation.


2021 ◽  
Vol 14 ◽  
Author(s):  
Rafaela Policarpo ◽  
Annerieke Sierksma ◽  
Bart De Strooper ◽  
Constantin d’Ydewalle

Recent advances in RNA sequencing technologies helped to uncover the existence of tens of thousands of long non-coding RNAs (lncRNAs) that arise from the dark matter of the genome. These lncRNAs were originally thought to be transcriptional noise but an increasing number of studies demonstrate that these transcripts can modulate protein-coding gene expression by a wide variety of transcriptional and post-transcriptional mechanisms. The spatiotemporal regulation of lncRNA expression is particularly evident in the central nervous system, suggesting that they may directly contribute to specific brain processes, including neurogenesis and cellular homeostasis. Not surprisingly, lncRNAs are therefore gaining attention as putative novel therapeutic targets for disorders of the brain. In this review, we summarize the recent insights into the functions of lncRNAs in the brain, their role in neuronal maintenance, and their potential contribution to disease. We conclude this review by postulating how these RNA molecules can be targeted for the treatment of yet incurable neurological disorders.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Garima Bhatia ◽  
Santosh K. Upadhyay ◽  
Anuradha Upadhyay ◽  
Kashmir Singh

Abstract Background Long non-coding RNAs (lncRNAs) are regulatory transcripts of length > 200 nt. Owing to the rapidly progressing RNA-sequencing technologies, lncRNAs are emerging as considerable nodes in the plant antifungal defense networks. Therefore, we investigated their role in Vitis vinifera (grapevine) in response to obligate biotrophic fungal phytopathogens, Erysiphe necator (powdery mildew, PM) and Plasmopara viticola (downy mildew, DM), which impose a huge agro-economic burden on grape-growers worldwide. Results Using a computational approach based on RNA-seq data, 71 PM- and 83 DM-responsive V. vinifera lncRNAs were identified and comprehensively examined for their putative functional roles in plant defense response. V. vinifera protein coding sequences (CDS) were also profiled based on expression levels, and 1037 PM-responsive and 670 DM-responsive CDS were identified. Next, co-expression analysis-based functional annotation revealed their association with gene ontology (GO) terms for ‘response to stress’, ‘response to biotic stimulus’, ‘immune system process’, etc. Further investigation based on analysis of domains, enzyme classification, pathways enrichment, transcription factors (TFs), interactions with microRNAs (miRNAs), and real-time quantitative PCR of lncRNAs and co-expressing CDS pairs suggested their involvement in modulation of basal and specific defense responses such as: Ca2+-dependent signaling, cell wall reinforcement, reactive oxygen species metabolism, pathogenesis related proteins accumulation, phytohormonal signal transduction, and secondary metabolism. Conclusions Overall, the identified lncRNAs provide insights into the underlying intricacy of grapevine transcriptional reprogramming/post-transcriptional regulation to delay or seize the living cell-dependent pathogen growth. Therefore, in addition to defense-responsive genes such as TFs, the identified lncRNAs can be further examined and leveraged as candidates for biotechnological improvement/breeding to enhance fungal stress resistance in this susceptible fruit crop of economic and nutritional importance.


The Auk ◽  
2019 ◽  
Vol 136 (4) ◽  
Author(s):  
Erik R Funk ◽  
Scott A Taylor

Abstract Avian evolution has generated an impressive array of patterns and colors in the ~10,000 bird species that exist on Earth. Recently, a number of exciting studies have utilized whole-genome sequencing to reveal new details on the genetics of avian plumage color. These findings provide compelling evidence for genes that underlie plumage variation across a wide variety of bird species (e.g., juncos, warblers, seedeaters, and estrildid finches). While much is known about large, body-wide color changes, these species exhibit discrete color differences across small plumage patches. Many genetic differences appear to be located in regulatory regions of genes rather than in protein-coding regions, suggesting gene expression is playing a large role in the control of these color patches. Taken together, these studies have the potential to broadly facilitate further research of sexual selection and evolution in these charismatic taxa.


2005 ◽  
Vol 79 (12) ◽  
pp. 7570-7596 ◽  
Author(s):  
Luciano Brocchieri ◽  
Thomas N. Kledal ◽  
Samuel Karlin ◽  
Edward S. Mocarski

ABSTRACT Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation together with frame-specific G+C representation to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis.


2018 ◽  
Vol 45 (3) ◽  
pp. 1191-1204 ◽  
Author(s):  
JingJing Wu ◽  
Swei Sunny Hann

Nasopharyngeal carcinoma (NPC) is one of the most common cancers originating in the nasopharynx and occurring at high frequency in South-eastern Asia and North Africa. Long non-coding RNAs (lncRNAs) are a class of non-protein-coding RNA molecules and key regulators of developmental, physiological, and pathological processes in humans. Emerging studies have shown that lncRNAs play critical roles in tumorigenicity and cancer prognosis. With the development of deep sequencing analyses, a large number of functional lncRNAs have been discovered in nasopharyngeal carcinoma tissues and cell lines. However, the roles and mechanisms of aberrantly expressed lncRNAs in the pathogenesis of NPC are not fully understood. In this review, we briefly illustrate the concept, identification, and functional characterization of lncRNAs, and summarize recent advancements in the biological functions of lncRNAs with heterogeneous mechanistic characterization and their involvement in NPC. Then, we describe individual lncRNAs that have been associated with tumorigenesis, growth, invasion, cancer stem cell differentiation, metastasis, and drug resistance, and discuss the strategies of their therapeutic manipulation in NPC. We also review the emerging insights into the role of lncRNAs and their potential as biomarkers and therapeutic targets for novel treatment paradigms. Finally, we highlight up-to-date clinical information involving lncRNAs and future directions in linking lncRNAs to potential gene therapies, and how modifications of lncRNAs can be exploited for prevention and treatment of NPC.


2019 ◽  
Vol 109 (6) ◽  
pp. 983-992 ◽  
Author(s):  
Dan Edward V. Villamor ◽  
Kenneth C. Eastwell

Western X (WX) disease, caused by ‘Candidatus Phytoplasma pruni’, is a devastating disease of sweet cherry resulting in the production of small, bitter-flavored fruits that are unmarketable. Escalation of WX disease in Washington State prompted the development of a rapid detection assay based on recombinase polymerase amplification (RPA) to facilitate timely removal and replacement of diseased trees. Here, we report on a reliable RPA assay targeting putative immunodominant protein coding regions that showed comparable sensitivity to polymerase chain reaction (PCR) in detecting ‘Ca. Phytoplasma pruni’ from crude sap of sweet cherry tissues. Apart from the predominant strain of ‘Ca. Phytoplasma pruni’, the RPA assay also detected a novel strain of phytoplasma from several WX-affected trees. Multilocus sequence analyses using the immunodominant protein A (idpA), imp, rpoE, secY, and 16S ribosomal RNA regions from several ‘Ca. Phytoplasma pruni’ isolates from WX-affected trees showed that this novel phytoplasma strain represents a new subgroup within the 16SrIII group. Examination of high-throughput sequencing data from total RNA of WX-affected trees revealed that the imp coding region is highly expressed, and as supported by quantitative reverse transcription PCR data, it showed higher RNA transcript levels than the previously proposed idpA coding region of ‘Ca. Phytoplasma pruni’.


Planta ◽  
2020 ◽  
Vol 252 (5) ◽  
Author(s):  
Li Chen ◽  
Qian-Hao Zhu ◽  
Kerstin Kaufmann

Abstract Main conclusion Long non-coding RNAs modulate gene activity in plant development and stress responses by various molecular mechanisms. Abstract Long non-coding RNAs (lncRNAs) are transcripts larger than 200 nucleotides without protein coding potential. Computational approaches have identified numerous lncRNAs in different plant species. Research in the past decade has unveiled that plant lncRNAs participate in a wide range of biological processes, including regulation of flowering time and morphogenesis of reproductive organs, as well as abiotic and biotic stress responses. LncRNAs execute their functions by interacting with DNA, RNA and protein molecules, and by modulating the expression level of their targets through epigenetic, transcriptional, post-transcriptional or translational regulation. In this review, we summarize characteristics of plant lncRNAs, discuss recent progress on understanding of lncRNA functions, and propose an experimental framework for functional characterization.

