FindNonCoding: rapid and simple detection of non-coding RNAs in genomes

Author(s):  
Erik S Wright

Abstract Summary: Non-coding RNAs are often neglected during genome annotation because they are more difficult to detect than protein-coding genes. FindNonCoding takes a pattern-mining approach to capture the essential sequence motifs and hairpin loops that characterize a non-coding RNA family, and then quickly identifies matches in genomes. FindNonCoding was designed for ease of use and accurately finds non-coding RNAs with a low false discovery rate. Availability: FindNonCoding is implemented within the DECIPHER package (v2.19.3) for R (v4.1), available from Bioconductor. Pre-trained models of common non-coding RNA families are included for bacteria, archaea, and eukarya. Supplementary information: Supplementary data are available at Bioinformatics online.

2019
Vol 36 (1)
pp. 73-80
Author(s):  
Mohamed Chaabane
Robert M Williams
Austin T Stephens
Juw Won Park

Abstract Motivation: Over the past two decades, circular RNA, a circular form of RNA produced through alternative splicing, has become a focus of scientific study because of its major role as a modulator of microRNA (miRNA) activity and its association with various diseases, including cancer. Detecting circular RNAs is therefore vital to understanding their biogenesis and purpose. Circular RNA prediction can be achieved in three steps: distinguishing non-coding RNAs from protein-coding gene transcripts, separating short from long non-coding RNAs, and distinguishing circular RNAs from other long non-coding RNAs (lncRNAs). However, available tools are less than 80 percent accurate at distinguishing circular RNAs from other lncRNAs, reflecting the difficulty of this classification. A more accurate and faster machine learning method that considers the specific features of circular RNA is therefore essential for systematic annotation. Results: Here we present circDeep, an end-to-end deep learning framework that distinguishes circular RNAs from other lncRNAs. circDeep fuses an RCM descriptor, an ACNN-BLSTM sequence descriptor, and a conservation descriptor into high-level abstraction descriptors that integrate shared representations across the different modalities. Experiments show that circDeep is not only faster than existing tools but also more accurate, achieving a 12 percent increase in accuracy over the other tools. Availability and implementation: https://github.com/UofLBioinformatics/circDeep. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Yanrong Ji
Zhihan Zhou
Han Liu
Ramana V Davuluri

Abstract Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex because of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT with the most widely used programs for genome-wide prediction of regulatory elements and demonstrated its ease of use, accuracy, and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites, and transcription factor binding sites after simple fine-tuning with small task-specific labeled datasets. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, improving interpretability and enabling accurate identification of conserved sequence motifs and candidate functional genetic variants. Finally, we demonstrate that DNABERT pre-trained on the human genome can be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available on GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.


2020
Vol 21 (10)
pp. 3711
Author(s):  
Melina J. Sedano
Alana L. Harrison
Mina Zilaie
Chandrima Das
Ramesh Choudhari
...  

Genome-wide RNA sequencing has shown that only a small fraction of the human genome is transcribed into protein-coding mRNAs. Although the remainder was once thought to be “junk” DNA, recent findings indicate that it encodes many types of non-coding RNA molecules, with a myriad of functions still being determined. Among the non-coding RNAs, long non-coding RNAs (lncRNAs) and enhancer RNAs (eRNAs) are the most abundant. While their exact biological functions and mechanisms of action remain unknown, technologies such as next-generation RNA sequencing (RNA-seq) and global nuclear run-on sequencing (GRO-seq) have begun to decipher their expression patterns and biological significance. Beyond their identification, studies have shown that the expression of lncRNAs and eRNAs can vary with spatial, temporal, developmental, or hormonal conditions. In this review, we explore newly reported information on estrogen-regulated eRNAs and lncRNAs and their associated biological functions to help outline their prominent roles in estrogen-dependent signaling.


2021
Author(s):  
Hanneke Vlaming
Claudia A Mimoso
Benjamin JE Martin
Andrew R Field
Karen Adelman

Organismal growth and development rely on RNA Polymerase II (RNAPII) synthesizing the appropriate repertoire of messenger RNAs (mRNAs) from protein-coding genes. Productive elongation of full-length transcripts is essential for mRNA function; however, what determines whether an engaged RNAPII molecule terminates prematurely or transcribes processively remains poorly understood. Notably, although transcription initiation proceeds through a common process across RNAPII-synthesized RNAs, RNAPII is highly susceptible to termination when transcribing non-coding RNAs such as upstream antisense RNAs (uaRNAs) and enhancer RNAs (eRNAs), suggesting that differences arise during RNAPII elongation. To investigate the impact of transcribed sequence on elongation potential, we developed a method to screen the effects of thousands of INtegrated Sequences on Expression of RNA and Translation using high-throughput sequencing (INSERT-seq). We found that higher AT content in uaRNAs and eRNAs, rather than specific sequence motifs, underlies the propensity for RNAPII termination on these transcripts. Further, we demonstrate that 5' splice sites stimulate transcription both through splicing-dependent mechanisms and autonomously, independent of splicing, even in the absence of polyadenylation signals. Together, our results reveal a potent role for transcribed sequence in dictating gene output at mRNA and non-coding RNA loci, and demonstrate the power of INSERT-seq for illuminating these contributions.


2020
Author(s):  
Neil D. Warnock
Erwan Atcheson
Ciaran McCoy
Johnathan J. Dalzell

Abstract We conducted a transcriptomic and small RNA analysis of infective juveniles (IJs) from three behaviourally distinct Steinernema species. Substantial variation was found in the expression of shared gene orthologues, revealing gene expression signatures that correlate with behavioural states. Of the predicted microRNAs, 97% are novel to each species. Surprisingly, our data provide evidence that isoform variation can effectively convert protein-coding neuropeptide genes into non-coding transcripts, which may represent a new family of long non-coding RNAs. These data suggest that differences in neuropeptide gene expression, isoform variation, and small RNA interactions could contribute to behavioural differences within the Steinernema genus.


2020
Vol 6 (3)
pp. 40
Author(s):  
Paola Briata
Roberto Gherzi

Although mammals possess roughly the same number of protein-coding genes as worms, the non-coding transcriptome has clearly become far broader and more sophisticated during evolution. Indeed, the vital regulatory importance of both short and long non-coding RNAs (lncRNAs) has been demonstrated over the last two decades. RNA-binding proteins (RBPs) represent approximately 7.5% of all proteins and regulate the fate and function of a huge number of transcripts, thereby helping to maintain cellular homeostasis. Transcriptomic and proteomic studies have revealed that RBP-based complexes often include lncRNAs. This review describes examples of how lncRNA-RBP networks can control virtually all post-transcriptional events in the cell.


2019
Vol 5 (1)
pp. 15
Author(s):  
Shrey Gandhi
Frank Ruehle
Monika Stoll

Cardiovascular diseases (CVDs) affect the heart and the vascular system, are highly prevalent, and place a huge burden on society as well as on the healthcare system. These complex diseases often result from multiple genetic and environmental risk factors, which makes their etiology and consequences challenging to understand. With the advent of next-generation sequencing, many non-coding RNA transcripts, especially long non-coding RNAs (lncRNAs), have been linked to the pathogenesis of CVD. Despite increasing evidence, proper functional characterization of most of these molecules is still lacking. Sequence conservation across related species has been used to functionally annotate protein-coding genes. In contrast, the rapid evolutionary turnover and weak sequence conservation of lncRNAs make it difficult to identify functional homologs of these sequences. Recent studies have explored other dimensions of interspecies conservation to elucidate the functional roles of these novel transcripts. In this review, we summarize the methodologies adopted to explore the evolutionary conservation of cardiovascular non-coding RNAs at the sequence, secondary structure, syntenic, and expression levels.


2020
Vol 21 (21)
pp. 8252
Author(s):  
Alexander Suvorov
J. Richard Pilsner
Vladimir Naumov
Victoria Shtratnikova
Anna Zheludkevich
...  

Advanced paternal age at fertilization is a risk factor for multiple disorders in offspring and may be linked to age-related epigenetic changes in the father’s sperm. An understanding of aging-related epigenetic changes in sperm, and of the environmental factors that modify them, is needed. Here, we characterize changes in sperm small non-coding RNA (sncRNA) between young pubertal and mature rats. We also analyze how these changes are modified by exposure to the environmental xenobiotic 2,2′,4,4′-tetrabromodiphenyl ether (BDE-47). sncRNA libraries prepared from epididymal spermatozoa were sequenced and analyzed using DESeq2. The distribution of small RNA fractions changed with age: fractions mapping to rRNA and lncRNA decreased, while fractions mapping to tRNA and miRNA increased. In total, 249 miRNAs, 908 piRNAs, and 227 tRNA-derived RNAs were differentially expressed (twofold change, false discovery rate (FDR) p ≤ 0.05) between age groups in control animals. Differentially expressed miRNAs and piRNAs were enriched for protein-coding targets involved in development and metabolism, while piRNAs were also enriched for long terminal repeat (LTR) targets. BDE-47 accelerated age-dependent changes in sncRNA in younger animals, decelerated these changes in older animals, and increased the variance in expression of all sncRNAs. Our results indicate that the natural aging process has profound effects on sperm sncRNA profiles and that this effect may be modified by environmental exposure.


2021
Author(s):  
Kazi Rahman
Alex A. Compton

The interferon-induced transmembrane (IFITM) family performs multiple functions in immunity, including inhibition of virus entry into cells. The IFITM repertoire varies widely between species and consists of protein-coding genes and pseudogenes. The selective forces driving pseudogenization within gene families are rarely understood. In this issue, the human pseudogene IFITM4P is characterized as a virus-induced long non-coding RNA that contributes to restriction of Influenza A virus by regulating mRNA levels of IFITM1, IFITM2, and IFITM3.


2021
Author(s):  
David Staněk

Abstract In this review, I focus on the role of splicing in the life of long non-coding RNAs (lncRNAs). First, I summarize differences between the splicing efficiency of protein-coding genes and lncRNAs and discuss why non-coding RNAs are spliced less efficiently. In the second half of the review, I speculate on why splice sites are the most conserved sequences in lncRNAs and on what additional roles splicing could play in lncRNA metabolism. I discuss the hypothesis that the splicing machinery can, besides its dominant role in intron removal and exon joining, protect cells from undesired transcripts.
