FEELnc: A tool for Long non-coding RNAs annotation and its application to the dog transcriptome

2016 ◽  
Author(s):  
V Wucher ◽  
F Legeai ◽  
B Hédan ◽  
G Rizk ◽  
L Lagoutte ◽  
...  

Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. Among the plethora of reconstructed transcripts, one of the main bottlenecks consists in correctly identifying the different classes of RNAs, particularly in distinguishing those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program which accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking against five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE datasets. The program also provides several specific modules that enable users to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to annotate lncRNAs even in the absence of a training set of noncoding RNAs. We used FEELnc on a real dataset comprising 20 new canine RNA-seq samples produced within the framework of the European LUPA consortium to expand the canine genome annotation, and classified 10,374 novel lncRNAs and 58,640 new mRNA transcripts. FEELnc represents a standardized protocol for identifying and annotating lncRNAs and is freely accessible at https://github.com/tderrien/FEELnc.
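To make the idea of "multi k-mer frequencies" concrete, here is a minimal Python sketch of how such features could be computed for one transcript sequence. It is not FEELnc's own code: the function names `kmer_frequencies` and `multi_kmer_features` and the choice of k = 1, 2, 3 are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequency of every DNA k-mer of length k in seq (overlapping windows)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only count windows made of A/C/G/T, so ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def multi_kmer_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency tables for several values of k (hypothetical helper)."""
    features = {}
    for k in ks:
        features.update(kmer_frequencies(seq, k))
    return features


features = multi_kmer_features("ACGTACGT")
print(len(features))   # 4 + 16 + 64 feature columns
print(features["A"])   # fraction of single bases that are A
```

A vector like this, one per transcript, is the kind of input a Random Forest classifier could be trained on, alongside ORF-derived features.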

2020 ◽  
Author(s):  
Urminder Singh ◽  
Eve Syrkin Wurtele

Summary: Searching for ORFs in transcripts is a critical step prior to annotating coding regions in newly sequenced genomes and to searching for alternative reading frames within known genes. With the tremendous increase in RNA-Seq data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in FASTA sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats. Availability and implementation: orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy). Supplementary information: Supplementary data are available at https://github.com/urmi-21/orfipy
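The basic operation behind an ORF search can be sketched in a few lines of plain Python. This is a simplified illustration, not orfipy's implementation or API: the function `find_orfs`, its parameters and the forward-strand-only scan are assumptions chosen to keep the example short.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, start_codons=("ATG",), min_len=30):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    Only ORFs of at least min_len nucleotides are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first start codon of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# One short ORF (ATG AAA TAG) sits in frame 2 of this toy sequence.
print(find_orfs("CCATGAAATAGCC", min_len=9))
```

Making the start codons and minimum length parameters mirrors the kind of customizable search criteria the abstract describes; a full tool would also scan the reverse complement and stream large FASTA files.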


2019 ◽  
Author(s):  
Hsin-Yen Larry Wu ◽  
Polly Yingshan Hsu

Background: Ribo-seq has revolutionized the study of mRNA translation on a genome-wide scale. High-quality Ribo-seq data display strong 3-nucleotide (nt) periodicity, which reflects translating ribosomes deciphering three nucleotides at a time. While the 3-nt periodicity has been widely used to study novel translation events and identify small open reading frames on presumed non-coding RNAs, tools that allow the visualization of those events remain underdeveloped. Findings: RiboPlotR is a visualization package written in R that presents both RNA-seq coverage and Ribo-seq reads for all annotated transcript isoforms in the context of a given gene. In particular, RiboPlotR plots Ribo-seq reads mapped in three reading frames using three colors for one isoform model at a time. Moreover, RiboPlotR shows Ribo-seq reads on upstream ORFs, 5’ and 3’ untranslated regions and introns, which is critical for observing new translation events and potential regulatory mechanisms. Conclusions: RiboPlotR is freely available (https://github.com/hsinyenwu/RiboPlotR) and allows the visualization of the translating features in Ribo-seq data.
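The 3-nt periodicity mentioned above can be checked numerically by assigning each read to a reading frame relative to the annotated start of the coding sequence. The following Python sketch is illustrative only and is unrelated to RiboPlotR's R code: the function name `frame_fractions`, the use of a single representative position per read (for example an inferred P-site) and the toy coordinates are all assumptions.

```python
def frame_fractions(read_positions, cds_start, cds_end):
    """Fraction of reads falling in frame 0, 1 and 2 of a coding sequence.

    read_positions: 0-based transcript coordinates, one per read.
    cds_start, cds_end: 0-based CDS bounds, end exclusive.
    Reads outside the CDS are ignored.
    """
    counts = [0, 0, 0]
    for pos in read_positions:
        if cds_start <= pos < cds_end:
            counts[(pos - cds_start) % 3] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


# Toy data: most in-CDS reads land in frame 0, the pattern expected under strong periodicity.
positions = [10, 13, 16, 11, 19, 5]
print(frame_fractions(positions, cds_start=10, cds_end=40))
```

A distribution heavily skewed toward one frame is what the three-color frame plots described in the abstract make visible at a glance.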


F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 155 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Monica Britton ◽  
Jill Wegrzyn ◽  
Timothy Butterfield ◽  
Pedro José Martínez-García ◽  
...  

The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves existing methodologies, and implements a workflow for error estimation and correction followed by genome annotation and transcript abundance estimation for RNA-seq derived transcriptome sequences (YeATS - Yet Another Tool Suite for analyzing RNA-seq derived transcriptome). A unique feature of YeATS is the upfront determination of the errors in the sequencing or transcript assembly process by analyzing open reading frames of transcripts. YeATS flags as erroneous those transcripts that have not been merged, that result in broken open reading frames or that contain long repeats. We present the YeATS workflow using a representative sample of the transcriptome from the tissue at the heartwood/sapwood transition zone in black walnut. A novel feature of the transcriptome that emerged from our analysis was the identification of a highly abundant transcript that had no known homologous genes (GenBank accession: KT023102). The amino acid composition of the longest open reading frame of this gene classifies it as a putative extensin. We also corroborated the transcriptional abundance of proline-rich proteins, dehydrins, senescence-associated proteins, and the DNAJ family of chaperone proteins. Thus, YeATS presents a workflow for analyzing RNA-seq data with several innovative features that differentiate it from existing software.


PLoS ONE ◽  
2016 ◽  
Vol 11 (10) ◽  
pp. e0165429 ◽  
Author(s):  
Julia Hahn ◽  
Olga V. Tsoy ◽  
Sebastian Thalmann ◽  
Jelena Čuklina ◽  
Mikhail S. Gelfand ◽  
...  

2021 ◽  
Vol 12 ◽  
Author(s):  
Jing Li ◽  
Urminder Singh ◽  
Zebulun Arendsee ◽  
Eve Syrkin Wurtele

The “dark transcriptome” can be considered the multitude of sequences that are transcribed but not annotated as genes. We evaluated expression of 6,692 annotated genes and 29,354 unannotated open reading frames (ORFs) in the Saccharomyces cerevisiae genome across diverse environmental, genetic and developmental conditions (3,457 RNA-Seq samples). Over 30% of the highly transcribed ORFs have translation evidence. Phylostratigraphic analysis infers most of these transcribed ORFs would encode species-specific proteins (“orphan-ORFs”); hundreds have mean expression comparable to annotated genes. These data reveal unannotated ORFs most likely to be protein-coding genes. We partitioned a co-expression matrix by Markov Chain Clustering; the resultant clusters contain 2,468 orphan-ORFs. We provide the aggregated RNA-Seq yeast data with extensive metadata as a project in MetaOmGraph (MOG), a tool designed for interactive analysis and visualization. This approach enables reuse of public RNA-Seq data for exploratory discovery, providing a rich context for experimentalists to make novel, experimentally testable hypotheses about candidate genes.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Endika Varela-Martínez ◽  
Martin Bilbao-Arribas ◽  
Naiara Abendaño ◽  
Javier Asín ◽  
Marta Pérez ◽  
...  

Aluminium hydroxide adjuvants are crucial for livestock and human vaccines. Few studies have analysed their effect on the central nervous system in vivo. In this work, lambs received one of three treatments of parallel subcutaneous inoculations over 16 months: aluminium-containing commercial vaccines, an equivalent dose of aluminium hydroxide, or mock injections. Brain samples were sequenced by RNA-seq and miRNA-seq for the expression analysis of mRNAs, long non-coding RNAs and microRNAs, and three expression comparisons were made. Although few differentially expressed genes were identified, some genes dysregulated by aluminium hydroxide alone were linked to neurological functions, the lncRNA TUNA among them, or were enriched in functions related to mitochondrial energy metabolism. Likewise, miRNA expression was mainly disrupted by the adjuvant-alone treatment. Some differentially expressed miRNAs had previously been linked to neurological diseases, oxidative stress and apoptosis. In brief, in this study aluminium hydroxide alone altered the transcriptome of the encephalon to a higher degree than commercial vaccines, which had a milder effect. The expression changes in the animals inoculated with aluminium hydroxide suggest mitochondrial dysfunction. Further research is needed to elucidate to what extent these changes could have pathological consequences.


2013 ◽  
Vol 11 (05) ◽  
pp. 1342002 ◽  
Author(s):  
ASHIS KUMER BISWAS ◽  
BAOJU ZHANG ◽  
XIAOYONG WU ◽  
JEAN X. GAO

Statistics on open reading frames, base composition and the properties of predicted secondary structures have the potential to address the problem of discriminating coding from noncoding transcripts. In addition, the next-generation sequencing platform RNA-seq provides a wealth of data from which transcript expression profiles can be extracted, which prompted us to add a new set of dimensions to this classification task. In this paper, we propose CNCTDiscriminator, a coding and noncoding transcript discriminating system that integrates these four categories of transcript features. Feature integration was done using both hypothesis learning and feature-specific ensemble learning approaches. When applied to an independent benchmark dataset, the CNCTDiscriminator model trained with composition and ORF features (precision 83.86%, recall 82.01%) outperforms three other popular methods: CPC (precision 98.31%, recall 25.95%), CPAT (precision 97.74%, recall 52.50%) and PORTRAIT (precision 84.37%, recall 73.2%). The CNCTDiscriminator model trained using the ensemble approach shows comparable performance (precision 89.85%, recall 71.08%).
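The reported figures mix very high precision with low recall for some tools, so a single combined score helps when comparing them. The Python sketch below computes the F1 score (the harmonic mean of precision and recall) from the numbers quoted in the abstract. Using F1 as the comparison criterion is an assumption made here for illustration; the abstract does not state which metric the authors used to rank the methods.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as quoted in the abstract.
reported = {
    "CNCTDiscriminator (composition + ORF)": (83.86, 82.01),
    "CNCTDiscriminator (ensemble)": (89.85, 71.08),
    "CPC": (98.31, 25.95),
    "CPAT": (97.74, 52.50),
    "PORTRAIT": (84.37, 73.2),
}

scores = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: F1 = {score:.2f}")

best = max(scores, key=scores.get)
```

Under this metric the model trained with composition and ORF features ranks first, because its precision and recall are balanced, while CPC's high precision is offset by its low recall.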

