Ranked Choice Voting for Representative Transcripts with TRaCE

2020
Author(s): Andrew J Olson, Doreen Ware

Genome sequencing projects annotate protein-coding gene models with multiple transcripts, aiming to represent all of the available transcript evidence. However, downstream analyses often operate on only one representative transcript per gene locus, sometimes known as the canonical transcript. To choose canonical transcripts, TRaCE (Transcript Ranking and Canonical Election) holds an 'election' in which a set of RNA-seq samples rank transcripts by annotation edit distance. These sample-specific votes are tallied along with other criteria such as protein length and InterPro domain coverage. The winner is selected as the canonical transcript, but the election proceeds through multiple rounds of voting to order all the transcripts by relevance. Based on the set of expression data provided, TRaCE can identify the most common isoforms from a broad expression atlas or prioritize alternative transcripts expressed in specific contexts.
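The election can be sketched as repeated instant-runoff voting. This is a minimal illustration under assumptions, not TRaCE's code. Each RNA-seq sample casts a ballot ranking transcripts by ascending annotation edit distance (AED), and an extra criterion such as protein length casts a ballot ranking longer proteins first. Instant-runoff elimination picks each winner, and the winner is removed before the next round so that every transcript receives a position. The function names, the tie-breaking rule and the AED and length values are all hypothetical.

```python
from collections import Counter

def sample_ballot(aed):
    """Rank transcripts for one RNA-seq sample: lower AED (closer to the evidence) first.
    Assumption: ties are broken by transcript ID."""
    return sorted(aed, key=lambda t: (aed[t], t))

def criterion_ballot(score):
    """Rank transcripts by a 'higher is better' criterion such as protein length."""
    return sorted(score, key=lambda t: (-score[t], t))

def irv_winner(ballots, candidates):
    """Instant-runoff: drop the weakest candidate until one holds a strict majority."""
    remaining = set(candidates)
    while True:
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            # Each ballot counts for its highest-ranked candidate still in the race.
            for choice in ballot:
                if choice in remaining:
                    tally[choice] += 1
                    break
        total = sum(tally.values())
        leader, votes = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
        if len(remaining) == 1 or 2 * votes > total:
            return leader
        # Eliminate the candidate with the fewest first choices (ties: lowest ID).
        loser = min(tally.items(), key=lambda kv: (kv[1], kv[0]))[0]
        remaining.discard(loser)

def rank_all(ballots, candidates):
    """Order every transcript: elect a winner, remove it, and hold a new election."""
    remaining = list(candidates)
    order = []
    while remaining:
        winner = irv_winner(ballots, remaining)
        order.append(winner)
        remaining.remove(winner)
    return order

# Hypothetical per-sample AED values and protein lengths for one gene locus.
samples = [
    {"T1": 0.10, "T2": 0.30, "T3": 0.50},
    {"T1": 0.20, "T2": 0.10, "T3": 0.40},
    {"T1": 0.15, "T2": 0.25, "T3": 0.05},
]
protein_length = {"T1": 420, "T2": 380, "T3": 150}

ballots = [sample_ballot(s) for s in samples] + [criterion_ballot(protein_length)]
print(rank_all(ballots, ["T1", "T2", "T3"]))  # ['T1', 'T2', 'T3']
```

In the first round T1 has only 2 of 4 first choices, so no transcript holds a majority. T2 is eliminated on the tie-break, its ballot transfers to T1, and T1 is elected. The later rounds then order the remaining transcripts.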

2013, Vol 42 (5), pp. 2820-2832
Author(s): Nicolas Philippe, Elias Bou Samra, Anthony Boureux, Alban Mancheron, Florence Rufflé, ...

Recent sequencing technologies that allow massively parallel production of short reads are the method of choice for transcriptome analysis. In particular, digital gene expression (DGE) technologies produce expression data with a large dynamic range by generating a short tag signature for each transcript in a cell. These tags can be mapped back to a reference genome to identify new transcribed regions, which can then be further covered by RNA-sequencing (RNA-Seq) reads. Here, we applied an integrated bioinformatics approach that combines DGE tags, RNA-Seq, tiling array expression data and species comparison to explore new transcribed regions and their specific biological features, particularly tissue expression and conservation. We analysed tags from a large DGE data set (designated ‘TranscriRef’). We then annotated 750 000 tags that were uniquely mapped to the human genome according to Ensembl. We retained transcripts originating from both DNA strands, categorized tags as corresponding to protein-coding genes, antisense, intronic or intergenic transcribed regions, and computed their overlap with annotated non-coding transcripts. Using this approach, we identified ∼34 000 novel transcribed regions located outside the boundaries of known protein-coding genes. As demonstrated with sequencing data from human pluripotent stem cells used for biological validation, the method could easily be applied to select tissue-specific candidate transcripts. DigitagCT is available at http://cractools.gforge.inria.fr/softwares/digitagct.
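The strand-aware categorization of mapped tags can be illustrated with a simplified classifier. This is a sketch under stated assumptions, not DigitagCT's actual rules. Each uniquely mapped tag is reduced to one 0-based position and a strand, and every gene model is treated as protein-coding and given as half-open exon intervals. A same-strand overlap (exonic before intronic) takes precedence over an antisense overlap. The `Gene` class, `classify_tag` and all coordinates are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    strand: str   # "+" or "-"
    exons: tuple  # ((start, end), ...), 0-based, half-open

    @property
    def start(self):
        return min(s for s, _ in self.exons)

    @property
    def end(self):
        return max(e for _, e in self.exons)

def classify_tag(chrom, pos, strand, genes):
    """Assign a uniquely mapped tag (single position + strand) to a coarse category."""
    hits = [g for g in genes if g.chrom == chrom and g.start <= pos < g.end]
    if not hits:
        return "intergenic"
    sense = [g for g in hits if g.strand == strand]
    if not sense:
        return "antisense"  # only opposite-strand genes overlap the tag
    if any(s <= pos < e for g in sense for s, e in g.exons):
        return "exonic"
    return "intronic"

genes = [
    Gene("geneA", "chr1", "+", ((100, 200), (300, 400))),
    Gene("geneB", "chr1", "-", ((1000, 1100),)),
]
for tag in [("chr1", 150, "+"), ("chr1", 250, "+"), ("chr1", 150, "-"), ("chr1", 500, "+")]:
    print(tag, classify_tag(*tag, genes))
```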


2020, Vol 49 (D1), pp. D924-D931
Author(s): Richard M Baldarelli, Constance M Smith, Jacqueline H Finger, Terry F Hayamizu, Ingeborg J McCright, ...

The Gene Expression Database (GXD; www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental gene expression information. For many years, GXD has collected and integrated data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot, and western blot experiments through curation of the scientific literature and by collaborations with large-scale expression projects. Since our last report in 2019, we have continued to acquire these classical types of expression data; developed a searchable index of RNA-Seq and microarray experiments that allows users to quickly and reliably find specific mouse expression studies in ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and GEO (https://www.ncbi.nlm.nih.gov/geo/); and expanded GXD to include RNA-Seq data. Uniformly processed RNA-Seq data are imported from the EBI Expression Atlas and then integrated with the other types of expression data in GXD, and with the genetic, functional, phenotypic and disease-related information in Mouse Genome Informatics (MGI). This integration has made the RNA-Seq data accessible via GXD’s enhanced searching and filtering capabilities. Further, we have embedded the Morpheus heat map utility into the GXD user interface to provide additional tools for display and analysis of RNA-Seq data, including heat map visualization, sorting, filtering, hierarchical clustering, nearest neighbors analysis and visual enrichment.
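The nearest neighbors analysis mentioned above ranks genes by how closely their expression profiles track a chosen gene. The following is a minimal sketch that assumes Pearson correlation as the similarity measure. The gene names and values are made up, and this is not how Morpheus itself is implemented.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def nearest_neighbors(query, profiles, k=2):
    """Return the k genes whose profiles correlate best with the query gene."""
    target = profiles[query]
    scored = [(pearson(target, v), g) for g, v in profiles.items() if g != query]
    scored.sort(key=lambda rg: (-rg[0], rg[1]))  # highest r first, then by name
    return [(g, round(r, 3)) for r, g in scored[:k]]

# Hypothetical expression values for four genes across four tissues.
profiles = {
    "geneA": [1.0, 8.0, 9.0, 0.5],
    "geneB": [2.0, 7.5, 8.0, 1.0],
    "geneC": [9.0, 0.5, 0.2, 8.0],
    "geneD": [5.0, 5.0, 5.0, 5.0],
}
print(nearest_neighbors("geneA", profiles, k=3))
```

Because correlation ignores absolute level, geneB ranks first even though its values differ from geneA's. The flat geneD (correlation treated as 0) sits above the inversely patterned geneC.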


2018
Author(s): Sonali Arora, Siobhan S. Pattwell, Eric C. Holland, Hamid Bolouri

RNA-sequencing data are widely used to identify disease biomarkers and therapeutic targets. Here, using data from five RNA-seq processing pipelines applied to 6,690 human tumor and normal tissues, we show that for >12% of protein-coding genes, in at least 1% of samples, current best-in-class RNA-seq processing pipelines produce abundance estimates that differ by more than four-fold, even when applied to the same samples and the same set of RNA-seq reads. This discrepancy raises clinical concern.
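The discordance criterion in this abstract (a more-than-four-fold difference between pipelines in at least 1% of samples) can be made concrete as follows. This sketch assumes estimates are compared as a max/min ratio across pipelines, with a pseudocount of 1 added to avoid division by zero. The data layout, function names and numbers are hypothetical, and the authors' exact procedure is not described here.

```python
def fold_range(estimates, pseudocount=1.0):
    """Max/min ratio across pipelines for one gene in one sample.
    The pseudocount (an assumption) keeps zero estimates from dividing by zero."""
    vals = [e + pseudocount for e in estimates]
    return max(vals) / min(vals)

def discordant_genes(table, fold=4.0, min_sample_frac=0.01, pseudocount=1.0):
    """table[gene] = list over samples, each a list of per-pipeline estimates.
    Flag a gene if pipelines differ by more than `fold` in >= min_sample_frac of samples."""
    flagged = []
    for gene, samples in table.items():
        n_bad = sum(fold_range(s, pseudocount) > fold for s in samples)
        if n_bad / len(samples) >= min_sample_frac:
            flagged.append(gene)
    return sorted(flagged)

# Hypothetical estimates from five pipelines across 100 samples.
table = {
    "g1": [[10, 11, 9, 10, 12]] * 98 + [[2, 30, 10, 10, 10]] * 2,
    "g2": [[50, 52, 49, 51, 50]] * 100,
}
flagged = discordant_genes(table)
print(flagged, f"{len(flagged) / len(table):.0%} of genes flagged")  # ['g1'] 50% of genes flagged
```

Here g1 is flagged because 2 of its 100 samples (2% ≥ 1%) show a roughly ten-fold spread between pipelines, even though its other 98 samples agree closely.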


2021, Vol 22 (14), pp. 7298
Author(s): Izabela Rudzińska, Małgorzata Cieśla, Tomasz W. Turowski, Alicja Armatowska, Ewa Leśniewska, ...

The coordinated transcription of the genome is a fundamental mechanism in molecular biology. Transcription in eukaryotes is carried out by three main RNA polymerases: Pol I, II, and III. A basic question is how a decrease in tRNA levels, caused by reduced Pol III efficiency, influences the expression pattern of protein-coding genes. The purpose of this study was to determine the mRNA levels in the yeast mutant rpc128-1007 and its overdose suppressors, RBS1 and PRT1. The rpc128-1007 mutation prevents assembly of the Pol III complex and functionally mimics similar mutations in human Pol III that cause hypomyelinating leukodystrophies. We applied RNA-seq, followed by hierarchical clustering of the complete transcriptome and functional analysis of the genes in each cluster. mRNA upregulation in rpc128-1007 cells was generally stronger than downregulation. The observed induction of mRNA expression was mostly indirect and resulted from derepression of the general transcription factor Gcn4, which the suppressor genes modulated differently. The rpc128-1007 mutation, with or without the suppressors, also caused a weak increase in the expression of ribosome biogenesis genes. Genes downregulated by reduced Pol III assembly encode components of the proteasome complex. In summary, our results reveal regulatory links affected by Pol III assembly that contribute differently to cellular fitness.
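Hierarchical clustering of expression changes groups genes whose responses across strains move together. The following is a minimal average-linkage sketch on Euclidean distance, assuming log2 fold changes versus wild type as input. The distance, linkage, strain layout and values are illustrative assumptions, since the abstract does not state which settings the authors used.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise distance until n_clusters remain."""
    clusters = [[g] for g in sorted(profiles)]

    def dist(c1, c2):
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)

# Hypothetical log2 fold changes vs. wild type in three strains:
# rpc128-1007, rpc128-1007 + RBS1, rpc128-1007 + PRT1.
profiles = {
    "geneA": [2.5, 2.0, 1.5],
    "geneB": [2.3, 1.8, 1.6],
    "geneC": [0.4, 0.5, 0.3],
    "geneD": [0.3, 0.6, 0.2],
    "geneE": [-1.2, -0.9, -1.0],
    "geneF": [-1.1, -1.0, -0.8],
}
print(average_linkage(profiles, 3))
```

This loop recomputes every pairwise cluster distance at each merge, so it is only practical for small inputs. For real data, `scipy.cluster.hierarchy.linkage(..., method="average")` followed by `fcluster` is the usual route.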


Genes, 2021, Vol 12 (2), pp. 320
Author(s): Lorissa I. McDougall, Ryan M. Powell, Magdalena Ratajska, Chi F. Lynch-Sutherland, Sultana Mehbuba Hossain, ...

Melanoma comprises <5% of cutaneous malignancies, yet it causes a significant proportion of skin cancer-related deaths worldwide. While new therapies for melanoma have been developed, not all patients respond well. Thus, further research is required to better predict patient outcomes. Using long-range nanopore sequencing, RT-qPCR, and RNA sequencing analyses, we examined the transcription of BARD1 splice isoforms in melanoma cell lines and patient tissue samples. Seventy-six BARD1 mRNA variants were identified in total, with several previously characterised isoforms (γ, φ, δ, ε, and η) contributing to a large proportion of the expressed transcripts. In addition, we identified four novel splice events, namely, Δ(E3_E9), ▼(i8), IVS10+131▼46, and IVS10▼176, occurring in various combinations in multiple transcripts. We found that short-read RNA-Seq analyses were limited in their ability to predict isoforms containing multiple non-contiguous splicing events, as compared to long-range nanopore sequencing. These studies suggest that further investigations into the functional significance of the identified BARD1 splice variants in melanoma are warranted.
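Why short reads struggle with combinations of non-contiguous splice events can be shown with a small compatibility check. This sketch assumes each isoform is represented only by the set of splice events it contains, and each read only by the events it spans. The events A and B and the isoform names are hypothetical stand-ins, not the BARD1 events listed above.

```python
def compatible(isoforms, observed):
    """Isoforms consistent with a read. `observed` maps each splice event the read
    spans to True (event present in the read) or False (event absent)."""
    return sorted(
        name for name, events in isoforms.items()
        if all((event in events) == present for event, present in observed.items())
    )

# Hypothetical isoforms built from two non-contiguous splice events, A and B.
isoforms = {"iso_AB": {"A", "B"}, "iso_A": {"A"}, "iso_B": {"B"}, "iso_none": set()}

short_read_1 = {"A": True}              # spans only event A
short_read_2 = {"B": True}              # spans only event B
long_read = {"A": True, "B": False}     # spans both events

print(compatible(isoforms, short_read_1))  # ['iso_A', 'iso_AB']
print(compatible(isoforms, short_read_2))  # ['iso_AB', 'iso_B']
print(compatible(isoforms, long_read))     # ['iso_A']
```

Two short reads supporting A and B separately are equally consistent with a single iso_AB molecule or with a mixture of iso_A and iso_B molecules. A single read spanning both events removes that ambiguity.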


2021, Vol 22 (5), pp. 2683
Author(s): Princess D. Rodriguez, Hana Paculova, Sophie Kogut, Jessica Heath, Hilde Schjerven, ...

Non-coding RNAs (ncRNAs) comprise a diverse class of non-protein-coding transcripts that regulate critical cellular processes associated with cancer. Advances in RNA sequencing (RNA-Seq) have led to the characterization of ncRNA expression across different types of human cancer. Through comprehensive RNA-Seq profiling, a growing number of studies demonstrate that ncRNAs, including long non-coding RNAs (lncRNAs) and microRNAs (miRNAs), play central roles in the pathogenesis of progenitor B-cell acute lymphoblastic leukemia (B-ALL). Furthermore, because of their central roles in cellular homeostasis and their potential as biomarkers, the study of ncRNAs continues to provide new insight into the molecular mechanisms of B-ALL. This article reviews the ncRNA signatures reported for all B-ALL subtypes, focusing on technological developments in transcriptome profiling and on recently discovered ncRNAs with biological and therapeutic relevance in B-ALL.


2015, Vol 16 (1)
Author(s): Peter H. Sudmant, Maria S. Alexis, Christopher B. Burge

2021, Vol 22 (1)
Author(s): Étienne Fafard-Couture, Danny Bergeron, Sonia Couture, Sherif Abou-Elela, Michelle S. Scott

Background
Small nucleolar RNAs (snoRNAs) are mid-size non-coding RNAs required for ribosomal RNA modification, implying a ubiquitous tissue distribution linked to ribosome synthesis. However, a growing number of studies identify extra-ribosomal roles of snoRNAs in modulating gene expression, suggesting more complex snoRNA abundance patterns. There is therefore a great need to map the snoRNome in different human tissues as the blueprint for snoRNA functions.

Results
We used a low-structure-bias RNA-Seq approach to accurately quantify snoRNAs and compare them with the entire transcriptome in seven healthy human tissues (breast, ovary, prostate, testis, skeletal muscle, liver, and brain). We identify 475 expressed snoRNAs in two abundance classes that differ significantly in function, conservation level, and correlation with their host gene: 390 snoRNAs are uniformly expressed and 85 are enriched in the brain or reproductive tissues. Most tissue-enriched snoRNAs are embedded in lncRNAs and their abundance correlates strongly with that of their hosts, whereas uniformly expressed snoRNAs are mostly embedded in protein-coding host genes and are mainly non-correlated or anticorrelated with them. Fifty-nine percent of the non-correlated or anticorrelated protein-coding host gene/snoRNA pairs feature dual-initiation promoters, compared with only 16% of the correlated non-coding host gene/snoRNA pairs.

Conclusions
Our results demonstrate that snoRNAs are not a single homogeneous group of housekeeping genes but include highly regulated tissue-enriched RNAs. Indeed, our work indicates that the architecture of snoRNA host genes varies so as to uncouple host and snoRNA expression, meeting the different snoRNA abundance levels and functional needs of human tissues.
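Classifying host gene/snoRNA pairs as correlated, non-correlated or anticorrelated across the seven tissues can be sketched with a rank correlation. The abstract does not specify the correlation measure or cutoff, so Spearman's rho with a ±0.5 threshold is an assumption here. The pair names and abundances are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def classify_pair(host, sno, threshold=0.5):
    """Label a host gene/snoRNA pair; the +/-0.5 cutoff is a hypothetical choice."""
    rho = spearman(host, sno)
    if rho >= threshold:
        return "correlated"
    if rho <= -threshold:
        return "anticorrelated"
    return "non-correlated"

# Hypothetical abundances in seven tissues: breast, ovary, prostate, testis,
# skeletal muscle, liver, brain.
pairs = {
    "lnc_host/snoX": ([1, 2, 3, 10, 1.5, 2.5, 12], [0.5, 1, 2, 8, 0.8, 1.2, 9]),
    "pc_host/snoY": ([5, 6, 7, 8, 9, 10, 11], [11, 10, 9, 8, 7, 6, 5]),
    "pc_host/snoZ": ([1, 2, 3, 4, 5, 6, 7], [2, 7, 1, 6, 3, 5, 4]),
}
for name, (host, sno) in pairs.items():
    print(name, round(spearman(host, sno), 3), classify_pair(host, sno))
```

A rank-based measure suits this kind of comparison because it responds only to whether host and snoRNA rise and fall together across tissues, not to their very different absolute abundances.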


2014, Vol 14 (1), pp. 169
Author(s): Lei Wang, Chenlong Cao, Qibin Ma, Qiaoying Zeng, Haifeng Wang, ...
