High-confidence Coding and Noncoding Transcriptome Maps

Abstract: The advent of high-throughput RNA-sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes encoded by eukaryotic genomes. However, transcriptome maps are still incomplete, partly because they were mostly reconstructed from RNA-seq reads that lack orientation (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising ninety-nine billion RNA-seq reads from the ENCODE and human BodyMap projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about eighty-nine billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.
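
The abstract above mentions orienting unstranded reads by maximum likelihood estimation but does not spell out the model. As a rough illustration only, the Python sketch below assumes a simple binomial model at a single locus: stranded reads give counts on the plus and minus strands, the maximum likelihood estimate of the plus-strand fraction is the observed proportion, and an unstranded read at that locus is assigned to the more likely strand. The function names and the tie-handling rule are hypothetical and are not taken from CAFE.

```python
def mle_plus_fraction(n_plus, n_minus):
    """MLE of the plus-strand probability under an assumed binomial model.

    n_plus and n_minus are counts of stranded reads observed on each
    strand at one locus (hypothetical inputs, not CAFE's data model).
    """
    total = n_plus + n_minus
    if total == 0:
        return 0.5  # no stranded evidence: treat both strands as equally likely
    return n_plus / total


def orient_unstranded(n_plus, n_minus):
    """Assign an unstranded read at this locus to the more likely strand.

    Returns "+", "-", or "." when the evidence is exactly balanced.
    """
    p_plus = mle_plus_fraction(n_plus, n_minus)
    if p_plus > 0.5:
        return "+"
    if p_plus < 0.5:
        return "-"
    return "."
```

For example, a locus with 90 plus-strand and 10 minus-strand stranded reads would have its unstranded reads assigned to the plus strand under this toy rule.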

Enabling cross-study analysis of RNA-Sequencing data

10.1101/110734 ◽

2017 ◽

Cited By ~ 4

Author(s):

Qingguo Wang ◽

Joshua Armenia ◽

Chao Zhang ◽

Alexander V. Penson ◽

Ed Reznik ◽

...

Keyword(s):

Large Scale ◽

Tissue Expression ◽

Underlying Disease ◽

The Cancer Genome Atlas ◽

Rna Seq ◽

Sequencing Data ◽

Whole Transcriptome Sequencing ◽

Cancer Genome Atlas ◽

Next Generation Sequencing Ngs ◽

Different Sources

Abstract: Driven by recent advances in next generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of recent large-scale studies have produced an unprecedented volume of whole transcriptome sequencing (RNA-seq) data. While these data offer new opportunities to identify the mechanisms underlying disease, comparing data from different sources poses a great challenge because of differences in sample and data processing. Here, we present a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment and gene expression quantification as well as batch effect removal. We find that uniform alignment and quantification are not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from the Genotype Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA) and have successfully corrected for study-specific biases, enabling comparative analysis across studies. The normalized data are available for download via GitHub (at https://github.com/mskcc/RNAseqDB).

Large-scale profiling of microRNAs for The Cancer Genome Atlas

Nucleic Acids Research ◽

10.1093/nar/gkv808 ◽

2015 ◽

Vol 44 (1) ◽

pp. e3-e3 ◽

Cited By ~ 62

Author(s):

Andy Chu ◽

Gordon Robertson ◽

Denise Brooks ◽

Andrew J. Mungall ◽

Inanc Birol ◽

...

Keyword(s):

Large Scale ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Cancer Genome Atlas ◽

Genome Atlas

PR/SET Domain Family and Cancer: Novel Insights from the Cancer Genome Atlas

International Journal of Molecular Sciences ◽

10.3390/ijms19103250 ◽

2018 ◽

Vol 19 (10) ◽

pp. 3250 ◽

Cited By ~ 9

Author(s):

Anna Sorrentino ◽

Antonio Federico ◽

Monica Rienzo ◽

Patrizia Gazzerro ◽

Maurizio Bifulco ◽

...

Keyword(s):

Family Members ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Driver Gene ◽

Rna Seq ◽

Set Domain ◽

Primary Tumors ◽

Cancer Genome Atlas ◽

Pan Cancer ◽

Genome Atlas

The PR/SET domain gene family (PRDM) encodes 19 different transcription factors that share a subtype of the SET domain [Su(var)3-9, enhancer-of-zeste and trithorax] known as the PRDF1-RIZ (PR) homology domain. This domain, with its potential methyltransferase activity, is followed by a variable number of zinc-finger motifs, which likely mediate protein–protein, protein–RNA, or protein–DNA interactions. Intriguingly, almost all PRDM family members express different isoforms, which likely play opposite roles in oncogenesis. Remarkably, several studies have described alterations in most of the family members in malignancies. Here, to obtain a pan-cancer overview of the genomic and transcriptomic alterations of PRDM genes, we reanalyzed the Exome- and RNA-Seq public datasets available at The Cancer Genome Atlas portal. Overall, PRDM2, PRDM3/MECOM, PRDM9, PRDM16 and ZFPM2/FOG2 were the most mutated genes, with pan-cancer frequencies of protein-affecting mutations higher than 1%. Moreover, we observed heterogeneity in the mutation frequencies of these genes across tumors, with some cancer types reaching about 20% of mutated samples for a specific PRDM gene. Of note, ZFPM1/FOG1 mutations occurred in 50% of adrenocortical carcinoma patients and were localized in a hotspot region. These findings, together with OncodriveCLUST results, suggest that ZFPM1/FOG1 could putatively be considered a cancer driver gene in this malignancy. Finally, transcriptome analysis of RNA-Seq data from paired samples revealed that transcription of PRDMs was significantly altered in several tumors. Specifically, PRDM12 and PRDM13 were largely overexpressed in many cancers, whereas PRDM16 and ZFPM2/FOG2 were often downregulated. Some of these findings were also confirmed by real-time PCR on primary tumors.

A practical guide to build de-novo assemblies for single tissues of non-model organisms: the example of a Neotropical frog

PeerJ ◽

10.7717/peerj.3702 ◽

2017 ◽

Vol 5 ◽

pp. e3702 ◽

Cited By ~ 5

Author(s):

Santiago Montero-Mendieta ◽

Manfred Grabherr ◽

Henrik Lantz ◽

Ignacio De la Riva ◽

Jennifer A. Leonard ◽

...

Keyword(s):

Defense Mechanisms ◽

De Novo ◽

Transcriptome Assembly ◽

Cost Effective ◽

Model Organisms ◽

Rna Seq ◽

Assembly Pipeline ◽

Wide Variability ◽

History Of ◽

Inexperienced User

Whole genome sequencing (WGS) is a very valuable resource for understanding the evolutionary history of poorly known species. However, in organisms with large genomes, such as most amphibians, WGS is still excessively challenging, and transcriptome sequencing (RNA-seq) represents a cost-effective tool to explore genome-wide variability. Non-model organisms do not usually have a reference genome, so the transcriptome must be assembled de-novo. We used RNA-seq to obtain the transcriptomic profile for Oreobates cruralis, a poorly known South American direct-developing frog. In total, 550,871 transcripts were assembled, corresponding to 422,999 putative genes. Of those, we identified 23,500, 37,349, 38,120 and 45,885 genes present in the Pfam, EggNOG, KEGG and GO databases, respectively. Interestingly, our results suggested that genes related to the immune system and defense mechanisms are abundant in the transcriptome of O. cruralis. We also present a pipeline to assist with pre-processing, assembling, evaluating and functionally annotating a de-novo transcriptome from RNA-seq data of non-model organisms. Our pipeline guides the inexperienced user in an intuitive way through all the necessary steps to build de-novo transcriptome assemblies using readily available software and is freely available at: https://github.com/biomendi/TRANSCRIPTOME-ASSEMBLY-PIPELINE/wiki.

Visual Display of 5p-arm and 3p-arm miRNA Expression with a Mobile Application

BioMed Research International ◽

10.1155/2017/6037168 ◽

2017 ◽

Vol 2017 ◽

pp. 1-7 ◽

Cited By ~ 3

Author(s):

Chao-Yu Pan ◽

Wei-Ting Kuo ◽

Chien-Yuan Chiu ◽

Wen-chang Lin

Keyword(s):

Mirna Expression ◽

Expression Patterns ◽

Mirna Gene ◽

Visual Display ◽

Mature Mirnas ◽

Mobile App ◽

The Cancer Genome Atlas ◽

Rna Seq ◽

Cancer Genome Atlas ◽

User Friendly

MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we demonstrated that both the 5p-arm and the 3p-arm of mature miRNAs can be expressed from the same precursor, and we further interrogated 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize the differential 5p-arm and 3p-arm miRNA expression patterns, we utilized a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and further processed them with the 5p-arm and 3p-arm annotation analysis pipeline. To be displayed with the RNA-Seq Viewer App, the annotated 5p-arm and 3p-arm miRNA expression information and the miRNA gene loci information were converted into SQLite tables. In this application, for any given miRNA gene, the 5p-arm miRNA is shown above the chromosome ideogram and the 3p-arm miRNA below it. Users can then easily interrogate the differentially expressed 5p-arm/3p-arm miRNAs with their mobile devices. This study demonstrates the feasibility and utility of the RNA-Seq Viewer App beyond mRNA-Seq data visualization.
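
The abstract above says the arm-level expression values were converted into SQLite tables but gives no schema. The Python sketch below shows one way such a table could be laid out and queried with the standard `sqlite3` module. The table name, column names, gene label `mir-X`, cancer-type label `TYPE_A`, and all values are invented for illustration and do not come from the paper or the app.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory SQLite table of arm-level miRNA expression.

    The schema is a hypothetical layout, not the one used by the app.
    Each row is (mirna_gene, arm, cancer_type, expression).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE mirna_arm_expression (
            mirna_gene  TEXT NOT NULL,
            arm         TEXT NOT NULL CHECK (arm IN ('5p', '3p')),
            cancer_type TEXT NOT NULL,
            expression  REAL NOT NULL
        )
        """
    )
    conn.executemany(
        "INSERT INTO mirna_arm_expression VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def arm_expression(conn, gene, cancer_type):
    """Return {arm: expression} for one miRNA gene in one cancer type."""
    cur = conn.execute(
        "SELECT arm, expression FROM mirna_arm_expression "
        "WHERE mirna_gene = ? AND cancer_type = ?",
        (gene, cancer_type),
    )
    return dict(cur.fetchall())


# Illustrative rows only; the values are made up.
conn = build_db([
    ("mir-X", "5p", "TYPE_A", 10.0),
    ("mir-X", "3p", "TYPE_A", 2.5),
])
print(arm_expression(conn, "mir-X", "TYPE_A"))
```

Keeping one row per gene, arm, and cancer type lets a viewer fetch both arms of a gene in a single query, which matches the paired 5p/3p display described in the abstract.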

ERK1/2 signaling regulates the immune microenvironment and macrophage recruitment in glioblastoma

Bioscience Reports ◽

10.1042/bsr20191433 ◽

2019 ◽

Vol 39 (9) ◽

Cited By ~ 4

Author(s):

Claire Lailler ◽

Christophe Louandre ◽

Mony Chenda Morisse ◽

Thomas Lhossein ◽

Corinne Godin ◽

...

Keyword(s):

Tumor Microenvironment ◽

Recent Observation ◽

The Cancer Genome Atlas ◽

Response To Treatment ◽

Mrna Levels ◽

Rna Seq ◽

Immune Microenvironment ◽

Oncogenic Signaling ◽

Cancer Genome Atlas ◽

Human Gbm

Abstract: The tumor microenvironment is an important determinant of glioblastoma (GBM) progression and response to treatment. How oncogenic signaling in GBM cells modulates the composition of the tumor microenvironment and its activation is unclear. We aimed to explore the potential local immunoregulatory function of ERK1/2 signaling in GBM. Using proteomic and transcriptomic data (RNA seq) available for GBM tumors from The Cancer Genome Atlas (TCGA), we show that GBM with high levels of phosphorylated ERK1/2 have increased infiltration of tumor-associated macrophages (TAM) with a non-inflammatory M2 polarization. Using three human GBM cell lines in culture, we confirmed the existence of ERK1/2-dependent regulation of the production of the macrophage chemoattractant CCL2/MCP1. In contrast with this positive regulation of TAM recruitment, we found no evidence of a direct effect of ERK1/2 signaling on two other important aspects of TAM regulation by GBM cells: (1) the expression of the immune checkpoint ligands PD-L1 and PD-L2, expressed at high mRNA levels in GBM compared with other solid tumors; (2) the production of the tumor metabolite lactate recently reported to dampen tumor immunity by interacting with the receptor GPR65 present on the surface of TAM. Taken together, our observations suggest that ERK1/2 signaling regulates the recruitment of TAM in the GBM microenvironment. These findings highlight some potentially important particularities of the immune microenvironment in GBM and could provide an explanation for the recent observation that GBM with activated ERK1/2 signaling may respond better to anti-PD1 therapeutics.

IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences

Bioinformatics ◽

10.1093/bioinformatics/btz247 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4469-4471 ◽

Cited By ~ 21

Author(s):

Kristoffer Vitting-Seerup ◽

Albin Sandelin

Keyword(s):

Alternative Splicing ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Rna Seq ◽

Genome Wide ◽

Functional Consequences ◽

Cancer Genome Atlas ◽

Health And Disease ◽

Splicing Patterns

Abstract: Summary: Alternative splicing is an important mechanism involved in health and disease. Recent work highlights the importance of investigating genome-wide changes in splicing patterns and the subsequent functional consequences. Current computational methods only support such analysis on a gene-by-gene basis. Therefore, we extended the IsoformSwitchAnalyzeR R library to enable analysis of genome-wide changes in specific types of alternative splicing and predicted functional consequences of the resulting isoform switches. As a case study, we analyzed RNA-seq data from The Cancer Genome Atlas and found systematic changes in alternative splicing and the consequences of the associated isoform switches. Availability and implementation: Windows, Linux and Mac OS: http://bioconductor.org/packages/IsoformSwitchAnalyzeR. Supplementary information: Supplementary data are available at Bioinformatics online.

Target variant detection in leukemia using unaligned RNA-Seq reads

10.1101/295808 ◽

2018 ◽

Cited By ~ 1

Author(s):

Eric Olivier Audemard ◽

Patrick Gendron ◽

Vincent-Philippe Lavallée ◽

Josée Hébert ◽

Guy Sauvageau ◽

...

Keyword(s):

Variant Calling ◽

The Cancer Genome Atlas ◽

Rna Seq ◽

Read Mapping ◽

Targeted Mutation ◽

Cancer Genome Atlas ◽

Computationally Intensive ◽

And Performance ◽

Next Generation Sequencing Ngs ◽

Ngs Data

Abstract: Mutations identified in each acute myeloid leukemia (AML) patient are useful for prognosis and for selecting targeted therapies. Detection of such mutations by the analysis of Next-Generation Sequencing (NGS) data requires a computationally intensive read mapping step and the application of several variant calling methods. Targeted mutation identification drastically shifts the usual tradeoff between accuracy and performance by concentrating all computations on a small portion of sequence space. Here, we present km, an efficient approach that leverages k-mer decomposition of reads to identify targeted mutations. Our approach is versatile, as it can detect single-base mutations, several types of insertions and deletions, as well as fusions. We used two independent AML cohorts (The Cancer Genome Atlas and Leucegene) to show that mutation detection by km is fast, accurate and mainly limited by sequencing depth. Therefore, km allows fast diagnostics to be established from NGS data and could be suitable for clinical applications.
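
To make the idea of k-mer decomposition concrete, the Python sketch below counts k-mers in unaligned reads and compares support for a reference target sequence against a candidate variant sequence. It is a simplified illustration under stated assumptions, not the km algorithm itself: the function names, the "minimum count over allele-specific k-mers" scoring rule, and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads, with no alignment step."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def allele_support(reads, reference, variant, k):
    """Return (reference_support, variant_support).

    Support is taken here as the minimum read count over the k-mers that
    are specific to one allele (an assumed scoring rule for illustration).
    """
    counts = count_kmers(reads, k)
    ref_kmers = set(kmers(reference, k))
    var_kmers = set(kmers(variant, k))
    ref_only = ref_kmers - var_kmers
    var_only = var_kmers - ref_kmers
    ref_support = min((counts[x] for x in ref_only), default=0)
    var_support = min((counts[x] for x in var_only), default=0)
    return ref_support, var_support


# Toy target with a single-base substitution (G to A at position 6).
reference = "ACGTACGGTCA"
variant = "ACGTACAGTCA"
reads = [variant, variant, reference]
print(allele_support(reads, reference, variant, k=5))
```

Because only k-mers overlapping the target region are examined, the work scales with the size of the target rather than the whole genome, which is the tradeoff the abstract describes.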

Study of molecular signaling mechanisms inducing cell survival and dedifferentiation in cancer

10.12681/eadd/46398 ◽

2019 ◽

Author(s):

Ευστάθιος-Ιάσων Βλαχάβας

Keyword(s):

Cancer Genome ◽

The Cancer Genome Atlas ◽

Rna Seq ◽

Cancer Genome Atlas ◽

Genome Atlas

The term cancer refers not to a single disease but to a broader set of related disorders that share a common feature: the abnormal growth of cells that divide uncontrollably and can invade adjacent tissues. The study of the molecular mechanisms involved in the pathophysiology of cancer is a field of intense research, because understanding them is directly linked to proper management of the disease. In this thesis, we focus on colorectal cancer. The majority of colorectal tumors are adenocarcinomas, which arise from polyps that form on the inner wall of the large intestine and can progress to neoplasia. Colorectal cancer is a particularly complex disease, with highly variable molecular and genetic characteristics and a differential response to drug regimens. Approximately 40% of cases are detected at an early stage, with a five-year survival rate of around 90%. In addition, it is a cause of mortality in the countries of the developed world, mainly because of its high metastatic potential. Taken together, these facts make it imperative to exploit all available molecular information as well as biomedical data in order to apply personalized therapy in the context of translational research. Over the last decade, the extensive use of high-throughput techniques, such as gene microarrays and next-generation sequencing technologies, to analyze gene expression in various types of colorectal cancer has contributed substantially to the identification of important gene mutations and to the characterization of genes involved in the pathophysiology of this cancer.
Although next-generation sequencing technologies offer significant technological improvements, DNA microarrays remain popular for transcriptome analysis, both because they are more economical and because they require less complex sample preparation than sequencing methodologies. Representative examples of the contribution of high-throughput technologies are the research publications of the international research collaboration The Cancer Genome Atlas on the molecular characterization of colorectal cancer, as well as the characterization of specific molecular subtypes based on genomic data from different research groups. However, the marked heterogeneity of this cancer, together with the difficulty of interpreting the high-dimensional molecular data produced by microarray and DNA/RNA-Seq analyses, hinders the clinical application of the various gene signatures that describe a specific cancer phenotype.

Pan-cancer analysis reveals complex tumor-specific alternative polyadenylation

10.1101/160960 ◽

2017 ◽

Author(s):

Zhuyi Xue ◽

René L Warren ◽

Ewan A Gibb ◽

Daniel MacMillan ◽

Johnathan Wong ◽

...

Keyword(s):

Alternative Polyadenylation ◽

Length Change ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Rna Seq ◽

Multiple Cancer ◽

Tissue Samples ◽

Cancer Genome Atlas ◽

Specific Alternative ◽

Cancer Types

Abstract: Alternative polyadenylation (APA) of 3’ untranslated regions (3’ UTRs) has been implicated in cancer development. Earlier reports on APA in cancer primarily focused on 3’ UTR length modifications, and the conventional wisdom is that tumor cells preferentially express transcripts with shorter 3’ UTRs. Here, we analyzed the APA patterns of 114 genes, a select list of oncogenes and tumor suppressors, in 9,939 tumor and 729 normal tissue samples across 33 cancer types using RNA-Seq data from The Cancer Genome Atlas, and we found that the APA regulation machinery is much more complicated than previously thought. We report 77 cases (gene-cancer type pairs) of differential 3’ UTR cleavage patterns between normal and tumor tissues, involving 33 genes in 13 cancer types. For 15 genes, the tumor-specific cleavage patterns are recurrent across multiple cancer types. While the cleavage patterns in certain genes indicate apparent trends of 3’ UTR shortening in tumor samples, over half of the 77 cases imply 3’ UTR length change trends in cancer that are more complex than simple shortening or lengthening. This work extends the current understanding of APA regulation in cancer and demonstrates how large volumes of RNA-seq data generated for characterizing cancer cohorts can be mined to investigate this process.
