RNA sequencing analysis for profiling activation of cancer-associated molecular pathways.

2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e13032-e13032 ◽  
Author(s):  
Anton Buzdin ◽  
Andrew Garazha ◽  
Maxim Sorokin ◽  
Alex Glusker ◽  
Alexey Aleshin ◽  
...  

e13032 Background: Intracellular molecular pathways (IMPs) control all major events in the living cell. They are considered hotspots in contemporary oncology because knowledge of IMP activation is essential for understanding the mechanisms of molecular pathogenesis in cancer. Profiling IMPs requires RNA-seq data for tumors and for a collection of reference normal tissues. However, there is currently a shortage of such profiles for normal tissues from healthy human donors that were uniformly profiled in a single series of experiments. The largest dataset of normal profiles, GTEx, is only partly accessible through dbGaP. In the TCGA database, normal samples are taken adjacent to surgically removed tumors and may be affected by tumor-linked growth factors, inflammation and altered vascularization. ENCODE datasets were obtained from autopsies of normal tissues, but they cannot form statistically significant reference groups. Methods: Tissue samples representing 20 organs were taken post mortem from healthy human donors killed in road accidents, no later than 36 hours after death; blood samples were taken from healthy volunteers. Gene expression was profiled in RNA-seq experiments using the same reagents, equipment and protocols. Bioinformatic algorithms for IMP analysis were developed and validated using experimental and public gene expression datasets. Results: From the original sequencing data, we constructed the largest fully open reference expression database of normal human tissues, comprising 465 profiles and termed the Oncobox Atlas of Normal Tissue Expression (ANTE; original data: GSE120795). We next developed a method, termed Oncobox, for interrogating the activation of IMPs in human cancers. It includes modules for expression data harmonization and comparison, and an algorithm for automatic annotation of molecular pathways. The Oncobox system enables accurate scoring of thousands of molecular pathways using RNA-seq data. Oncobox pathway analysis is also applicable to quantitative proteomics and microRNA data in oncology. Conclusions: The Oncobox system can be used for a wide range of applications in cancer research, including finding differentially regulated genes and IMPs and discovering new pathway-related diagnostic and prognostic biomarkers.
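The abstract does not spell out how a pathway is scored. The sketch below is modeled on the pathway activation level (PAL) formula reported in related Oncobox publications, but its exact form here is an assumption: each gene's case-to-normal ratio (CNR) is log-transformed and weighted by an activator/repressor role (ARR), then normalized by the total absolute role weight. The function and argument names are hypothetical, and the harmonization steps mentioned above are omitted.

```python
import math

def pathway_activation_level(case, normals, arr):
    """Hypothetical PAL-style pathway score (assumed form, not the Oncobox code).

    case:    dict gene -> expression in the tumour sample (assumed > 0)
    normals: dict gene -> list of expression values in reference normal samples
    arr:     dict gene -> activator/repressor role, e.g. 1, 0.5, 0, -0.5, -1
    """
    num = 0.0
    den = 0.0
    for gene, role in arr.items():
        ref = normals[gene]
        mean_norm = sum(ref) / len(ref)   # arithmetic mean of normals (an assumption)
        cnr = case[gene] / mean_norm      # case-to-normal ratio
        num += role * math.log10(cnr)     # activators push up, repressors push down
        den += abs(role)
    # Scale to a percentage-like range; a pathway with no annotated roles scores 0.
    return 100.0 * num / den if den else 0.0
```

With one activator doubled and one repressor halved relative to normals, both genes push the score in the same (activating) direction, giving about 30.1.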

2015 ◽  
Vol 112 (23) ◽  
pp. E3050-E3057 ◽  
Author(s):  
Christian L. Barrett ◽  
Christopher DeBoever ◽  
Kristen Jepsen ◽  
Cheryl C. Saenz ◽  
Dennis A. Carson ◽  
...  

Tumor-specific molecules are needed across diverse areas of oncology for use in early detection, diagnosis, prognosis and therapy. Large and growing public databases of transcriptome sequencing data (RNA-seq) derived from tumors and normal tissues hold the potential of yielding tumor-specific molecules, but because the data are new, they have not been fully explored for this purpose. We have developed custom bioinformatic algorithms and used them with 296 high-grade serous ovarian (HGS-OvCa) tumor and 1,839 normal RNA-seq datasets to identify mRNA isoforms with tumor-specific expression. We ranked and prioritized isoforms by their likelihood of being expressed in HGS-OvCa tumors and not in normal tissues, and analyzed 671 top-ranked isoforms by high-throughput RT-qPCR. Six of these isoforms were expressed in a majority of the 12 tumors examined but not in 18 normal tissues. An additional 11 were expressed in most tumors and in only one normal tissue, which in most cases was fallopian tube or colon. Of the 671 isoforms, the top 5% (n = 33), ranked by tumor-specific or highly restricted normal tissue expression in the RT-qPCR analysis, are enriched for oncogenic, stem cell/cancer stem cell, and early development loci, including ETV4, FOXM1, LSR, CD9, RAB11FIP4, and FGFRL1. Many of the 33 isoforms are predicted to encode proteins with unique amino acid sequences, which would allow them to be specifically targeted by one or more therapeutic strategies, including monoclonal antibodies and T-cell–based vaccines. The systematic process described herein is readily and rapidly applicable to the more than 30 additional tumor types for which sufficient amounts of RNA-seq data already exist.
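To illustrate what ranking isoforms "by likelihood of being expressed in tumors and not in normal tissues" can look like, here is a minimal sketch using a simple hypothetical score: the fraction of tumours expressing an isoform minus the fraction of normal samples expressing it. The score, the `on` expression threshold, and the function name are assumptions for illustration; the authors' custom algorithms are not described in the abstract.

```python
def rank_isoforms(tumour_tpm, normal_tpm, on=1.0):
    """Rank isoforms by a hypothetical tumour-specificity score.

    tumour_tpm, normal_tpm: dict isoform -> list of expression values (e.g. TPM)
    on: threshold at or above which an isoform counts as expressed (assumed)
    """
    scores = {}
    for iso, t_vals in tumour_tpm.items():
        frac_tumours = sum(v >= on for v in t_vals) / len(t_vals)
        n_vals = normal_tpm.get(iso, [])
        frac_normals = sum(v >= on for v in n_vals) / len(n_vals) if n_vals else 0.0
        # High when the isoform is common in tumours and rare in normals.
        scores[iso] = frac_tumours - frac_normals
    # Highest score first; ties broken by isoform name so the order is deterministic.
    return sorted(scores, key=lambda iso: (-scores[iso], iso))
```

The top slice of this ranking would then be the candidate list handed to an orthogonal assay such as RT-qPCR.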


2021 ◽  
pp. jrheum.201594
Author(s):  
Tatiana Nevskaya ◽  
Janet E. Pope ◽  
Matthew A. Turk ◽  
Jenny Shu ◽  
April Marquardt ◽  
...  

Objective Systemic sclerosis (SSc) is a multisystem disease with heterogeneity in presentation and prognosis. An international collaboration to develop new SSc subset criteria is underway. Our objectives were to identify systems of SSc subset classification and synthesize novel concepts to inform the development of new criteria. Methods Medline, Cochrane MEDLINE, CINAHL, EMBASE and Web of Science were searched from their inception to December 2019 for studies related to SSc sub-classification, limited to humans, without language or sample size restrictions. Results Of 5686 citations, 102 articles reported original data on SSc subsets. Subset classification systems relied on extent of skin involvement and/or scleroderma-specific autoantibodies (n=61), nailfold capillary patterns (n=29), and molecular, genomic and cellular patterns (n=12). While some systems of subset classification confer prognostic value for clinical phenotype, severity, and mortality, only subsetting by gene expression signatures in tissue samples has been associated with response to therapy. Conclusion Subsetting on extent of skin involvement remains important. Novel disease attributes, including SSc-specific autoantibodies, nailfold capillary patterns and tissue gene expression signatures, have been proposed as innovative means of SSc subsetting.


2020 ◽  
pp. 160-170
Author(s):  
John Vivian ◽  
Jordan M. Eizenga ◽  
Holly C. Beale ◽  
Olena M. Vaske ◽  
Benedict Paten

PURPOSE Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high intersample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample. METHODS We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and underexpression. RESULTS We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Furthermore, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns. CONCLUSION This exploratory method is suitable for identifying expression outliers from comparative RNA sequencing (RNA-seq) analysis for individual samples, and Treehouse, a pediatric precision medicine group that leverages RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.
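The core idea, scoring one patient's value against a background built from several tissues rather than a single hand-picked comparison set, can be sketched with a weighted mixture of empirical distributions. This is a deliberately simplified stand-in: the authors infer tissue weights with a Bayesian model, whereas here the weights are assumed to be supplied by the caller, and the function name is hypothetical.

```python
def outlier_percentile(value, backgrounds, weights):
    """Weighted fraction of background samples whose expression is below `value`.

    backgrounds: dict tissue -> list of expression values for one gene
    weights:     dict tissue -> non-negative weight (assumed given; the original
                 method would infer these rather than take them as input)
    """
    # Only tissues that have samples and a positive weight contribute.
    usable = {t: v for t, v in backgrounds.items() if v and weights.get(t, 0.0) > 0}
    total = sum(weights[t] for t in usable)
    if total == 0:
        raise ValueError("need at least one non-empty tissue with positive weight")
    p = 0.0
    for tissue, vals in usable.items():
        below = sum(v < value for v in vals) / len(vals)
        p += (weights[tissue] / total) * below
    return p
```

Values close to 1 suggest overexpression relative to the consensus background and values close to 0 suggest underexpression; shifting weight toward the best-matched tissue changes the answer, which is why choosing the comparison set matters.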


NAR Cancer ◽  
2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Julianne K David ◽  
Sean K Maden ◽  
Benjamin R Weeder ◽  
Reid F Thompson ◽  
Abhinav Nellore

This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% (σ = 13.0%) of exon–exon junctions thought to be cancer-specific based on comparison with tissue-matched samples are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon–exon junctions may have a substantial causal relationship with the biology of disease.
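The comparison in finding (i) is essentially set arithmetic over junction identifiers. A minimal sketch, assuming junctions have already been extracted and normalized to hashable keys such as `(chrom, start, end, strand)`; the function name is hypothetical:

```python
def classify_junctions(cancer, matched_normal, other_normal):
    """Split putatively cancer-specific junctions by where else they appear.

    Each argument is a set of junction keys, e.g. (chrom, start, end, strand).
    Returns (putative, seen_elsewhere, still_novel).
    """
    putative = cancer - matched_normal        # absent from tissue-matched normals
    seen_elsewhere = putative & other_normal  # yet present in some other normal tissue
    still_novel = putative - other_normal     # absent from every normal set checked
    return putative, seen_elsewhere, still_novel
```

The share `len(seen_elsewhere) / len(putative)` per cancer type is the kind of quantity averaged in finding (i); the remaining `still_novel` set is what would then be checked against developmental and embryological samples.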


Author(s):  
VG LeBlanc ◽  
D Trinh ◽  
M Hughes ◽  
I Luthra ◽  
D Livingstone ◽  
...  

Glioblastomas (GBMs) account for nearly half of all primary malignant brain tumours, and current therapies are often only marginally effective. Our understanding of the underlying biology of these tumours and the development of new therapies have been complicated in part by widespread inter- and intratumoural heterogeneity. To characterize this heterogeneity, we performed regional subsampling of primary glioblastomas and derived organoids from these tissue samples. We then performed single-cell RNA sequencing (scRNA-seq) on these primary regional subsamples and 1–3 matched organoids per sample. We have profiled samples from six tumour sets to date and have obtained sequencing data for 21,234 primary tissue cells and 14,742 organoid cells. While the most apparent differences in gene expression appear to be between individual tumours, we were also able to identify similar cellular subpopulations across tissue samples and across organoids. Importantly, organoids derived from the same tissue sample appeared to be composed of similar cellular subpopulations and were highly comparable to each other, indicating that replicate organoids faithfully represent the original tumour tissue. Overall, our scRNA-seq approach will help evaluate the utility of tumour-derived organoids as model systems for GBM and will aid in identifying cellular subpopulations defined by gene expression patterns, both in primary GBM regional subsamples and in their associated organoids. These analyses will allow for the characterization of clonal or subclonal populations that are likely to respond to different therapeutic approaches and may also uncover novel therapeutic targets not revealed by bulk analyses.
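One simple way to quantify whether replicate organoids "are composed of similar cellular subpopulations" is to compare their subpopulation proportions. The sketch assumes cells have already been clustered and given shared subpopulation labels, and uses the base-2 Jensen–Shannon divergence (0 for identical compositions, 1 for completely disjoint ones). The metric is chosen here for illustration only; the abstract does not say how comparability was assessed.

```python
import math
from collections import Counter

def composition(labels):
    """Fraction of cells assigned to each subpopulation label."""
    counts = Counter(labels)
    n = len(labels)
    return {k: c / n for k, c in counts.items()}

def js_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two composition dicts."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # Terms with zero probability contribute nothing; m[k] > 0 whenever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```

Low divergence between organoids from the same tissue sample, relative to organoids from different tumours, would be consistent with the pattern described above.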


2016 ◽  
Vol 34 (2_suppl) ◽  
pp. 8-8
Author(s):  
Himisha Beltran ◽  
Alexander Wyatt ◽  
Edmund Chedgy ◽  
Ladan Fazli ◽  
Andrea Sboner ◽  
...  

8 Background: Molecular analyses of neoadjuvant post-treatment radical prostatectomy (RP) specimens have been challenging, as often only microscopic foci remain at the time of RP, precluding RNA-seq. DNA analysis alone, in the absence of expression data, may be suboptimal for elucidating complex mechanisms of resistance and/or for prognostic risk stratification. We therefore set out to develop an assay that could quantify mRNA expression in treated and untreated PCA using formalin-fixed paraffin-embedded (FFPE) tissues. Methods: We evaluated 40 untreated and post-treatment FFPE specimens, as well as patient-matched pre-treatment needle biopsies and baseline clinical data, from patients enrolled on CALGB 90203, a randomized phase 3 trial comparing neoadjuvant docetaxel and ADT followed by RP vs RP alone for men with high-risk localized PCA. High-density tumor areas were selected for RNA extraction (min 50 ng RNA). We used NanoString nCounter to quantify expression of a custom panel of 75 genes, including AR and androgen-regulated, neural/neuroendocrine (NE), EMT, cell cycle, hormone receptor, TMPRSS2-ERG, ARv7 splice variant, and housekeeping genes. mRNA data were integrated with matched whole exome sequencing data. Frozen specimens and RNA-seq (n = 7) were used for QC and comparative analysis. Results: Quantitative expression using NanoString showed high correlation with RNA-seq of patient-matched frozen tissue (Spearman coefficient 0.9). There was significant upregulation of AR and ARv7 expression following treatment, as well as of a subset of NE and EMT genes; three outlier cases with high chromogranin A expression were identified in the treatment arm. Treated cases had an overall higher AR score (based on expression of 30 AR signaling genes) than untreated cases, along the spectrum of CRPC. Conclusions: These data support the feasibility of quantifying gene expression in neoadjuvant-treated PCA cases with a limited FFPE tissue requirement. Extensive characterization of AR status and NE/EMT genes identifies molecular outliers that can arise post-treatment and provides new insight into the heterogeneity of treatment response and potential early markers of resistance. Clinical trial information: NCT00430183.
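The cross-platform check above is summarized as a Spearman coefficient, i.e. the Pearson correlation of the ranks. A dependency-free sketch for comparing per-gene NanoString and RNA-seq values from the same specimen (ties get averaged ranks); in practice `scipy.stats.spearmanr` does the same job:

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because only ranks enter the calculation, the coefficient is insensitive to the different scales of NanoString counts and RNA-seq units.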


2019 ◽  
Author(s):  
Tim O. Nieuwenhuis ◽  
Stephanie Yang ◽  
Rohan X. Verma ◽  
Vamsee Pillalamarri ◽  
Dan E. Arking ◽  
...  

One of the challenges of next-generation sequencing (NGS) is read contamination. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, to understand the factors that contribute to contamination. We obtained GTEx datasets and technical metadata, as well as validating RNA-seq data from other studies. Of 48 analyzed tissues in GTEx, 26 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicated contamination. Sample contamination by non-native genes was associated with a sample being sequenced on the same day as a tissue that natively expressed those genes. This association was highly significant for pancreas and esophagus genes (linear model, p = 9.5e-237 and p = 5e-260, respectively). Nine SNPs in four genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes, validating the contamination. Low-level contamination (defined as ≥10 PRSS1 TPM) affected 4,497 (39.6%) samples. It also led to eQTL assignments in inappropriate tissues among these 18 genes. We note that this type of contamination occurs widely, impacting bulk and single-cell dataset analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets, impacting analyses. Awareness of this process is necessary to avoid assigning inaccurate importance to low-level gene expression in inappropriate tissues and cells.
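A minimal sketch of the two checks described above: flagging samples at or above the ≥10 PRSS1 TPM cutoff, and comparing flag rates among non-pancreas samples by whether a pancreas sample was sequenced on the same day. The sample record layout (`id`, `tissue`, `date`, `tpm`) and function names are hypothetical, and a plain rate comparison stands in for the linear model the authors used.

```python
from collections import defaultdict

def flag_contamination(samples, gene="PRSS1", threshold=10.0):
    """IDs of samples whose TPM for `gene` is at or above `threshold`."""
    return [s["id"] for s in samples if s["tpm"].get(gene, 0.0) >= threshold]

def same_day_rates(samples, flagged, native_tissue="Pancreas"):
    """Flag rate in non-native samples, keyed by whether a native-tissue sample
    was sequenced on the same date (True) or not (False)."""
    native_days = {s["date"] for s in samples if s["tissue"] == native_tissue}
    counts = defaultdict(lambda: [0, 0])  # key -> [n_flagged, n_total]
    for s in samples:
        if s["tissue"] == native_tissue:
            continue  # expression in the native tissue is not contamination
        key = s["date"] in native_days
        counts[key][1] += 1
        if s["id"] in flagged:
            counts[key][0] += 1
    return {k: f / n for k, (f, n) in counts.items()}
```

A markedly higher rate for `True` than for `False` is the pattern the abstract reports; confirming it as contamination still needs independent evidence such as the DNA-versus-RNA genotype discordance described above.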


2019 ◽  
Author(s):  
Yue Jiang ◽  
Michael J. Apostolides ◽  
Mia Husić ◽  
Robert Siddaway ◽  
Man Yu ◽  
...  

Recent advancements in high-throughput sequencing analysis have enabled the characterization of cancer-driving fusions, improving our understanding of cancer development. Most fusion calling methods, however, examine either RNA or DNA information alone and are limited to a rigid definition of what constitutes a fusion. For this study, we developed a pipeline that incorporates several fusion calling methods and considers both RNA and DNA to capture a more complete representation of the tumour fusion landscape. Interestingly, most of the fusions we identified were specific to RNA, with no evidence of corresponding genomic restructuring. Further, while the average total number of fusions in tumour and normal brain tissue samples is comparable, their overall fusion profiles vary significantly. Tumours have an over-representation of fusions occurring between coding genes, whereas fusions involving intergenic or non-coding regions comprised the vast majority of those in normals. Tumours also harboured more unique, sample-specific fusions than normals, though several fusions exhibited strong recurrence in the tumour type examined (diffuse intrinsic pontine glioma; DIPG) and were absent from both normal tissues and other cancers. Intriguingly, tumours also show broad up- or down-regulation of spliceosomal gene expression, which significantly correlates with fusion number (p=0.007). Our results show that RNA-specific fusions are abundant in both tumour and normal tissue and are associated with spliceosomal gene dysregulation. RNA-specific fusions should be considered as a potential mechanism that may contribute to cancer initiation and maintenance alongside more traditional structural events.
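The pipeline idea of combining several callers and then separating RNA-only fusions can be sketched as a consensus vote followed by a set difference. The representation of a fusion as an ordered `(5' gene, 3' gene)` pair, the `min_tools` rule, and the tool names in the example are all assumptions; the abstract does not state how calls were merged.

```python
from collections import Counter

def consensus_fusions(calls_by_tool, min_tools=2):
    """Keep fusions reported by at least `min_tools` different callers.

    calls_by_tool: dict tool name -> iterable of (five_prime_gene, three_prime_gene)
    """
    support = Counter()
    for calls in calls_by_tool.values():
        for fusion in set(calls):  # count each tool at most once per fusion
            support[fusion] += 1
    return {f for f, n in support.items() if n >= min_tools}

def rna_specific(rna_fusions, dna_fusions):
    """Fusions seen in RNA with no matching genomic rearrangement call."""
    return rna_fusions - dna_fusions
```

Requiring agreement between callers trades sensitivity for precision; a looser rule (for example `min_tools=1`) would report more fusions, many of them likely artefacts.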


2021 ◽  
Author(s):  
Arun H. Patil ◽  
Marc K. Halushka

MicroRNAs and tRFs are classes of small non-coding RNAs, known for their roles in translational regulation of genes. Advances in next-generation sequencing (NGS) have enabled high-throughput small RNA-seq studies, which require robust alignment pipelines. Our laboratory previously developed miRge and miRge2.0 as flexible tools to process sequencing data for annotation of miRNAs and other small-RNA species and to predict novel miRNAs using a support vector machine approach. Although miRge2.0 is a leading analysis tool in terms of speed, with unique quantification and annotation features, it has a few limitations. We present miRge3.0, which provides additional features along with compatibility with newer versions of Cutadapt and Python. The revisions of the tool include the ability to process Unique Molecular Identifiers (UMIs) to account for PCR duplicates while quantifying miRNAs in the datasets, and an accurate GFF3-formatted isomiR tool. miRge3.0 also has speed improvements benchmarked against miRge2.0, Chimira and sRNAbench. Finally, miRge3.0 output integrates into other packages for a streamlined analysis process, and the tool provides a cross-platform Graphical User Interface (GUI). In conclusion, miRge3.0 is our third-generation small RNA-seq aligner, with improvements in speed, versatility, and functionality over earlier iterations.
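UMI-aware counting, as mentioned above, amounts to counting distinct molecules instead of reads: reads that share both the UMI and the insert sequence are treated as PCR copies of one molecule. The sketch below is a simplified illustration, not miRge3.0's implementation or API; it assumes reads are already adapter-trimmed and annotated, and it uses exact UMI matching (real tools may also tolerate UMI sequencing errors).

```python
from collections import defaultdict

def count_with_umis(reads):
    """Molecule counts per miRNA after collapsing PCR duplicates.

    reads: iterable of (umi, insert_sequence, mirna_name) tuples
    """
    molecules = defaultdict(set)
    for umi, seq, mirna in reads:
        # The same (UMI, insert) pair is assumed to come from one original molecule.
        molecules[mirna].add((umi, seq))
    return {m: len(s) for m, s in molecules.items()}
```

Without the UMI step, the duplicated read in the test would inflate the first miRNA's count from 2 to 3.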


2021 ◽  
Author(s):  
Daniel Osorio ◽  
Marieke Lydia Kuijjer ◽  
James J. Cai

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high-throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several sample-preprocessing steps to filter and collect the desired cells experimentally before sequencing. Integrating data from multiple public single-cell experiments offers a solution to this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables robust characterization of that cell type's transcriptome. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type that exhibits both the inflammatory features of macrophages and the tissue-remodeling properties of fibroblasts. This constitutes the first report of an unbiased transcriptome profile of fibrocytes. We compared the transcriptomic profile of the fibrocytes against that of fibroblasts collected from the same tissue samples and confirmed their association with healing processes in tissue damage and infection through activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.
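The pooling strategy above (collect one annotated cell type from many public datasets, then analyze it together) can be sketched as restricting every dataset to the genes they share and concatenating the matching cells. This is a Python illustration of the idea only, not rPanglaoDB's R interface; the nested-dict layout, dataset names, and function name are hypothetical, and no batch correction is applied.

```python
def merge_cell_type(datasets, cell_type):
    """Pool cells of one annotated type across datasets on their shared genes.

    datasets: dict name -> {"genes": [gene, ...],
                            "cells": {cell_id: (label, [count per gene, ...])}}
    Returns (sorted shared genes, dict "dataset:cell_id" -> counts in that order).
    """
    shared = set.intersection(*(set(d["genes"]) for d in datasets.values()))
    genes = sorted(shared)
    merged = {}
    for name, d in datasets.items():
        idx = {g: i for i, g in enumerate(d["genes"])}
        for cell_id, (label, counts) in d["cells"].items():
            if label == cell_type:
                # Re-order this cell's counts onto the common gene list.
                merged[f"{name}:{cell_id}"] = [counts[idx[g]] for g in genes]
    return genes, merged
```

Prefixing each cell ID with its dataset name keeps identifiers unique and preserves the origin of each cell, which is useful when checking for batch effects before comparing the pooled cells against another cell type.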

