BayCount: A Bayesian Decomposition Method for Inferring Tumor Heterogeneity using RNA-Seq Counts

2017 ◽  
Author(s):  
Fangzheng Xie ◽  
Mingyuan Zhou ◽  
Yanxun Xu

Abstract Tumors are heterogeneous: a tumor sample usually consists of a set of subclones with distinct transcriptional profiles and potentially different degrees of aggressiveness and responses to drugs. Understanding tumor heterogeneity is therefore critical for precise cancer prognosis and treatment. In this paper, we introduce BayCount, a Bayesian decomposition method to infer tumor heterogeneity with highly over-dispersed RNA sequencing count data. Using negative binomial factor analysis, BayCount takes into account both the between-sample and gene-specific random effects on raw counts of sequencing reads mapped to each gene. For the posterior inference, we develop an efficient compound Poisson based blocked Gibbs sampler. Simulation studies show that BayCount is able to accurately infer the subclonal structure, including the number of subclones, the proportions of these subclones in each tumor sample, and the gene expression profiles in each subclone. For real-world data examples, we apply BayCount to The Cancer Genome Atlas lung cancer and kidney cancer RNA sequencing count data and obtain biologically interpretable results. Our method represents the first effort in characterizing tumor heterogeneity using RNA sequencing count data that simultaneously removes the need for normalizing the counts, achieves statistical robustness, and obtains biologically and clinically meaningful insights. The R package BayCount implementing our model and algorithm is available for download.

2021 ◽  
Vol 2 (1) ◽  
pp. 43-61
Author(s):  
Aanchal Malhotra ◽  
Samarendra Das ◽  
Shesh N. Rai

Single-cell RNA-sequencing (scRNA-seq) technology provides an excellent platform for measuring the expression profiles of genes in heterogeneous cell populations. Multiple tools for the analysis of scRNA-seq data have been developed over the years. These tools require complicated commands and steps to analyze the underlying data, which are not easy for genome researchers and experimental biologists to follow. Therefore, we describe a step-by-step workflow for processing and analyzing scRNA-seq unique molecular identifier (UMI) data from human lung adenocarcinoma cell lines. We demonstrate the basic analyses, including quality check, mapping, and quantification of transcript abundance, through a suitable real-data example to obtain UMI count data. Further, we performed basic statistical analyses, such as zero-inflation, differential expression, and clustering analyses, on the obtained count data. We studied the effects of the excess zero-inflation present in scRNA-seq data on the downstream analyses. Our findings indicate that the zero-inflation associated with UMI data had little or no role in clustering, while it had a significant effect on identifying differentially expressed genes. We also provide insight into the comparative analysis of differential expression analysis tools based on zero-inflated negative binomial and negative binomial models on scRNA-seq data. The sensitivity analysis reinforced our findings in that the negative binomial model-based tool did not provide an accurate and efficient way to analyze the scRNA-seq data. This study provides a set of guidelines for users to handle and analyze real scRNA-seq data more easily.


2020 ◽  
Author(s):  
Wei Ma ◽  
Dandan Li ◽  
Changjian Zhang ◽  
Ming Xiong ◽  
Yuanyuan Qiao

Abstract Purpose: We explored a new gene signature that combines the tumor-derived expression profile with the adjacent normal-derived expression profile, with the aim of finding more robust cancer biomarkers. Methods: The log2-transformed ratio of tumor tissue to adjacent normal tissue expression (Log2TN), tumor-derived expression, and normal-derived expression were each used in univariate Cox regression in The Cancer Genome Atlas (TCGA) lung squamous cell carcinoma (LUSC) dataset. We then used factor analysis and least absolute shrinkage and selection operator Cox regression (LASSO-Cox) to select gene signatures in TCGA LUSC for Log2TN, tumor, and adjacent normal, respectively. Results: Comparing Log2TN with tumor and adjacent normal in LUSC, we found that genes derived from Log2TN are more robust (p = 0.006 and p = 0.001) and have lower p-values (p < 0.001). The gene signature selected from Log2TN shows the best generalization in the three GEO datasets, even though only tumor-derived expression profiles were available in those datasets. Enrichment analysis showed that the tumor cells mainly focus on proliferation, with loss of metabolic function. Conclusions: These results indicate that (1) Log2TN could yield more robust genes and gene signatures than the tumor-derived expression profiles used traditionally; (2) the adjacent normal tissue may also play an important role in the progression and outcome of the tumor. Implications for Cancer Survivors: By combining the tumor-derived expression profile with the adjacent normal-derived expression profile, we could find more robust gene signatures than with traditional methods. Using these robust gene signatures, robust cancer biomarkers could be constructed, which would greatly help to improve cancer prognosis.
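The Log2TN quantity described in the Methods can be illustrated with a short sketch. The abstract does not say how zero expression values are handled, so the pseudocount of 1 used here is an assumption, and the function name `log2_tn` is hypothetical.

```python
import math


def log2_tn(tumor_expr, normal_expr, pseudocount=1.0):
    """Log2 ratio of tumor to adjacent-normal expression for one gene.

    The pseudocount is an assumption (not stated in the abstract); it keeps
    the ratio defined when either expression value is zero.
    """
    if tumor_expr < 0 or normal_expr < 0:
        raise ValueError("expression values must be non-negative")
    return math.log2((tumor_expr + pseudocount) / (normal_expr + pseudocount))


if __name__ == "__main__":
    # Hypothetical expression values for a single paired sample.
    print(log2_tn(7.0, 1.0))
```

Positive values indicate higher expression in the tumor than in the adjacent normal tissue, and negative values indicate the opposite.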


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Siddhant Kalra ◽  
Aayushi Mittal ◽  
Krishan Gupta ◽  
Vrinda Singhal ◽  
Anku Gupta ◽  
...  

Abstract Ectopically expressed olfactory receptors (ORs) have been linked with multiple clinically relevant physiological processes. Previously used tissue-level expression estimation largely obscured the potential role of ORs because of their overall low expression levels. Even after the introduction of single-cell transcriptomics, a comprehensive delineation of the expression dynamics of ORs in tumors remained unexplored. Our targeted investigation into single malignant cells revealed a complex landscape of combinatorial OR expression events. We observed a differentiation-dependent decline in expressed OR counts per cell as well as in their expression intensities in malignant cells. Further, we constructed expression signatures based on a large spectrum of ORs and tracked their enrichment in bulk expression profiles of tumor samples from The Cancer Genome Atlas (TCGA). TCGA tumor samples stratified based on OR-centric signatures exhibited divergent survival probabilities. In summary, our comprehensive analysis positions ORs at the crossroads of tumor cell differentiation status and cancer prognosis.


Author(s):  
Alexander D. Knudson ◽  
Tomasz J. Kozubowski ◽  
Anna K. Panorska ◽  
A. Grant Schissler

Abstract We propose a flexible multivariate stochastic model for over-dispersed count data. Our methodology is built upon mixed Poisson random vectors (Y_1, …, Y_d), where the {Y_i} are conditionally independent Poisson random variables. The stochastic rates of the {Y_i} follow a multivariate distribution with arbitrary non-negative margins linked by a copula function. We present basic properties of these mixed Poisson multivariate distributions and provide several examples. A particular case with geometric and negative binomial marginal distributions is studied in detail. We illustrate an application of our model by conducting a high-dimensional simulation motivated by RNA-sequencing data.
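As a rough illustration of the construction described in this abstract, the sketch below draws a bivariate mixed Poisson pair. The choices made here are assumptions for illustration, not the authors' implementation: a Gaussian copula links the rates, the rate margins are exponential (which gives geometric count margins), and all function names are hypothetical.

```python
import math
import random


def poisson_sample(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def normal_cdf(z):
    """Standard normal CDF, used to map Gaussian draws to uniforms."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixed_poisson_pair(mean1, mean2, rho, rng):
    """Draw (Y1, Y2): Poisson counts whose rates are linked by a Gaussian copula.

    Assumption: each rate has an exponential margin with the given mean, so
    each count is marginally geometric with variance mean + mean**2.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    counts = []
    for z, mean in ((z1, mean1), (z2, mean2)):
        u = min(normal_cdf(z), 1.0 - 1e-12)  # guard against log(0)
        rate = -mean * math.log1p(-u)        # inverse CDF of the exponential
        counts.append(poisson_sample(rate, rng))  # conditionally independent
    return tuple(counts)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [mixed_poisson_pair(3.0, 3.0, 0.8, rng) for _ in range(5)]
    print(draws)
```

Given the rates, the two counts are independent; all dependence between them comes from the copula on the rates, and the extra variance relative to a plain Poisson comes from the randomness of the rates.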


2021 ◽  
Vol 10 ◽  
Author(s):  
Jun Liu ◽  
Shanqiang Zhang ◽  
Wenjie Dai ◽  
Chongwei Xie ◽  
Ji-Cheng Li

SLC41A3, a member of the 41st family of solute carriers, participates in the transport of magnesium. The role of SLC41A3 in cancer prognosis and immune regulation has rarely been reported. This study was designed to analyze the expression status and prognostic significance of SLC41A3 across cancer types. The mRNA expression profiles of SLC41A3 were obtained from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) project, the Broad Institute Cancer Cell Line Encyclopedia (CCLE), and the International Cancer Genome Consortium (ICGC). Cox regression and Kaplan-Meier analyses were used to evaluate the prognostic value of SLC41A3 in pan-cancer. Furthermore, the correlations between SLC41A3 expression and immune cell infiltration, immune checkpoints, mismatch repair (MMR), DNA methyltransferase (DNMT), tumor mutation burden (TMB), and microsatellite instability (MSI) were calculated using data from the TCGA database. The results showed that the expression of SLC41A3 was down-regulated in kidney renal clear cell carcinoma (KIRC) and was associated with poor overall survival and tumor-specific mortality. In contrast, the expression of SLC41A3 was up-regulated in liver hepatocellular carcinoma (LIHC), and the results of Cox regression analysis revealed that SLC41A3 was an independent factor for LIHC prognosis. Meanwhile, a nomogram including SLC41A3 and stage was built and exhibited good predictive power for the overall survival of LIHC patients. Additionally, correlation analysis suggested a significant correlation between SLC41A3 and TMB, MSI, MMR, DNMT, and immune cell infiltration in various cancers. The overall survival and disease-specific survival analyses revealed that SLC41A3 expression combined with immune cell score, TMB, and MSI was significantly associated with clinical outcomes in ACC, LIHC, and UVM patients. Therefore, we propose that SLC41A3 may serve as a potential prognostic biomarker for cancer.


2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 3623-3623
Author(s):  
F. Anthony San Lucas ◽  
Scott Kopetz ◽  
Paul A. Scheet ◽  
Eduardo Vilar Sanchez

3623 Background: Approximately 10% of colorectal cancers (CRCs) harbor a BRAF mutation (BRAFm). Patients with BRAFm tumors have a poor prognosis and are a therapeutic challenge. A BRAFm gene expression signature has been reported (Popovici et al, JCO 2012), which can identify BRAFm tumors as well as BRAF wild-type tumors that display a similar expression pattern. Collectively, these tumors are termed BRAFm-like. Our goal was to validate this signature using next-generation sequencing and to discover novel therapies for BRAFm-like CRCs using a systems biology approach. Methods: We developed a semi-automated workflow, named Cancer In-silico Drug Discovery (CIDD), that integrates publicly available tools. To validate the BRAFm-like signature, we used CIDD to analyze the CRC dataset from The Cancer Genome Atlas Network (TCGA). Samples were stratified on BRAFm status using exome sequencing, and expression profiles were inferred from RNA sequencing. We matched expression profiles with drug-induced signatures inferred from the Connectivity Map (CMap), a systems biology tool that contains expression data of cell lines treated with 1,500 compounds. CIDD statistically ranks candidate compounds and annotates them to pathways using public databases. Results: When applied to TCGA RNA-sequencing data, a classifier based on the BRAFm-like signature resulted in 93.3% sensitivity and 83.5% specificity for detecting BRAFm samples. When applied to Agilent gene expression data, it resulted in 80% sensitivity and 91.1% specificity. 41% of KRAS-mutated samples and 14% of double wild-type samples were predicted to be BRAFm-like. 100% of MSI-high and 18% of MSS samples were predicted to be BRAFm-like. Compounds near the top of our drug rankings include gefitinib and MG-262, a proteasome inhibitor. Conclusions: We have validated the BRAFm-like signature using RNA-sequencing and Agilent expression data from the TCGA, and showed a high degree of robustness across technologies. We have identified EGFR and proteasome inhibitors as potential compounds to target BRAFm-like CRCs.
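For readers unfamiliar with the two classifier metrics reported in the Results, a minimal sketch of how sensitivity and specificity are computed from a confusion matrix follows. The counts in the demo are hypothetical and are not taken from the study; the function name is also hypothetical.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Sensitivity is the fraction of mutated samples that are detected;
    specificity is the fraction of wild-type samples correctly left unflagged.
    """
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sens, spec = sensitivity_specificity(true_pos=9, false_neg=1,
                                         true_neg=8, false_pos=2)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```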


2018 ◽  
Vol 1 (2) ◽  
pp. 55-70 ◽  
Author(s):  
Morshed Alam ◽  
Naim Al Mahi ◽  
Munni Begum

One of the main objectives of many biological studies is to explore differential gene expression profiles between samples. Genes are referred to as differentially expressed (DE) if the read counts change systematically across treatments or conditions. Poisson and negative binomial (NB) regressions are widely used methods for non-over-dispersed (NOD) and over-dispersed (OD) count data, respectively. However, in the presence of an excessive number of zeros, these methods need adjustments. In this paper, we consider a zero-inflated Poisson mixed effects model (ZIPMM) and a zero-inflated negative binomial mixed effects model (ZINBMM) to address excessive zero counts in NOD and OD RNA-seq data, respectively, in the presence of random effects. We apply these methods to both simulated and real RNA-seq datasets. The ZIPMM and ZINBMM perform better on both simulated and real datasets.
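To make the notion of zero inflation concrete, the sketch below evaluates the probability mass function of a plain zero-inflated Poisson distribution, in which a count is a structural zero with probability `zero_prob` and otherwise Poisson. It leaves out the random effects and regression structure of the ZIPMM described above, and the function and parameter names are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of observing count k (rate must be positive)."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def zip_pmf(k, rate, zero_prob):
    """Zero-inflated Poisson pmf: a point mass at zero mixed with a Poisson."""
    if not 0.0 <= zero_prob <= 1.0:
        raise ValueError("zero_prob must lie in [0, 1]")
    base = (1.0 - zero_prob) * poisson_pmf(k, rate)
    return zero_prob + base if k == 0 else base


def zip_mean_variance(rate, zero_prob):
    """Mean and variance of the zero-inflated Poisson distribution."""
    mean = (1.0 - zero_prob) * rate
    variance = mean * (1.0 + zero_prob * rate)
    return mean, variance


if __name__ == "__main__":
    # Compare the zero probability with and without inflation.
    print(poisson_pmf(0, 2.0), zip_pmf(0, 2.0, 0.3))
```

Whenever `zero_prob` is positive the variance exceeds the mean, so zero inflation on its own already produces over-dispersion relative to a Poisson model.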


BMC Cancer ◽  
2019 ◽  
Vol 19 (1) ◽  
Author(s):  
H. Sallinen ◽  
S. Janhonen ◽  
P. Pölönen ◽  
H. Niskanen ◽  
O. H. Liu ◽  
...  

Abstract Background High grade serous ovarian carcinoma (HGSOC) is the most common subtype of epithelial ovarian cancer (EOC) and has a poor prognosis. In most cases EOC is widely disseminated at the time of diagnosis. Despite optimal cytoreductive surgery and chemotherapy, most patients develop chemoresistance, and the 5-year overall survival is only 25–35%. Methods Here we analyzed the gene expression profiles of 10 primary HGSOC tumors and 10 related omental metastases using RNA sequencing and identified 100 differentially expressed genes. Results The differentially expressed genes were associated with decreased embryogenesis and vasculogenesis and increased cellular proliferation and organismal death. Top upstream regulators responsible for this gene signature were NR5A1, GATA4, FOXL2, TP53 and BMP7. A subset of these genes was highly expressed in ovarian cancer among the cancer transcriptomes of The Cancer Genome Atlas. Importantly, the metastatic gene signature was suggestive of poor survival in TCGA data based on gene enrichment analysis. Conclusion By comparing the gene expression profiles of primary HGSOC tumors and their matched metastases, we provide evidence that a signature of 100 genes is able to separate these two sample types and potentially predict patient survival. Our study identifies functional categories of genes and transcription factors that could play important roles in promoting metastases and serve as markers for cancer prognosis.


2019 ◽  
Vol 8 ◽  
pp. 1078-1085
Author(s):  
Liliana Lopez-Kleine ◽  
Cristian Andres Gonzalez-Prieto

Interactions between genes, such as regulation, are best represented by gene regulatory networks (GRN). These are often constructed based on gene expression data. Few methods for the construction of GRN exist for RNA sequencing count data. One of the most used methods for microarray data is based on graphical Gaussian networks. Considering that count data have different distributions, a method assuming that RNA sequencing counts follow a Poisson distribution has been proposed recently. Nevertheless, it has been argued that the most likely distribution of RNA sequencing counts is not Poisson, due to overdispersion. Therefore, the negative binomial distribution is much more likely. For this distribution, no model-based method for the construction of GRN has been proposed until now. Here, we present a graphical, model-based method for the construction of GRN assuming a negative binomial distribution of the RNA sequencing count data. The R code is available upon request. We applied the proposed method both to simulated RNA sequencing count data and to real data. The resulting graph is shown, and its descriptive measurements were assessed. Some interesting biological conclusions were drawn. We confirm that using the negative binomial distribution for fitting the model is suitable because RNA sequencing data present overdispersion.


2022 ◽  
Vol 12 ◽  
Author(s):  
Yiran Zhou ◽  
Qinghua Cui ◽  
Yuan Zhou

tRNA-derived fragments (tRFs) constitute a novel class of small non-coding RNA cleaved from tRNAs. In recent years, research has shown the regulatory roles of a few tRFs in cancers, illuminating a new direction for tRF-centric cancer research. Nonetheless, more specific screening of tRFs related to oncogenesis pathways, cancer progression stages and cancer prognosis is still needed to reveal the landscape of cancer-associated tRFs. In this work, by combining the clinical information recorded in The Cancer Genome Atlas (TCGA) and the tRF expression profiles curated by MINTbase v2.0, we systematically screened 1,516 cancer-associated tRFs (ca-tRFs) across seven cancer types. The ca-tRF set combines the tRFs differentially expressed between cancer samples and control samples, the tRFs significantly correlated with tumor stage, and the tRFs significantly correlated with patient survival. By incorporating our previous tRF-target dataset, we found that the ca-tRFs tend to target cancer-associated genes and onco-pathways such as ATF6-mediated unfolded protein response, angiogenesis, cell cycle process regulation, focal adhesion, the PI3K-Akt signaling pathway, cellular senescence and the FoxO signaling pathway across multiple cancer types. Cell composition analysis implies that the expression of ca-tRFs is more likely to be correlated with T-cell infiltration. We also found that the ca-tRF expression pattern is informative for prognosis, suggesting plausible tRF-based cancer subtypes. Together, our systematic analysis demonstrates the potentially extensive involvement of tRFs in cancers and provides a reasonable list of cancer-associated tRFs for further investigation.

