CTDPathSim: Cell line-tumor deconvoluted pathway-based similarity in the context of precision medicine in cancer

2020 ◽  
Author(s):  
Banabithi Bose ◽  
Serdar Bozdag

Abstract: In cancer research and drug development, human tumor-derived cell lines are widely used as models of cancer patients to evaluate the biological functions of genes, drug efficacy, side effects, and drug metabolism. Using these cell lines, researchers have studied the functional relationship between genes and drug response and have predicted drug response from genomic and chemical features. Knowing the drug response in real patients, however, is a more important and more challenging task. To tackle this challenge, some studies integrate data from primary tumors and cancer cell lines to find associations between cell lines and tumors. These studies, however, do not integrate multi-omics datasets to their full extent. In addition, several studies rely on a genome-wide correlation-based approach between cell lines and bulk tumor samples without considering the heterogeneous cell population in bulk tumors. To address these gaps, we developed a computational pipeline, CTDPathSim, a pathway activity-based approach that computes similarity between primary tumor samples and cell lines at the genetic, genomic, and epigenetic levels by integrating multi-omics datasets. We used a deconvolution method to obtain cell type-specific DNA methylation and gene expression profiles, and computed deconvoluted methylation and expression profiles of tumor samples. We assessed CTDPathSim by applying it to breast and ovarian cancer data in The Cancer Genome Atlas (TCGA) and cancer cell line data in the Cancer Cell Line Encyclopedia (CCLE). Our results showed that highly similar sample-cell line pairs had more similar drug responses than pairs with low similarity for several FDA-approved cancer drugs, such as Paclitaxel, Vinorelbine, and Mitomycin-c. CTDPathSim outperformed state-of-the-art methods in recapitulating the known drug responses between samples and cell lines. CTDPathSim also selected a higher number of significant cell lines belonging to the same cancer types than other methods. Furthermore, cell lines aligned to samples were found to be clinical biomarkers for patients' survival, whereas unaligned cell lines were not. Our method could guide the selection of appropriate cell lines to serve as proxies for patient tumors and could direct the pre-clinical translation of drug testing into clinical platforms for personalized therapies. Furthermore, this study could guide new uses for old drugs and benefit the development of new drugs for cancer treatment.
CCS Concepts: Computational biology; Genomics; Systems biology; Bioinformatics; Genetics.
ACM Reference format: Banabithi Bose, Serdar Bozdag. 2020. CTDPathSim: Cell line-tumor deconvoluted pathway-based similarity in the context of precision medicine in cancer.
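The pathway-based comparison described in this abstract can be illustrated with a minimal sketch. It assumes that similarity within a pathway is measured by Spearman rank correlation over the genes shared by a tumor profile and a cell line profile; the abstract does not specify the measure, and the function names, the `min_genes` cutoff, and the toy data are hypothetical. It is not the CTDPathSim implementation and omits the deconvolution step.

```python
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

def pathway_similarity(tumor, cell_line, pathways, min_genes=3):
    """Score each pathway by rank correlation over its shared genes.

    tumor, cell_line: dict mapping gene name -> profile value (assumed input).
    pathways: dict mapping pathway name -> list of gene names (assumed input).
    Pathways with fewer than min_genes shared genes are skipped.
    """
    scores = {}
    for name, genes in pathways.items():
        shared = [g for g in genes if g in tumor and g in cell_line]
        if len(shared) < min_genes:
            continue
        scores[name] = spearman([tumor[g] for g in shared],
                                [cell_line[g] for g in shared])
    return scores

# Toy profiles, purely illustrative.
tumor = {"A": 1, "B": 2, "C": 3, "D": 4}
cell_line = {"A": 10, "B": 20, "C": 30, "D": 5}
pathways = {"P1": ["A", "B", "C"], "P2": ["B", "C", "D"], "P3": ["A", "Z"]}
scores = pathway_similarity(tumor, cell_line, pathways)
```

Scoring per pathway rather than genome-wide means a single sample-cell line pair yields one score per pathway, which can then be summarized or tested separately.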

2018 ◽  
Author(s):  
K. Yu ◽  
B. Chen ◽  
D. Aran ◽  
J. Charalel ◽  
A. Butte ◽  
...  

Abstract: Cancer cell lines are commonly used as models for cancer biology. While they are limited in their ability to capture complex interactions between tumors and their surrounding environment, they are a cornerstone of cancer research, and many important findings have been made using cell line models. Not all cell lines are appropriate models of primary tumors, however, which may contribute to the difficulty in translating in vitro findings to patients. Previous studies have leveraged public datasets to evaluate cell lines as models of primary tumors, but they have been limited in scope to specific tumor types and typically ignore the presence of tumor-infiltrating cells in the primary tumor samples. We present here a comprehensive pan-cancer analysis using approximately 9,000 transcriptomic profiles from The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia to evaluate cell lines as models of primary tumors across 22 different tumor types. After adjusting for tumor purity in the primary tumor samples, we performed correlation analysis and differential gene expression analysis between the primary tumor samples and cell lines. We found that cell-cycle pathways are consistently upregulated in cell lines, while no pathways are consistently upregulated across the primary tumor samples. In a case study, we compared colorectal cancer cell lines with primary tumor samples across the colorectal subtypes and identified three colorectal cell lines that were derived from fibroblasts rather than tumor epithelial cells. Lastly, we propose a new cell line panel, the TCGA-110, which contains the most representative cell lines from 22 different tumor types, as a more comprehensive and informative alternative to the NCI-60 panel. Our analysis of the other tumor types is available in our web app (http://comphealth.ucsf.edu/TCGA110) as a resource to the cancer research community, and we hope it will allow researchers to select more appropriate cell line models and increase the translatability of in vitro findings.


Author(s):  
Akram Emdadi ◽  
Changiz Eslahchi

Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of diverse information about cell lines and drugs. This paper proposes the TCLMF method, a model for predicting drug response in tumor samples that is trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is built on gene expression profiles, tissue type information, the chemical structure of drugs, and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset to train the proposed drug response model, which we then use to predict drug sensitivity of samples from The Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying effective features of cell lines and drugs in order to calculate the probability that a tumor sample is sensitive to a drug. The closest cell line neighbours of each tumor sample are found using a similarity measure between tumor samples and cell lines defined in this study. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighbouring cell lines. We compare the results of the TCLMF model with those of previously proposed methods, using two databases and two approaches to test the model's performance. In the first approach, 12 drugs with enough known clinical drug response data, considered in previous methods, are studied. For 7 of the 12 drugs, TCLMF can significantly distinguish between patients who are resistant to these drugs and patients who are sensitive to them. In the second approach, the methods are converted to classification models using a threshold, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions relative to the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF's logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.
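The neighbour-averaging step described in this abstract can be sketched as follows. This is a minimal illustration, not the TCLMF code: it assumes that latent vectors for cell lines and drugs have already been learned, that the sensitivity probability is the logistic function of their dot product (bias terms are omitted), and that tumor-to-cell-line similarities are given. All names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def tumor_latent(similarities, cell_latents, k):
    """Average the latent vectors of the k most similar cell lines.

    similarities: dict cell line -> similarity to the tumor (assumed given).
    cell_latents: dict cell line -> latent vector learned on preclinical data.
    """
    nearest = sorted(similarities, key=similarities.get, reverse=True)[:k]
    dim = len(next(iter(cell_latents.values())))
    return [sum(cell_latents[c][d] for c in nearest) / len(nearest)
            for d in range(dim)]

def sensitivity_probability(tumor_vec, drug_vec):
    """Probability of sensitivity as the logistic of the dot product."""
    return sigmoid(sum(t * d for t, d in zip(tumor_vec, drug_vec)))

# Toy values, purely illustrative.
similarities = {"c1": 0.9, "c2": 0.8, "c3": 0.1}
cell_latents = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [5.0, 5.0]}
u = tumor_latent(similarities, cell_latents, k=2)
p = sensitivity_probability(u, [2.0, 2.0])
```

Because the tumor vector is built only from cell line vectors, no tumor drug response data are needed at training time. The sketch assumes at least one cell line is supplied; an empty `similarities` dict would raise an error.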


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9440
Author(s):  
Xiaoxi Yang ◽  
Yuqi Wen ◽  
Xinyu Song ◽  
Song He ◽  
Xiaochen Bo

Background: Cancer classification is of great importance for understanding pathogenesis, making diagnoses, and developing treatments. The accumulation of extensive omics data on a large number of cancer cell lines provides a basis for large-scale, low-cost classification of cancer. However, the reliability of cell lines as in vitro models of cancer has been controversial. Methods: In this study, we explore the classification of pan-cancer cell lines with single and integrated multiple omics data from the Cancer Cell Line Encyclopedia (CCLE) database. The representative omics data of cancer (mRNA, miRNA, copy number variation, DNA methylation, and reverse-phase protein array data) were included in the analysis. The TumorMap web tool was used to illustrate the landscape of molecular classification. The molecular classification of patient samples was compared with that of cancer cell lines. Results: Eighteen molecular clusters were identified using integrated multiple omics clustering, three of which were pan-cancer clusters. By comparing with single omics clustering, we found that integrated clustering could capture both shared and complementary information from each omics data type. Omics contribution analysis for clustering indicated that, although all five omics data types were of value, mRNA and proteomics data were particularly important. While the classifications were generally consistent, samples from cancer patients were more diverse than cancer cell lines. Conclusions: The clustering analysis based on integrated omics data provides a novel multi-dimensional map of cancer cell lines that can reflect the extent to which pan-cancer cell lines represent primary tumors, and an approach to evaluate the importance of omic features in cancer classification.


2012 ◽  
Vol 30 (5_suppl) ◽  
pp. 377-377
Author(s):  
Brian Shuch ◽  
Christopher Ricketts ◽  
Carole Sourbier ◽  
Shinji Tsutsumi ◽  
Xiu-ying Zhang ◽  
...  

Background: Papillary kidney cancer, which occurs in 15% of patients with kidney cancer, can be aggressive, and there is currently no effective form of therapy for this disease. To evaluate the metabolic characteristics of sporadic papillary kidney cancer, we evaluated metabolic parameters of several papillary kidney cancer cell lines and available gene expression profiles. Methods: Established cell lines derived from patients with sporadic papillary kidney cancer (LABAZ, MDACC-55, HRC-86T2) and from a hereditary form of fumarate hydratase-deficient kidney cancer (UOK262) were evaluated. All sporadic lines were initially sequenced for fumarate hydratase (FH). All cell lines were metabolically profiled using the Seahorse Extracellular Flux Analyzer and further evaluated for reactive oxygen species (ROS), mitochondrial membrane potential, and glucose dependence. Finally, gene expression profiles from publicly available datasets of papillary and HLRCC tumors were downloaded, normalized, and analyzed. Results: Sporadic lines had no alterations in FH, and metabolic analysis demonstrated normal oxygen consumption and minimal lactate production, in contrast to the highly glycolytic UOK262. Also unlike UOK262, the sporadic papillary kidney cancer lines were not sensitive to glucose withdrawal, had low levels of ROS, and had normal mitochondrial membrane potential. Principal component analysis (PCA) demonstrated that HLRCC tumor specimens are very different from sporadic papillary tumors at the molecular level. Conclusions: Our study of established sporadic papillary RCC and fumarate hydratase-deficient HLRCC cell lines, together with analysis of available gene expression profiles, demonstrates that these sporadic papillary kidney cancer cell lines appear to have a metabolic profile distinct from that of the fumarate hydratase-deficient kidney cancer cell line. Understanding the metabolic basis of papillary kidney cancer could provide the foundation for the development of targeted approaches to therapy for patients with this disease.


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0245939
Author(s):  
Keita Fukuyama ◽  
Masataka Asagiri ◽  
Masahiro Sugimoto ◽  
Hiraki Tsushima ◽  
Satoru Seo ◽  
...  

Cancer cell lines are widely used in basic research to study cancer development, growth, invasion, or metastasis. They are also used for the development and screening of anticancer drugs. However, there are no clear criteria for choosing the most suitable cell lines among the wide variety of cancer cell lines commercially available for research, and the choice is often based on previously published reports. Here, we investigated the characteristics of liver cancer cell lines by analyzing the gene expression data available in the Cancer Cell Line Encyclopedia. Unsupervised clustering analysis of 28 liver cancer cell lines yielded two main clusters. One cluster showed a gene expression pattern similar to that of hepatocytes, and the other showed a pattern similar to that of fibroblasts. Analysis of hepatocellular carcinoma gene expression profiles available in The Cancer Genome Atlas showed that the gene expression patterns in most hepatoma tissues were similar to those in the hepatocyte-like cluster. With respect to liver cancer research, our findings may be useful for selecting an appropriate cell line for a specific study objective. Furthermore, our approach of utilizing a public database for comparing the properties of cell lines could be an attractive cell line selection strategy that can be applied to other fields of research.


2014 ◽  
Vol 1 (1) ◽  
Author(s):  
Glenn S Cowley ◽  
Barbara A Weir ◽  
Francisca Vazquez ◽  
Pablo Tamayo ◽  
Justine A Scott ◽  
...  

Abstract Using a genome-scale, lentivirally delivered shRNA library, we performed massively parallel pooled shRNA screens in 216 cancer cell lines to identify genes that are required for cell proliferation and/or viability. Cell line dependencies on 11,000 genes were interrogated by 5 shRNAs per gene. The proliferation effect of each shRNA in each cell line was assessed by transducing a population of 11M cells with one shRNA-virus per cell and determining the relative enrichment or depletion of each of the 54,000 shRNAs after 16 population doublings using Next Generation Sequencing. All the cell lines were screened using standardized conditions to best assess differential genetic dependencies across cell lines. When combined with genomic characterization of these cell lines, this dataset facilitates the linkage of genetic dependencies with specific cellular contexts (e.g., gene mutations or cell lineage). To enable such comparisons, we developed and provided a bioinformatics tool to identify linear and nonlinear correlations between these features.
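The "relative enrichment or depletion" of each shRNA can be illustrated with a small sketch. It assumes that abundance is summarized as a log2 fold change between normalized read counts before and after the screen, and that a gene-level score is the median over that gene's shRNAs. Neither choice is stated in the abstract; the pseudocount, function names, and toy counts are hypothetical.

```python
import math
from statistics import median

def log2_fold_changes(initial_counts, final_counts, pseudocount=1.0):
    """Log2 ratio of final to initial read fractions for each shRNA.

    Counts are converted to fractions of the library so that samples with
    different sequencing depth are comparable. The pseudocount avoids
    taking the logarithm of zero when an shRNA drops out completely.
    """
    n = len(initial_counts)
    total_i = sum(initial_counts.values()) + pseudocount * n
    total_f = sum(final_counts.get(s, 0) for s in initial_counts) + pseudocount * n
    lfc = {}
    for shrna, ci in initial_counts.items():
        cf = final_counts.get(shrna, 0)
        frac_i = (ci + pseudocount) / total_i
        frac_f = (cf + pseudocount) / total_f
        lfc[shrna] = math.log2(frac_f / frac_i)
    return lfc

def gene_scores(lfc, shrna_to_gene):
    """Summarize shRNA-level values per gene with the median (an assumption)."""
    by_gene = {}
    for shrna, value in lfc.items():
        by_gene.setdefault(shrna_to_gene[shrna], []).append(value)
    return {gene: median(values) for gene, values in by_gene.items()}

# Toy counts, purely illustrative; pseudocount disabled to keep the numbers exact.
initial = {"s1": 100, "s2": 100}
final = {"s1": 50, "s2": 150}
lfc = log2_fold_changes(initial, final, pseudocount=0.0)
genes = gene_scores(lfc, {"s1": "G1", "s2": "G1"})
```

A negative value indicates depletion, which in a proliferation screen suggests the targeted gene may be required for growth or viability in that cell line.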


2020 ◽  
Author(s):  
Alok Jaiswal ◽  
Prson Gautam ◽  
Elina A Pietilä ◽  
Sanna Timonen ◽  
Nora Nordström ◽  
...  

Abstract: A caveat of cancer cell line models is that their molecular and functional profiling is subject to laboratory-specific experimental practices and data analysis protocols. The current challenge is how to make an integrated use of omics profiles of cancer cell lines for reliable discoveries. Here, we carried out a systematic analysis of nine types of data modalities using meta-analysis of 53 omics profiling studies across 12 research laboratories for 2018 cell lines. To account for relatively low consistency observed for certain data modalities, we developed a robust data integration approach that identifies reproducible signals shared among multiple data modalities and studies. We demonstrated the power of the integrative analyses by identifying a novel driver gene, ECHDC1, with tumor suppressive role validated both in breast cancer cells and patient tumors. Extension of the approach identified synthetic lethal partners of cancer drivers, including a co-dependency of PTEN deficient cells on RNA helicases.
Highlights:
A comprehensive meta-analysis of 53 multi-modal omics profiles of >2000 cancer cell lines from 12 research laboratories.
An unexpected lack of consistency between TMT-labelled and non-labelled global proteomic profiles.
A non-parametric approach to integrate omics profiles from multiple laboratories and to identify robust molecular patterns in individual cell lines.
The multi-modal data integration reveals novel drivers and potential therapeutic targets, including ECHDC1 in breast cancers and DDX27 in PTEN mutant cancers.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Akram Emdadi ◽  
Changiz Eslahchi

Abstract. Background: Predicting the response of cancer cell lines to specific drugs is an essential problem in personalized medicine. Since drug response is closely associated with genomic information in cancer cells, several large panels of several hundred human cancer cell lines have been assembled with genomic and pharmacogenomic data. Although several methods have been developed to predict drug response, many challenges remain in achieving accurate predictions. This study proposes a novel feature selection-based method, named Auto-HMM-LMF, to predict cell line-drug associations accurately. Because of the vast dimensions of the feature space for predicting drug response, Auto-HMM-LMF focuses on feature selection to find a subset of inputs with a significant contribution. Results: This research introduces a novel method for feature selection of mutation data based on signature assignments and hidden Markov models. We also use autoencoder models for feature selection of gene expression and copy number variation data. After selecting features, the logistic matrix factorization model is applied to predict drug response values. In addition, by comparing with one of the most powerful feature selection methods, the ensemble feature selection method (EFS), we showed that the predictive model based on the features selected in this paper performs much better for drug response prediction. Two datasets, the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE), are used to demonstrate the efficiency of the proposed method on unseen patient cell lines. Evaluation of the proposed model showed that Auto-HMM-LMF could improve on the accuracy of state-of-the-art algorithms and can find useful features for the logistic matrix factorization method. Conclusions: We describe an application of Auto-HMM-LMF in exploring new candidate drugs for head and neck cancer, which shows that the proposed method is useful in drug repositioning and personalized medicine. The source code of the Auto-HMM-LMF method is available at https://github.com/emdadi/Auto-HMM-LMF.


2016 ◽  
Vol 31 (2) ◽  
pp. 153-162 ◽  
Author(s):  
Alfonso Bettin ◽  
Ismael Reyes ◽  
Niradiz Reyes

Background: The aim of this study was to evaluate the gene expression profiles of a set of prostate cancer–associated genes in prostate cancer cell lines, to determine their association with different cancer phenotypes, and to identify potential novel biomarkers for this disease. Methods: Quantitative real-time PCR was used to determine the expression profiles of 21 prostate cancer–associated genes in the human prostate cancer cell lines PC-3 and LNCaP, using the nontumorigenic cell line PWR-1E as the control cell line. The genes evaluated were ESM-1, SERPINE2, CLU, BGN, A2M, PENK, FMOD, CD81, DCN, TSPAN8, KBTBD10, F2RL1, TMSB4X, SNCG, CXXC5, FOXQ1, PDPN, SPN, CAV1, CD24 and KLK3. A potential biomarker from this set of genes, the FMOD gene, encoding the small leucine-rich proteoglycan fibromodulin, was selected for further evaluation in clinical samples from patients diagnosed with benign or malignant prostatic disease. Results: Several of the evaluated genes showed significantly altered expression in the prostate cancer cell lines compared with nontumorigenic PWR-1E cells. Further evaluation of the FMOD transcript in prostate clinical samples from patients diagnosed with benign or malignant prostatic disease identified a significant difference in the expression levels of this proteoglycan between benign and malignant tissue (p<0.05). Conclusions: A number of gene transcripts were differentially expressed by the cell lines assayed. Among them, FMOD was further evaluated in clinical samples and was found to be differentially expressed between benign and prostate cancer tissue. Further validation of the FMOD transcript in a larger population is required to ascertain its usefulness as a biomarker for prostate cancer.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 844
Author(s):  
Abhishek Majumdar ◽  
Yueze Liu ◽  
Yaoqin Lu ◽  
Shaofeng Wu ◽  
Lijun Cheng

Background: Cancer cell lines are frequently used in research as in vitro tumor models. Genomic data and large-scale drug screening have accelerated the selection of the right drug for cancer patients. Accuracy in drug response prediction is crucial for success. Because of data-type diversity and large data volume, few methods can integratively and efficiently find the principal low-dimensional manifold of high-dimensional cancer multi-omics data to predict drug response in precision medicine. Method: A novel k-means Ensemble Support Vector Regression (kESVR) is developed to predict the response value of each drug for a single patient based on cell-line gene expression data. kESVR is a blend of supervised and unsupervised learning methods and is entirely data driven. It uses embedded clustering (Principal Component Analysis and k-means clustering) and local regression (Support Vector Regression) to predict drug response and obtain the global pattern while overcoming missing data and noise from outliers. Results: We compared the efficiency and accuracy of kESVR with four standard machine learning regression models: (1) simple linear regression, (2) support vector regression, (3) random forest (quantile regression forest), and (4) back-propagation neural network. On drug response data across 610 cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) and the Cancer Therapeutics Response Portal (CTRP v2), kESVR had the highest accuracy (smallest mean squared error, MSE). We next compared kESVR with 17 existing drug response prediction models based on a varied range of methods such as regression, Bayesian inference, matrix factorization, and deep learning. After ranking the 18 models by prediction accuracy, kESVR ranked first (best performing) in the majority (74%) of cases. In the remaining 26% of cases, kESVR still ranked among the top five models. Conclusion: In this paper we introduce a novel model (kESVR) for drug response prediction using high-dimensional cell-line gene expression data. This model outperforms existing prediction models in terms of prediction accuracy and speed and overcomes overfitting. It can be used in future to develop a robust drug response prediction system for cancer patients using guidance from cancer cell lines and multi-omics data.
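The combination of embedded clustering and local regression described in this abstract can be sketched with scikit-learn. This is a simplified illustration under stated assumptions, not the kESVR implementation: it fits one PCA projection, one k-means partition, and one support vector regressor per cluster, and routes each new sample to the regressor of its assigned cluster. The ensemble step of kESVR is not reproduced, and the function names, hyperparameters, and synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.svm import SVR

def fit_cluster_svr(X, y, n_components=2, n_clusters=2, seed=0):
    """Project with PCA, partition with k-means, fit one SVR per cluster."""
    pca = PCA(n_components=n_components).fit(X)
    Z = pca.transform(X)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(Z)
    models = {}
    for c in range(n_clusters):
        mask = km.labels_ == c
        # Each local model only sees the samples assigned to its cluster.
        models[c] = SVR(kernel="rbf").fit(Z[mask], y[mask])
    return pca, km, models

def predict_cluster_svr(fitted, X_new):
    """Assign each new sample to a cluster and use that cluster's SVR."""
    pca, km, models = fitted
    Z = pca.transform(X_new)
    labels = km.predict(Z)
    return np.array([models[c].predict(z.reshape(1, -1))[0]
                     for c, z in zip(labels, Z)])

# Synthetic stand-in for expression data: two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(6, 1, (30, 5))])
y = X[:, 0] + rng.normal(0, 0.1, 60)
fitted = fit_cluster_svr(X, y)
pred = predict_cluster_svr(fitted, X[:5])
```

Fitting a separate regressor per cluster lets each local model specialize on a more homogeneous group of samples, which is the intuition behind pairing clustering with local regression.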

