Improving drug response prediction by integrating multiple data sources: matrix factorization, kernel and network-based approaches

Abstract Predicting the response of cancer cell lines to specific drugs is one of the central problems in personalized medicine, where the cell lines show diverse characteristics. Researchers have developed a variety of computational methods to discover associations between drugs and cell lines, and have improved drug sensitivity analyses by integrating heterogeneous biological data. However, choosing informative data sources and methods that can incorporate multiple sources efficiently remains the challenging part of a successful analysis in personalized medicine, because it is difficult both to find decisive factors of cancer and to develop methods that overcome data integration problems such as differences in data structures and data complexities. In this review, we summarize recent advances in data integration-based machine learning for drug response prediction, categorizing methods as matrix factorization-based, kernel-based and network-based. We also briefly describe the relevant databases used as benchmarks in drug response prediction analyses, and then discuss the challenges faced in integrating and interpreting data from multiple sources. Finally, we address the advantages of combining multiple heterogeneous data sources in drug sensitivity analysis through an experimental comparison.

Clinical drug response prediction from preclinical cancer cell lines by logistic matrix factorization approach

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720021500359 ◽

2021 ◽

Author(s):

Akram Emdadi ◽

Changiz Eslahchi

Keyword(s):

Cell Line ◽

Cell Lines ◽

Cancer Cell ◽

Matrix Factorization ◽

Drug Response ◽

Drug Sensitivity ◽

Cancer Cell Lines ◽

Tissue Type ◽

Factorization Approach ◽

Clinical Drug

Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of different kinds of knowledge on cell lines and drugs. This paper proposes the TCLMF method, a model for predicting drug response in tumor samples that is trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is built on gene expression profiles, tissue type information, the chemical structure of drugs and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer (GDSC) dataset to train the proposed drug response model, which we then use to predict the drug sensitivity of samples from The Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability that a tumor sample is sensitive to a drug. In this study, the closest cell line neighbours of each tumor sample are determined using a definition of similarity between tumor samples and cell lines. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with those of previously proposed methods using two databases and two approaches to test the model’s performance. In the first approach, 12 drugs with enough known clinical drug response data, considered in previous methods, are studied. For 7 of the 12 drugs, TCLMF can significantly distinguish between patients that are resistant to these drugs and patients that are sensitive to them. In the second approach, the methods are converted to classification models using a threshold, and the results are compared. The results demonstrate that the TCLMF method provides accurate predictions compared with the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF’s logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.
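
The two steps described in this abstract, fitting a logistic matrix factorization on cell lines and then scoring a new tumor from the averaged latent vectors of its nearest cell lines, can be illustrated with a minimal sketch. This is not the TCLMF implementation (see the repository above for that): the function names, the toy binary sensitivity matrix, the similarity scores and the hyperparameters are all assumptions made for illustration, and the similarity-based regularization that a full model would use is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic_mf(Y, rank=2, lr=0.1, reg=0.01, epochs=500, seed=0):
    """Fit U, V so that sigmoid(U @ V.T) approximates the binary matrix Y.

    Y is a (cell lines x drugs) matrix with 1 = sensitive, 0 = resistant.
    Plain full-batch gradient descent on the L2-regularized negative
    log-likelihood; all hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_drugs = Y.shape
    U = 0.1 * rng.standard_normal((n_cells, rank))
    V = 0.1 * rng.standard_normal((n_drugs, rank))
    for _ in range(epochs):
        G = sigmoid(U @ V.T) - Y      # gradient w.r.t. the logits
        U_grad = G @ V + reg * U
        V_grad = G.T @ U + reg * V
        U -= lr * U_grad
        V -= lr * V_grad
    return U, V

def predict_new_sample(sim_to_cells, U, V, k=2):
    """Average the latent vectors of the k most similar cell lines,
    then return the sensitivity probability for every drug."""
    nearest = np.argsort(sim_to_cells)[-k:]
    u_new = U[nearest].mean(axis=0)
    return sigmoid(V @ u_new)

# Toy data (hypothetical): cell lines 0-1 respond to drugs 0-1,
# cell lines 2-3 respond to drugs 2-3.
Y = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U, V = fit_logistic_mf(Y)

# A new tumor sample that resembles cell lines 0 and 1.
sim = np.array([0.9, 0.8, 0.1, 0.2])
probs = predict_new_sample(sim, U, V)
print(np.round(probs, 3))
```

Thresholding `probs` (for example at 0.5) would turn the scores into a sensitive/resistant classification, which is the kind of conversion the second evaluation approach relies on.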

Anticancer Drug Response Prediction in Cell Lines Using Weighted Graph Regularized Matrix Factorization

Molecular Therapy — Nucleic Acids ◽

10.1016/j.omtn.2019.05.017 ◽

2019 ◽

Vol 17 ◽

pp. 164-174 ◽

Cited By ~ 12

Author(s):

Na-Na Guan ◽

Yan Zhao ◽

Chun-Chun Wang ◽

Jian-Qiang Li ◽

Xing Chen ◽

...

Keyword(s):

Cell Lines ◽

Anticancer Drug ◽

Matrix Factorization ◽

Drug Response ◽

Weighted Graph ◽

Response Prediction ◽

Anticancer Drug Response

Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization

BMC Cancer ◽

10.1186/s12885-017-3500-5 ◽

2017 ◽

Vol 17 (1) ◽

Cited By ~ 36

Author(s):

Lin Wang ◽

Xiaozhong Li ◽

Louxin Zhang ◽

Qiang Gao

Keyword(s):

Cell Lines ◽

Anticancer Drug ◽

Matrix Factorization ◽

Drug Response ◽

Response Prediction ◽

Anticancer Drug Response

Dr.Paso: Drug response prediction and analysis system for oncology research

10.1101/237727 ◽

2017 ◽

Cited By ~ 1

Author(s):

Francisco Azuaje ◽

Tony Kaoma ◽

Céline Jeanty ◽

Petr V. Nazarov ◽

Arnaud Muller ◽

...

Keyword(s):

Cell Lines ◽

Drug Response ◽

Drug Sensitivity ◽

Drug Efficacy ◽

Response Prediction ◽

Large Cell ◽

Cancer Drug ◽

Diverse Range ◽

Oncology Research ◽

Effective Manner

Summary: The prediction of anticancer drug response is crucial for achieving a more effective and precise treatment of patients. Models based on the analysis of large cell line collections have shown potential for investigating drug efficacy in a clinically meaningful, cost-effective manner. Using data from thousands of cancer cell lines and drug response experiments, we propose a drug sensitivity prediction system based on a 47-gene expression profile, which was derived from an unbiased transcriptomic network analysis approach. The profile reflects the molecular activity of a diverse range of cancer-relevant processes and pathways. We validated our model using independent datasets and comparisons with published models. A high concordance between predicted and observed drug sensitivities was obtained, including additional validated predictions for four glioblastoma cell lines and four drugs. Our approach can accurately predict anti-cancer drug sensitivity and will enable further pre-clinical research. In the longer term, it may benefit patient-oriented investigations and interventions.

Auto-HMM-LMF: feature selection based method for prediction of drug response via autoencoder and hidden Markov model

BMC Bioinformatics ◽

10.1186/s12859-021-03974-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Akram Emdadi ◽

Changiz Eslahchi

Keyword(s):

Feature Selection ◽

Personalized Medicine ◽

Cell Line ◽

Cell Lines ◽

Cancer Cell ◽

Matrix Factorization ◽

Drug Response ◽

Hidden Markov ◽

Cancer Cell Lines ◽

Selection Of

Abstract Background Predicting the response of cancer cell lines to specific drugs is an essential problem in personalized medicine. Since drug response is closely associated with genomic information in cancer cells, several large panels of hundreds of human cancer cell lines have been assembled with genomic and pharmacogenomic data. Although several methods have been developed to predict drug response, many challenges remain in achieving accurate predictions. This study proposes a novel feature selection-based method, named Auto-HMM-LMF, to predict cell line-drug associations accurately. Because the feature space for predicting drug response is very high-dimensional, Auto-HMM-LMF focuses on feature selection to extract a subset of inputs that contribute significantly. Results This research introduces a novel method for feature selection of mutation data based on signature assignments and hidden Markov models. We also use autoencoder models for feature selection of gene expression and copy number variation data. After selecting features, the logistic matrix factorization model is applied to predict drug response values. In addition, by comparing with one of the most powerful feature selection methods, the ensemble feature selection method (EFS), we show that the predictive model based on the features selected in this paper performs much better for drug response prediction. Two datasets, the Genomics of Drug Sensitivity in Cancer (GDSC) and the Cancer Cell Line Encyclopedia (CCLE), are used to demonstrate the efficiency of the proposed method on unseen patient cell lines. Evaluation of the proposed model showed that Auto-HMM-LMF improves on the accuracy of state-of-the-art algorithms and can find useful features for the logistic matrix factorization method. Conclusions We demonstrated an application of Auto-HMM-LMF in exploring new candidate drugs for head and neck cancer, which showed that the proposed method is useful in drug repositioning and personalized medicine. The source code of the Auto-HMM-LMF method is available at https://github.com/emdadi/Auto-HMM-LMF.

Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics ◽

10.3390/math9070772 ◽

2021 ◽

Vol 9 (7) ◽

pp. 772

Author(s):

Seonghun Kim ◽

Seockhun Bae ◽

Yinhua Piao ◽

Kyuri Jo

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Large Scale ◽

Drug Response ◽

Response Prediction ◽

Biological Data ◽

Expression Data ◽

Convolutional Network ◽

Essential Information ◽

Protein Protein Interaction

Genomic profiles of cancer patients, such as gene expression, have become a major source for predicting responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines have become available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes, and the GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrated the effectiveness of DrugGCN using biological data, showing its high prediction accuracy among the competing methods.
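
To make the idea of localized filtering on a gene graph concrete, the following sketch applies a single graph convolution to a toy gene network. It uses the common symmetric-normalized propagation rule, ReLU(D^{-1/2} (A + I) D^{-1/2} X W), which is an assumption chosen for simplicity and may differ from the filters DrugGCN actually uses. The adjacency matrix, expression values and weight matrix are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, where D is the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_norm, X, W):
    """One graph convolution: mix each gene's features with those of its
    direct neighbours, apply a linear map, then a ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy gene graph (hypothetical): genes 0-1-2 form a path, gene 3 is isolated.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]], dtype=float)

# One expression value per gene for a single sample.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
W = np.array([[1.0]])                        # identity weight, for readability

A_norm = normalize_adjacency(A)
H = gcn_layer(A_norm, X, W)
print(H.ravel())
```

With an identity weight, the isolated gene keeps its own value while connected genes receive a degree-weighted blend of their neighbourhood, which is what makes the filter local to subnetworks of the graph.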

Integrating molecular graph data of drugs and multiple -omic data of cell lines for drug response prediction

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2021.3096960 ◽

2021 ◽

pp. 1-1

Author(s):

Giang Thi Thu Nguyen ◽

Duc-Hoa Vu ◽

Duc-Hau Le

Keyword(s):

Cell Lines ◽

Drug Response ◽

Molecular Graph ◽

Response Prediction ◽

Graph Data ◽

Omic Data

A matrix completion method for drug response prediction in personalized medicine

Proceedings of the Ninth International Symposium on Information and Communication Technology - SoICT 2018 ◽

10.1145/3287921.3287974 ◽

2018 ◽

Cited By ~ 4

Author(s):

Giang T. T. Nguyen ◽

Duc-Hau Le

Keyword(s):

Personalized Medicine ◽

Drug Response ◽

Matrix Completion ◽

Response Prediction

Pancreatic Ductal Adenocarcinoma (PDAC) Organoids: The Shining Light at the End of the Tunnel for Drug Response Prediction and Personalized Medicine

Cancers ◽

10.3390/cancers12102750 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2750

Author(s):

Pierre-Olivier Frappart ◽

Thomas G. Hofmann

Keyword(s):

Personalized Medicine ◽

Pancreatic Ductal Adenocarcinoma ◽

Drug Response ◽

Treatment Options ◽

Response Prediction ◽

Ductal Adenocarcinoma ◽

Chemotherapy Treatment ◽

The Past ◽

Therapeutic Tools ◽

And Personalized Medicine

Pancreatic ductal adenocarcinoma (PDAC) represents 90% of pancreatic malignancies. In contrast to many other tumor entities, the prognosis of PDAC has not significantly improved during the past thirty years. Patients are often diagnosed too late, leading to an overall five-year survival rate below 10%. More dramatically, PDAC cases are on the rise, and it is expected to become the second leading cause of death by cancer in western countries by 2030. Currently, gemcitabine/nab-paclitaxel or FOLFIRINOX remains the standard chemotherapy treatment, but still with limited efficacy. There is an urgent need for the development of early diagnostic and therapeutic tools. To this end, in the past 5 years, organoid technology has emerged as a revolution in the field of PDAC personalized medicine. Here, we review and discuss the current technical and scientific knowledge on PDAC organoids, their future perspectives, and how they can represent a game changer in the fight against PDAC by improving both diagnosis and treatment options.

Connecting gene expression subtypes of colorectal cancer (CRC) with cell lines and drug resistance.

Journal of Clinical Oncology ◽

10.1200/jco.2013.31.15_suppl.e14544 ◽

2013 ◽

Vol 31 (15_suppl) ◽

pp. e14544-e14544

Author(s):

Eva Budinska ◽

Jenny Wilding ◽

Vlad Calin Popovici ◽

Edoardo Missiaglia ◽

Arnaud Roth ◽

...

Keyword(s):

Gene Expression ◽

Cell Line ◽

Clinical Significance ◽

Cell Lines ◽

Drug Response ◽

Drug Sensitivity ◽

Expression Profiles ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis

e14544 Background: We identified CRC gene expression subtypes (ASCO 2012, #3511), which associate with established parameters of outcome as well as relevant biological motifs. We now substantiate their biological and potentially clinical significance by linking them with cell line data and drug sensitivity, primarily attempting to identify models for the poor prognosis subtypes Mesenchymal and CIMP-H like (characterized by EMT/stroma and immune-associated gene modules, respectively). Methods: We analyzed gene expression profiles of 35 publicly available cell lines with sensitivity data for 82 drug compounds, and our 94 cell lines with data on sensitivity for 7 compounds and colony morphology. Because stromal and immune-associated genes lose their relevance in vitro, we trained a new classifier based on genes expressed in both systems, which identifies the subtypes in both tissue and cell cultures. Cell line subtypes were validated by comparing their enrichment for molecular markers with that of our CRC subtypes. Drug sensitivity was assessed by linking original subtypes with 92 drug response signatures (MsigDB) via gene set enrichment analysis, and by screening drug sensitivity of cell line panels against our subtypes (Kruskal-Wallis test). Results: Of the cell lines, 70% could be assigned to a subtype with a probability as high as 0.95. The cell line subtypes were significantly associated with their KRAS, BRAF and MSI status and corresponded to our CRC subtypes. Interestingly, the cell lines which in matrigel created a network of undifferentiated cells were assigned to the Mesenchymal subtype. Drug response studies revealed potential sensitivity of subtypes to multiple compounds, in addition to what could be predicted based on their mutational profile (e.g. sensitivity of the CIMP-H subtype to Dasatinib, p<0.01). Conclusions: Our data support the biological and potentially clinical significance of the CRC subtypes in their association with cell line models, including results of drug sensitivity analysis. Our subtypes might not only have prognostic value but might also be predictive for response to drugs. Subtyping cell lines further substantiates their significance as relevant models for functional studies.
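
The Kruskal-Wallis screen mentioned in the Methods compares the drug sensitivity of cell lines across subtypes using ranks rather than raw values. The sketch below computes the H statistic by hand on made-up sensitivity values for three hypothetical subtypes. It assumes there are no tied values (no tie correction is applied) and does not compute a p-value. In practice a library routine such as `scipy.stats.kruskal` would typically be used.

```python
import numpy as np

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of 1-D samples.

    Assumes no tied values, so no tie correction is applied.
    """
    values = np.concatenate(groups)
    n_total = values.size
    # Rank all observations jointly, with 1 for the smallest value.
    ranks = np.empty(n_total)
    ranks[np.argsort(values)] = np.arange(1, n_total + 1)
    weighted_rank_sums = 0.0
    start = 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        weighted_rank_sums += group_ranks.sum() ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3.0 * (n_total + 1)

# Hypothetical drug sensitivity values for cell lines in three subtypes.
subtype_a = [0.1, 0.2, 0.3]
subtype_b = [0.4, 0.5, 0.6]
subtype_c = [0.7, 0.8, 0.9]

h = kruskal_wallis_h([subtype_a, subtype_b, subtype_c])
print(h)
```

A large H indicates that at least one subtype tends to have systematically higher or lower sensitivity than the others. The statistic is usually compared against a chi-square distribution with (number of groups - 1) degrees of freedom.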
