Integrating molecular graph data of drugs and multiple -omic data of cell lines for drug response prediction

Author(s):  
Giang Thi Thu Nguyen ◽  
Duc-Hoa Vu ◽  
Duc-Hau Le
2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Zachary Stanfield ◽  
Mustafa Coşkun ◽  
Mehmet Koyutürk

Abstract Drug response prediction is a well-studied problem in which the molecular profile of a given sample is used to predict the effect of a given drug on that sample. Effective solutions to this problem hold the key to precision medicine. In cancer research, genomic data from cell lines are often utilized as features to develop machine learning models predictive of drug response. Molecular networks provide a functional context for the integration of genomic features, thereby resulting in robust and reproducible predictive models. However, inclusion of network data increases dimensionality and poses additional challenges for common machine learning tasks. To overcome these challenges, we formulate drug response prediction as a link prediction problem. For this purpose, we represent drug response data for a large cohort of cell lines as a heterogeneous network. Using this network, we compute “network profiles” for cell lines and drugs. We then use the associations between these profiles to predict links between drugs and cell lines. Through leave-one-out cross validation and cross-classification on independent datasets, we show that this approach leads to accurate and reproducible classification of sensitive and resistant cell line-drug pairs, with 85% accuracy. We also examine the biological relevance of the network profiles.
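The link-prediction framing can be illustrated with a small sketch. Purely for illustration, it assumes that a node's "network profile" is its row of A + A² (direct plus two-hop connectivity) in a toy cell line-drug graph, and that a candidate cell line-drug link is scored by the Pearson correlation of the two profiles. The toy graph and the helpers `network_profiles` and `link_score` are hypothetical; the abstract does not state how its profiles are computed.

```python
import numpy as np

def network_profiles(adj: np.ndarray) -> np.ndarray:
    """Hypothetical profile: direct plus two-hop connectivity (A + A @ A) for every node."""
    return adj + adj @ adj

def link_score(profiles: np.ndarray, i: int, j: int) -> float:
    """Score a candidate link by the Pearson correlation of two nodes' profiles."""
    return float(np.corrcoef(profiles[i], profiles[j])[0, 1])

# Toy heterogeneous graph (assumed): nodes 0-2 are cell lines, nodes 3-5 are drugs,
# and an edge means "cell line is sensitive to drug".
edges = [(0, 3), (0, 4), (1, 3), (1, 4), (1, 5), (2, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

profiles = network_profiles(adj)
for drug in (3, 4, 5):
    print(f"cell line 0 - drug {drug}: {link_score(profiles, 0, drug):+.3f}")
```

In this toy graph, drugs 3 and 4 (already linked to cell line 0 and to its neighbour cell line 1) score higher than drug 5, which is what a profile-association scheme is meant to capture.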


Author(s):  
Delora Baptista ◽  
Pedro G Ferreira ◽  
Miguel Rocha

Abstract Predicting the sensitivity of tumors to specific anti-cancer treatments is a challenge of paramount importance for precision medicine. Machine learning (ML) algorithms can be trained on high-throughput screening data to develop models that are able to predict the response of cancer cell lines and patients to novel drugs or drug combinations. Deep learning (DL) refers to a distinct class of ML algorithms that have achieved top-level performance in a variety of fields, including drug discovery. These types of models have unique characteristics that may make them more suitable for the complex task of modeling drug response based on both biological and chemical data, but the application of DL to drug response prediction remained unexplored until very recently. The few studies that have been published have shown promising results, and the use of DL for drug response prediction is beginning to attract greater interest from researchers in the field. In this article, we critically review recently published studies that have employed DL methods to predict drug response in cancer cell lines. We also provide a brief description of DL and the main types of architectures that have been used in these studies. Additionally, we present a selection of publicly available drug screening data resources that can be used to develop drug response prediction models. Finally, we address the limitations of these approaches and discuss possible paths for further improvement.


2019 ◽  
Vol 17 ◽  
pp. 164-174 ◽  
Author(s):  
Na-Na Guan ◽  
Yan Zhao ◽  
Chun-Chun Wang ◽  
Jian-Qiang Li ◽  
Xing Chen ◽  
...  

BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Shujun Huang ◽  
Pingzhao Hu ◽  
Ted M. Lakowski

Abstract Background Predicting patient drug response based on a patient’s molecular profile is one of the key goals of precision medicine in breast cancer (BC). Multiple drug response prediction models have been developed to address this problem. However, most of them were designed to make sensitivity predictions for multiple single drugs within cell lines from various cancer types rather than a single cancer type, do not take drug properties into account, and have not been validated in cancer patient-derived data. Among the multi-omics data, gene expression profiles have been shown to be the most informative data for drug response prediction. However, these models were often developed with individual genes. Therefore, this study aimed to develop a drug response prediction model for BC using multiple data types from both cell lines and drugs. Methods We first collected the baseline gene expression profiles of 49 BC cell lines along with IC50 values for 220 drugs tested in these cell lines from Genomics of Drug Sensitivity in Cancer (GDSC). Using these data, we developed a multiple-layer cell line-drug response network (ML-CDN2) by integrating a one-layer cell line similarity network based on pathway activity profiles and a three-layer drug similarity network based on drug structures, targets, and pan-cancer IC50 profiles. We further used ML-CDN2 to predict the drug response for new BC cell lines or patient-derived samples. Results ML-CDN2 demonstrated good predictive performance, with a Pearson correlation coefficient of 0.873 between the observed and predicted IC50 values for all GDSC cell line-drug pairs. ML-CDN2 also performed well when predicting drug response in new BC cell lines from the Cancer Cell Line Encyclopedia (CCLE), with a Pearson correlation coefficient of 0.718. Moreover, we found that the cell line-derived ML-CDN2 model could be applied to predict drug response in BC patient-derived samples from The Cancer Genome Atlas (TCGA). Conclusions The ML-CDN2 model was built to predict BC drug response using comprehensive information from both cell lines and drugs. Compared with existing methods, it has the potential to predict drug response for BC patient-derived samples.
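A heavily simplified sketch of predicting a new cell line from a multilayer similarity network follows. It assumes that the new cell line's IC50 profile is the similarity-weighted mean of the reference cell lines' profiles, and that this estimate is then smoothed across drugs using the average of the three drug similarity layers (structure, target, pan-cancer IC50). The weighting scheme, the mixing parameter `alpha`, and the function names are hypothetical and are not the published ML-CDN2 algorithm; the final Pearson correlation only mirrors the evaluation metric reported in the abstract.

```python
import numpy as np

def row_normalize(m: np.ndarray) -> np.ndarray:
    """Scale each row to sum to 1 so it can be used as averaging weights."""
    return m / m.sum(axis=1, keepdims=True)

def predict_new_cell_line(ic50_known, cell_sim, drug_layers, alpha=0.5):
    """Hypothetical two-step prediction of a new cell line's IC50 for every drug.

    ic50_known:  (n_cells, n_drugs) IC50 values of the reference cell lines
    cell_sim:    (n_cells,) nonnegative similarity of the new cell line to each reference
    drug_layers: sequence of (n_drugs, n_drugs) nonnegative drug similarity matrices
    """
    weights = cell_sim / cell_sim.sum()
    from_cells = weights @ ic50_known                       # cell line similarity layer
    drug_sim = row_normalize(np.mean(drug_layers, axis=0))  # merge the drug layers
    from_drugs = drug_sim @ from_cells                      # smooth over similar drugs
    return alpha * from_cells + (1 - alpha) * from_drugs

# Random stand-in data (assumed shapes: 5 reference cell lines, 4 drugs, 3 drug layers).
rng = np.random.default_rng(0)
ic50_known = rng.normal(size=(5, 4))
cell_sim = rng.random(5)
drug_layers = [rng.random((4, 4)) for _ in range(3)]
observed = rng.normal(size=4)

predicted = predict_new_cell_line(ic50_known, cell_sim, drug_layers)
pearson_r = np.corrcoef(observed, predicted)[0, 1]
print(f"Pearson r = {pearson_r:.3f}")
```

Because both steps are weighted averages, every prediction stays within the range of the reference IC50 values; with random stand-in data the printed correlation only demonstrates how the metric is computed.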


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Yongsoo Kim ◽  
Tycho Bismeijer ◽  
Wilbert Zwart ◽  
Lodewyk F. A. Wessels ◽  
Daniel J. Vis

Abstract Integrative analyses that summarize and link molecular data to treatment sensitivity are crucial for capturing the biological complexity that is essential to advancing precision medicine. We introduce Weighted Orthogonal Nonnegative parallel factor analysis (WON-PARAFAC), a data integration method that identifies sparse and interpretable factors. WON-PARAFAC summarizes the GDSC1000 cell line compendium in 130 factors. We interpret the factors based on their association with recurrent molecular alterations, pathway enrichment, cancer type, and drug response. Crucially, the cell line-derived factors capture the majority of the relevant biological variation in patient-derived xenograft (PDX) models, strongly suggesting that our factors capture invariant and generalizable aspects of cancer biology. Furthermore, drug response in cell lines is translated to PDXs better and more consistently using factor-based predictors than using raw feature-based predictors. WON-PARAFAC efficiently summarizes and integrates multiway high-dimensional genomic data and enhances the translatability of drug response prediction from cell lines to patient-derived xenografts.
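To make the factor model concrete, the sketch below fits a plain nonnegative PARAFAC (CP) decomposition to a three-way array with Lee-Seung-style multiplicative updates, so that X[i, j, k] ≈ sum_r A[i, r] * B[j, r] * C[k, r] with all factors nonnegative. The weighting and orthogonality constraints that distinguish WON-PARAFAC are deliberately omitted, the tensor layout (for instance cell lines × features × data views) is an assumption, and `nonneg_parafac` is a hypothetical helper rather than the authors' implementation.

```python
import numpy as np

def reconstruct(A, B, C):
    """Rebuild the tensor from its CP factors: X[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def nonneg_parafac(X, rank, n_iter=1000, seed=0, eps=1e-12):
    """Nonnegative CP decomposition of a 3-way array via multiplicative updates.

    Each update multiplies a factor by (X contracted with the other two factors) /
    (the same contraction of the current reconstruction), which keeps every entry
    nonnegative when X and the initial factors are nonnegative.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) + 0.1 for n in (I, J, K))
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

# Usage on a synthetic rank-2 nonnegative tensor (stand-in for real multiway data).
rng = np.random.default_rng(1)
X = reconstruct(rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2)))
A, B, C = nonneg_parafac(X, rank=2)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Each column r of A, B and C together forms one factor; in a cell line setting, the A columns would give per-cell-line factor loadings that can then be used as predictors.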


2020 ◽  
Vol 16 (1) ◽  
pp. 31-38
Author(s):  
Shiming Wang ◽  
Jie Li

Drug response prediction in cancer cell lines is vital for discovering anticancer drugs for new cell lines.


2017 ◽  
Author(s):  
Vigneshwari Subramanian ◽  
Bence Szalai ◽  
Luis Tobalina ◽  
Julio Saez-Rodriguez

Network diffusion approaches are frequently used to identify relevant disease genes and to prioritize genes for drug sensitivity prediction. The majority of these studies rely on networks representing a single type of information. However, multiplex heterogeneous networks (networks with multiple interconnected layers) are much more informative and help in understanding the global topology. We built a multi-layered network that incorporates information on protein-protein interactions, drug-drug similarities, cell line-cell line similarities and co-expressed genes. We applied the Random Walk with Restart (RWR) algorithm to investigate the interactions between drugs, targets and cancer cell lines. Results of ANOVA models show that the genes prioritized in this way are among the most significant ones related to drug response. Moreover, drug response prediction models built using the gene expression data of only the top-ranked genes have predictive power similar to that of models built using all available genes. Taken together, the results confirm that the multiplex heterogeneous network-based approach is efficient in identifying the most significant genes associated with drug response.
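The Random Walk with Restart step can be sketched on a toy multiplex network. The sketch assumes two gene layers (for instance protein-protein interactions and co-expression) over the same node set, couples each node to its copies in the other layers with weight `delta`, column-normalizes the resulting supra-adjacency matrix W, and iterates p ← (1 - r) W p + r p0 until convergence. The restart probability r = 0.7, the coupling scheme, and the helper name `multiplex_rwr` are assumptions; the drug and cell line layers of the original network are left out for brevity.

```python
import numpy as np

def multiplex_rwr(layers, seeds, restart=0.7, delta=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on a multiplex network whose layers share one node set.

    layers: list of (n, n) symmetric nonnegative adjacency matrices
    seeds:  node indices the walker restarts from (e.g. a drug's target genes)
    Returns one score per node, summed over its copies in all layers.
    """
    n_layers, n = len(layers), layers[0].shape[0]
    # Off-diagonal blocks: delta * I links each node to its own copy in every other layer.
    supra = np.kron(np.ones((n_layers, n_layers)) - np.eye(n_layers), delta * np.eye(n))
    for a, adj in enumerate(layers):
        supra[a * n:(a + 1) * n, a * n:(a + 1) * n] = adj   # intra-layer edges
    col_sums = supra.sum(axis=0)
    col_sums[col_sums == 0] = 1.0                          # avoid dividing by zero
    W = supra / col_sums                                   # column-normalized transitions

    p0 = np.zeros(n_layers * n)
    for s in seeds:
        p0[[a * n + s for a in range(n_layers)]] = 1.0     # seed every copy of the node
    p0 /= p0.sum()

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p.reshape(n_layers, n).sum(axis=0)

# Toy layers over 5 genes (assumed): a PPI-like path graph and a sparser co-expression layer.
ppi = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    ppi[u, v] = ppi[v, u] = 1.0
coexpr = np.zeros((5, 5))
for u, v in [(0, 2), (3, 4)]:
    coexpr[u, v] = coexpr[v, u] = 1.0

scores = multiplex_rwr([ppi, coexpr], seeds=[0])
print("genes ranked by proximity to the seed:", np.argsort(-scores))
```

Ranking genes by these scores gives the kind of prioritized list whose top entries could then be used as features for a drug response model.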


2021 ◽  
Author(s):  
Hossein Sharifi-Noghabi ◽  
Parsa Alamzadeh Harjandi ◽  
Olga Zolotareva ◽  
Colin C Collins ◽  
Martin Ester

Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Different methods of transfer learning have been proposed to address this data discrepancy. These methods generally use cell lines as source domains and patients, patient-derived xenografts, or other cell lines as target domains. However, they assume access to the target domain during training or fine-tuning, and they can only take labeled source domains as input. The former is a strong assumption that is not satisfied during deployment of these models in the clinic. The latter means these methods rely on labeled source domains, which are of limited size. To avoid this assumption, we formulate drug response prediction as an out-of-distribution generalization problem, which does not assume that the target domain is accessible during training. Moreover, to exploit unlabeled source domain data, which tends to be much more plentiful than labeled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labeled and unlabeled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization, and a consistency loss to incorporate unlabeled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts, and patients and, therefore, may guide precision oncology more accurately.
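The structure of the three-part objective can be written down as a small sketch. The concrete forms used here are assumptions, not Velodrome's published definitions: mean squared error as the supervised loss, the squared distance between the mean embeddings of two source domains (a linear-kernel MMD) as the alignment loss, and the mean squared disagreement between two predictors on unlabeled samples as the consistency loss. The weights `lam_align` and `lam_cons` and all function names are hypothetical.

```python
import numpy as np

def mse(a, b) -> float:
    """Mean squared difference between two arrays."""
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def alignment_loss(z_a: np.ndarray, z_b: np.ndarray) -> float:
    """Linear-kernel MMD: squared distance between the mean embeddings of two domains."""
    return float(np.sum((z_a.mean(axis=0) - z_b.mean(axis=0)) ** 2))

def consistency_loss(pred_1, pred_2) -> float:
    """Disagreement of two predictors on the same unlabeled samples."""
    return mse(pred_1, pred_2)

def combined_objective(y_true, y_pred, z_a, z_b, unl_pred_1, unl_pred_2,
                       lam_align=1.0, lam_cons=1.0) -> float:
    """Supervised loss + weighted alignment loss + weighted consistency loss."""
    return (mse(y_true, y_pred)
            + lam_align * alignment_loss(z_a, z_b)
            + lam_cons * consistency_loss(unl_pred_1, unl_pred_2))

# Tiny worked example: supervised 1.0, alignment 2.0, consistency 4.0.
loss = combined_objective(
    y_true=[0.0, 0.0], y_pred=[1.0, 1.0],
    z_a=np.array([[0.0, 0.0]]), z_b=np.array([[1.0, 1.0]]),
    unl_pred_1=[0.0], unl_pred_2=[2.0],
)
print(loss)  # 7.0
```

This only evaluates the objective; in training, the embeddings and predictions would come from a neural network and the total would be minimized with an automatic-differentiation framework.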

