Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance

Abstract. Background: Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance. Results: We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein-protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets, including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs, reveals a significant improvement in prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel and doxorubicin. Based on such experiments and an extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group. Conclusions: Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.
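
The abstract mentions random walk techniques over an interaction network without giving details. As a rough illustration only, the sketch below implements a generic random walk with restart on a small toy graph. The function name, the restart probability of 0.5, and the four-gene path network are assumptions made for this example and are not taken from ProGENI.

```python
def random_walk_with_restart(adjacency, restart, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Return stationary visiting probabilities of a random walk with restart.

    adjacency: square list of lists with non-negative edge weights.
    restart:   non-negative weights describing where the walker restarts.
    """
    n = len(adjacency)
    total = sum(restart)
    if total <= 0:
        raise ValueError("restart vector must have positive total weight")
    r = [x / total for x in restart]

    # Column-normalize: column j holds the transition probabilities out of node j.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

    p = r[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart_prob) * sum(transition[i][j] * p[j] for j in range(n))
            + restart_prob * r[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Toy network (hypothetical): four genes connected in a path 0 - 1 - 2 - 3.
network = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seed = [1.0, 0.0, 0.0, 0.0]  # the walker always restarts at gene 0
scores = random_walk_with_restart(network, seed)
```

In this toy case the scores decrease with network distance from the seed gene, which is the intuition behind using such walks to spread evidence from response-associated genes to their interaction partners.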

Velodrome: Out-of-Distribution Generalization from Labeled and Unlabeled Gene Expression Data for Drug Response Prediction

10.1101/2021.05.25.445658 ◽ 2021

Author(s): Hossein Sharifi-Noghabi ◽ Parsa Alamzadeh Harjandi ◽ Olga Zolotareva ◽ Colin C Collins ◽ Martin Ester

Keyword(s): Gene Expression ◽ Transfer Learning ◽ Cell Lines ◽ Gene Expression Data ◽ Drug Response ◽ Response Prediction ◽ Fine Tuning ◽ Expression Data ◽ Target Domain ◽ Data Discrepancy

Data discrepancy between preclinical and clinical datasets poses a major challenge for accurate drug response prediction based on gene expression data. Different methods of transfer learning have been proposed to address this data discrepancy. These methods generally use cell lines as source domains and patients, patient-derived xenografts, or other cell lines as target domains. However, they assume access to the target domain during training or fine-tuning, and they can only take labeled source domains as input. The former is a strong assumption that is not satisfied during deployment of these models in the clinic. The latter means these methods rely on labeled source domains, which are of limited size. To avoid the first assumption, we formulate drug response prediction as an out-of-distribution generalization problem, which does not assume that the target domain is accessible during training. Moreover, to exploit unlabeled source domain data, which tends to be much more plentiful than labeled data, we adopt a semi-supervised approach. We propose Velodrome, a semi-supervised method of out-of-distribution generalization that takes labeled and unlabeled data from different resources as input and makes generalizable predictions. Velodrome achieves this goal by introducing an objective function that combines a supervised loss for accurate prediction, an alignment loss for generalization, and a consistency loss to incorporate unlabeled samples. Our experimental results demonstrate that Velodrome outperforms state-of-the-art pharmacogenomics and transfer learning baselines on cell lines, patient-derived xenografts, and patients, and may therefore guide precision oncology more accurately.
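
The abstract names three loss terms but not their exact form. The sketch below shows one way such a three-part objective could be assembled. Every concrete choice here is an assumption for illustration: mean squared error for the supervised term, a simple mean-matching penalty standing in for the alignment term, agreement between two predictors on unlabeled samples for the consistency term, and the weights `w_align` and `w_cons`. It is not the authors' implementation.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_alignment(features_a, features_b):
    """Squared distance between the per-feature means of two domains.

    A deliberately simple stand-in for an alignment loss (assumption).
    """
    dim = len(features_a[0])
    mean_a = [sum(row[k] for row in features_a) / len(features_a) for k in range(dim)]
    mean_b = [sum(row[k] for row in features_b) / len(features_b) for k in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))


def combined_objective(pred, target, feat_a, feat_b,
                       unlabeled_pred_1, unlabeled_pred_2,
                       w_align=1.0, w_cons=1.0):
    """Supervised + weighted alignment + weighted consistency terms."""
    supervised = mse(pred, target)                          # labeled samples
    alignment = mean_alignment(feat_a, feat_b)              # two source domains
    consistency = mse(unlabeled_pred_1, unlabeled_pred_2)   # unlabeled samples
    return supervised + w_align * alignment + w_cons * consistency


# Toy numbers (hypothetical) to show how the terms add up.
loss = combined_objective(
    pred=[1.0, 2.0], target=[1.0, 4.0],           # supervised term: 2.0
    feat_a=[[0.0, 0.0], [2.0, 2.0]],              # domain A feature mean: [1, 1]
    feat_b=[[1.0, 3.0]],                          # domain B feature mean: [1, 3]
    unlabeled_pred_1=[0.5, 0.5],
    unlabeled_pred_2=[0.5, 1.5],                  # consistency term: 0.5
    w_align=0.5, w_cons=2.0,
)
```

With these toy values the three terms contribute 2.0, 0.5 × 4.0 and 2.0 × 0.5, so the weights control how strongly generalization and unlabeled data influence training relative to prediction error.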

Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data

Mathematics ◽ 10.3390/math9070772 ◽ 2021 ◽ Vol 9 (7) ◽ pp. 772

Author(s): Seonghun Kim ◽ Seockhun Bae ◽ Yinhua Piao ◽ Kyuri Jo

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Large Scale ◽ Drug Response ◽ Response Prediction ◽ Biological Data ◽ Expression Data ◽ Convolutional Network ◽ Essential Information ◽ Protein Protein Interaction

Genomic profiles of cancer patients, such as gene expression, have become a major source for predicting responses to drugs in the era of personalized medicine. As large-scale drug screening data with cancer cell lines are available, a number of computational methods have been developed for drug response prediction. However, few methods incorporate both gene expression data and the biological network, which can harbor essential information about the underlying process of the drug response. We proposed an analysis framework called DrugGCN for prediction of Drug response using a Graph Convolutional Network (GCN). DrugGCN first generates a gene graph by combining a Protein-Protein Interaction (PPI) network and gene expression data with feature selection of drug-related genes. The GCN model then detects local features, such as subnetworks of genes that contribute to the drug response, by localized filtering. We demonstrated the effectiveness of DrugGCN on biological data, where it showed high prediction accuracy among the competing methods.
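
To make "localized filtering" on a gene graph more concrete, the sketch below applies a single one-hop graph filter to a per-gene signal, using the widely used symmetric normalization of the adjacency matrix with self-loops. This first-order rule, the function names, and the three-gene toy graph are assumptions chosen for illustration; DrugGCN's actual filters may differ.

```python
import math


def normalized_adjacency(adjacency):
    """Return D^(-1/2) (A + I) D^(-1/2) for a square adjacency matrix A."""
    n = len(adjacency)
    # Add self-loops so each gene keeps part of its own signal.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def graph_filter(adjacency, signal):
    """Mix each gene's value with the values of its direct neighbors."""
    norm = normalized_adjacency(adjacency)
    n = len(signal)
    return [sum(norm[i][j] * signal[j] for j in range(n)) for i in range(n)]


# Toy gene graph (hypothetical): genes 0 and 1 interact, gene 2 is isolated.
ppi = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
expression = [2.0, 4.0, 6.0]
filtered = graph_filter(ppi, expression)
```

The two interacting genes end up sharing an averaged value while the isolated gene is unchanged, which shows why the filter is called localized: each output depends only on a node and its immediate neighborhood.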

A method of gene expression data transfer from cell lines to cancer patients for machine-learning prediction of drug efficiency

Cell Cycle ◽ 10.1080/15384101.2017.1417706 ◽ 2018 ◽ Vol 17 (4) ◽ pp. 486-491 ◽ Cited By ~ 22

Author(s): Nicolas Borisov ◽ Victor Tkachev ◽ Maria Suntsova ◽ Olga Kovalchuk ◽ Alex Zhavoronkov ◽ ...

Keyword(s): Gene Expression ◽ Machine Learning ◽ Cancer Patients ◽ Cell Lines ◽ Gene Expression Data ◽ Data Transfer ◽ Expression Data ◽ Drug Efficiency

An Integrated Systems Biology and Network-Based Approaches to Identify Novel Biomarkers in Breast Cancer Cell Lines Using Gene Expression Data

Interdisciplinary Sciences Computational Life Sciences ◽ 10.1007/s12539-020-00360-0 ◽ 2020 ◽ Vol 12 (2) ◽ pp. 155-168 ◽ Cited By ~ 1

Author(s): Abbas Khan ◽ Zainab Rehman ◽ Huma Farooque Hashmi ◽ Abdul Aziz Khan ◽ Muhammad Junaid ◽ ...

Keyword(s): Breast Cancer ◽ Gene Expression ◽ Systems Biology ◽ Cell Lines ◽ Breast Cancer Cell ◽ Gene Expression Data ◽ Integrated Systems ◽ Breast Cancer Cell Lines ◽ Expression Data ◽ Novel Biomarkers

Leveraging TCGA gene expression data to build predictive models for cancer drug response

BMC Bioinformatics ◽ 10.1186/s12859-020-03690-4 ◽ 2020 ◽ Vol 21 (S14) ◽ Cited By ~ 3

Author(s): Evan A. Clayton ◽ Toyya A. Pujol ◽ John F. McDonald ◽ Peng Qiu

Keyword(s): Gene Expression ◽ Machine Learning ◽ Gene Expression Data ◽ Predictive Models ◽ Drug Response ◽ Cancer Drug ◽ Expression Data ◽ Classification Methods ◽ Clustering And Classification ◽ Machine Learning Models

Abstract. Background: Machine learning has been utilized to predict cancer drug response from multi-omics data generated from sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models using gene expression data from patients’ primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine. Results: We focused on 5-Fluorouracil and Gemcitabine because, based on our exclusion criteria, they provide the largest numbers of patients within TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain the response of these patients via multiple classification methods. Multiple clustering and classification methods were compared for prediction accuracy of drug response. Clara and random forest were found to be the best clustering and classification methods, respectively. The results show that our models predict with up to 86% accuracy, despite the study’s limited sample size. We also found that the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways, and we highlighted their potential significance in chemotherapy prognosis. Conclusions: Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response is needed to support future work on machine learning models. Ultimately, such predictive models may aid oncologists in making critical treatment decisions.

GEDS: A Gene Expression Display Server for mRNAs, miRNAs and Proteins

Cells ◽ 10.3390/cells8070675 ◽ 2019 ◽ Vol 8 (7) ◽ pp. 675 ◽ Cited By ~ 5

Author(s): Xia ◽ Liu ◽ Zhang ◽ Guo

Keyword(s): Gene Expression ◽ Cell Lines ◽ Gene Expression Data ◽ Expression Profiles ◽ Gene Expression Profiles ◽ Cancer Cell Line ◽ Tissue Expression ◽ The Cancer Genome Atlas ◽ Expression Data ◽ Protein Levels

High-throughput technologies generate a tremendous amount of expression data on mRNA, miRNA and protein levels. Mining and visualizing this large amount of expression data requires sophisticated computational skills. An easy-to-use web server for the visualization of gene expression profiles could greatly facilitate data exploration and hypothesis generation for biologists. Here, we curated and normalized the gene expression data on mRNA, miRNA and protein levels in 23315, 9009 and 9244 samples, respectively, from 40 tissues (The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx)) and 1594 cell lines (Cancer Cell Line Encyclopedia (CCLE) and MD Anderson Cell Lines Project (MCLP)). Then, we constructed the Gene Expression Display Server (GEDS), a web-based tool for quantification, comparison and visualization of gene expression data. GEDS integrates multiscale expression data and provides multiple types of figures and tables to satisfy several kinds of user requirements. The comprehensive expression profiles plotted in the one-stop GEDS platform greatly facilitate the use of big data by experimental biologists for better experimental design and analysis. GEDS is freely available at http://bioinfo.life.hust.edu.cn/web/GEDS/.

Analysis of gene expression data of the NCI 60 cancer cell lines using Bayesian hierarchical effects model

10.1117/12.427993 ◽ 2001 ◽ Cited By ~ 2

Author(s): Jae K. Lee ◽ Uwe Scherf ◽ Lawrence H. Smith ◽ Lorraine Tanabe ◽ John N. Weinstein

Keyword(s): Gene Expression ◽ Cell Lines ◽ Cancer Cell ◽ Gene Expression Data ◽ Cancer Cell Lines ◽ Expression Data ◽ Bayesian Hierarchical

Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data

F1000Research ◽ 10.12688/f1000research.10529.1 ◽ 2016 ◽ Vol 5 ◽ pp. 2927 ◽ Cited By ~ 9

Author(s): Linh Nguyen ◽ Cuong C Dang ◽ Pedro J. Ballester

Keyword(s): Gene Expression ◽ Machine Learning ◽ Cell Line ◽ Cell Lines ◽ Gene Expression Data ◽ Single Gene ◽ Cancer Cell Line ◽ Expression Data ◽ Gene Markers ◽ Pan Cancer

Background: Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. The latter has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Methods: Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors and show that this is a more realistic validation than K-fold cross-validation. Results and Discussion: Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having a poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by multi-gene RF classifiers. Among the drugs with the most predictive of these models, we found pyrimethamine, sunitinib and 17-AAG. Conclusions: We now know that this type of model can predict in vitro tumour response to these drugs. These models can thus be further investigated on in vivo tumour models.
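
The precision and recall trade-off described in the results can be illustrated with a small calculation. The cell line identifiers and both sets of predictions below are invented for this example and do not come from GDSC or from the paper.

```python
def precision_recall(predicted_sensitive, truly_sensitive):
    """Return (precision, recall) for a set of predicted sensitive cell lines."""
    predicted = set(predicted_sensitive)
    actual = set(truly_sensitive)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth: eight cell lines are sensitive to the drug.
sensitive = {"CL1", "CL2", "CL3", "CL4", "CL5", "CL6", "CL7", "CL8"}

# A single-gene marker that flags few cell lines, all of them correctly.
single_gene_calls = {"CL1", "CL2"}

# A multi-gene classifier that flags more cell lines, with a few mistakes.
multi_gene_calls = {"CL1", "CL2", "CL3", "CL4", "CL5", "CL6", "CL9", "CL10"}

single_gene = precision_recall(single_gene_calls, sensitive)
multi_gene = precision_recall(multi_gene_calls, sensitive)
```

In this invented case the single-gene marker has perfect precision but finds only a quarter of the sensitive cell lines, whereas the multi-gene classifier gives up some precision to recover three quarters of them. This mirrors the pattern the abstract reports, without reproducing its numbers.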

Gene expression based inference of drug resistance in cancer

10.1101/2021.11.17.468905 ◽ 2021

Author(s): Smriti Chawla ◽ Anja Rockstroh ◽ Melanie Lehman ◽ Ellca Rather ◽ Atishay Jain ◽ ...

Keyword(s): Gene Expression ◽ Drug Resistance ◽ Cell Lines ◽ Large Scale ◽ Activity Patterns ◽ The Cancer Genome Atlas ◽ Learning Approaches ◽ Expression Data ◽ Sequencing Data ◽ Pathway Activity

Inter- and intra-tumoral heterogeneity are major stumbling blocks in the treatment of cancer and are responsible for differential drug responses in cancer patients. Recently, the availability of large-scale drug screening datasets has provided an opportunity for predicting appropriate patient-tailored therapies by employing machine learning approaches. In this study, we report a predictive modeling approach to infer treatment response in cancers using gene expression data. In particular, we demonstrate the benefits of an integrated chemogenomics approach that utilizes molecular drug descriptors and pathway activity information as opposed to gene expression levels. We performed extensive validation of our approach on tissue-derived single-cell and bulk expression data. Further, we constructed several prostate cancer cell lines and xenografts, exposed to differential treatment conditions, to assess the predictability of the outcomes. Our approach was further assessed on pan-cancer RNA-sequencing data from The Cancer Genome Atlas (TCGA) archives, as well as an independent clinical trial study describing the treatment journey of three melanoma patients. To summarise, we benchmarked the proposed approach on cancer RNA-seq data obtained from cell lines, xenografts, and humans. We concluded that pathway-activity patterns in cancer cells are reasonably indicative of drug resistance and can therefore be leveraged in personalized treatment recommendations.

Out-of-distribution generalization from labelled and unlabelled gene expression data for drug response prediction

Nature Machine Intelligence ◽ 10.1038/s42256-021-00408-w ◽ 2021

Author(s): Hossein Sharifi-Noghabi ◽ Parsa Alamzadeh Harjandi ◽ Olga Zolotareva ◽ Colin C. Collins ◽ Martin Ester

Keyword(s): Gene Expression ◽ Gene Expression Data ◽ Drug Response ◽ Response Prediction ◽ Expression Data
