A regularized functional regression model enabling transcriptome-wide dosage-dependent association study of cancer drug response

Abstract

Cancer treatments can be highly toxic, and frequently only a subset of the patient population will benefit from a given treatment. Tumour genetic makeup plays an important role in cancer drug sensitivity. We suspect that gene expression markers could be used as a decision aid for treatment selection or dosage tuning. Using in vitro cancer cell line dose-response and gene expression data from the Genomics of Drug Sensitivity in Cancer (GDSC) project, we build a dose-varying regression model. Unlike existing approaches, this allows us to estimate dosage-dependent associations with gene expression. We include the transcriptomic profiles as dose-invariant covariates in the regression model and assume that their effect varies smoothly over the dosage levels. A two-stage variable selection algorithm (variable screening followed by penalised regression) is used to identify genetic factors that are associated with drug response over the varying dosages. We evaluate the effectiveness of our method using simulation studies focusing on the choice of tuning parameters and cross-validation for predictive accuracy assessment. We further apply the model to data from five BRAF-targeted compounds applied to different cancer cell lines under different dosage levels. We highlight the dosage-dependent dynamics of the associations between the selected genes and drug response, and we perform pathway enrichment analysis to show that the selected genes play an important role in pathways related to tumourigenesis and DNA damage response.

Author Summary

Tumour cell lines allow scientists to test anticancer drugs in a laboratory environment. Cells are exposed to the drug in increasing concentrations, and the drug response, or amount of surviving cells, is measured. Generally, drug response is summarised via a single number such as the concentration at which 50% of the cells have died (IC50). To avoid relying on such summary measures, we adopted a functional regression approach that takes the dose-response curves as inputs and uses them to find biomarkers of drug response. One major advantage of our approach is that it describes how the effect of a biomarker on the drug response changes with the drug dosage. This is useful for determining optimal treatment dosages and predicting drug response curves for unseen drug-cell line combinations. Our method scales to large numbers of biomarkers by using regularisation and, in contrast with existing literature, selects the most informative genes by accounting for drug response at untested dosages. We demonstrate its value using data from the Genomics of Drug Sensitivity in Cancer project to identify genes whose expression is associated with drug response. We show that the selected genes recapitulate prior biological knowledge and belong to known cancer pathways.
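A dose-varying regression of the kind described above can be written, in one common form, as follows. This is only a sketch: the symbols are assumptions chosen for illustration, not notation taken from the paper.

```latex
y_i(d) = \beta_0(d) + \sum_{j=1}^{p} x_{ij}\,\beta_j(d) + \varepsilon_i(d)
```

Here y_i(d) stands for the response of cell line i at dosage d, x_{ij} for the dose-invariant expression of gene j in that cell line, and \beta_j(d) for a coefficient function assumed to vary smoothly with d. Variable selection then amounts to deciding which of the functions \beta_j(d) are non-zero.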


A regularized functional regression model enabling transcriptome-wide dosage-dependent association study of cancer drug response

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008066 ◽

2021 ◽

Vol 17 (1) ◽

pp. e1008066

Author(s):

Evanthia Koukouli ◽

Dennis Wang ◽

Frank Dondelinger ◽

Juhyun Park

Keyword(s):

Gene Expression ◽

Regression Model ◽

Cancer Cell ◽

Drug Response ◽

Drug Sensitivity ◽

Predictive Accuracy ◽

Penalized Regression ◽

Cancer Drug ◽

Pathway Enrichment Analysis ◽

Functional Regression



Connecting gene expression subtypes of colorectal cancer (CRC) with cell lines and drug resistance.

Journal of Clinical Oncology ◽

10.1200/jco.2013.31.15_suppl.e14544 ◽

2013 ◽

Vol 31 (15_suppl) ◽

pp. e14544-e14544

Author(s):

Eva Budinska ◽

Jenny Wilding ◽

Vlad Calin Popovici ◽

Edoardo Missiaglia ◽

Arnaud Roth ◽

...

Keyword(s):

Gene Expression ◽

Cell Line ◽

Clinical Significance ◽

Cell Lines ◽

Drug Response ◽

Drug Sensitivity ◽

Expression Profiles ◽

Enrichment Analysis ◽

Gene Set Enrichment Analysis

e14544 Background: We identified CRC gene expression subtypes (ASCO 2012, #3511), which associate with established parameters of outcome as well as relevant biological motifs. We now substantiate their biological and potentially clinical significance by linking them with cell line data and drug sensitivity, primarily attempting to identify models for the poor prognosis subtypes Mesenchymal and CIMP-H like (characterized by EMT/stroma and immune-associated gene modules, respectively). Methods: We analyzed gene expression profiles of 35 publicly available cell lines with sensitivity data for 82 drug compounds, and our 94 cell lines with data on sensitivity for 7 compounds and colony morphology. Because stromal and immune-associated genes lose their relevance in vitro, we trained a new classifier based on genes expressed in both systems, which identifies the subtypes in both tissue and cell cultures. Cell line subtypes were validated by comparing their enrichment for molecular markers with that of our CRC subtypes. Drug sensitivity was assessed by linking original subtypes with 92 drug response signatures (MsigDB) via gene set enrichment analysis, and by screening drug sensitivity of cell line panels against our subtypes (Kruskal-Wallis test). Results: Of the cell lines, 70% could be assigned to a subtype with a probability as high as 0.95. The cell line subtypes were significantly associated with their KRAS, BRAF and MSI status and corresponded to our CRC subtypes. Interestingly, the cell lines which in matrigel created a network of undifferentiated cells were assigned to the Mesenchymal subtype. Drug response studies revealed potential sensitivity of subtypes to multiple compounds, in addition to what could be predicted based on their mutational profile (e.g. sensitivity of the CIMP-H subtype to Dasatinib, p<0.01).
Conclusions: Our data support the biological and potentially clinical significance of the CRC subtypes in their association with cell line models, including results of drug sensitivity analysis. Our subtypes might not only have prognostic value but might also be predictive for response to drugs. Subtyping cell lines further substantiates their significance as relevant models for functional studies.
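The Kruskal-Wallis screen mentioned in the Methods compares a drug sensitivity measure across subtype groups using ranks. The sketch below computes the H statistic in plain Python; the function names and the toy data are hypothetical, and the tie correction that statistical packages usually apply is left out for brevity.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


# Hypothetical sensitivity values for three subtypes of cell lines.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Under the null hypothesis, H is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom to obtain a p-value.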


Revisiting inconsistency in large pharmacogenomic studies

10.1101/026153 ◽

2015 ◽

Cited By ~ 5

Author(s):

Zhaleh Safikhani ◽

Mark Freeman ◽

Petr Smirnov ◽

Nehme El-Hachem ◽

Adrian She ◽

...

Keyword(s):

Gene Expression ◽

Cell Lines ◽

Dose Response ◽

Drug Sensitivity ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Drug Dose ◽

Response Curves ◽

Sensitivity Data ◽

Pharmacogenomic Studies

Background: In 2012, two large pharmacogenomic studies, the Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE), were published, each reporting gene expression data and measures of drug response for a large number of drugs and hundreds of cell lines. In 2013, we published a comparative analysis that reported gene expression profiles for the 471 cell lines profiled in both studies and dose response measurements for the 15 drugs characterized in the common cell lines by both studies. While we found good concordance in gene expression profiles, there was substantial inconsistency in the drug responses reported by the GDSC and CCLE projects. Our paper was widely discussed and we received extensive feedback on the comparisons that we performed. This feedback, along with the release of new data, prompted us to revisit our initial analysis. Here we present a new analysis using these expanded data in which we address the most significant suggestions for improvements on our published analysis: that drugs with different response characteristics should have been treated differently, that targeted therapies and broad cytotoxic drugs should have been treated differently in assessing consistency, that consistency of both molecular profiles and drug sensitivity measurements should be compared across cell lines to accurately assess differences in the studies, that we missed some biomarkers that are consistent between studies, and that the software analysis tools we provided with our analysis should have been easier to run, particularly as the GDSC and CCLE released additional data. Methods: For each drug, we used published sensitivity data from the GDSC and CCLE to separately estimate drug dose-response curves. We then used two statistics, the area between drug dose-response curves (ABC) and the Matthews correlation coefficient (MCC), to robustly estimate the consistency of continuous and discrete drug sensitivity measures, respectively.
We also used recently released RNA-seq data together with previously published gene expression microarray data to assess inter-platform reproducibility of cell line gene expression profiles. Results: This re-analysis supports our previous finding that gene expression data are significantly more consistent than drug sensitivity measurements. The use of new statistics to assess data consistency allowed us to identify two broad effect drugs -- 17-AAG and PD-0332901 -- and three targeted drugs -- PLX4720, nilotinib and crizotinib -- with moderate to good consistency in drug sensitivity data between GDSC and CCLE. Not enough sensitive cell lines were screened in both studies to robustly assess consistency for three other targeted drugs, PHA-665752, erlotinib, and sorafenib. Concurring with our published results, we found evidence of inconsistencies in pharmacological phenotypes for the remaining eight drugs. Further, to discover "consistency" between studies required the use of multiple statistics and the selection of specific measures on a case-by-case basis. Conclusion: Our results reaffirm our initial findings of an inconsistency in drug sensitivity measures for eight of fifteen drugs screened both in GDSC and CCLE, irrespective of which statistical metric was used to assess correlation. Taken together, our findings suggest that the phenotypic data on drug response in the GDSC and CCLE continue to present challenges for robust biomarker discovery. This re-analysis provides additional support for the argument that experimental standardization and validation of pharmacogenomic response will be necessary to advance the broad use of large pharmacogenomic screens.
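The two consistency statistics named in the Methods can be illustrated with a short sketch. The code below is an assumption-laden illustration, not the authors' implementation: it takes both curves to be sampled at the same doses, approximates the area between them with the trapezoidal rule on the absolute gap (which is only approximate where the curves cross between sampled doses), and normalises by the dose range, which may differ from the definition used in the study. The MCC function follows the standard confusion-matrix formula for binary sensitive/resistant calls.

```python
import math


def area_between_curves(doses, curve_a, curve_b):
    """Trapezoidal area between two curves on shared doses, per unit dose range."""
    total = 0.0
    for k in range(len(doses) - 1):
        width = doses[k + 1] - doses[k]
        gap_left = abs(curve_a[k] - curve_b[k])
        gap_right = abs(curve_a[k + 1] - curve_b[k + 1])
        total += 0.5 * (gap_left + gap_right) * width
    return total / (doses[-1] - doses[0])


def matthews_corrcoef(calls_a, calls_b):
    """MCC between two binary call vectors (1 = sensitive, 0 = resistant)."""
    pairs = list(zip(calls_a, calls_b))
    tp = sum(1 for a, b in pairs if a and b)
    tn = sum(1 for a, b in pairs if not a and not b)
    fp = sum(1 for a, b in pairs if not a and b)
    fn = sum(1 for a, b in pairs if a and not b)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when a margin is empty and the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical viability curves for one drug/cell line pair in two studies.
log_doses = [0.0, 1.0, 2.0]
abc = area_between_curves(log_doses, [1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
mcc = matthews_corrcoef([1, 0, 1, 0], [1, 0, 1, 0])
```

An ABC near zero indicates that the two studies report nearly overlapping curves, while an MCC near 1 indicates that they agree on which cell lines are called sensitive.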


Simulation of cancer cell line pharmacogenomics data to optimise experimental design and analysis strategy

10.7287/peerj.preprints.27345 ◽

2018 ◽

Author(s):

Hitesh Mistry ◽

Phil Chapman

Keyword(s):

Cell Line ◽

Cell Lines ◽

Dose Response ◽

Statistical Power ◽

Drug Sensitivity ◽

Mixed Effects ◽

Conventional Approach ◽

Cancer Drug ◽

Simulation Studies ◽

Non Linear

Explaining the variability in drug sensitivity across a panel of cell lines using genomic information is a key aspect of cancer drug discovery. The results of such analyses may ultimately determine which patients are likely to benefit from a new treatment. There are numerous experimental factors that can influence the outcomes of cell line screening panels, such as the number of replicates and the number of doses explored. Simulation studies can aid in understanding how variability in these experimental factors can affect the statistical power of a given analysis method. In this study, dose-response data were simulated for a variety of experimental designs, and the ability of different methods to retrieve the original simulation parameters was compared. The analysis methods under consideration were a combination of non-linear least squares and ANOVA (the conventional approach) versus non-linear mixed effects. Across the simulation studies explored, the mixed-effects approach gave similar and in some situations greater statistical power than the conventional approach. In particular, the mixed-effects approach gave significantly greater power when there was less information per dose-response curve, and when more cell lines were screened. More generally, the best way to improve statistical power was to screen more cell lines. This study demonstrates the value of simulating data to understand design and analysis choices in the context of cancer drug sensitivity screening. By illustrating the performance of different methods in different situations, these results will help researchers in the field generate and analyse data on future preclinical compounds. Ultimately this will benefit patients by ensuring that biomarkers of drug sensitivity have an increased chance of being identified at the preclinical stage.
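To make the idea of simulating dose-response data concrete, the sketch below generates replicate curves from a four-parameter logistic (Hill) model with additive Gaussian noise. The abstract does not state which curve model or noise structure the authors used, so the model, the function names and all parameter values here are assumptions for illustration only.

```python
import random


def hill_response(dose, ic50, slope, top=1.0, bottom=0.0):
    """Viability at a dose under a four-parameter logistic (Hill) curve."""
    return bottom + (top - bottom) / (1.0 + (dose / ic50) ** slope)


def simulate_curves(doses, ic50, slope, noise_sd, n_replicates, rng):
    """Simulate noisy replicate dose-response curves for one cell line."""
    return [
        [hill_response(d, ic50, slope) + rng.gauss(0.0, noise_sd) for d in doses]
        for _ in range(n_replicates)
    ]


# Hypothetical design: five doses, three replicates, modest measurement noise.
rng = random.Random(42)
doses = [0.01, 0.1, 1.0, 10.0, 100.0]
replicates = simulate_curves(doses, ic50=1.0, slope=1.5, noise_sd=0.05,
                             n_replicates=3, rng=rng)
```

Varying the number of doses, the number of replicates and the noise level in such a generator, and then refitting the curves, is one way to study how design choices affect the power to recover the parameters that were simulated.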


Simulation of cancer cell line pharmacogenomics data to optimise experimental design and analysis strategy

10.1101/174862 ◽

2017 ◽

Author(s):

Hitesh Mistry ◽

Phil Chapman

Keyword(s):

Cell Line ◽

Cell Lines ◽

Dose Response ◽

Statistical Power ◽

Drug Sensitivity ◽

Mixed Effects ◽

Conventional Approach ◽

Cancer Drug ◽

Simulation Studies ◽

Non Linear

Background: Explaining the variability in drug sensitivity across a panel of cell lines using genomic information is a key aspect of cancer drug discovery. The results of such analyses may ultimately determine which patients are likely to benefit from a new treatment. There are numerous experimental factors that can influence the outcomes of cell line screening panels, such as the number of replicates and the number of doses explored. Simulation studies can aid in understanding how variability in these experimental factors can affect the statistical power of a given analysis method. In this study, dose-response data were simulated for a variety of experimental designs, and the ability of different methods to retrieve the original simulation parameters was compared. The analysis methods under consideration were a combination of non-linear least squares and ANOVA (the conventional approach) versus non-linear mixed effects. Results: Across the simulation studies explored, the mixed-effects approach gave similar and in some situations greater statistical power than the conventional approach. In particular, the mixed-effects approach gave significantly greater power when there was less information per dose-response curve, and when more cell lines were screened. More generally, the best way to improve statistical power was to screen more cell lines. Conclusions: This study demonstrates the value of simulating data to understand design and analysis choices in the context of cancer drug sensitivity screening. By illustrating the performance of different methods in different situations, these results will help researchers in the field generate and analyse data on future preclinical compounds. Ultimately this will benefit patients by ensuring that biomarkers of drug sensitivity have an increased chance of being identified at the preclinical stage.


Simulation of cancer cell line pharmacogenomics data to optimise experimental design and analysis strategy

10.7287/peerj.preprints.27345v1 ◽

2018 ◽

Author(s):

Hitesh Mistry ◽

Phil Chapman

Keyword(s):

Cell Line ◽

Cell Lines ◽

Dose Response ◽

Statistical Power ◽

Drug Sensitivity ◽

Mixed Effects ◽

Conventional Approach ◽

Cancer Drug ◽

Simulation Studies ◽

Non Linear


Scattome: A Single-Cell Analysis of Targeted Transcriptome Program to Predict Drug Sensitivity of Single Cells within Human Myeloma Tumors

Blood ◽

10.1182/blood.v126.23.4249.4249 ◽

2015 ◽

Vol 126 (23) ◽

pp. 4249-4249

Author(s):

Amit Kumar Mitra ◽

Ujjal Mukherjee ◽

Taylor Harding ◽

Holly Stessman ◽

Ying Li ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Lines ◽

Research Funding ◽

Drug Response ◽

Drug Sensitivity ◽

Single Cell Analysis ◽

Single Cells ◽

Pcr Analysis ◽

Cell Analysis

Multiple myeloma (MM) is characterized by significant genetic diversity at subclonal levels that likely plays a defining role in the heterogeneity of tumor progression, clinical aggressiveness and drug sensitivity. Such heterogeneity is a driving factor in the evolution of MM, from founder clones through outgrowth of subclonal fractions. DNA sequencing studies on MM samples have indeed demonstrated such heterogeneity in subclonal architecture at diagnosis based on recurrent mutations in pathologically relevant genes that may ultimately lead to relapse. However, no study so far has reported a predictive gene expression signature that can identify, distinguish and quantify drug-sensitive and drug-resistant subpopulations within a bulk population of myeloma cells. In recent years, our laboratory has successfully developed a gene expression profile (GEP)-based signature that could not only distinguish drug response of MM cell lines, but also was effective in stratifying patient outcomes when applied to GEP profiles from MM clinical trials using proteasome inhibitors (PI) as chemotherapeutic agents. Further, we noted that myeloma cell lines that responded to the drug often contained a residual subpopulation of cells that did not respond, and likely were selectively propagated during drug treatment in vitro, and in patients. In this study, we performed targeted qRT-PCR analysis of single cells using a gene panel that included PI sensitivity genes and gene signatures that could discriminate between low and high-risk myeloma, followed by intensive bioinformatics and statistical analysis for the classification and prediction of PI response in individual cells within bulk multiple myeloma tumors.
Fluidigm's C1 Single-Cell Auto Prep System was used to perform automated single-cell capture, processing and cDNA synthesis on 576 pre-treatment cells from 12 cell lines representing a wide range of PI-sensitivity and 370 cells from 7 patient samples undergoing PI treatment, followed by targeted gene expression profiling of single cells using automated, high-throughput on-chip qRT-PCR analysis using 96.96 Dynamic Array IFCs on the BioMark HD System. Probability of resistance for each individual cell was predicted using a pipeline that employed the machine learning methods Random Forest, Support Vector Machine (radial and sigmoidal), LASSO and kNN (k Nearest Neighbor) for making single-cell GEP data-driven predictions/decisions. The weighted probabilities from each of the algorithms were used to quantify resistance of each individual cell and plotted using an ensemble forecasting algorithm. Using our drug response GEP signature at the single cell level, we could successfully identify distinct subpopulations of tumor cells that were predicted to be sensitive or resistant to PIs. Subsequently, we developed an R statistical analysis package (http://cran.r-project.org), SCATTome (Single Cell Analysis of Targeted Transcriptome), that can restructure data obtained from a Fluidigm qPCR analysis run, filter missing data, perform scaling of filtered data, build classification models and successfully predict drug response of individual cells and classify each cell's probability of response based on the targeted transcriptome. We will present the program output as graphical displays of single cell response probabilities. This package provides a novel classification method that has the potential to predict subclonal response to a variety of therapeutic agents.
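The step in which weighted probabilities from several classifiers are combined into one resistance score per cell can be sketched as a weighted average. This is a minimal illustration written in Python rather than R; it is not the SCATTome package's API, and the classifier names, weights and probabilities below are hypothetical, since the abstract does not say how the weights were derived.

```python
def ensemble_resistance_probability(probabilities, weights):
    """Weighted average of per-classifier resistance probabilities for one cell."""
    total_weight = sum(weights[name] for name in probabilities)
    weighted_sum = sum(probabilities[name] * weights[name] for name in probabilities)
    return weighted_sum / total_weight


# Hypothetical per-classifier outputs for a single cell.
cell_probabilities = {"random_forest": 0.8, "svm_radial": 0.6}
# Hypothetical weights, e.g. reflecting each classifier's validation accuracy.
classifier_weights = {"random_forest": 3.0, "svm_radial": 1.0}

score = ensemble_resistance_probability(cell_probabilities, classifier_weights)
```

Cells whose combined score exceeds a chosen cut-off would then be labelled as predicted resistant, with the remainder labelled as predicted sensitive.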
Disclosures Kumar: Skyline: Consultancy, Honoraria; BMS: Consultancy; Onyx: Consultancy, Research Funding; Sanofi: Consultancy, Research Funding; Janssen: Consultancy, Research Funding; Novartis: Research Funding; Takeda: Consultancy, Research Funding; Celgene: Consultancy, Research Funding.


A Non-Negative Matrix Tri-Factorization based Method for Predicting Antitumor Drug Sensitivity

10.1101/2021.12.03.471100 ◽

2021 ◽

Author(s):

Sara Pidò ◽

Carolina Testa ◽

Pietro Pinoli

Keyword(s):

Cell Line ◽

Cell Lines ◽

Drug Response ◽

Missing Values ◽

Drug Sensitivity ◽

Factorization Method ◽

Data Types ◽

Drug Profile ◽

The Matrix ◽

Ic50 Values

Large annotated cell line collections have been proven to enable the prediction of drug response in the preclinical setting. We present an enhancement of the Non-Negative Matrix Tri-Factorization method, which allows the integration of different data types for the prediction of missing associations. To test our method, we retrieved a dataset from CCLE containing the connections among cell lines and drugs by means of their IC50 values. We performed two different kinds of experiments: a) prediction of missing values in the matrix, and b) prediction of the complete drug profile of a new cell line, demonstrating the validity of the method in both scenarios.
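For orientation, the standard non-negative matrix tri-factorization objective is shown below. This is the textbook formulation, given here as a sketch; the enhanced variant proposed in the preprint may add further terms or constraints, and the symbols are assumptions chosen for illustration.

```latex
\min_{G \ge 0,\; S \ge 0,\; F \ge 0} \; \left\lVert R - G\,S\,F^{\top} \right\rVert_F^2
```

Here R would be the cell line by drug association matrix (for example, built from IC50 values), G and F are low-rank non-negative factors for cell lines and drugs respectively, and S is a small non-negative matrix linking the two. Missing entries of R are then predicted from the corresponding entries of the reconstruction G S F^T.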


Clinical drug response prediction from preclinical cancer cell lines by logistic matrix factorization approach

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720021500359 ◽

2021 ◽

Author(s):

Akram Emdadi ◽

Changiz Eslahchi

Keyword(s):

Cell Line ◽

Cell Lines ◽

Cancer Cell ◽

Matrix Factorization ◽

Drug Response ◽

Drug Sensitivity ◽

Cancer Cell Lines ◽

Tissue Type ◽

Factorization Approach ◽

Clinical Drug

Predicting tumor drug response using cancer cell line drug response values for a large number of anti-cancer drugs is a significant challenge in personalized medicine. Predicting patient response to drugs from data obtained from preclinical models is made easier by the availability of different knowledge on cell lines and drugs. This paper proposes the TCLMF method, a predictive model for predicting drug response in tumor samples that was trained on preclinical samples and is based on the logistic matrix factorization approach. The TCLMF model is designed based on gene expression profiles, tissue type information, the chemical structure of drugs and drug sensitivity (IC50) data from cancer cell lines. We use preclinical data from the Genomics of Drug Sensitivity in Cancer dataset (GDSC) to train the proposed drug response model, which we then use to predict drug sensitivity of samples from the Cancer Genome Atlas (TCGA) dataset. The TCLMF approach focuses on identifying successful features of cell lines and drugs in order to calculate the probability of the tumor samples being sensitive to drugs. The closest cell line neighbours for each tumor sample are calculated using a description of similarity between tumor samples and cell lines in this study. The drug response for a new tumor is then calculated by averaging the low-rank features obtained from its neighboring cell lines. We compare the results of the TCLMF model with the results of the previously proposed methods using two databases and two approaches to test the model's performance. In the first approach, 12 drugs with enough known clinical drug response, considered in previous methods, are studied. For 7 drugs out of 12, the TCLMF can significantly distinguish between patients that are resistant to these drugs and the patients that are sensitive to them. These approaches are converted to classification models using a threshold in the second approach, and the results are compared.
The results demonstrate that the TCLMF method provides accurate predictions compared with the other algorithms. Finally, we accurately classify tumor tissue type using the latent vectors obtained from TCLMF's logistic matrix factorization process. These findings demonstrate that the TCLMF approach produces effective latent vectors for tumor samples. The source code of the TCLMF method is available at https://github.com/emdadi/TCLMF.
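The neighbour-averaging step described above can be sketched as follows. This is not the code from the TCLMF repository: the function name, the similarity scores, the latent vectors and the choice of k are all hypothetical, and the sketch assumes the plain logistic matrix factorization form in which the probability of sensitivity is the sigmoid of the dot product between a sample's latent vector and a drug's latent vector, without bias terms.

```python
import math


def predict_sensitivity(tumor_similarity, cell_line_vectors, drug_vector, k=2):
    """Probability that a tumor is sensitive to a drug, via its k nearest cell lines."""
    # Pick the k cell lines most similar to the tumor sample.
    neighbours = sorted(tumor_similarity, key=tumor_similarity.get, reverse=True)[:k]
    # Average the neighbours' latent vectors to obtain a vector for the tumor.
    tumor_vector = [
        sum(cell_line_vectors[name][d] for name in neighbours) / len(neighbours)
        for d in range(len(drug_vector))
    ]
    score = sum(t * v for t, v in zip(tumor_vector, drug_vector))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical similarity of one tumor sample to three cell lines.
similarity = {"line_A": 0.9, "line_B": 0.8, "line_C": 0.1}
# Hypothetical two-dimensional latent vectors learned from cell line data.
latent = {"line_A": [1.0, 0.0], "line_B": [0.0, 1.0], "line_C": [5.0, 5.0]}

probability = predict_sensitivity(similarity, latent, drug_vector=[2.0, 2.0], k=2)
```

Applying a threshold to the returned probability turns the prediction into a sensitive/resistant classification, in the spirit of the second evaluation approach mentioned in the abstract.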
