Bimodality of gene expression in cancer patient tumors as interpretable biomarkers for drug sensitivity
ABSTRACT
Identifying biomarkers predictive of cancer cells’ response to drug treatment is one of the main challenges in precision oncology. Recent large-scale cancer pharmacogenomic studies have accelerated the search for predictive biomarkers by profiling thousands of human cancer cell lines at the molecular level and screening them against hundreds of approved drugs and experimental chemical compounds. Many studies have leveraged these data to build predictive models of response using various statistical and machine learning methods. However, a common limitation of these methods is their lack of interpretability: it is often unclear how they make predictions and which features are most associated with response, which hinders the clinical translation of these models. To address this issue, we developed a new machine learning pipeline based on the recent LOBICO approach. The pipeline explores the space of bimodally expressed genes across multiple large in vitro pharmacogenomic studies and builds multivariate, nonlinear, yet interpretable logic-based models predictive of drug response. Applying this pipeline to a compendium of three of the largest pharmacogenomic datasets, we built robust and interpretable models for 101 drugs spanning 17 drug classes, with a high validation rate in independent datasets.
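To illustrate what a logic-based model over bimodally expressed genes looks like, the sketch below binarizes each gene into a low/high state and evaluates a Boolean rule in disjunctive normal form (an OR of AND clauses) to predict sensitivity. This is not the LOBICO implementation and not the authors' pipeline. The binarization step uses a simple one-dimensional two-means split, swapped in purely for illustration, and all names (`two_means_threshold`, `binarize`, `evaluate_dnf`, `GENE_A`, `GENE_B`) and expression values are hypothetical.

```python
import numpy as np


def two_means_threshold(x):
    """Hypothetical binarization: choose the split point between sorted values
    that minimizes the within-group sum of squares (1-D two-means)."""
    xs = np.sort(np.asarray(x, dtype=float))
    best_t, best_cost = None, np.inf
    for i in range(1, len(xs)):
        lo, hi = xs[:i], xs[i:]
        cost = ((lo - lo.mean()) ** 2).sum() + ((hi - hi.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost = cost
            best_t = (xs[i - 1] + xs[i]) / 2  # midpoint between the two groups
    return best_t


def binarize(expr):
    """Map each gene's expression vector to 0 (low mode) / 1 (high mode)."""
    return {g: (np.asarray(v, dtype=float) > two_means_threshold(v)).astype(int)
            for g, v in expr.items()}


def evaluate_dnf(binary, dnf):
    """Evaluate an OR of AND clauses; each clause is a list of
    (gene, required_state) pairs. Returns 1 = predicted sensitive."""
    n = len(next(iter(binary.values())))
    pred = np.zeros(n, dtype=bool)
    for clause in dnf:
        term = np.ones(n, dtype=bool)
        for gene, state in clause:
            term &= binary[gene] == state  # literal: gene in the required state
        pred |= term
    return pred.astype(int)


# Hypothetical expression values for six cell lines.
expr = {
    "GENE_A": [1.0, 1.2, 8.0, 8.3, 0.9, 7.9],
    "GENE_B": [5.0, 0.1, 5.2, 0.2, 4.9, 0.3],
}
binary = binarize(expr)
# Hypothetical rule: sensitive if GENE_A is high AND GENE_B is low.
rule = [[("GENE_A", 1), ("GENE_B", 0)]]
pred = evaluate_dnf(binary, rule)
print(pred.tolist())  # [0, 0, 0, 1, 0, 1]
```

Because every prediction reduces to a short Boolean statement over gene states, the features driving a prediction can be read directly from the rule, which is the kind of interpretability the abstract contrasts with black-box models.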