Bipartite graph-based approach for clustering of cell lines by gene expression-drug response associations

Author(s):  
Calvin Chi ◽  
Yuting Ye ◽  
Bin Chen ◽  
Haiyan Huang

Abstract Motivation In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach is not scalable with the number of annotations, and the relationship between gene-drug association patterns and biological context may not be obvious. Results We present a procedure to compare cell lines based on their gene-drug association patterns. Starting with a grouping of cell lines from biological annotation, we model gene-drug association patterns for each group as a bipartite graph between genes and drugs. This is accomplished by applying sparse canonical correlation analysis (SCCA) to extract the gene-drug associations, and using the canonical vectors to construct the edge weights. Then, we introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test to evaluate the significance of similarity of cell line groups in terms of gene-drug associations. In the pharmacogenomics datasets CTRP2, GDSC2, and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses are linearly-dependent on expression, our approach is the only one that can effectively infer the true hierarchy compared to existing approaches. Availability Bipartite graph-based hierarchical clustering is implemented in R and can be obtained from CRAN: https://CRAN.R-project.org/package=hierBipartite. 
The source code is available at https://github.com/CalvinTChi/hierBipartite. Supplementary information Supplementary data are available at Bioinformatics online.
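The comparison step described in the abstract above can be illustrated with a small sketch. It is not the hierBipartite implementation: the outer-product construction of edge weights from a pair of canonical vectors, the normalization by the sum of the two nuclear norms, and the function names `edge_weights` and `nuclear_norm_dissimilarity` are all assumptions made for illustration, and the canonical vectors are toy values rather than SCCA output.

```python
import numpy as np

def edge_weights(u, v):
    """Genes x drugs weight matrix from one pair of canonical vectors (assumed form)."""
    return np.outer(u, v)

def nuclear_norm_dissimilarity(a, b):
    """Nuclear norm of the difference, scaled into [0, 1] (assumed normalization)."""
    denom = np.linalg.norm(a, ord="nuc") + np.linalg.norm(b, ord="nuc")
    if denom == 0.0:
        return 0.0
    return float(np.linalg.norm(a - b, ord="nuc") / denom)

# Toy canonical vectors for two cell line groups (3 genes, 2 drugs).
u1, v1 = np.array([0.8, 0.6, 0.0]), np.array([1.0, 0.0])
u2, v2 = np.array([0.0, 0.6, 0.8]), np.array([0.0, 1.0])

g1 = edge_weights(u1, v1)
g2 = edge_weights(u2, v2)

d_same = nuclear_norm_dissimilarity(g1, g1)   # identical graphs
d_diff = nuclear_norm_dissimilarity(g1, g2)   # different genes and drugs involved
```

A matrix of such pairwise values between groups could then be passed to any standard hierarchical clustering routine; the triangle inequality for the nuclear norm keeps this particular normalization between 0 and 1.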

2019 ◽  
Vol 35 (14) ◽  
pp. i510-i519 ◽  
Author(s):  
Soufiane Mourragui ◽  
Marco Loog ◽  
Mark A van de Wiel ◽  
Marcel J T Reinders ◽  
Lodewyk F A Wessels

Abstract Motivation Cell lines and patient-derived xenografts (PDXs) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. Results We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied on tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. Availability and implementation PRECISE and the scripts for running our experiments are available on our GitHub page (https://github.com/NKI-CCB/PRECISE). Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Chayaporn Suphavilai ◽  
Denis Bertrand ◽  
Niranjan Nagarajan

Abstract Motivation As we move towards an era of precision medicine, the ability to predict patient-specific drug responses in cancer based on molecular information such as gene expression data represents both an opportunity and a challenge. In particular, methods are needed that can accommodate the high dimensionality of data to learn interpretable models capturing drug response mechanisms, as well as providing robust predictions across datasets. Results We propose a method based on ideas from “recommender systems” (CaDRReS) that predicts cancer drug responses for unseen cell-lines/patients based on learning projections for drugs and cell-lines into a latent “pharmacogenomic” space. Comparisons with other proposed approaches for this problem based on large public datasets (CCLE, GDSC) show that CaDRReS provides consistently good models and robust predictions even across unseen patient-derived cell-line datasets. Analysis of the pharmacogenomic spaces inferred by CaDRReS also suggests that they can be used to understand drug mechanisms, identify cellular subtypes, and further characterize drug-pathway associations. Availability Source code and datasets are available at https://github.com/CSB5/[email protected] Supplementary information Supplementary data are available online.
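The idea of projecting cell lines and drugs into a shared latent space can be sketched with a generic latent-factor model. This is not the CaDRReS objective or code: the linear projection of expression features, the plain gradient descent loop, the synthetic data, and every variable name below are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_drugs, k = 30, 5, 4, 2

# Synthetic data (illustrative only): responses come from a hidden latent model.
X = rng.standard_normal((n_cells, n_genes))      # expression features per cell line
R = X @ rng.standard_normal((n_genes, k)) @ rng.standard_normal((k, n_drugs))

W = 0.1 * rng.standard_normal((n_genes, k))      # maps expression into the latent space
Q = 0.1 * rng.standard_normal((n_drugs, k))      # latent coordinates of each drug

def loss(W, Q):
    """Half the mean squared error between observed and predicted responses."""
    return 0.5 * np.mean((R - X @ W @ Q.T) ** 2)

initial_loss = loss(W, Q)
lr = 0.005
for _ in range(3000):
    E = (X @ W @ Q.T - R) / n_cells              # residuals, scaled by number of cell lines
    grad_W = X.T @ E @ Q                         # gradient with respect to the projection
    grad_Q = E.T @ X @ W                         # gradient with respect to drug coordinates
    W -= lr * grad_W
    Q -= lr * grad_Q
final_loss = loss(W, Q)

# An unseen cell line is placed in the latent space from its expression alone,
# and its response to every drug is the dot product with the drug coordinates.
x_new = rng.standard_normal(n_genes)
predicted = (x_new @ W) @ Q.T
```

Because the cell-line coordinates are a function of expression rather than free parameters, the same fitted `W` and `Q` yield predictions for samples that were never part of training, which is the property the abstract emphasizes.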


2019 ◽  
Author(s):  
Soufiane Mourragui ◽  
Marco Loog ◽  
Marcel JT Reinders ◽  
Lodewyk FA Wessels

Abstract Motivation Cell lines and patient-derived xenografts (PDX) have been used extensively to understand the molecular underpinnings of cancer. While core biological processes are typically conserved, these models also show important differences compared to human tumors, hampering the translation of findings from pre-clinical models to the human setting. In particular, employing drug response predictors generated on data derived from pre-clinical models to predict patient response remains a challenging task. As very large drug response datasets have been collected for pre-clinical models, and patient drug response data are often lacking, there is an urgent need for methods that efficiently transfer drug response predictors from pre-clinical models to the human setting. Results We show that cell lines and PDXs share common characteristics and processes with human tumors. We quantify this similarity and show that a regression model cannot simply be trained on cell lines or PDXs and then applied on tumors. We developed PRECISE, a novel methodology based on domain adaptation that captures the common information shared amongst pre-clinical models and human tumors in a consensus representation. Employing this representation, we train predictors of drug response on pre-clinical data and apply these predictors to stratify human tumors. We show that the resulting domain-invariant predictors show a small reduction in predictive performance in the pre-clinical domain but, importantly, reliably recover known associations between independent biomarkers and their companion drugs on human tumors. Availability PRECISE and the scripts for running our experiments are available on our GitHub page (https://github.com/NKI-CCB/PRECISE). [email protected] Supplementary information Supplementary data are available online.


2020 ◽  
Vol 36 (Supplement_1) ◽  
pp. i380-i388 ◽  
Author(s):  
Hossein Sharifi-Noghabi ◽  
Shuman Peng ◽  
Olga Zolotareva ◽  
Colin C Collins ◽  
Martin Ester

Abstract Motivation The goal of pharmacogenomics is to predict drug response in patients using their single- or multi-omics data. A major challenge is that clinical data (i.e. patients) with drug response outcome is very limited, creating a need for transfer learning to bridge the gap between large pre-clinical pharmacogenomics datasets (e.g. cancer cell lines), as a source domain, and clinical datasets as a target domain. Two major discrepancies exist between pre-clinical and clinical datasets: (i) in the input space, the gene expression data due to difference in the basic biology, and (ii) in the output space, the different measures of the drug response. Therefore, training a computational model on cell lines and testing it on patients violates the i.i.d. assumption that train and test data are from the same distribution. Results We propose Adversarial Inductive Transfer Learning (AITL), a deep neural network method for addressing discrepancies in input and output space between the pre-clinical and clinical datasets. AITL takes gene expression of patients and cell lines as the input, employs adversarial domain adaptation and multi-task learning to address these discrepancies, and predicts the drug response as the output. To the best of our knowledge, AITL is the first adversarial inductive transfer learning method to address both input and output discrepancies. Experimental results indicate that AITL outperforms state-of-the-art pharmacogenomics and transfer learning baselines and may guide precision oncology more accurately. Availability and implementation https://github.com/hosseinshn/AITL. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
David Earl Hostallero ◽  
Yihui Li ◽  
Amin Emad

Motivation: The increasing number of publicly available databases containing drugs' chemical structures, their response in cell lines, and molecular profiles of the cell lines has drawn attention to the problem of drug response prediction. However, many existing methods do not fully leverage the information that is shared among cell lines and drugs with similar structure. As such, drug similarities in terms of cell line responses and chemical structures could prove to be useful in forming drug representations to improve drug response prediction accuracy. Results: We present two deep learning approaches, BiG-DRP and BiG-DRP+, for drug response prediction. Our models take advantage of the drugs' chemical structure and the underlying relationships of drugs and cell lines through a bipartite graph and a heterogeneous graph convolutional network that incorporate sensitive and resistant cell line information in forming drug representations. Evaluation of our methods and other state-of-the-art models in different scenarios shows that incorporating this bipartite graph significantly improves the prediction performance. Additionally, genes that contribute significantly to the performance of our models also point to important biological processes and signaling pathways.


2016 ◽  
Vol 11 (2) ◽  
pp. 203-210 ◽  
Author(s):  
Jiguang Wang ◽  
Judith Kribelbauer ◽  
Raul Rabadan

Author(s):  
Markus Ekvall ◽  
Michael Höhle ◽  
Lukas Käll

Abstract Motivation Permutation tests offer a straightforward framework to assess the significance of differences in sample statistics. A significant advantage of permutation tests is that relatively few assumptions about the distribution of the test statistic are needed, as they rely on the assumption of exchangeability of the group labels. They have great value, as they allow a sensitivity analysis to determine the extent to which the assumed broad sample distribution of the test statistic applies. However, in this situation, permutation tests are rarely applied because the running time of naïve implementations is too slow and grows exponentially with the sample size. Nevertheless, continued development in the 1980s introduced dynamic programming algorithms that compute exact permutation tests in polynomial time. Despite this significant running time reduction, the exact test has not yet become one of the predominant statistical tests for medium sample sizes. Here, we propose a computational parallelization of one such dynamic programming-based permutation test, the Green algorithm, which makes the permutation test more attractive. Results Parallelization of the Green algorithm was found possible by non-trivial rearrangement of the structure of the algorithm. A speed-up by orders of magnitude is achievable by executing the parallelized algorithm on a GPU. We demonstrate that the execution time essentially becomes a non-issue, even for sample sizes as high as hundreds of samples. This improvement makes our method an attractive alternative to, e.g. the widely used asymptotic Mann-Whitney U-test. Availability and implementation The Python 3 code is available from the GitHub repository https://github.com/statisticalbiotechnology/parallelPermutationTest under an Apache 2.0 license. Supplementary information Supplementary data are available at Bioinformatics online.
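The contrast between naive enumeration and a dynamic programming formulation can be shown on a toy two-sample problem. The sketch below is a generic subset-sum counting recursion, not the Green algorithm as implemented in the repository above and not its GPU parallelization; it assumes non-negative integer observations and a one-sided test on the group sum, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def exact_p_bruteforce(x, y):
    """One-sided exact p-value by enumerating every relabelling (exponential time)."""
    pooled = list(x) + list(y)
    m, observed = len(x), sum(x)
    hits = sum(
        1
        for idx in combinations(range(len(pooled)), m)
        if sum(pooled[i] for i in idx) >= observed
    )
    return hits / comb(len(pooled), m)

def exact_p_dp(x, y):
    """Same p-value from counts[j][s]: number of size-j subsets of the pooled data with sum s."""
    pooled = list(x) + list(y)
    m, observed, total = len(x), sum(x), sum(x) + sum(y)
    counts = [[0] * (total + 1) for _ in range(m + 1)]
    counts[0][0] = 1
    for value in pooled:
        # Sizes are visited in decreasing order so each observation is used at most once.
        for j in range(m, 0, -1):
            for s in range(total, value - 1, -1):
                counts[j][s] += counts[j - 1][s - value]
    return sum(counts[m][observed:]) / comb(len(pooled), m)

x = [12, 15, 11, 14]
y = [9, 10, 8, 13, 7]
p_brute = exact_p_bruteforce(x, y)
p_dp = exact_p_dp(x, y)
```

Both functions return the same value, but the table-based version does work proportional to the number of observations times the group size times the total sum, rather than to the number of possible relabellings.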


Author(s):  
Yang Lin ◽  
Xiaoyong Pan ◽  
Hong-Bin Shen

Abstract Motivation Long non-coding RNAs (lncRNAs) are generally expressed in a tissue-specific way, and subcellular localizations of lncRNAs depend on the tissues or cell lines in which they are expressed. Previous computational methods for predicting subcellular localizations of lncRNAs do not take this characteristic into account; they train a unified machine learning model for pooled lncRNAs from all available cell lines. It is of importance to develop a cell-line-specific computational method to predict lncRNA locations in different cell lines. Results In this study, we present an updated cell-line-specific predictor lncLocator 2.0, which trains an end-to-end deep model per cell line, for predicting lncRNA subcellular localization from sequences. We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. Then we learn word embeddings using natural language models, and these learned embeddings are fed into a convolutional neural network, long short-term memory and multilayer perceptron to classify subcellular localizations. lncLocator 2.0 achieves varying effectiveness for different cell lines and demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0, and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides. Availability The lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2 and the source code can be found at https://github.com/Yang-J-LIN/lncLocator2. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Lisa-Katrin Turnhoff ◽  
Ali Hadizadeh Esfahani ◽  
Maryam Montazeri ◽  
Nina Kusch ◽  
Andreas Schuppert

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE. Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]

