scholarly journals FORESEE: a tool for the systematic comparison of translational drug response modeling pipelines

Author(s):  
Lisa-Katrin Turnhoff ◽  
Ali Hadizadeh Esfahani ◽  
Maryam Montazeri ◽  
Nina Kusch ◽  
Andreas Schuppert

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE . Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]

2019 ◽  
Author(s):  
Lisa-Katrin Turnhoff ◽  
Ali Hadizadeh Esfahani ◽  
Maryam Montazeri ◽  
Nina Kusch ◽  
Andreas Schuppert

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE . Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]


2019 ◽  
Author(s):  
Lisa-Katrin Turnhoff ◽  
Ali Hadizadeh Esfahani ◽  
Maryam Montazeri ◽  
Nina Kusch ◽  
Andreas Schuppert

Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient data sets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art preprocessing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and Implementation: FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE . Supplementary Information: Supplementary Files 1 and 2 provide detailed descriptions of the pipeline and the data preparation process, while Supplementary File 3 presents basic use cases of the package. Contact: [email protected]


2019 ◽  
Vol 35 (19) ◽  
pp. 3846-3848 ◽  
Author(s):  
Lisa-Katrin Turnhoff ◽  
Ali Hadizadeh Esfahani ◽  
Maryam Montazeri ◽  
Nina Kusch ◽  
Andreas Schuppert

Abstract Summary Translational models that utilize omics data generated in in vitro studies to predict the drug efficacy of anti-cancer compounds in patients are highly distinct, which complicates the benchmarking process for new computational approaches. In reaction to this, we introduce the uniFied translatiOnal dRug rESponsE prEdiction platform FORESEE, an open-source R-package. FORESEE not only provides a uniform data format for public cell line and patient datasets, but also establishes a standardized environment for drug response prediction pipelines, incorporating various state-of-the-art pre-processing methods, model training algorithms and validation techniques. The modular implementation of individual elements of the pipeline facilitates a straightforward development of combinatorial models, which can be used to re-evaluate and improve already existing pipelines as well as to develop new ones. Availability and implementation FORESEE is licensed under GNU General Public License v3.0 and available at https://github.com/JRC-COMBINE/FORESEE and https://doi.org/10.17605/OSF.IO/RF6QK, and provides vignettes for documentation and application both online and in the Supplementary Files 2 and 3. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (14) ◽  
pp. i501-i509 ◽  
Author(s):  
Hossein Sharifi-Noghabi ◽  
Olga Zolotareva ◽  
Colin C Collins ◽  
Martin Ester

Abstract Motivation Historically, gene expression has been shown to be the most informative data for drug response prediction. Recent evidence suggests that integrating additional omics can improve the prediction accuracy which raises the question of how to integrate the additional omics. Regardless of the integration strategy, clinical utility and translatability are crucial. Thus, we reasoned a multi-omics approach combined with clinical datasets would improve drug response prediction and clinical relevance. Results We propose MOLI, a multi-omics late integration method based on deep neural networks. MOLI takes somatic mutation, copy number aberration and gene expression data as input, and integrates them for drug response prediction. MOLI uses type-specific encoding sub-networks to learn features for each omics type, concatenates them into one representation and optimizes this representation via a combined cost function consisting of a triplet loss and a binary cross-entropy loss. The former makes the representations of responder samples more similar to each other and different from the non-responders, and the latter makes this representation predictive of the response values. We validate MOLI on in vitro and in vivo datasets for five chemotherapy agents and two targeted therapeutics. Compared to state-of-the-art single-omics and early integration multi-omics methods, MOLI achieves higher prediction accuracy in external validations. Moreover, a significant improvement in MOLI’s performance is observed for targeted drugs when training on a pan-drug input, i.e. using all the drugs with the same target compared to training only on drug-specific inputs. MOLI’s high predictive power suggests it may have utility in precision oncology. Availability and implementation https://github.com/hosseinshn/MOLI. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Nanne Aben ◽  
Johan A. Westerhuis ◽  
Yipeng Song ◽  
Henk A.L. Kiers ◽  
Magali Michaut ◽  
...  

AbstractMotivationIn biology, we are often faced with multiple datasets recorded on the same set of objects, such as multi-omics and phenotypic data of the same tumors. These datasets are typically not independent from each other. For example, methylation may influence gene expression, which may, in turn, influence drug response. Such relationships can strongly affect analyses performed on the data, as we have previously shown for the identification of biomarkers of drug response. Therefore, it is important to be able to chart the relationships between datasets.ResultsWe present iTOP, a methodology to infera topology of relationships between datasets. We base this methodology on the RV coefficient, a measure of matrix correlation, which can be used to determine how much information is shared between two datasets. We extended the RV coefficient for partial matrix correlations, which allows the use of graph reconstruction algorithms, such as the PC algorithm, to infer the topologies. In addition, since multi-omics data often contain binary data (e.g. mutations), we also extended the RV coefficient for binary data. Applying iTOP to pharmacogenomics data, we found that gene expression acts as a mediator between most other datasets and drug response: only proteomics clearly shares information with drug response that is not present in gene expression. Based on this result, we used TANDEM, a method for drug response prediction, to identify which variables predictive of drug response were distinct to either gene expression or proteomics.AvailabilityAn implementation of our methodology is available in the R package iTOP on CRAN. Additionally, an R Markdown document with code to reproduce all figures is provided as Supplementary [email protected] and [email protected] informationSupplementary data are available at Bioinformatics online.


2019 ◽  
Vol 10 ◽  
Author(s):  
Debleena Guin ◽  
Jyoti Rani ◽  
Priyanka Singh ◽  
Sandeep Grover ◽  
Shivangi Bora ◽  
...  

Understanding patients’ genomic variations and their effect in protecting or predisposing them to drug response phenotypes is important for providing personalized healthcare. Several studies have manually curated such genotype–phenotype relationships into organized databases from clinical trial data or published literature. However, there are no text mining tools available to extract high-accuracy information from such existing knowledge. In this work, we used a semiautomated text mining approach to retrieve a complete pharmacogenomic (PGx) resource integrating disease–drug–gene-polymorphism relationships to derive a global perspective for ease in therapeutic approaches. We used an R package, pubmed.mineR, to automatically retrieve PGx-related literature. We identified 1,753 disease types, and 666 drugs, associated with 4,132 genes and 33,942 polymorphisms collated from 180,088 publications. With further manual curation, we obtained a total of 2,304 PGx relationships. We evaluated our approach by performance (precision = 0.806) with benchmark datasets like Pharmacogenomic Knowledgebase (PharmGKB) (0.904), Online Mendelian Inheritance in Man (OMIM) (0.600), and The Comparative Toxicogenomics Database (CTD) (0.729). We validated our study by comparing our results with 362 commercially used the US- Food and drug administration (FDA)-approved drug labeling biomarkers. Of the 2,304 PGx relationships identified, 127 belonged to the FDA list of 362 approved pharmacogenomic markers, indicating that our semiautomated text mining approach may reveal significant PGx information with markers for drug response prediction. In addition, it is a scalable and state-of-art approach in curation for PGx clinical utility.


2018 ◽  
Author(s):  
Martin Pirkl ◽  
Niko Beerenwinkel

AbstractMotivationNew technologies allow for the elaborate measurement of different traits of single cells. These data promise to elucidate intra-cellular networks in unprecedented detail and further help to improve treatment of diseases like cancer. However, cell populations can be very heterogeneous.ResultsWe developed a mixture of Nested Effects Models (M&NEM) for single-cell data to simultaneously identify different cellular sub-populations and their corresponding causal networks to explain the heterogeneity in a cell population. For inference, we assign each cell to a network with a certain probability and iteratively update the optimal networks and cell probabilities in an Expectation Maximization scheme. We validate our method in the controlled setting of a simulation study and apply it to three data sets of pooled CRISPR screens generated previously by two novel experimental techniques, namely Crop-Seq and Perturb-Seq.AvailabilityThe mixture Nested Effects Model (M&NEM) is available as the R-package mnem at https://github.com/cbgethz/mnem/[email protected], [email protected] informationSupplementary data are available.online.


2016 ◽  
Author(s):  
Suleiman A. Khan ◽  
Muhammad Ammad-ud-din

AbstractWith recent advancements in measurement technologies, many multi-way and tensor datasets have started to emerge. Exploiting the natural tensor structure in the data has been shown to be advantageous for both explorative and predictive studies in several application areas of bioinformatics and computational biology. Therefore, there has subsequently arisen a need for robust and flexible tools for effectively analyzing tensor data sets. We present the R package tensorBF, which is the first R package providing Bayesian factorization of a tensor. Our package implements a generative model that automatically identifies the number of factors needed to explain the tensor, overcoming a key limitation of traditional tensor factorizations. We also recommend best practices when using tensor factorizations for both, explorative and predictive analysis with an example application on drug response dataset. The package also implements tools related to the normalization of data, informative noise priors and visualization. Availability: The package is available at https://cran.r-project.org/package=tensorBF.


2019 ◽  
Author(s):  
Zachary B. Abrams ◽  
Caitlin E. Coombes ◽  
Suli Li ◽  
Kevin R. Coombes

AbstractSummaryUnsupervised data analysis in many scientific disciplines is based on calculating distances between observations and finding ways to visualize those distances. These kinds of unsupervised analyses help researchers uncover patterns in large-scale data sets. However, researchers can select from a vast number of different distance metrics, each designed to highlight different aspects of different data types. There are also numerous visualization methods with their own strengths and weaknesses. To help researchers perform unsupervised analyses, we developed the Mercator R package. Mercator enables users to see important patterns in their data by generating multiple visualizations using different standard algorithms, making it particularly easy to compare and contrast the results arising from different metrics. By allowing users to select the distance metric that best fits their needs, Mercator helps researchers perform unsupervised analyses that use pattern identification through computation and visual inspection.Availability and ImplementationMercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html)[email protected] informationSupplementary data are available at Bioinformatics online.


Cancers ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 885
Author(s):  
Robert F. Gruener ◽  
Alexander Ling ◽  
Ya-Fang Chang ◽  
Gladys Morrison ◽  
Paul Geeleher ◽  
...  

(1) Background: Drug imputation methods often aim to translate in vitro drug response to in vivo drug efficacy predictions. While commonly used in retrospective analyses, our aim is to investigate the use of drug prediction methods for the generation of novel drug discovery hypotheses. Triple-negative breast cancer (TNBC) is a severe clinical challenge in need of new therapies. (2) Methods: We used an established machine learning approach to build models of drug response based on cell line transcriptome data, which we then applied to patient tumor data to obtain predicted sensitivity scores for hundreds of drugs in over 1000 breast cancer patients. We then examined the relationships between predicted drug response and patient clinical features. (3) Results: Our analysis recapitulated several suspected vulnerabilities in TNBC and identified a number of compounds-of-interest. AZD-1775, a Wee1 inhibitor, was predicted to have preferential activity in TNBC (p < 2.2 × 10−16) and its efficacy was highly associated with TP53 mutations (p = 1.2 × 10−46). We validated these findings using independent cell line screening data and pathway analysis. Additionally, co-administration of AZD-1775 with standard-of-care paclitaxel was able to inhibit tumor growth (p < 0.05) and increase survival (p < 0.01) in a xenograft mouse model of TNBC. (4) Conclusions: Overall, this study provides a framework to turn any cancer transcriptomic dataset into a dataset for drug discovery. Using this framework, one can quickly generate meaningful drug discovery hypotheses for a cancer population of interest.


Sign in / Sign up

Export Citation Format

Share Document