Representation transfer for differentially private drug sensitivity prediction

2019 ◽  
Vol 35 (14) ◽  
pp. i218-i224
Author(s):  
Teppo Niinimäki ◽  
Mikko A Heikkilä ◽  
Antti Honkela ◽  
Samuel Kaski

Motivation: Human genomic datasets often contain sensitive information that limits the use and sharing of the data. In particular, simple anonymization strategies fail to provide a sufficient level of protection for genomic data, because the data are inherently identifiable. Differentially private machine learning can help by guaranteeing that the published results do not leak too much information about any individual data point. Recent research has reached promising results on differentially private drug sensitivity prediction using gene expression data. Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions. Dimensionality reduction can help, but if the dimension reduction mapping is learned from the data, then it needs to be differentially private too, which can carry a significant privacy cost. Furthermore, the selection of any hyperparameters (such as the target dimensionality) must also avoid leaking private information. Results: We study an approach that uses a large public dataset of similar type to learn a compact representation for differentially private learning. We compare three representation learning methods: variational autoencoders, principal component analysis and random projection. We solve two machine learning tasks on gene expression of cancer cell lines: cancer type classification and drug sensitivity prediction. The experiments demonstrate significant benefit from all representation learning methods, with variational autoencoders providing the most accurate predictions most often. Our results significantly improve over the previous state of the art in the accuracy of differentially private drug sensitivity prediction. Availability and implementation: Code used in the experiments is available at https://github.com/DPBayes/dp-representation-transfer.
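
As a rough illustration of the idea in the abstract above (reduce dimensionality with a mapping that does not depend on the private data, then compute on the compact representation under differential privacy), the following Python sketch combines a Gaussian random projection with a Laplace-mechanism mean. It is not the authors' implementation, which is available at the repository linked above. The function names, the L1 clipping bound, the toy data, and the choice of a noisy mean as the private statistic are all assumptions made for illustration.

```python
import math
import random


def random_projection_matrix(d_in, d_out, rng):
    """Gaussian random projection. It is drawn without looking at any data,
    so using it costs no privacy budget (illustrative assumption)."""
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]


def project(matrix, x):
    """Map one high-dimensional sample x to the compact representation."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def clip_l1(z, bound):
    """Scale z down so that its L1 norm is at most `bound`."""
    norm = sum(abs(v) for v in z)
    if norm <= bound:
        return list(z)
    return [v * bound / norm for v in z]


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(rows, bound, epsilon, rng):
    """epsilon-DP mean of clipped rows under replace-one adjacency.

    Replacing one row changes the clipped sum by at most 2 * bound in L1 norm,
    so Laplace noise with scale 2 * bound / epsilon per coordinate suffices.
    """
    n, d = len(rows), len(rows[0])
    clipped = [clip_l1(r, bound) for r in rows]
    scale = 2.0 * bound / epsilon
    return [
        (sum(r[j] for r in clipped) + laplace_noise(scale, rng)) / n
        for j in range(d)
    ]


rng = random.Random(0)
# Toy stand-in for private gene expression data: 50 samples, 200 "genes".
private = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(50)]
proj = random_projection_matrix(d_in=200, d_out=5, rng=rng)
compact = [project(proj, x) for x in private]
print(dp_mean(compact, bound=10.0, epsilon=1.0, rng=rng))
```

Because the projection here is data-independent, the whole privacy budget goes to the downstream statistic. A mapping learned from public data (for example PCA or a variational autoencoder, as compared in the abstract) would take the place of `random_projection_matrix` in this sketch.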

2021 ◽  
Vol 5 (1) ◽  
pp. 5
Author(s):  
Ninghan Chen ◽  
Zhiqiang Zhong ◽  
Jun Pang

The outbreak of COVID-19 led to a burst of information in major online social networks (OSNs). In this constantly changing situation, OSNs have become an essential platform for people to express opinions and seek up-to-the-minute information. Thus, discussions on OSNs may become a reflection of reality. This paper aims to determine how Twitter users in the Greater Region (GR) and related countries react differently over time, through a data-driven exploratory study of COVID-19 information using machine learning and representation learning methods. We find that tweet volume and COVID-19 cases in GR and related countries are correlated, but this correlation only exists in a particular period of the pandemic. Moreover, we plot how topics change in each country and region from 22 January 2020 to 5 June 2020, identifying the main differences between GR and related countries.


2021 ◽  
Vol 11 (2) ◽  
pp. 61
Author(s):  
Jiande Wu ◽  
Chindo Hicks

Background: Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple negative breast cancer. Here we propose the use of a machine learning (ML) approach for classification of triple negative breast cancer and non-triple negative breast cancer patients using gene expression data. Methods: We analyzed RNA-Sequence data from 110 triple negative and 992 non-triple negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used in the development and validation of the classification models. We evaluated four different classification models (Support Vector Machines, K-nearest neighbor, Naïve Bayes and Decision tree), using features selected at different threshold levels to train the models for classifying the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets. Results: Among the four ML algorithms evaluated, the Support Vector Machine algorithm classified breast cancer into triple negative and non-triple negative breast cancer more accurately and made fewer misclassification errors than the other three algorithms. Conclusions: The prediction results show that ML algorithms are efficient and can be used for classification of breast cancer into triple negative and non-triple negative breast cancer types.
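
To make the classification setup concrete, here is a minimal Python sketch of one of the four model families named above, K-nearest neighbor, together with a count of misclassification errors. It is a toy illustration only: the two-feature expression profiles, the labels, and the function names are made up and do not come from the study, and the study's own feature selection and validation steps are not reproduced.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples
    (Euclidean distance)."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def misclassification_count(train_x, train_y, test_x, test_y, k=3):
    """Number of test samples whose predicted label differs from the truth."""
    return sum(
        knn_predict(train_x, train_y, x, k) != y
        for x, y in zip(test_x, test_y)
    )


# Made-up two-gene expression profiles; labels are hypothetical.
train_x = [(0.1, 0.2), (0.0, 0.4), (0.3, 0.1), (2.0, 2.1), (1.8, 2.4), (2.2, 1.9)]
train_y = ["non-TNBC", "non-TNBC", "non-TNBC", "TNBC", "TNBC", "TNBC"]
test_x = [(0.2, 0.3), (2.1, 2.0)]
test_y = ["non-TNBC", "TNBC"]
print(misclassification_count(train_x, train_y, test_x, test_y))
```

With a class imbalance like the one reported (110 versus 992 samples), raw error counts can be dominated by the majority class, so per-class error counts are usually worth reporting as well.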


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kanggeun Lee ◽  
Hyoung-oh Jeong ◽  
Semin Lee ◽  
Won-Ki Jeong

With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 2372-2372
Author(s):  
Habib Hamidi ◽  
Christopher R Bolen ◽  
Elisabeth A Lasater ◽  
Diana Dunshee ◽  
Elizabeth A Punnoose ◽  
...  

Introduction: AML is a heterogeneous disease with a wide array of common genetic aberrations. Traditional classification of AML leverages both classical cytogenetics and mutational profiling to stratify patients into four distinct risk groups (ELN). However, tumor gene expression profiles can play an important role in response to therapy, and are potentially useful for unravelling the heterogeneity of AML. In this study, we hypothesized that clinical outcomes and variable responses to therapeutic modalities in AML may be driven by patterns of gene expression, and sought to identify clinically actionable molecular subtypes using the available RNAseq data from the BEAT AML functional genomics study. Methods: An unsupervised machine learning approach based on consensus non-negative matrix factorization (cNMF) was applied to VOOM-normalized BEAT AML RNAseq data from patient samples with ≥50% blasts (N=389) to identify transcriptomic-based molecular subtypes. The subtypes were then compared to the genomic-based subtypes for their association with clinical outcome (log-rank test) and ex-vivo drug sensitivity (Kruskal-Wallis test). Subtypes were also biologically characterized by gene signature scoring using well-curated pathway signatures (GSVA analysis using Hallmark pathways), cell type enrichment (xCell enrichment) and AML differentiation state (scRNAseq signature based on van Galen et al.). Finally, a random forest classifier was defined based on samples from BEAT AML to predict the NMF subtypes in an independent data set (TCGA AML cohort). Results: Our cNMF-based analysis identified six clusters of patients based on the 5,060 (top 10%) most variable genes. These novel subtypes were strongly prognostic (Figure 1A, log-rank p=2.79e-08), and were independent of ELN genomic-based subtypes (ANOVA p=4.45e-07). Comparison to other genomic-based classifications is ongoing.
The prognostic value of the transcriptomic subtypes was further validated by predicting the subtypes in an independent cohort (TCGA LAML, N=200). We observed a significant association with outcome (Figure 1B, p=0.00013), with clusters 5 and 1 showing markedly better prognosis, similar to BEAT AML. These subtypes also displayed unique biological profiles, including significant association with scRNAseq-derived AML differentiation state cell types, Hallmark pathways and cellularity signatures. Notably, clusters 1 and 3 showed a mature phenotype, while clusters 2, 4, and 5 were more progenitor-like (Table 1). Importantly, the transcriptomic subtypes were highly predictive of ex-vivo drug sensitivity, with sensitivity to 70 compounds significantly associated with cNMF subtype (Kruskal-Wallis p<0.01), compared with 4 in the ELN subtypes. Of the tested molecules, single-agent venetoclax was the most strongly associated with subtype (p=1.7e-13); two subtypes were strongly resistant (median IC50 of 10 µM) and four were sensitive, with IC50s in the sub-micromolar range (Table 1). No association was seen between the ELN subtypes and venetoclax sensitivity (p=0.35). Conclusions: Unsupervised machine learning-based clustering analysis of transcriptomic data identified six novel subtypes that are as prognostic as the ELN genomic-based subtypes and provide a novel avenue for identifying clinically actionable subsets of AML.
Disclosures: Hamidi: Genentech: Current Employment, Current equity holder in publicly-traded company. Bolen: Genentech: Current Employment; F. Hoffmann-La Roche: Current equity holder in publicly-traded company. Lasater: Genentech: Current Employment, Current equity holder in publicly-traded company. Dunshee: Genentech/Roche: Current Employment, Current equity holder in publicly-traded company. Punnoose: Genentech: Current Employment, Current equity holder in publicly-traded company. Dail: Genentech/Roche: Current Employment, Current equity holder in publicly-traded company.
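
The abstract above uses the Kruskal-Wallis test to relate subtype membership to ex-vivo drug sensitivity. As a worked sketch of what that test computes, the Python function below evaluates the H statistic, H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), with average ranks and the standard tie correction. The function name and the IC50 values are hypothetical and are not data from the study. The p-value, which is not computed here, is usually obtained by comparing H against a chi-square distribution with (number of groups − 1) degrees of freedom.

```python
from collections import defaultdict


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.

    `groups` is a list of lists of measurements, one list per group.
    """
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Average 1-based rank for each distinct value (handles ties).
    positions = defaultdict(list)
    for index, value in enumerate(pooled, start=1):
        positions[value].append(index)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    rank_term = sum(
        sum(avg_rank[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_sum = sum(len(p) ** 3 - len(p) for p in positions.values())
    correction = 1.0 - tie_sum / (n_total ** 3 - n_total)
    if correction == 0.0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Hypothetical IC50 values (in µM) for one compound across three subtypes.
ic50_by_subtype = [
    [0.2, 0.4, 0.3, 0.5],
    [0.6, 0.9, 0.7, 1.1],
    [8.0, 10.0, 9.5, 10.0],
]
print(kruskal_wallis_h(ic50_by_subtype))
```

Because the statistic depends only on ranks, it is insensitive to the heavy right tail that IC50 values capped at a maximum tested concentration tend to have.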


2019 ◽  
Author(s):  
Marina Esteban ◽  
María Peña-Chilet ◽  
Carlos Loucera ◽  
Joaquín Dopazo

Background: In spite of the abundance of genomic data, predictive models that describe phenotypes as a function of gene expression or mutations are difficult to obtain because they are affected by the curse of dimensionality, given the imbalance between samples and candidate genes. This is especially severe in scenarios in which samples are hard to obtain, such as rare diseases. Results: The application of multi-output regression machine learning methodologies to predict the potential effect of external proteins on the signaling circuits that trigger Fanconi anemia related cell functionalities, inferred with a mechanistic model, allowed us to detect over 20 potential therapeutic targets. Conclusions: The use of artificial intelligence methods for the prediction of potentially causal relationships between proteins of interest and cell activities associated with disease-related phenotypes opens promising avenues for the systematic search of new targets in rare diseases.


2020 ◽  
Vol 20 (21) ◽  
pp. 1858-1867
Author(s):  
Xian Tan ◽  
Yang Yu ◽  
Kaiwen Duan ◽  
Jingbo Zhang ◽  
Pingping Sun ◽  
...  

Anticancer drug screening can accelerate drug discovery to save the lives of cancer patients, but cancer heterogeneity makes this screening challenging. The prediction of anticancer drug sensitivity is useful for anticancer drug development and the identification of biomarkers of drug sensitivity. Deep learning, as a branch of machine learning, is an important aspect of in silico research. Its outstanding computational performance means that it has been used for many biomedical purposes, such as medical image interpretation, biological sequence analysis, and drug discovery. Several studies have predicted anticancer drug sensitivity based on deep learning algorithms. The field of deep learning has made progress regarding model performance and multi-omics data integration. However, deep learning is limited by the number of studies performed and data sources available, so it is not yet a mature pre-clinical approach for use in the anticancer drug screening process. Improving the performance of deep learning models is a pressing issue for researchers. In this review, we introduce research on anticancer drug sensitivity prediction and the use of deep learning in this research area. To provide a reference for future research, we also review some common data sources and machine learning methods. Lastly, we discuss the advantages and disadvantages of deep learning, as well as the limitations and future perspectives regarding this approach.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Mohanad Mohammed ◽  
Henry Mwambi ◽  
Innocent B. Mboya ◽  
Murtada K. Elbashir ◽  
Bernard Omolo

Cancer tumor classification based on morphological characteristics alone has been shown to have serious limitations. Breast, lung, colorectal, thyroid, and ovarian cancers are the most commonly diagnosed cancers among women. Precise classification of cancers into their types is considered a vital problem for cancer diagnosis and therapy. In this paper, we propose a stacking ensemble deep learning model based on a one-dimensional convolutional neural network (1D-CNN) to perform multi-class classification of the five common cancers among women based on RNASeq data. The RNASeq gene expression data was downloaded from the Pan-Cancer Atlas using the GDCquery function of the TCGAbiolinks package in the R software. We used the least absolute shrinkage and selection operator (LASSO) as the feature selection method. We compared the results of the proposed model, with and without LASSO, with the results of the single 1D-CNN and of machine learning methods that include support vector machines with radial basis function, linear, and polynomial kernels; artificial neural networks; k-nearest neighbors; and bagging trees. The results show that the proposed model, with and without LASSO, performs better than the other classifiers. The results also show that the machine learning methods (SVM-R, SVM-L, SVM-P, ANN, KNN, and bagging trees) perform better with under-sampling than with over-sampling techniques. This is supported by the statistical significance test of accuracy, where the p-values for differences between SVM-R and SVM-P, SVM-R and ANN, and SVM-R and KNN are p = 0.003, p < 0.001, and p < 0.001, respectively. SVM-L also differed significantly from ANN (p = 0.009). Moreover, SVM-P and ANN, and SVM-P and KNN, are significantly different, with p < 0.001 and p < 0.001, respectively. In addition, ANN and bagging trees, and ANN and KNN, were significantly different, with p < 0.001 and p = 0.004, respectively.
Thus, the proposed model can help in the early detection and diagnosis of cancer in women, and hence aid in designing early treatment strategies to improve survival.
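
The abstract above reports that the classical classifiers did better with under-sampling than with over-sampling, but it does not say which under-sampling technique was used. As one simple possibility, the Python sketch below shows random under-sampling, which discards samples from the larger classes until every class matches the smallest one. The function name, sample identifiers, and class counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, rng):
    """Randomly drop samples from larger classes until every class has as
    many samples as the smallest class."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # rng.sample draws `target` distinct items without replacement.
        for sample in rng.sample(items, target):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Made-up sample identifiers with an imbalanced label distribution.
labels = ["breast"] * 6 + ["lung"] * 4 + ["thyroid"] * 2
samples = [f"sample_{i}" for i in range(len(labels))]
balanced_samples, balanced_labels = random_undersample(
    samples, labels, random.Random(0)
)
print(Counter(balanced_labels))
```

Under-sampling should be applied to the training split only, so that the held-out data keep the original class proportions when accuracy is measured.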

