GraphMI: Extracting Private Graph Data from Graph Neural Networks

Author(s):  
Zaixi Zhang ◽  
Qi Liu ◽  
Zhenya Huang ◽  
Hao Wang ◽  
Chengqiang Lu ◽  
...  

As machine learning becomes more widely used for critical applications, the need to study its privacy implications becomes urgent. Given access to the target model and auxiliary information, a model inversion attack aims to infer sensitive features of the training dataset, which raises serious privacy concerns. Despite its success in the grid domain, directly applying model inversion techniques to non-grid domains such as graphs achieves poor attack performance, because it is difficult to fully exploit the intrinsic properties of graphs and the attributes of graph nodes used in GNN models. To bridge this gap, we present the Graph Model Inversion attack (GraphMI), which aims to infer edges of the training graph by inverting Graph Neural Networks, one of the most popular graph analysis tools. Specifically, the projected gradient module in our method can tackle the discreteness of graph edges while preserving the sparsity and smoothness of graph features. Moreover, a well-designed graph autoencoder module can efficiently exploit graph topology, node attributes, and target model parameters. With the proposed method, we study the connection between model inversion risk and edge influence and show that edges with greater influence are more likely to be recovered. Extensive experiments over several public datasets demonstrate the effectiveness of our method. We also show that differential privacy in its canonical form can hardly defend against our attack while preserving decent utility.
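
The idea of handling discrete edges with a projected gradient can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the adjacency matrix is relaxed to real values in [0, 1], that a gradient of some attack loss is already available, and that the final edges are read off with a fixed threshold. The function names, learning rate, and threshold are hypothetical.

```python
def projected_gradient_step(adj, grad, lr=0.1):
    """One gradient step on a relaxed adjacency matrix, then projection onto [0, 1]."""
    n = len(adj)
    updated = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Average the two gradient entries so the graph stays undirected.
            g = 0.5 * (grad[i][j] + grad[j][i])
            value = adj[i][j] - lr * g
            # Projection: clip the relaxed edge weight back into [0, 1].
            value = min(1.0, max(0.0, value))
            updated[i][j] = updated[j][i] = value
    return updated


def discretize(adj, threshold=0.5):
    """Turn relaxed edge weights into a binary adjacency matrix."""
    return [[1 if w > threshold else 0 for w in row] for row in adj]


# Toy 3-node graph: every possible edge starts at weight 0.5.
adj = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
# Hypothetical loss gradient: negative entries push an edge weight up.
grad = [[0.0, -4.0, 2.0], [-4.0, 0.0, 0.0], [2.0, 0.0, 0.0]]

relaxed = projected_gradient_step(adj, grad, lr=0.2)
edges = discretize(relaxed)
```

In this toy run the edge between nodes 0 and 1 is pushed above 1 and clipped back to 1, so it survives thresholding, while the other two candidate edges do not.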

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Dejun Jiang ◽  
Zhenxing Wu ◽  
Chang-Yu Hsieh ◽  
Guangyong Chen ◽  
Ben Liao ◽  
...  

Abstract: Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.


2020 ◽  
Vol 10 (24) ◽  
pp. 9110 ◽  
Author(s):  
José Luis Olazagoitia ◽  
Jesus Angel Perez ◽  
Francisco Badea

Accurate modeling of tire characteristics is one of the most challenging tasks. Many mathematical models can be used to fit measured data. Identification of the parameters of these models usually relies on least squares optimization techniques. Different researchers have shown that the proper selection of an initial set of parameters is key to obtaining a successful fit. Besides, the mathematical process to identify the right parameters is, in some cases, quite time-consuming and not adequate for fast computing. This paper investigates the possibility of using Artificial Neural Networks (ANN) to reliably identify tire model parameters. In this case, Pacejka’s “Magic Formula” (MF) has been chosen for the identification because of its complex mathematical form which, in principle, could make learning more difficult than with other formulations. The proposed methodology is based on the creation of a sufficiently large, error-free training dataset by randomly choosing the MF parameters within a range compatible with reality. The results obtained in this paper suggest that using ANN to directly identify parameters in tire models for real test data is possible without the need for complicated cost functions, iterative fitting or the definition of an initial iteration point. The identification errors are normally very low for every parameter, and the fitting time is reduced to a few milliseconds for any new data set, which makes this methodology well suited to applications where computing time needs to be kept to a minimum.
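
As a rough illustration of the dataset-generation step described above, the sketch below evaluates the basic form of the Magic Formula, y = D sin(C arctan(Bx - E(Bx - arctan(Bx)))), for randomly drawn parameters. The horizontal and vertical shift terms are omitted, and the parameter ranges, slip grid, and sample count are hypothetical values chosen only for illustration, not those used in the paper.

```python
import math
import random


def magic_formula(x, B, C, D, E):
    """Basic Magic Formula: D * sin(C * atan(Bx - E * (Bx - atan(Bx))))."""
    bx = B * x
    return D * math.sin(C * math.atan(bx - E * (bx - math.atan(bx))))


# Hypothetical parameter ranges, chosen only for illustration.
PARAM_RANGES = {"B": (4.0, 12.0), "C": (1.0, 2.0), "D": (0.5, 1.5), "E": (-1.0, 1.0)}


def make_sample(slips, rng):
    """Draw one random parameter set and return (noise-free curve, parameters)."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
    curve = [magic_formula(s, **params) for s in slips]
    return curve, params


rng = random.Random(0)
slips = [i / 20 - 0.5 for i in range(21)]  # slip values from -0.5 to 0.5
dataset = [make_sample(slips, rng) for _ in range(1000)]
```

Each pair in `dataset` holds a sampled curve as the network input and the generating parameters as the regression target, which is the direction of mapping the abstract describes.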


2020 ◽  
Author(s):  
Dejun Jiang ◽  
Zhenxing Wu ◽  
Chang-Yu Hsieh ◽  
Guangyong Chen ◽  
Ben Liao ◽  
...  

Abstract: Graph neural networks (GNNs) have been considered an attractive modelling method for molecular property prediction, and numerous studies have shown that GNNs could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and need only a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored the use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models can still be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.


2020 ◽  
Author(s):  
Junjie Chen ◽  
Wendy Hui Wang ◽  
Xinghua Shi

Machine learning is a powerful tool for modeling massive genomic data, while genome privacy is a growing concern. Studies have shown that not only the raw data but also the trained model can potentially infringe genome privacy. An example is the membership inference attack (MIA), by which an adversary who only queries a given target model, without knowing its internal parameters, can determine whether a specific record was included in the training dataset of the target model. Differential privacy (DP) has been used to defend against MIA with a rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning to MIA on genomic data, and evaluate the effectiveness of using DP as a defense mechanism. We consider two widely used machine learning models, namely Lasso and the convolutional neural network (CNN), as the target model. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various privacy settings of DP. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve; thus, a smaller privacy budget provides a stronger privacy guarantee at the cost of losing more model accuracy. We also investigate the effect of model sparsity on model vulnerability to MIA. Our results demonstrate that, in addition to preventing overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA.


2022 ◽  
Vol 40 (2) ◽  
pp. 1-28
Author(s):  
Wei Zhang ◽  
Zeyuan Chen ◽  
Hongyuan Zha ◽  
Jianyong Wang

Sequential product recommendation, which aims to predict the products that a target user will interact with soon, has become a hot topic. Most sequential recommendation models focus on learning from users’ interacted product sequences in a purely data-driven manner. However, they largely overlook the knowledge carried by substitutable and complementary relations between products. To address this issue, we propose a novel Substitutable and Complementary Graph-based Sequential Product Recommendation model, namely, SCG-SPRe. The innovations of SCG-SPRe lie in its two main modules: (1) The module of interactive graph neural networks jointly encodes the high-order product correlations in the substitutable graph and the complementary graph into two types of relation-specific product representations. (2) The module of kernel-enhanced transformer networks adaptively fuses multiple temporal kernels to characterize the unique temporal patterns between a candidate product to be recommended and any interacted product in a target behavior sequence. Thanks to the seamless integration of the two modules, SCG-SPRe obtains candidate-dependent user representations for different candidate products to compute the corresponding ranking scores. We conduct extensive experiments on three public datasets, demonstrating that SCG-SPRe is superior to competitive sequential recommendation baselines and validating the benefits of explicitly modeling the product-product relations.


Author(s):  
Yi-Quan Li ◽  
Hao-Sen Chang ◽  
Daw-Tung Lin

In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 printed Chinese characters by using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process. Three background models are proposed to simulate the background noise and anti-counterfeiting patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet neural architecture by replacing the FC layer with a global average pooling layer to avoid overfitting caused by a massive amount of training data. Consequently, the number of model parameters was reduced. Finally, we employed the transfer learning technique to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods and thus demonstrate the effectiveness of the proposed framework, which exhibited a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
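
The parameter saving from global average pooling can be sketched as follows. Global average pooling has no trainable parameters, whereas a fully connected layer over flattened features needs one weight per input-output pair plus biases. The feature-map shape below is a hypothetical example, not the shape used in the paper; only the class count of 13,070 comes from the abstract.

```python
def global_average_pool(feature_map):
    """Average each channel's H x W grid down to a single value (no trainable parameters)."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense_params(in_features, out_features):
    """Trainable weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Hypothetical final feature-map shape; only the class count comes from the abstract.
channels, height, width = 1024, 7, 7
num_classes = 13070

# Cost of a fully connected layer applied directly to the flattened feature map.
fc_on_flattened = dense_params(channels * height * width, num_classes)

# Tiny 2-channel, 2 x 2 feature map to show what the pooling itself computes.
feature_map = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
pooled = global_average_pool(feature_map)
```

Under these assumed shapes the fully connected layer alone would hold several hundred million parameters, while the pooling step reduces each channel to one number without adding any.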


2020 ◽  
Author(s):  
Artur Schweidtmann ◽  
Jan Rittig ◽  
Andrea König ◽  
Martin Grohe ◽  
Alexander Mitsos ◽  
...  

Prediction of combustion-related properties of (oxygenated) hydrocarbons is an important and challenging task for which quantitative structure-property relationship (QSPR) models are frequently employed. Recently, a machine learning method, graph neural networks (GNNs), has shown promising results for the prediction of structure-property relationships. GNNs utilize a graph representation of molecules, where atoms correspond to nodes and bonds to edges containing information about the molecular structure. More specifically, GNNs learn physico-chemical properties as a function of the molecular graph in a supervised learning setup using a backpropagation algorithm. This end-to-end learning approach eliminates the need for selection of molecular descriptors or structural groups, as it learns optimal fingerprints through graph convolutions and maps the fingerprints to the physico-chemical properties by deep learning. We develop GNN models for predicting three fuel ignition quality indicators, i.e., the derived cetane number (DCN), the research octane number (RON), and the motor octane number (MON), of oxygenated and non-oxygenated hydrocarbons. In light of limited experimental data in the order of hundreds, we propose a combination of multi-task learning, transfer learning, and ensemble learning. The results show competitive performance of the proposed GNN approach compared to state-of-the-art QSPR models making it a promising field for future research. The prediction tool is available via a web front-end at www.avt.rwth-aachen.de/gnn.
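
The graph view of a molecule described above can be made concrete with a small sketch. It is a simplified illustration rather than the authors' model: it uses a single unweighted sum-aggregation step with no learned weights, and the one-hot atom features and function names are hypothetical.

```python
def message_passing_step(features, edges):
    """Each atom's new vector is its own features plus the sum of its neighbours'."""
    updated = [list(f) for f in features]
    for a, b in edges:
        for k in range(len(features[a])):
            # Bonds are undirected, so information flows both ways.
            updated[a][k] += features[b][k]
            updated[b][k] += features[a][k]
    return updated


def readout(features):
    """Sum all atom vectors into one molecule-level fingerprint."""
    return [sum(column) for column in zip(*features)]


# Heavy atoms of ethanol (C-C-O); features are one-hot [is_carbon, is_oxygen].
atoms = [[1, 0], [1, 0], [0, 1]]
bonds = [(0, 1), (1, 2)]

hidden = message_passing_step(atoms, bonds)
fingerprint = readout(hidden)
```

A trained GNN would apply learned weight matrices and nonlinearities at each step and then map the fingerprint to a property such as DCN, RON, or MON; the sketch only shows how node and edge structure enters the computation.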


2020 ◽  
Author(s):  
Zheng Lian ◽  
Jianhua Tao ◽  
Bin Liu ◽  
Jian Huang ◽  
Zhanlei Yang ◽  
...  
