GraphMI: Extracting Private Graph Data from Graph Neural Networks

As machine learning becomes more widely used for critical applications, the need to study its implications in privacy becomes urgent. Given access to the target model and auxiliary information, model inversion attack aims to infer sensitive features of the training dataset, which leads to great privacy concerns. Despite its success in the grid domain, directly applying model inversion techniques on non grid domains such as graph achieves poor attack performance due to the difficulty to fully exploit the intrinsic properties of graphs and attributes of graph nodes used in GNN models. To bridge this gap, we present Graph Model Inversion attack, which aims to infer edges of the training graph by inverting Graph Neural Networks, one of the most popular graph analysis tools. Specifically, the projected gradient module in our method can tackle the discreteness of graph edges while preserving the sparsity and smoothness of graph features. Moreover, a well designed graph autoencoder module can efficiently exploit graph topology, node attributes, and target model parameters. With the proposed method, we study the connection between model inversion risk and edge influence and show that edges with greater influence are more likely to be recovered. Extensive experiments over several public datasets demonstrate the effectiveness of our method. We also show that differential privacy in its canonical form can hardly defend our attack while preserving decent utility.

Download Full-text

Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Journal of Cheminformatics ◽

10.1186/s13321-020-00479-8 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Dejun Jiang ◽

Zhenxing Wu ◽

Chang-Yu Hsieh ◽

Guangyong Chen ◽

Ben Liao ◽

...

Keyword(s):

Neural Networks ◽

Computational Efficiency ◽

Domain Knowledge ◽

Prediction Models ◽

Computational Cost ◽

Large Dataset ◽

Predictive Capacity ◽

Classification Tasks ◽

Graph Neural Networks ◽

Public Datasets

AbstractGraph neural networks (GNN) has been considered as an attractive modelling method for molecular property prediction, and numerous studies have shown that GNN could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models still can be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.

Download Full-text

Identification of Tire Model Parameters with Artificial Neural Networks

Applied Sciences ◽

10.3390/app10249110 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9110 ◽

Cited By ~ 1

Author(s):

José Luis Olazagoitia ◽

Jesus Angel Perez ◽

Francisco Badea

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Computing Time ◽

Optimization Techniques ◽

Training Dataset ◽

Model Parameters ◽

Mathematical Form ◽

Tire Model ◽

Data Set ◽

Artificial Neural

Accurate modeling of tire characteristics is one of the most challenging tasks. Many mathematical models can be used to fit measured data. Identification of the parameters of these models usually relies on least squares optimization techniques. Different researchers have shown that the proper selection of an initial set of parameters is key to obtain a successful fitting. Besides, the mathematical process to identify the right parameters is, in some cases, quite time-consuming and not adequate for fast computing. This paper investigates the possibility of using Artificial Neural Networks (ANN) to reliably identify tire model parameters. In this case, the Pacejka’s “Magic Formula” has been chosen for the identification due to its complex mathematical form which, in principle, could result in a more difficult learning than other formulations. The proposed methodology is based on the creation of a sufficiently large training dataset, without errors, by randomly choosing the MF parameters within a range compatible with reality. The results obtained in this paper suggest that the use of ANN to directly identify parameters in tire models for real test data is possible without the need of complicated cost functions, iterative fitting or initial iteration point definition. The errors in the identification are normally very low for every parameter and the fitting problem time is reduced to a few milliseconds for any new given data set, which makes this methodology very appropriate to be used in applications where the computing time needs to be reduced to a minimum.

Download Full-text

Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery? A Comparison Study of Descriptor-based and Graph-based Models

10.21203/rs.3.rs-81439/v1 ◽

2020 ◽

Author(s):

Dejun Jiang ◽

Zhenxing Wu ◽

Chang-Yu Hsieh ◽

Guangyong Chen ◽

Ben Liao ◽

...

Keyword(s):

Neural Networks ◽

Computational Efficiency ◽

Domain Knowledge ◽

Prediction Models ◽

Computational Cost ◽

Large Dataset ◽

Predictive Capacity ◽

Classification Tasks ◽

Graph Neural Networks ◽

Public Datasets

Abstract Graph neural networks (GNN) has been considered as an attractive modelling method for molecular property prediction, and numerous studies have shown that GNN could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models still can be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.

Download Full-text

Differential Privacy Protection Against Membership Inference Attack on Machine Learning for Genomic Data

10.1101/2020.08.03.235416 ◽

2020 ◽

Author(s):

Junjie Chen ◽

Wendy Hui Wang ◽

Xinghua Shi

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Defense Mechanism ◽

Genomic Data ◽

Training Dataset ◽

Model Accuracy ◽

Target Model ◽

Inference Attack ◽

The Cost ◽

Privacy Budget

Machine learning is powerful to model massive genomic data while genome privacy is a growing concern. Studies have shown that not only the raw data but also the trained model can potentially infringe genome privacy. An example is the membership inference attack (MIA), by which the adversary, who only queries a given target model without knowing its internal parameters, can determine whether a specific record was included in the training dataset of the target model. Differential privacy (DP) has been used to defend against MIA with rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning against MIA on genomic data, and evaluate the effectiveness of using DP as a defense mechanism. We consider two widely-used machine learning models, namely Lasso and convolutional neural network (CNN), as the target model. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various privacy settings of DP. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve, thus a smaller privacy budget provides stronger privacy guarantee with the cost of losing more model accuracy. We also investigate the effect of model sparsity on model vulnerability against MIA. Our results demonstrate that in addition to prevent overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA.

Download Full-text

Learning from Substitutable and Complementary Relations for Graph-based Sequential Product Recommendation

ACM Transactions on Information Systems ◽

10.1145/3464302 ◽

2022 ◽

Vol 40 (2) ◽

pp. 1-28

Author(s):

Wei Zhang ◽

Zeyuan Chen ◽

Hongyuan Zha ◽

Jianyong Wang

Keyword(s):

Neural Networks ◽

Data Driven ◽

Seamless Integration ◽

Specific Product ◽

Product Recommendation ◽

Target User ◽

Complementary Graph ◽

Sequential Product ◽

Graph Neural Networks ◽

Public Datasets

Sequential product recommendation, aiming at predicting the products that a target user will interact with soon, has become a hotspot topic. Most of the sequential recommendation models focus on learning from users’ interacted product sequences in a purely data-driven manner. However, they largely overlook the knowledgeable substitutable and complementary relations between products. To address this issue, we propose a novel Substitutable and Complementary Graph-based Sequential Product Recommendation model, namely, SCG-SPRe. The innovations of SCG-SPRe lie in its two main modules: (1) The module of interactive graph neural networks jointly encodes the high-order product correlations in the substitutable graph and the complementary graph into two types of relation-specific product representations. (2) The module of kernel-enhanced transformer networks adaptively fuses multiple temporal kernels to characterize the unique temporal patterns between a candidate product to be recommended and any interacted product in a target behavior sequence. Thanks to the seamless integration of the two modules, SCG-SPRe obtains candidate-dependent user representations for different candidate products to compute the corresponding ranking scores. We conduct extensive experiments on three public datasets, demonstrating SCG-SPRe is superior to competitive sequential recommendation baselines and validating the benefits of explicitly modeling the product-product relations.

Download Full-text

Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning

10.20944/preprints202112.0376.v1 ◽

2021 ◽

Author(s):

Yi-Quan Li ◽

Hao-Sen Chang ◽

Daw-Tung Lin

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Transfer Learning ◽

Character Recognition ◽

Large Scale ◽

Data Augmentation ◽

Training Data ◽

Fine Tuning ◽

Training Dataset ◽

Model Parameters

In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters by using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components, including training dataset synthesis and background simulation, image preprocessing and data augmentation, the process of training the model, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process. Three background models are proposed to simulate the factors of the background noise and anti-counterfeiting patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet neural architecture by replacing the FC layer with a global average pooling layer to avoid overfitting caused by a massive amount of training data. Consequently, the number of model parameters was reduced. Finally, we employed the transfer learning technique to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods and thus demonstrate the effectiveness of proposed framework, which exhibited a recognition accuracy as high as 99.39% on the constructed real ID card dataset.

Download Full-text

Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery? A Comparison Study of Descriptor-based and Graph-based Models

10.21203/rs.3.rs-79416/v1 ◽

2020 ◽

Author(s):

Dejun Jiang ◽

Zhenxing Wu ◽

Chang-Yu Hsieh ◽

Guangyong Chen ◽

Ben Liao ◽

...

Keyword(s):

Neural Networks ◽

Computational Efficiency ◽

Domain Knowledge ◽

Prediction Models ◽

Computational Cost ◽

Large Dataset ◽

Predictive Capacity ◽

Classification Tasks ◽

Graph Neural Networks ◽

Public Datasets

Abstract Graph neural networks (GNN) has been considered as an attractive modelling method for molecular property prediction, and numerous studies have shown that GNN could yield more promising results than traditional descriptor-based methods. In this study, based on 11 public datasets covering various property endpoints, the predictive capacity and computational efficiency of the prediction models developed by eight machine learning (ML) algorithms, including four descriptor-based models (SVM, XGBoost, RF and DNN) and four graph-based models (GCN, GAT, MPNN and Attentive FP), were extensively tested and compared. The results demonstrate that on average the descriptor-based models outperform the graph-based models in terms of prediction accuracy and computational efficiency. SVM generally achieves the best predictions for the regression tasks. Both RF and XGBoost can achieve reliable predictions for the classification tasks, and some of the graph-based models, such as Attentive FP and GCN, can yield outstanding performance for a fraction of larger or multi-task datasets. In terms of computational cost, XGBoost and RF are the two most efficient algorithms and only need a few seconds to train a model even for a large dataset. The model interpretations by the SHAP method can effectively explore the established domain knowledge for the descriptor-based models. Finally, we explored use of these models for virtual screening (VS) towards HIV and demonstrated that different ML algorithms offer diverse VS profiles. All in all, we believe that the off-the-shelf descriptor-based models still can be directly employed to accurately predict various chemical endpoints with excellent computability and interpretability.

Download Full-text

Graph Neural Networks for Prediction of Fuel Ignition Quality

10.26434/chemrxiv.12280325.v1 ◽

2020 ◽

Author(s):

Artur Schweidtmann ◽

Jan Rittig ◽

Andrea König ◽

Martin Grohe ◽

Alexander Mitsos ◽

...

Keyword(s):

Neural Networks ◽

Octane Number ◽

Molecular Graph ◽

Chemical Properties ◽

Graph Representation ◽

Structure Property ◽

Oxygenated Hydrocarbons ◽

Physico Chemical ◽

Ignition Quality ◽

Graph Neural Networks

<div>Prediction of combustion-related properties of (oxygenated) hydrocarbons is an important and challenging task for which quantitative structure-property relationship (QSPR) models are frequently employed. Recently, a machine learning method, graph neural networks (GNNs), has shown promising results for the prediction of structure-property relationships. GNNs utilize a graph representation of molecules, where atoms correspond to nodes and bonds to edges containing information about the molecular structure. More specifically, GNNs learn physico-chemical properties as a function of the molecular graph in a supervised learning setup using a backpropagation algorithm. This end-to-end learning approach eliminates the need for selection of molecular descriptors or structural groups, as it learns optimal fingerprints through graph convolutions and maps the fingerprints to the physico-chemical properties by deep learning. We develop GNN models for predicting three fuel ignition quality indicators, i.e., the derived cetane number (DCN), the research octane number (RON), and the motor octane number (MON), of oxygenated and non-oxygenated hydrocarbons. In light of limited experimental data in the order of hundreds, we propose a combination of multi-task learning, transfer learning, and ensemble learning. The results show competitive performance of the proposed GNN approach compared to state-of-the-art QSPR models making it a promising field for future research. The prediction tool is available via a web front-end at www.avt.rwth-aachen.de/gnn.</div>

Download Full-text