A Blockchain-Based Federated Learning Method for Smart Healthcare

The development of artificial intelligence and worldwide epidemic events has promoted the implementation of smart healthcare while bringing issues of data privacy, malicious attack, and service quality. The Medical Internet of Things (MIoT), along with the technologies of federated learning and blockchain, has become a feasible solution for these issues. In this paper, we present a blockchain-based federated learning method for smart healthcare in which the edge nodes maintain the blockchain to resist a single point of failure and MIoT devices implement the federated learning to make full of the distributed clinical data. In particular, we design an adaptive differential privacy algorithm to protect data privacy and gradient verification-based consensus protocol to detect poisoning attacks. We compare our method with two similar methods on a real-world diabetes dataset. Promising experimental results show that our method can achieve high model accuracy in acceptable running time while also showing good performance in reducing the privacy budget consumption and resisting poisoning attacks.

Download Full-text

Privacy-Preserving Gradient Boosting Decision Trees

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5422 ◽

2020 ◽

Vol 34 (01) ◽

pp. 784-791 ◽

Cited By ~ 1

Author(s):

Qinbin Li ◽

Zhaomin Wu ◽

Zeyi Wen ◽

Bingsheng He

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Training Data ◽

Gradient Boosting ◽

Training Algorithm ◽

Model Accuracy ◽

Machine Learning Model ◽

Improve Model ◽

Privacy Budget ◽

Privacy Level

The Gradient Boosting Decision Tree (GBDT) is a popular machine learning model for various tasks in recent years. In this paper, we study how to improve model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differential private models. Existing solutions for GBDT with differential privacy suffer from the significant accuracy loss due to too loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds lead to more noise to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradient and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and leaf node clipping in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.

Download Full-text

Towards Understanding the Risks of Gradient Inversion in Federated Learning

10.21203/rs.3.rs-1147182/v2 ◽

2021 ◽

Author(s):

Ali Hatamizadeh ◽

Hongxu Yin ◽

Pavlo Molchanov ◽

Andriy Myronenko ◽

Wenqi Li ◽

...

Keyword(s):

Neural Networks ◽

Data Privacy ◽

Deep Neural Networks ◽

Differential Privacy ◽

Use Cases ◽

Training Data ◽

Model Accuracy ◽

Healthcare Applications ◽

Raw Data ◽

Collaborative Training

Abstract Federated learning (FL) allows the collaborative training of AI models without needing to share raw data. This capability makes it especially interesting for healthcare applications where patient and data privacy is of utmost concern. However, recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data. In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack that works for more realistic scenarios where the clients’ training involves updating the Batch Normalization (BN) statistics. Furthermore, we present new ways to measure and visualize potential data leakage in FL. Our work is a step towards establishing reproducible methods of measuring data leakage in FL and could help determine the optimal tradeoffs between privacy-preserving techniques, such as differential privacy, and model accuracy based on quantifiable metrics.

Download Full-text

Towards Understanding the Risks of Gradient Inversion in Federated Learning

10.21203/rs.3.rs-1147182/v1 ◽

2021 ◽

Author(s):

Ali Hatamizadeh ◽

Hongxu Yin ◽

Pavlo Molchanov ◽

Andriy Myronenko ◽

Wenqi Li ◽

...

Keyword(s):

Neural Networks ◽

Data Privacy ◽

Deep Neural Networks ◽

Differential Privacy ◽

Use Cases ◽

Training Data ◽

Model Accuracy ◽

Healthcare Applications ◽

Raw Data ◽

Collaborative Training

Download Full-text

Differential Privacy Protection Against Membership Inference Attack on Machine Learning for Genomic Data

10.1101/2020.08.03.235416 ◽

2020 ◽

Author(s):

Junjie Chen ◽

Wendy Hui Wang ◽

Xinghua Shi

Keyword(s):

Machine Learning ◽

Differential Privacy ◽

Defense Mechanism ◽

Genomic Data ◽

Training Dataset ◽

Model Accuracy ◽

Target Model ◽

Inference Attack ◽

The Cost ◽

Privacy Budget

Machine learning is powerful to model massive genomic data while genome privacy is a growing concern. Studies have shown that not only the raw data but also the trained model can potentially infringe genome privacy. An example is the membership inference attack (MIA), by which the adversary, who only queries a given target model without knowing its internal parameters, can determine whether a specific record was included in the training dataset of the target model. Differential privacy (DP) has been used to defend against MIA with rigorous privacy guarantee. In this paper, we investigate the vulnerability of machine learning against MIA on genomic data, and evaluate the effectiveness of using DP as a defense mechanism. We consider two widely-used machine learning models, namely Lasso and convolutional neural network (CNN), as the target model. We study the trade-off between the defense power against MIA and the prediction accuracy of the target model under various privacy settings of DP. Our results show that the relationship between the privacy budget and target model accuracy can be modeled as a log-like curve, thus a smaller privacy budget provides stronger privacy guarantee with the cost of losing more model accuracy. We also investigate the effect of model sparsity on model vulnerability against MIA. Our results demonstrate that in addition to prevent overfitting, model sparsity can work together with DP to significantly mitigate the risk of MIA.

Download Full-text

Federated Model Distillation with Noise-Free Differential Privacy

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/216 ◽

2021 ◽

Author(s):

Lichao Sun ◽

Lingjuan Lyu

Keyword(s):

Data Privacy ◽

Differential Privacy ◽

Random Noise ◽

Model Performance ◽

Training Data ◽

Heterogeneous Model ◽

The Public ◽

Inference Attacks ◽

Public Datasets ◽

Privacy Budget

Conventional federated learning directly averages model weights, which is only possible for collaboration between models with homogeneous architectures. Sharing prediction instead of weight removes this obstacle and eliminates the risk of white-box inference attacks in conventional federated learning. However, the predictions from local models are sensitive and would leak training data privacy to the public. To address this issue, one naive approach is adding the differentially private random noise to the predictions, which however brings a substantial trade-off between privacy budget and model performance. In this paper, we propose a novel framework called FEDMD-NFDP, which applies a Noise-FreeDifferential Privacy (NFDP) mechanism into a federated model distillation framework. Our extensive experimental results on various datasets validate that FEDMD-NFDP can deliver not only comparable utility and communication efficiency but also provide a noise-free differential privacy guarantee. We also demonstrate the feasibility of our FEDMD-NFDP by considering both IID and Non-IID settings, heterogeneous model architectures, and unlabelled public datasets from a different distribution.

Download Full-text

A Comprehensive Survey on Local Differential Privacy

Security and Communication Networks ◽

10.1155/2020/8829523 ◽

2020 ◽

Vol 2020 ◽

pp. 1-29 ◽

Cited By ~ 1

Author(s):

Xingxing Xiong ◽

Shubo Liu ◽

Dan Li ◽

Zhaohui Cai ◽

Xiaoguang Niu

Keyword(s):

Big Data ◽

Data Analysis ◽

Statistical Learning ◽

Data Privacy ◽

Statistical Estimation ◽

Differential Privacy ◽

Future Research ◽

Reference Source ◽

Complex Data ◽

Comprehensive Survey

With the advent of the era of big data, privacy issues have been becoming a hot topic in public. Local differential privacy (LDP) is a state-of-the-art privacy preservation technique that allows to perform big data analysis (e.g., statistical estimation, statistical learning, and data mining) while guaranteeing each individual participant’s privacy. In this paper, we present a comprehensive survey of LDP. We first give an overview on the fundamental knowledge of LDP and its frameworks. We then introduce the mainstream privatization mechanisms and methods in detail from the perspective of frequency oracle and give insights into recent studied on private basic statistical estimation (e.g., frequency estimation and mean estimation) and complex statistical estimation (e.g., multivariate distribution estimation and private estimation over complex data) under LDP. Furthermore, we present current research circumstances on LDP including the private statistical learning/inferencing, private statistical data analysis, privacy amplification techniques for LDP, and some application fields under LDP. Finally, we identify future research directions and open challenges for LDP. This survey can serve as a good reference source for the research of LDP to deal with various privacy-related scenarios to be encountered in practice.

Download Full-text

Research trends and solutions for secure traffic management of SDN

APTIKOM Journal on Computer Science and Information Technologies ◽

10.34306/csit.v2i3.70 ◽

2020 ◽

Vol 2 (3) ◽

pp. 97-105

Author(s):

Ravi Shankar Pandey ◽

Vivek Srivastava ◽

Lal Babu Yadav

Keyword(s):

Traffic Management ◽

Single Point ◽

Research Trends ◽

Security Requirements ◽

New Paradigm ◽

Research Papers ◽

Malicious Attack ◽

Security Issues ◽

Free Environment ◽

New Research

Software Defined Network (SDN) decouples the responsibilities of route management and datatransmission of network devices present in network infrastructure. It integrates the control responsibility at thecentralized software component which is known as controller. This centralized aggregation of responsibilities mayresult the single point of failure in the case malicious attack at the controller side. These attacks may also affect thetraffic flow and network devices. The security issues due to such malicious attacks in SDN are dominating challengesin the implementation and utilization of opportunities provided by this new paradigm. In this paper we haveinvestigated the several research papers related to proposal of new research trends for security and suggestionswhich fulfil the security requirements like confidentiality, integrity, availability, authenticity, authorization,nonrepudiation, consistency, fast responsiveness and adaptation. We have also investigated the new future researchfor creating the attack free environment for implementing the SDN.

Download Full-text

Differential Privacy for Statistics: What we Know and What we Want to Learn

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v1i2.570 ◽

2010 ◽

Vol 1 (2) ◽

Cited By ~ 30

Author(s):

Cynthia Dwork ◽

Adam Smith

Keyword(s):

Research Agenda ◽

Data Privacy ◽

Differential Privacy ◽

Definition Of

We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.

Download Full-text

Effect of input variables on rainfall-runoff modeling using a deep learning method

10.5194/egusphere-egu21-4398 ◽

2021 ◽

Author(s):

Kazuki yokoo ◽

Kei ishida ◽

Takeyoshi nagasato ◽

Ali Ercan

Keyword(s):

Deep Learning ◽

Air Temperature ◽

Meteorological Variables ◽

Learning Method ◽

Rainfall Runoff ◽

Model Accuracy ◽

Runoff Modeling ◽

Input Variables ◽

Rainfall Runoff Model ◽

Runoff Model

<p>In recent years, deep learning has been applied to various issues in natural science, including hydrology. These application results show its high applicability. There are some studies that performed rainfall-runoff modeling by means of a deep learning method, LSTM (Long Short-Term Memory). LSTM is a kind of RNN (Recurrent Neural Networks) that is suitable for modeling time series data with long-term dependence. These studies showed the capability of LSTM for rainfall-runoff modeling. However, there are few studies that investigate the effects of input variables on the estimation accuracy. Therefore, this study, investigated the effects of the selection of input variables on the accuracy of a rainfall-runoff model by means of LSTM. As the study watershed, this study selected a snow-dominated watershed, the Ishikari River basin, which is in the Hokkaido region of Japan. The flow discharge was obtained at a gauging station near the outlet of the river as the target data. For the input data to the model, Meteorological variables were obtained from an atmospheric reanalysis dataset, ERA5, in addition to the gridded precipitation dataset. The selected meteorological variables were air temperature, evaporation, longwave radiation, shortwave radiation, and mean sea level pressure. Then, the rainfall-runoff model was trained with several combinations of the input variables. After the training, the model accuracy was compared among the combinations. The use of meteorological variables in addition to precipitation and air temperature as input improved the model accuracy. In some cases, however, the model accuracy was worsened by using more variables as input. The results indicate the importance to select adequate variables as input for rainfall-runoff modeling by LSTM.</p>

Download Full-text

Privacy Preserving Data Mining on Unstructured Data

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch008 ◽

2017 ◽

pp. 167-190

Author(s):

Trupti Vishwambhar Kenekar ◽

Ajay R. Dani

Keyword(s):

Data Mining ◽

Big Data ◽

Structure Data ◽

Data Privacy ◽

Differential Privacy ◽

Unstructured Data ◽

Map Reduce ◽

Individual Data ◽

Data Set ◽

Privacy Preserving Data Mining

As Big Data is group of structured, unstructured and semi-structure data collected from various sources, it is important to mine and provide privacy to individual data. Differential Privacy is one the best measure which provides strong privacy guarantee. The chapter proposed differentially private frequent item set mining using map reduce requires less time for privately mining large dataset. The chapter discussed problem of preserving data privacy, different challenges to preserving data privacy in big data environment, Data privacy techniques and their applications to unstructured data. The analyses of experimental results on structured and unstructured data set are also presented.

Download Full-text