ProxyFL: Decentralized Federated Learning through Proxy Model Sharing

Institutions in highly regulated domains such as finance and healthcare often have restrictive rules around data sharing. Federated learning is a distributed learning framework that enables multi-institutional collaborations on decentralized data with improved protection for each collaborator’s data privacy. In this paper, we propose a communication-efficient scheme for decentralized federated learning called ProxyFL, or proxy-based federated learning. Each participant in ProxyFL maintains two models: a private model and a publicly shared proxy model designed to protect the participant’s privacy. Proxy models allow efficient information exchange among participants using the PushSum method without the need for a centralized server. The proposed method eliminates a significant limitation of canonical federated learning by allowing model heterogeneity; each participant can have a private model with any architecture. Furthermore, our protocol for communication by proxy leads to stronger privacy guarantees using differential privacy analysis. Experiments on popular image datasets, and a pan-cancer diagnostic problem using over 30,000 high-quality gigapixel histology whole slide images, show that ProxyFL can outperform existing alternatives with much less communication overhead and stronger privacy.
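
The abstract names PushSum as the exchange method but does not spell it out. The following is a minimal sketch of generic PushSum gossip averaging over scalars, not the ProxyFL implementation: the function name `push_sum`, the `out_neighbors` graph format, and the equal-share splitting rule are all assumptions made for illustration. Each node keeps a value and a weight, pushes equal shares of both to itself and its out-neighbors, and reads its estimate as value divided by weight.

```python
def push_sum(values, out_neighbors, rounds=50):
    """Estimate the network-wide average at every node via PushSum gossip.

    values: one scalar per node (a stand-in for model parameters).
    out_neighbors: dict mapping node index -> list of nodes it sends to
                   (hypothetical format chosen for this sketch).
    """
    n = len(values)
    x = [float(v) for v in values]  # running numerators
    w = [1.0] * n                   # running weights
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Each node keeps one share for itself and pushes the rest.
            targets = [i] + list(out_neighbors[i])
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # The ratio x/w removes the bias caused by uneven in-degrees.
    return [xi / wi for xi, wi in zip(x, w)]


# A small directed, strongly connected graph with no central server.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 6.0], graph)
```

In this toy graph every entry of `estimates` approaches the true mean of 3.0 even though node 0 sends to more peers than the others; the weight term is what corrects for that asymmetry.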

Towards Communication-Efficient and Attack-Resistant Federated Edge Learning for Industrial Internet of Things

ACM Transactions on Internet Technology ◽

10.1145/3453169 ◽

2022 ◽

Vol 22 (3) ◽

pp. 1-22

Author(s):

Yi Liu ◽

Ruihui Zhao ◽

Jiawen Kang ◽

Abdulsalam Yassine ◽

Dusit Niyato ◽

...

Keyword(s):

Internet Of Things ◽

Data Privacy ◽

Differential Privacy ◽

Global Model ◽

Edge Computing ◽

Communication Overhead ◽

Industrial Internet Of Things ◽

Malicious Nodes ◽

Industrial Internet ◽

Communication Efficiency

Federated Edge Learning (FEL) allows edge nodes to train a global deep learning model collaboratively for edge computing in the Industrial Internet of Things (IIoT), which significantly promotes the development of Industry 4.0. However, FEL faces two critical challenges: communication overhead and data privacy. FEL suffers from expensive communication overhead when training large-scale multi-node models. Furthermore, because FEL is vulnerable to gradient leakage and label-flipping attacks, the training process of the global model is easily compromised by adversaries. To address these challenges, we propose a communication-efficient and privacy-enhanced asynchronous FEL framework for edge computing in IIoT. First, we introduce an asynchronous model update scheme to reduce the time that edge nodes spend waiting for global model aggregation. Second, we propose an asynchronous local differential privacy mechanism, which improves communication efficiency and mitigates gradient leakage attacks by adding well-designed noise to the gradients of edge nodes. Third, we design a cloud-side malicious node detection mechanism that detects malicious nodes by testing local model quality. Such a mechanism can prevent malicious nodes from participating in training and thereby mitigate label-flipping attacks. Extensive experimental studies on two real-world datasets demonstrate that the proposed framework not only improves communication efficiency but also mitigates malicious attacks, while its accuracy is comparable to that of traditional FEL frameworks.
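
The abstract does not specify how noise is added to the gradients. As a rough illustration only, the sketch below uses the common clip-then-add-Gaussian-noise pattern, which is an assumption and may differ from the paper's mechanism. The names `clip_l2`, `privatize_gradient`, `max_norm`, and `noise_multiplier` are hypothetical.

```python
import math
import random


def clip_l2(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]


def privatize_gradient(grad, max_norm, noise_multiplier, rng):
    """Clip a local gradient, then add Gaussian noise to each coordinate.

    Assumption: noise standard deviation = noise_multiplier * max_norm,
    the usual convention for the Gaussian mechanism on clipped vectors.
    """
    clipped = clip_l2(grad, max_norm)
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]


rng = random.Random(0)  # seeded only to make the sketch reproducible
noisy = privatize_gradient([3.0, 4.0], max_norm=1.0, noise_multiplier=0.5, rng=rng)
```

Clipping bounds how much any single update can reveal, and the noise scale is tied to that bound; an edge node would send `noisy` rather than the raw gradient.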

Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees

10.32920/14639934.v1 ◽

2021 ◽

Author(s):

Hao Ren ◽

Hongwei Li ◽

Xiaohui Liang ◽

Shibo He ◽

Yuanshun Dai ◽

...

Keyword(s):

Data Aggregation ◽

Data Privacy ◽

Differential Privacy ◽

Weighted Average ◽

Health Data ◽

Communication Overhead ◽

Aggregated Data ◽

Body Area Sensor Networks ◽

Aggregation Scheme ◽

Cloud Servers

With the rapid growth of the health data scale, the limited storage and computation resources of wireless body area sensor networks (WBANs) are becoming a barrier to their development. Therefore, outsourcing the encrypted health data to the cloud has been an appealing strategy. However, data aggregation then becomes difficult. Some recently proposed schemes try to address this problem, but several functions and privacy issues remain undiscussed. In this paper, we propose a privacy-enhanced and multifunctional health data aggregation scheme (PMHA-DP) under differential privacy. Specifically, we achieve a new aggregation function, weighted average (WAAS), and design a privacy-enhanced aggregation scheme (PAAS) to protect the aggregated data from cloud servers. In addition, a histogram aggregation scheme with high accuracy is proposed. PMHA-DP supports fault tolerance while preserving data privacy. The performance evaluation shows that the proposal leads to less communication overhead than the existing one.

Privacy-Enhanced and Multifunctional Health Data Aggregation under Differential Privacy Guarantees

10.32920/14639934 ◽

2021 ◽

Author(s):

Hao Ren ◽

Hongwei Li ◽

Xiaohui Liang ◽

Shibo He ◽

Yuanshun Dai ◽

...

Keyword(s):

Data Aggregation ◽

Data Privacy ◽

Differential Privacy ◽

Weighted Average ◽

Health Data ◽

Communication Overhead ◽

Aggregated Data ◽

Body Area Sensor Networks ◽

Aggregation Scheme ◽

Cloud Servers

Predicting gastric cancer outcome from resected lymph node histopathology images using deep learning

Nature Communications ◽

10.1038/s41467-021-21674-7 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Xiaodong Wang ◽

Ying Chen ◽

Yunshu Gao ◽

Huiqing Zhang ◽

Zehui Guan ◽

...

Keyword(s):

Gastric Cancer ◽

Deep Learning ◽

Lymph Node ◽

Lymph Nodes ◽

Main Method ◽

Learning Framework ◽

Prognostic Assessment ◽

N Stage ◽

Cancer Outcome ◽

Whole Slide Images

N-staging is a determining factor for prognostic assessment and decision-making for stage-based cancer therapeutic strategies. Visual inspection of whole-slides of intact lymph nodes is currently the main method used by pathologists to calculate the number of metastatic lymph nodes (MLNs). Moreover, even at the same N stage, the outcome of patients varies dramatically. Here, we propose a deep-learning framework for analyzing lymph node whole-slide images (WSIs) to identify lymph nodes and tumor regions, and then to uncover tumor-area-to-MLN-area ratio (T/MLN). After training, our model’s tumor detection performance was comparable to that of experienced pathologists and achieved similar performance on two independent gastric cancer validation cohorts. Further, we demonstrate that T/MLN is an interpretable independent prognostic factor. These findings indicate that deep-learning models could assist not only pathologists in detecting lymph nodes with metastases but also oncologists in exploring new prognostic factors, especially those that are difficult to calculate manually.

A Comprehensive Survey on Local Differential Privacy

Security and Communication Networks ◽

10.1155/2020/8829523 ◽

2020 ◽

Vol 2020 ◽

pp. 1-29 ◽

Cited By ~ 1

Author(s):

Xingxing Xiong ◽

Shubo Liu ◽

Dan Li ◽

Zhaohui Cai ◽

Xiaoguang Niu

Keyword(s):

Big Data ◽

Data Analysis ◽

Statistical Learning ◽

Data Privacy ◽

Statistical Estimation ◽

Differential Privacy ◽

Future Research ◽

Reference Source ◽

Complex Data ◽

Comprehensive Survey

With the advent of the era of big data, privacy has become a hot topic of public concern. Local differential privacy (LDP) is a state-of-the-art privacy preservation technique that allows big data analysis (e.g., statistical estimation, statistical learning, and data mining) to be performed while guaranteeing each individual participant’s privacy. In this paper, we present a comprehensive survey of LDP. We first give an overview of the fundamental knowledge of LDP and its frameworks. We then introduce the mainstream privatization mechanisms and methods in detail from the perspective of frequency oracles and give insights into recent studies on private basic statistical estimation (e.g., frequency estimation and mean estimation) and complex statistical estimation (e.g., multivariate distribution estimation and private estimation over complex data) under LDP. Furthermore, we present the current state of research on LDP, including private statistical learning/inferencing, private statistical data analysis, privacy amplification techniques for LDP, and some application fields under LDP. Finally, we identify future research directions and open challenges for LDP. This survey can serve as a good reference source for research on LDP in the various privacy-related scenarios encountered in practice.
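
To make the idea of private frequency estimation under LDP concrete, here is a minimal sketch of binary randomized response, a classic LDP mechanism. It is offered as an illustration and is not taken from the survey; the function names and the simulated population are assumptions. Each participant reports a true bit with probability e^ε / (1 + e^ε) and the flipped bit otherwise, and the collector debiases the observed mean.

```python
import math
import random


def truth_probability(epsilon):
    """Probability of reporting the true bit under epsilon-LDP."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit, epsilon, rng):
    """Runs on the participant's side: keep or flip the private bit."""
    p = truth_probability(epsilon)
    return bit if rng.random() < p else 1 - bit


def estimate_frequency(reports, epsilon):
    """Runs on the collector's side: unbiased estimate of the share of 1s.

    E[observed] = (1 - p) + f * (2p - 1), so we invert that relation.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(0)                 # seeded for reproducibility
true_bits = [1] * 6000 + [0] * 14000   # simulated population, true share 0.3
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_frequency(reports, 1.0)
```

The collector never sees a raw bit, yet `estimate` lands close to the true share of 0.3; a smaller ε gives stronger privacy at the cost of a noisier estimate.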

A Proxy Peng-Robinson EOS for Efficient Modeling of Phase Behavior

10.2118/203914-ms ◽

2021 ◽

Author(s):

Mark Zhao ◽

Ryosuke Okuno

Keyword(s):

Phase Behavior ◽

Computational Cost ◽

Training Data ◽

Petroleum Engineering ◽

Fugacity Coefficient ◽

Fugacity Model ◽

Proxy Model ◽

Reservoir Conditions ◽

Fugacity Coefficients ◽

Proxy Models

Equation-of-state (EOS) compositional simulation is commonly used to model the interplay between phase behavior and fluid flow for various reservoir and surface processes. Because of its computational cost, however, there is a critical need for efficient phase-behavior calculations using an EOS. The objective of this research was to develop a proxy model for fugacity coefficient based on the Peng-Robinson EOS for rapid multiphase flash in compositional flow simulation. The proxy model as implemented in this research bypasses the calculations of fugacity coefficients when the Peng-Robinson EOS has only one root, which is often the case at reservoir conditions. The proxy fugacity model was trained by artificial neural networks (ANN) with over 30 million fugacity coefficients based on the Peng-Robinson EOS. It accurately predicts the Peng-Robinson fugacity coefficient by using four parameters: Am, Bm, Bi, and ΣxiAij. Since these scalar parameters are general, not specific to particular compositions, pressures, and temperatures, the proxy model is as widely applicable to petroleum engineering problems as the original Peng-Robinson EOS. The proxy model is applied to multiphase flash calculations (phase-split and stability), where the cubic equation solutions and fugacity coefficient calculations are bypassed when the Peng-Robinson EOS has one root. The original fugacity coefficient is analytically calculated when the EOS has more than one root, but this occurs only occasionally at reservoir conditions. A case study shows the proxy fugacity model gave a speed-up factor of 3.4% in comparison to the conventional EOS calculation. Case studies also demonstrate accurate multiphase flash results (stability and phase split) and interchangeable proxy models for different fluid cases with different (numbers of) components. This is possible because it predicts the Peng-Robinson fugacity in the variable space that is not specific to composition, temperature, and pressure. For the same reason, non-zero binary interaction parameters do not impair the applicability, accuracy, robustness, and efficiency of the model. As the proxy models are specific to individual components, a combination of proxy models can be used to model any mixture of components. Tuning of training hyperparameters and the training data sampling method helped reduce the mean absolute percent error to less than 0.1% in the ANN modeling. To the best of our knowledge, this is the first generalized proxy model of the Peng-Robinson fugacity that is applicable to any mixture. The proposed model retains the conventional flash iteration, the convergence robustness, and the option of manual parameter tuning for fluid characterization.
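
The abstract switches to the proxy model only when the Peng-Robinson EOS has a single root. The sketch below shows one way such a check could be done, using the standard compressibility-factor form of the Peng-Robinson cubic and the sign of its discriminant. It is an illustrative assumption, not the authors' code: the function names are hypothetical, and `A` and `B` are taken to be the dimensionless mixture parameters that the abstract calls Am and Bm.

```python
def pr_cubic_coefficients(A, B):
    """Coefficients (a2, a1, a0) of the Peng-Robinson cubic in Z:

    Z^3 - (1 - B) Z^2 + (A - 3B^2 - 2B) Z - (AB - B^2 - B^3) = 0
    """
    a2 = -(1.0 - B)
    a1 = A - 3.0 * B * B - 2.0 * B
    a0 = -(A * B - B * B - B ** 3)
    return a2, a1, a0


def has_single_real_root(A, B):
    """True when the cubic has exactly one real root (negative discriminant)."""
    a2, a1, a0 = pr_cubic_coefficients(A, B)
    discriminant = (
        18.0 * a2 * a1 * a0
        - 4.0 * a2 ** 3 * a0
        + a2 ** 2 * a1 ** 2
        - 4.0 * a1 ** 3
        - 27.0 * a0 ** 2
    )
    return discriminant < 0.0


# Illustrative (A, B) pairs chosen for this sketch, not taken from the paper.
one_root_case = has_single_real_root(0.2, 0.1)
three_root_case = has_single_real_root(0.1, 0.01)
```

A flash routine following the abstract's description would take the proxy path when this check returns `True` and fall back to the analytical fugacity calculation otherwise.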

Differential Privacy for Statistics: What we Know and What we Want to Learn

Journal of Privacy and Confidentiality ◽

10.29012/jpc.v1i2.570 ◽

2010 ◽

Vol 1 (2) ◽

Cited By ~ 30

Author(s):

Cynthia Dwork ◽

Adam Smith

Keyword(s):

Research Agenda ◽

Data Privacy ◽

Differential Privacy ◽

Definition Of

We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.

An Unsupervised Aspect-Aware Recommendation Model with Explanation Text Generation

ACM Transactions on Information Systems ◽

10.1145/3483611 ◽

2022 ◽

Vol 40 (3) ◽

pp. 1-29

Author(s):

Peijie Sun ◽

Le Wu ◽

Kun Zhang ◽

Yu Su ◽

Meng Wang

Keyword(s):

Data Privacy ◽

Auxiliary Information ◽

Generation Process ◽

Text Generation ◽

Generation Task ◽

Auxiliary Data ◽

Fine Grained ◽

Aspect Extraction ◽

Learning Framework ◽

Real World Datasets

Review-based recommendation utilizes both users’ rating records and the associated reviews for recommendation. Recently, with the rapidly growing demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, the auxiliary data is not available in most scenarios and may suffer from data privacy problems. In this article, we argue that the reviews contain abundant semantic information that expresses the users’ feelings about various aspects of items, while this information is not fully explored in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since the aspect is hidden and unlabeled. Besides, it is also very challenging to inject aspect information for generating explanation text with noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn the aspect-aware representation of each review sentence. Thus, users and items can be represented in the aspect space based on their historical associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model for both recommendation accuracy and explainability.

Privacy Preserving Data Mining on Unstructured Data

Privacy and Security Policies in Big Data - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-5225-2486-1.ch008 ◽

2017 ◽

pp. 167-190

Author(s):

Trupti Vishwambhar Kenekar ◽

Ajay R. Dani

Keyword(s):

Data Mining ◽

Big Data ◽

Structure Data ◽

Data Privacy ◽

Differential Privacy ◽

Unstructured Data ◽

Map Reduce ◽

Individual Data ◽

Data Set ◽

Privacy Preserving Data Mining

Because Big Data is a collection of structured, unstructured, and semi-structured data gathered from various sources, it is important to mine it while protecting the privacy of individual data. Differential privacy is one of the best measures for providing a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time to privately mine large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their applications to unstructured data. An analysis of experimental results on structured and unstructured data sets is also presented.
