ProxyFL: Decentralized Federated Learning through Proxy Model Sharing

Author(s):  
Shivam Kalra ◽  
Junfeng Wen ◽  
Jesse Cresswell ◽  
Maksims Volkovs ◽  
Hamid Tizhoosh

Institutions in highly regulated domains such as finance and healthcare often have restrictive rules around data sharing. Federated learning is a distributed learning framework that enables multi-institutional collaborations on decentralized data with improved protection for each collaborator’s data privacy. In this paper, we propose a communication-efficient scheme for decentralized federated learning called ProxyFL, or proxy-based federated learning. Each participant in ProxyFL maintains two models: a private model and a publicly shared proxy model designed to protect the participant’s privacy. Proxy models allow efficient information exchange among participants using the PushSum method without the need for a centralized server. The proposed method eliminates a significant limitation of canonical federated learning by allowing model heterogeneity; each participant can have a private model with any architecture. Furthermore, our protocol for communication by proxy yields stronger privacy guarantees under differential privacy analysis. Experiments on popular image datasets, and a pan-cancer diagnostic problem using over 30,000 high-quality gigapixel histology whole slide images, show that ProxyFL can outperform existing alternatives with much less communication overhead and stronger privacy.
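
The PushSum exchange mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch of generic PushSum gossip averaging, not the authors' implementation: the function names `column_stochastic` and `push_sum`, the four-node graph, and the use of plain parameter vectors in place of proxy models are all assumptions made for illustration.

```python
import numpy as np

def column_stochastic(adjacency):
    """Build a mixing matrix from a directed graph (hypothetical helper).

    adjacency[i][j] == 1 means node j pushes to node i. A self-loop is added
    for every node, and each column is normalised so that every sender splits
    its mass evenly among itself and its out-neighbours.
    """
    a = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    return a / a.sum(axis=0, keepdims=True)

def push_sum(values, mixing, rounds):
    """Run PushSum for a fixed number of rounds.

    values has shape (n_nodes, n_params). Each node tracks a value x and a
    scalar weight w; both are mixed with the same column-stochastic matrix,
    and the ratio x / w is the node's de-biased estimate of the average.
    """
    x = np.array(values, dtype=float)
    w = np.ones((x.shape[0], 1))
    for _ in range(rounds):
        x = mixing @ x
        w = mixing @ w
    return x / w

# Directed graph with edges 0->1, 0->2, 1->2, 2->3, 3->0 (no central server).
adjacency = [[0, 0, 0, 1],
             [1, 0, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 1, 0]]
mixing = column_stochastic(adjacency)
params = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [6.0, 60.0]])
estimates = push_sum(params, mixing, rounds=200)
```

Because the mixing matrix is only column-stochastic, the raw values `x` alone would converge to a biased result; dividing by the weights `w` is what recovers the plain average on a directed graph.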

2022 ◽  
Vol 22 (3) ◽  
pp. 1-22
Author(s):  
Yi Liu ◽  
Ruihui Zhao ◽  
Jiawen Kang ◽  
Abdulsalam Yassine ◽  
Dusit Niyato ◽  
...  

Federated Edge Learning (FEL) allows edge nodes to train a global deep learning model collaboratively for edge computing in the Industrial Internet of Things (IIoT), which significantly promotes the development of Industry 4.0. However, FEL faces two critical challenges: communication overhead and data privacy. FEL suffers from expensive communication overhead when training large-scale multi-node models. Furthermore, because FEL is vulnerable to gradient leakage and label-flipping attacks, the training process of the global model is easily compromised by adversaries. To address these challenges, we propose a communication-efficient and privacy-enhanced asynchronous FEL framework for edge computing in IIoT. First, we introduce an asynchronous model update scheme to reduce the time that edge nodes spend waiting for global model aggregation. Second, we propose an asynchronous local differential privacy mechanism, which improves communication efficiency and mitigates gradient leakage attacks by adding well-designed noise to the gradients of edge nodes. Third, we design a cloud-side malicious node detection mechanism that detects malicious nodes by testing the quality of their local models. Such a mechanism can prevent malicious nodes from participating in training and thereby mitigate label-flipping attacks. Extensive experimental studies on two real-world datasets demonstrate that the proposed framework not only improves communication efficiency but also mitigates malicious attacks, while its accuracy is comparable to that of traditional FEL frameworks.
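
The idea of adding noise to a node's gradient before it leaves the device can be sketched as follows. The abstract does not specify the noise design, so this is a generic clip-then-Gaussian-noise step under stated assumptions: the function name `privatize_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and no privacy budget accounting is shown.

```python
import numpy as np

def privatize_gradient(grad, clip_norm, noise_multiplier, rng):
    """Clip a gradient to a maximum L2 norm, then add Gaussian noise.

    Clipping bounds how much any single update can reveal; the noise standard
    deviation is noise_multiplier * clip_norm (an assumed calibration).
    """
    grad = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(grad)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = grad * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

rng = np.random.default_rng(0)
noisy_update = privatize_gradient([3.0, 4.0], clip_norm=1.0,
                                  noise_multiplier=1.0, rng=rng)
```

Each edge node would apply this locally before uploading, so the aggregator only ever sees the perturbed update.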


2021 ◽  
Author(s):  
Hao Ren ◽  
Hongwei Li ◽  
Xiaohui Liang ◽  
Shibo He ◽  
Yuanshun Dai ◽  
...  

With the rapid growth of the health data scale, the limited storage and computation resources of wireless body area sensor networks (WBANs) are becoming a barrier to their development. Therefore, outsourcing the encrypted health data to the cloud has been an appealing strategy. However, data aggregation then becomes difficult. Some recently proposed schemes try to address this problem, but several functions and privacy issues remain undiscussed. In this paper, we propose a privacy-enhanced and multifunctional health data aggregation scheme (PMHA-DP) under differential privacy. Specifically, we achieve a new aggregation function, weighted average (WAAS), and design a privacy-enhanced aggregation scheme (PAAS) to protect the aggregated data from cloud servers. In addition, a histogram aggregation scheme with high accuracy is proposed. PMHA-DP supports fault tolerance while preserving data privacy. The performance evaluation shows that the proposal leads to less communication overhead than the existing scheme.
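
A weighted average released under differential privacy can be sketched with the standard Laplace mechanism. This is not the WAAS protocol from the paper (which also involves encryption and cloud servers); it is a simplified illustration that assumes the weights are public, that each reading lies in `[0, upper]`, and that neighbouring datasets differ in one reading. The name `dp_weighted_average` is hypothetical.

```python
import numpy as np

def dp_weighted_average(values, weights, upper, epsilon, rng):
    """Release a weighted average with epsilon-DP via Laplace noise.

    Changing one reading moves the weighted sum by at most
    upper * max(weights), so that is the sensitivity of the numerator.
    The denominator uses only the (assumed public) weights.
    """
    values = np.clip(np.asarray(values, dtype=float), 0.0, upper)
    weights = np.asarray(weights, dtype=float)
    sensitivity = upper * weights.max()
    noisy_sum = values @ weights + rng.laplace(0.0, sensitivity / epsilon)
    return noisy_sum / weights.sum()

rng = np.random.default_rng(0)
# Hypothetical heart-rate readings with per-sensor weights.
release = dp_weighted_average([60.0, 80.0, 100.0], [1.0, 2.0, 1.0],
                              upper=200.0, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy at the cost of a noisier average.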


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Xiaodong Wang ◽  
Ying Chen ◽  
Yunshu Gao ◽  
Huiqing Zhang ◽  
Zehui Guan ◽  
...  

N-staging is a determining factor for prognostic assessment and decision-making for stage-based cancer therapeutic strategies. Visual inspection of whole-slides of intact lymph nodes is currently the main method used by pathologists to calculate the number of metastatic lymph nodes (MLNs). Moreover, even at the same N stage, the outcome of patients varies dramatically. Here, we propose a deep-learning framework for analyzing lymph node whole-slide images (WSIs) to identify lymph nodes and tumor regions, and then to uncover tumor-area-to-MLN-area ratio (T/MLN). After training, our model’s tumor detection performance was comparable to that of experienced pathologists and achieved similar performance on two independent gastric cancer validation cohorts. Further, we demonstrate that T/MLN is an interpretable independent prognostic factor. These findings indicate that deep-learning models could assist not only pathologists in detecting lymph nodes with metastases but also oncologists in exploring new prognostic factors, especially those that are difficult to calculate manually.
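
Once lymph nodes and tumor regions have been segmented, the T/MLN ratio itself is simple arithmetic. The sketch below assumes a particular data layout that the abstract does not describe: an integer label map in which each lymph node has its own positive id, and a boolean tumor mask of the same shape. The function name `t_mln_ratio` is hypothetical.

```python
import numpy as np

def t_mln_ratio(node_labels, tumor_mask):
    """Return (T/MLN, number of metastatic lymph nodes).

    A lymph node counts as metastatic if any tumor pixel falls inside it.
    T/MLN is the tumor area inside lymph nodes divided by the total area
    of the metastatic lymph nodes, both measured in pixels.
    """
    node_labels = np.asarray(node_labels)
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    tumor_in_nodes = tumor_mask & (node_labels > 0)
    metastatic_ids = np.unique(node_labels[tumor_in_nodes])
    if metastatic_ids.size == 0:
        return 0.0, 0
    mln_area = np.isin(node_labels, metastatic_ids).sum()
    return float(tumor_in_nodes.sum() / mln_area), int(metastatic_ids.size)

# Two lymph nodes of 4 pixels each; only node 1 contains tumor (1 pixel).
labels = [[1, 1, 0, 2, 2],
          [1, 1, 0, 2, 2]]
tumor = [[1, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
ratio, n_mln = t_mln_ratio(labels, tumor)
```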


2020 ◽  
Vol 2020 ◽  
pp. 1-29 ◽  
Author(s):  
Xingxing Xiong ◽  
Shubo Liu ◽  
Dan Li ◽  
Zhaohui Cai ◽  
Xiaoguang Niu

With the advent of the era of big data, privacy issues have become a hot topic of public discussion. Local differential privacy (LDP) is a state-of-the-art privacy preservation technique that allows big data analysis (e.g., statistical estimation, statistical learning, and data mining) to be performed while guaranteeing each individual participant’s privacy. In this paper, we present a comprehensive survey of LDP. We first give an overview of the fundamental knowledge of LDP and its frameworks. We then introduce the mainstream privatization mechanisms and methods in detail from the perspective of the frequency oracle and give insights into recent studies on private basic statistical estimation (e.g., frequency estimation and mean estimation) and complex statistical estimation (e.g., multivariate distribution estimation and private estimation over complex data) under LDP. Furthermore, we present the current state of research on LDP, including private statistical learning/inference, private statistical data analysis, privacy amplification techniques for LDP, and some application fields under LDP. Finally, we identify future research directions and open challenges for LDP. This survey can serve as a good reference source for research on LDP in the various privacy-related scenarios encountered in practice.
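
Frequency estimation under LDP can be illustrated with binary randomized response, the simplest frequency oracle. This is a textbook sketch rather than any specific mechanism from the survey; the function names are hypothetical.

```python
import numpy as np

def randomized_response(bits, epsilon, rng):
    """Each user reports their true bit with probability e^eps / (1 + e^eps)
    and the flipped bit otherwise, which satisfies epsilon-LDP."""
    bits = np.asarray(bits, dtype=bool)
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    keep = rng.random(bits.shape) < p_truth
    return np.where(keep, bits, ~bits)

def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the true fraction of ones.

    E[report] = f * (2p - 1) + (1 - p), so we invert that affine map.
    """
    p_truth = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    return (np.mean(reports) - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)

rng = np.random.default_rng(0)
true_bits = rng.random(200_000) < 0.3      # about 30% of users hold a 1
reports = randomized_response(true_bits, epsilon=1.0, rng=rng)
estimate = estimate_frequency(reports, epsilon=1.0)
```

The collector never sees a user's true bit, yet the debiased mean of the noisy reports approaches the true frequency as the number of users grows.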


2021 ◽  
Author(s):  
Mark Zhao ◽  
Ryosuke Okuno

Equation-of-state (EOS) compositional simulation is commonly used to model the interplay between phase behavior and fluid flow for various reservoir and surface processes. Because of its computational cost, however, there is a critical need for efficient phase-behavior calculations using an EOS. The objective of this research was to develop a proxy model for fugacity coefficient based on the Peng-Robinson EOS for rapid multiphase flash in compositional flow simulation. The proxy model as implemented in this research is to bypass the calculations of fugacity coefficients when the Peng-Robinson EOS has only one root, which is often the case at reservoir conditions. The proxy fugacity model was trained by artificial neural networks (ANN) with over 30 million fugacity coefficients based on the Peng-Robinson EOS. It accurately predicts the Peng-Robinson fugacity coefficient by using four parameters: Am, Bm, Bi, and ΣxiAij. Since these scalar parameters are general, not specific to particular compositions, pressures, and temperatures, the proxy model is as applicable to petroleum engineering applications as the original Peng-Robinson EOS. The proxy model is applied to multiphase flash calculations (phase-split and stability), where the cubic equation solutions and fugacity coefficient calculations are bypassed when the Peng-Robinson EOS has one root. The original fugacity coefficient is analytically calculated when the EOS has more than one root, but this occurs only occasionally at reservoir conditions. A case study shows the proxy fugacity model gave a speed-up factor of 3.4% in comparison to the conventional EOS calculation. Case studies also demonstrate accurate multiphase flash results (stability and phase split) and interchangeable proxy models for different fluid cases with different (numbers of) components. This is possible because it predicts the Peng-Robinson fugacity in the variable space that is not specific to composition, temperature, and pressure. For the same reason, non-zero binary interaction parameters do not impair the applicability, accuracy, robustness, and efficiency of the model. As the proxy models are specific to individual components, a combination of proxy models can be used to model any mixture of components. Tuning of training hyperparameters and the training data sampling method helped reduce the mean absolute percent error to less than 0.1% in the ANN modeling. To the best of our knowledge, this is the first generalized proxy model of the Peng-Robinson fugacity that is applicable to any mixture. The proposed model retains the conventional flash iteration, the convergence robustness, and the option of manual parameter tuning for fluid characterization.
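
The analytical calculation that the proxy model is meant to bypass can be sketched from the standard Peng-Robinson relations. The code below is an illustration, not the authors' code: it solves the usual cubic in the compressibility factor Z, reports how many physical roots exist (the single-root case is where the abstract says the proxy is used), and evaluates the textbook fugacity coefficient expression from the four inputs named in the abstract. Function and argument names are hypothetical; `sum_xa_i` stands for ΣxiAij.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def pr_real_roots(a_m, b_m):
    """Real roots Z > B of the Peng-Robinson cubic
    Z^3 - (1 - B) Z^2 + (A - 3B^2 - 2B) Z - (AB - B^2 - B^3) = 0."""
    coeffs = [1.0,
              -(1.0 - b_m),
              a_m - 3.0 * b_m**2 - 2.0 * b_m,
              -(a_m * b_m - b_m**2 - b_m**3)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-10].real
    return np.sort(real[real > b_m])

def ln_fugacity_coefficient(z, a_m, b_m, b_i, sum_xa_i):
    """Analytical ln(phi_i) for component i in a Peng-Robinson mixture."""
    log_term = np.log((z + (1.0 + SQRT2) * b_m) / (z + (1.0 - SQRT2) * b_m))
    return (b_i / b_m * (z - 1.0)
            - np.log(z - b_m)
            - a_m / (2.0 * SQRT2 * b_m)
            * (2.0 * sum_xa_i / a_m - b_i / b_m) * log_term)

# Dimensionless A, B chosen for illustration only.
single = pr_real_roots(0.1, 0.05)    # one root: the case a proxy would handle
triple = pr_real_roots(0.3, 0.03)    # three roots: analytical path is kept
# Pure-component limit: b_i == b_m and sum_xa_i == a_m.
ln_phi = ln_fugacity_coefficient(single[0], 0.1, 0.05, 0.05, 0.1)
```

In a flash routine, the length of the returned root array would decide whether to call a trained proxy or fall back to `ln_fugacity_coefficient`; that dispatch is left out here because the abstract does not describe the trained network.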


Author(s):  
Cynthia Dwork ◽  
Adam Smith

We motivate and review the definition of differential privacy, survey some results on differentially private statistical estimators, and outline a research agenda. This survey is based on two presentations given by the authors at an NCHS/CDC sponsored workshop on data privacy in May 2008.
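
For reference, the definition in question is usually stated as follows. The notation is an assumption on our part: $\mathcal{M}$ is a randomized mechanism, $D$ and $D'$ are neighboring datasets that differ in one individual's record, and $\varepsilon \ge 0$ is the privacy parameter.

```latex
\[
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S]
\qquad \text{for all neighboring } D, D' \text{ and all measurable } S \subseteq \mathrm{Range}(\mathcal{M}).
\]
```

Smaller values of $\varepsilon$ force the two output distributions closer together, so the presence or absence of any one individual has less influence on what an observer can infer.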


2022 ◽  
Vol 40 (3) ◽  
pp. 1-29
Author(s):  
Peijie Sun ◽  
Le Wu ◽  
Kun Zhang ◽  
Yu Su ◽  
Meng Wang

Review based recommendation utilizes both users’ rating records and the associated reviews for recommendation. Recently, with the growing demand for explanations of recommendation results, reviews are used to train the encoder–decoder models for explanation text generation. As most of the reviews are general text without detailed evaluation, some researchers leveraged auxiliary information of users or items to enrich the generated explanation text. Nevertheless, the auxiliary data is not available in most scenarios and may suffer from data privacy problems. In this article, we argue that the reviews contain abundant semantic information expressing the users’ feelings about various aspects of items, while this information is not fully explored in current explanation text generation tasks. To this end, we study how to generate more fine-grained explanation text in review based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since the aspect is hidden and unlabeled. Besides, it is also very challenging to inject aspect information for generating explanation text with noisy review input. To solve these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn the aspect-aware representation of each review sentence. Thus, users and items can be represented in the aspect space based on their historical associated reviews. After that, we detail how to better predict ratings and generate explanation text with the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and jointly optimize rating prediction accuracy and explanation text generation quality with a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model for both recommendation accuracy and explainability.


Author(s):  
Trupti Vishwambhar Kenekar ◽  
Ajay R. Dani

As Big Data is a collection of structured, unstructured, and semi-structured data gathered from various sources, it is important both to mine it and to protect the privacy of individual data. Differential privacy is one of the best measures, providing a strong privacy guarantee. The chapter proposes differentially private frequent itemset mining using MapReduce, which requires less time to privately mine large datasets. The chapter discusses the problem of preserving data privacy, the challenges of preserving data privacy in a big data environment, and data privacy techniques and their applications to unstructured data. Analyses of experimental results on structured and unstructured datasets are also presented.

