Privacy-Preserving Federated Learning Framework with General Aggregation and Multiparty Entity Matching

2021, Vol 2021, pp. 1-14
Author(s):  
Zhou Zhou ◽  
Youliang Tian ◽  
Changgen Peng

The twin requirements of data sharing and privacy have brought increasing attention to federated learning. However, existing aggregation models are too specialized and pay little attention to the issue of user withdrawal. Moreover, protocols for multiparty entity matching are rarely covered. Thus, there is no systematic framework for performing federated learning tasks. In this paper, we propose a privacy-preserving federated learning framework (PFLF). We first construct a general secure aggregation model for federated learning scenarios by combining Shamir secret sharing with homomorphic encryption, ensuring that the aggregated value can be decrypted correctly only when the number of participants is greater than t. Furthermore, we propose a multiparty entity matching protocol that employs secure multiparty computation to solve entity alignment problems, and a logistic regression algorithm that achieves privacy-preserving model training and supports user withdrawal in vertical federated learning (VFL) scenarios. Finally, security analyses prove that PFLF preserves data privacy in the honest-but-curious model, and experimental evaluations show that PFLF attains accuracy consistent with the original model and is practically feasible.
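
As a loose illustration of the threshold property described above (not the authors' construction, which also involves homomorphic encryption), the sketch below shows Shamir secret sharing over a prime field: a secret split into n shares can be reconstructed only once at least t of them are combined. The field prime, share count, and threshold are illustrative assumptions.

import random

PRIME = 2**61 - 1  # illustrative Mersenne prime defining the finite field

def split_secret(secret, n_shares, t):
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):      # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    # Lagrange interpolation at x = 0; recovers the secret only with >= t shares.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split_secret(123456789, n_shares=5, t=3)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice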

Author(s):  
Harsh Kasyap ◽  
Somanath Tripathy

Clinical trials and drug discovery would not be effective without collaboration among institutions, yet such collaboration has historically come at the cost of individual privacy, and several pacts and compliance regimes have been enforced to avoid data breaches. Because collaboration is indispensable for research advances, existing schemes collect participants' data into a central repository to learn predictions. The current COVID pandemic has put a question mark over this setup, and the existing data repositories have proved obsolete; there is a need for contemporary data collection, processing, and learning. The smartphones and devices held by even the most remote members of society make every individual a potential contributor. This demands a distributed, decentralized collaborative learning system that draws knowledge from every data point. Federated Learning [21], proposed by Google, introduces in-place model training that keeps data on the device. Although privacy-preserving by design, it is susceptible to inference, poisoning, and Sybil attacks. Blockchain is a decentralized programming paradigm that gives broader control over the system and makes it attack resistant, but it poses challenges of high computing power, storage, and latency. Together, these emerging technologies can realize the desired learning system, provided their security and efficiency issues are addressed. This article systematizes the security issues in Federated Learning, the corresponding mitigation strategies, and the challenges of Blockchain. Further, a Blockchain-based Federated Learning architecture with two layers of participation is presented, which improves global model accuracy and guarantees participant privacy. It leverages Blockchain's channel mechanism for parallel model training and distribution, and it uses the Blockchain to establish decentralized trust between participants and gateways, helping to retain only honest participants.
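
A minimal federated-averaging sketch of the in-place training idea mentioned above (a generic FedAvg illustration with an assumed linear model and synthetic data, not the two-layer Blockchain architecture proposed in the article): each participant trains on its own data, and only model weights are shared for aggregation.

import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    # One participant: a few gradient steps of linear regression on local data only.
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(global_w, clients):
    # Server side: average the returned weights (weighted by local sample counts)
    # without ever seeing the participants' raw data.
    updates = [(local_update(global_w, X, y), len(y)) for X, y in clients]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):                  # ten communication rounds
    w = federated_round(w, clients)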


2022, Vol 25 (3), pp. 18-22
Author(s):  
Ticao Zhang ◽  
Shiwen Mao

With growing concern over data privacy and security, it is undesirable to collect data from all users in order to perform machine learning tasks. Federated learning, a decentralized learning framework, was proposed to construct a shared prediction model while keeping owners' data on their own devices. This paper presents an introduction to the emerging federated learning standard and discusses its various aspects, including i) an overview of federated learning, ii) types of federated learning, iii) major concerns and performance evaluation criteria for federated learning, and iv) associated regulatory requirements. The purpose of this paper is to provide an understanding of the standard and to facilitate its use in building models across organizations while meeting privacy and security concerns.


Author(s):  
Wanlu Zhang ◽  
Qigang Wang ◽  
Mei Li

Background: As artificial intelligence and big data analysis develop rapidly, data privacy, especially the privacy of patient medical data, is receiving increasing attention. Objective: To strengthen the protection of private data while still enabling model training, this article introduces a multi-Blockchain-based decentralized collaborative machine learning training method for medical image analysis, with which researchers from different medical institutions can collaborate to train models without exchanging sensitive patient data. Method: A partial parameter update method is applied to prevent indirect privacy leakage during model propagation. Through peer-to-peer communication in the multi-Blockchain system, a machine learning task can leverage auxiliary information from a similar task on another Blockchain. In addition, after collaborative training, personalized models are trained for the different medical institutions. Results: The experimental results show that our method achieves performance similar to the centralized approach of training on the pooled datasets of all participants while preventing private data leakage. Transferring auxiliary information from a similar task on another Blockchain has also been shown to effectively accelerate model convergence and improve model accuracy, especially when local data are scarce. The personalization step further improves model performance. Conclusion: Our approach can effectively help researchers from different organizations to train models collaboratively without disclosing their private data.
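
The partial-parameter-update idea can be sketched roughly as follows (the shared fraction, the top-k selection rule, and the averaging merge are assumptions for illustration, not the article's exact procedure): each institution publishes only a subset of its locally updated parameters, so the full model state never leaves the institution.

import numpy as np

def select_partial_update(local_delta, shared_fraction=0.25):
    # Share only the fraction of parameters with the largest local change;
    # the remaining parameters stay inside the institution (rule is illustrative).
    flat = local_delta.ravel()
    k = max(1, int(shared_fraction * flat.size))
    idx = np.argsort(np.abs(flat))[-k:]          # top-k entries by magnitude
    return idx, flat[idx]

def merge_partial_update(global_params, idx, values):
    # A peer merges the received subset into its own copy of the model.
    merged = global_params.ravel().copy()
    merged[idx] = (merged[idx] + values) / 2.0   # simple averaging merge
    return merged.reshape(global_params.shape)

local_delta = np.random.randn(4, 8)              # stand-in for one weight update
idx, vals = select_partial_update(local_delta)
updated = merge_partial_update(np.zeros((4, 8)), idx, vals)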


2021, Vol 4 (1)
Author(s):  
Qi Dou ◽  
Tiffany Y. So ◽  
Meirui Jiang ◽  
Quande Liu ◽  
Varut Vardhanabhuti ◽  
...  

Data privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions, a prerequisite for robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19-related CT abnormalities, with external validation on patients from a multinational study. We recruited 132 patients from seven centers in multiple countries: three internal hospitals in Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden in hospitalized COVID-19 patients. We explore federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries, overcoming the burden of centrally aggregating large amounts of sensitive data.


2021, Vol 13 (4), pp. 94
Author(s):  
Haokun Fang ◽  
Quan Qian

Privacy protection has become an important concern amid the great success of machine learning. This paper proposes a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients encrypted under homomorphic encryption. In experiments, the model trained by PFMLP has almost the same accuracy as the model trained without encryption, with a deviation of less than 1%. To address the computational overhead of homomorphic encryption, we use an improved Paillier algorithm that can speed up training by 25–28%. Moreover, the effects of encryption key length, learning network structure, number of learning clients, etc., are also discussed in detail in the paper.
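
A toy sketch of the additive homomorphism that lets a server aggregate encrypted gradients without decrypting them (the tiny hardcoded primes and integer-encoded gradients are assumptions for readability; this is textbook Paillier, not the paper's improved variant):

from math import gcd
import random

# Toy Paillier keypair with tiny fixed primes (illustration only, not secure).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)      # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)       # inverse of L(g^lam mod n^2)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: the product of ciphertexts decrypts to the sum of the
# plaintexts, so a server can aggregate encrypted gradients it cannot read.
grads = [17, 42, 5]                                # integer-encoded gradients
aggregate = 1
for gradient in grads:
    aggregate = (aggregate * encrypt(gradient)) % n2
assert decrypt(aggregate) == sum(grads)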


2018, Vol 2018, pp. 1-10
Author(s):  
Hua Dai ◽  
Hui Ren ◽  
Zhiye Chen ◽  
Geng Yang ◽  
Xun Yi

Outsourcing data to the cloud is being adopted by more and more companies and individuals because of the benefits of data sharing and parallel, elastic, on-demand computing. However, it forces data owners to relinquish control of their own data, which raises privacy concerns for sensitive data. Sorting is a common operation in many areas, such as machine learning, service recommendation, and data query, and implementing privacy-preserving sorting over encrypted data without leaking sensitive information is challenging. In this paper, we propose privacy-preserving sorting algorithms based on the logistic map. Secure comparable codes are constructed from logistic map functions and can be used to compare the corresponding encrypted data items without knowing their plaintext values. Data owners first encrypt their data, generate the corresponding comparable codes, and then outsource both to the cloud. Cloud servers can then sort the outsourced encrypted data according to the comparable codes using the proposed privacy-preserving sorting algorithms. Security analysis and experimental results show that the proposed algorithms protect data privacy while providing efficient sorting over encrypted data.
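
As a rough, simplified illustration of how a keyed logistic map sequence could yield codes that are comparable without revealing plaintexts (this is not the paper's construction; the seed, the increment rule, and the cumulative-sum encoding are assumptions):

def logistic_stream(seed, r=3.99):
    # Keyed chaotic sequence from the logistic map x_{k+1} = r * x_k * (1 - x_k).
    x = seed
    while True:
        x = r * x * (1 - x)
        yield x

def comparable_code(value, seed):
    # Toy order-preserving code: the cumulative sum of strictly positive,
    # key-dependent increments, so code(a) < code(b) whenever a < b.
    stream = logistic_stream(seed)
    code = 0.0
    for _ in range(value):
        code += 0.5 + next(stream)
    return code

seed = 0.37                                   # shared secret in (0, 1)
codes = {v: comparable_code(v, seed) for v in (12, 7, 30)}
# A cloud server can sort by the codes alone, never seeing the plaintext values.
assert sorted(codes, key=codes.get) == [7, 12, 30]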


2021, Vol 2083 (3), pp. 032059
Author(s):  
Qiang Chen ◽  
Meiling Deng

Regression algorithms are commonly used in machine learning. Building on encryption and privacy-protection methods, this paper studies currently popular regression algorithms together with homomorphic encryption and proposes a PPLAR-based algorithm. The correlation between data items is obtained with the logistic regression formula. The algorithm is distributed and parallelized on the Hadoop platform to improve cluster computing speed while maintaining the algorithm's mean absolute error.
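
A single-machine sketch of the map/reduce split that such a parallelized logistic regression could follow (the partition layout, learning rate, and synthetic data are assumptions, and the PPLAR privacy layer is omitted): each partition computes its local gradient contribution, and a reduce step sums them before the descent update.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_gradient(w, X, y):
    # "Map" phase: each data partition computes its local gradient contribution.
    return X.T @ (sigmoid(X @ w) - y)

def reduce_step(w, partitions, lr=0.1):
    # "Reduce" phase: sum the per-partition gradients, then take one descent step.
    total_grad = sum(map_gradient(w, X, y) for X, y in partitions)
    n = sum(len(y) for _, y in partitions)
    return w - lr * total_grad / n

rng = np.random.default_rng(1)
partitions = [(rng.normal(size=(100, 4)), rng.integers(0, 2, size=100))
              for _ in range(3)]
w = np.zeros(4)
for _ in range(200):            # iterative gradient descent over all partitions
    w = reduce_step(w, partitions)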


2022, Vol 40 (3), pp. 1-29
Author(s):  
Peijie Sun ◽  
Le Wu ◽  
Kun Zhang ◽  
Yu Su ◽  
Meng Wang

Review-based recommendation utilizes both users' rating records and the associated reviews. Recently, with the growing demand for explanations of recommendation results, reviews have been used to train encoder–decoder models for explanation text generation. As most reviews are general text without detailed evaluation, some researchers have leveraged auxiliary information about users or items to enrich the generated explanation text. Nevertheless, such auxiliary data is unavailable in most scenarios and may suffer from data privacy problems. In this article, we argue that reviews contain abundant semantic information expressing users' feelings about various aspects of items, yet this information is not fully exploited in the current explanation text generation task. To this end, we study how to generate more fine-grained explanation text in review-based recommendation without any auxiliary data. Though the idea is simple, it is non-trivial since aspects are hidden and unlabeled. It is also very challenging to inject aspect information when generating explanation text from noisy review input. To address these challenges, we first leverage an advanced unsupervised neural aspect extraction model to learn an aspect-aware representation of each review sentence, so that users and items can be represented in the aspect space based on their historical associated reviews. We then detail how to better predict ratings and generate explanation text using the user and item representations in the aspect space. We further dynamically assign larger weights to review sentences that contain a larger proportion of aspect words to control the text generation process, and we jointly optimize rating prediction accuracy and explanation text generation quality within a multi-task learning framework. Finally, extensive experimental results on three real-world datasets demonstrate the superiority of our proposed model in both recommendation accuracy and explainability.
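
A back-of-the-envelope sketch of the sentence weighting and joint objective described above (the normalization, the balance coefficient alpha, and the plain weighted sum are assumptions; the neural aspect extractor and the generation model are not reproduced):

import numpy as np

def sentence_weights(aspect_word_counts, sentence_lengths):
    # Review sentences with a larger proportion of aspect words get larger weights.
    ratios = np.asarray(aspect_word_counts, dtype=float) / np.asarray(sentence_lengths)
    return ratios / ratios.sum()

def multi_task_loss(rating_loss, generation_losses, weights, alpha=0.5):
    # Joint objective: rating-prediction loss plus the weighted generation loss.
    weighted_generation = float(np.dot(weights, generation_losses))
    return alpha * rating_loss + (1.0 - alpha) * weighted_generation

w = sentence_weights(aspect_word_counts=[3, 0, 1], sentence_lengths=[10, 8, 12])
loss = multi_task_loss(rating_loss=0.42, generation_losses=[2.1, 3.0, 2.6], weights=w)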

