Stochastic Channel-Based Federated Learning With Neural Network Pruning for Medical Data Privacy Preservation: Model Development and Experimental Validation (Preprint)

2019 ◽  
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Ziwei Chen ◽  
Hui Liu ◽  
Dianbo Liu

BACKGROUND Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to retain control over their sensitive information during both training and use. OBJECTIVE To address security and privacy issues, we propose a privacy-preserving method for the analysis of distributed medical data. The proposed method, termed stochastic channel-based federated learning (SCBFL), enables participants to train a high-performance model cooperatively and in a distributed manner without sharing their inputs. METHODS We designed, implemented, and evaluated a channel-based update algorithm for a central server in a distributed system. The update algorithm selects the channels corresponding to the most active features in a training loop and then uploads them as learned information from local datasets. A pruning process, which serves as a model accelerator, was further applied to the algorithm based on the validation set. RESULTS We constructed a distributed system consisting of 5 clients and 1 server. Our trials showed that the SCBFL method can achieve an area under the receiver operating characteristic curve (AUC-ROC) of 0.9776 and an area under the precision-recall curve (AUC-PR) of 0.9695 with only 10% of channels shared with the server. Compared with the federated averaging algorithm, the proposed SCBFL method achieved a 0.05388 higher AUC-ROC and a 0.09695 higher AUC-PR. In addition, our experiment showed that the pruning process saved 57% of the time with a reduction of only 0.0047 in AUC-ROC and 0.0068 in AUC-PR. CONCLUSIONS In this experiment, our model demonstrated better performance and reached saturation faster than the federated averaging method, which reveals all of the parameters of local models to the server. The saturation rate of performance could be increased by introducing a pruning process, and further improvement could be achieved by tuning the pruning rate.
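The channel-based update described in the METHODS above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: activity is measured as the L2 norm of each channel's weight update, the top fraction is picked deterministically (the method's name suggests the published selection rule has a stochastic element), and the function names `select_top_channels`, `sparse_update`, and `server_aggregate` are hypothetical.

```python
import math


def channel_activity(delta):
    """L2 norm of each channel's weight update (assumed activity measure).

    `delta` is a list of channels, each channel a list of float updates.
    """
    return [math.sqrt(sum(w * w for w in channel)) for channel in delta]


def select_top_channels(delta, share_rate):
    """Indices of the most active channels; at least one is always kept."""
    k = max(1, round(share_rate * len(delta)))
    scores = channel_activity(delta)
    ranked = sorted(range(len(delta)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def sparse_update(delta, share_rate):
    """What a client would upload: updates for the selected channels only."""
    return {i: delta[i] for i in select_top_channels(delta, share_rate)}


def server_aggregate(global_weights, client_updates):
    """Add the per-channel mean of the received updates to the global model.

    Channels that no client uploaded are left unchanged.
    """
    new_weights = [list(channel) for channel in global_weights]
    for i in range(len(global_weights)):
        received = [u[i] for u in client_updates if i in u]
        if not received:
            continue
        for j in range(len(new_weights[i])):
            new_weights[i][j] += sum(r[j] for r in received) / len(received)
    return new_weights
```

With `share_rate=0.1`, each client in this sketch reveals one channel in ten to the server, which corresponds to the 10% sharing level reported in the RESULTS; the remaining local parameters never leave the client.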

10.2196/17265 ◽  
2020 ◽  
Vol 4 (12) ◽  
pp. e17265
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Ziwei Chen ◽  
Hui Liu ◽  
Dianbo Liu

Background Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to retain control over their sensitive information during both training and use. Objective To address security and privacy issues, we propose a privacy-preserving method for the analysis of distributed medical data. The proposed method, termed stochastic channel-based federated learning (SCBFL), enables participants to train a high-performance model cooperatively and in a distributed manner without sharing their inputs. Methods We designed, implemented, and evaluated a channel-based update algorithm for a central server in a distributed system. The update algorithm selects the channels corresponding to the most active features in a training loop and then uploads them as learned information from local datasets. A pruning process, which serves as a model accelerator, was further applied to the algorithm based on the validation set. Results We constructed a distributed system consisting of 5 clients and 1 server. Our trials showed that the SCBFL method can achieve an area under the receiver operating characteristic curve (AUC-ROC) of 0.9776 and an area under the precision-recall curve (AUC-PR) of 0.9695 with only 10% of channels shared with the server. Compared with the federated averaging algorithm, the proposed SCBFL method achieved a 0.05388 higher AUC-ROC and a 0.09695 higher AUC-PR. In addition, our experiment showed that the pruning process saved 57% of the time with a reduction of only 0.0047 in AUC-ROC and 0.0068 in AUC-PR. Conclusions In this experiment, our model demonstrated better performance and reached saturation faster than the federated averaging method, which reveals all of the parameters of local models to the server. The saturation rate of performance could be increased by introducing a pruning process, and further improvement could be achieved by tuning the pruning rate.


2019 ◽  
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Hui Liu ◽  
Dianbo Liu

BACKGROUND Artificial neural networks have achieved unprecedented success in a wide variety of tasks such as classifying, predicting, and recognizing objects. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to retain control over their sensitive information during both training and use. OBJECTIVE To address this problem, we propose a privacy-preserving method for distributed systems. The proposed method, Stochastic Channel-Based Federated Learning (SCBF), enables the participants to train a high-performance model cooperatively without sharing their inputs. METHODS Specifically, we design, implement, and evaluate a channel-based update algorithm for the central server in a distributed system. The update algorithm selects the channels corresponding to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process, which serves as a model accelerator, is applied to the algorithm based on the validation set. RESULTS We construct a distributed system consisting of 5 clients and 1 server. Our trials show that the Stochastic Channel-Based Federated Learning method can achieve an AUCROC of 0.9776 and an AUCPR of 0.9695 with 10% of channels shared with the server. Compared with the Federated Averaging algorithm, the proposed method achieves a 0.05388 higher AUCROC and a 0.09695 higher AUCPR. In addition, our experiment shows that the pruning process saves 57% of the time with a reduction of only 0.0047 in AUCROC and 0.0068 in AUCPR. CONCLUSIONS In the experiment, our model shows better performance and reaches saturation faster than the Federated Averaging method, which reveals all the parameters of local models to the server. We also demonstrate that the saturation rate of performance could be increased by introducing a pruning process and that further improvement could be achieved by tuning the pruning rate.


Author(s):  
Mahmoud Barhamgi ◽  
Djamal Benslimane ◽  
Chirine Ghedira ◽  
Brahim Medjahed

Recent years have witnessed a growing interest in using Web services as a reliable means for medical data sharing inside and across healthcare organizations. In such service-based data sharing environments, Web service composition emerged as a viable approach to query data scattered across independent locations. Patient data privacy preservation is an important aspect that must be considered when composing medical Web services. In this paper, the authors show how data privacy can be preserved when composing and executing Web services. Privacy constraints are expressed in the form of RDF queries over a mediated ontology. Query rewriting algorithms are defined to process those queries while preserving users’ privacy.


Sensors ◽  
2019 ◽  
Vol 19 (9) ◽  
pp. 2109
Author(s):  
Liming Fang ◽  
Minghui Li ◽  
Lu Zhou ◽  
Hanyi Zhang ◽  
Chunpeng Ge

A smart watch is an emerging kind of wearable device in the Internet of Things. Security and privacy problems are the main obstacles that hinder the wide deployment of smart watches. Existing security mechanisms do not achieve a balance between privacy preservation and data access control. In this paper, we propose a fine-grained privacy-preserving access control architecture for smart watches (FPAS). In FPAS, we leverage an identity-based authentication scheme to protect the devices from malicious connections and policy-based access control for data privacy preservation. The core policy of FPAS is two-fold: (1) utilizing a homomorphic and re-encrypted scheme to ensure that the ciphertext information can be correctly calculated; (2) dividing data requesters by their attributes to avoid unauthorized access. We present a concrete scheme based on the above prototype and analyze the security of FPAS. The performance evaluation demonstrates that the FPAS scheme is efficient, practical, and extensible.


2021 ◽  
Author(s):  
Rohit Ravindra Nikam ◽  
Rekha Shahapurkar

Data mining is a technique for extracting useful information from large data sets. Privacy protection in data mining is about hiding sensitive information or identities without losing data usability. Sensitive data contain confidential information about individuals, businesses, and governments, which must not be shared or published without their agreement. Preserving privacy in data mining has become a critical research area. Various evaluation metrics, such as time efficiency, data utility, and degree of complexity or resistance to data mining techniques, are used to assess the privacy preservation of data mining techniques. Social media and smartphones produce vast amounts of data every minute. The voluminous data produced by these different sources can be processed and analyzed to support decision making, but data analytics is vulnerable to breaches of privacy. One such data analytics framework is the recommendation system, commonly used by e-commerce sites such as Amazon and Flipkart to recommend items to customers based on their purchasing habits, which leads to customers being characterized. This paper presents various privacy preservation techniques used by existing researchers, such as data anonymization, data randomization, generalization, and data permutation. We also analyze the gaps between various processes and privacy preservation methods and illustrate how to overcome such issues with new methods. Finally, we summarize the outcomes of the literature reviewed.


2021 ◽  
Vol 13 (10) ◽  
pp. 247
Author(s):  
Baocheng Wang ◽  
Zetao Li

Recently, with the great development of e-health, more and more countries have made certain achievements in the field of electronic medical treatment. The digitization of medical equipment and the structuralization of electronic medical records are the general trends. While bringing convenience to people, the explosive growth of medical data will further promote the value of mining medical data. Obviously, finding out how to safely store such a large amount of data is a problem that urgently needs to be solved. Additionally, the particularity of medical data makes it necessarily subject to great privacy protection needs. This reinforces the importance of designing a safe solution to ensure data privacy. Many existing schemes are based on single-server architecture, which have some natural defects (such as single-point faults). Although blockchain can help solve such problems, there are still some deficiencies in privacy protection. To solve these problems, this paper designs a medical data privacy protection system, which integrates blockchain, group signature, and asymmetric encryption to realize reliable medical data sharing between medical institutions and protect the data privacy of patients. This paper proves theoretically that it meets our security and privacy requirements, and proves its practicability through system implementation.


2014 ◽  
Vol 8 (1) ◽  
pp. 13-21 ◽  
Author(s):  
ARKADIUSZ LIBER

Introduction: Medical documentation must be protected against damage or loss while maintaining its integrity and credibility, must remain permanently accessible to authorized staff, and must be protected against access by unauthorized persons. Anonymization is one of the methods of safeguarding data against disclosure. Aim of the study: The study aims to analyze methods of anonymization and methods of protecting anonymized data, and to study a new type of privacy protection that enables the entity the data concern to control sensitive data. Material and methods: Analytical and algebraic methods were used. Results: The study should deliver materials supporting the choice and analysis of ways of anonymizing medical data, and develop a new privacy protection solution enabling control of sensitive data by the entities the data concern. Conclusions: In the paper, an analysis of data anonymization solutions used for medical data privacy protection was conducted. Methods such as k-Anonymity, (X,y)-Anonymity, (a,k)-Anonymity, (k,e)-Anonymity, (X,y)-Privacy, LKC-Privacy, l-Diversity, (X,y)-Linkability, t-Closeness, Confidence Bounding, and Personalized Privacy were described, explained, and analyzed. An analysis of solutions for the control of sensitive data by their owners was also conducted. Apart from the existing methods of anonymization, methods of protecting anonymized data were analyzed, in particular d-Presence, e-Differential Privacy, (d,g)-Privacy, (a,b)-Distributing Privacy, and protections against (c,t)-Isolation. The author introduced a new solution for the controlled protection of privacy. The solution is based on marking a protected field and multi-key encryption of the sensitive value. The suggested way of marking fields conforms to the XML standard. For encryption, an (n,p) different-key cipher was selected. To decipher the content, p of the n keys are used. The proposed solution makes it possible to apply new methods for controlling privacy when disclosing sensitive data.
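As an illustration of the first method in the list above, the sketch below measures k-anonymity under the standard definition: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The field names (`age`, `zip`, `diagnosis`), the sample records, and the helper name `k_anonymity` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    This equals the size of the smallest group of records that share the
    same values on all quasi-identifier fields (0 for an empty table).
    """
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already generalized records: age bands and masked ZIP codes.
records = [
    {"age": "30-39", "zip": "476**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "476**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "479**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "479**", "diagnosis": "diabetes"},
]

k = k_anonymity(records, ["age", "zip"])  # each (age, zip) pair occurs twice
```

A single record with a unique quasi-identifier combination drops the result to 1, which is why generalization or suppression is usually applied before a table is released.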


Cyber Crime ◽  
2013 ◽  
pp. 310-324
Author(s):  
Mahmoud Barhamgi ◽  
Djamal Benslimane ◽  
Chirine Ghedira ◽  
Brahim Medjahed

Recent years have witnessed a growing interest in using Web services as a reliable means for medical data sharing inside and across healthcare organizations. In such service-based data sharing environments, Web service composition emerged as a viable approach to query data scattered across independent locations. Patient data privacy preservation is an important aspect that must be considered when composing medical Web services. In this paper, the authors show how data privacy can be preserved when composing and executing Web services. Privacy constraints are expressed in the form of RDF queries over a mediated ontology. Query rewriting algorithms are defined to process those queries while preserving users’ privacy.


Author(s):  
Ramani Selvanambi ◽  
Samarth Bhutani ◽  
Komal Veauli

In the past, the healthcare data related to each patient was limited. It was stored and controlled by the hospital authorities and was seldom regulated. With the increase in awareness and technology, the amount of medical data per person has increased exponentially. All this data is essential for the correct diagnosis of the patient. Patients also want access to their data to seek medical advice from different doctors. This raises several challenges, such as security, privacy, and data regulation. As health-related data are privacy-sensitive, the increase in data stored increases the risk of data exposure. Data availability and privacy are both essential in healthcare. The availability of correct information is critical for the treatment of the patient, and information that patients cannot easily access complicates seeking medical advice from different hospitals. However, if data is easily accessible to everyone, privacy and security become difficult to maintain. Using blockchains to store and secure data will not only ensure data privacy but will also provide a common method of data regulation.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Zhe Ding ◽  
Zhen Qin ◽  
Zhiguang Qin

Data mining techniques are applied to identify hidden patterns in large amounts of patient data. These patterns can assist physicians in making more accurate diagnoses. Because patients differ in physical condition, the same physiological index corresponds to a different symptom association probability for each patient. Data mining technologies based on certain data cannot be directly applied to these patients’ data. Patient data are sensitive data. An adversary with sufficient background information can make use of the patterns mined from uncertain medical data to obtain the sensitive information of patients. In this paper, a new algorithm is presented to determine the top K most frequent itemsets from uncertain medical data and to protect data privacy. Based on traditional algorithms for mining frequent itemsets from uncertain data, our algorithm applies the sparse vector algorithm and the Laplace mechanism to ensure differential privacy for the top K most frequent itemsets for uncertain medical data and the expected supports of these frequent itemsets. We prove that our algorithm can guarantee differential privacy in theory. Moreover, we carry out experiments with four real-world scenario datasets and two synthetic datasets. The experimental results demonstrate the performance of our algorithm.
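Two ingredients named in the abstract above, expected support over uncertain data and the Laplace mechanism, can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: each transaction is modeled as a mapping from item to existence probability, item probabilities are treated as independent, the sensitivity is assumed to be 1 (one transaction contributes at most 1 to an expected support), and the sparse vector step and top K selection are left out. All function names are hypothetical.

```python
import math
import random


def expected_support(itemset, transactions):
    """Expected support of `itemset` over uncertain transactions.

    Each transaction maps item -> existence probability; a missing item
    has probability 0. Item probabilities are assumed to be independent.
    """
    return sum(
        math.prod(transaction.get(item, 0.0) for item in itemset)
        for transaction in transactions
    )


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent Exp(1) variables follows a
    standard Laplace distribution, which is then scaled.
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_expected_support(itemset, transactions, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: true value plus noise of scale sensitivity / epsilon."""
    true_value = expected_support(itemset, transactions)
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical uncertain transactions: symptom -> probability of presence.
transactions = [
    {"fever": 0.5, "cough": 0.5},
    {"fever": 1.0},
    {"fever": 0.2, "cough": 1.0},
]
rng = random.Random(0)
released = noisy_expected_support(("fever", "cough"), transactions, 1.0, rng)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy. Releasing several supports consumes privacy budget for each release, which a complete algorithm has to account for.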

