Privacy Preserving: Stochastic Channel-Based Federated Learning with Neural Network Pruning (Preprint)

2019 ◽  
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Hui Liu ◽  
Dianbo Liu

BACKGROUND Artificial neural networks have achieved unprecedented success in a wide variety of tasks such as classifying, predicting and recognizing objects. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to retain control over their sensitive information during both training and use. OBJECTIVE To address this problem, we propose a privacy-preserving method for distributed systems. The proposed method, Stochastic Channel-Based Federated Learning (SCBF), enables the participants to train a high-performance model cooperatively without sharing their inputs. METHODS Specifically, we design, implement and evaluate a channel-based update algorithm for the central server in a distributed system. The update algorithm selects the channels corresponding to the most active features in a training loop and uploads them as learned information from local datasets. A pruning process, which serves as a model accelerator, is applied to the algorithm based on the validation set. RESULTS We construct a distributed system consisting of 5 clients and 1 server. Our trials show that the Stochastic Channel-Based Federated Learning method can achieve an AUCROC of 0.9776 and an AUCPR of 0.9695 with 10% of channels shared with the server. Compared with the Federated Averaging algorithm, the proposed method achieves an AUCROC that is 0.05388 higher and an AUCPR that is 0.09695 higher. In addition, our experiment shows that 57% of the time is saved by the pruning process, with a reduction of only 0.0047 in AUCROC and 0.0068 in AUCPR. CONCLUSIONS In the experiment, our model shows better performance and a higher saturating speed than the Federated Averaging method, which reveals all the parameters of local models to the server. 
We also demonstrate that the saturating rate of performance could be promoted by introducing a pruning process and further improvement could be achieved by tuning the pruning rate.
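
As a rough illustration of the channel-based update described in this abstract, the sketch below keeps only the most active fraction of channels from a local training loop for upload. The activity score (L1 norm of each channel's weight change), the name `select_channels`, and the deterministic top-k choice are assumptions made for illustration; the abstract does not state the scoring rule or how the stochastic selection is carried out.

```python
import math

def select_channels(old_weights, new_weights, share_rate=0.10):
    """Return {channel_index: update_vector} for the most active channels.

    old_weights / new_weights: list of channels, each a list of floats.
    Activity is ASSUMED to be the L1 norm of the per-channel weight change.
    """
    # Per-channel change produced by the local training loop.
    updates = [[n - o for n, o in zip(new_ch, old_ch)]
               for new_ch, old_ch in zip(new_weights, old_weights)]
    activity = [sum(abs(d) for d in upd) for upd in updates]
    # Number of channels to share (at least one).
    k = max(1, math.ceil(share_rate * len(updates)))
    top = sorted(range(len(updates)), key=lambda i: activity[i], reverse=True)[:k]
    return {i: updates[i] for i in sorted(top)}

# Hypothetical toy model: 10 channels with 2 weights each.
old = [[0.0, 0.0] for _ in range(10)]
new = [[0.01 * i, -0.01 * i] for i in range(10)]
shared = select_channels(old, new, share_rate=0.10)  # only channel 9 is uploaded
```

With a 10% share rate, one of the ten toy channels is uploaded; the remaining local parameters stay on the client.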

2019 ◽  
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Ziwei Chen ◽  
Hui Liu ◽  
Dianbo Liu

BACKGROUND Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to take control over their sensitive information during both the training and using processes. OBJECTIVE To address security and privacy issues, we propose a privacy-preserving method for the analysis of distributed medical data. The proposed method, termed stochastic channel-based federated learning (SCBFL), enables participants to train a high-performance model cooperatively and in a distributed manner without sharing their inputs. METHODS We designed, implemented, and evaluated a channel-based update algorithm for a central server in a distributed system. The update algorithm will select the channels with regard to the most active features in a training loop, and then upload them as learned information from local datasets. A pruning process, which serves as a model accelerator, was further applied to the algorithm based on the validation set. RESULTS We constructed a distributed system consisting of 5 clients and 1 server. Our trials showed that the SCBFL method can achieve an area under the receiver operating characteristic curve (AUC-ROC) of 0.9776 and an area under the precision-recall curve (AUC-PR) of 0.9695 with only 10% of channels shared with the server. Compared with the federated averaging algorithm, the proposed SCBFL method achieved a 0.05388 higher AUC-ROC and 0.09695 higher AUC-PR. In addition, our experiment showed that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUC-ROC performance and a reduction of 0.0068 in AUC-PR performance. CONCLUSIONS In this experiment, our model demonstrated better performance and a higher saturating speed than the federated averaging method, which reveals all of the parameters of local models to the server. 
The saturation rate of performance could be promoted by introducing a pruning process and further improvement could be achieved by tuning the pruning rate.


10.2196/17265 ◽  
2020 ◽  
Vol 4 (12) ◽  
pp. e17265
Author(s):  
Rulin Shao ◽  
Hongyu He ◽  
Ziwei Chen ◽  
Hui Liu ◽  
Dianbo Liu

Background Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to take control over their sensitive information during both the training and using processes. Objective To address security and privacy issues, we propose a privacy-preserving method for the analysis of distributed medical data. The proposed method, termed stochastic channel-based federated learning (SCBFL), enables participants to train a high-performance model cooperatively and in a distributed manner without sharing their inputs. Methods We designed, implemented, and evaluated a channel-based update algorithm for a central server in a distributed system. The update algorithm will select the channels with regard to the most active features in a training loop, and then upload them as learned information from local datasets. A pruning process, which serves as a model accelerator, was further applied to the algorithm based on the validation set. Results We constructed a distributed system consisting of 5 clients and 1 server. Our trials showed that the SCBFL method can achieve an area under the receiver operating characteristic curve (AUC-ROC) of 0.9776 and an area under the precision-recall curve (AUC-PR) of 0.9695 with only 10% of channels shared with the server. Compared with the federated averaging algorithm, the proposed SCBFL method achieved a 0.05388 higher AUC-ROC and 0.09695 higher AUC-PR. In addition, our experiment showed that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUC-ROC performance and a reduction of 0.0068 in AUC-PR performance. Conclusions In this experiment, our model demonstrated better performance and a higher saturating speed than the federated averaging method, which reveals all of the parameters of local models to the server. 
The saturation rate of performance could be promoted by introducing a pruning process and further improvement could be achieved by tuning the pruning rate.


Author(s):  
Neelu Khare ◽  
Kumaran U.

The tremendous growth of social networking systems enables the active participation of a wide variety of users. This has led to an increase in security and privacy concerns. To address this issue, the article defines a secure and privacy-preserving approach to protect user data across Cloud-based online social networks. The proposed approach models social networks as a directed graph, such that a user can share sensitive information with another user only if there exists a directed edge from the first user to the second. Data are shared efficiently among connected users using attribute-based encryption (ABE) with different data access levels. The proposed ABE technique makes use of a trapdoor function to re-encrypt the data without the use of proxy re-encryption techniques. Experimental evaluation shows that the proposed approach provides comparatively better results than the existing techniques.


2012 ◽  
Vol 21 (01) ◽  
pp. 1250009 ◽  
Author(s):  
YOUWEN ZHU ◽  
LIUSHENG HUANG ◽  
TSUYOSHI TAKAGI ◽  
MINGWU ZHANG

Recently, privacy concerns have received growing attention, and how to protect privacy-sensitive information from being disclosed in distributed cooperative computation has become a significant topic. In this paper, we first propose a novel, general privacy-preserving online analytical processing model based on secure multiparty computation. Then, based on the new model, we propose two schemes for privacy-preserving count aggregate queries over both horizontally partitioned data and vertically partitioned data. Additionally, we propose several efficient subprotocols that serve as basic secure building blocks. Furthermore, we analyze the correctness, security, communication cost, and computation complexity of our proposed protocols, and show that the new schemes are secure, have linear complexity, and return exactly accurate query results.
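
To make the idea of a privacy-preserving count over horizontally partitioned data concrete, here is a minimal sketch using generic additive secret sharing. This is a textbook technique swapped in for illustration, not the protocol proposed in the paper; the names `share`, `secure_count`, and `MODULUS` are hypothetical, and all parties are simulated in a single process.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus, larger than any possible total count

def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

def secure_count(local_counts):
    """Each party shares its local count; only summed shares are revealed."""
    n = len(local_counts)
    # shares_matrix[i][j] is the share that party i sends to party j.
    shares_matrix = [share(c, n) for c in local_counts]
    # Each party j publishes only the sum of the shares it received.
    partial_sums = [sum(shares_matrix[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    return sum(partial_sums) % MODULUS

# Hypothetical local counts held by three data owners.
total = secure_count([120, 45, 310])
```

The combined result equals the plain sum of the local counts, so the query answer stays exact, while no single share reveals an individual party's count.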


Author(s):  
Priya Ranjan ◽  
Raj Kumar Paul

With the increase of digital data on servers, different data mining approaches are applied to retrieve interesting information for decision making. A major social concern of data mining is the issue of privacy and data security. Privacy-preserving mining has therefore come into existence, as it covers data mining algorithms that do not disclose sensitive information. This work provides privacy for sensitive rules that discriminate data on the basis of community, gender, country, etc. Rules are obtained by the Apriori algorithm of association rule mining. Rules that contain a sensitive item set and meet a minimum threshold value are considered sensitive. A perturbation technique is used to hide sensitive rules. Large databases are now a major issue, so researchers try to develop high-performance platforms to efficiently secure this kind of data before publishing. The proposed work addresses this issue of digital data security by finding the relations between the columns of the dataset, based on highly related association patterns. Supermodularity is also used to balance the risk and utility of the data. Experiments are run on a large dataset containing all kinds of attributes in order to implement the features of the proposed work. The experiments showed that the proposed algorithms perform well on large databases. The approach works better because the maximum lost pattern percentage is zero for a certain value of support.
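
The notion of a sensitive rule used in this abstract can be sketched with the standard support and confidence measures from association rule mining. The function names, the toy transactions, and the choice to apply the minimum threshold to support are assumptions for illustration; the abstract does not say which measure the threshold applies to, and the perturbation step is not shown.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def is_sensitive(transactions, antecedent, consequent, sensitive_items, min_support):
    """A rule is flagged if it touches a sensitive item and meets the threshold."""
    rule_items = set(antecedent) | set(consequent)
    touches_sensitive = bool(rule_items & set(sensitive_items))
    return touches_sensitive and support(transactions, rule_items) >= min_support

# Hypothetical toy data; the item names are made up for this sketch.
transactions = [
    {"female", "loan_denied"},
    {"female", "loan_denied"},
    {"male", "loan_granted"},
    {"female", "loan_granted"},
]
rule_support = support(transactions, {"female", "loan_denied"})
rule_confidence = confidence(transactions, {"female"}, {"loan_denied"})
flagged = is_sensitive(transactions, {"female"}, {"loan_denied"},
                       sensitive_items={"female"}, min_support=0.4)
```

A rule flagged this way would be a candidate for hiding; `confidence` raises `ZeroDivisionError` if the antecedent never occurs, which a fuller implementation would need to handle.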


Energies ◽  
2019 ◽  
Vol 12 (7) ◽  
pp. 1237
Author(s):  
Jong-Hyuk Im ◽  
Hee-Yong Kwon ◽  
Seong-Yun Jeon ◽  
Mun-Kyu Lee

The development of smart meters that can frequently measure and report power consumption has enabled electricity providers to offer various time-varying rates, including time-of-use and real-time pricing plans. High-resolution power consumption data, however, raise serious privacy concerns because sensitive information regarding an individual’s lifestyle can be revealed by analyzing these data. Although extensive research has been conducted to address these privacy concerns, previous approaches have reduced the quality of measured data. In this paper, we propose a new privacy-preserving electricity billing method that does not sacrifice data quality for privacy. The proposed method is based on the novel use of functional encryption. Experimental results on a prototype system using a real-world smart meter device and data prove the feasibility of the proposed method.


2018 ◽  
Author(s):  
Maria Fernandes ◽  
Jérémie Decouchant ◽  
Marcus Völp ◽  
Francisco M Couto ◽  
Paulo Esteves-Veríssimo

The advent of high throughput next-generation sequencing (NGS) machines made DNA sequencing cheaper, but also put pressure on the genomic life-cycle, which includes aligning millions of short DNA sequences, called reads, to a reference genome. On the performance side, efficient algorithms have been developed, and parallelized on public clouds. On the privacy side, since genomic data are utterly sensitive, several cryptographic mechanisms have been proposed to align reads securely, with a lower performance than the former, which in turn are not secure. This manuscript proposes a novel contribution to improving the privacy performance product in current genomic studies. Building on recent works that argue that genomics data needs to be treated according to a threat-risk analysis, we introduce a multi-level sensitivity classification of genomic variations. Our classification prevents the amplification of possible privacy attacks, thanks to promoting and partitioning mechanisms among sensitivity levels. Thanks to this classification, reads can be aligned, stored, and later accessed, using different security levels. We then extend a recent filter, which detects the reads that carry sensitive information, to classify reads into sensitivity levels. Finally, based on a review of the existing alignment methods, we show that adapting alignment algorithms to reads sensitivity allows high performance gains, whilst enforcing high privacy levels. Our results indicate that using sensitivity levels is feasible to optimize the performance of privacy preserving alignment, if one combines the advantages of private and public clouds.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Arjan Sammani ◽  
Ayoub Bagheri ◽  
Peter G. M. van der Heijden ◽  
Anneline S. J. M. te Riele ◽  
Annette F. Baas ◽  
...  

Standard reference terminology of diagnoses and risk factors is crucial for billing, epidemiological studies, and inter/intranational comparisons of diseases. The International Classification of Disease (ICD) is a standardized and widely used method, but the manual classification is an enormously time-consuming endeavor. Natural language processing together with machine learning allows automated structuring of diagnoses using ICD-10 codes, but the limited performance of machine learning models, the necessity of gigantic datasets, and poor reliability of terminal parts of these codes restricted clinical usability. We aimed to create a high performing pipeline for automated classification of reliable ICD-10 codes in the free medical text in cardiology. We focussed on frequently used and well-defined three- and four-digit ICD-10 codes that still have enough granularity to be clinically relevant such as atrial fibrillation (I48), acute myocardial infarction (I21), or dilated cardiomyopathy (I42.0). Our pipeline uses a deep neural network known as a Bidirectional Gated Recurrent Unit Neural Network and was trained and tested with 5548 discharge letters and validated in 5089 discharge and procedural letters. As in clinical practice discharge letters may be labeled with more than one code, we assessed the single- and multilabel performance of main diagnoses and cardiovascular risk factors. We investigated using both the entire body of text and only the summary paragraph, supplemented by age and sex. Given the privacy-sensitive information included in discharge letters, we added a de-identification step. The performance was high, with F1 scores of 0.76–0.99 for three-character and 0.87–0.98 for four-character ICD-10 codes, and was best when using complete discharge letters. Adding variables age/sex did not affect results. For model interpretability, word coefficients were provided and qualitative assessment of classification was manually performed. 
Because of its high performance, this pipeline can be useful to decrease the administrative burden of classifying discharge diagnoses and may serve as a scaffold for reimbursement and research applications.


2021 ◽  
Vol 2 (1) ◽  
pp. 40-52
Author(s):  
Chukwudi Paul Obite ◽  
Ugochinyere Ihuoma Nwosu ◽  
Desmond Chekwube Bartholomew

This study modeled the US Dollar and Nigerian Naira exchange rates during the COVID-19 pandemic period using a classical statistical method – Autoregressive Integrated Moving Average (ARIMA) – and two machine learning methods – Artificial Neural Network (ANN) and Random Forest (RF). The data were divided into two sets, namely the training set and the test set. The training set was used to obtain the parameters of the model, and the performance of the estimated model was validated on the test set, which served as new data. Though the ARIMA and random forest performed slightly better than the neural network on the training set, their performance on the test set was poor. The neural network with 5 nodes in the input layer, 5 nodes in the hidden layer and 1 node in the output layer (ANN (5,5,1)) performed better on the new data set (test set) and is chosen as the best model to forecast future USD to NGN exchange rates. The information from the high-performance model (ANN (5, 5, 1)) for modeling the USD to NGN exchange rate will assist econometric trading of the currencies and offer both speculative and precautionary assistance to individuals, households, firms and nations who use the currencies locally and for international trade.
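
For readers unfamiliar with the ANN (5,5,1) notation, the sketch below shows one common way such a network is fed a time series and how many trainable parameters it has. Treating the 5 input nodes as the 5 most recent observations, and counting one bias per hidden and output node, are assumptions; the abstract does not state either. The series here is a placeholder, not exchange rate data.

```python
def make_lagged(series, n_lags=5):
    """Turn a series into (inputs, targets): n_lags past values -> next value."""
    X = [series[i:i + n_lags] for i in range(len(series) - n_lags)]
    y = [series[i + n_lags] for i in range(len(series) - n_lags)]
    return X, y

def mlp_param_count(layers):
    """Weights plus one bias per non-input node for a fully connected network."""
    return sum(a * b + b for a, b in zip(layers, layers[1:]))

# Placeholder series standing in for daily observations.
series = [float(v) for v in range(10)]
X, y = make_lagged(series, n_lags=5)   # 5 training windows of length 5
n_params = mlp_param_count((5, 5, 1))  # (5*5 + 5) + (5*1 + 1) = 36
```

Under these assumptions the network has 36 trainable parameters, which gives a sense of how small the selected model is.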


2021 ◽  
Vol 7 (29) ◽  
pp. eabh0648
Author(s):  
Xing Mou ◽  
Jianshi Tang ◽  
Yingjie Lyu ◽  
Qingtian Zhang ◽  
Siyao Yang ◽  
...  

Inspired by the human brain, nonvolatile memories (NVMs)–based neuromorphic computing emerges as a promising paradigm to build power-efficient computing hardware for artificial intelligence. However, existing NVMs still suffer from physically imperfect device characteristics. In this work, a topotactic phase transition random-access memory (TPT-RAM) with a unique diffusive nonvolatile dual mode based on SrCoOx is demonstrated. The reversible phase transition of SrCoOx is well controlled by oxygen ion migrations along the highly ordered oxygen vacancy channels, enabling reproducible analog switching characteristics with reduced variability. Combining density functional theory and kinetic Monte Carlo simulations, the orientation-dependent switching mechanism of TPT-RAM is investigated synergistically. Furthermore, the dual-mode TPT-RAM is used to mimic the selective stabilization of developing synapses and implement neural network pruning, reducing ~84.2% of redundant synapses while improving the image classification accuracy to 99%. Our work points out a new direction to design bioplausible memristive synapses for neuromorphic computing.

