Privacy and Trust Redefined in Federated Machine Learning

2021 ◽  
Vol 3 (2) ◽  
pp. 333-356
Author(s):  
Pavlos Papadopoulos ◽  
Will Abramson ◽  
Adam J. Hall ◽  
Nikolaos Pitropakis ◽  
William J. Buchanan

A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring data privacy for the data owners. However, distributing the computation to multiple participating entities introduces new privacy complications and risks. In this paper, we present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants. Our proof-of-concept defines a trust framework instantiated using decentralised identity technologies being developed under Hyperledger projects Aries/Indy/Ursa. Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels authorised to participate in a federated learning workflow related to mental health data.

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Qi Dou ◽  
Tiffany Y. So ◽  
Meirui Jiang ◽  
Quande Liu ◽  
Varut Vardhanabhuti ◽  
...  

Abstract Data privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions toward robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19 related CT abnormalities with external validation on patients from a multinational study. We recruited 132 patients from seven different multinational centers, with three internal hospitals from Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany, for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden for hospitalized COVID-19 patients. We explore federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries, overcoming the burden of central aggregation of large amounts of sensitive data.
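
For readers unfamiliar with the aggregation step in federated learning, a minimal sketch follows. The abstract above does not say which aggregation rule the study used; the sketch assumes plain federated averaging (FedAvg) as a stand-in, and the function name, parameter vectors, and sample counts are all hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Illustrative FedAvg aggregation only; not the method of any paper
    listed here.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must send vectors of the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's fraction of all training samples
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged


# Hypothetical example: three sites with made-up parameters and sample counts.
site_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
site_sizes = [10, 30, 60]
global_weights = federated_average(site_updates, site_sizes)
```

Only the parameter vectors and sample counts reach the aggregator in this sketch; the raw records stay at each site, which is the property the abstract relies on.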


Author(s):  
Dhamanpreet Kaur ◽  
Matthew Sobiesk ◽  
Shubham Patil ◽  
Jin Liu ◽  
Puran Bhagat ◽  
...  

Abstract Objective: This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data. Materials and Methods: We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data. Results: Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules. Discussion: Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools. Conclusion: We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.


2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Hua Dai ◽  
Hui Ren ◽  
Zhiye Chen ◽  
Geng Yang ◽  
Xun Yi

Outsourcing data to clouds is being adopted by more and more companies and individuals due to the benefits of data sharing and parallel, elastic, and on-demand computing. However, it forces data owners to lose control of their own data, which causes privacy problems for sensitive data. Sorting is a common operation in many areas, such as machine learning, service recommendation, and data query. It is a challenge to implement privacy-preserving sorting over encrypted data without compromising the privacy of sensitive data. In this paper, we propose privacy-preserving sorting algorithms based on the logistic map. Secure comparable codes are constructed by logistic map functions, which can be utilized to compare the corresponding encrypted data items even without knowing their plaintext values. Data owners first encrypt their data and generate the corresponding comparable codes and then outsource them to clouds. Cloud servers are capable of sorting the outsourced encrypted data in accordance with their corresponding comparable codes by the proposed privacy-preserving sorting algorithms. Security analysis and experimental results show that the proposed algorithms can protect data privacy, while providing efficient sorting on encrypted data.


2014 ◽  
Vol 25 (3) ◽  
pp. 48-71 ◽  
Author(s):  
Stepan Kozak ◽  
David Novak ◽  
Pavel Zezula

The general trend in data management is to outsource data to third-party systems that would provide data retrieval as a service. This approach naturally brings privacy concerns about the (potentially sensitive) data. Recently, quite extensive research has been done on privacy-preserving outsourcing of traditional exact-match and keyword search. However, not much attention has been paid to outsourcing of similarity search, which is essential in content-based retrieval in current multimedia, sensor or scientific data. In this paper, the authors propose a scheme of outsourcing similarity search. They define evaluation criteria for these systems with an emphasis on usability, privacy and efficiency in real applications. These criteria can be used as a general guideline for a practical system analysis, and the authors use them to survey and mutually compare existing approaches. As the main result, the authors propose a novel dynamic similarity index, EM-Index, that works for an arbitrary metric space and ensures data privacy, and thus is suitable for search systems outsourced, for example, in a cloud environment. In comparison with other approaches, the index is fully dynamic (update operations are efficient) and its aim is to transfer as much load from clients to the server as possible.


Author(s):  
Divya Asok ◽  
Chitra P. ◽  
Bharathiraja Muthurajan

In recent years, the growth in internet usage and in the quantity of digital data generated by large organizations, firms, and governments has led researchers to focus on the security of private data. This collected data is usually related to a definite necessity. For example, in the medical field, health record systems are used for the exchange of medical data. In addition to services based on users' current location, many potential services rely on users' location history or their spatial-temporal provenance. However, most of the collected data contains sensitive information that identifies individuals. With machine learning applications spreading to every corner of society, machine learning could contribute significantly to preserving the privacy of both individuals and institutions. This chapter gives a wider perspective on the current literature on privacy-preserving ML and deep learning techniques, along with the non-cryptographic differential privacy approach for ensuring sensitive data privacy.
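
To make the differential privacy approach mentioned above concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is not taken from the chapter; the function name, the example count, and the epsilon value are hypothetical, and the sketch assumes a query with sensitivity 1 (adding or removing one person changes the count by at most 1).

```python
import random


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    Assumes sensitivity 1, which holds for a simple counting query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # sensitivity (1) divided by the privacy budget
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical example: 100 matching records, privacy budget epsilon = 1.0.
rng = random.Random(0)
noisy = private_count(100, epsilon=1.0, rng=rng)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy. The standard-library `random` module is used here only for illustration; it is not a cryptographically secure noise source.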


2019 ◽  
Vol 104 (6) ◽  
pp. e35.3-e36
Author(s):  
C King ◽  
L Bracken ◽  
E McDonough ◽  
M Pirmohamed ◽  
M Peak ◽  
...  

Background: There are multiple pharmacogenomic studies in children’s asthma. It has not been established how (or if) children, young people or their parents/legal guardians would accept use of their genetic information to guide their treatment. Aim: To determine the views of children and young people (CYP), and parents/legal guardians, on aspects of using genetic testing to guide management of childhood asthma. Methods: Focus group session with both Liverpool’s young people advisory group (YPAG) and Parents’ group, at Alder Hey Children’s Hospital. Group members completed anonymous questionnaires determining the importance and privacy associated with different themes of data, with a special focus on health data. Results: There were 11 responders, five parents/guardians and six CYP. Both the parents and the CYP considered personal data, such as date of birth, NI number and name, both the most important and the most private. Health data was considered the second most important, and private, although parents rated social media data an equal second in terms of privacy. Within healthcare data, CYP considered data regarding their mental health, followed by medical conditions and genomic data, as the sources to be of highest importance. Parents considered their child’s illnesses most important, followed by genomic data. In relation to privacy, CYP considered genomic data first followed by information concerning their mental health. The parents considered genomic data highest for data privacy. Conclusion: From this session it is clear that health data in general, and genetic data in particular, has a high value of importance to CYP and parents, but there are variations in how data is prioritised. These pilot data will inform a large scale patient and parent acceptability study in personalised medicine and childhood asthma (CHANGE study). Disclosure(s): Nothing to disclose.


Computers ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 1 ◽  
Author(s):  
Yeong-Cherng Hsu ◽  
Chih-Hsin Hsueh ◽  
Ja-Ling Wu

With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources in the cloud, data owners can also provide a platform for users to make query requests. However, due to privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy-preserving K-nearest neighbor (K-NN) search scheme over the encrypted outsourced cloud dataset is proposed. The problem is about letting the cloud server find the K nearest points with respect to an encrypted query on the encrypted dataset, which was outsourced by data owners, and return the search results to the querying user. Compared with other existing methods, our approach leverages the resources of the cloud more by shifting most of the required computational load, from data owners and query users, to the cloud server. In addition, there is no need for data owners to share their secret key with others. In a nutshell, in the proposed scheme, data points and user queries are encrypted attribute-wise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves the data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, by using a tree structure, the proposed scheme could accomplish query requests in sub-linear time, according to our performance analysis. Finally, experimental results demonstrate the practicability and the efficiency of our method.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 62058-62070 ◽  
Author(s):  
Wei She ◽  
Zhi-Hao Gu ◽  
Xu-Kang Lyu ◽  
Qi Liu ◽  
Zhao Tian ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Siliang Dong ◽  
Zhixin Zeng ◽  
Yining Liu

Electricity theft occurs from time to time in the smart grid and can cause great losses to the power supplier, so it is necessary to prevent it. Using machine learning as an electricity theft detection tool can quickly lock onto participants suspected of electricity theft; however, directly publishing user data to the detector for machine learning-based detection may expose user privacy. In this paper, we propose a real-time fault-tolerant and privacy-preserving electricity theft detection (FPETD) scheme that combines n-source anonymity and a convolutional neural network (CNN). In our scheme, we designed a fault-tolerant raw data collection protocol to collect electricity data and cut off the correspondence between users and their data, thereby ensuring fault tolerance and data privacy during the electricity theft detection process. Experiments show that, with our dimensionality reduction method, our model reaches an accuracy of 92.86% in detecting electricity theft, which is much better than other methods.


2020 ◽  
Vol 8 (6) ◽  
pp. 1945-1949

The digital era generates a huge amount of data in many sectors, such as education, medicine, banking, business, and marketing, which can be used for research, analysis, trend prediction, statistics, etc. Data mining techniques are useful for finding patterns, trends, and knowledge in such huge data. The data holders are reluctant to share data because of the risk of privacy leakage. Sharing such data, especially medical data, would immensely help researchers to obtain knowledge from it. Privacy-preserving data mining (PPDM) is one way for researchers to mine data for knowledge without breaching privacy. In the medical sector, mental health is a branch where high confidentiality of data is both maintained and needed. Owners are not ready to share data for research purposes. Mental health is nowadays one of the most frequently discussed research topics. PPDM allows data to be shared with researchers while the privacy of the data is maintained through perturbation techniques, giving relief to doctors (the data owners). The current paper experiments with and analyses different perturbation methods to preserve privacy in data mining.

