Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study

AbstractData privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions toward robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19 related CT abnormalities with external validation on patients from a multinational study. We recruited 132 patients from seven multinational different centers, with three internal hospitals from Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany, for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden for hospitalized COVID-19 patients. We explore the federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries overcoming the burden of central aggregation of large amounts of sensitive data.

Download Full-text

Privacy-Preserving Sorting Algorithms Based on Logistic Map for Clouds

Security and Communication Networks ◽

10.1155/2018/2373545 ◽

2018 ◽

Vol 2018 ◽

pp. 1-10

Author(s):

Hua Dai ◽

Hui Ren ◽

Zhiye Chen ◽

Geng Yang ◽

Xun Yi

Keyword(s):

Data Privacy ◽

Logistic Map ◽

Security Analysis ◽

Privacy Preserving ◽

Service Recommendation ◽

Sensitive Data ◽

Encrypted Data ◽

Sorting Algorithms ◽

Common Operation ◽

Cloud Servers

Outsourcing data in clouds is adopted by more and more companies and individuals due to the profits from data sharing and parallel, elastic, and on-demand computing. However, it forces data owners to lose control of their own data, which causes privacy-preserving problems on sensitive data. Sorting is a common operation in many areas, such as machine learning, service recommendation, and data query. It is a challenge to implement privacy-preserving sorting over encrypted data without leaking privacy of sensitive data. In this paper, we propose privacy-preserving sorting algorithms which are on the basis of the logistic map. Secure comparable codes are constructed by logistic map functions, which can be utilized to compare the corresponding encrypted data items even without knowing their plaintext values. Data owners firstly encrypt their data and generate the corresponding comparable codes and then outsource them to clouds. Cloud servers are capable of sorting the outsourced encrypted data in accordance with their corresponding comparable codes by the proposed privacy-preserving sorting algorithms. Security analysis and experimental results show that the proposed algorithms can protect data privacy, while providing efficient sorting on encrypted data.

Download Full-text

Privacy-Preserving Outsourced Similarity Search

Journal of Database Management ◽

10.4018/jdm.2014070103 ◽

2014 ◽

Vol 25 (3) ◽

pp. 48-71 ◽

Cited By ~ 1

Author(s):

Stepan Kozak ◽

David Novak ◽

Pavel Zezula

Keyword(s):

Similarity Search ◽

Data Privacy ◽

System Analysis ◽

Keyword Search ◽

Evaluation Criteria ◽

Similarity Index ◽

Data Retrieval ◽

Privacy Preserving ◽

Scientific Data ◽

Sensitive Data

The general trend in data management is to outsource data to 3rd party systems that would provide data retrieval as a service. This approach naturally brings privacy concerns about the (potentially sensitive) data. Recently, quite extensive research has been done on privacy-preserving outsourcing of traditional exact-match and keyword search. However, not much attention has been paid to outsourcing of similarity search, which is essential in content-based retrieval in current multimedia, sensor or scientific data. In this paper, the authors propose a scheme of outsourcing similarity search. They define evaluation criteria for these systems with an emphasis on usability, privacy and efficiency in real applications. These criteria can be used as a general guideline for a practical system analysis and we use them to survey and mutually compare existing approaches. As the main result, the authors propose a novel dynamic similarity index EM-Index that works for an arbitrary metric space and ensures data privacy and thus is suitable for search systems outsourced for example in a cloud environment. In comparison with other approaches, the index is fully dynamic (update operations are efficient) and its aim is to transfer as much load from clients to the server as possible.

Download Full-text

A Privacy Preserving Cloud-Based K-NN Search Scheme with Lightweight User Loads

Computers ◽

10.3390/computers9010001 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Yeong-Cherng Hsu ◽

Chih-Hsin Hsueh ◽

Ja-Ling Wu

Keyword(s):

Data Privacy ◽

Nearest Neighbor ◽

Search Algorithm ◽

Data Access ◽

Privacy Preserving ◽

Secret Key ◽

K Nearest Neighbor ◽

Sensitive Data ◽

Cloud Data ◽

Cloud Server

With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources in cloud, data owners can also provide a platform for users to make query requests. However, due to the privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy preserving K-nearest neighbor (K-NN) search scheme over the encrypted outsourced cloud dataset is proposed. The problem is about letting the cloud server find K nearest points with respect to an encrypted query on the encrypted dataset, which was outsourced by data owners, and return the searched results to the querying user. Comparing with other existing methods, our approach leverages the resources of the cloud more by shifting most of the required computational loads, from data owners and query users, to the cloud server. In addition, there is no need for data owners to share their secret key with others. In a nutshell, in the proposed scheme, data points and user queries are encrypted attribute-wise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves the data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, by using a tree structure, the proposed scheme could accomplish query requests in sub-liner time, according to our performance analysis. Finally, experimental results demonstrate the practicability and the efficiency of our method.

Download Full-text

Homomorphic Consortium Blockchain for Smart Home System Sensitive Data Privacy Preserving

IEEE Access ◽

10.1109/access.2019.2916345 ◽

2019 ◽

Vol 7 ◽

pp. 62058-62070 ◽

Cited By ~ 10

Author(s):

Wei She ◽

Zhi-Hao Gu ◽

Xu-Kang Lyu ◽

Qi Liu ◽

Zhao Tian ◽

...

Keyword(s):

Smart Home ◽

Data Privacy ◽

Privacy Preserving ◽

Sensitive Data

Download Full-text

Trust Hardware Based Secured Privacy Preserving Computation System for Three-Dimensional Data

Electronics ◽

10.3390/electronics10131546 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1546

Author(s):

Munan Yuan ◽

Xiaofeng Li ◽

Xiru Li ◽

Haibo Tan ◽

Jinlin Xu

Keyword(s):

Data Privacy ◽

High Performance ◽

Distributed Storage ◽

Three Dimensional ◽

Privacy Preserving ◽

Sensitive Data ◽

Privacy And Security ◽

Time Consumption ◽

Blockchain Technology ◽

3D Data

Three-dimensional (3D) data are easily collected in an unconscious way and are sensitive to lead biological characteristics exposure. Privacy and ownership have become important disputed issues for the 3D data application field. In this paper, we design a privacy-preserving computation system (SPPCS) for sensitive data protection, based on distributed storage, trusted execution environment (TEE) and blockchain technology. The SPPCS separates a storage and analysis calculation from consensus to build a hierarchical computation architecture. Based on a similarity computation of graph structures, the SPPCS finds data requirement matching lists to avoid invalid transactions. With TEE technology, the SPPCS implements a dual hybrid isolation model to restrict access to raw data and obscure the connections among transaction parties. To validate confidential performance, we implement a prototype of SPPCS with Ethereum and Intel Software Guard Extensions (SGX). The evaluation results derived from test datasets show that (1) the enhanced security and increased time consumption (490 ms in this paper) of multiple SGX nodes need to be balanced; (2) for a single SGX node to enhance data security and preserve privacy, an increased time consumption of about 260 ms is acceptable; (3) the transaction relationship cannot be inferred from records on-chain. The proposed SPPCS implements data privacy and security protection with high performance.

Download Full-text

Practical Implementation of Privacy Preserving Clustering Methods Using a Partially Homomorphic Encryption Algorithm

Electronics ◽

10.3390/electronics9020229 ◽

2020 ◽

Vol 9 (2) ◽

pp. 229

Author(s):

Ferhat Ozgur Catak ◽

Ismail Aydin ◽

Ogerta Elezaj ◽

Sule Yildirim-Yayilgan

Keyword(s):

Data Privacy ◽

Homomorphic Encryption ◽

Clustering Algorithms ◽

Privacy Preserving ◽

Key Exchange ◽

Cloud System ◽

Clustering Methods ◽

Sensitive Data ◽

Processing Power ◽

High Processing

The protection and processing of sensitive data in big data systems are common problems as the increase in data size increases the need for high processing power. Protection of the sensitive data on a system that contains multiple connections with different privacy policies, also brings the need to use proper cryptographic key exchange methods for each party, as extra work. Homomorphic encryption methods can perform similar arithmetic operations on encrypted data in the same way as a plain format of the data. Thus, these methods provide data privacy, as data are processed in the encrypted domain, without the need for a plain form and this allows outsourcing of the computations to cloud systems. This also brings simplicity on key exchange sessions for all sides. In this paper, we propose novel privacy preserving clustering methods, alongside homomorphic encryption schemes that can run on a common high performance computation platform, such as a cloud system. As a result, the parties of this system will not need to possess high processing power because the most power demanding tasks would be done on any cloud system provider. Our system offers a privacy preserving distance matrix calculation for several clustering algorithms. Considering both encrypted and plain forms of the same data for different key and data lengths, our privacy preserving training method’s performance results are obtained for four different data clustering algorithms, while considering six different evaluation metrics.

Download Full-text

Energy-efficient and privacy-preserving spatial range aggregation query processing in wireless sensor networks

International Journal of Distributed Sensor Networks ◽

10.1177/1550147719861005 ◽

2019 ◽

Vol 15 (7) ◽

pp. 155014771986100 ◽

Cited By ~ 1

Author(s):

Liang Liu ◽

Zhenhai Hu ◽

Lisong Wang

Keyword(s):

Sensor Networks ◽

Query Processing ◽

Energy Efficient ◽

Privacy Protection ◽

Data Privacy ◽

Privacy Preserving ◽

Sensitive Data ◽

Sensory Data ◽

Spatial Range ◽

Query Algorithm

The existing privacy-preserving aggregation query processing methods in sensor networks rely on pre-established network topology and require all nodes in the network to participate in query processing. Maintaining the topology results in a large amount of energy overhead, and in many cases, the user is interested only in the aggregated query results of some areas in the network, and thus, the participation of the entire network node is not necessary. Aiming to solve this problem, this article proposes a spatial range aggregation query algorithm for a dynamic sensor network with privacy protection (energy-efficient privacy-preserving data aggregation). The algorithm does not rely on the pre-established topology but considers only the query area that the user is interested in, abandoning all nodes to participate in distributing the query messages while gathering the sensory data in the query range. To protect node data privacy, Shamir’s secret sharing technology is used to prevent internal attackers from stealing the sensitive data of the surrounding nodes. The analysis and experimental results show that the proposed algorithm outperforms the existing algorithms in terms of energy and privacy protection.

Download Full-text

Privacy and Trust Redefined in Federated Machine Learning

Machine Learning and Knowledge Extraction ◽

10.3390/make3020017 ◽

2021 ◽

Vol 3 (2) ◽

pp. 333-356

Author(s):

Pavlos Papadopoulos ◽

Will Abramson ◽

Adam J. Hall ◽

Nikolaos Pitropakis ◽

William J. Buchanan

Keyword(s):

Mental Health ◽

Machine Learning ◽

Data Privacy ◽

Communication Channels ◽

Privacy Preserving ◽

Health Data ◽

Proof Of Concept ◽

Sensitive Data ◽

Privacy Issue ◽

Highly Sensitive

A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring the data privacy to their owners. The distribution of the computation to multiple participating entities introduces new privacy complications and risks. In this paper, we present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants. Our proof-of-concept defines a trust framework instantiated using decentralised identity technologies being developed under Hyperledger projects Aries/Indy/Ursa. Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels authorised to participate in a federated learning workflow related to mental health data.

Download Full-text

Privacy Preserving Mining in Code Profiling Data

International Journal of Engineering and Management Research ◽

10.31033/ijemr.8.5.5 ◽

2018 ◽

Vol 8 (5) ◽

Author(s):

Meenakshi Kathayat

Keyword(s):

Data Mining ◽

Software Engineering ◽

Future Development ◽

Data Privacy ◽

Software Metrics ◽

Privacy Preserving ◽

Sensitive Data ◽

Privacy Preserving Data Mining ◽

Encrypted Data

Privacy preserving data mining is an important issue nowadays for data mining. Since various organizations and people are generating sensitive data or information these days. They don’t want to share their sensitive data however that data can be useful for data mining purpose. So, due to privacy preserving mining that data can be mined usefully without harming the privacy of that data. Privacy can be preserved by applying encryption on database which is to be mined because now the data is secure due to encryption. Code profiling is a field in software engineering where we can apply data mining to discover some knowledge so that it will be useful in future development of software. In this work we have applied privacy preserving mining in code profiling data such as software metrics of various codes. Results of data mining on actual and encrypted data are compared for accuracy. We have also analyzed the results of privacy preserving mining in code profiling data and found interesting results.

Download Full-text

Misusability Measure Based Sanitization of Big Data for Privacy Preserving MapReduce Programming

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i6.pp4524-4532 ◽

2018 ◽

Vol 8 (6) ◽

pp. 4524

Author(s):

D. Radhika ◽

D. Aruna Kumari

Keyword(s):

Data Mining ◽

Big Data ◽

Data Privacy ◽

Hybrid Approach ◽

Privacy Preserving ◽

Data Publishing ◽

Distributed Data Mining ◽

Distributed Data ◽

Public Cloud ◽

Sensitive Data

Leakage and misuse of sensitive data is a challenging problem to enterprises. It has become more serious problem with the advent of cloud and big data. The rationale behind this is the increase in outsourcing of data to public cloud and publishing data for wider visibility. Therefore Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDM) are crucial in the contemporary era. PPDP and PPDM can protect privacy at data and process levels respectively. Therefore, with big data privacy to data became indispensable due to the fact that data is stored and processed in semi-trusted environment. In this paper we proposed a comprehensive methodology for effective sanitization of data based on misusability measure for preserving privacy to get rid of data leakage and misuse. We followed a hybrid approach that caters to the needs of privacy preserving MapReduce programming. We proposed an algorithm known as Misusability Measure-Based Privacy serving Algorithm (MMPP) which considers level of misusability prior to choosing and application of appropriate sanitization on big data. Our empirical study with Amazon EC2 and EMR revealed that the proposed methodology is useful in realizing privacy preserving Map Reduce programming.

Download Full-text