Data privacy-preserving distributed knowledge discovery based on the blockchain

Author(s):  
Keon Myung Lee ◽  
Ilkyeun Ra


Author(s):
Alaa Khalil Juma' ◽  
Sufyan T. Faraj Al Janabi ◽  
Nizar Abedlqader Ali

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Qi Dou ◽  
Tiffany Y. So ◽  
Meirui Jiang ◽  
Quande Liu ◽  
Varut Vardhanabhuti ◽  
...  

Data privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions toward robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19 related CT abnormalities, with external validation on patients from a multinational study. We recruited 132 patients from seven centers in different countries: three internal hospitals from Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden in hospitalized COVID-19 patients. We explore federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries while overcoming the burden of centrally aggregating large amounts of sensitive data.
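
As a rough illustration of the training loop such a study implies, the sketch below implements plain federated averaging (FedAvg) over per-hospital model weights. The function names and the logistic-regression stand-in for the CT model are assumptions for the example, not the paper's actual pipeline; only model weights leave each site, never raw patient data.

```python
import numpy as np

def local_update(global_weights, data, labels, lr=0.01, epochs=1):
    """One client's local training pass; a logistic-regression
    stand-in for a CT abnormality detector."""
    w = global_weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-data @ w))        # sigmoid
        grad = data.T @ (preds - labels) / len(labels)
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients):
    """Aggregate client updates weighted by local dataset size;
    raw data never leaves the client."""
    total = sum(len(labels) for _, labels in clients)
    new_w = np.zeros_like(global_weights)
    for data, labels in clients:
        w = local_update(global_weights, data, labels)
        new_w += (len(labels) / total) * w
    return new_w

# Three simulated "hospitals" with differently distributed features.
rng = np.random.default_rng(0)
clients = [(rng.normal(mu, 1.0, (50, 8)), rng.integers(0, 2, 50).astype(float))
           for mu in (0.0, 0.5, -0.5)]
w = np.zeros(8)
for _ in range(10):
    w = fedavg_round(w, clients)
```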


2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Hua Dai ◽  
Hui Ren ◽  
Zhiye Chen ◽  
Geng Yang ◽  
Xun Yi

Outsourcing data to clouds is being adopted by more and more companies and individuals because of the benefits of data sharing and parallel, elastic, on-demand computing. However, it forces data owners to relinquish control of their own data, which raises privacy concerns over sensitive data. Sorting is a common operation in many areas, such as machine learning, service recommendation, and data query, and it is a challenge to implement privacy-preserving sorting over encrypted data without leaking information about the sensitive values. In this paper, we propose privacy-preserving sorting algorithms based on the logistic map. Secure comparable codes are constructed from logistic map functions and can be used to compare the corresponding encrypted data items without knowing their plaintext values. Data owners first encrypt their data, generate the corresponding comparable codes, and then outsource both to clouds. Cloud servers can sort the outsourced encrypted data according to the corresponding comparable codes using the proposed privacy-preserving sorting algorithms. Security analysis and experimental results show that the proposed algorithms protect data privacy while providing efficient sorting over encrypted data.
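
The paper's exact construction is not reproduced here, but the sketch below conveys the idea of comparable codes: each integer plaintext maps to a code that preserves order, with keyed logistic-map iterations supplying randomizing padding, so a server can sort codes without seeing plaintexts. The scale factor, key handling, and integer-plaintext assumption are illustrative choices, not the authors' scheme.

```python
def logistic_prng(key: float, steps: int, r: float = 3.99) -> float:
    """Iterate the logistic map x <- r*x*(1-x) from a secret seed;
    in the chaotic regime (r near 4) this acts as a keyed noise source."""
    x = key
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x  # value in (0, 1)

SCALE = 1_000_000  # gap between consecutive integer plaintexts

def comparable_code(value: int, key: float, nonce: int) -> int:
    """Order-preserving code: value*SCALE plus keyed noise < SCALE.
    For integers v1 < v2, code(v1) < code(v2) always holds, so the
    server can sort codes without learning exact plaintext values."""
    noise = int(logistic_prng(key, 64 + nonce) * (SCALE - 1))
    return value * SCALE + noise

# Data owner encodes before outsourcing; the server just sorts integers.
key = 0.3141592653
items = [42, 7, 19, 7]
codes = [comparable_code(v, key, i) for i, v in enumerate(items)]
assert [c // SCALE for c in sorted(codes)] == sorted(items)
```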


2008 ◽  
pp. 2379-2401 ◽  
Author(s):  
Igor Nai Fovino

Intense work in the area of data mining technology and its applications to several domains has resulted in a large variety of techniques and tools able to automatically and intelligently transform large amounts of data into knowledge relevant to users. However, as with other useful technologies, the knowledge discovery process can be misused. It can be exploited, for example, by malicious parties to reconstruct sensitive information for which they have no explicit access authorization. This type of "attack" is hard to detect because the data used to infer the protected information is usually freely accessible. For this reason, many research efforts have recently been devoted to the problem of privacy preservation in data mining. The mission of this chapter is therefore to introduce the reader to this research field and to provide the proper instruments (in terms of concepts, techniques, and examples) for a critical understanding of the advantages, limitations, and open issues of privacy-preserving data mining techniques.
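
One classic instrument in this field is randomization-based perturbation: each party adds calibrated random noise before releasing records, so aggregate patterns survive while individual values are masked. The sketch below is a minimal, generic illustration of that idea; the noise scale and the reconstruction step are assumptions for the example, not the chapter's specific method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sensitive attribute, e.g. salaries, held by individuals.
true_values = rng.normal(50_000, 8_000, size=10_000)

# Each record is released with independent additive noise, so no
# single published value reveals the corresponding true value.
NOISE_STD = 20_000
released = true_values + rng.normal(0, NOISE_STD, size=true_values.size)

# A miner can still recover aggregate statistics: the noise has mean 0,
# so the sample mean is an unbiased estimate of the true mean, and the
# true variance is the released variance minus the known noise variance.
est_mean = released.mean()
est_var = released.var() - NOISE_STD**2
print(f"true mean {true_values.mean():.0f}, estimated {est_mean:.0f}")
print(f"true std  {true_values.std():.0f}, estimated {abs(est_var)**0.5:.0f}")
```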


2015 ◽  
pp. 426-458 ◽  
Author(s):  
S. R. Murugaiyan ◽  
D. Chandramohan ◽  
T. Vengattaraman ◽  
P. Dhavachelvan

Cloud storage services face a critical issue in handling users' private information and its confidentiality, and preserving user data privacy is a vital facet of online storage in cloud computing. Information in cloud data storage is under constant threat from unknown, malicious user activity, which may lead to breaches of users' confidentiality. Privacy preservation is thus an active research field in contemporary information technology. Preserving User Data in Cloud Service (PUDCS) addresses data privacy breaches in which intruders penetrate highly confidential digital storage and turn the stolen information into business by misappropriating others' data. This paper focuses on protecting confidential digital data using the proposed privacy-preserving framework. It also describes the protection of stored data, the de-identification of unauthorized user attempts, and log monitoring maintained in the cloud for reference by providers and users.


2014 ◽  
Vol 25 (3) ◽  
pp. 48-71 ◽  
Author(s):  
Stepan Kozak ◽  
David Novak ◽  
Pavel Zezula

The general trend in data management is to outsource data to third-party systems that provide data retrieval as a service. This approach naturally raises privacy concerns about the (potentially sensitive) data. Recently, quite extensive research has been done on privacy-preserving outsourcing of traditional exact-match and keyword search. However, much less attention has been paid to outsourcing of similarity search, which is essential for content-based retrieval over current multimedia, sensor, or scientific data. In this paper, the authors propose a scheme for outsourcing similarity search. They define evaluation criteria for such systems, with an emphasis on usability, privacy, and efficiency in real applications. These criteria can serve as a general guideline for practical system analysis, and the authors use them to survey and compare existing approaches. As the main result, the authors propose a novel dynamic similarity index, EM-Index, that works for an arbitrary metric space and ensures data privacy, making it suitable for search systems outsourced, for example, to a cloud environment. In comparison with other approaches, the index is fully dynamic (update operations are efficient) and aims to transfer as much load as possible from clients to the server.
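
EM-Index itself is not reproduced here, but the sketch below shows the standard metric-space trick such indexes build on: precomputed distances to pivot objects let a server discard candidates using the triangle inequality, without re-evaluating the distance function on raw data. The pivot count and the Euclidean distance are illustrative assumptions.

```python
import numpy as np

def euclid(a, b):
    return float(np.linalg.norm(a - b))

class PivotFilter:
    """Pivot table for metric-space range queries: by the triangle
    inequality, |d(q,p) - d(o,p)| > radius for any pivot p proves
    object o lies outside the query ball, with no distance call on o."""
    def __init__(self, objects, pivots):
        self.objects = objects
        self.pivots = pivots
        # Precompute d(o, p) for every object/pivot pair.
        self.table = [[euclid(o, p) for p in pivots] for o in objects]

    def range_query(self, q, radius):
        dq = [euclid(q, p) for p in self.pivots]
        hits = []
        for o, row in zip(self.objects, self.table):
            # Lower bound on d(q, o); skip if it already exceeds radius.
            if max(abs(a - b) for a, b in zip(dq, row)) > radius:
                continue
            if euclid(q, o) <= radius:   # verify survivors exactly
                hits.append(o)
        return hits

rng = np.random.default_rng(2)
data = [rng.normal(size=16) for _ in range(1000)]
index = PivotFilter(data, pivots=data[:8])
results = index.range_query(rng.normal(size=16), radius=4.5)
```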


Author(s):  
Antonio Congiusta ◽  
Domenico Talia ◽  
Paolo Trunfio

Knowledge discovery is a compute- and data-intensive process for finding patterns, trends, and models in large datasets. The Grid can be effectively exploited for deploying knowledge discovery applications because of the high performance it offers and its distributed infrastructure. For effective use of Grids in knowledge discovery, middleware is critical to support data management, data transfer, data mining, and knowledge representation. To this end, we designed the Knowledge Grid, a high-level environment providing Grid-based knowledge discovery tools and services. These services allow users to create and manage complex knowledge discovery applications, composed as workflows that integrate data sources and data mining tools provided as distributed Grid services. This chapter presents the Knowledge Grid architecture and describes how its components can be used to design and implement distributed knowledge discovery applications. It then describes how the Knowledge Grid services can be made accessible using the Open Grid Services Architecture (OGSA) model.
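
To make the workflow-composition idea concrete, the sketch below models a knowledge discovery application as a small dependency graph of tasks. This is a generic stand-in, not the Knowledge Grid API: the task names and executor are invented for illustration only.

```python
# Generic illustration, NOT the Knowledge Grid API: a knowledge
# discovery workflow as a DAG whose nodes are data sources or mining steps.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Task:
    name: str
    run: Callable[[list], object]          # consumes upstream outputs
    deps: List[str] = field(default_factory=list)

def execute(workflow: Dict[str, Task], target: str):
    """Resolve dependencies depth-first and run each task once."""
    done: Dict[str, object] = {}
    def visit(name: str):
        if name not in done:
            task = workflow[name]
            inputs = [visit(d) for d in task.deps]
            done[name] = task.run(inputs)
        return done[name]
    return visit(target)

wf = {
    "load":  Task("load",  lambda _: [3, 1, 4, 1, 5, 9, 2, 6]),
    "clean": Task("clean", lambda ins: sorted(set(ins[0])), ["load"]),
    "mine":  Task("mine",  lambda ins: max(ins[0]),         ["clean"]),
}
print(execute(wf, "mine"))   # -> 9
```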


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 166
Author(s):  
Yuelei Xiao ◽  
Haiqi Li

Privacy-preserving data publishing has received considerable attention as a way to publish useful information while preserving data privacy. Existing privacy-preserving data publishing methods for multiple sensitive attributes do not consider that different values of a sensitive attribute may have different sensitivity requirements. To solve this problem, we define three security levels for sensitive attribute values with different sensitivity requirements and give an Lsl-diversity model for multiple sensitive attributes. We then propose three greedy algorithms, based on the maximal-bucket first (MBF), maximal single-dimension-capacity first (MSDCF), and maximal multi-dimension-capacity first (MMDCF) algorithms combined with the maximal security-level first (MSLF) greedy policy, named MBF-MSLF, MSDCF-MSLF, and MMDCF-MSLF, to implement the Lsl-diversity model for multiple sensitive attributes. The experimental results show that the three algorithms greatly reduce the information loss of the published microdata with only a small increase in runtime, and that their information loss stabilizes as the data volume grows. They also avoid the problem that the information loss of MBF, MSDCF, and MMDCF increases greatly with the number of sensitive attributes.
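
As a rough illustration of the bucket-based greedy strategy these algorithms share (not the authors' exact MBF-MSLF procedure, which also weighs security levels), the sketch below groups records by sensitive value and repeatedly drafts one record from each of the l currently largest buckets, so every published group contains l distinct sensitive values. The attribute name and sample data are invented for the example.

```python
from collections import defaultdict

def greedy_l_diverse_groups(records, l):
    """Maximal-bucket-first style grouping: put records with the same
    sensitive value into one bucket, then repeatedly take one record
    from each of the l largest buckets, so each emitted group holds
    l distinct sensitive values (an l-diverse group)."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["disease"]].append(rec)
    groups = []
    while sum(1 for b in buckets.values() if b) >= l:
        largest = sorted(buckets, key=lambda k: len(buckets[k]),
                         reverse=True)[:l]
        groups.append([buckets[k].pop() for k in largest])
    leftovers = [r for b in buckets.values() for r in b]  # suppressed
    return groups, leftovers

records = [{"zip": "3000", "disease": d}
           for d in ["flu", "flu", "flu", "hiv", "cancer", "cancer", "ulcer"]]
groups, suppressed = greedy_l_diverse_groups(records, l=3)
for g in groups:
    assert len({r["disease"] for r in g}) == 3   # each group is 3-diverse
```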

