Privacy-Preserving Methods for Feature Engineering Using Blockchain: Review, Evaluation, and Proof of Concept (Preprint)

Mapping Intimacies ◽

10.2196/preprints.13600 ◽

2019 ◽

Author(s):

Michael Jones ◽

Matthew Johnson ◽

Mark Shervey ◽

Joel T Dudley ◽

Noah Zimmerman

Keyword(s):

Data Collection ◽

Data Privacy ◽

Hybrid Approach ◽

Privacy Preserving ◽

Third Party ◽

Feature Engineering ◽

Smart Contracts ◽

Trusted Third Party ◽

Secure Hardware ◽

Cryptographic Techniques

BACKGROUND The protection of private data is a key responsibility for research studies that collect identifiable information from study participants. Limiting the scope of data collection and preventing secondary use of the data are effective strategies for managing these risks. An ideal framework for data collection would incorporate feature engineering, a process where secondary features are derived from sensitive raw data in a secure environment without a trusted third party. OBJECTIVE This study aimed to compare current approaches based on how they maintain data privacy and the practicality of their implementations. These approaches include traditional approaches that rely on trusted third parties, as well as cryptographic, secure hardware, and blockchain-based techniques. METHODS A set of properties was defined for evaluating each approach. A qualitative comparison was presented based on these properties. The evaluation of each approach was framed with a use case of sharing geolocation data for biomedical research. RESULTS We found that approaches that rely on a trusted third party for preserving participant privacy do not provide sufficiently strong guarantees that sensitive data will not be exposed in modern data ecosystems. Cryptographic techniques incorporate strong privacy-preserving paradigms but are appropriate only for select use cases or are currently limited because of computational complexity. Blockchain smart contracts alone are insufficient to provide data privacy because transactional data are public. Trusted execution environments (TEEs) may have hardware vulnerabilities and lack visibility into how data are processed. Hybrid approaches combining blockchain and cryptographic techniques or blockchain and TEEs provide promising frameworks for privacy preservation. For reference, we provide, as a supplement, a software implementation where users can privately share features of their geolocation data using the hybrid approach combining blockchain with TEEs.
CONCLUSIONS Blockchain technology and smart contracts enable the development of new privacy-preserving feature engineering methods by obviating dependence on trusted parties and providing immutable, auditable data processing workflows. The overlap between blockchain and cryptographic techniques or blockchain and secure hardware technologies is a promising field for addressing important data privacy needs. Hybrid blockchain and TEE frameworks currently provide practical tools for implementing experimental privacy-preserving applications.

Privacy-Preserving Methods for Feature Engineering Using Blockchain: Review, Evaluation, and Proof of Concept

Journal of Medical Internet Research ◽

10.2196/13600 ◽

2019 ◽

Vol 21 (8) ◽

pp. e13600 ◽

Cited By ~ 6

Author(s):

Michael Jones ◽

Matthew Johnson ◽

Mark Shervey ◽

Joel T Dudley ◽

Noah Zimmerman

Keyword(s):

Data Collection ◽

Data Privacy ◽

Hybrid Approach ◽

Privacy Preserving ◽

Third Party ◽

Feature Engineering ◽

Smart Contracts ◽

Trusted Third Party ◽

Secure Hardware ◽

Cryptographic Techniques

Background The protection of private data is a key responsibility for research studies that collect identifiable information from study participants. Limiting the scope of data collection and preventing secondary use of the data are effective strategies for managing these risks. An ideal framework for data collection would incorporate feature engineering, a process where secondary features are derived from sensitive raw data in a secure environment without a trusted third party. Objective This study aimed to compare current approaches based on how they maintain data privacy and the practicality of their implementations. These approaches include traditional approaches that rely on trusted third parties, as well as cryptographic, secure hardware, and blockchain-based techniques. Methods A set of properties was defined for evaluating each approach. A qualitative comparison was presented based on these properties. The evaluation of each approach was framed with a use case of sharing geolocation data for biomedical research. Results We found that approaches that rely on a trusted third party for preserving participant privacy do not provide sufficiently strong guarantees that sensitive data will not be exposed in modern data ecosystems. Cryptographic techniques incorporate strong privacy-preserving paradigms but are appropriate only for select use cases or are currently limited because of computational complexity. Blockchain smart contracts alone are insufficient to provide data privacy because transactional data are public. Trusted execution environments (TEEs) may have hardware vulnerabilities and lack visibility into how data are processed. Hybrid approaches combining blockchain and cryptographic techniques or blockchain and TEEs provide promising frameworks for privacy preservation. For reference, we provide, as a supplement, a software implementation where users can privately share features of their geolocation data using the hybrid approach combining blockchain with TEEs.
Conclusions Blockchain technology and smart contracts enable the development of new privacy-preserving feature engineering methods by obviating dependence on trusted parties and providing immutable, auditable data processing workflows. The overlap between blockchain and cryptographic techniques or blockchain and secure hardware technologies is a promising field for addressing important data privacy needs. Hybrid blockchain and TEE frameworks currently provide practical tools for implementing experimental privacy-preserving applications.
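
To make the notion of feature engineering on geolocation data concrete, the following is a minimal sketch and not the authors' implementation. It assumes a GPS trace is a list of (latitude, longitude) pairs and that only coarse derived features leave the secure environment; the function names `haversine_km` and `derive_features` and the choice of features are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def derive_features(trace):
    """Reduce a raw trace to secondary features (hypothetical feature set).

    Only the returned dictionary would be shared; the raw coordinates
    stay inside whatever environment runs this function.
    """
    if not trace:
        return {"num_points": 0, "total_km": 0.0, "max_km_from_start": 0.0}
    # Sum the distances between consecutive points of the trace.
    total = sum(haversine_km(p, q) for p, q in zip(trace, trace[1:]))
    # Largest distance of any point from the first point of the trace.
    farthest = max(haversine_km(trace[0], p) for p in trace)
    return {
        "num_points": len(trace),
        "total_km": round(total, 1),
        "max_km_from_start": round(farthest, 1),
    }
```

In the hybrid design the abstract describes, a function of this kind would run inside a TEE, with a blockchain recording which computation was executed; neither part is modelled in this sketch.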

Privacy-Preserving K-Nearest Neighbors Training over Blockchain-Based Encrypted Health Data

Electronics ◽

10.3390/electronics9122096 ◽

2020 ◽

Vol 9 (12) ◽

pp. 2096

Author(s):

Rakib Ul Haque ◽

A S M Touhidul Hasan ◽

Qingshan Jiang ◽

Qiang Qu

Keyword(s):

Data Privacy ◽

Building Blocks ◽

Privacy Preserving ◽

Training Data ◽

Third Party ◽

Supervised Machine Learning ◽

Primary Concern ◽

Privacy Issue ◽

Trusted Third Party ◽

Blockchain Technology

Numerous works focus on the data privacy issue of the Internet of Things (IoT) when training a supervised Machine Learning (ML) classifier. Most existing solutions assume that the classifier’s training data can be obtained securely from different IoT data providers. The primary concern is data privacy when training a K-Nearest Neighbour (K-NN) classifier with IoT data from various entities. This paper proposes secure K-NN, which provides privacy-preserving K-NN training over IoT data. It employs Blockchain technology with a partially homomorphic cryptosystem (PHC) known as Paillier to protect the data privacy of all participants (i.e., IoT data analyst C and IoT data provider P). When C analyzes the IoT data of P, privacy issues arise for both participants, which would ordinarily require a trusted third party. To protect each participant’s privacy and remove the dependency on a third party, we assemble secure building blocks in secure K-NN based on Blockchain technology. First, a protected data-sharing platform is developed among various P, where encrypted IoT data is registered on a shared ledger. Second, the secure polynomial operation (SPO), secure biasing operations (SBO), and secure comparison (SC) are designed using the homomorphic property of Paillier. The analysis shows that secure K-NN does not need any trusted third party at the time of interaction, and rigorous security analysis demonstrates that secure K-NN protects sensitive data privacy for each P and C. Secure K-NN achieved precisions of 97.84%, 82.33%, and 76.33% on the BCWD, HDD, and DD datasets, respectively. The performance of secure K-NN is very close to that of general K-NN and outperforms all previous state-of-the-art methods.
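
The homomorphic property of Paillier that the abstract relies on can be illustrated with a toy sketch. This is not the paper's SPO, SBO, or SC protocol, and it is not secure: the primes are tiny and chosen only so the arithmetic is easy to follow, and the helper names are hypothetical. It shows that multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and that raising a ciphertext to a constant yields an encryption of the scaled plaintext.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation from two primes (insecure sizes)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
    g = n + 1                      # common simplification for the generator
    mu = pow(lam, -1, n)           # valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover m via L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    n, _ = pub
    lam, mu = priv
    ell = (pow(c, lam, n * n) - 1) // n
    return (ell * mu) % n


def add_encrypted(pub, c1, c2):
    """Ciphertext product corresponds to plaintext sum."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale_encrypted(pub, c, k):
    """Ciphertext power corresponds to plaintext multiplication by k."""
    n, _ = pub
    return pow(c, k, n * n)


pub, priv = keygen(1009, 1013)          # toy primes, for illustration only
c_a, c_b = encrypt(pub, 20), encrypt(pub, 22)
print(decrypt(pub, priv, add_encrypted(pub, c_a, c_b)))   # 42
print(decrypt(pub, priv, scale_encrypted(pub, c_a, 3)))   # 60
```

Because only addition and multiplication by a known constant are available on ciphertexts, operations such as comparison need additional interactive protocols, which is presumably why the paper designs dedicated building blocks on top of this property.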

PrivCrowd: A Secure Blockchain-Based Crowdsourcing Framework with Fine-Grained Worker Selection

Wireless Communications and Mobile Computing ◽

10.1155/2021/3758782 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Qiliang Yang ◽

Tao Wang ◽

Wenbo Zhang ◽

Bo Yang ◽

Yong Yu ◽

...

Keyword(s):

Data Collection ◽

Data Privacy ◽

Single Point ◽

Sensitive Information ◽

Smart Contracts ◽

Encryption Scheme ◽

Functional Encryption ◽

Privacy And Security ◽

Fine Grained ◽

Worker Selection

Blockchain-based crowdsourcing systems can mitigate some known limitations of centralized crowdsourcing platforms, such as single points of failure and Sybil attacks. However, blockchain-based crowdsourcing systems still suffer from privacy and security issues. Participants’ sensitive information (e.g., identity, address, and expertise) is at risk of disclosure. Sensitive crowdsourcing tasks, such as location-based data collection and labeling images that include faces, also need privacy preservation. Moreover, current work fails to balance worker anonymity and public auditing. In this paper, we present a secure blockchain-based crowdsourcing framework with fine-grained worker selection, named PrivCrowd, which exploits a functional encryption scheme to protect the data privacy of tasks and to select workers by matching their attributes. In PrivCrowd, requesters and workers can achieve both exchange and evaluation fairness by calling smart contracts. Solution collection can also be done in a secure, sound, and noninteractive way. Experimental results show the feasibility, usability, and efficiency of PrivCrowd.

Privacy Preserving Naïve Bayes Classifier for Horizontally Distribution Scenario Using Un-trusted Third Party

IOSR Journal of Computer Engineering ◽

10.9790/0661-0760412 ◽

2012 ◽

Vol 7 (6) ◽

pp. 04-12 ◽

Cited By ~ 1

Author(s):

Alka Gangrade

Keyword(s):

Naive Bayes ◽

Privacy Preserving ◽

Naïve Bayes ◽

Third Party ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Trusted Third Party

A Novel Privacy-Preserving Scheme in IoT-Based Social Distancing Technologies

10.5121/csit.2021.111820 ◽

2021 ◽

Author(s):

Arwa Alrawais ◽

Fatemah Alharbi ◽

Moteeb Almoteri ◽

Sara A Aljwair ◽

Sara S Aljwair

Keyword(s):

Data Privacy ◽

Personal Information ◽

Fog Computing ◽

Mortality Rates ◽

Privacy Preserving ◽

Third Party ◽

Massive Data ◽

Social Distancing ◽

The World ◽

Efficiency And Effectiveness

The COVID-19 pandemic has swept the world, causing an enormous number of cases and leading to high mortality rates across the globe. Internet of Things (IoT) based social distancing techniques and many current and emerging technologies have contributed to the fight against the spread of pandemics and to reducing the number of positive cases. These technologies generate massive data, which poses a significant threat to data owners’ privacy by revealing their lifestyle and personal information, since that data is stored and managed by a third party such as a cloud. This paper provides a new privacy-preserving scheme based on anonymization using an improved slicing technique and employing distributed fog computing. Our implementation shows that the proposed approach ensures data privacy against a third party intending to violate it for any purpose. Furthermore, our results illustrate our scheme’s efficiency and effectiveness.

Privacy Preserving Scalable Authentication Protocol with Partially Trusted Third Party for Distributed Internet-of-Things

10.5220/0010599500002998 ◽

2021 ◽

Author(s):

Hiral Trivedi ◽

Sankita Patel

Keyword(s):

Internet Of Things ◽

Privacy Preserving ◽

Authentication Protocol ◽

Third Party ◽

Trusted Third Party ◽

Scalable Authentication

Privacy Preserving Classification of Biomedical Data With Secure Removing of Duplicate Records

Research Anthology on Privatizing and Securing Data ◽

10.4018/978-1-7998-8954-0.ch026 ◽

2021 ◽

pp. 569-588

Author(s):

Boudheb Tarik ◽

Elberrichi Zakaria

Keyword(s):

Data Mining ◽

Data Privacy ◽

Privacy Preserving ◽

Third Party ◽

Distributed Data ◽

Biomedical Data ◽

Collaborative Models ◽

Highly Sensitive ◽

Complete Access

Classifying data means automatically assigning predefined classes to data. It is one of the main applications of data mining. Having complete access to all data is critical for building accurate models. Data can be highly sensitive, such as biomedical data, which cannot be disclosed to or shared with third parties because doing so can harm individuals and organizations. The challenge is how to preserve both the privacy and the usefulness of data. Privacy-preserving classification addresses this problem. Collaborative models are constructed over networks without violating the data owners' privacy. In this article, the authors address two problems: privacy-preserving deduplication of identical records and privacy-preserving classification. They propose a randomized hash technique for deduplication and an enhanced privacy-preserving classification of biomedical data over horizontally distributed data based on two homomorphic encryptions. No private, intermediate, or final results are disclosed. Experiments show that their solution is efficient and secure without loss of accuracy.

A Privacy-Preserving Data Collection and Processing Framework for Third-Party UAV Services

2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom) ◽

10.1109/trustcom50675.2020.00095 ◽

2020 ◽

Author(s):

Tianyuan Liu ◽

Hongpeng Guo ◽

Claudiu Danilov ◽

Klara Nahrstedt

Keyword(s):

Data Collection ◽

Privacy Preserving ◽

Third Party ◽

Processing Framework

Mainzelliste SecureEpiLinker (MainSEL): Privacy-Preserving Record Linkage using Secure Multi-Party Computation

Bioinformatics ◽

10.1093/bioinformatics/btaa764 ◽

2020 ◽

Author(s):

Sebastian Stammler ◽

Tobias Kussel ◽

Phillipp Schoppmann ◽

Florian Stampe ◽

Galina Tremper ◽

...

Keyword(s):

Record Linkage ◽

Fault Tolerant ◽

Source Code ◽

Privacy Preserving ◽

Third Party ◽

Data Sets ◽

Real World Data ◽

Trusted Third Party ◽

Record Keeping ◽

Network Connection

Abstract Motivation Record linkage has versatile applications in real-world data analysis contexts, where several data sets need to be linked on the record level in the absence of any exact identifier connecting related records. One example is medical databases of patients, spread across institutions, that have to be linked on personally identifiable entries like name, date of birth or ZIP code. At the same time, privacy laws may prohibit the exchange of this personally identifiable information (PII) across institutional boundaries, ruling out the outsourcing of the record linkage task to a trusted third party. We propose to employ privacy-preserving record linkage (PPRL) techniques that prevent, to various degrees, the leakage of PII while still allowing for the linkage of related records. Results We develop a framework for fault-tolerant PPRL using secure multi-party computation with the medical record keeping software Mainzelliste as the data source. Our solution does not rely on any trusted third party and all PII is guaranteed not to leak under common cryptographic security assumptions. Benchmarks show the feasibility of our approach in realistic networking settings: linkage of a patient record against a database of 10,000 records can be done in 48 s over a heavily delayed (100 ms) network connection, or 3.9 s with a low-latency connection. Availability and implementation The source code of the sMPC node is freely available on GitHub at https://github.com/medicalinformatics/SecureEpilinker subject to the AGPLv3 license. The source code of the modified Mainzelliste is available at https://github.com/medicalinformatics/MainzellisteSEL.

Federated Trusted Third Party as an Approach for Privacy Preserving Record Linkage in a Large Network of University Medicines in Pandemic Research

10.21203/rs.3.rs-1053445/v1 ◽

2021 ◽

Author(s):

Christopher Hampf ◽

Martin Bialke ◽

Hauke Hund ◽

Christian Fegeler ◽

Stefan Lang ◽

...

Keyword(s):

Data Integration ◽

Record Linkage ◽

Privacy Preserving ◽

Medical Data ◽

Third Party ◽

Bloom Filters ◽

Large Network ◽

Trusted Third Party ◽

University Medicine ◽

Expansion Stage

Abstract Background The Federal Ministry of Research and Education funded the Network of University Medicine for establishing an infrastructure for pandemic research. This includes the development of a COVID-19 Data Exchange Platform (CODEX) that provides standardised and harmonised data sets for COVID-19 research. Nearly all university hospitals in Germany are part of the project and transmit medical data from the local data integration centres to the CODEX platform. The medical data on a person that has been collected at several sites is to be made available on the CODEX platform in a merged form. To enable this, a federated trusted third party (fTTP) will be established, which will allow the pseudonymised merging of the medical data. The fTTP implements privacy preserving record linkage based on Bloom filters and assigns pseudonyms to enable re-pseudonymisation during data transfer to the CODEX platform. Results The fTTP was implemented conceptually and technically. For this purpose, the processes that are necessary for data delivery were modelled. The resulting communication relationships were identified and corresponding interfaces were specified. These were developed according to the specifications in FHIR and validated with the help of external partners. Existing tools such as the identity management system E-PIX® were further developed accordingly so that sites can generate Bloom filters based on person identifying information. An extension for the comparison of Bloom filters was implemented for the federated trusted third party. The correct implementation was shown in the form of a demonstrator and the connection of two data integration centres. Conclusions This article describes how the fTTP was modelled and implemented. In a first expansion stage, two sites were connected to the fTTP as an example and its functionality was demonstrated. Further expansion stages, which are already planned, have been technically specified and will be implemented in the future in order to also handle cases in which the privacy preserving record linkage achieves ambiguous results. The first expansion stage of the fTTP is available in the University Medicine network, and all participating sites will connect to it in the ongoing test phase.
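
As a rough illustration of record linkage based on Bloom filters, the sketch below encodes a name as the set of bit positions hit by its keyed-hashed character bigrams and compares two encodings with the Dice coefficient. The abstract does not specify the encoding, so the bigram tokenisation, the HMAC-SHA256 hashing, the filter size, the number of hash functions, and the similarity measure are all assumptions made for this example; it does not reproduce the E-PIX or fTTP implementation.

```python
import hashlib
import hmac


def bigrams(value):
    """Character bigrams of a normalised, padded string."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, secret, size=1024, num_hashes=4):
    """Return the set of bit positions set in the Bloom filter.

    `secret` is a shared key (bytes) that the sites would agree on;
    `size` and `num_hashes` are illustrative defaults.
    """
    positions = set()
    for gram in bigrams(value):
        for i in range(num_hashes):
            digest = hmac.new(secret, f"{i}:{gram}".encode("utf-8"),
                              hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return frozenset(positions)


def dice_similarity(a, b):
    """Dice coefficient of two bit-position sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


key = b"example-shared-secret"  # placeholder key for the example
same = dice_similarity(bloom_encode("Meier", key), bloom_encode("meier ", key))
near = dice_similarity(bloom_encode("Meier", key), bloom_encode("Meyer", key))
far = dice_similarity(bloom_encode("Meier", key), bloom_encode("Schmidt", key))
print(same, near > far)
```

A linkage service would compare such encodings against a similarity threshold instead of comparing the names themselves; scores near the threshold correspond to the ambiguous results the abstract mentions.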
