Techniques and Challenges while Applying Machine Learning Algorithms in Privacy Preserving Fashion

Author(s):  
Artrim Kjamilji

Nowadays many different entities collect data of the same nature, but in slightly different environments. For instance, different hospitals collect data about their patients' symptoms and the corresponding disease diagnoses, different banks collect the transactions of their customers' bank accounts, multiple cyber-security companies collect data about log files and corresponding attacks, etc. It has been shown that if those entities merged their privately collected data into a single dataset and used it to train a machine learning (ML) model, they would often end up with a trained model that outperforms human experts of the corresponding fields in terms of prediction accuracy. However, there is a drawback: due to privacy concerns, reinforced by laws and ethical considerations, no entity is willing to share its privately collected data with others. The same problem appears in the classification setting over an already trained ML model. On the one hand, a user who has an unclassified query (record) wants to share with the server that owns the trained model neither the content of the query (which might contain private data such as a credit card number, an IP address, etc.) nor the final prediction (classification) of the query. On the other hand, the owner of the trained model does not want to leak any parameter of the trained model to the user. To overcome these shortcomings, several cryptographic and probabilistic techniques have been proposed during the last few years to enable both privacy-preserving training and privacy-preserving classification schemes. These include anonymization and k-anonymity, differential privacy, secure multiparty computation (MPC), federated learning, Private Information Retrieval (PIR), Oblivious Transfer (OT), garbled circuits, and homomorphic encryption, to name a few. Theoretical analyses and experimental results show that current privacy-preserving schemes are suitable for real-world deployment, while the accuracy of most of them differs little or not at all from that of schemes that work in a non-privacy-preserving fashion.
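
Most of the techniques listed above share a common primitive: computing an aggregate over private inputs without any party revealing its own. As a minimal sketch of one of them, additive secret sharing, the basic building block of many MPC protocols (the hospital scenario and the numbers below are illustrative, not taken from the abstract):

```python
import secrets

PRIME = 2**61 - 1  # public modulus; all arithmetic is done mod PRIME

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % PRIME

# Three hospitals privately hold patient counts for some diagnosis.
inputs = [120, 75, 230]
all_shares = [share(v, 3) for v in inputs]

# Party i collects the i-th share of every input and sums locally;
# no single share reveals anything, yet the total is recoverable.
partial_sums = [sum(s[i] for s in all_shares) % PRIME for i in range(3)]
assert reconstruct(partial_sums) == sum(inputs)  # 425
```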

2018
Vol 2018
pp. 1-10
Author(s):  
Ye Li ◽  
Zoe L. Jiang ◽  
Xuan Wang ◽  
Junbin Fang ◽  
En Zhang ◽  
...  

With the wide application of the Internet of Things (IoT), huge amounts of data are collected from IoT networks and need to be processed, for example by data mining. Although it is popular to outsource storage and computation to the cloud, doing so may invade the privacy of participants' information. Cryptography-based privacy-preserving data mining has been proposed to protect the privacy of the participating parties' data during this process. However, handling ciphertext computation and analysis across multiple participants remains an open problem, and existing algorithms rely on the semihonest security model, which requires all parties to follow the protocol rules. In this paper, we address the challenge of outsourcing the ID3 decision tree algorithm in the malicious model. In particular, to securely store and compute private data, a two-participant symmetric homomorphic encryption scheme supporting addition and multiplication is proposed. To guard against malicious behavior by the cloud computing server, secure garbled circuits are adopted to build a privacy-preserving weighted average protocol. Security and performance are analyzed.
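
The quantity such a protocol must compute obliviously is the ID3 splitting criterion: parent entropy minus the weighted average of child-node entropies. Below is a plaintext reference of that computation, which the paper's garbled-circuit protocol would evaluate over encrypted counts (function names here are illustrative, not from the paper):

```python
import math

def entropy(class_counts: list[int]) -> float:
    """Shannon entropy of a node from its per-class record counts."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def split_information_gain(parent: list[int],
                           children: list[list[int]]) -> float:
    """ID3 criterion: parent entropy minus the weighted average
    entropy of the children produced by splitting on an attribute."""
    n = sum(parent)
    weighted_avg = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted_avg

# Counts of (positive, negative) records before and after a candidate split.
parent = [9, 5]
children = [[6, 2], [3, 3]]  # e.g., attribute value A vs. value B
print(split_information_gain(parent, children))  # ~0.048 bits
```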


2021
Vol 2022 (1)
pp. 481-500
Author(s):  
Xue Jiang ◽  
Xuebing Zhou ◽  
Jens Grossklags

Abstract Business intelligence and AI services often involve the collection of copious amounts of multidimensional personal data. Since these data usually contain sensitive information about individuals, direct collection can lead to privacy violations. Local differential privacy (LDP) is currently considered a state-of-the-art solution for privacy-preserving data collection. However, existing LDP algorithms are not applicable to high-dimensional data, not only because of the increased computation and communication cost but also because of poor data utility. In this paper, we aim at addressing the curse-of-dimensionality problem in LDP-based high-dimensional data collection. Based on the ideas of machine learning and data synthesis, we propose DP-Fed-Wae, an efficient privacy-preserving framework for collecting high-dimensional categorical data. By combining a generative autoencoder, federated learning, and differential privacy, our framework is capable of privately learning the statistical distributions of local data and generating high-utility synthetic data on the server side without revealing users' private information. We have evaluated the framework in terms of data utility and privacy protection on a number of real-world datasets containing 68–124 classification attributes. We show that our framework outperforms the LDP-based baseline algorithms in capturing joint distributions and correlations of attributes and generating high-utility synthetic data. With a local privacy guarantee ε = 8, the machine learning models trained with the synthetic data generated by the baseline algorithm suffer an accuracy loss of 10%–30%, whereas with our framework the loss is reduced to less than 3%, and at best to less than 1%. Extensive experimental results demonstrate the capability and efficiency of our framework in synthesizing high-dimensional data while striking a satisfactory utility-privacy balance.
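
The LDP baselines such a framework is compared against typically build on randomized response. As a hedged illustration (the standard k-ary randomized response primitive, not the paper's mechanism): each user reports the true value of a categorical attribute with a probability calibrated to ε, and a uniformly random other value otherwise.

```python
import math
import random

def k_randomized_response(value: int, k: int, epsilon: float) -> int:
    """ε-LDP report for a categorical attribute with k possible values
    (generalized / k-ary randomized response)."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return value
    # Otherwise report one of the k-1 other values uniformly at random.
    other = random.randrange(k - 1)
    return other if other < value else other + 1

# Each user perturbs locally; the server only ever sees noisy reports,
# whose frequencies it can debias to estimate the true distribution.
reports = [k_randomized_response(v, k=4, epsilon=1.0) for v in [2] * 10000]
```

Applying such a mechanism independently to each of 68–124 attributes either splits the budget ε across attributes or multiplies the noise, which is precisely the curse of dimensionality the generative approach above is designed to avoid.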


2021
Vol 13 (4)
pp. 94
Author(s):  
Haokun Fang ◽  
Quan Qian

Privacy protection has become an important concern alongside the great success of machine learning. This paper proposes a multi-party privacy-preserving machine learning framework, named PFMLP, based on partially homomorphic encryption and federated learning. The core idea is that all learning parties transmit only gradients encrypted under homomorphic encryption. Experiments show that a model trained with PFMLP has almost the same accuracy as its conventionally trained counterpart, with a deviation of less than 1%. To mitigate the computational overhead of homomorphic encryption, we use an improved Paillier algorithm that speeds up training by 25–28%. Moreover, comparisons of encryption key length, learning network structure, number of learning clients, etc., are also discussed in detail in the paper.
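
A minimal sketch of the encrypted-gradient idea, using the open-source python-paillier (`phe`) library as a stand-in for the paper's improved Paillier variant (the library choice, variable names, and two-client setup are assumptions for illustration, not PFMLP itself):

```python
import numpy as np
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client computes a local gradient and encrypts it element-wise.
grad_a = np.array([0.12, -0.07, 0.31])
grad_b = np.array([0.08, 0.02, -0.11])
enc_a = [public_key.encrypt(float(g)) for g in grad_a]
enc_b = [public_key.encrypt(float(g)) for g in grad_b]

# The aggregator adds ciphertexts: Paillier is additively homomorphic,
# so it never sees any individual client's gradient in the clear.
enc_sum = [ca + cb for ca, cb in zip(enc_a, enc_b)]

# Only the secret-key holder decrypts, and only the aggregate.
avg_grad = np.array([private_key.decrypt(c) for c in enc_sum]) / 2
print(avg_grad)  # [ 0.1   -0.025  0.1  ]
```

In a real deployment the decryption key must not sit with the aggregator itself, which is one of the trust questions a framework like PFMLP has to settle.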


2022
Author(s):  
Kingsley Austin

Abstract— Credit card fraud is a serious problem for e-commerce retailers, with UK merchants reporting losses of $574.2M in 2020. As a result, effective fraud detection systems must be in place to ensure that payments are processed securely in an online environment. The literature shows that detecting credit card fraud is challenging due to dataset imbalance (genuine versus fraudulent transactions), real-time processing requirements, and the dynamic behavior of fraudsters and customers. This paper proposes that machine learning could be an effective solution for combating credit card fraud. According to research, machine learning techniques can play a role in overcoming the identified challenges, both directly and indirectly, while ensuring a high detection rate of fraudulent transactions. Even though both supervised and unsupervised machine learning algorithms have been suggested, the flaws in both methods point to the necessity for hybrid approaches.
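
One common reading of "hybrid" is to feed an unsupervised anomaly score into a supervised learner, so that unseen fraud patterns and labeled history both contribute. A hedged sketch on synthetic, imbalanced data (scikit-learn assumed; this is an illustration of the idea, not the paper's pipeline):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic, heavily imbalanced data mimicking genuine vs. fraudulent.
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.99, 0.01], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Unsupervised stage: anomaly scores learned without labels.
iso = IsolationForest(random_state=0).fit(X_tr)
score_tr = iso.score_samples(X_tr).reshape(-1, 1)
score_te = iso.score_samples(X_te).reshape(-1, 1)

# Supervised stage: classifier sees raw features plus the anomaly score;
# class_weight compensates for the label imbalance.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(np.hstack([X_tr, score_tr]), y_tr)
print(classification_report(y_te, clf.predict(np.hstack([X_te, score_te]))))
```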


2019
Vol 2019 (1)
pp. 26-46
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
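
A rough sketch of the RON-Gauss pipeline as described above, with the noise calibration deliberately simplified (the Laplace scales below are placeholders; the paper's sensitivity analysis is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def ron_gauss_release(X: np.ndarray, p: int, epsilon: float) -> np.ndarray:
    """Toy sketch: RON projection + DP Gaussian generative model.
    X is n x d with unit-norm rows; p is the reduced dimension."""
    n, d = X.shape
    # Random orthonormal (RON) projection: QR of a Gaussian matrix.
    W, _ = np.linalg.qr(rng.standard_normal((d, p)))
    Z = X @ W  # projections are nearly Gaussian by the DFM effect
    # Perturb the sufficient statistics (mean, covariance) with
    # Laplace noise; scales are illustrative, not calibrated.
    mu = Z.mean(axis=0) + rng.laplace(scale=2.0 / (n * epsilon), size=p)
    cov = (Z - mu).T @ (Z - mu) / n
    cov += rng.laplace(scale=2.0 / (n * epsilon), size=(p, p))
    cov = (cov + cov.T) / 2  # keep the noisy estimate symmetric
    # Release synthetic data sampled from the private Gaussian model.
    return rng.multivariate_normal(mu, cov, size=n)

X = rng.standard_normal((5000, 100))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # row normalization
synthetic = ron_gauss_release(X, p=10, epsilon=1.0)
```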


InterConf
2021
pp. 393-403
Author(s):  
Olexander Shmatko ◽  
Volodimir Fedorchenko ◽  
Dmytro Prochukhan

Today the banking sector offers its clients many different financial services, such as ATM cards, Internet banking, debit cards, and credit cards, which helps attract a large number of new customers. This article proposes an information system for detecting credit card fraud using a machine learning algorithm. Since customers typically use their credit cards around the clock, the bank's server can track all transactions using machine learning algorithms, and it must detect, or ideally predict, fraudulent activity. The dataset contains characteristics of each transaction, and fraudulent transactions need to be classified and detected. For these purposes, the work proposes the use of the Random Forest algorithm.
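
A minimal sketch of a Random Forest classifier on transaction-style features (the synthetic data, class ratio, and decision threshold are illustrative assumptions, not taken from the article):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

# Stand-in for per-transaction features (amount, time, merchant codes, ...).
X, y = make_classification(n_samples=50000, n_features=15,
                           weights=[0.995, 0.005], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

forest = RandomForestClassifier(n_estimators=200,
                                class_weight="balanced_subsample",
                                random_state=1).fit(X_tr, y_tr)

# Score incoming transactions; flag those above a tunable fraud threshold.
fraud_prob = forest.predict_proba(X_te)[:, 1]
flagged = fraud_prob > 0.5
print(precision_recall_fscore_support(y_te, flagged, average="binary"))
```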


2021
Vol 2021 (1)
pp. 21-42
Author(s):  
Miguel Ambrona ◽  
Dario Fiore ◽  
Claudio Soriente

Abstract In a Functional Encryption (FE) scheme, a trusted authority enables designated parties to compute specific functions over encrypted data. As such, FE promises to break the tension between industrial interest in the potential of data mining and user concerns about the use of private data. FE allows the authority to decide who can compute and what can be computed, but it does not allow the authority to control which ciphertexts can be mined. This issue was recently addressed by Naveed et al., who introduced so-called Controlled Functional Encryption (C-FE), a cryptographic framework that extends FE and allows the authority to exert fine-grained control over the ciphertexts being mined. In this work we extend C-FE in several directions. First, we distribute the role of (and the trust in) the authority across several parties by defining multi-authority C-FE (mCFE). Next, we provide an efficient instantiation that enables computation of quadratic functions on inputs provided by multiple data owners, whereas previous work only provides an instantiation for linear functions over data supplied by a single data owner and resorts to garbled circuits for more complex functions. Our scheme leverages CCA2 encryption and linearly-homomorphic encryption. We also implement a prototype and use it to showcase the potential of our instantiation.
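
The linearly-homomorphic building block mentioned here can be illustrated with Paillier-style ciphertexts: an evaluator computes a linear function ⟨w, x⟩ over encrypted inputs using only ciphertext additions and scalar multiplications. The sketch below uses python-paillier purely for illustration; it is not the paper's mCFE construction, and the key would in practice be shared among the authorities.

```python
from phe import paillier  # pip install phe

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Data owners encrypt their inputs x_i under the public key.
x = [3, 7, 2]
enc_x = [public_key.encrypt(v) for v in x]

# The evaluator knows only the authorized function's weights w and the
# ciphertexts; scalar-times-ciphertext plus ciphertext addition suffice
# for any linear function <w, x>.
w = [2, 1, 5]
enc_result = enc_x[0] * w[0]
for wi, ci in zip(w[1:], enc_x[1:]):
    enc_result = enc_result + ci * wi

# Only the key holder(s) can decrypt, and only the function's value.
print(private_key.decrypt(enc_result))  # 2*3 + 1*7 + 5*2 = 23
```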

