Using Cryptography For Privacy-Preserving Data Mining

Data Mining and Knowledge Discovery Technologies ◽

10.4018/978-1-60566-218-3.ch014 ◽

2008 ◽

pp. 175-194

Author(s):

Justin Zhan

Keyword(s):

Data Mining ◽

Data Privacy ◽

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor ◽

Privacy Concerns ◽

Private Data ◽

Definition Of ◽

Types Of Information ◽

Neighbor Classification

Data mining often requires collecting data from multiple parties. Privacy concerns may prevent those parties from directly sharing the data, or even certain types of information about the data. The challenge is how multiple parties can collaboratively conduct data mining without breaching data privacy. This paper provides solutions for privacy-preserving k-nearest neighbor classification, a common data mining task. Our goal is to obtain accurate data mining results without disclosing private data. We propose a formal definition of privacy and show that our solutions preserve data privacy.


A Privacy Preserving Cloud-Based K-NN Search Scheme with Lightweight User Loads

Computers ◽

10.3390/computers9010001 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Yeong-Cherng Hsu ◽

Chih-Hsin Hsueh ◽

Ja-Ling Wu

Keyword(s):

Data Privacy ◽

Nearest Neighbor ◽

Search Algorithm ◽

Data Access ◽

Privacy Preserving ◽

Secret Key ◽

K Nearest Neighbor ◽

Sensitive Data ◽

Cloud Data ◽

Cloud Server

With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources of the cloud, data owners can also provide a platform for users to make query requests. However, due to privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy-preserving K-nearest neighbor (K-NN) search scheme over an encrypted outsourced cloud dataset is proposed. The problem is to let the cloud server find the K nearest points to an encrypted query on the encrypted dataset outsourced by data owners, and return the search results to the querying user. Compared with existing methods, our approach makes greater use of cloud resources by shifting most of the required computational load from data owners and query users to the cloud server. In addition, data owners do not need to share their secret key with others. In the proposed scheme, data points and user queries are encrypted attribute-wise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, by using a tree structure, the proposed scheme can answer queries in sublinear time, according to our performance analysis. Finally, experimental results demonstrate the practicality and efficiency of our method.


Privacy preserving k-nearest neighbor classification over encrypted database in outsourced cloud environments

World Wide Web ◽

10.1007/s11280-018-0539-4 ◽

2018 ◽

Vol 22 (1) ◽

pp. 101-123 ◽

Cited By ~ 7

Author(s):

Wei Wu ◽

Udaya Parampalli ◽

Jian Liu ◽

Ming Xian

Keyword(s):

Nearest Neighbor ◽

Privacy Preserving ◽

K Nearest Neighbor ◽

Nearest Neighbor Classification ◽

Cloud Environments ◽

Encrypted Database ◽

Neighbor Classification


PPDM-TAN: A Privacy-Preserving Multi-Party Classifier

Computation ◽

10.3390/computation9010006 ◽

2021 ◽

Vol 9 (1) ◽

pp. 6

Author(s):

Maria Eleni Skarkala ◽

Manolis Maragoudakis ◽

Stefanos Gritzalis ◽

Lilian Mitrou

Keyword(s):

Data Mining ◽

Private Information ◽

Security Analysis ◽

Privacy Preserving ◽

Data Mining Algorithm ◽

Bayes Classifier ◽

Sensitive Data ◽

Privacy Concerns ◽

Information Privacy Concerns ◽

Private Data

Distributed medical, financial, or social databases are analyzed daily to discover patterns and useful information. Privacy concerns have emerged because some database segments contain sensitive data. Data mining techniques are used to parse, process, and manage enormous amounts of data while preserving private information. Cryptography, as shown by previous research, is the most accurate approach to acquiring knowledge while maintaining privacy. In this paper, we present an extension of a privacy-preserving data mining algorithm, designed and developed for both horizontally and vertically partitioned databases that contain either nominal or numeric attribute values. The proposed algorithm uses the multi-candidate election schema to construct a privacy-preserving tree-augmented naive Bayesian classifier, a more robust variation of the classical naive Bayes classifier. The security analysis shows that, by using the Paillier cryptosystem and its distinctive homomorphic primitive, privacy is ensured and the proposed algorithm provides strong defences against common attacks. Experiments on real-world databases demonstrate that private data is preserved while mining takes place and that both database partition types are handled efficiently.
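The additive homomorphic property of the Paillier cryptosystem mentioned in this abstract can be illustrated with a toy sketch. This is not the PPDM-TAN protocol itself: the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`), the tiny primes, and the three-party count example are all assumptions made for illustration, and the key size is far too small to be secure.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier key generation from two small primes (illustrative only, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n with generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible mod n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m from ciphertext c using the private key (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts corresponds to adding the underlying plaintexts."""
    return (c1 * c2) % (n * n)

# Hypothetical scenario: three parties each hold a local count for one
# attribute value and send only the encrypted count to an aggregator.
n, priv = keygen()
local_counts = [12, 30, 8]
ciphertexts = [encrypt(n, count) for count in local_counts]

total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)

print(decrypt(n, priv, total))  # sum of the local counts
```

The aggregator combines ciphertexts without seeing any individual count; only the holder of the private key learns the sum. Counts of this kind are what a naive Bayes style classifier needs in order to estimate probabilities.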


Research on Privacy Preserving Classification Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.713-715.1863 ◽

2015 ◽

Vol 713-715 ◽

pp. 1863-1867 ◽

Cited By ~ 1

Author(s):

Xun Yi Ren ◽

Wu Yuan

Keyword(s):

Data Mining ◽

Decision Tree ◽

Data Privacy ◽

Privacy Preserving ◽

Classification Algorithm ◽

Decision Tree Classification ◽

Private Data

In data mining, performing the mining while protecting private data is a problem that must be solved. This paper proposes an improved decision tree classification algorithm. A homomorphic encryption system, digital envelope technology, and secret sorting are applied to protect data privacy. Our contribution is a privacy-preserving protocol consisting of a homomorphic encryption system and secret sorting. Analysis shows that the algorithm produces correct results while protecting the privacy of the private data.


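Perancangan Aplikasi Prediksi Kelulusan Tepat Waktu Bagi Mahasiswa Baru Dengan Teknik Data Mining (Studi Kasus: Data Akademik Mahasiswa STMIK Dipanegara Makassar) [Designing an Application to Predict On-Time Graduation for New Students Using Data Mining Techniques (Case Study: Academic Data of STMIK Dipanegara Makassar Students)]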

Creative Information Technology Journal ◽

10.24076/citec.2014v1i4.27 ◽

2015 ◽

Vol 1 (4) ◽

pp. 270

Author(s):

Muhammad Syukri Mustafa ◽

I. Wayan Simpen

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Test Results ◽

K Nearest Neighbor ◽

Accuracy Rate ◽

Sample Data ◽

New Students ◽

K Nearest Neighbor Algorithm ◽

Using Data ◽

Existing Data

This research aims to predict whether new students are likely to complete their studies on time, using data mining analysis to explore accumulated historical data with the K-Nearest Neighbor (KNN) algorithm. The resulting application uses a variety of attributes, including national exam (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN analysis, a prediction can be made from the proximity of existing historical data to the new data, indicating whether a student is likely to complete their studies on time. Tests applying the KNN algorithm, with sample data from alumni who graduated between 2004 and 2010 as the old cases and alumni who graduated in 2011 as the new cases, yielded an accuracy rate of 83.36%.
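The KNN prediction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' application: the function name `knn_predict`, the two normalized features, the labels, and every data point are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows closest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Invented historical cases: (normalized exam score, normalized sibling count).
historical_cases = [
    ((0.90, 0.20), "on_time"),
    ((0.80, 0.10), "on_time"),
    ((0.85, 0.40), "on_time"),
    ((0.30, 0.70), "late"),
    ((0.40, 0.90), "late"),
    ((0.20, 0.50), "late"),
]

new_student = (0.75, 0.30)
print(knn_predict(historical_cases, new_student))
```

In practice, attributes such as gender or parents' occupation would need numeric encoding and scaling before a distance can be computed, and k would be chosen by validation on held-out cases.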


Exclusive lasso-based k-nearest-neighbor classification

Neural Computing and Applications ◽

10.1007/s00521-021-06069-5 ◽

2021 ◽

Author(s):

Lin Qiu ◽

Yanpeng Qu ◽

Changjing Shang ◽

Longzhi Yang ◽

Fei Chao ◽

...

Keyword(s):

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Classification ◽

Neighbor Classification


Gear crack level identification based on weighted K nearest neighbor classification algorithm

Mechanical Systems and Signal Processing ◽

10.1016/j.ymssp.2009.01.009 ◽

2009 ◽

Vol 23 (5) ◽

pp. 1535-1547 ◽

Cited By ~ 146

Author(s):

Yaguo Lei ◽

Ming J. Zuo

Keyword(s):

Nearest Neighbor ◽

Classification Algorithm ◽

K Nearest Neighbor ◽

Nearest Neighbor Classification ◽

Gear Crack ◽

Neighbor Classification ◽

Level Identification


Data Mining Approach to Analyze COVID-19 Clinical Dataset

10.53350/pjmhs211561812 ◽

2021 ◽

Vol 15 (6) ◽

pp. 1812-1819

Author(s):

Azita Yazdani ◽

Ramin Ravangard ◽

Roxana Sharifian

Keyword(s):

Artificial Intelligence ◽

Data Mining ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Clinical Signs ◽

Study Data ◽

Mining Machine ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Approach

The new coronavirus has been spreading since the beginning of 2020, and many efforts have been made to develop vaccines to help patients recover. It is now clear that the world needs a rapid solution to curb the spread of COVID-19 through non-clinical approaches such as data mining, enhanced intelligence, and other artificial intelligence techniques. These approaches can reduce the burden on the health care system by providing the best possible way to diagnose and predict the COVID-19 epidemic. In this study, data mining models for early detection of COVID-19 were developed using an epidemiological dataset of patients and individuals suspected of having COVID-19 in Iran. C4.5, support vector machine, Naive Bayes, logistic regression, Random Forest, and k-nearest neighbor algorithms were applied directly to the dataset using RapidMiner to develop the models. Given clinical signs as input, the model assesses the risk of contracting the COVID-19 virus. Evaluation of the models showed that the support vector machine, with 93.41% accuracy, was the best-performing of the developed models for diagnosing patients with COVID-19.

Keywords: COVID-19, Data mining, Machine Learning, Artificial Intelligence, Classification
