Improved K-Means Clustering Algorithm for Big Data Mining under Hadoop Parallel Framework

Weijia Lu

doi:10.1007/s10723-019-09503-0

Efficient and Privacy-Preserving Multi-User Outsourced K-Means Clustering

Computer and Information Science ◽

10.5539/cis.v14n2p26 ◽

2021 ◽

Vol 14 (2) ◽

pp. 26

Author(s):

Na Li ◽

Lianguan Huang ◽

Yanling Li ◽

Meng Sun

Keyword(s):

Data Mining ◽

Big Data ◽

Clustering Algorithm ◽

Privacy Preserving ◽

Locality Sensitive Hashing ◽

Sensitive Information ◽

The Public ◽

Big Data Mining ◽

Euclidean Distances ◽

Computational Resources

In recent years, with the development of the Internet, the volume of data on the network has grown explosively. Big data mining aims to obtain useful information through data processing such as clustering and classification. Clustering is an important branch of big data mining, and it is popular because of its simplicity. A new trend for clients who lack storage and computational resources is to outsource the data and the clustering task to public cloud platforms. However, because datasets used for clustering may contain sensitive information (e.g., identity information, health information), simply outsourcing them to cloud platforms cannot protect privacy. Clients therefore tend to encrypt their databases before uploading them to the cloud for clustering. In this paper, we focus on privacy protection and efficiency with respect to k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm based on locality sensitive hashing (LSH). The algorithm encrypts databases with the Paillier cryptosystem and uses LSH to prune unnecessary computations during clustering; that is, we do not need to compute the Euclidean distance between every data record and every cluster center. Finally, theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering algorithms.
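The pruning idea in this abstract can be illustrated with a minimal plaintext sketch. It is not the authors' algorithm: the Paillier encryption and the multi-user protocol are omitted entirely, and the hash family (a random-projection LSH for Euclidean distance), the bucket `width`, and the function names `make_hash` and `assign_with_lsh` are assumptions chosen for illustration. Each point is compared only with the centers that share at least one hash value with it, falling back to a full scan when there are none.

```python
import math
import random


def make_hash(dim, width, rng):
    """Return one random-projection LSH function h(x) = floor((a.x + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0.0, width)                    # random offset within one bucket

    def h(x):
        return math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)

    return h


def assign_with_lsh(points, centers, hashes):
    """Assign each point to its nearest candidate center.

    Candidates are the centers that share at least one hash value with the
    point. If no center collides, every center is scanned instead.
    Returns (labels, number_of_distance_computations).
    """
    center_keys = [[h(c) for h in hashes] for c in centers]
    labels, computed = [], 0
    for p in points:
        keys = [h(p) for h in hashes]
        candidates = [
            j for j, ck in enumerate(center_keys)
            if any(k == c for k, c in zip(keys, ck))
        ]
        if not candidates:  # illustrative fallback: no pruning for this point
            candidates = list(range(len(centers)))
        computed += len(candidates)
        labels.append(min(candidates, key=lambda j: math.dist(p, centers[j])))
    return labels, computed
```

Because the candidate set is approximate, a point can occasionally be assigned to a center that is not its true nearest one. More hash functions or a larger `width` lower that risk at the cost of less pruning.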


Research on Parallel Adaptive Canopy-K-Means Clustering Algorithm for Big Data Mining Based on Cloud Platform

Journal of Grid Computing ◽

10.1007/s10723-019-09504-z ◽

2020 ◽

Vol 18 (2) ◽

pp. 263-273 ◽

Cited By ~ 1

Author(s):

Dongliang Xia ◽

Feifei Ning ◽

Weina He

Keyword(s):

Data Mining ◽

Big Data ◽

Clustering Algorithm ◽

Cloud Platform ◽

Big Data Mining


An Empirical Perusal of Distance Measures for Clustering with Big Data Mining

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8078.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 606-616 ◽

Cited By ~ 1

Keyword(s):

Data Mining ◽

Big Data ◽

Clustering Algorithm ◽

Distance Measure ◽

Confusion Matrix ◽

Heterogeneous Data ◽

Distance Measures ◽

Research Perspective ◽

Big Data Mining ◽

Data Criterion

The distance measure is a core idea of data mining techniques such as classification, clustering, and statistical analysis. All clustering taxonomies (partition, hierarchical, density, grid, model, fuzzy, and graph) use distance measures to categorize data points into different clusters and to construct and validate clusters. Big data mining is the advanced form of data mining with respect to the dimensions of big data. When a traditional clustering algorithm is used for big data mining, the distance measure must scale and must support huge datasets, heterogeneous data and sources, and the velocity characteristics of big data. From theoretical, practical, and existing-research perspectives, the paper focuses on the volume, variety, and velocity criteria of big data to identify a distance measure for big data mining and to examine how distance measures work under each clustering taxonomy. The study also analyzes the accuracy of the distance measures with the help of a confusion matrix obtained through clustering.
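As a rough illustration of the kind of comparison this abstract describes, the sketch below assigns points to their nearest center under three common distance measures and tallies a confusion matrix against known labels. The abstract does not say which measures or which procedure the study used, so the choice of Euclidean, Manhattan, and cosine distance, along with the function names, is an assumption.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # Undefined for zero vectors; callers must avoid them.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def assign(points, centers, distance):
    """Label each point with the index of its nearest center under `distance`."""
    return [
        min(range(len(centers)), key=lambda j: distance(p, centers[j]))
        for p in points
    ]


def confusion_matrix(true_labels, predicted):
    """Count occurrences of each (true label, predicted label) pair."""
    return Counter(zip(true_labels, predicted))
```

Calling `confusion_matrix(true_labels, assign(points, centers, measure))` once per measure gives tables that can be compared side by side. Entries off the diagonal count the points a given measure places in the wrong cluster.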


Retrieving Information and Discovering Knowledge from Unstructured Data Using Big Data Mining Technique: Heavy Oil Fields Example

10.2523/17805-ms ◽

2014 ◽

Cited By ~ 1

Author(s):

Wenkuang Wu ◽

Xiaoguang Lu ◽

Ben Cox ◽

Guoqiang Li ◽

Lihua Lin ◽

...

Keyword(s):

Data Mining ◽

Big Data ◽

Heavy Oil ◽

Oil Fields ◽

Unstructured Data ◽

Data Mining Technique ◽

Big Data Mining ◽

Mining Technique


An Overview on Big Data Mining Using Evolutionary Techniques

2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT) ◽

10.1109/3ict51146.2020.9312016 ◽

2020 ◽

Author(s):

Fadia Alaeddin ◽

Ala' Khalifeh ◽

Khalid A. Darabkh

Keyword(s):

Data Mining ◽

Big Data ◽

Big Data Mining


Research on Personalized Recommendation System Based on Big Data Mining Technology

2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE) ◽

10.1109/icmcce51767.2020.00052 ◽

2020 ◽

Author(s):

Hongwei Li

Keyword(s):

Data Mining ◽

Big Data ◽

Recommendation System ◽

Personalized Recommendation ◽

Mining Technology ◽

Big Data Mining


Empirical Study of Big Data Mining Technology in English Teaching Integration and Optimization Analysis

Proceedings of the 2020 International Conference on Computers, Information Processing and Advanced Education ◽

10.1145/3419635.3419734 ◽

2020 ◽

Author(s):

Chenxia Qiu

Keyword(s):

Data Mining ◽

Big Data ◽

Empirical Study ◽

Optimization Analysis ◽

Mining Technology ◽

English Teaching ◽

Big Data Mining


Cloud-Based Big Data Mining & Analyzing Services Platform Integrating R

2013 International Conference on Advanced Cloud and Big Data ◽

10.1109/cbd.2013.13 ◽

2013 ◽

Cited By ~ 10

Author(s):

Feng Ye ◽

Zhi-Jian Wang ◽

Fa-Chao Zhou ◽

Ya-Pu Wang ◽

Yuan-Chao Zhou

Keyword(s):

Data Mining ◽

Big Data ◽

Big Data Mining


Reducing the Search Space for Big Data Mining for Interesting Patterns from Uncertain Data

2014 IEEE International Congress on Big Data ◽

10.1109/bigdata.congress.2014.53 ◽

2014 ◽

Cited By ~ 31

Author(s):

Carson Kai-Sang Leung ◽

Richard Kyle MacKinnon ◽

Fan Jiang

Keyword(s):

Data Mining ◽

Big Data ◽

Uncertain Data ◽

Search Space ◽

Big Data Mining


Big data mining based coordinated control discrete algorithm of independent micro grid with PV and energy

Microprocessors and Microsystems ◽

10.1016/j.micpro.2020.103808 ◽

2021 ◽

Vol 82 ◽

pp. 103808

Author(s):

Peng Tian ◽

Lin Zhang

Keyword(s):

Data Mining ◽

Big Data ◽

Coordinated Control ◽

Big Data Mining ◽

Discrete Algorithm ◽

Micro Grid

