Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing

Data mining is widely used in large-scale data processing and plays an important role in fields such as industry, finance, weather, science, and technology. Data mining techniques include classification, clustering, regression, variable selection, and market basket analysis. Illiteracy is one of the factors that hinder the quality of human resources, and eradicating illiteracy in the community is a basic requirement for improving that quality. The purpose of this study is to cluster illiteracy by province in Indonesia. The study clusters illiteracy data by the proportion for ages 15-44 into a high group with 1 node, a medium group with 6 nodes, and a low group with 27 nodes. The results can serve as input for the government in setting province-level illiteracy eradication policies in Indonesia.

Keywords: Illiterate, Data mining, K-Means Clustering

Large-scale data clustering algorithm based on quantum immune regulation network

2017 IEEE Symposium Series on Computational Intelligence (SSCI) ◽

10.1109/ssci.2017.8285302 ◽

2017 ◽

Author(s):

Yangyang Li ◽

Xiaoyu Bai ◽

Xiaoju Hou ◽

Licheng Jiao

Keyword(s):

Immune Regulation ◽

Data Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Large Scale Data ◽

Regulation Network ◽

Scale Data

A Spark-Based Artificial Bee Colony Algorithm for Large-Scale Data Clustering

2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) ◽

10.1109/hpcc/smartcity/dss.2018.00204 ◽

2018 ◽

Author(s):

Yanjie Wang ◽

Quan Qian

Keyword(s):

Data Clustering ◽

Artificial Bee Colony Algorithm ◽

Large Scale ◽

Artificial Bee Colony ◽

Bee Colony ◽

Large Scale Data ◽

Scale Data

Improvement of K-Means Algorithm for Accelerated Big Data Clustering

International Journal of Information Technologies and Systems Approach ◽

10.4018/ijitsa.2021070107 ◽

2021 ◽

Vol 14 (2) ◽

pp. 99-119

Author(s):

Chunqiong Wu ◽

Bingwen Yan ◽

Rongrui Yu ◽

Zhangshu Huang ◽

Baoqin Yu ◽

...

Keyword(s):

Data Mining ◽

Data Clustering ◽

Large Scale ◽

Rapid Development ◽

Large Data ◽

Data Retrieval ◽

Research Directions ◽

Large Scale Data ◽

Rich Information ◽

Scale Data

With the rapid development of computing, and especially the recent adoption of “Internet +” and cloud platforms across industries, the volume of data of all types has grown enormously. These large volumes of data often contain very rich information, and traditional data retrieval, analysis, and management methods can no longer meet the need for data acquisition and management. Data mining has therefore become one of the ways to quickly obtain useful information. Effective large-scale data clustering is one of the important research directions in data mining. The k-means algorithm is the simplest and most basic method for large-scale data clustering. It is simple to operate, fast, and scales well to large data, but it also often exposes serious defects in data processing. To address some of the defects of the traditional k-means algorithm, this paper improves and analyzes it in two respects.
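
For reference, the traditional k-means procedure mentioned above can be sketched as follows. This is a minimal sketch of the standard Lloyd-style iteration, not the improved algorithm proposed in the paper; the function names, the caller-supplied `initial_centroids` parameter, and the toy data are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain Lloyd-style k-means. Returns (centroids, labels)."""
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if not members:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
                continue
            new_centroids.append(
                tuple(sum(coords) / len(members) for coords in zip(*members))
            )
        if new_centroids == centroids:
            break  # No centroid moved, so the assignment is stable.
        centroids = new_centroids
    return centroids, labels


# Illustrative toy data (not from the paper): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 9), (9, 10)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
```

The outcome of this iteration depends on the initial centroids. Here one starting centroid is taken from each group so that the run is deterministic.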

GPU enhanced parallel computing for large scale data clustering

Future Generation Computer Systems ◽

10.1016/j.future.2012.07.009 ◽

2013 ◽

Vol 29 (7) ◽

pp. 1736-1741 ◽

Cited By ~ 13

Author(s):

Xiaohui Cui ◽

Jesse St. Charles ◽

Thomas Potok

Keyword(s):

Parallel Computing ◽

Data Clustering ◽

Large Scale ◽

Large Scale Data ◽

Scale Data

An Optimized Iterative Semantic Compression Algorithm And Parallel Processing for Large Scale Data

KSII Transactions on Internet and Information Systems ◽

10.3837/tiis.2018.06.018 ◽

2018 ◽

Vol 12 (6) ◽

Keyword(s):

Parallel Processing ◽

Large Scale ◽

Compression Algorithm ◽

Large Scale Data ◽

Scale Data

Clustering Large-Scale Data Based On Modified Affinity Propagation Algorithm

Journal of Artificial Intelligence and Soft Computing Research ◽

10.1515/jaiscr-2016-0003 ◽

2016 ◽

Vol 6 (1) ◽

pp. 23-33 ◽

Cited By ~ 23

Author(s):

Ahmed M. Serdah ◽

Wesam M. Ashour

Keyword(s):

Data Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Affinity Propagation ◽

Clustering Method ◽

Data Set ◽

Local Cluster ◽

Large Scale Data ◽

Scale Data

Traditional clustering algorithms are no longer suitable for data mining applications that use large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate for normal data clustering, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods aim to ensure both low time complexity and good accuracy. First, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Second, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Third, the inverse weighted clustering algorithm is run on all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, all data points are clustered by their similarity to the global exemplars. Results show that the proposed method significantly reduces clustering time and produces more effective and accurate clustering results than the AP, KAP, and HAP algorithms.
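
The four steps above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: a simple k-medoids routine (`select_exemplars`, a hypothetical helper) stands in for both KAP and inverse weighted clustering, squared Euclidean distance stands in for the similarity measure, and only the random fragmentation variant of the first step is shown. All function names and the toy data are illustrative.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance, used here as a stand-in dissimilarity."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, exemplars):
    """Index of the exemplar closest to the given point."""
    return min(range(len(exemplars)),
               key=lambda j: squared_distance(point, exemplars[j]))


def select_exemplars(points, k, max_iterations=20):
    """Stand-in exemplar selector (simple k-medoids), NOT the paper's KAP."""
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point farthest from the exemplars chosen so far.
    exemplars = [points[0]]
    while len(exemplars) < min(k, len(points)):
        exemplars.append(
            max(points,
                key=lambda p: min(squared_distance(p, e) for e in exemplars)))
    for _ in range(max_iterations):
        groups = [[] for _ in exemplars]
        for p in points:
            groups[nearest(p, exemplars)].append(p)
        # Each exemplar becomes the group member with the smallest total
        # distance to the rest of its group.
        updated = [
            min(g, key=lambda c: sum(squared_distance(c, q) for q in g))
            if g else e
            for g, e in zip(groups, exemplars)
        ]
        if updated == exemplars:
            break
        exemplars = updated
    return exemplars


def cluster_large_dataset(points, k, n_subsets, seed=0):
    """Divide, pick local exemplars, merge to global exemplars, assign."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Step 1: random fragmentation into n_subsets roughly equal parts.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Step 2: up to k local exemplars per non-empty subset.
    local = [e for s in subsets if s for e in select_exemplars(s, k)]
    # Step 3: k global exemplars chosen among the local exemplars.
    global_exemplars = select_exemplars(local, k)
    # Step 4: every point joins its nearest global exemplar.
    labels = [nearest(p, global_exemplars) for p in points]
    return global_exemplars, labels


# Illustrative toy data (not from the paper): two well-separated 3x3 grids.
group_a = [(x, y) for x in range(3) for y in range(3)]
group_b = [(x + 100, y + 100) for x in range(3) for y in range(3)]
points = group_a + group_b
global_exemplars, labels = cluster_large_dataset(points, k=2, n_subsets=3)
```

Because the exemplar search runs only on small subsets and then on the pooled local exemplars, no step in this sketch compares every point with every other point in the full data set.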

Distributed Entity Resolution Based on Similarity Join for Large-Scale Data Clustering

Web-Age Information Management - Lecture Notes in Computer Science ◽

10.1007/978-3-319-08010-9_16 ◽

2014 ◽

pp. 138-149 ◽

Cited By ~ 1

Author(s):

Tiezheng Nie ◽

Wang-chien Lee ◽

Derong Shen ◽

Ge Yu ◽

Yue Kou

Keyword(s):

Data Clustering ◽

Large Scale ◽

Entity Resolution ◽

Similarity Join ◽

Large Scale Data ◽

Scale Data
