Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing

2006 ◽  
Vol 36 (3) ◽  
pp. 219-234 ◽  
Author(s):  
Hiroyuki Takizawa ◽  
Hiroaki Kobayashi
2020 ◽  
Vol 7 (3) ◽  
pp. 230
Author(s):  
Saifullah Saifullah ◽  
Nani Hidayati

<p><em>Data mining is a method often needed in large-scale data processing, and it has important applications in many fields, including industry, finance, weather, science, and technology. Data mining techniques include classification, clustering, regression, variable selection, and market basket analysis. Illiteracy is one of the factors that hinder the quality of human resources, and eradicating illiteracy in the community is one of the basic requirements for improving that quality. The purpose of this study is to cluster illiterate communities by province in Indonesia. For the 15-44 age group, the clustering yields 1 node in the high group, 6 nodes in the medium group, and 27 nodes in the low group. The results of this study serve as input for the government in determining province-based illiteracy eradication policies in Indonesia.</em></p><p><strong>Keywords</strong>: <em>Illiteracy</em>, Data mining, <em>K-Means Clustering</em></p>
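The abstract does not describe its K-Means setup, so the following is only a minimal Python sketch of how provinces could be split into high, medium, and low groups. It assumes one illiteracy rate per province, k = 3, and groups named by centroid value. The function names, the min/median/max initialization, and the six province values are hypothetical and are not the study's data.

```python
from statistics import median


def kmeans_1d(values, iterations=100):
    """Lloyd's k-means with k = 3 on one-dimensional values.

    Assumption: centroids start at the minimum, median, and maximum value
    so the result is reproducible; the study's initialization is unknown.
    """
    centroids = [min(values), median(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(3), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(3):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids


def name_groups(labels, centroids):
    """Name clusters 'low', 'medium', 'high' by ascending centroid value."""
    order = sorted(range(3), key=lambda c: centroids[c])
    names = {order[0]: "low", order[1]: "medium", order[2]: "high"}
    return [names[lab] for lab in labels]


# Hypothetical illiteracy rates (%) for six made-up provinces, for illustration only.
rates = [("P1", 0.5), ("P2", 0.8), ("P3", 1.1), ("P4", 3.9), ("P5", 4.2), ("P6", 12.0)]
labels, centroids = kmeans_1d([r for _, r in rates])
groups = name_groups(labels, centroids)
for (province, rate), group in zip(rates, groups):
    print(province, rate, group)
```

Naming clusters by centroid rank matters because k-means cluster ids are arbitrary. Only the ordering of the centroids says which group is "high".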


Author(s):  
Chunqiong Wu ◽  
Bingwen Yan ◽  
Rongrui Yu ◽  
Zhangshu Huang ◽  
Baoqin Yu ◽  
...  

With the rapid development of computing technology, especially in recent years, “Internet +,” cloud platforms, and similar technologies have been adopted across many industries, and data of all kinds has grown enormously. These large volumes of data often contain rich information, but traditional data retrieval and analysis methods and data management models can no longer meet our needs for data acquisition and management. Data mining technology has therefore become one way to obtain useful information quickly. Efficient clustering of large-scale data is an important research direction in data mining, and the k-means algorithm is among the simplest and most basic methods for it. The k-means algorithm is simple to operate, fast, and scales well to large data, but it also has serious defects in practice. To address some of the defects of the traditional k-means algorithm, this paper analyzes and improves the algorithm in two respects.
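The abstract does not say which k-means defects the paper targets or how it improves them. Sensitivity to the initial centroids is one commonly cited weakness. The Python sketch below shows standard Lloyd iterations with k-means++ seeding, which is one well-known mitigation and not necessarily the paper's improvement. The names `kmeans`, `kmeans_pp_init`, and `sq_dist` are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: after a random first centre, each new centre is
    sampled with probability proportional to its squared distance from
    the nearest centre already chosen."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:  # every point already coincides with a centre
            centres.append(rng.choice(points))
            continue
        centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Update step: move each centre to its cluster mean; keep the old
        # centre if its cluster is empty.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centres.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centres.append(centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Two well-separated groups of 2-D points (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = kmeans(points, k=2)
print(labels, centres)
```

Because distant points are favoured as seeds, k-means++ tends to start with one centre per natural group. This reduces, but does not eliminate, the chance of a poor local optimum.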


2013 ◽  
Vol 29 (7) ◽  
pp. 1736-1741 ◽  
Author(s):  
Xiaohui Cui ◽  
Jesse St. Charles ◽  
Thomas Potok

Author(s):  
Ahmed M. Serdah ◽  
Wesam M. Ashour

Traditional clustering algorithms are no longer suitable for data mining applications that involve large-scale data. Many large-scale clustering algorithms have been proposed in recent years, but most of them do not produce high-quality clusterings. Although Affinity Propagation (AP) is effective and accurate on ordinary data sets, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering based on a modified version of the AP algorithm, designed to achieve both low time complexity and good clustering accuracy. First, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Second, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars. Third, the inverse weighted clustering algorithm is applied to all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, every data point is assigned to a cluster according to its similarity to the global exemplars. Results show that the proposed methods significantly reduce clustering time and produce more effective and accurate clustering results than the AP, KAP, and HAP algorithms.
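The four stages can be outlined in Python. This is a simplified sketch, not the authors' implementation. It uses random fragmentation only. A basic alternating k-medoids routine (the hypothetical `k_medoids`) is swapped in for both KAP in step 2 and inverse weighted clustering in step 3, because the abstract specifies neither. Like AP, the routine keeps actual data points as exemplars, and similarity is taken here as negative Euclidean distance.

```python
import math
import random


def dist(a, b):
    """Euclidean distance; 'most similar' means 'nearest' in this sketch."""
    return math.dist(a, b)


def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids, a stand-in for KAP / inverse weighted clustering.
    Returns up to k exemplars drawn from `points`."""
    k = min(k, len(points))
    # Farthest-first initialization gives deterministic, spread-out medoids.
    medoids = [points[0]]
    while len(medoids) < k:
        medoids.append(max(points, key=lambda p: min(dist(p, m) for m in medoids)))
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for p in points:
            clusters[min(range(k), key=lambda j: dist(p, medoids[j]))].append(p)
        # New medoid: the member with the smallest total distance to its cluster.
        new = [min(c, key=lambda q: sum(dist(q, r) for r in c)) if c else m
               for c, m in zip(clusters, medoids)]
        if new == medoids:
            break
        medoids = new
    return medoids


def divide_and_merge(points, k, n_subsets, seed=0):
    """Hypothetical four-stage pipeline modelled on the abstract."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Step 1: random fragmentation into roughly equal subsets.
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    # Step 2: local exemplars per subset (KAP in the paper).
    local = [e for s in subsets if s for e in k_medoids(s, k)]
    # Step 3: global exemplars from the pooled local exemplars (IWC in the paper).
    global_ex = k_medoids(local, k)
    # Step 4: assign every point to its most similar global exemplar.
    labels = [min(range(len(global_ex)), key=lambda j: dist(p, global_ex[j]))
              for p in points]
    return global_ex, labels


# Three tight, well-separated groups of made-up points.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 0), (10, 1), (11, 0), (11, 1),
       (0, 10), (0, 11), (1, 10), (1, 11)]
exemplars, labels = divide_and_merge(pts, k=3, n_subsets=2)
print(exemplars, labels)
```

The saving comes from steps 2 and 3. The expensive exemplar search runs only on small subsets and then on the much smaller pool of local exemplars, never on the full data set at once.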

