Ensemble Clustering Data Mining and Databases

Author(s):  
Slawomir T. Wierzchon

Standard clustering algorithms employ fixed assumptions about data structure. For instance, the k-means algorithm is suited to spherical, linearly separable data clouds. When the data come from a multidimensional normal distribution, the so-called EM algorithm can be applied. In practice, however, the structure underlying a given set of observations is too complex to fit a single assumption. We can split these assumptions into manageable hypotheses, each justifying the use of a particular clustering algorithm. Then we must aggregate the partial results into a meaningful description of our data. Consensus clustering performs this task. In this article we clarify the idea of consensus clustering and present conceptual frames for such a compound analysis. Next, the basic approaches to implementing a consensus procedure are given. Finally, some new directions in this field are mentioned.

Author(s):  
Slawomir T. Wierzchon

Standard clustering algorithms employ fixed assumptions about data structure. For instance, the k-means algorithm is suited to spherical, linearly separable data clouds. When the data come from a multidimensional normal distribution, the so-called EM algorithm can be applied. In practice, however, the structure underlying a given set of observations is too complex to fit a single assumption. We can split these assumptions into manageable hypotheses, each justifying the use of a particular clustering algorithm. Then we must aggregate the partial results into a meaningful description of our data. Consensus clustering performs this task. In this chapter, the authors clarify the idea of consensus clustering and present conceptual frames for such a compound analysis. Next, the basic approaches to implementing a consensus procedure are given. Finally, some new directions in this field are mentioned.
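
The aggregation step described above can be illustrated with a co-association matrix, one common way to build a consensus. This is a minimal sketch under stated assumptions, not the procedure the chapter itself presents: the function names, the 0.5 threshold, and the three sample partitions are hypothetical. Each partition is a list of cluster labels over the same points. Two points are linked when they share a cluster in more than half of the partitions, and the consensus clusters are the connected groups of linked points.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link points whose co-association exceeds the threshold, then
    return one label per point for each connected group."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over linked points
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical partitions of six points. Label values are arbitrary,
# so only co-membership matters, not the label numbers themselves.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 1],
]
co = co_association(partitions)
consensus = consensus_labels(partitions)
print(consensus)
```

The partial results disagree about points 2 and 5, but the majority vote encoded in the matrix recovers two groups of three points.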


2020 ◽  
Vol 68 (5-6) ◽  
pp. 341-353
Author(s):  
Sunčica Milutinović ◽  
Olivera Grljević

The subject of this research is the evaluation of legal and international accounting regulations in terms of their major deficiencies from the perspective of their users. Identifying the shortcomings of the current accounting regulations is important for improving the laws governing accounting and audit practices, as they are in the process of public debate in the Republic of Serbia. The research was conducted to answer the following research questions: What are the main deficiencies of regulations as considered by accountants? How do accountants get information on accounting regulations? And are the two research questions related? The target population comprises accountants and auditors employed in the private sector. Data collection was carried out over a period of six months, during which we collected 338 fully completed questionnaires. The collected data were analysed using a clustering data mining technique. Clustering algorithms enabled segmentation of the surveyed accountants into well-separated and homogeneous groups of similar accountants. Analysis of the resulting clusters gave insights into the opinions and stance of accountants who exhibit similar characteristics. These insights form a solid basis for drawing conclusions on the deficiencies of accounting regulations that accountants in Serbia perceive and deal with in day-to-day business.


Author(s):  
Nibras Othman Abdul Wahid ◽  
Saif Aamer Fadhil ◽  
Noor Abbood Jasim

Unsupervised data clustering is one of the most useful tools and an informative task in data mining; it seeks to identify homogeneous groups of objects based on similarity and is used in many applications. Clustering is one of the key problems in data mining and has attracted much attention. One of the best-known clustering algorithms is K-means, which has been applied successfully to many problems. To improve the quality of K-means, researchers have proposed hybridizing it with optimization algorithms. In this paper, a heuristic algorithm, the Lion Optimization Algorithm (LOA), and the Genetic Algorithm (GA) were adapted for K-means data clustering by modifying the basic parameters of LOA, which belongs to the family of nature-inspired algorithms. The distinctive lifestyle of lions and their cooperative behavior were the primary motivation for the development of this optimization algorithm. The GA is used when the clusters need to be reallocated using the genetic operators crossover and mutation. The experimental results reflect the ability of this approach in clustering analysis on a number of benchmark datasets from the UCI Machine Learning Repository.


2020 ◽  
Vol 3 (3) ◽  
pp. 187-201
Author(s):  
Sufajar Butsianto ◽  
Nindi Tya Mayangwulan

Car use in Indonesia increases every year, and automotive companies compete to increase their sales. The purpose of this study is to group sales data into clusters using the K-Means Clustering data mining algorithm. The sales data are grouped by similarity, so that data with the same characteristics fall into the same cluster. The attributes used are brand and sales. The K-Means Clustering process produced three clusters: Cluster 0, with 235 members (26%), categorized as Selling Well (Laris); Cluster 1, with 604 members (67%), categorized as Selling Poorly (Kurang Laris); and Cluster 2, with 61 members (7%), categorized as Best Selling (Paling Laris). The clustering was validated with a Davies-Bouldin Index (DBI) value of 0.341.


2018 ◽  
Vol 3 (1) ◽  
pp. 001
Author(s):  
Zulhendra Zulhendra ◽  
Gunadi Widi Nurcahyo ◽  
Julius Santony

This study uses a data mining technique, namely K-Means Clustering. Data mining can be used to analyze fairly large data sets; here it aims to enable Indocomputer to identify and classify service data based on customer complaints, using the Weka software. The study uses the K-Means Clustering algorithm to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. It can also identify the laptop brands most often brought in for service at Indocomputer Payakumbuh, as a recommendation to consumers choosing a laptop.


2020 ◽  
Vol 25 (1) ◽  
pp. 76-88
Author(s):  
Suhandio Handoko ◽  
Fauziah Fauziah ◽  
Endah Tri Esti Handayani

The telecommunications industry is currently growing very rapidly because telecommunications has become a basic need for society, so many companies operate in this industry. The large number of telecommunications companies requires developers to find strategies or patterns that can increase product sales and marketing; one such strategy is to make use of transaction data. Data packages are a product in the telecommunications field. The clustering process is currently still done manually, so it requires time, calculation, and a high degree of accuracy. In this study, a website-based application was built to make data clustering easier, so that it can be used as a reference in planning the promotion of Telkomsel products to various regions. The method used to address this problem is clustering with the K-Means algorithm. K-Means is an algorithm that groups a set of data into particular groups. In this study, sales data were grouped into three clusters: low sales, medium sales, and high sales. Testing K-Means clustering in the application on Telkomsel package sales transaction data gave 100% agreement with manual clustering.
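
The three-way grouping described above can be sketched with a one-dimensional K-Means. This is an illustration under stated assumptions, not the application built in the study: the sales figures are invented, and the starting centroids (minimum, median, and maximum of the data) are a simple deterministic choice made here so the example is repeatable.

```python
def kmeans_1d(values, centroids, iterations=100):
    """Lloyd's algorithm on scalar values, starting from the given centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            # Assignment step: each value joins its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    assignments = [
        min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
        for v in values
    ]
    return centroids, assignments


# Hypothetical sales counts per region, not data from the study.
sales = [12, 15, 14, 48, 52, 50, 95, 101, 98]
ordered = sorted(sales)
start = [ordered[0], ordered[len(ordered) // 2], ordered[-1]]
centroids, assignments = kmeans_1d(sales, start)

# Name clusters by centroid rank so that labels carry meaning.
rank = sorted(range(len(centroids)), key=lambda k: centroids[k])
names = {rank[0]: "low", rank[1]: "medium", rank[2]: "high"}
sales_levels = [names[a] for a in assignments]
print(list(zip(sales, sales_levels)))
```

Ranking the centroids after clustering matters because K-Means cluster indices are arbitrary; the low, medium, and high names have to be attached afterwards.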


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.


Author(s):  
P. Tamijiselvy ◽  
N. Kavitha ◽  
K. M. Keerthana ◽  
D. Menakha

The degree of aortic calcification has been shown to be a risk indicator for vascular events, including cardiovascular events. The proposed method is a fully automated data mining algorithm that segments and measures calcification using low-dose chest CT in smokers aged 50 to 70. Subjects with increased cardiovascular risk can be identified using data mining algorithms. This paper presents a method for automatic detection of coronary artery calcifications in low-dose chest CT scans using clustering algorithms in three phases: pre-processing, segmentation, and clustering. The Fuzzy C-Means algorithm achieves an accuracy of 80.23%, demonstrating that Fuzzy C-Means can detect cardiovascular disease at an early stage.
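
For readers unfamiliar with Fuzzy C-Means, the sketch below shows the two alternating updates of the standard algorithm on scalar values. It is not the paper's CT pipeline: the intensity values, the starting centers, and the fuzzifier m = 2 are all assumptions chosen for illustration. Unlike K-Means, every value receives a degree of membership in every cluster, and the memberships of each value sum to 1.

```python
def fcm_memberships(values, centers, m=2.0):
    """Degree of membership of each value in each cluster (rows sum to 1)."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for v in values:
        dists = [abs(v - c) for c in centers]
        if 0.0 in dists:
            # The value sits exactly on a center: full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
            continue
        rows.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        )
    return rows


def fcm_centers(values, memberships, m=2.0):
    """Each center is the mean of the values weighted by membership ** m."""
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        centers.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return centers


def fuzzy_c_means(values, centers, m=2.0, iterations=50):
    """Alternate the membership and center updates a fixed number of times."""
    for _ in range(iterations):
        memberships = fcm_memberships(values, centers, m)
        centers = fcm_centers(values, memberships, m)
    return centers, fcm_memberships(values, centers, m)


# Hypothetical pixel intensities: a dark group and a bright group.
intensities = [10.0, 12.0, 11.0, 200.0, 210.0, 205.0]
centers, memberships = fuzzy_c_means(intensities, [0.0, 255.0])
print([round(c, 1) for c in centers])
```

A hard segmentation can be read off by assigning each value to the cluster with its highest membership, while the membership values themselves indicate how ambiguous each assignment is.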


2021 ◽  
Vol 8 (1) ◽  
pp. 83
Author(s):  
Bagus Muhammad Islami ◽  
Cepy Sukmayadi ◽  
Tesa Nur Padilah

Abstract: Health problems in society, especially in developing countries such as Indonesia, are influenced by two factors: physical and non-physical aspects. Based on data obtained from karawangkab.bps.go.id, the data are divided into 3 clusters: fewest, medium, and most. The algorithm used is K-Means clustering, implemented using Microsoft Excel and RapidMiner Studio. Processing the health facility data for Karawang produced 3 clusters: cluster 1, with few health facilities, contains 23 sub-districts; cluster 2, with a medium number of health facilities, contains 5 sub-districts; and cluster 3, with the most health facilities, contains 2 sub-districts. The K-Means algorithm yields a Davies-Bouldin Index value of 0.109.   Keywords: clustering, data mining, health facilities, K-Means.

