Ensemble Clustering Data Mining and Databases

Encyclopedia of Information Science and Technology, Fourth Edition ◽

10.4018/978-1-5225-2255-3.ch170 ◽

2018 ◽

pp. 1962-1973

Author(s):

Slawomir T. Wierzchon

Keyword(s):

Data Mining ◽

Data Structure ◽

Em Algorithm ◽

Normal Distribution ◽

Clustering Algorithms ◽

Consensus Clustering ◽

New Directions ◽

Consensus Procedure ◽

Basic Approaches ◽

Clustering Data

Standard clustering algorithms rely on fixed assumptions about data structure. For instance, the k-means algorithm is suited to spherical, linearly separable data clouds, and when the data come from a multidimensional normal distribution, the so-called EM algorithm can be applied. In practice, however, the structure underlying a given set of observations is too complex to be captured by a single assumption. We can split these assumptions into manageable hypotheses, each justifying the use of a particular clustering algorithm, and then aggregate the partial results into a meaningful description of the data. Consensus clustering performs this task. In this article we clarify the idea of consensus clustering and present a conceptual framework for such a compound analysis. Next, the basic approaches to implementing the consensus procedure are given. Finally, some new directions in this field are mentioned.
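As a rough illustration of the aggregation step described in this abstract, the sketch below builds a co-association matrix from several base partitions and links points that are grouped together in more than half of them. This is one common way to form a consensus and is not necessarily the procedure presented in the article; the function names, the 0.5 threshold, and the three toy partitions are assumptions made for the example.

```python
def co_association(partitions):
    """Fraction of base partitions that put each pair of points in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six points (labels are arbitrary per run).
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 2, 2, 1],
]
print(consensus_labels(base_partitions))
```

The base partitions disagree about the third and sixth points, yet the majority vote encoded in the co-association matrix recovers two groups of three points each.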


Ensemble Clustering Data Mining and Databases

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics ◽

10.4018/978-1-5225-7598-6.ch041 ◽

2019 ◽

pp. 563-576

Author(s):

Slawomir T. Wierzchon

Keyword(s):

Data Mining ◽

Data Structure ◽

Em Algorithm ◽

Normal Distribution ◽

Clustering Algorithms ◽

Consensus Clustering ◽

New Directions ◽

Consensus Procedure ◽

Basic Approaches ◽

Clustering Data

Standard clustering algorithms rely on fixed assumptions about data structure. For instance, the k-means algorithm is suited to spherical, linearly separable data clouds, and when the data come from a multidimensional normal distribution, the so-called EM algorithm can be applied. In practice, however, the structure underlying a given set of observations is too complex to be captured by a single assumption. We can split these assumptions into manageable hypotheses, each justifying the use of a particular clustering algorithm, and then aggregate the partial results into a meaningful description of the data. Consensus clustering performs this task. In this chapter, the authors clarify the idea of consensus clustering and present a conceptual framework for such a compound analysis. Next, the basic approaches to implementing the consensus procedure are given. Finally, some new directions in this field are mentioned.


Analysis of accountants' attitudes on regulation using data mining

Ekonomika preduzeca ◽

10.5937/ekopre2006341m ◽

2020 ◽

Vol 68 (5-6) ◽

pp. 341-353

Author(s):

Sunčica Milutinović ◽

Olivera Grljević

Keyword(s):

Data Mining ◽

Clustering Algorithms ◽

Data Mining Technique ◽

Homogeneous Groups ◽

Current Accounting ◽

Research Questions ◽

The Republic ◽

Using Data ◽

Clustering Data ◽

Sector Data

The subject of this research is the evaluation of legal and international accounting regulations in terms of their major deficiencies from the perspective of their users. Identifying the shortcomings of the current accounting regulations is important for improving the laws governing accounting and audit practices, as these laws are under public debate in the Republic of Serbia. The research was conducted to answer the following research questions: What are the main deficiencies of the regulations as perceived by accountants? How do accountants obtain information on accounting regulations? And are the answers to these two questions related? The target population comprises accountants and auditors employed in the private sector. Data were collected over a period of six months, during which we gathered 338 fully completed questionnaires for the purposes of the study. The collected data were analysed using a clustering data mining technique. Clustering algorithms enabled segmentation of the surveyed accountants into well-separated and homogeneous groups of similar accountants. Analysis of the resulting clusters gave insight into the opinions and stance of accountants who exhibit similar characteristics. These insights form a solid basis for drawing conclusions about the deficiencies of accounting regulations that accountants in Serbia perceive and deal with in day-to-day business.


Integrated Algorithm for Unsupervised Data Clustering Problems in Data Mining

Journal of Southwest Jiaotong University ◽

10.35741/issn.0258-2724.54.5.40 ◽

2019 ◽

Vol 54 (5) ◽

Author(s):

Nibras Othman Abdul Wahid ◽

Saif Aamer Fadhil ◽

Noor Abbood Jasim

Keyword(s):

Data Mining ◽

Data Clustering ◽

Clustering Algorithms ◽

Genetic Operators ◽

Way Of Life ◽

Fundamental Parameters ◽

Benchmark Datasets ◽

Key Issues ◽

Clustering Data ◽

Lion Optimization Algorithm

Unsupervised data clustering is one of the most valuable tools and an informative task in data mining; it seeks to identify homogeneous groups of objects based on similarity and is used in many applications. Clustering is one of the key issues in data mining and has attracted much attention. One of the best-known clustering algorithms is K-means, which has been successfully applied to many problems. To improve the quality of K-means, researchers have proposed hybridizing it with optimization algorithms. In this paper, a heuristic algorithm, the Lion Optimization Algorithm (LOA), and the Genetic Algorithm (GA) were adapted for K-means data clustering by modifying the fundamental parameters of LOA, which is a nature-inspired algorithm. The unusual way of life of lions and their cooperative behaviour were the main motivation for the development of this optimization algorithm. The GA is used when the clusters need to be reallocated using the genetic operators of crossover and mutation. The experimental results reflect the ability of this approach in clustering analysis on a number of benchmark datasets from the UCI Machine Learning Repository.


Penerapan Data Mining Untuk Prediksi Penjualan Mobil Menggunakan Metode K-Means Clustering [Application of Data Mining to Predict Car Sales Using the K-Means Clustering Method]

Jurnal Nasional Komputasi dan Teknologi Informasi (JNKTI) ◽

10.32672/jnkti.v3i3.2428 ◽

2020 ◽

Vol 3 (3) ◽

pp. 187-201

Author(s):

Sufajar Butsianto ◽

Nindi Tya Mayangwulan

Keyword(s):

Data Mining ◽

Clustering Data ◽

Cluster 2

Car use in Indonesia increases every year, prompting automotive companies to compete to increase their sales. The aim of this study is to group sales data into clusters using the data mining K-Means clustering algorithm. Sales data are grouped by similarity so that data with the same characteristics fall into the same cluster. The attributes used are brand and sales. The K-Means clustering process produced three clusters: Cluster 0, with 235 members (26%), categorized as Selling Well (Laris); Cluster 1, with 604 members (67%), categorized as Selling Poorly (Kurang Laris); and Cluster 2, with 61 members (7%), categorized as Best-Selling (Paling Laris). The clustering was validated with the Davies-Bouldin Index (DBI), which had a value of 0.341.
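For readers unfamiliar with the validation measure mentioned in this abstract, the following is a minimal sketch of the Davies-Bouldin Index using its standard definition: for each cluster, take the worst ratio of summed within-cluster scatter to between-centroid distance, then average over clusters. Lower values indicate tighter, better-separated clusters. The point coordinates are hypothetical and are not the study's data; the function names are assumptions for the example.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of clusters (each a list of points)."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean Euclidean distance from each point to its cluster centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical 2-D points: the same two clusters, far apart and then close together.
far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
print(davies_bouldin(far_apart), davies_bouldin(close_together))
```

Moving the two clusters closer raises the index, which is why a small value such as the one reported in the abstract is read as evidence of well-separated clusters.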


K-MEANS CLUSTERING ALGORITHM FOR SERVICE DATA ANALYSIS BASED ON CUSTOMERS COMBINATION

Unes journal of Information System ◽

10.31933/ujis.3.1.001-007.2018 ◽

2018 ◽

Vol 3 (1) ◽

pp. 001

Author(s):

Zulhendra Zulhendra ◽

Gunadi Widi Nurcahyo ◽

Julius Santony

Keyword(s):

Data Mining ◽

Data Analysis ◽

Clustering Algorithm ◽

Customer Complaints ◽

Using Data ◽

Clustering Data ◽

Service Data ◽

Selection Of

This study uses a data mining method, namely K-Means clustering. Data mining can be used to analyse fairly large volumes of data; here it aims to enable Indocomputer to understand and classify service data based on customer complaints, using the Weka software. The K-Means clustering algorithm is used to predict or classify complaints about hardware damage at Indocomputer Payakumbuh. The analysis can also identify the laptop brands most often brought in for service at Indocomputer Payakumbuh, as one of the recommendations to consumers for the selection of laptops.


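IMPLEMENTASI DATA MINING UNTUK MENENTUKAN TINGKAT PENJUALAN PAKET DATA TELKOMSEL MENGGUNAKAN METODE K-MEANS CLUSTERING [Implementation of Data Mining to Determine the Sales Level of Telkomsel Data Packages Using the K-Means Clustering Method]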

Jurnal Ilmiah Teknologi dan Rekayasa ◽

10.35760/tr.2020.25i1.2677 ◽

2020 ◽

Vol 25 (1) ◽

pp. 76-88

Author(s):

Suhandio Handoko ◽

Fauziah Fauziah ◽

Endah Tri Esti Handayani

Keyword(s):

Data Mining ◽

Clustering Data

The telecommunications industry is currently developing very rapidly because telecommunications have become a primary need for society, so many companies operate in the industry. The large number of telecommunications companies requires developers to find strategies or patterns that can increase product sales and marketing; one such strategy is to make use of transaction data. Data packages are a product in the telecommunications field. At present the clustering process is still done manually, which requires time, computation, and a high degree of care. In this study a web-based application was built to simplify data clustering so that it can be used as a reference in planning the promotion of Telkomsel products in various regions. The method used to address this problem is clustering with the K-Means algorithm. K-Means is an algorithm that groups a set of data into particular groups. In this study the sales data were grouped into three groups: low sales, medium sales, and high sales. Testing K-Means clustering in the application on Telkomsel data package sales transactions gave 100% agreement with manual clustering.
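The grouping into low, medium, and high sales described above can be sketched with a plain one-dimensional K-Means loop (assign each value to its nearest centroid, recompute centroids, repeat until nothing changes). This is only an illustration: the sales figures and the starting centroids are hypothetical, and the study's own application may differ in features, initialisation, and stopping rule.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on scalar values; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            sum(c) / len(c) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical sales counts per region, with assumed starting centroids.
sales = [12, 15, 14, 48, 52, 50, 95, 99, 101]
centers, groups = kmeans_1d(sales, centroids=[12, 50, 101])
for name, group in zip(["low", "medium", "high"], groups):
    print(name, group)
```

Because the starting centroids are given in ascending order, the resulting clusters can be read directly as the low, medium, and high groups.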


A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.


EM algorithm using overparameterization for the multivariate skew-normal distribution

Econometrics and Statistics ◽

10.1016/j.ecosta.2021.03.003 ◽

2021 ◽

Author(s):

Toshihiro Abe ◽

Hironori Fujisawa ◽

Takayuki Kawashima ◽

Christophe Ley

Keyword(s):

Em Algorithm ◽

Normal Distribution ◽

Skew Normal Distribution ◽

Skew Normal


An Efficient Clustering Approach for Automatic Detection of Calcification in Low Dose Chest CT

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195231 ◽

2019 ◽

pp. 163-168

Author(s):

P. Tamijiselvy ◽

N. Kavitha ◽

K. M. Keerthana ◽

D. Menakha

Keyword(s):

Data Mining ◽

Low Dose ◽

Early Stage ◽

Clustering Algorithms ◽

Automatic Detection ◽

Chest Ct ◽

Data Mining Algorithm ◽

Fuzzy C Means ◽

Data Mining Algorithms ◽

Using Data

The degree of aortic calcification has been shown to be a risk indicator for vascular events, including cardiovascular events. The developed method is a fully automated data mining algorithm to segment and measure calcification using low-dose chest CT in smokers aged 50 to 70. Subjects with increased cardiovascular risk can be identified using data mining algorithms. This paper presents a method for automatic detection of coronary artery calcifications in low-dose chest CT scans using effective clustering algorithms in three phases: pre-processing, segmentation, and clustering. The Fuzzy C-Means algorithm provides an accuracy of 80.23%, demonstrating that Fuzzy C-Means detects cardiovascular disease at an early stage.
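To show what distinguishes Fuzzy C-Means from hard clustering, the sketch below computes the standard membership update for a single scalar value: each point receives a degree of membership in every cluster, based on its relative distance to the cluster centers, and the degrees sum to one. The intensity value, the two centers, and the fuzzifier m = 2 are assumptions for illustration and are not taken from the paper.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy C-Means membership of scalar x in each cluster (fuzzifier m > 1)."""
    dists = [abs(x - c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if 0.0 in dists:
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# Hypothetical pixel intensity and two assumed cluster centers.
memberships = fcm_memberships(1.0, centers=[0.0, 4.0])
print(memberships)
```

A value three times closer to the first center than to the second ends up mostly, but not entirely, in the first cluster, which is the soft assignment that a segmentation step can later threshold.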


Clustering Fasilitas Kesehatan Berdasarkan Kecamatan Di Karawang Dengan Algoritma K-Means [Clustering Health Facilities by Sub-district in Karawang with the K-Means Algorithm]

BINA INSANI ICT JOURNAL ◽

10.51211/biict.v8i1.1488 ◽

2021 ◽

Vol 8 (1) ◽

pp. 83

Author(s):

Bagus Muhammad Islami ◽

Cepy Sukmayadi ◽

Tesa Nur Padilah

Keyword(s):

Data Mining ◽

Developing Countries ◽

Data Processing ◽

Health Problems ◽

Health Facilities ◽

Microsoft Excel ◽

Two Factors ◽

Clustering Data ◽

Index Value ◽

Cluster 2

Health problems in society, especially in developing countries such as Indonesia, are influenced by two factors, namely physical and non-physical aspects. Based on data obtained from karawangkab.bps.go.id, the data are divided into 3 clusters: fewest, medium, and most. The algorithm used is K-Means clustering, implemented using Microsoft Excel and RapidMiner Studio. Processing the health facility data for Karawang produced 3 clusters: cluster 1, with few health facilities, contains 23 sub-districts; cluster 2, with a medium number of health facilities, contains 5 sub-districts; and cluster 3, with the most health facilities, contains 2 sub-districts. The K-Means algorithm achieved a Davies-Bouldin Index value of 0.109. Keywords: clustering, data mining, health facilities, K-Means.
