Conceptual Clustering Analysis in Data Mining: A Study

Clustering, as a form of unsupervised learning, deals with instances that have not already been classified and that carry no class attribute. Clustering algorithms are applied to find useful but previously unknown classes of items. The unsupervised approach automatically organizes instances into meaningful groups based on their similarity. In this paper we study the basic clustering methods used in data mining for unsupervised learning, such as ensemble and distributed clustering and their algorithms.

Classification of Residences by Province and Ownership Type Based on the K-Means Algorithm (Klasifikasi pada Tempat Tinggal Menurut Provinsi dan Jenis Kepemilikan Berdasarkan Algoritma K-Means)

STRING (Satuan Tulisan Riset dan Inovasi Teknologi) ◽

10.30998/string.v4i3.5932 ◽

2020 ◽

Vol 4 (3) ◽

pp. 247

Author(s):

Dwi Swasono Rachmad

Keyword(s):

Data Mining ◽

Unsupervised Learning ◽

Residential Buildings ◽

Government Agency ◽

Role Of Government ◽

The Republic ◽

Household Processing ◽

Central Statistics

The word housing derives from house, meaning a place where people live, stay, or stop for a certain time. Housing is a set of residences grouped in a place that has facilities and infrastructure. This study focuses on the types of residential ownership: SHM ART, SHM Non ART, Non SHM, and others. These four types can be used to determine the percentage of ownership in all provinces in Indonesia. Although much information exists about the types of ownership certificate, certified ownership is still not widespread. Therefore, the k-Means algorithm is used as a data mining technique to form clusters, where the data already has parameters or values, which places the task in the category of unsupervised learning. That data produced the best. The data was obtained from published sources of a government agency of the Republic of Indonesia, the Central Statistics Agency, under the household category: self-owned residential buildings purchased from developers or non-developers, by province and type of ownership, for 2016 throughout Indonesia. To process the dataset, the researchers used the RapidMiner application for clustering. The research shows that SHM ART is the most common type of ownership, but for other values it is still smaller than the value in other types of ownership, which is the second largest value. In this case, the role of government in assisting the ownership process so that residences become SHM ART is very important.

DATA MINING FOR THE MANAGEMENT OF SOFTWARE DEVELOPMENT PROCESS

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194004001841 ◽

2004 ◽

Vol 14 (06) ◽

pp. 665-695 ◽

Cited By ~ 6

Author(s):

J. L. ÁLVAREZ-MACÍAS ◽

J. MATA-VÁZQUEZ ◽

J. C. RIQUELME-SANTOS

Keyword(s):

Data Mining ◽

Software Development ◽

Unsupervised Learning ◽

Supervised Learning ◽

Development Process ◽

A Priori ◽

Post Mortem ◽

Software Development Process ◽

A Priori Analysis ◽

Mining Tools

In this paper we present a new method for applying data mining tools to the management phase of the software development process. Specifically, we describe two tools, the first based on supervised learning and the second on unsupervised learning. The goal of this method is to induce a set of management rules that make the development process easier for managers. Depending on how and to what this method is applied, it permits an a priori analysis, monitoring of the project, or a post-mortem analysis.

Order Selection in Unsupervised Learning and Clustering for Arbitrary and Non-Arbitrary Shaped Data

10.32920/ryerson.14668125.v1 ◽

2021 ◽

Author(s):

Mahdi Shahbaba

Keyword(s):

Objective Function ◽

Unsupervised Learning ◽

Minimum Spanning Tree ◽

Statistical Testing ◽

Adjusted Rand Index ◽

Order Selection ◽

Clustering Methods ◽

Conventional Methods ◽

Anderson Darling ◽

Statistical Testing Method

This thesis focuses on clustering for the purpose of unsupervised learning. One topic of interest is estimating the correct number of clusters (CNC). In conventional clustering approaches, such as X-means, G-means, PG-means and Dip-means, estimating the CNC is a preprocessing step prior to finding the centers and clusters. In other words, the first step estimates the CNC and the second step finds the clusters, and each step has a different objective function to minimize. Here, we propose minimum averaged central error (MACE)-means clustering and use one objective function to simultaneously estimate the CNC and provide the cluster centers. We have shown the superiority of MACE-means over the conventional methods in terms of estimating the CNC with comparable complexity. In addition, on average MACE-means results in better values for the adjusted Rand index (ARI) and variation of information (VI). The next topic of interest is the order selection step of the conventional methods, which is usually a statistical testing method such as the Kolmogorov-Smirnov test, the Anderson-Darling test, or Hartigan's Dip test. We propose a new statistical test denoted Sigtest (signature testing). The conventional statistical testing approaches rely on a particular assumption on the probability distribution of each cluster. Sigtest, on the other hand, can be used with any prior distribution assumption on the clusters. By replacing the statistical testing of the mentioned conventional approaches with Sigtest, we have shown that the clustering methods are improved in terms of more accurate CNC as well as ARI and VI. Conventional clustering approaches fail in arbitrary shaped clustering, which is the subject of the last contribution of the thesis. The proposed method, denoted minimum Pathways is Arbitrary Shaped (minPAS) clustering, is based on a unique minimum spanning tree structure of the data. Our simulation results show the advantage of minPAS over state-of-the-art arbitrary shaped clustering methods such as DBSCAN and Affinity Propagation in terms of accuracy, ARI and VI.
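
The adjusted Rand index (ARI) used as a quality measure in this abstract can be computed from two label assignments alone. The following is a minimal sketch of the standard pair-counting formula, not code from the thesis; the function name `adjusted_rand_index` and the handling of degenerate inputs are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two labelings of the same items."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two items counts as perfect agreement

    # Pairs that are grouped together in both labelings.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each labeling separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # assumption: degenerate case, e.g. both labelings are trivial
    return (together_both - expected) / (maximum - expected)


# Cluster names do not matter, only which items are grouped together.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

A value of 1 means the two partitions agree exactly, values near 0 mean agreement no better than chance, and negative values mean worse than chance.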

Computer-Aided Teaching System Based on Data Mining

Wireless Communications and Mobile Computing ◽

10.1155/2021/3373535 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yonghua Tang ◽

Qiang Fan ◽

Peng Liu

Keyword(s):

Data Mining ◽

Spectral Clustering ◽

Traditional Teaching ◽

Clustering Methods ◽

Mining System ◽

Teaching Resources ◽

Teaching System ◽

Data Mining Algorithms ◽

Computer Aided ◽

Mining Algorithms

The traditional teaching model cannot adapt to the teaching needs of the era of smart teaching. Based on this, this paper combines data mining technology to carry out teaching reforms, constructs a computer-aided system based on data mining, and constructs teaching system functions based on actual conditions. The constructed system can carry out multisubject teaching. Moreover, this paper uses a data mining system to mine teaching resources and uses spectral clustering methods to integrate multiple teaching resources to improve the practicability of data mining algorithms. In addition, this paper combines digital technology to deal with teaching resources. Finally, after building the system, this paper designs experiments to verify the performance of the system. From the research results, it can be seen that the system constructed in this paper has certain teaching and practical effects, and it can be applied to a larger teaching scope in subsequent research.

Clustering fMRI data with a robust unsupervised learning algorithm for neuroscience data mining

Journal of Neuroscience Methods ◽

10.1016/j.jneumeth.2018.02.007 ◽

2018 ◽

Vol 299 ◽

pp. 45-54 ◽

Cited By ~ 5

Author(s):

Hadeel K. Aljobouri ◽

Hussain A. Jaber ◽

Orhan M. Koçak ◽

Oktay Algin ◽

Ilyas Çankaya

Keyword(s):

Data Mining ◽

Unsupervised Learning ◽

Learning Algorithm ◽

Fmri Data

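Applying Data Mining to Cluster Tourist Visits in the City of Yogyakarta Using the K-Means Method (PENERAPAN DATA MINING DALAM MENGELOMPOKKAN KUNJUNGAN WISATAWAN DI KOTA YOGYAKARTA MENGGUNAKAN METODE K-MEANS)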

Journal of Computer Science and Technology (JCS-TECH) ◽

10.54840/jcstech.v1i1.9 ◽

2021 ◽

Vol 1 (1) ◽

pp. 27-32

Author(s):

Bambang Setio ◽

Putri Prasetyaningrum

Keyword(s):

Data Mining ◽

Unsupervised Learning ◽

Data Clustering ◽

Cluster 2

Yogyakarta is one of the cities in Indonesia with strong tourist appeal and is the destination most favored by tourists, as seen from the number of tourist visits, which rises from year to year. Besides being a tourist city, Yogyakarta is a city of students, a city of culture, and a city of struggle. Because Yogyakarta is known as a tourist city, it offers a wide variety of tourist attractions. In this context, applying data mining can be a solution for analyzing the data. Clustering belongs to the descriptive methods and is also a form of unsupervised learning, in which object classes are not defined beforehand. Clustering can therefore be used to determine class labels for data whose classes are not yet known. The K-Means method belongs to partitioning clustering, which separates data into distinct regions. The K-Means method is well known for its simplicity and its ability to cluster large data and outliers very quickly. The input data was processed with the K-Means algorithm, which ran 5 iterations with cluster 1, cluster 2, and cluster 3 chosen at random: cluster 1 has 24 data points (50%), cluster 2 has 11 data points (23%), and cluster 3 has 13 data points (27%).
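
The procedure this abstract describes (random initial clusters, a fixed number of iterations, then the share of data in each cluster) can be sketched in a few lines. This is an illustrative sketch of plain Lloyd-style K-Means, not the authors' implementation; the function names `kmeans` and `cluster_shares`, the squared Euclidean distance, and the `seed` parameter are assumptions. Only the cluster sizes 24, 11, and 13 are taken from the abstract.

```python
import random


def kmeans(points, k, iterations=5, seed=0):
    """Plain K-Means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance, an assumption of this sketch).
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


def cluster_shares(labels, k):
    """Share of points in each cluster, rounded to whole percent."""
    return [round(100 * labels.count(c) / len(labels)) for c in range(k)]


# Cluster sizes reported in the abstract: 24, 11 and 13 out of 48 data points.
reported = [0] * 24 + [1] * 11 + [2] * 13
print(cluster_shares(reported, 3))  # [50, 23, 27]
```

The percentages follow directly from the cluster sizes: 24/48, 11/48, and 13/48, rounded to whole numbers.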

Evaluation of Clustering Methods for Adaptive Learning Systems

Artificial Intelligence Applications in Distance Education - Advances in Mobile and Distance Learning ◽

10.4018/978-1-4666-6276-6.ch014 ◽

2015 ◽

pp. 237-260 ◽

Cited By ~ 1

Author(s):

Wilhelmiina Hämäläinen ◽

Ville Kumpulainen ◽

Maxim Mozgovoy

Keyword(s):

Data Mining ◽

Adaptive Learning ◽

Clustering Algorithms ◽

Educational Data Mining ◽

Optimal Choice ◽

Learning Systems ◽

Learning Tools ◽

Clustering Methods ◽

Central Task ◽

Adaptive Learning Systems

Clustering student data is a central task in educational data mining and in the design of intelligent learning tools. The problem is that there are thousands of clustering algorithms but no general guidelines about which method to choose. The optimal choice is of course problem- and data-dependent and can seldom be found without trying several methods. Still, the purposes of clustering students and the typical features of educational data make certain clustering methods more suitable or attractive. In this chapter, the authors evaluate the main clustering methods from this perspective. Based on the analysis, the authors suggest the most promising clustering methods for different situations.
