CLG clustering for dropout prediction using log-data clustering method

Abstract ~ K-Means is one of the most widely used methods in data clustering research, while K-Medoids is an efficient method for processing small data sets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: very potential (loyal) customers, potential customers, and non-potential customers. The data are online sales transactions. The clustering methods are tested using a Fuzzy RFM (Recency, Frequency and Monetary) model in which the average (mean) of the three values is taken. Testing shows that the K-Means method performs better than the K-Medoids method, with an accuracy of 90.47%. The data processing yields 16 members (customers) in cluster 1, 11 members in cluster 2, and 15 members in cluster 3. Keywords : clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
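
To make the grouping step concrete, here is a minimal sketch of K-Means applied to RFM scores with 3 clusters. It is not the study's implementation: the RFM values are invented for illustration, the farthest-first seeding is an assumption (the abstract does not say how centers are initialized), and naming the clusters by the mean of each center is only one plausible reading of "the mean of the three values".

```python
def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start from the first point,
    then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm: alternate assignment and center update."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical (Recency, Frequency, Monetary) scores on a 1-5 scale.
rfm = [
    (5, 5, 5), (5, 4, 5), (4, 5, 5),
    (3, 3, 3), (3, 2, 3), (2, 3, 3),
    (1, 1, 1), (1, 2, 1), (2, 1, 1),
]
labels, centers = kmeans(rfm, 3)

# Name clusters by the mean of their center's R, F and M values (an assumption).
order = sorted(range(3), key=lambda c: sum(centers[c]) / 3, reverse=True)
names = dict(zip(order, ["loyal", "potential", "non-potential"]))
segments = [names[lab] for lab in labels]
print(segments)
```

K-Medoids differs mainly in the update step: each center must be an actual data point (the member with the smallest total distance to the others) rather than a mean.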

Evaluation of Clustering Methods for Adaptive Learning Systems

Artificial Intelligence Applications in Distance Education - Advances in Mobile and Distance Learning ◽

10.4018/978-1-4666-6276-6.ch014 ◽

2015 ◽

pp. 237-260 ◽

Cited By ~ 1

Author(s):

Wilhelmiina Hämäläinen ◽

Ville Kumpulainen ◽

Maxim Mozgovoy

Keyword(s):

Data Mining ◽

Adaptive Learning ◽

Clustering Algorithms ◽

Educational Data Mining ◽

Optimal Choice ◽

Learning Systems ◽

Learning Tools ◽

Clustering Methods ◽

Central Task ◽

Adaptive Learning Systems

Clustering student data is a central task in educational data mining and the design of intelligent learning tools. The problem is that there are thousands of clustering algorithms but no general guidelines about which method to choose. The optimal choice is of course problem- and data-dependent and can seldom be found without trying several methods. Still, the purposes of clustering students and the typical features of educational data make certain clustering methods more suitable or attractive. In this chapter, the authors evaluate the main clustering methods from this perspective. Based on the analysis, the authors suggest the most promising clustering methods for different situations.

On Fuzzy Non-Metric Model for Data with Tolerance and its Application to Incomplete Data Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p0571 ◽

2016 ◽

Vol 20 (4) ◽

pp. 571-579 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽

Tomoyuki Suzuki ◽

Naohiko Kinoshita ◽

Yukihiro Hamasuna ◽

...

Keyword(s):

Data Clustering ◽

Incomplete Data ◽

Clustering Algorithm ◽

Uncertain Data ◽

Data Sets ◽

Membership Degree ◽

Clustering Methods ◽

Clustering Method ◽

Numerical Examples ◽

Metric Model

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is useful because the membership degree of each datum to each cluster is calculated directly from the dissimilarities between data, without using cluster centers. However, the original FNM cannot handle data with uncertainty. In this study, we refer to such data as “uncertain data,” e.g., incomplete data or data that contain errors. Previously, a method was proposed for handling uncertain data based on the concept of a tolerance vector, and several clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. In the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm through comparisons with conventional methods for incomplete data sets in some numerical examples.

Missing Data Clustering Based on Incomplete Information System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.1500 ◽

2014 ◽

Vol 687-691 ◽

pp. 1500-1503

Author(s):

Yong Lin Leng

Keyword(s):

Information System ◽

Missing Data ◽

Incomplete Information ◽

Data Clustering ◽

Clustering Methods ◽

Data Set ◽

Incomplete Information System ◽

Data Measurement ◽

Cluster Data ◽

Test Algorithm

With the development of information technology and improved data collection capabilities, the amount of accumulated data keeps increasing, and missing-data problems become more and more apparent. Traditional clustering methods cannot directly cluster data sets that contain missing data. In this paper, we propose a novel missing-data measurement method based on incomplete information system theory and design separate similarity measure criteria for discrete and continuous attributes. The experiment uses K-means clustering to test the algorithm's accuracy from two aspects: different missing-data rates and different amounts of data. The results demonstrate that the method can cluster data sets with missing values efficiently and accurately.
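
The abstract does not give the paper's similarity measure, so the sketch below substitutes a different, well-known idea, the partial-distance strategy: compare two records only on attributes observed in both, then rescale by the share of attributes used. Discrete and continuous attributes are handled separately, as the abstract describes. The function name, the use of `None` for a missing value, and the sample records are assumptions made for illustration.

```python
import math


def partial_distance(x, y, is_discrete):
    """Distance between two records that may contain None (missing).

    Discrete attributes contribute 0 on a match and 1 on a mismatch;
    continuous attributes contribute the squared difference (they are
    assumed to be scaled to a comparable range beforehand).
    """
    total, used = 0.0, 0
    for a, b, discrete in zip(x, y, is_discrete):
        if a is None or b is None:
            continue  # skip attributes missing in either record
        total += (0.0 if a == b else 1.0) if discrete else (a - b) ** 2
        used += 1
    if used == 0:
        return math.inf  # no shared attributes: records are not comparable
    # Rescale so records with fewer shared attributes are not favoured.
    return math.sqrt(total * len(x) / used)


# Hypothetical records: two continuous attributes and one discrete attribute.
kinds = (False, False, True)
d = partial_distance((1.0, None, "a"), (4.0, 2.0, "a"), kinds)
print(d)
```

A distance of this kind can replace the Euclidean distance in the K-means assignment step, with each center coordinate updated from the members that have that attribute observed.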

A review on data clustering using spiking neural network (SNN) models

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v15.i3.pp1392-1400 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1392

Author(s):

Siti Aisyah Mohamed ◽

Muhaini Othman ◽

Mohd Hafizul Afifi

Keyword(s):

Neural Network ◽

Data Clustering ◽

Network Clustering ◽

Complex Data ◽

Spiking Neural Network ◽

Clustering Methods ◽

Clustering Method ◽

Static Data ◽

Clustering Approach ◽

Clustering Problems

The recent evolution of artificial neural networks has drawn researchers' interest to exploring deep learning through Spiking Neural Network clustering methods. Spiking Neural Network (SNN) models capture neuronal behaviour more precisely than traditional neural networks because they incorporate time into their functioning model [1]. The aim of this paper is to review studies on clustering problems that employ SNN models. Although many algorithms are used to solve clustering problems, most are only suitable for static data and fixed windows of time series. Hence, there is a need to analyse complex data types, and there is room for improvement. This paper therefore summarizes the significant results obtained by applying SNN models in different clustering approaches. The findings could demonstrate the purpose of SNN-based clustering methods to researchers from various disciplines who wish to discover and understand complex data.

Clustering Techniques and Their Applications: A Review

American Journal of Advanced Computing ◽

10.15864/ajac.1404 ◽

2020 ◽

Vol 1 (4) ◽

pp. 1-6

Author(s):

Arjun Dutta

Keyword(s):

Data Mining ◽

Image Segmentation ◽

Unsupervised Learning ◽

Clustering Analysis ◽

Data Clustering ◽

Clustering Method ◽

Clustering Techniques ◽

Partitional Clustering ◽

Gene Information ◽

Scientific Fields

This paper presents a concise study of clustering: existing methods and the developments made over time. Clustering is defined as a form of unsupervised learning in which targets are grouped on the basis of some inherent similarity among them. In recent times, we deal with large masses of data including images, video, social text, DNA, gene information, etc. Data clustering analysis has emerged as an efficient technique for accurately categorizing information into sensible groups. Clustering is closely associated with research in several scientific fields. The k-means algorithm was suggested in 1957 and remains the most popular partitional clustering method to date. Clustering techniques are used in many commercial and non-commercial fields. The applications of clustering in areas such as image segmentation, object and role recognition, and data mining are highlighted. The paper gives a brief description of the existing types of clustering approaches, followed by a survey of the application areas.

Evaluation of Clustering Methods for Adaptive Learning Systems

Business Intelligence ◽

10.4018/978-1-4666-9562-7.ch027 ◽

2016 ◽

pp. 519-542

Author(s):

Wilhelmiina Hämäläinen ◽

Ville Kumpulainen ◽

Maxim Mozgovoy

Keyword(s):

Data Mining ◽

Adaptive Learning ◽

Clustering Algorithms ◽

Educational Data Mining ◽

Optimal Choice ◽

Learning Systems ◽

Learning Tools ◽

Clustering Methods ◽

Central Task ◽

Adaptive Learning Systems

Clustering student data is a central task in educational data mining and the design of intelligent learning tools. The problem is that there are thousands of clustering algorithms but no general guidelines about which method to choose. The optimal choice is of course problem- and data-dependent and can seldom be found without trying several methods. Still, the purposes of clustering students and the typical features of educational data make certain clustering methods more suitable or attractive. In this chapter, the authors evaluate the main clustering methods from this perspective. Based on the analysis, the authors suggest the most promising clustering methods for different situations.

A Systematic Review of Deep Learning Approaches to Educational Data Mining

Complexity ◽

10.1155/2019/1306039 ◽

2019 ◽

Vol 2019 ◽

pp. 1-22 ◽

Cited By ~ 15

Author(s):

Antonio Hernández-Blanco ◽

Boris Herrera-Flores ◽

David Tomás ◽

Borja Navarro-Colorado

Keyword(s):

Machine Learning ◽

Data Mining ◽

Deep Learning ◽

Language Processing ◽

Educational Data Mining ◽

Research Field ◽

Machine Learning Techniques ◽

Mining Machine ◽

Learning Approaches ◽

Learning Techniques

Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but only recently has Deep Learning gained increasing attention in the educational domain. Deep Learning is a machine learning method based on neural network architectures with multiple layers of processing units, which has been successfully applied to a broad set of problems in the areas of image recognition and natural language processing. This paper surveys the research carried out on Deep Learning techniques applied to EDM, from its origins to the present day. The main goals of this study are to identify the EDM tasks that have benefited from Deep Learning and those that remain to be explored, to describe the main datasets used, to provide an overview of the key concepts, main architectures, and configurations of Deep Learning and its applications to EDM, and to discuss the current state of the art and future directions in this area of research.

Pencarian Kemiripan Judul Tugas Akhir Mahasiswa Dengan Menggunakan Metode Single Linkage Hierarchical (Finding Similar Student Final-Project Titles Using the Single Linkage Hierarchical Method)

Jurnal SAINTEKOM ◽

10.33020/saintekom.v8i2.69 ◽

2018 ◽

Vol 8 (2) ◽

pp. 154

Author(s):

Rizal Tjut Adek ◽

Miftahul Jannah

Keyword(s):

Data Mining ◽

Text Mining ◽

Data Clustering ◽

Single Linkage ◽

Hierarchical Method ◽

Cluster Data

Searching for similar final-project titles by theme in the informatics engineering department using the single linkage hierarchical method is a way to determine the similarity or closeness between the abstract and title entered by the admin and the abstracts and titles of final projects already completed in the informatics engineering department of Universitas Malikussaleh, using a clustering technique. The abstracts of existing final projects are clustered with the Single Linkage Hierarchical Method (SLHM) until six clusters are formed, corresponding to the fields of study in the department. The input then passes through a text mining process against the six clusters formed. Next, the test data (new data) are matched against the existing data, that is, the members of the clusters. The data used to form the clusters are final-project abstracts from informatics engineering at Universitas Malikussaleh from 2010 to 2015, while the input is a new abstract, entered to determine which category it belongs to based on the clustering already stored in the database. In an experiment with 60 test abstracts, the matching success rate was 100% for the multimedia category, 100% for the programming category, 87.5% for the image processing category, and 11.11% for the pattern recognition category, while no matches were found for the networking and data mining categories. On the user page, the result is a list of final-project titles in the database that match the theme of the title entered by the user.
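
The clustering step described above, merging with single linkage until a fixed number of clusters remains, can be sketched as follows. This is only an illustration: the sample titles are invented, three clusters are used instead of six to keep the example small, and Jaccard distance on word sets is an assumed dissimilarity (the abstract does not name the text similarity measure used).

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of words."""
    return 1 - len(a & b) / len(a | b)


def single_linkage(items, dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    closest members are nearest, until n_clusters remain.
    Returns clusters as lists of item indices."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: cluster distance = minimum pairwise distance.
                d = min(dist(items[i], items[j])
                        for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        merged = clusters.pop(y)  # y > x, so index x stays valid
        clusters[x].extend(merged)
    return clusters


# Hypothetical final-project titles, tokenized into word sets.
titles = [
    "image segmentation using kmeans",
    "medical image segmentation with thresholding",
    "network intrusion detection system",
    "wireless network intrusion monitoring",
    "mobile game programming framework",
]
docs = [set(t.split()) for t in titles]
result = single_linkage(docs, jaccard_distance, 3)
print(result)
```

A new abstract could then be assigned to the cluster containing its nearest existing document, which mirrors the matching step described in the abstract.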

The Modeling and Simulation of Data Clustering Algorithms in Data Mining with Big Data

Journal of Industrial Integration and Management ◽

10.1142/s2424862218500173 ◽

2019 ◽

Vol 04 (01) ◽

pp. 1850017 ◽

Cited By ~ 3

Author(s):

Weiru Chen ◽

Jared Oliverio ◽

Jin Ho Kim ◽

Jiayue Shen

Keyword(s):

Data Mining ◽

Big Data ◽

Data Reduction ◽

Data Clustering ◽

Clustering Algorithms ◽

High Volume ◽

Clustering Methods ◽

Data Set ◽

Processing Methods ◽

Integration Data

Big Data is a popular cutting-edge technology nowadays. Techniques and algorithms are expanding in different areas including engineering, biomedicine, and business. Due to the high volume and complexity of Big Data, data pre-processing is necessary when mining it. The pre-processing methods include data cleaning, data integration, data reduction, and data transformation. Data clustering is the most important step of data reduction. With data clustering, mining on the reduced data set should be more efficient yet still produce quality analytical results. This paper presents the different data clustering methods and related algorithms for data mining with Big Data. Data clustering can increase the efficiency and accuracy of data mining.