CLG clustering for dropout prediction using log-data clustering method

Author(s):  
Agung Triayudi ◽  
Wahyu Oktri Widyarto ◽  
Lia Kamelia ◽  
Iksal Iksal ◽  
Sumiati Sumiati

The application of data mining, machine learning, and statistical methods to data from educational settings is commonly known as educational data mining. Most school systems require one teacher to teach many students at the same time. Exams are regularly used to measure student achievement, but achievement is difficult to gauge this way because examinations cannot be administered easily. In programming classes, on the other hand, source code edits and UNIX commands can easily be detected and stored automatically as log data. Hence, rather than estimating student performance from this log data, this study focuses on detecting students who experience difficulty or are unable to follow programming classes. We propose the CLG clustering method, which predicts the risk of dropping out of school by using cluster data for outlier detection.

2018 ◽  
Vol 6 (2) ◽  
Author(s):  
Elly Muningsih - AMIK BSI Yogyakarta

Abstract ~ The K-Means method is a clustering method widely used in data clustering research, while the K-Medoids method is efficient for processing small data sets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: very potential (loyal) customers, potential customers, and non-potential customers. The methods used in this study are K-Means clustering and K-Medoids clustering. The data used are online sales transactions. The clustering methods are tested using a Fuzzy RFM (Recency, Frequency and Monetary) model, in which the average (mean) of the three values is taken. Testing shows that the K-Means method performs better than the K-Medoids method, with an accuracy of 90.47%. The data processing yields 16 members (customers) in cluster 1, 11 members in cluster 2, and 15 members in cluster 3. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
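The abstract gives no code or data, so the following is only a minimal sketch of how K-Means could group customers by RFM scores. The customer values, the starting centroids, and the function name `k_means` are hypothetical, and the Fuzzy RFM scoring step described in the abstract is not reproduced here.

```python
from math import dist


def k_means(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical, already normalized (recency, frequency, monetary) scores.
customers = [
    (0.9, 0.8, 0.9), (0.8, 0.9, 0.85),   # high on all three
    (0.5, 0.5, 0.4), (0.55, 0.45, 0.5),  # middling
    (0.1, 0.2, 0.1), (0.15, 0.1, 0.2),   # low on all three
]
# Assumption: one starting centroid is picked by hand from each visible group.
labels, centroids = k_means(
    customers, centroids=[customers[0], customers[2], customers[4]]
)
```

K-Medoids differs in the update step: instead of the mean, each cluster is represented by the member that minimizes the total distance to the other members, which makes it less sensitive to extreme values.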


Author(s):  
Wilhelmiina Hämäläinen ◽  
Ville Kumpulainen ◽  
Maxim Mozgovoy

Clustering student data is a central task in educational data mining and the design of intelligent learning tools. The problem is that there are thousands of clustering algorithms but no general guidelines about which method to choose. The optimal choice is of course problem- and data-dependent and can seldom be found without trying several methods. Still, the purposes of clustering students and the typical features of educational data make certain clustering methods more suitable or attractive. In this chapter, the authors evaluate the main clustering methods from this perspective. Based on the analysis, the authors suggest the most promising clustering methods for different situations.


Author(s):  
Yasunori Endo ◽  
Tomoyuki Suzuki ◽  
Naohiko Kinoshita ◽  
Yukihiro Hamasuna ◽  
...  

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is very useful because the belongingness, or membership degree, of each datum to each cluster can be calculated directly from the dissimilarities between data, and cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to data with uncertainty as “uncertain data,” e.g., incomplete data or data that have errors. Previously, a method was proposed based on the concept of a tolerance vector for handling uncertain data, and some clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.


2014 ◽  
Vol 687-691 ◽  
pp. 1500-1503
Author(s):  
Yong Lin Leng

As information technology develops and data collection capabilities improve, the amount of accumulated data increases and missing-data problems become more and more obvious. Traditional clustering methods cannot directly cluster data sets that contain missing data. In this paper, we propose a novel missing-data measurement method based on incomplete information system theory and design separate similarity measure criteria for discrete and continuous attributes. The experiment uses K-means clustering to test the accuracy of the algorithm from two aspects, different missing-data rates and different amounts of data. The results demonstrate that the method can cluster data sets with missing values efficiently and accurately.
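The abstract does not spell out its similarity measure, so the sketch below does not reproduce it. It shows a different, widely used workaround, the partial-distance strategy, only to illustrate how a distance can still be computed when some attributes are missing. The function name `partial_distance` and the use of `None` for a missing value are assumptions made for this example.

```python
from math import sqrt


def partial_distance(a, b):
    """Euclidean distance over the attributes observed in both records,
    rescaled by (total attributes / observed attributes)."""
    observed = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not observed:
        return float("inf")  # no shared attribute: treat as maximally distant
    squared = sum((x - y) ** 2 for x, y in observed)
    return sqrt(squared * len(a) / len(observed))


# Complete records reduce to the ordinary Euclidean distance.
full = partial_distance((0.0, 0.0), (3.0, 4.0))

# The second attribute of the first record is missing, so only the first
# and third attributes contribute, scaled by 3/2.
partial = partial_distance((1.0, None, 3.0), (1.0, 5.0, 0.0))
```

A distance of this kind can be plugged into the assignment step of K-means, although the centroid update then also has to average only over the observed values of each attribute.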


Author(s):  
Siti Aisyah Mohamed ◽  
Muhaini Othman ◽  
Mohd Hafizul Afifi

The recent evolution of Artificial Neural Networks has drawn researchers to explore deep learning based on Spiking Neural Network clustering methods. Spiking Neural Network (SNN) models capture neuronal behaviour more precisely than traditional neural networks because they incorporate time into their functioning model [1]. The aim of this paper is to review studies related to clustering problems that employ Spiking Neural Network models. Even though many algorithms are used to solve clustering problems, most of the methods are only suitable for static data and fixed windows of time series. Hence, there is a need to analyse complex data types, and there is potential for improvement. Therefore, this paper summarizes the significant results obtained by employing SNN models in different clustering approaches. The findings of this paper could demonstrate the purpose of clustering methods using SNN to fellow researchers from various disciplines who wish to discover and understand complex data.


2020 ◽  
Vol 1 (4) ◽  
pp. 1-6
Author(s):  
Arjun Dutta

This paper presents a concise study of clustering: existing methods and developments made at various times. Clustering is defined as a form of unsupervised learning in which the targets are sorted on the basis of some similarity inherent among them. In recent times, we deal with large masses of data including images, video, social text, DNA, gene information, etc. Data clustering analysis has emerged as an efficient technique for accurately categorizing information into sensible groups. Clustering has a deep association with research in several scientific fields. The k-means algorithm was suggested in 1957 and is the most popular partitional clustering method to date. Clustering techniques are used in many commercial and non-commercial fields. The applications of clustering in areas such as image segmentation, object and role recognition, and data mining are highlighted. In this paper, we present a brief description of the existing types of clustering approaches, followed by a survey of the areas.


2016 ◽  
pp. 519-542
Author(s):  
Wilhelmiina Hämäläinen ◽  
Ville Kumpulainen ◽  
Maxim Mozgovoy

Clustering student data is a central task in educational data mining and the design of intelligent learning tools. The problem is that there are thousands of clustering algorithms but no general guidelines about which method to choose. The optimal choice is of course problem- and data-dependent and can seldom be found without trying several methods. Still, the purposes of clustering students and the typical features of educational data make certain clustering methods more suitable or attractive. In this chapter, the authors evaluate the main clustering methods from this perspective. Based on the analysis, the authors suggest the most promising clustering methods for different situations.


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-22 ◽  
Author(s):  
Antonio Hernández-Blanco ◽  
Boris Herrera-Flores ◽  
David Tomás ◽  
Borja Navarro-Colorado

Educational Data Mining (EDM) is a research field that focuses on the application of data mining, machine learning, and statistical methods to detect patterns in large collections of educational data. Different machine learning techniques have been applied in this field over the years, but only recently has Deep Learning gained increasing attention in the educational domain. Deep Learning is a machine learning method based on neural network architectures with multiple layers of processing units, which has been successfully applied to a broad set of problems in the areas of image recognition and natural language processing. This paper surveys the research carried out on Deep Learning techniques applied to EDM, from its origins to the present day. The main goals of this study are to identify the EDM tasks that have benefited from Deep Learning and those that remain to be explored, to describe the main datasets used, to provide an overview of the key concepts, main architectures, and configurations of Deep Learning and its applications to EDM, and to discuss the current state of the art and future directions in this area of research.


2018 ◽  
Vol 8 (2) ◽  
pp. 154
Author(s):  
Rizal Tjut Adek ◽  
Miftahul Jannah

Searching for similar final-project titles by theme in the Informatics Engineering department using the single linkage hierarchical method is a way to determine the similarity, or closeness, between an abstract and title entered by an administrator and the abstracts and titles of final projects already completed or on record in the Informatics Engineering department of Universitas Malikussaleh, using a clustering technique. The existing final-project abstracts are clustered using the Single Linkage Hierarchical Method (SLHM) until six clusters are formed, corresponding to the fields of study in the department. The input then passes through a text mining process against the six clusters formed. Next, the test data or new data are matched against the existing data, that is, the members of the clusters. The data used to build the clusters are Informatics Engineering final-project abstracts from Universitas Malikussaleh for 2010-2015, while the input is a new abstract, submitted to determine which category it belongs to based on the clustering already stored in the database. In an experiment with 60 test abstracts, the matching success rate was 100% for the multimedia category, 100% for the programming category, 87.5% for the image processing category, and 11.11% for the pattern recognition category, while no matches were found for the networking and data mining categories. On the user page, the results are the final-project titles in the database that match the theme of the title entered by the user.
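As a rough illustration of single linkage hierarchical clustering on short texts, the sketch below repeatedly merges the two clusters whose closest members are nearest to each other. The sample titles, the word-set Jaccard distance, and the function names are assumptions made for this example; the abstract does not state which text representation or distance the authors used.

```python
def jaccard_distance(a, b):
    """1 minus the share of words the two texts have in common."""
    words_a, words_b = set(a.split()), set(b.split())
    return 1.0 - len(words_a & words_b) / len(words_a | words_b)


def single_linkage(items, n_clusters, distance):
    """Agglomerative clustering; returns clusters as lists of item indices."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: cluster distance is the minimum over all
                # cross-cluster pairs of members.
                d = min(
                    distance(items[i], items[j])
                    for i in clusters[x]
                    for j in clusters[y]
                )
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x].extend(clusters[y])  # merge the closest pair
        del clusters[y]
    return clusters


# Hypothetical project titles, invented for illustration only.
titles = [
    "image segmentation with edge detection",
    "edge detection for medical image analysis",
    "network routing protocol simulation",
    "wireless network routing performance",
]
groups = single_linkage(titles, 2, jaccard_distance)
```

With these sample titles the two image-related titles end up in one group and the two networking titles in the other. A new title could then be assigned to the group containing its nearest member, which mirrors the matching step described above.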


2019 ◽  
Vol 04 (01) ◽  
pp. 1850017 ◽  
Author(s):  
Weiru Chen ◽  
Jared Oliverio ◽  
Jin Ho Kim ◽  
Jiayue Shen

Big Data is a popular cutting-edge technology nowadays. Techniques and algorithms are expanding in different areas including engineering, biomedicine, and business. Due to the high volume and complexity of Big Data, it is necessary to apply data pre-processing methods when mining data. The pre-processing methods include data cleaning, data integration, data reduction, and data transformation. Data clustering is the most important step of data reduction. With data clustering, mining on the reduced data set should be more efficient yet produce quality analytical results. This paper presents the different data clustering methods and related algorithms for data mining with Big Data. Data clustering can increase the efficiency and accuracy of data mining.

