Analysis of Academic Achievement in Higher-Middle Education in Mexico through Data Clustering Methods

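KOMPARASI METODE CLUSTERING K-MEANS DAN K-MEDOIDS DENGAN MODEL FUZZY RFM UNTUK PENGELOMPOKAN PELANGGAN [Comparison of the K-Means and K-Medoids Clustering Methods with the Fuzzy RFM Model for Customer Grouping]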

Evolusi : Jurnal Sains dan Manajemen ◽

10.31294/evolusi.v6i2.4600 ◽

2018 ◽

Vol 6 (2) ◽

Author(s):

Elly Muningsih - AMIK BSI Yogyakarta

Keyword(s):

Data Clustering ◽

Small Data ◽

Clustering Methods ◽

Monetary Model ◽

Clustering Method ◽

Online Sales ◽

Rfm Model ◽

Potential Customers ◽

Cluster 2 ◽

Better Than

Abstract ~ K-Means is one of the most widely used methods in data clustering research, while K-Medoids is an efficient method for processing small datasets. This study compares the two clustering methods by grouping customers into 3 clusters according to their characteristics: very potential (loyal) customers, potential customers, and non-potential customers. The data used are online sales transactions. The clustering methods are tested with a Fuzzy RFM (Recency, Frequency and Monetary) model, in which the average (mean) of the three values is taken. Testing shows that the K-Means method is better than the K-Medoids method, with an accuracy value of 90.47%. The data processing shows that cluster 1 has 16 members (customers), cluster 2 has 11 members and cluster 3 has 15 members. Keywords: clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model.
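
The two methods compared in this abstract differ mainly in how a cluster is represented. The sketch below illustrates only that difference and is not the author's implementation: the function names and the sample (recency, frequency, monetary) scores are hypothetical, and the scores are assumed to be already scaled to the range 0 to 1.

```python
import math

def centroid(points):
    """Coordinate-wise mean: the cluster representative used by K-Means."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

def medoid(points):
    """Member with the smallest total distance to all members: used by K-Medoids."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Hypothetical (recency, frequency, monetary) scores; the last one is an outlier.
cluster = [(0.1, 0.2, 0.1), (0.2, 0.1, 0.2), (0.15, 0.15, 0.15), (0.9, 0.9, 0.9)]

print("centroid:", centroid(cluster))  # pulled toward the outlier
print("medoid:  ", medoid(cluster))    # always an actual customer record
```

Because the medoid must be one of the data points, a single extreme customer shifts it less than it shifts the mean, which is one reason K-Medoids is often considered for small or noisy datasets.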

Enhanced Affinity for Spectral Clustering using Topological Node Features (TNFS)

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9450.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 974-987

Keyword(s):

Local Structure ◽

Data Clustering ◽

Spectral Clustering ◽

Clustering Coefficient ◽

Complex Data ◽

Clustering Methods ◽

Pairwise Similarity ◽

Synthetic Datasets ◽

Summation Index ◽

Affinity Measure

Data clustering is an active topic of research as it has applications in various fields such as biology, management, statistics, pattern recognition, etc. Spectral Clustering (SC) has gained popularity in recent times due to its ability to handle complex data and ease of implementation. A crucial step in spectral clustering is the construction of the affinity matrix, which is based on a pairwise similarity measure. The varied characteristics of datasets affect the performance of a spectral clustering technique. In this paper, we have proposed an affinity measure based on Topological Node Features (TNFs) viz., Clustering Coefficient (CC) and Summation index (SI) to define the notion of density and local structure. It has been shown that these features improve the performance of SC in clustering the data. The experiments were conducted on synthetic datasets, UCI datasets, and the MNIST handwritten datasets. The results show that the proposed affinity metric outperforms several recent spectral clustering methods in terms of accuracy.
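
For readers unfamiliar with the affinity-matrix step mentioned in this abstract, the sketch below builds the common Gaussian (RBF) affinity from pairwise Euclidean distances. This is the usual baseline, not the TNF-based measure the paper proposes; the function name and the `sigma` bandwidth parameter are assumptions made for illustration.

```python
import math

def gaussian_affinity(points, sigma=1.0):
    """Symmetric matrix W with W[i][j] = exp(-||xi - xj||^2 / (2 * sigma^2)).

    The diagonal is left at zero, a common convention in spectral clustering.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W

# Two nearby points and one distant point (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
for row in gaussian_affinity(points):
    print([round(w, 4) for w in row])
```

Nearby points receive an affinity close to 1 and distant points an affinity close to 0; the choice of `sigma` strongly affects the result, which is part of what motivates data-adaptive affinity measures.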

Data Clustering Algorithms Using Rough Sets

Handbook of Research on Computational Intelligence for Engineering, Science, and Business ◽

10.4018/978-1-4666-2518-1.ch012 ◽

2013 ◽

pp. 297-327 ◽

Cited By ~ 6

Author(s):

B.K. Tripathy ◽

Adhir Ghosh

Keyword(s):

Comparative Study ◽

Rough Set ◽

Fuzzy Clustering ◽

Fuzzy Set ◽

Rough Sets ◽

Data Clustering ◽

Clustering Algorithms ◽

Clustering Methods ◽

Future Studies ◽

Multiple Clusters

The development of data clustering algorithms has been pursued by researchers since the introduction of the k-means algorithm (Macqueen 1967; Lloyd 1982). These algorithms were subsequently modified to handle categorical data. To handle situations where objects can have memberships in multiple clusters, fuzzy clustering and rough clustering methods were introduced (Lingras et al 2003, 2004a). There are many extensions of these initial algorithms (Lingras et al 2004b; Lingras 2007; Mitra 2004; Peters 2006, 2007). The MMR algorithm (Parmar et al 2007), its extensions (Tripathy et al 2009, 2011a, 2011b) and the MADE algorithm (Herawan et al 2010) use rough set techniques for clustering. In this chapter, the authors focus on rough set based clustering algorithms and provide a comparative study of the fuzzy set based and rough set based clustering algorithms in terms of their efficiency. They also present problems for future study related to the topics covered.

Data Clustering

Web Data Management Practices ◽

10.4018/978-1-59904-228-2.ch001 ◽

2007 ◽

pp. 1-33 ◽

Cited By ~ 4

Author(s):

Dušan Husek ◽

Jaroslav Pokorny ◽

Hana Rezankova ◽

Václav Snasel

Keyword(s):

Information Retrieval ◽

Data Clustering ◽

Important Task ◽

Clustering Methods ◽

Web Documents ◽

Web Communities

Document and information retrieval (IR) is an important task for Web communities. In this chapter, we introduce some clustering methods and focus on their use for the clustering, classification, and retrieval of Web documents.

Single-cell RNA-seq data clustering: A survey with performance comparison study

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720020400053 ◽

2020 ◽

Vol 18 (04) ◽

pp. 2040005

Author(s):

Ruiyi Li ◽

Jihong Guan ◽

Shuigeng Zhou

Keyword(s):

Single Cell ◽

Data Clustering ◽

Performance Metrics ◽

Clustering Algorithms ◽

Cell Types ◽

Performance Comparison ◽

Cellular Heterogeneity ◽

Clustering Methods ◽

Multiple Perspectives ◽

Underlying Mechanisms

Clustering analysis has been widely applied to single-cell RNA-sequencing (scRNA-seq) data to discover cell types and cell states. Algorithms developed in recent years have greatly helped the understanding of cellular heterogeneity and the underlying mechanisms of biological processes. However, these algorithms often use different techniques, were evaluated on different datasets, and were compared with only some of their counterparts, usually using different performance metrics. Consequently, an accurate and complete picture of their merits and demerits is lacking, which makes it difficult for users to select proper algorithms for analyzing their data. To fill this gap, we first review the major existing scRNA-seq data clustering methods, and then conduct a comprehensive performance comparison among them from multiple perspectives. We consider 13 state-of-the-art scRNA-seq data clustering algorithms, and collect 12 publicly available real scRNA-seq datasets from existing works to evaluate and compare these algorithms. Our comparative study shows that the existing methods are very diverse in performance. Even the top-performing algorithms do not perform well on all datasets, especially those with complex structures. This suggests that further research is required to explore more stable, accurate, and efficient clustering algorithms for scRNA-seq data.

Evolutionary Algorithms for Robust Density-Based Data Clustering

ISRN Computational Mathematics ◽

10.1155/2013/931019 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Amit Banerjee

Keyword(s):

Evolutionary Algorithms ◽

Evolutionary Computation ◽

Data Clustering ◽

Relational Data ◽

Clustering Methods ◽

Density Based Clustering ◽

Selection Of

Density-based clustering methods are known to be robust against outliers in data; however, they are sensitive to user-specified parameters, the selection of which is not trivial. Moreover, relational data clustering is an area that has received considerably less attention than object data clustering. In this paper, two approaches to robust density-based clustering for relational data using evolutionary computation are investigated.
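
The parameter sensitivity mentioned in this abstract can be illustrated with a DBSCAN-style core-point test. This is a generic sketch on object data, not the relational or evolutionary approach investigated in the paper; `eps`, `min_pts`, the function name and the sample points are assumptions made for illustration.

```python
import math

def core_points(points, eps, min_pts):
    """Points with at least min_pts neighbours (the point itself included) within radius eps."""
    return [p for p in points
            if sum(math.dist(p, q) <= eps for q in points) >= min_pts]

# Four corners of a unit square plus one isolated point (hypothetical data).
data = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]

for eps in (0.9, 1.0):
    print(eps, len(core_points(data, eps, min_pts=3)))
```

With `eps = 1.0` every corner of the square has two neighbours plus itself and counts as a core point, while with `eps = 0.9` no point does. A small change in a user-specified parameter therefore changes whether any cluster is found at all, and the isolated point is treated as an outlier in both cases.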

Performance Comparison of Social Spider Optimization for Data Clustering with Other Clustering Methods

2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS) ◽

10.1109/iccons.2018.8662994 ◽

2018 ◽

Author(s):

T. Ravi Chandran ◽

A. V. Reddy ◽

B. Janet

Keyword(s):

Data Clustering ◽

Performance Comparison ◽

Clustering Methods ◽

Social Spider ◽

Social Spider Optimization

A Comparison of Categorical Attribute Data Clustering Methods

Lecture Notes in Computer Science - Structural, Syntactic, and Statistical Pattern Recognition ◽

10.1007/978-3-662-44415-3_6 ◽

2014 ◽

pp. 53-62 ◽

Cited By ~ 3

Author(s):

Ville Hautamäki ◽

Antti Pöllänen ◽

Tomi Kinnunen ◽

Kong Aik Lee ◽

Haizhou Li ◽

...

Keyword(s):

Data Clustering ◽

Clustering Methods ◽

Attribute Data ◽

Categorical Attribute

On Fuzzy Non-Metric Model for Data with Tolerance and its Application to Incomplete Data Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p0571 ◽

2016 ◽

Vol 20 (4) ◽

pp. 571-579 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽

Tomoyuki Suzuki ◽

Naohiko Kinoshita ◽

Yukihiro Hamasuna ◽

...

Keyword(s):

Data Clustering ◽

Incomplete Data ◽

Clustering Algorithm ◽

Uncertain Data ◽

Data Sets ◽

Membership Degree ◽

Clustering Methods ◽

Clustering Method ◽

Numerical Examples ◽

Metric Model

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is very useful because the belongingness, or membership degree, of each datum to each cluster can be calculated directly from the dissimilarities between data, and cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to data with uncertainty as “uncertain data,” e.g., incomplete data or data that have errors. Previously, a method was proposed based on the concept of a tolerance vector for handling uncertain data, and some clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.

Missing Data Clustering Based on Incomplete Information System

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.1500 ◽

2014 ◽

Vol 687-691 ◽

pp. 1500-1503

Author(s):

Yong Lin Leng

Keyword(s):

Information System ◽

Missing Data ◽

Incomplete Information ◽

Data Clustering ◽

Clustering Methods ◽

Data Set ◽

Incomplete Information System ◽

Data Measurement ◽

Cluster Data ◽

Test Algorithm

With the development of information technology and improved data collection capabilities, the amount of accumulated data is increasing and missing-data problems are becoming more and more apparent. Traditional clustering methods cannot directly cluster data sets that contain missing data. In this paper, we propose a novel missing-data measurement method based on incomplete information system theory and design separate similarity measure criteria for discrete and continuous attributes. The experiments use K-means clustering to test the accuracy of the algorithm from two aspects, different missing-data rates and different amounts of data. The results demonstrate that the method can cluster data sets with missing data efficiently and accurately.
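
To make concrete what a distance measure on incomplete records can look like, the sketch below uses the well-known partial distance strategy: compare only the attributes observed in both records and rescale to the full dimension. This is a swapped-in generic technique, not the incomplete-information-system measure proposed in the paper; the function name and the use of `None` for a missing value are assumptions.

```python
import math

def partial_distance(x, y):
    """Euclidean distance over attributes observed in both records, rescaled to full dimension."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no attribute is observed in both records")
    scale = len(x) / len(pairs)  # compensates for the skipped attributes
    return math.sqrt(scale * sum((a - b) ** 2 for a, b in pairs))

print(partial_distance((0, 0), (3, 4)))     # complete records: ordinary Euclidean distance
print(partial_distance((0, None), (3, 4)))  # second attribute missing in the first record
```

A distance of this kind can be plugged into a K-means-style assignment step so that records with missing values do not have to be discarded or imputed first.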
