Multi-Attribute Utility Theory Based K-Means Clustering Applications

2017 ◽  
Vol 13 (2) ◽  
pp. 1-12 ◽  
Author(s):  
Jungmok Ma

One of the major obstacles in applying the k-means clustering algorithm is the selection of the number of clusters k. A multi-attribute utility theory (MAUT)-based k-means clustering algorithm is proposed to tackle this problem by incorporating user preferences. Using MAUT, the decision maker's value structure for the number of clusters and other attributes can be modeled quantitatively and used as the objective function of k-means. A target clustering problem from the military targeting process is used to demonstrate the MAUT-based k-means and to provide a comparative study. The results show that existing clustering algorithms do not necessarily reflect user preferences, while the MAUT-based k-means provides a systematic framework for preference modeling in cluster analysis.
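The abstract does not give the utility model itself; as a rough illustration of the idea, the sketch below scores candidate values of k with a hypothetical additive MAUT function that trades off cluster compactness against a preference for fewer clusters, then picks the k with the highest utility. The attribute choices, weights, and single-attribute utility shapes are assumptions for illustration, not the paper's model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

def utility(k, inertia, inertia_max):
    # Hypothetical additive MAUT: w1 * u(compactness) + w2 * u(few clusters).
    # Single-attribute utilities are scaled to [0, 1]; weights sum to 1.
    u_compact = 1.0 - inertia / inertia_max      # tighter clusters score higher
    u_few = 1.0 - (k - 2) / (10 - 2)             # decision maker prefers small k
    return 0.6 * u_compact + 0.4 * u_few

inertia = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
           for k in range(2, 11)}
worst = max(inertia.values())
best_k = max(inertia, key=lambda k: utility(k, inertia[k], worst))
print("k maximizing the assumed utility:", best_k)
```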

2006 ◽  
Vol 16 (07) ◽  
pp. 2073-2079 ◽  
Author(s):  
D. K. TASOULIS ◽  
M. N. VRAHATIS

Clustering can be defined as the process of "grouping" a collection of objects into subsets or clusters. The clustering problem has been addressed in numerous contexts and by researchers in many disciplines, reflecting its broad appeal and usefulness as an exploratory data analysis approach. Unsupervised clustering algorithms have been developed to address real-world problems in which the number of clusters present in the dataset is unknown; these algorithms approximate the number of clusters while performing the clustering procedure. This paper is a first step towards the development of unsupervised clustering algorithms capable of identifying clusters within clusters. To this end, an unsupervised clustering algorithm is modified to take into consideration the fractal dimension of the data. The experimental results indicate that this approach can provide further qualitative information compared with the unmodified unsupervised clustering algorithm.
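The modified algorithm itself is not specified in the abstract; the sketch below only shows one common way to estimate the box-counting (fractal) dimension of a 2-D point set, the kind of quantity such an algorithm could monitor during clustering. The grid sizes and the toy data are illustrative assumptions.

```python
import numpy as np

def box_counting_dimension(points, box_sizes):
    """Estimate fractal dimension by counting occupied boxes at several scales."""
    points = (points - points.min(0)) / (np.ptp(points, axis=0) + 1e-12)  # unit square
    counts = []
    for s in box_sizes:
        # Count distinct grid cells of side s that contain at least one point.
        cells = set(map(tuple, np.floor(points / s).astype(int)))
        counts.append(len(cells))
    # Slope of log(count) vs log(1/size) approximates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
cloud = rng.random((2000, 2))                    # a filled square has dimension ~2
print(round(box_counting_dimension(cloud, [0.5, 0.25, 0.125, 0.0625]), 2))
```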


1990 ◽  
Vol 29 (03) ◽  
pp. 200-204 ◽  
Author(s):  
J. A. Koziol

Abstract: A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike's information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
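As a hedged sketch of the criterion itself (not Koziol's multinomial model): for each candidate number of clusters g, fit the model, record the maximized log-likelihood, and penalize by the number of free parameters; the g with the smallest AIC is retained. Here a Gaussian mixture stands in for the multinomial mixture, which is purely an assumption for illustration.

```python
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=1)

# AIC = -2 * log-likelihood + 2 * (number of free parameters);
# GaussianMixture.aic computes exactly this for the fitted model.
aic = {g: GaussianMixture(n_components=g, random_state=1).fit(X).aic(X)
       for g in range(1, 7)}
print("number of clusters minimizing AIC:", min(aic, key=aic.get))
```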


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest-neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term "k-NN graph" and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it to subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, "upward" clustering, which forms (sequentially assembles) one cluster after another. Until now, graph-based cluster analysis had not been considered for the classification of vegetation datasets.
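The authors' "upward" assembly procedure is not reproduced here; the sketch below shows only the common baseline it departs from: build a symmetric k-NN graph and read clusters off its connected components. The choice k = 5 and the toy data are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

X, _ = make_blobs(n_samples=200, centers=3, random_state=2)

# Directed 5-NN graph, symmetrized so that proximity in either direction links points.
g = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
g = g.maximum(g.T)

n_clusters, labels = connected_components(g, directed=False)
print("clusters found:", n_clusters)
```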


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

Abstract: With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting judgment indicators for the number of clusters. Moreover, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes connections between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the adjustable parameters to a minimum. Building on the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. Different cutoff distances and cutoff densities are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city-light satellite images.
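BCALoD itself is not spelled out in the abstract; the sketch below only illustrates the local-density ingredient it builds on, in the spirit of density-based clustering: each point's density is the number of neighbors within a cutoff distance. The cutoff value and the data, including the deliberately small second cluster, are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),     # one large cluster
               rng.normal(3, 0.1, (8, 2))])      # one small, tight cluster

def local_density(X, d_c):
    """Number of points within cutoff distance d_c of each point (itself excluded)."""
    d = cdist(X, X)
    return (d < d_c).sum(axis=1) - 1

rho = local_density(X, d_c=0.5)
print("density range:", rho.min(), "to", rho.max())
```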


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of existing algorithms under both sequential and parallel conditions.
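The covering algorithm (CA) phase is described only at a high level; as a stand-in, the sketch below seeds cluster centers greedily by "covering" the data with balls of a fixed radius, which yields k automatically, then hands the centers to Lloyd iterations via scikit-learn. The radius and the greedy rule are assumptions, not the paper's CA.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=5, random_state=4)

def covering_init(X, radius):
    """Greedy covering: every point must lie within `radius` of some center."""
    centers, uncovered = [], np.ones(len(X), dtype=bool)
    while uncovered.any():
        c = X[uncovered][0]                      # take any uncovered point as a center
        centers.append(c)
        uncovered &= np.linalg.norm(X - c, axis=1) > radius
    return np.array(centers)

centers = covering_init(X, radius=2.5)           # phase 1: k emerges from the data
km = KMeans(n_clusters=len(centers), init=centers, n_init=1).fit(X)  # phase 2: Lloyd
print("k discovered by the covering phase:", len(centers))
```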


Author(s):  
Muhammad L O Mardin ◽  
Achamad Fuad ◽  
Hairil K Sirajuddin

Abstract: The large number of available houses often leaves prospective buyers doubtful or uncertain when they must decide which house to buy, because no system yet exists to assist in the selection of housing; the choice is therefore made by intuition alone, with no calculation behind it. The purpose of this research is to produce a decision support system for housing selection. The criteria proposed for the selection process are: housing price, distance from the city center, distance to the nearest market, [1], housing type, distance to public roads, and distance to lahar. The selection was carried out with the developed system on 10 alternatives, using the following criterion weights: price = 5, house type = 5, distance to city center = 2, distance to nearest market = 2, distance to public roads = 4, and distance to lahar = 5. The recommended alternative is the Safira Residen 70 housing, with the highest score of 0.65.
Keywords: Decision Support System, Selection, Housing, Multi-Attribute Utility Theory
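Using the criterion weights reported in the abstract, a minimal additive-MAUT score can be computed as below. Only the weights come from the text; the raw attribute values for the alternatives, the cost/benefit directions, and the min-max normalization are invented for illustration.

```python
import numpy as np

# Criterion weights from the abstract: price, house type, distance to city
# center, distance to market, distance to public roads, distance to lahar.
w = np.array([5, 5, 2, 2, 4, 5], dtype=float)
w /= w.sum()

# Hypothetical raw scores for three alternatives (rows); values are invented.
# benefit[i] is True if larger raw values are better for criterion i.
raw = np.array([[250e6, 70, 4.0, 1.0, 0.2, 6.0],
                [300e6, 90, 2.5, 0.5, 0.1, 3.0],
                [200e6, 45, 7.0, 2.0, 0.5, 8.0]])
benefit = np.array([False, True, False, False, False, True])

lo, hi = raw.min(0), raw.max(0)
u = (raw - lo) / (hi - lo)                 # min-max utilities in [0, 1]
u[:, ~benefit] = 1.0 - u[:, ~benefit]      # flip cost criteria (lower is better)

scores = u @ w
print("best alternative:", scores.argmax(), "score:", round(scores.max(), 2))
```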


2019 ◽  
Vol 4 (1) ◽  
pp. 64-67
Author(s):  
Pavel Kim

One of the fundamental tasks of cluster analysis is the partitioning of multidimensional data samples into groups (clusters) of objects that are close in the sense of some given measure of similarity. In some problems the number of clusters is set a priori, but more often it must be determined in the course of the clustering itself. With a large number of clusters, especially if the data are "noisy," the task becomes difficult for experts to analyze, so the number of clusters under consideration is artificially reduced. Formal means of merging "neighboring" clusters are considered, creating the basis for parameterizing the number of significant clusters in the "natural" clustering model [1].
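The paper's formal merging rule is cited rather than stated; as a plain illustration of the general idea, the sketch below merges any two clusters whose centroids lie closer than a threshold, using a union-find structure. The distance measure and the threshold are assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

centroids = np.array([[0.0, 0.0], [0.4, 0.1], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]])
threshold = 1.0                                  # assumed similarity cutoff

parent = list(range(len(centroids)))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]            # path halving
        i = parent[i]
    return i

d = squareform(pdist(centroids))
for i in range(len(centroids)):
    for j in range(i + 1, len(centroids)):
        if d[i, j] < threshold:                  # "neighboring" clusters merge
            parent[find(i)] = find(j)

merged = {find(i) for i in range(len(centroids))}
print("clusters after merging:", len(merged))    # the 5 centroids collapse to 3
```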


Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing the objects into groups (clusters), such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well-established. A recent research focus in cluster analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, people have identified some data characteristics that may strongly affect cluster analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is expected to reveal whether and how data distributions can affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions:
1. What are the systematic differences between the distributions of the clusters produced by different clustering algorithms?
2. How does the distribution of the "true" cluster sizes affect the performance of clustering algorithms?
3. How should one choose an appropriate clustering algorithm in practice?
The answers to these questions can guide the better understanding and use of clustering methods. This is noteworthy, since 1) in theory, people have seldom realized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm remains a challenging task, especially after the algorithm boom in the data mining area. This chapter tries to begin filling this void. To this end, we carefully select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. In the chapter, we first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. Then we demonstrate that UPGMA, one of the robust AHC methods, acts in the opposite way to K-means; that is, UPGMA tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations of the resultant cluster sizes produced by K-means and UPGMA, measured by the Coefficient of Variation (CV), fall in specific intervals, namely [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put K-means and UPGMA together for a further comparison and propose some rules for a better choice of clustering scheme from the data-distribution point of view.
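The Coefficient of Variation used in the chapter is simply the standard deviation of the cluster sizes divided by their mean; the sketch below computes it for K-means and for average-linkage AHC (scikit-learn's closest counterpart to UPGMA) on toy data. The dataset and k are assumptions, and the [0.3, 1.0] and [1.0, 2.5] intervals are the chapter's reported findings, not something this toy run guarantees.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=600, centers=5,
                  cluster_std=[0.5, 0.5, 1.5, 2.0, 3.0], random_state=5)

def cv_of_cluster_sizes(labels):
    sizes = np.bincount(labels)
    return sizes.std() / sizes.mean()            # CV = std / mean

km = KMeans(n_clusters=5, n_init=10, random_state=5).fit_predict(X)
ahc = AgglomerativeClustering(n_clusters=5, linkage="average").fit_predict(X)

print("K-means CV:", round(cv_of_cluster_sizes(km), 2))
print("average-linkage AHC CV:", round(cv_of_cluster_sizes(ahc), 2))
```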


Author(s):  
Deepali Virmani ◽  
Nikita Jain ◽  
Ketan Parikh ◽  
Shefali Upadhyaya ◽  
Abhishek Srivastav

This article considers how data become meaningful when they can be organized, linked with other data, and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids, and normalized k-means, so the focus remains on the efficiency and accuracy of the algorithms, as well as on the time clustering takes and on reducing overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions data into K clusters with randomly chosen initial centroids; its reliance on numeric values prohibits it from being used to cluster real-world data containing categorical values, and poor selection of the initial centroids can result in poor clustering. This article presents a variant of k-means that selects the initial centres deliberately and normalizes the data, resulting in better clustering, reduced overlap, and less time required for clustering.
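The article's exact centre-selection rule is not given in the abstract; the sketch below shows the two levers it names, normalization and deliberate initial centres, using min-max scaling and farthest-point seeding as stand-ins. Both choices are assumptions, not the authors' method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler

X, _ = make_blobs(n_samples=300, centers=4, random_state=6)
Xn = MinMaxScaler().fit_transform(X)             # normalize features to [0, 1]

def farthest_point_centers(X, k):
    """Pick one point, then repeatedly the point farthest from all chosen centers."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

init = farthest_point_centers(Xn, k=4)
labels = KMeans(n_clusters=4, init=init, n_init=1).fit_predict(Xn)
print("cluster sizes:", np.bincount(labels))
```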


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Ze Dong ◽  
Hao Jia ◽  
Miao Liu

This paper presents a fuzzy clustering method based on a multiobjective genetic algorithm. The ADNSGA2-FCM algorithm was developed to solve the clustering problem by combining the fuzzy c-means algorithm (FCM) with the multiobjective genetic algorithm NSGA-II and introducing an adaptive mechanism. The algorithm does not need the number of clusters to be given in advance. After the number of initial clusters and the center coordinates are given randomly, the optimal solution set is found by the multiobjective evolutionary algorithm. After the optimal number of clusters is determined by a majority-vote method, the J_m value is further optimized by combining a canonical genetic algorithm with FCM, and the best clustering result is finally obtained. Verification on standard UCI datasets and comparison with existing single-objective and multiobjective clustering algorithms demonstrate the effectiveness of the method.
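The full ADNSGA2-FCM pipeline is beyond a short sketch, but the J_m objective the paper optimizes is the standard fuzzy c-means criterion J_m = sum_i sum_j u_ij^m ||x_i - v_j||^2; a minimal alternating-update FCM illustrating it is below. The fuzzifier m = 2, the fixed cluster count c, and the toy data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

def fcm(X, c, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means minimizing J_m = sum_ij u_ij^m ||x_i - v_j||^2."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # fuzzy memberships sum to 1
    for _ in range(iters):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted cluster centers
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        inv = 1.0 / d ** (2.0 / (m - 1.0))       # standard membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    jm = (U ** m * d ** 2).sum()
    return V, U, jm

V, U, jm = fcm(X, c=3)
print("J_m after the iterations:", round(jm, 2))
```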

