A Novel Locality Sensitive K-Means Clustering Algorithm based on Core Clusters

2013 ◽  
Vol 321-324 ◽  
pp. 1939-1942
Author(s):  
Lei Gu

The locality sensitive k-means clustering method has been presented recently. Although this approach can improve clustering accuracy, it often yields unstable clustering results because random samples are used as the initial centers. In this paper, an initialization method based on core clusters is applied to locality sensitive k-means clustering. The core clusters are formed by constructing the σ-neighborhood graph, and their centers are taken as the initial centers of the locality sensitive k-means clustering. To investigate the effectiveness of our approach, several experiments are conducted on three datasets. The experimental results show that the proposed method improves clustering performance compared with the previous locality sensitive k-means clustering.
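
A minimal Python sketch of the initialization step described above (not the paper's exact procedure): points within σ of each other are linked into a σ-neighborhood graph, its connected components act as core clusters, and their centroids seed k-means. The function name core_cluster_init and the parameters sigma and k are illustrative assumptions.

    import numpy as np
    from scipy.sparse.csgraph import connected_components
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    def core_cluster_init(X, sigma, k):
        # Adjacency of the sigma-neighborhood graph: points closer than sigma are linked.
        D = cdist(X, X)
        adjacency = (D <= sigma).astype(int)
        n_comp, labels = connected_components(adjacency, directed=False)
        # Centroid of each component; keep the k largest components as initial centers
        # (sigma should be small enough that at least k components exist).
        sizes = np.bincount(labels)
        largest = np.argsort(sizes)[::-1][:k]
        return np.array([X[labels == c].mean(axis=0) for c in largest])

    # Usage sketch:
    # X = np.random.rand(200, 2)
    # init = core_cluster_init(X, sigma=0.1, k=3)
    # labels = KMeans(n_clusters=3, init=init, n_init=1).fit_predict(X)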

2011 ◽  
Vol 121-126 ◽  
pp. 4675-4679
Author(s):  
Ming Wei Leng ◽  
Xiao Yun Chen ◽  
Jian Jun Cheng ◽  
Long Jie Li

In many data mining domains, labeled data is very expensive to generate, so making the best use of a small amount of labeled data to guide the clustering of unlabeled data is the core problem of semi-supervised clustering. Most semi-supervised clustering algorithms require a certain amount of labeled data and the setting of several parameters, and different parameter values may lead to different results. In view of this, a new algorithm, called semi-supervised clustering based on a small amount of labeled data, is presented. It expands the labeled dataset by labeling the k-nearest neighbors of the labeled samples and requires only one parameter. We evaluate our clustering algorithm on three UCI datasets and compare it with SSDBSCAN [4] and KNN; the experimental results confirm that the accuracy of our clustering algorithm is close to that of the KNN classification algorithm.
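
A hedged sketch of the label-expansion idea described above (not the authors' code): each labeled sample passes its label to its k nearest unlabeled neighbors, growing the labeled set before clustering. The helper expand_labels and the -1 convention for unlabeled samples are assumptions for illustration.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def expand_labels(X, y, k):
        """y holds class ids for labeled samples and -1 for unlabeled ones."""
        labeled = np.where(y != -1)[0]
        unlabeled = np.where(y == -1)[0]
        nn = NearestNeighbors(n_neighbors=min(k, len(unlabeled))).fit(X[unlabeled])
        y_new = y.copy()
        for i in labeled:
            _, idx = nn.kneighbors(X[i:i + 1])
            y_new[unlabeled[idx[0]]] = y[i]   # neighbors inherit the seed's label
        return y_new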


Author(s):  
Zhang Xiaodan ◽  
Hu Xiaohua ◽  
Xia Jiali ◽  
Zhou Xiaohua ◽  
Achananuparp Palakorn

In this article, we present a graph-based knowledge representation for biomedical digital library literature clustering. An efficient clustering method is developed to identify the ontology-enriched k-highest density term subgraphs that capture the core semantic relationship information about each document cluster. The distance between each document and the k term graph clusters is calculated. A document is then assigned to the closest term cluster. The extensive experimental results on two PubMed document sets (Disease10 and OHSUMED23) show that our approach is comparable to spherical k-means. The contributions of our approach are the following: (1) we provide two corpus-level graph representations to improve document clustering, a term co-occurrence graph and an abstract-title graph; (2) we develop an efficient and effective document clustering algorithm by identifying k distinguishable class-specific core term subgraphs using terms’ global and local importance information; and (3) the identified term clusters give a meaningful explanation for the document clustering results.
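
An illustrative sketch of the assignment step only, assuming the k term clusters (the highest-density subgraphs) are already given: a document is assigned to the term cluster whose terms it overlaps most, measured here as the cosine between the document's term-weight vector and a binary indicator vector of each term cluster. The function name assign_documents is an assumption, not the paper's implementation.

    import numpy as np

    def assign_documents(doc_term_matrix, term_clusters, n_terms):
        # term_clusters: list of k lists of term indices
        indicators = np.zeros((len(term_clusters), n_terms))
        for c, terms in enumerate(term_clusters):
            indicators[c, terms] = 1.0
        sims = doc_term_matrix @ indicators.T          # overlap weight per cluster
        norms = (np.linalg.norm(doc_term_matrix, axis=1, keepdims=True)
                 * np.linalg.norm(indicators, axis=1))
        return np.argmax(sims / np.maximum(norms, 1e-12), axis=1)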


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Yang Lei ◽  
Dai Yu ◽  
Zhang Bin ◽  
Yang Yang

Clustering algorithms, as a basis of data analysis, are widely used in analysis systems. However, with high-dimensional data, a clustering algorithm may overlook the business relations between dimensions, especially in the medical field. As a result, the clustering result often does not meet the business goals of the users. If the clustering process can incorporate the users' knowledge, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfactory. In this paper, we propose an interactive K-means clustering method to improve the users' satisfaction with the result. The core of this method is to collect the users' feedback on the clustering result and use it to optimize that result. A particle swarm optimization algorithm is then used to optimize the parameters, especially the weight settings of the clustering algorithm, so that it reflects the users' business preferences as closely as possible. After this parameter optimization and adjustment, the clustering result can be closer to the users' requirements. Finally, we use a breast cancer dataset as an example to validate our method. The experiments show that our algorithm performs better.
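
A minimal sketch of the general idea, not the paper's algorithm: feature weights scale the data before K-means, and a small particle swarm searches for the weights that best agree with user feedback, represented here as must-link / cannot-link pairs. All names, the feedback encoding, and the PSO constants are illustrative assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    def feedback_fitness(X, weights, k, must_link, cannot_link):
        # Fraction of user feedback pairs that the weighted clustering satisfies.
        labels = KMeans(n_clusters=k, n_init=5).fit_predict(X * weights)
        ok = sum(labels[a] == labels[b] for a, b in must_link)
        ok += sum(labels[a] != labels[b] for a, b in cannot_link)
        return ok / (len(must_link) + len(cannot_link))

    def pso_weights(X, k, must_link, cannot_link, n_particles=10, iters=20):
        d = X.shape[1]
        pos = np.random.rand(n_particles, d)          # candidate weight vectors
        vel = np.zeros_like(pos)
        pbest = pos.copy()
        pbest_fit = np.array([feedback_fitness(X, p, k, must_link, cannot_link) for p in pos])
        gbest = pbest[np.argmax(pbest_fit)].copy()
        for _ in range(iters):
            r1, r2 = np.random.rand(2)
            vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, 0.0, 1.0)
            fit = np.array([feedback_fitness(X, p, k, must_link, cannot_link) for p in pos])
            better = fit > pbest_fit
            pbest[better], pbest_fit[better] = pos[better], fit[better]
            gbest = pbest[np.argmax(pbest_fit)].copy()
        return gbest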


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Bei Zhang ◽  
Luquan Wang ◽  
Yuanyuan Li

In user cluster analysis, users with the same or similar behavior characteristics are grouped together by iterative clustering, and the core and larger user groups are detected. In this paper, we present the formulation and mining of correlation rules based on the clustering algorithm through the definition and procedure of the algorithm. In addition, based on the idea of the K-modes clustering algorithm, this paper proposes a clustering method that combines correlation rules with multivalued discrete features (MDF). We construct a method for calculating the similarity between users using the Jaccard distance and combine correlation rules with the Jaccard distance to improve the similarity measure between users. Next, we propose a clustering method suitable for MDF. Finally, the basic K-modes algorithm is improved with the similarity measure combining correlation rules and the Jaccard distance and with a new cluster-center update method, yielding the ARMDKM algorithm proposed in this paper. This method solves the problem that MDF cannot be effectively processed in the traditional model, and its theoretical correctness is demonstrated. Experiments verify the correctness of the new method using clustering purity, entropy, silhouette, and other indicators.
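
A sketch of the Jaccard-based building block mentioned above: each user attribute is a set of values (a multivalued discrete feature), and the distance between two users averages the Jaccard distance over attributes. The function names are illustrative; the paper additionally folds correlation rules into this measure, which is not shown here.

    def jaccard_distance(a, b):
        a, b = set(a), set(b)
        if not a and not b:
            return 0.0
        return 1.0 - len(a & b) / len(a | b)

    def user_distance(user1, user2):
        """user1, user2: lists of value sets, one set per multivalued attribute."""
        dists = [jaccard_distance(f1, f2) for f1, f2 in zip(user1, user2)]
        return sum(dists) / len(dists)

    # user_distance([{"rock", "pop"}, {"en"}], [{"pop"}, {"en", "fr"}])  # -> 0.5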


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important part of meeting nutritional needs, consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient for national milk needs. Data mining is a field of computer science that is widely used in research; one of its techniques is clustering, a method of grouping data that performs better when more data are available. The data used are provincial data for Indonesia from 2000 to 2017 obtained from the Central Statistics Agency. The results of this study are clusters of two milk-producing groups, namely high-production and low-production regions. From the 27 fresh milk production records for Indonesia, two provinces fall into the high-production cluster, namely West Java and East Java, while the remaining 25, together with 7 provinces not included in the K-Means calculation, fall into the low-production cluster.
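
A minimal sketch of the grouping described above, assuming a simple table of annual fresh-milk production per province; the figures below are placeholders, not Central Statistics Agency data. K-means with k=2 separates provinces into high- and low-production clusters.

    import numpy as np
    from sklearn.cluster import KMeans

    # Illustrative placeholder values (tonnes), not real statistics.
    production = np.array([[255000.0], [120000.0], [3100.0], [1500.0], [800.0]])
    provinces = ["West Java", "East Java", "Province C", "Province D", "Province E"]

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(production)
    for name, lab in zip(provinces, labels):
        print(name, "-> cluster", lab)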


Author(s):  
Norman Schofield

A key concept of social choice is the idea of the Condorcet point, or core. For example, consider a voting game with four participants in which any three can win. If voters have Euclidean preferences, then the point at the center will be unbeaten. Earlier spatial models of social choice focused on deterministic voter choice. However, it is clear that voter choice is intrinsically stochastic. This chapter employs a stochastic model based on multinomial logit to examine whether parties in electoral competition tend to converge toward the electoral center or respond to activist pressure to adopt more polarized policies. The chapter discusses experimental results on the idea of the core and explores empirical analyses of elections in Israel and the United States.


Author(s):  
Ana Belén Ramos-Guajardo

A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the degree of similarity between the expected values of the random intervals, which can be analyzed by means of a two-sample similarity bootstrap test. The expectations of each pair of random intervals are compared through that test, and a p-value matrix is obtained. The suggested clustering algorithm operates on this matrix, where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to clusters that are statistically homogeneous within and different from each other. Simulations show the empirical performance of the proposal, and the approach is applied to two real-life situations.
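
An illustrative sketch of the p-value-driven grouping only. Here pvalue_test stands in for the paper's two-sample similarity bootstrap test, whose internals are not reproduced; clusters are merged while the largest between-cluster p-value exceeds a significance level, so the loop stops once all remaining clusters differ significantly. Names and the linkage choice are assumptions.

    def cluster_by_pvalues(items, pvalue_test, alpha=0.05):
        clusters = [[i] for i in range(len(items))]
        while len(clusters) > 1:
            # p-value between two clusters taken as the smallest pairwise p-value
            best, best_p = None, alpha
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    p = min(pvalue_test(items[i], items[j])
                            for i in clusters[a] for j in clusters[b])
                    if p > best_p:
                        best, best_p = (a, b), p
            if best is None:          # all clusters pairwise significantly different
                break
            a, b = best
            clusters[a] += clusters[b]
            del clusters[b]
        return clusters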


Author(s):  
Poonam Rani ◽  
MPS Bhatia ◽  
Devendra K Tayal

The paper presents an intelligent approach for comparing social networks through a cone model using the fuzzy k-medoids clustering method. It makes use of a geometrical three-dimensional conical model that represents the users' experience views, and it uses both the static and the dynamic parameters of social networks. We propose an algorithm that investigates which social network is more fruitful. For the experimental results, the proposed work is applied to data collected from students at different universities through Google Forms, where students rate their experience of using different social networks on different scales.
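
A generic fuzzy k-medoids sketch for reference (it does not include the paper's cone-model specifics): memberships follow the usual fuzzy update, and each medoid is the object minimizing the membership-weighted distance to its cluster. The argument dist is any symmetric pairwise distance matrix over the objects being compared; the fuzzifier m and the function name are assumptions.

    import numpy as np

    def fuzzy_k_medoids(dist, k, m=2.0, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        n = dist.shape[0]
        medoids = rng.choice(n, size=k, replace=False)
        for _ in range(iters):
            d = np.maximum(dist[:, medoids], 1e-12)           # n x k distances to medoids
            # u[i, j] = 1 / sum_c (d[i, j] / d[i, c]) ** (2 / (m - 1))
            u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
            # New medoid of cluster j: object minimizing sum_i u[i, j]**m * dist(i, candidate).
            new_medoids = np.array([np.argmin(dist.T @ (u[:, j] ** m)) for j in range(k)])
            if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
                break
            medoids = new_medoids
        return medoids, u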


2021 ◽  
Vol 25 (6) ◽  
pp. 1453-1471
Author(s):  
Chunhua Tang ◽  
Han Wang ◽  
Zhiwen Wang ◽  
Xiangkun Zeng ◽  
Huaran Yan ◽  
...  

Most density-based clustering algorithms suffer from difficult parameter setting, high time complexity, poor noise recognition, and weak clustering of datasets with uneven density. To solve these problems, this paper proposes the FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) in the augmented cluster ordering generated by OPTICS and uses the reachability-distance of the DP as the neighborhood radius eps of its corresponding cluster, overcoming the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance to the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity and outperforms other algorithms in parameter setting and noise recognition.
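
This is not FOP-OPTICS itself, only a sketch of the mechanism it builds on: run OPTICS, then cut the reachability ordering at a chosen eps, which here is a fixed placeholder standing in for the reachability-distance of the demarcation point that the paper detects automatically.

    import numpy as np
    from sklearn.cluster import OPTICS, cluster_optics_dbscan

    X = np.random.rand(300, 2)
    opt = OPTICS(min_samples=5).fit(X)

    eps_at_dp = 0.08   # placeholder for the demarcation point's reachability-distance
    labels = cluster_optics_dbscan(
        reachability=opt.reachability_,
        core_distances=opt.core_distances_,
        ordering=opt.ordering_,
        eps=eps_at_dp,
    )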


Author(s):  
Weksi Budiaji

The silhouette index is a well-known internal validation criterion for clustering results. While it is a medoid-based validation index, a centroid-based validation index called the centroid-based shadow value (CSV) has also been developed. Although the two are similar, the CSV has an additional unique property: it admits a 2-dimensional neighborhood-graph visualization. A new internal validation index is proposed in this article to provide a medoid-based validation index that can also visualize the results in a 2-dimensional plot. The proposed index behaves similarly to the silhouette index and produces a network visualization comparable to the neighborhood graph of the CSV. The network visualization has a multiplicative parameter (c) to adjust the visibility of its edges. In addition, because it is medoid-based, it is a more appropriate visualization technique for any type of data than the neighborhood graph of the CSV.
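
The proposed index is described as behaving like the silhouette; as a point of reference, the per-object silhouette value s(i) = (b(i) - a(i)) / max(a(i), b(i)) can be computed directly with scikit-learn. This is only the baseline the new index is compared against, not the proposed index itself; the data and cluster count below are illustrative.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_samples, silhouette_score

    X = np.random.rand(150, 4)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    print("mean silhouette:", silhouette_score(X, labels))
    print("per-object values:", silhouette_samples(X, labels)[:5])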

