Localized Ambient Solidity Separation Algorithm Based Computer User Segmentation

2015 ◽  
Vol 2015 ◽  
pp. 1-16
Author(s):  
Xiao Sun ◽  
Tongda Zhang ◽  
Yueting Chai ◽  
Yi Liu

Most popular clustering methods make strong assumptions about the dataset. For example, k-means implicitly assumes that all clusters come from spherical Gaussian distributions with different means but the same covariance. However, when dealing with datasets that have diverse distribution shapes or high dimensionality, these assumptions may no longer hold. To overcome this weakness, we propose a new clustering algorithm named the localized ambient solidity separation (LASS) algorithm, which uses a new isolation criterion called centroid distance. Compared with other density-based isolation criteria, the proposed centroid distance isolation criterion addresses the problems caused by high dimensionality and varying density. An experiment on a designed two-dimensional benchmark dataset shows that the LASS algorithm not only inherits the ability of the original dissimilarity-increments clustering method to separate naturally isolated clusters but can also identify clusters that are adjacent, overlapping, or embedded in background noise. Finally, we compared the LASS algorithm with the dissimilarity-increments clustering method on a massive computer-user dataset of over two million records containing demographic and behavioral information. The results show that the LASS algorithm works extremely well on this dataset and extracts more knowledge from it.
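To make the stated k-means limitation concrete, here is a small illustrative sketch (assuming scikit-learn is available; the dataset and parameters are not from the paper, and this does not implement LASS itself) showing k-means mis-splitting two adjacent, non-spherical clusters of the kind LASS is designed to handle:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Two adjacent, non-spherical clusters violate k-means' implicit
# spherical-Gaussian assumption, so the partition it finds is poor.
X, y = make_moons(n_samples=500, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(y, labels))  # noticeably below 1.0 for this shape
```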

Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 158
Author(s):  
Tran Dinh Khang ◽  
Nguyen Duc Vuong ◽  
Manh-Kien Tran ◽  
Michael Fowler

Clustering is an unsupervised machine learning technique with many practical applications that has gathered extensive research interest. Aside from deterministic or probabilistic techniques, fuzzy C-means (FCM) clustering is also common. Since the advent of the FCM method, many improvements have been made to increase clustering efficiency; these improvements focus on adjusting the membership representation of elements in the clusters, on fuzzification and defuzzification techniques, or on the distance function between elements. This study proposes a novel fuzzy clustering algorithm that uses multiple different fuzzification coefficients depending on the characteristics of each data sample. The proposed method follows calculation steps similar to FCM, with some modifications, and the update formulas are derived to ensure convergence. The main contribution of this approach is the use of multiple fuzzification coefficients, as opposed to the single coefficient in the original FCM algorithm. The new algorithm is evaluated with experiments on several common datasets, and the results show that it is more efficient than the original FCM as well as other clustering methods.
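The abstract does not give the modified update formulas, but a minimal sketch of the idea, assuming the per-sample fuzzification coefficient m[i] (each greater than 1) simply replaces the single global coefficient in the standard FCM updates, could look like the following. The paper derives its own formulas with a convergence guarantee, which may differ.

```python
import numpy as np

def fcm_multi_m(X, c, m, n_iter=100, eps=1e-9, seed=0):
    """FCM-style clustering where sample i uses its own fuzzifier m[i] > 1
    instead of one global coefficient (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)             # per-sample fuzzification coefficients
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # random initial fuzzy partition
    for _ in range(n_iter):
        Um = U ** m[:, None]                   # memberships raised to each sample's m[i]
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]             # cluster centers
        D = np.linalg.norm(X[:, None, :] - V[None], axis=-1) + eps
        ratio = D[:, :, None] / D[:, None, :]  # d_ik / d_ij for all clusters j
        U = 1.0 / (ratio ** (2.0 / (m - 1.0))[:, None, None]).sum(axis=-1)
    return U, V
```

In practice m could be set from a per-sample characteristic (for example, local density), which is the kind of data-dependent choice the paper motivates.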


Author(s):  
Yasunori Endo ◽  
Tomoyuki Suzuki ◽  
Naohiko Kinoshita ◽  
Yukihiro Hamasuna ◽  
...  

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method. It is very useful because the belongingness, or membership degree, of each datum to each cluster can be calculated directly from the dissimilarities between data points, and cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to data with uncertainty as "uncertain data," e.g., incomplete data or data that contain errors. Previously, a method based on the concept of a tolerance vector was proposed for handling uncertain data, and several clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data within an optimization framework. In the present study, we apply this concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm through comparisons with conventional methods for incomplete data sets in several numerical examples.


2010 ◽  
Vol 439-440 ◽  
pp. 1306-1311
Author(s):  
Fang Li ◽  
Qun Xiong Zhu

An LSI-based hierarchical agglomerative clustering algorithm is studied. To address the problems of the LSI-based hierarchical agglomerative clustering method, an NMF-based hierarchical clustering method is proposed and analyzed, and two ways of implementing the NMF-based method are introduced. Finally, the results of two groups of experiments on the TanCorp document corpus show that the proposed method is effective.
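As a rough sketch of one way such a pipeline might be assembled (assuming scikit-learn; the documents below are placeholders rather than the TanCorp corpus, and the component and cluster counts are arbitrary):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.cluster import AgglomerativeClustering

docs = ["placeholder document one", "placeholder document two",
        "another placeholder text", "yet another placeholder text"]

tfidf = TfidfVectorizer().fit_transform(docs)                  # non-negative term weights
W = NMF(n_components=2, init="nndsvd", random_state=0).fit_transform(tfidf)
labels = AgglomerativeClustering(n_clusters=2).fit_predict(W)  # hierarchical step on NMF factors
print(labels)
```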


2013 ◽  
Vol 448-453 ◽  
pp. 1955-1958
Author(s):  
Hui Wang ◽  
Xiu Wei Li ◽  
Yu Xin Yun ◽  
Hai Yan Yuan

Four typical defects in GIS for PD detection are considered, and the phase, amplitude, and number of PD pulses are used to form the three-dimensional PQN matrix. From the PQN matrix, three two-dimensional distributions, Hqmax~Phi, Hqmean~Phi, and Hn~Phi, can be derived. The GK clustering method is then introduced to separate the four different partial discharge defects in GIS according to the parameters of skewness (Sk), kurtosis (Ku), number of peaks (Pe), cross-correlation factor (CC), and the discharge factor Q.
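A hedged sketch of how the statistical parameters might be extracted from one sampled phase-resolved distribution (e.g. Hqmean~Phi over a full 0-360 degree cycle); the function name and exact definitions are assumptions, and the GK (Gustafson-Kessel) clustering step itself is not shown:

```python
import numpy as np
from scipy.stats import skew, kurtosis
from scipy.signal import find_peaks

def pd_fingerprint(H):
    """Statistical operators for one phase-resolved PD distribution H(phi):
    skewness (Sk), kurtosis (Ku), number of peaks (Pe), and the
    cross-correlation factor (CC) between positive and negative half-cycles."""
    half = len(H) // 2
    pos, neg = H[:half], H[half:2 * half]   # equal-length half-cycle windows
    return {
        "Sk": skew(H),
        "Ku": kurtosis(H),
        "Pe": len(find_peaks(H)[0]),
        "CC": np.corrcoef(pos, neg)[0, 1],
    }
```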


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yaping Li

The main objective of this paper is to present a new clustering algorithm for metadata trees based on the K-prototypes algorithm, the GSO (glowworm swarm optimization) algorithm, and maximal frequent path (MFP). Metadata tree clustering involves computing the feature vector of each metadata tree and then clustering the feature vectors; therefore, traditional data clustering methods are not directly suitable for metadata trees. As the main method for calculating feature vectors, the MFP method also faces the difficulties of high computational complexity and loss of key information. The K-prototypes algorithm is generally suitable for clustering mixed-attribute data such as these feature vectors, but it is sensitive to the initial clustering centers. Compared with other swarm intelligence algorithms, the GSO algorithm has a more efficient global search, which makes it suitable for solving multimodal problems and useful for optimizing the K-prototypes algorithm. To address the clustering of metadata tree structures with respect to clustering accuracy and high data dimensionality, this paper combines the GSO algorithm, the K-prototypes algorithm, and MFP to design a new metadata structure clustering method. Firstly, MFP is used to describe metadata tree features, and a key parameter for categorical data is introduced into the MFP feature vector to improve how accurately the feature vector describes the metadata tree; secondly, GSO is combined with K-prototypes to design GSOKP for clustering feature vectors that contain both numeric and categorical data, so as to improve clustering accuracy; finally, tests are conducted on a set of metadata trees. The experimental results show that the designed metadata tree clustering method GSOKP-FP has advantages in both clustering accuracy and time complexity.
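The GSO initialization and MFP feature extraction are beyond the scope of a short example, but a minimal numpy sketch of the underlying K-prototypes step (with random initialization substituted for the paper's GSO-based optimization, and gamma as the usual categorical weight) might look like this:

```python
import numpy as np

def k_prototypes(X_num, X_cat, k, gamma=1.0, n_iter=20, seed=0):
    """Minimal K-prototypes: squared Euclidean distance on numeric columns plus
    a gamma-weighted mismatch count on categorical columns. Random
    initialization stands in for the paper's GSO-based center selection."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_num), k, replace=False)
    cent_num, cent_cat = X_num[idx].astype(float), X_cat[idx].copy()
    for _ in range(n_iter):
        d_num = ((X_num[:, None, :] - cent_num[None]) ** 2).sum(-1)
        d_cat = (X_cat[:, None, :] != cent_cat[None]).sum(-1)
        labels = (d_num + gamma * d_cat).argmin(axis=1)
        for c in range(k):
            members = labels == c
            if members.any():
                cent_num[c] = X_num[members].mean(axis=0)
                for j in range(X_cat.shape[1]):        # mode per categorical column
                    vals, counts = np.unique(X_cat[members, j], return_counts=True)
                    cent_cat[c, j] = vals[counts.argmax()]
    return labels, cent_num, cent_cat
```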


2014 ◽  
Vol 24 (1) ◽  
pp. 151-163 ◽  
Author(s):  
Kristian Sabo

In this paper, we consider the l1-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function need not be convex or differentiable, and it may generally have many local or global minima, so the task becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved, and the corresponding algorithm is given. The method is illustrated by, and compared with, some other clustering methods, especially the l2-clustering method, also known in the literature as the smooth k-means method, in a few typical situations such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show that in these cases the proposed l1-clustering algorithm is faster and gives significantly better results than the l2-clustering algorithm.
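The paper's own locally optimal search is not reproduced here; the sketch below only shows the standard alternating view of the l1 objective (k-medians: l1 distances for assignment, coordinate-wise medians for the center update) as a point of reference:

```python
import numpy as np

def l1_clustering(X, k, n_iter=50, seed=0):
    """k-medians sketch of the l1-clustering objective: assign by l1 distance,
    update each center as the coordinate-wise median of its members."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.abs(X[:, None, :] - centers[None]).sum(-1).argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = np.median(X[labels == c], axis=0)
    return labels, centers
```

The coordinate-wise median is what makes the l1 objective robust to outliers, which is consistent with the outlier scenario highlighted in the abstract.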


Author(s):  
Weijian Zheng ◽  
Dali Wang ◽  
Fengguang Song

This paper introduces a new parallel clustering algorithm, named the Feel-the-Way clustering algorithm, that provides a better or equivalent convergence rate compared to traditional clustering methods by optimizing the synchronization and communication costs. Our algorithm design centers on optimizing three factors simultaneously: reduced synchronization, an improved convergence rate, and the same or comparable optimization cost. To compare optimization cost, we use the Sum of Square Error (SSE) as the metric, which is the sum of the squared distances between each data point and its assigned cluster. Compared with the traditional MPI k-means algorithm, the new Feel-the-Way algorithm requires less communication among participating processes. As for the convergence rate, the new algorithm requires fewer iterations to converge. As for the optimization cost, it obtains SSE costs that are close to those of the k-means algorithm. In the paper, we first design the full-step Feel-the-Way k-means clustering algorithm, which can significantly reduce the number of iterations required by the original k-means clustering method. Next, we improve the performance of the full-step algorithm by adopting an optimized sampling-based approach, named reassignment-history-aware sampling. Our experimental results show that the optimized sampling-based Feel-the-Way method is significantly faster than the widely used k-means clustering method and provides comparable optimization costs. More extensive experiments with several synthetic and real-world datasets (e.g., MNIST, CIFAR-10, ENRON, and PLACES-2) show that the new parallel algorithm can outperform the open-source MPI k-means library by up to 110% on a high-performance computing system using 4,096 CPU cores. In addition, the new algorithm can take up to 51% fewer iterations to converge than the k-means clustering algorithm.
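For reference, the SSE metric used for the comparison can be written compactly (a sketch with illustrative variable names; not taken from the paper's code):

```python
import numpy as np

def sse(X, centers, labels):
    """Sum of Square Error: total squared distance from each data point
    to the center of its assigned cluster."""
    return float(((X - centers[labels]) ** 2).sum())
```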


Author(s):  
Meng Yuan ◽  
Justin Zobel ◽  
Pauline Lin

Clustering of the contents of a document corpus is used to create sub-corpora, with the intention that each consists of documents that are related to each other. However, while clustering is used in a variety of ways in document applications such as information retrieval, and a range of methods have been applied to the task, there has been relatively little exploration of how well it works in practice. Indeed, given the high dimensionality of the data, it is possible that clustering may not always produce meaningful outcomes. In this paper we use a well-known clustering method to explore a variety of techniques, existing and novel, for measuring clustering effectiveness. Results with our new extrinsic techniques, based on relevance judgements or retrieved documents, demonstrate that retrieval-based information can be used to assess the quality of clustering, and also show that clustering can succeed to some extent at gathering together similar material. Further, they show that intrinsic clustering techniques that have been shown to be informative in other domains do not work for information retrieval. Whether clustering is sufficiently effective to have a significant impact on practical retrieval remains unclear, but as the results show, our measurement techniques can effectively distinguish between clustering methods.
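The abstract does not spell out the measures, but one hypothetical extrinsic signal in the same spirit, using relevance judgements, is how concentrated each query's relevant documents are within a single cluster (the function and the measure itself are assumptions, not the paper's definitions):

```python
from collections import Counter

def relevance_concentration(qrels, doc_to_cluster):
    """For each query, the fraction of its judged-relevant documents that land
    in that query's most common cluster, averaged over queries. Higher values
    suggest the clustering gathers related material together."""
    scores = []
    for query, relevant_docs in qrels.items():
        clusters = [doc_to_cluster[d] for d in relevant_docs if d in doc_to_cluster]
        if clusters:
            scores.append(Counter(clusters).most_common(1)[0][1] / len(clusters))
    return sum(scores) / len(scores) if scores else 0.0
```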


Author(s):  
Muchamad Kurniawan ◽  
Rani Rotul Muhima ◽  
Siti Agustini

One factor that worsens forest fires is the lack of speed in handling them when they occur. This can be anticipated by determining how many extinguishing units should be placed at the center of each hotspot area. To obtain hotspots, NASA provides an active-fire dataset. Clustering is used to find the most suitable centroid points. The clustering methods we use are K-Means, Fuzzy C-Means (FCM), and Average Linkage. K-Means was chosen because it is a simple method that has been applied in many areas; FCM is a partition-based clustering algorithm that extends K-Means; and hierarchical clustering is represented by the Average Linkage method. The measurement used is the sum of the internal distances within each cluster, and the elbow method is used to determine the optimal number of clusters. The K-Means trials gave the best results, with a total distance of 145.35 km, and the best clustering from this method used 4 clusters. Meanwhile, the total distances obtained from the FCM and Average Linkage methods were 154.13 km and 266.61 km, respectively.
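A rough scikit-learn sketch of this comparison (random placeholder coordinates stand in for the NASA active-fire data; FCM is omitted because it is not part of scikit-learn):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Placeholder latitude/longitude hotspots; the paper uses NASA's active-fire data.
rng = np.random.default_rng(0)
hotspots = rng.uniform(low=[-8.0, 110.0], high=[-6.0, 114.0], size=(200, 2))

# Elbow evaluation: within-cluster sum of squared distances for k = 2..8.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(hotspots).inertia_
            for k in range(2, 9)}

k = 4  # the paper reports 4 clusters as best for K-Means
km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(hotspots)
al_labels = AgglomerativeClustering(n_clusters=k, linkage="average").fit_predict(hotspots)
```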


2013 ◽  
Vol 411-414 ◽  
pp. 1104-1107
Author(s):  
Jun Xia Chai ◽  
Dao Hua Liu

Traditional cluster analysis methods based on true distance are not well suited to accurately calculating the rupture propagation and healing rates of different earthquake faults. This paper proposes a new clustering method based on soft distance calculations: the clustering process based on soft distances, the calculation method for soft distances, and the specific clustering algorithm based on soft distances are given. Using real sample points of strong earthquakes as the data source, we apply this clustering method and other traditional clustering methods to cluster and analyze the data, and the results show that the proposed method obtains cluster centers consistent with the evolution of the earth's stress field, so the method is objectively grounded. This cluster analysis method provides a good basis for accurately calculating the next strong earthquake in earthquake fault zones.

