Bayesian hierarchical K-means clustering

2020 ◽  
Vol 24 (5) ◽  
pp. 977-992
Author(s):  
Yue Liu ◽  
Bufang Li

Clustering is a foundational technique in data mining. Real-world data often has an inherent hierarchical structure, and hierarchical clustering aims to construct a cluster tree that reveals the underlying modal structure of a complex density. Due to this inherent complexity, most existing hierarchical clustering algorithms are designed heuristically, without an explicit objective function, which limits their use and analysis. K-means, a simple yet effective algorithm that can be expressed in terms of probability distributions, has an inherent connection to the Mixture of Gaussians (MoG) model. This motivates us to combine Bayesian analysis with K-means and to develop a hierarchical clustering method under a probability distribution framework, in contrast to existing hierarchical K-means algorithms that process data in a single pass with heuristic strategies. To this end, we propose an explicit objective function for hierarchical clustering, termed Bayesian hierarchical K-means (BHK-means). Our method constructs a cascaded clustering tree in which all layers interact in a network-like manner: the clustering result of each layer is influenced by its parent and child nodes and is dynamically improved according to the global hierarchical clustering objective function. The objective function is solved with the same algorithm used for K-means, the Expectation-Maximization (EM) algorithm. Experimental results on both synthetic data and benchmark datasets demonstrate the effectiveness of our algorithm over existing related methods.
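The K-means/EM connection this abstract builds on can be made concrete: K-means is "hard" EM on a spherical Gaussian mixture. A minimal NumPy sketch (the function name and deterministic initialization are ours, for illustration; it is not the BHK-means method itself):

```python
import numpy as np

def kmeans_em(X, k, iters=50):
    """K-means as hard EM on a Gaussian mixture: the E-step assigns
    each point to its nearest centroid (a collapsed posterior), the
    M-step recomputes each centroid as the mean of its points."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)
    centroids = X[idx].astype(float)          # deterministic init for the demo
    for _ in range(iters):
        # E-step: hard assignments to the nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # M-step: maximize the (hard) likelihood given the assignments
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

# two well-separated synthetic blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centroids, labels = kmeans_em(X, 2)
```

Replacing the hard argmin with soft responsibilities turns this loop into full EM for a MoG, which is the probabilistic view the paper exploits.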

Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that are meaningful in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed for different applications, and sentence clustering is one of the most effective techniques among them. A Hierarchical Clustering Algorithm is applied over multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used, and the WordNet dictionary is utilized to determine similarity via the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding similarity between words. To identify disambiguated words, sense identification is performed for adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a substantial improvement, achieving 91.2%.
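Of the two similarity measures above, the cosine measure is easy to illustrate directly; Jiang-Conrath additionally requires WordNet and an information-content corpus, so only the cosine measure is sketched here, over bag-of-words sentence vectors (a simplification of the paper's pipeline):

```python
import numpy as np
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two tokenized sentences, using
    bag-of-words count vectors over their joint vocabulary."""
    vocab = sorted(set(a) | set(b))
    va = np.array([Counter(a)[w] for w in vocab], dtype=float)
    vb = np.array([Counter(b)[w] for w in vocab], dtype=float)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the rug".split()
sim = cosine_sim(s1, s2)   # shared tokens give a similarity between 0 and 1
```

In the paper's setting this score would be combined with the WordNet-based Jiang-Conrath measure before thresholding against the mean.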


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its strong clustering performance and data expression ability. However, little research focuses on clustering hesitant fuzzy linguistic term sets (HFLTSs). To fill this gap, we extend the data type of clustering to hesitant fuzzy linguistic information and propose a hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm. Furthermore, we propose a hesitant fuzzy linguistic Boolean matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, providing decision support for the executive judge in determining the focus of investigation and control. A clustering example verifies the algorithms' effectiveness in the context of hesitant fuzzy linguistic decision information.
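As a rough illustration of agglomerative clustering over hesitant fuzzy linguistic information, the sketch below represents each HFLTS as a set of term indices on a linguistic scale and uses a hypothetical mean-index distance; the paper's actual distance measure for HFLTSs may differ:

```python
import itertools

def hflts_distance(a, b, tau=6):
    """Toy distance between two hesitant fuzzy linguistic term sets,
    each a set of indices on a scale s_0..s_tau. The mean-index gap
    is an illustrative choice, not the paper's measure."""
    return abs(sum(a) / len(a) - sum(b) / len(b)) / tau

def agglomerative(items, dist, k):
    """Single-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > k:
        i, j = min(
            itertools.combinations(range(len(clusters)), 2),
            key=lambda p: min(dist(items[a], items[b])
                              for a in clusters[p[0]] for b in clusters[p[1]]))
        clusters[i] += clusters.pop(j)   # merge the closest pair
    return clusters

# four evaluations on a 7-term scale (s_0 "very low" .. s_6 "very high")
items = [{0, 1}, {1, 2}, {5, 6}, {6}]
result = agglomerative(items, hflts_distance, 2)
```

The two "low" evaluations and the two "high" evaluations end up in separate clusters, which is the behavior an executive judge would want when grouping similar case assessments.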


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One successful and pioneering clustering algorithm is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the attribute with fewer distinct values and the leaf node with more objects, which can lead to undesirable clustering results. To overcome these shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
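The rough-set quantity that MMR-style algorithms minimize can be sketched on a toy categorical table; the helper names and the tiny datasets below are illustrative, not taken from the paper:

```python
def partitions(rows, attr):
    """Equivalence classes of objects induced by one categorical attribute."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(row[attr], set()).add(i)
    return list(classes.values())

def mean_roughness(rows, a, b):
    """Mean roughness of attribute a with respect to attribute b:
    1 - |lower|/|upper|, averaged over the classes of a. MMR-style
    algorithms split on the attribute minimizing this value."""
    pa, pb = partitions(rows, a), partitions(rows, b)
    total = 0.0
    for X in pa:
        lower, upper = set(), set()
        for Y in pb:
            if Y <= X:
                lower |= Y          # fully contained: certain members
            if Y & X:
                upper |= Y          # overlapping: possible members
        total += 1.0 - len(lower) / len(upper)
    return total / len(pa)

# attribute 0 perfectly determines attribute 1 -> roughness 0 (crisp)
crisp = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
# attributes are independent -> roughness 1 (maximally rough)
rough = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
```

A lower mean roughness means the attribute's classes are better approximated by the other attributes' partitions, which is why MMR prefers it as a splitting criterion.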


2021 ◽  
Vol 19 ◽  
pp. 310-320
Author(s):  
Suboh Alkhushayni ◽  
Taeyoung Choi ◽  
Du’a Alzaleq

This work aims to expand knowledge in the area of data analysis through both persistent homology and representations of directed graphs. Specifically, we examined how homology cluster groups can be analyzed using agglomerative hierarchical clustering algorithms and methods. Additionally, the Wine dataset, available in RStudio, was analyzed using various clustering algorithms: Hierarchical Clustering, K-Means Clustering, and PAM Clustering. The goal of the analysis was to determine which clustering method is appropriate for a given numerical dataset. By testing the data, we sought the optimal clustering approach among the K-Means, PAM, and Gower-with-random-forest methods. Comparing each model's accuracy against the cultivar coefficients, we concluded that K-Means is the most helpful when working with numerical variables, while PAM clustering and Gower with random forest are the most beneficial approaches when working with categorical variables. All these tests can determine the optimal number of clusters for a given dataset through proper analysis. Using these results, the method can be applied in several industrial areas, such as clinical and business settings. For example, in a clinical setting, patients can be grouped by common disease, required therapy, and other attributes; in business, clusters can be formed based on marginal profit, marginal cost, or other economic indicators.
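The Gower distance mentioned above is what makes PAM usable on mixed categorical/numerical data: numeric attributes contribute a range-normalized absolute difference, categorical attributes a 0/1 mismatch. A minimal sketch (the attribute values and ranges are invented for the demo):

```python
def gower_distance(x, y, ranges, categorical):
    """Gower distance for mixed data: numeric attributes contribute
    |xi - yi| / range, categorical attributes contribute 0 on a match
    and 1 on a mismatch; the result is the average over attributes."""
    d = 0.0
    for xi, yi, r, cat in zip(x, y, ranges, categorical):
        d += (xi != yi) if cat else abs(xi - yi) / r
    return d / len(x)

# one numeric attribute (observed range 4.0) and one categorical attribute
d_mixed = gower_distance((1.0, "red"), (3.0, "blue"),
                         ranges=(4.0, None), categorical=(False, True))
d_same_color = gower_distance((1.0, "red"), (3.0, "red"),
                              ranges=(4.0, None), categorical=(False, True))
```

Feeding a matrix of such pairwise distances to PAM (medoid-based, so it needs only distances, not coordinates) is the standard "Gower + PAM" recipe for mixed data.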


2020 ◽  
Vol 39 (2) ◽  
pp. 464-471
Author(s):  
J.A. Adeyiga ◽  
S.O. Olabiyisi ◽  
E.O. Omidiora

Several criminal profiling systems have been developed to assist Law Enforcement Agencies in solving crimes, but the techniques employed in most of these systems lack the ability to cluster criminals based on their behavioral characteristics. This paper reviews different clustering techniques used in criminal profiling and then selects one fuzzy clustering algorithm (Expectation Maximization) and two hard clustering algorithms (K-means and Hierarchical). The selected algorithms were developed and tested on real-life data to produce "profiles" of criminal activity and criminal behavior, implemented using the WEKA software package. Performance was evaluated using cluster accuracy and time complexity. The results show that the Expectation Maximization algorithm gave 90.5% cluster accuracy in 8.5 s, while K-Means achieved 62.6% in 0.09 s and Hierarchical 51.9% in 0.11 s. In conclusion, soft clustering performs better than hard clustering in analyzing criminal data. Keywords: Clustering Algorithm, Profiling, Crime, Membership value
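The soft/hard contrast behind the conclusion can be illustrated with Expectation Maximization on a Gaussian mixture, where each point receives membership values rather than a single label. A 1-D two-component sketch on synthetic data (not the paper's crime dataset or WEKA's implementation):

```python
import numpy as np

def gmm_em_1d(x, iters=100):
    """EM for a two-component 1-D Gaussian mixture: a soft clustering
    where each point gets membership values, unlike K-means' hard labels."""
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities (the 1/sqrt(2*pi) constant cancels)
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2)) / sigma
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of all parameters
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 200), rng.normal(5.0, 0.3, 200)])
mu, sigma, pi = gmm_em_1d(x)
```

The responsibilities `r` are the membership values the paper's keywords refer to; borderline offenders would belong partially to several behavioral profiles instead of being forced into one.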


2017 ◽  
Vol 9 (2) ◽  
pp. 195-213
Author(s):  
Richárd Forster ◽  
Ágnes Fülöp

Abstract. The reconstruction and analysis of measured data play an important role in high-energy particle physics research, leading to new results in both experimental and theoretical physics. This requires algorithmic improvements and high computing capacity. Clustering algorithms make it possible to understand the jet structure more accurately. We explored a more granular parallelization of the kt cluster algorithm by combining it with hierarchical clustering methods used in network evaluations. The kt method makes it possible to follow the evolution of particles produced in high-energy nucleus-nucleus collisions. Because hierarchical clustering algorithms work on graphs, the particle information used by the standard kt algorithm was first transformed into an appropriate graph representing the network of particles. Testing was done using data samples from the ALICE offline library, which contains the modules required to simulate the ALICE detector, a dedicated Pb-Pb detector. The proposed algorithm was compared to the FastJet toolkit's standard longitudinally invariant kt implementation. Parallelizing the standard non-optimized version of the algorithm on the available CPU architecture proved to be 1.6 times faster than the standard implementation, while the solution proposed in this paper achieved a 12 times faster computing performance and is scalable enough to run efficiently on GPUs.
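The longitudinally invariant kt measure being parallelized here is d_ij = min(pt_i^2, pt_j^2) * dR_ij^2 / R^2 against the beam distance d_iB = pt_i^2, applied by sequential recombination. A minimal sequential sketch (pt-weighted recombination, a simplification of FastJet's E-scheme, so treat it as an illustration rather than a reference implementation):

```python
import math

def kt_cluster(particles, R=1.0):
    """Minimal longitudinally invariant kt algorithm. Particles are
    (pt, y, phi) triples; the smallest distance each round either
    merges a pair or promotes a particle to a final jet."""
    parts = list(particles)
    jets = []
    while parts:
        dists = [(p[0] ** 2, i, None) for i, p in enumerate(parts)]   # d_iB
        for i in range(len(parts)):
            for j in range(i + 1, len(parts)):
                dy = parts[i][1] - parts[j][1]
                dphi = math.pi - abs(math.pi - abs(parts[i][2] - parts[j][2]))
                dists.append((min(parts[i][0], parts[j][0]) ** 2
                              * (dy ** 2 + dphi ** 2) / R ** 2, i, j))  # d_ij
        _, i, j = min(dists)
        if j is None:
            jets.append(parts.pop(i))             # closest to the beam: final jet
        else:
            pj, pi_ = parts.pop(j), parts.pop(i)  # j > i, so pop j first
            pt = pi_[0] + pj[0]
            parts.append((pt,
                          (pi_[0] * pi_[1] + pj[0] * pj[1]) / pt,
                          (pi_[0] * pi_[2] + pj[0] * pj[2]) / pt))
    return jets

# two collinear particles plus one far-away particle -> two jets
jets = kt_cluster([(10.0, 0.0, 0.0), (12.0, 0.1, 0.1), (8.0, 3.0, 3.0)])
```

The pairwise-minimum search in the inner loop is exactly the step the paper reformulates as a graph problem to expose parallelism.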


Author(s):  
Naohiko Kinoshita ◽  
Yasunori Endo ◽  
Ken Onishi ◽  
...  

The rough clustering algorithm we previously proposed based on the optimization of an objective function (RCM) has a problem: conventional rough clustering algorithms do not ensure that their solutions are optimal. To solve this problem, we propose rough clustering algorithms based on the optimization of an objective function with fuzzy-set representation, which yields more flexible results than RCM. We verify the algorithms' effectiveness through numerical examples.
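A common way to realize lower/upper approximations in clustering is rough K-means; the sketch below is a generic version of that idea (parameter values and structure are ours for illustration), not the authors' RCM or its fuzzy-set extension:

```python
import numpy as np

def rough_kmeans(X, k, w_lower=0.7, eps=0.5, iters=20):
    """Rough K-means sketch: each cluster keeps a lower approximation
    (certain members) and an upper approximation (possible members);
    centroids blend both with weight w_lower."""
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].astype(float)
    lower = upper = None
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        lower = [[] for _ in range(k)]
        upper = [[] for _ in range(k)]
        for n, row in enumerate(d):
            close = np.where(row - row.min() <= eps)[0]  # near-tie clusters
            for c in close:
                upper[c].append(n)                       # possible member
            if len(close) == 1:
                lower[close[0]].append(n)                # certain member
        for c in range(k):
            lo = X[lower[c]].mean(axis=0) if lower[c] else centers[c]
            up = X[upper[c]].mean(axis=0) if upper[c] else centers[c]
            centers[c] = w_lower * lo + (1 - w_lower) * up
    return centers, lower, upper

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.2, (40, 2)), rng.normal(4.0, 0.2, (40, 2))])
centers, lower, upper = rough_kmeans(X, 2)
```

Points whose two nearest centroids are within `eps` of each other stay out of every lower approximation, which is the flexibility that a fuzzy-set representation then refines into graded memberships.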


2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
Yongli Liu ◽  
Jingli Chen ◽  
Hao Chao

In this paper we propose a fuzzy co-clustering algorithm via modularity maximization, named MMFCC. Its objective function uses the modularity measure as the criterion for co-clustering object-feature matrices. After conversion into a constrained optimization problem, it is solved by an iterative alternating optimization procedure via modularity maximization. The algorithm offers several advantages: it directly produces a block-diagonal matrix with an interpretable description of the resulting co-clusters, and it automatically determines the appropriate number of final co-clusters. Experimental studies on several benchmark datasets demonstrate that the algorithm yields higher-quality co-clusters, in terms of accuracy, than competitors such as fuzzy co-clustering and crisp block-diagonal co-clustering algorithms.
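The block-diagonal objective can be illustrated with a bipartite (Barber-style) modularity of a co-clustering: the mass inside the diagonal blocks minus what random row/column degrees would place there. Whether this matches MMFCC's exact criterion is an assumption, so treat this as a generic modularity computation:

```python
import numpy as np

def cocluster_modularity(A, row_labels, col_labels):
    """Bipartite modularity of a co-clustering of an object-feature
    matrix A: sums A_ij minus its degree-based expectation over all
    (row, column) pairs assigned to the same co-cluster."""
    m = A.sum()
    k = A.sum(axis=1)                 # row degrees
    d = A.sum(axis=0)                 # column degrees
    Q = 0.0
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if row_labels[i] == col_labels[j]:
                Q += A[i, j] - k[i] * d[j] / m
    return Q / m

# a perfectly block-diagonal matrix with a matching co-clustering
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
Q = cocluster_modularity(A, [0, 0, 1, 1], [0, 0, 1, 1])
```

A modularity-maximizing procedure would adjust the row and column labels (fuzzily, in MMFCC's case) to drive Q upward, which is what pushes the reordered matrix toward block-diagonal form.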


Author(s):  
S. May

Abstract. Partition-based clustering techniques are widely used in data mining and in the analysis of hyperspectral images. Unsupervised clustering depends only on the data, without any external knowledge, and creates a complete partition of the image into many classes. Sparse labeled samples may then be used to label each cluster, simplifying the supervised step. Each clustering algorithm has its own advantages and drawbacks (initialization, training complexity). In this paper we propose a recursive hierarchical clustering based on standard clustering strategies such as K-Means or Fuzzy C-Means. The recursive hierarchical approach reduces the algorithmic complexity, making it possible to process large amounts of input pixels and to produce a clustering with a high number of clusters. Moreover, in hyperspectral images, a classical question concerns the high dimensionality and the distance measure that should be used. Classical clustering algorithms usually use the Euclidean distance to compute the distance between samples and centroids. We propose to implement the spectral angle distance instead and evaluate its performance: it better fits the pixel spectra and is less sensitive to illumination changes or spectral variability inside a semantic class. Different scenes are processed with this method to demonstrate its potential.
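The spectral angle distance advocated here is simply the angle between two spectra, which cancels any global scaling of a pixel's spectrum; this is why it is insensitive to illumination changes. A minimal sketch:

```python
import numpy as np

def spectral_angle(a, b):
    """Spectral angle (radians) between two pixel spectra: the arccos
    of their cosine similarity. Scaling either spectrum by a positive
    constant leaves the angle unchanged."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

s = np.array([0.2, 0.5, 0.9, 0.4])
angle_scaled = spectral_angle(s, 2.5 * s)     # same material, brighter pixel
angle_ortho = spectral_angle(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

In the proposed method this function replaces the Euclidean distance in the sample-to-centroid comparisons of K-Means or Fuzzy C-Means.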


2010 ◽  
Vol 44-47 ◽  
pp. 3897-3901
Author(s):  
Hsiang Chuan Liu ◽  
Yen Kuei Yu ◽  
Jeng Ming Yih ◽  
Chin Chun Chen

Fuzzy clustering algorithms based on the Euclidean distance can only detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters by employing the Mahalanobis distance in the objective function; however, both need to add constraints on the Mahalanobis distance. In this paper, the authors' improved Fuzzy C-Means algorithm based on a common Mahalanobis distance (FCM-CM) is used to identify mastery concepts in linear algebra, comparing its performance with four other partition algorithms: FCM-M, GG, GK, and FCM. The results show that FCM-CM performs better than the others.
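The Mahalanobis distance underlying GK, GG, and FCM-CM rescales each direction by the cluster's covariance, so elongated (non-spherical) clusters are measured correctly. A minimal sketch (the toy covariance is ours for illustration):

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from a cluster with mean mu and
    covariance cov: directions of large variance count for less."""
    diff = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

mu = np.array([0.0, 0.0])
cov = np.array([[4.0, 0.0],        # the cluster is stretched along x
                [0.0, 1.0]])
d_along = mahalanobis([2.0, 0.0], mu, cov)   # along the long axis
d_across = mahalanobis([0.0, 2.0], mu, cov)  # across the short axis
```

Although the two points are equally far in Euclidean terms, the point along the stretched axis is Mahalanobis-closer, which is exactly what lets these algorithms recognize elliptical clusters.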

