ANALYSIS OF THE APPLICABILITY CRITERION FOR K MEANS CLUSTERING ALGORITHM RUN TEN NUMBER OF TIMES ON THE FIRST 25 NUMBERS OF THE FIBONACCI SERIES

This investigation analyzes the applicability criterion for the K-Means clustering algorithm when it is run ten times on the first 25 numbers of the Fibonacci series. The analysis uses the RCB Model of Applicability Criterion for the K-Means Clustering Algorithm. K-Means is one of the simplest unsupervised learning algorithms for the well-known clustering problem, and it is a scheme for clustering continuous, numeric data. Because the algorithm initializes its centroids randomly, each run may converge to a different local optimum and so give different or slightly different results. Quantifying this run-to-run variation matters because it sheds light on the maxima and minima of the discrete K-Means objective function, which the algorithm aims to minimize. The RCB Model of Applicability Criterion tells us whether the K-Means clustering algorithm can be used on a given data set while the variation in its results across several runs stays within acceptable limits.

KEY WORDS: K-means clustering algorithm, RCB model, cluster evaluation.
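The run-to-run variation described in this abstract can be illustrated with a short sketch. It runs a plain Lloyd-style K-Means ten times on the first 25 Fibonacci numbers and reports how far the objective (sum of squared errors) spreads across runs. It does not implement the RCB model, whose details are not given here. The choice of k = 3, the seed, the starting pair 1, 1 for the series, and all function names are assumptions made for illustration only.

```python
import random

def fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the series starts 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

def kmeans_1d(points, k, rng, max_iter=100):
    """Lloyd's algorithm on scalars; returns (sorted centroids, sum of squared errors)."""
    # Random initialization: k distinct data values chosen as starting centroids.
    centroids = rng.sample(sorted(set(points)), k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return sorted(centroids), sse

def run_many(points, k=3, runs=10, seed=0):
    """Run K-Means several times and collect the objective value of each run."""
    rng = random.Random(seed)
    return [kmeans_1d(points, k, rng)[1] for _ in range(runs)]

data = fibonacci(25)
sse_values = run_many(data)
spread = (max(sse_values) - min(sse_values)) / min(sse_values)
print(f"min SSE = {min(sse_values):.3e}, max SSE = {max(sse_values):.3e}, "
      f"relative spread = {spread:.2%}")
```

A relative spread near zero would suggest the runs agree; a large spread would suggest that different initializations are landing in different local optima. What counts as an acceptable limit is a judgment the sketch leaves open.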


A hybrid clustering algorithm based on improved GWO and KHM clustering

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-211034 ◽

2021 ◽

pp. 1-14

Author(s):

Feng Xue ◽

Yongbo Liu ◽

Xiaochen Ma ◽

Bharat Pathak ◽

Peng Liang

Keyword(s):

Clustering Algorithm ◽

Test Functions ◽

Convergence Factor ◽

Step Size ◽

Local Optima ◽

Hybrid Clustering ◽

Clustering Problem ◽

Dynamic Weight ◽

The Stability ◽

Harmonic Means

To solve the problem that the K-means algorithm is sensitive to the initial clustering centers and easily falls into local optima, we propose a new hybrid clustering algorithm called the IGWOKHM algorithm. In this paper, we first propose an improved strategy based on a nonlinear convergence factor, an inertial step size, and a dynamic weight to improve the search ability of the traditional grey wolf optimization (GWO) algorithm. Then, the improved GWO (IGWO) algorithm and the K-harmonic means (KHM) algorithm are fused to solve the clustering problem. This fusion clustering algorithm is called IGWOKHM, and it combines the global search ability of IGWO with the local fast optimization ability of KHM to both solve the problem of the K-means algorithm’s sensitivity to the initial clustering centers and address the shortcomings of KHM. The experimental results on 8 test functions and 4 University of California Irvine (UCI) datasets show that the IGWO algorithm greatly improves the efficiency of the model while ensuring the stability of the algorithm. The fusion clustering algorithm can effectively overcome the inadequacies of the K-means algorithm and has a good global optimization ability.


Application of Machine Learning in Animal Disease Analysis and Prediction

Current Bioinformatics ◽

10.2174/1574893615999200728195613 ◽

2020 ◽

Vol 15 ◽

Author(s):

Shuwen Zhang ◽

Qiang Su ◽

Qin Chen

Keyword(s):

Machine Learning ◽

Unsupervised Learning ◽

Supervised Learning ◽

Clustering Algorithm ◽

Principal Component ◽

Support Vector ◽

Animal Disease ◽

Human Beings ◽

Animal Diseases ◽

Disease Analysis

Abstract: Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, using big data to predict and analyze animal diseases is becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. This paper first introduces the animal epidemic situation and machine learning, and then briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and maxent. Through this discussion, readers gain a clearer concept of machine learning and an understanding of its application prospects in animal diseases.


Objective function of semi-supervised Fuzzy C-Means clustering algorithm

2008 6th IEEE International Conference on Industrial Informatics ◽

10.1109/indin.2008.4618199 ◽

2008 ◽

Author(s):

Chunfang Li ◽

Lianzhong Liu ◽

Wenli Jiang

Keyword(s):

Objective Function ◽

Clustering Algorithm ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering


Fuzzy Set Based Clustering Algorithm of Web Text

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.678.19 ◽

2014 ◽

Vol 678 ◽

pp. 19-22

Author(s):

Hong Xin Wan ◽

Yun Peng

Keyword(s):

Key Words ◽

Fuzzy Set ◽

Clustering Algorithm ◽

Text Clustering ◽

Classification Methods ◽

Comparative Experiment ◽

Fuzzy Algorithm ◽

Pattern Clustering ◽

The Web ◽

Computing Accuracy

Web text contains uncertain and unstructured content, so it is difficult to cluster with conventional classification methods. We propose a web text clustering algorithm based on fuzzy sets to increase computing accuracy on web text. After extracting the key words of a text, we treat them as attributes and design a fuzzy algorithm to decide the membership of the words. The algorithm improves time and space complexity and is more robust than the conventional algorithm. To test its accuracy and efficiency, we run a comparative experiment between pattern clustering and our algorithm. The experiment shows that our method gives better results.


CLUSTERING USING AN IMPROVED HYBRID GENETIC ALGORITHM

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300700362x ◽

2007 ◽

Vol 16 (06) ◽

pp. 919-934

Author(s):

YONGGUO LIU ◽

XIAORONG PU ◽

YIDONG SHEN ◽

ZHANG YI ◽

XIAOFENG LIAO

Keyword(s):

Genetic Algorithm ◽

Clustering Algorithm ◽

Hybrid Genetic Algorithm ◽

Sum Of Squares ◽

Clustering Methods ◽

Clustering Problem ◽

Mutation Operation ◽

Iteration Methods ◽

Genetic Clustering ◽

The Individual

In this article, a new genetic clustering algorithm called the Improved Hybrid Genetic Clustering Algorithm (IHGCA) is proposed to deal with the clustering problem under the criterion of minimum sum of squares clustering. In IHGCA, the improvement operation including five local iteration methods is developed to tune the individual and accelerate the convergence speed of the clustering algorithm, and the partition-absorption mutation operation is designed to reassign objects among different clusters. By experimental simulations, its superiority over some known genetic clustering methods is demonstrated.
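For readers unfamiliar with the criterion named in this abstract, the minimum sum-of-squares clustering objective is usually written as below. This is the standard textbook form, not a formula quoted from the article; the symbols (data points x_i, clusters C_j, cluster means m_j, number of clusters K) are notation assumed here.

```latex
\min_{C_1,\dots,C_K} \; \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - m_j \rVert^2,
\qquad
m_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Each object belongs to exactly one cluster, so reassigning an object between clusters changes both affected means and therefore the objective value.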


Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2021100109 ◽

2021 ◽

Vol 12 (4) ◽

pp. 169-185

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Nadjet Kamel

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Latent Semantic Indexing ◽

Research Area ◽

Semantic Indexing ◽

Local Optima ◽

Symbiotic Organisms Search ◽

Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It finds application in several domains, including biomedical documents of the MEDLINE database. Document clustering based on metaheuristics is an active research area. However, these algorithms tend to get trapped in local optima, have many parameters to adjust, and require documents to be indexed by a high-dimensional matrix using the traditional vector space model. To overcome these limitations, this paper proposes a new document clustering algorithm (ASOS-LSI) with no parameters. It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.


Order Selection in Unsupervised Learning and Clustering for Arbitrary and Non-Arbitrary Shaped Data

10.32920/ryerson.14668125.v1 ◽

2021 ◽

Author(s):

Mahdi Shahbaba

Keyword(s):

Objective Function ◽

Unsupervised Learning ◽

Minimum Spanning Tree ◽

Statistical Testing ◽

Adjusted Rand Index ◽

Order Selection ◽

Clustering Methods ◽

Conventional Methods ◽

Anderson Darling ◽

Statistical Testing Method

This thesis focuses on clustering for the purpose of unsupervised learning. One topic of interest is estimating the correct number of clusters (CNC). In conventional clustering approaches, such as X-means, G-means, PG-means and Dip-means, estimating the CNC is a preprocessing step prior to finding the centers and clusters. In other words, the first step estimates the CNC and the second step finds the clusters, and each step has a different objective function to minimize. Here, we propose minimum averaged central error (MACE)-means clustering, which uses one objective function to simultaneously estimate the CNC and provide the cluster centers. We show the superiority of MACE-means over the conventional methods in terms of estimating the CNC with comparable complexity. In addition, on average MACE-means results in better values for the adjusted Rand index (ARI) and variation of information (VI). The next topic of interest is the order selection step of the conventional methods, which is usually a statistical testing method such as the Kolmogorov-Smirnov test, the Anderson-Darling test, or Hartigan's Dip test. We propose a new statistical test denoted Sigtest (signature testing). The conventional statistical testing approaches rely on a particular assumption about the probability distribution of each cluster. Sigtest, on the other hand, can be used with any prior distribution assumption on the clusters. By replacing the statistical testing of the mentioned conventional approaches with Sigtest, we show that the clustering methods improve in terms of having a more accurate CNC as well as better ARI and VI. Conventional clustering approaches fail in arbitrary shaped clustering, which is the subject of the last contribution of the thesis. The proposed method, denoted minimum Pathways is Arbitrary Shaped (minPAS) clustering, is based on a unique minimum spanning tree structure of the data. Our simulation results show the advantage of minPAS over state-of-the-art arbitrary shaped clustering methods such as DBSCAN and Affinity Propagation in terms of accuracy, ARI and VI.
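Since the adjusted Rand index (ARI) is used throughout this abstract as a quality measure, a minimal sketch of how it is commonly computed may help. It follows the usual pair-counting definition and is not taken from the thesis; the function name and the handling of degenerate inputs (returning 1.0) are assumptions made here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI between two labelings of the same items."""
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two items counts as perfect agreement
    # Pairs of items placed together in both labelings (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # assumption: both labelings are trivial, treat as agreement
    return (together_both - expected) / (maximum - expected)

# Label names do not matter, only which items are grouped together.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index is 1 for identical partitions, close to 0 for agreement at chance level, and can be negative when two partitions agree less than chance would predict.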


Multi Objective Simulated Annealing Approach for Facility Layout Design

International Journal of Mathematical Engineering and Management Sciences ◽

10.33889/ijmems.2018.3.4-026 ◽

2018 ◽

Vol 3 (4) ◽

pp. 365-380 ◽

Cited By ~ 5

Author(s):

Safiye Turgay

Keyword(s):

Simulated Annealing ◽

Objective Function ◽

Material Handling ◽

Simulated Annealing Algorithm ◽

Facility Layout ◽

Layout Design ◽

Local Optima ◽

Facility Layout Design ◽

Multi Objective ◽

Annealing Algorithm

The facility layout design problem concerns the physical layout of departments with area requirements, subject to restrictions such as material handling costs, remoteness, and distance requests. Briefly, the facility layout problem relates to optimizing layout costs and working conditions. This paper proposes a new multi-objective simulated annealing algorithm for solving the unequal-area layout design problem. Different objective weights are generated with an entropy approach and used in the alternative layout designs. The multi-objective function takes into account the objective function and the constraints. The suggested heuristic algorithm uses the multi-objective parameters for initialization. The entropy approach then determines the weights of the objective functions. After that, the suggested improved simulated annealing approach is applied to the whole developed model. A multi-objective simulated annealing algorithm is implemented to increase diversity and reduce the chance of layouts getting trapped in local optima.


Clustering Algorithm for Vehicle’s Driving Data Feature based on Integrated Navigation

International Journal of Vehicle Structures and Systems ◽

10.4273/ijvss.13.4.14 ◽

2021 ◽

Vol 13 (4) ◽

Author(s):

Na Guo ◽

Yiyi Zhu

Keyword(s):

Clustering Algorithm ◽

Principal Component ◽

Kernel Principal Component Analysis ◽

Integrated Navigation ◽

Clustering Problem ◽

Incremental Method ◽

Feature Parameters ◽

Vehicle Acceleration ◽

Feature Based ◽

Clustering Problems

The result of the K-means clustering algorithm is affected by the initial cluster centers and is not always globally optimal. Therefore, the clustering analysis of vehicle driving data features based on integrated navigation is carried out with the global K-means clustering algorithm. A vehicle mathematical model based on GPS/DR integrated navigation is constructed, and the vehicle's driving data based on GPS/DR integrated navigation, such as vehicle acceleration, are collected. After the driving data features are extracted, their feature parameters are reduced in dimension using kernel principal component analysis to reduce redundancy. The global K-means clustering algorithm converts the clustering problem into a series of sub-cluster clustering problems. At the end of each iteration, an incremental method is used to select the next cluster of optimal initial centers. After the optimal number of clusters is determined, the feature clustering of the vehicle's driving data is completed. The experimental results show that the global K-means clustering algorithm has a clustering error of only 1.37% on the vehicle's driving data features and achieves high-precision clustering.
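The incremental idea mentioned in this abstract can be sketched as follows. This is one common formulation of global K-means, written under the assumption that each new center is seeded from every data point in turn and the best refined result is kept; it is not the authors' implementation, and the function names and toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def lloyd(points, centers, max_iter=100):
    """Ordinary K-means refinement from given centers; returns (centers, error)."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # An empty group keeps its previous center.
        updated = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error

def global_kmeans(points, k):
    """Grow the solution one center at a time instead of using random starts."""
    centers = [mean(points)]  # the one-cluster solution is just the data mean
    error = sum(sq_dist(p, centers[0]) for p in points)
    for _ in range(1, k):
        best_centers, best_error = None, float("inf")
        for candidate in points:  # try every data point as the new center's seed
            trial_centers, trial_error = lloyd(points, centers + [candidate])
            if trial_error < best_error:
                best_centers, best_error = trial_centers, trial_error
        centers, error = best_centers, best_error
    return centers, error

toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, error = global_kmeans(toy, 2)
print(centers, error)
```

Because no random initialization is involved, repeated runs give the same answer. The cost is one full K-means refinement per data point for each added center, which is why practical variants often restrict the set of candidate seeds.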


MD-SPKM: A set pair k-modes clustering algorithm for incomplete categorical matrix data

Intelligent Data Analysis ◽

10.3233/ida-205340 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1507-1524

Author(s):

Chunying Zhang ◽

Ruiyan Gao ◽

Jiahao Wang ◽

Song Chen ◽

Fengchun Liu ◽

...

Keyword(s):

Measurement Method ◽

Clustering Algorithm ◽

Average Distance ◽

Boundary Region ◽

Data Sets ◽

Calculation Formula ◽

Information Granule ◽

Clustering Problem ◽

Definition Of ◽

Multiple Clusters

In order to solve the clustering problem with incomplete and categorical matrix data sets, and considering the uncertain relationship between samples and clusters, a set pair k-modes clustering algorithm is proposed (MD-SPKM). Firstly, the correlation theory of set pair information granule is introduced into k-modes clustering. By improving the distance formula of traditional k-modes algorithm, a set pair distance measurement method between incomplete matrix samples is defined. Secondly, considering the uncertain relationship between the sample and the cluster, the definition of the intra-cluster average distance and the threshold calculation formula to determine whether the sample belongs to multiple clusters is given, and then the result of set pair clustering is formed, which includes positive region, boundary region and negative region. Finally, through the selected three data sets and four contrast algorithms for experimental evaluation, the experimental results show that the set pair k-modes clustering algorithm can effectively handle incomplete categorical matrix data sets, and has good clustering performance in Accuracy, Recall, ARI and NMI.
