Grey Wolf Algorithm-Based Clustering Technique

2017 ◽  
Vol 26 (1) ◽  
pp. 153-168 ◽  
Author(s):  
Vijay Kumar ◽  
Jitender Kumar Chhabra ◽  
Dinesh Kumar

The main problem with classical clustering techniques is that they are easily trapped in local optima. This paper attempts to solve that problem by proposing a grey wolf algorithm (GWA)-based clustering technique, called GWA clustering (GWAC). The search capability of the GWA is used to find optimal cluster centers in the given feature space, with an agent representation used to encode the cluster centers. The proposed GWAC technique is tested on both artificial and real-life data sets and compared with six well-known metaheuristic-based clustering techniques. The computational results are encouraging and demonstrate that GWAC provides better values of precision, recall, G-measure, and intracluster distance. GWAC is further applied to a gene expression data set, and its performance is compared with that of the other techniques. Experimental results reveal the efficiency of GWAC over these techniques.
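The encoding and fitness described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each agent (wolf) encodes k cluster centers and that the fitness to minimize is the total intracluster distance; the paper's exact fitness function and parameter settings may differ.

```python
import numpy as np

def intracluster_distance(centers, X):
    # Fitness: sum of distances from each point to its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.min(axis=1).sum()

def gwac(X, k, n_wolves=20, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Each agent encodes k cluster centers in the feature space.
    wolves = rng.uniform(lo, hi, size=(n_wolves, k, X.shape[1]))
    fitness = np.array([intracluster_distance(w, X) for w in wolves])
    for t in range(n_iter):
        # Alpha, beta, delta: the three best agents lead the pack.
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]
        a = 2 - 2 * t / n_iter  # coefficient decreases linearly from 2 to 0
        for i in range(n_wolves):
            candidate = np.zeros_like(wolves[i])
            for leader in (alpha, beta, delta):
                r1 = rng.random(wolves[i].shape)
                r2 = rng.random(wolves[i].shape)
                A, C = 2 * a * r1 - a, 2 * r2
                candidate += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(candidate / 3.0, lo, hi)
            fitness[i] = intracluster_distance(wolves[i], X)
    best = wolves[np.argmin(fitness)]
    labels = np.linalg.norm(
        X[:, None, :] - best[None, :, :], axis=2).argmin(axis=1)
    return best, labels
```

As the coefficient a shrinks, exploration gives way to exploitation around the three leading agents, which is how the GWA escapes the local optima that trap classical center-based clustering.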

Author(s):  
SANGHAMITRA BANDYOPADHYAY ◽  
UJJWAL MAULIK ◽  
MALAY KUMAR PAKHIRA

An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining a minimum energy configuration with the searching capability of the K-means algorithm is proposed in this article. The clustering methodology searches for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points farther away from a cluster center have higher probabilities of migrating to other clusters than points closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated on artificial and real-life data sets.
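The probabilistic redistribution rule can be illustrated with a short sketch. Two assumptions not stated in the abstract: Boltzmann-type membership probabilities proportional to exp(-d/T), and a Metropolis accept/reject step on the total squared error; the article's exact redistribution rule and cooling schedule may differ.

```python
import numpy as np

def sakm_sweep(X, labels, centers, T, rng):
    """One annealing sweep at temperature T > 0 (illustrative sketch)."""
    # Membership probabilities flatten as distance grows, so points far
    # from their center are more likely to migrate to another cluster.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    p = np.exp(-(d - d.min(axis=1, keepdims=True)) / T)  # shifted for stability
    p /= p.sum(axis=1, keepdims=True)
    k = centers.shape[0]
    new_labels = np.array([rng.choice(k, p=row) for row in p])
    new_centers = np.array([
        X[new_labels == j].mean(axis=0) if np.any(new_labels == j)
        else centers[j] for j in range(k)])
    # Metropolis criterion on the clustering "energy" (total squared error).
    e_old = ((X - centers[labels]) ** 2).sum()
    e_new = ((X - new_centers[new_labels]) ** 2).sum()
    if e_new < e_old or rng.random() < np.exp((e_old - e_new) / T):
        return new_labels, new_centers
    return labels, centers
```

Lowering T over successive sweeps recovers K-means-like hard assignment in the limit T → 0, while high temperatures early on let points jump between clusters and avoid poor local configurations.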


2013 ◽  
Vol 411-414 ◽  
pp. 1884-1893
Author(s):  
Yong Chun Cao ◽  
Ya Bin Shao ◽  
Shuang Liang Tian ◽  
Zheng Qi Cai

Because many GA-based clustering algorithms suffer from degeneracy and easily fall into local optima, a novel dynamic genetic algorithm for clustering problems (DGA) is proposed. The algorithm adopts variable-length coding to represent individuals and performs the crossover operation in parallel within subpopulations of individuals of the same length (see the sketch below), which allows DGA to explore the search space more effectively and to determine automatically the proper number of clusters and the proper partition of a given data set. The algorithm also uses a dynamic crossover probability and an adaptive mutation probability, which prevent the dynamic clustering algorithm from getting stuck in a local optimum. Experiments on three artificial and two real-life data sets show that DGA achieves better performance and higher accuracy on clustering problems.
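The length-restricted parallel crossover can be sketched as follows, where each individual is a variable-length array of cluster centers. This is an illustrative reading of the abstract, not the authors' code; the single cut-point scheme is an assumption.

```python
import numpy as np

def parallel_crossover(population, pc, rng):
    """Crossover restricted to subpopulations of equal chromosome length."""
    # Group individuals by the number of cluster centers they encode.
    groups = {}
    for ind in population:
        groups.setdefault(len(ind), []).append(ind)
    offspring = []
    for group in groups.values():
        rng.shuffle(group)
        for a, b in zip(group[::2], group[1::2]):
            if rng.random() < pc and len(a) > 1:
                # Same length on both sides keeps children well-formed.
                cut = int(rng.integers(1, len(a)))
                offspring.append(np.vstack([a[:cut], b[cut:]]))
                offspring.append(np.vstack([b[:cut], a[cut:]]))
            else:
                offspring.extend([a, b])
        if len(group) % 2:  # odd-sized group: last individual passes through
            offspring.append(group[-1])
    return offspring
```

Restricting mates to equal-length chromosomes is what lets the population carry candidate solutions with different cluster counts simultaneously, so the proper number of clusters emerges from selection rather than being fixed in advance.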


2013 ◽  
Vol 3 (4) ◽  
pp. 1-14 ◽  
Author(s):  
S. Sampath ◽  
B. Ramya

Cluster analysis is a branch of data mining which plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular among them being k-means. Although k-means is widely used, its application requires prior knowledge of the number of clusters present in the given data set. Several solutions are available in the literature to overcome this limitation. The k-means method also creates a disjoint and exhaustive partition of the data set; however, in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that produces rough clusters automatically, without requiring the user to supply the number of clusters as input. The efficiency of the algorithm in detecting the number of clusters present in a data set has been studied with the help of some real-life data sets. Further, a nonparametric statistical analysis of the experimental results has been carried out to assess the proposed algorithm's ability to detect the number of clusters automatically, using a rough version of the Davies-Bouldin index.
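For reference, the classical Davies-Bouldin index on which the rough variant builds is, for k clusters with within-cluster scatter S_i and centroid separation M_ij,

```latex
\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{M_{ij}},
```

where lower values indicate more compact, better-separated clusters. The paper's rough version adapts these quantities to rough clusters, whose objects may belong to more than one cluster.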


Author(s):  
Muhammad H. Tahir ◽  
Muhammad Adnan Hussain ◽  
Gauss Cordeiro ◽  
Mahmoud El-Morshedy ◽  
Mohammed S. Eliwa

For the bounded unit interval, we propose a new Kumaraswamy generalized (G) family of distributions from a new generator, which could be an alternative to the Kumaraswamy-G family proposed earlier by Cordeiro and de Castro in 2011. This new generator can also be used to develop alternative G-classes such as beta-G, McDonald-G, Topp-Leone-G, Marshall-Olkin-G and Transmuted-G for the bounded unit interval. Some mathematical properties of this new family are obtained, and the maximum likelihood method is used for estimating the family parameters. We investigate the properties of one special model, called the new Kumaraswamy-Weibull (NKwW) distribution. Parameter estimation is discussed, and the maximum likelihood estimators are assessed through a simulation study. Two real-life data sets are analyzed to illustrate the importance and flexibility of this distribution. In fact, this model outperforms some generalized Weibull models, such as the Kumaraswamy-Weibull, McDonald-Weibull, beta-Weibull, exponentiated-generalized Weibull, gamma-Weibull, odd log-logistic-Weibull, Marshall-Olkin-Weibull, transmuted-Weibull, exponentiated-Weibull and Weibull distributions, when applied to these data sets. The bivariate extension of the family is proposed, and the estimation of its parameters is given. The usefulness of the bivariate NKwW model is illustrated empirically by means of a real-life data set.
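For context, the earlier Kumaraswamy-G family of Cordeiro and de Castro (2011), to which the new family is offered as an alternative, transforms a baseline CDF G(x) as

```latex
F(x) = 1 - \left[\, 1 - G(x)^{a} \,\right]^{b}, \qquad a, b > 0,
```

with density f(x) = a b g(x) G(x)^{a-1} [1 - G(x)^a]^{b-1}, where g is the baseline density. The abstract does not give the new generator's closed form, so it is not reproduced here.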


Interval data mining is used to extract unknown patterns, hidden rules, associations, etc. from interval-based data. The extraction of closed intervals is important because, by mining the set of closed intervals and their support counts, the support count of any interval can be computed easily. In this work, an incremental algorithm for computing closed intervals together with their support counts from an interval data set is proposed. Many methods for mining closed intervals are available, but most of them assume a static data set as input and hence are non-incremental. Real-life data sets, however, are dynamic by nature. An efficient incremental algorithm called CI-Tree has previously been proposed for computing the closed intervals present in dynamic interval data, but it cannot compute their support values. The proposed algorithm, called SCI-Tree, extracts all closed intervals together with their support values incrementally from the given interval data. Moreover, all frequent closed intervals can be computed for any user-defined minimum support with a single scan of the SCI-Tree, without revisiting the data set. The proposed method has been tested on real-life and synthetic data sets, and the results are reported.
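A minimal sketch of the support computation the abstract relies on, assuming containment semantics (the support of an interval is the number of database intervals that contain it); the paper's exact support definition and the SCI-Tree structure itself are not reproduced here.

```python
def support(query, database):
    """Number of database intervals [l, u] containing query = (lo, hi).
    Containment semantics are an assumption for illustration only."""
    lo, hi = query
    return sum(1 for l, u in database if l <= lo and hi <= u)

# Hypothetical usage with a small dynamic data set:
data = [(1, 5), (2, 8), (1, 9)]
data.append((0, 6))           # incremental insertion of a new record
print(support((2, 5), data))  # -> 4: all four intervals contain (2, 5)
```

An incremental index such as SCI-Tree avoids exactly this linear rescan: the support counts of closed intervals are maintained as records arrive, and any other interval's support is derived from them.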


2021 ◽  
Author(s):  
Fatma Zohra Seghier ◽  
Halim Zeghdoudi

In this paper, a Poisson XLindley distribution (PXLD) is obtained by compounding the Poisson distribution (PD) with a continuous distribution. A general expression for its rth factorial moment about the origin is derived, and hence its raw and central moments are obtained. Expressions for its coefficient of variation, skewness, kurtosis and index of dispersion are also given. In particular, the method of maximum likelihood and the method of moments for estimating its parameters are discussed. Finally, real-life data sets on Nipah virus infection, hemocytometer yeast cell counts and epileptic seizure counts are analyzed to investigate the suitability of the proposed distribution for modeling real data.
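The compounding construction referred to above is the standard Poisson mixture: if X | λ ~ Poisson(λ) and λ follows the XLindley law with density f(λ; θ), then

```latex
P(X = k) = \int_{0}^{\infty} \frac{e^{-\lambda}\, \lambda^{k}}{k!}\, f(\lambda;\theta)\, d\lambda, \qquad k = 0, 1, 2, \ldots
```

Evaluating this integral for the XLindley density yields the closed-form PXLD mass function derived in the paper (not reproduced here). Mixing the Poisson rate over a continuous law is what produces the overdispersion measured by the index of dispersion.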


2018 ◽  
Vol 12 (3) ◽  
pp. 100-122
Author(s):  
Benjamin Stark ◽  
Heiko Gewald ◽  
Heinrich Lautenbacher ◽  
Ulrich Haase ◽  
Siegmar Ruff

This article describes how information about an individual's personal health is among one's most sensitive and important intangible belongings. When health information is misused, serious irreversible damage can be caused, e.g. through making intimate details public or leaking them to employers, insurers, etc. Therefore, health information needs to be treated with the highest degree of confidentiality. In practice, this goal proves difficult to achieve. In a hospital setting, medical staff across departments often need to access patient data without directly obvious reasons, which makes it difficult to distinguish legitimate from illegitimate access. This article provides a mechanism that classifies transactions at a large university medical center into plausible and questionable data accesses, using a real-life data set of more than 60,000 transactions. The classification mechanism works with minimal data requirements and unsupervised data sets. The results were evaluated through manual cross-checks internally and by a group of external experts. Consequently, the hospital's data protection officer can now focus on analyzing questionable transactions instead of checking random samples.
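The abstract does not spell out the classification mechanism. As an illustration only, one simple unsupervised baseline in this spirit scores each access by the empirical rarity of its (staff department, patient unit) combination and flags the rarest ones for review; the field names and threshold below are hypothetical, not the paper's method.

```python
from collections import Counter

def flag_questionable(transactions, quantile=0.05):
    """transactions: list of (staff_department, patient_unit) pairs.
    Returns the accesses whose pairing is among the rarest `quantile`."""
    counts = Counter(transactions)
    total = len(transactions)
    scored = [(t, counts[t] / total) for t in transactions]
    cutoff = sorted(freq for _, freq in scored)[int(quantile * total)]
    # Accesses via a rarely seen department/unit pairing are "questionable".
    return [t for t, freq in scored if freq <= cutoff]
```

Such a frequency-based score needs no labels and only two fields per transaction, which matches the "minimal data requirements and unsupervised data sets" constraint the article emphasizes.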


Mathematics ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 1989
Author(s):  
Muhammad H. Tahir ◽  
Muhammad Adnan Hussain ◽  
Gauss M. Cordeiro ◽  
M. El-Morshedy ◽  
M. S. Eliwa

For the bounded unit interval, we propose a new Kumaraswamy generalized (G) family of distributions through a new generator, which could be an alternative to the Kumaraswamy-G family proposed earlier by Cordeiro and de Castro in 2011. This new generator can also be used to develop alternative G-classes such as beta-G, McDonald-G, Topp-Leone-G, Marshall-Olkin-G, and Transmuted-G for the bounded unit interval. Some mathematical properties of this new family are obtained, and the maximum likelihood method is used for estimating the G-family parameters. We investigate the properties of one special model, called the new Kumaraswamy-Weibull (NKwW) distribution. The parameters of the NKwW model are estimated by maximum likelihood, and the performance of these estimators is assessed through a simulation study. Two real-life data sets are analyzed to illustrate the importance and flexibility of the proposed model. In fact, this model outperforms some generalized Weibull models such as the Kumaraswamy-Weibull, McDonald-Weibull, beta-Weibull, exponentiated-generalized Weibull, gamma-Weibull, odd log-logistic-Weibull, Marshall-Olkin-Weibull, transmuted-Weibull and exponentiated-Weibull distributions when applied to these data sets. The bivariate extension of the family is also proposed, and the estimation of its parameters is discussed. The usefulness of the bivariate NKwW model is illustrated empirically by means of a real-life data set.
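As a worked illustration of the maximum likelihood step for families of this kind, the following fits the earlier Kumaraswamy-Weibull model (not the paper's NKwW, whose generator is not given in the abstract) by numerically minimizing the negative log-likelihood; the synthetic data and starting values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def kw_weibull_negloglik(params, x):
    """Negative log-likelihood of the classical Kumaraswamy-Weibull model:
    f(x) = a*b*g(x)*G(x)**(a-1)*(1 - G(x)**a)**(b-1), G Weibull(c, scale=lam)."""
    a, b, c, lam = params
    if min(a, b, c, lam) <= 0:
        return np.inf  # reject invalid parameter vectors
    G = weibull_min.cdf(x, c, scale=lam)
    g = weibull_min.pdf(x, c, scale=lam)
    eps = 1e-12  # guard against log(0) at the sample extremes
    ll = (np.log(a * b) + np.log(g + eps) + (a - 1) * np.log(G + eps)
          + (b - 1) * np.log(1 - G**a + eps))
    return -ll.sum()

# Hypothetical usage on a synthetic positive sample:
x = weibull_min.rvs(1.5, scale=2.0, size=200, random_state=1)
res = minimize(kw_weibull_negloglik, x0=[1.0, 1.0, 1.0, 1.0], args=(x,),
               method="Nelder-Mead")
print(res.x)  # MLEs of (a, b, c, lambda)
```

Comparing maximized log-likelihoods (or AIC) across the candidate Weibull generalizations listed above is the usual basis for the "outperforms" claims in such studies.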


2018 ◽  
Vol 33 (2) ◽  
pp. 113-124
Author(s):  
K. K. Jose ◽  
Lishamol Tomy ◽  
Sophia P. Thomas

In this article, a generalization of the Weibull distribution called the Harris extended Weibull distribution is studied and its properties are discussed. We fit the distribution to a real-life data set to show its applicability in reliability modeling. We also derive a reliability test plan for the acceptance or rejection of a lot of products submitted for inspection when lifetimes follow this distribution. The operating characteristic functions of the sampling plans are obtained. The producer's risk, minimum sample sizes and associated characteristics are computed and presented in tables. The results are illustrated using two data sets on ordered failure times of products and on failure times of ball bearings.
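For a single-sampling attributes plan with sample size n and acceptance number c, the operating characteristic function takes the standard binomial form

```latex
L(p) = \sum_{i=0}^{c} \binom{n}{i}\, p^{i}\, (1-p)^{n-i},
```

where p is the probability of a failure before the test termination time under the Harris extended Weibull distribution. This binomial form is the standard construction for such plans; the paper's specific plan parameters and risks are given in its tables.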


2009 ◽  
Vol 21 (10) ◽  
pp. 2942-2969 ◽  
Author(s):  
Petra Schneider ◽  
Michael Biehl ◽  
Barbara Hammer

Discriminative vector quantization schemes such as learning vector quantization (LVQ) and extensions thereof offer efficient and intuitive classifiers based on the representation of classes by prototypes. The original methods, however, rely on the Euclidean distance corresponding to the assumption that the data can be represented by isotropic clusters. For this reason, extensions of the methods to more general metric structures have been proposed, such as relevance adaptation in generalized LVQ (GLVQ) and matrix learning in GLVQ. In these approaches, metric parameters are learned based on the given classification task such that a data-driven distance measure is found. In this letter, we consider full matrix adaptation in advanced LVQ schemes. In particular, we introduce matrix learning to a recent statistical formalization of LVQ, robust soft LVQ, and we compare the results on several artificial and real-life data sets to matrix learning in GLVQ, a derivation of LVQ-like learning based on a (heuristic) cost function. In all cases, matrix adaptation allows a significant improvement of the classification accuracy. Interestingly, however, the principled behavior of the models with respect to prototype locations and extracted matrix dimensions shows several characteristic differences depending on the data sets.
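Full matrix adaptation replaces the Euclidean distance with the adaptive quadratic form

```latex
d_{\Lambda}(\mathbf{x}, \mathbf{w}) = (\mathbf{x} - \mathbf{w})^{\top} \Lambda\, (\mathbf{x} - \mathbf{w}), \qquad \Lambda = \Omega^{\top} \Omega,
```

where the parametrization Λ = Ω^T Ω keeps Λ positive semidefinite, so d_Λ remains a valid (pseudo-)distance while Ω is learned from the classification task alongside the prototypes.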

