Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

Abstract The present work proposes hybridization of Expectation-Maximization (EM) and K-means techniques as an attempt to speed-up the clustering process. Even though both the K-means and EM techniques look into different areas, K-means can be viewed as an approximate way to obtain maximum likelihood estimates for the means. Along with the proposed algorithm for hybridization, the present work also experiments with the Standard EM algorithm. Six different datasets, three of which synthetic datasets, are used for the experiments. Clustering fitness and Sum of Squared Errors (SSE) are computed for measuring the clustering performance. In all the experiments it is observed that the proposed algorithm for hybridization of EM and K-means techniques is consistently taking less execution time with acceptable Clustering Fitness value and less SSE than the standard EM algorithm. It is also observed that the proposed algorithm is producing better clustering results than the Cluster package of Purdue University.

Download Full-text

Estimating the relative proportions of SARS-CoV-2 strains from wastewater samples

10.1101/2022.01.13.22269236 ◽

2022 ◽

Author(s):

Lenore Pipes ◽

Zihao Chen ◽

Svetlana Afanaseva ◽

Rasmus Nielsen

Keyword(s):

Maximum Likelihood ◽

Em Algorithm ◽

Expectation Maximization ◽

Initial Step ◽

Sequencing Depth ◽

Maximum Likelihood Estimates ◽

Reference Database ◽

Wastewater Samples ◽

Different Strains

Wastewater surveillance has become essential for monitoring the spread of SARS-CoV-2. The quantification of SARS-CoV-2 RNA in wastewater correlates with the Covid-19 caseload in a community. However, estimating the proportions of different SARS-CoV-2 strains has remained technically difficult. We present a method for estimating the relative proportions of SARS-CoV-2 strains from wastewater samples. The method uses an initial step to remove unlikely strains, imputation of missing nucleotides using the global SARS-CoV-2 phylogeny, and an Expectation-Maximization (EM) algorithm for obtaining maximum likelihood estimates of the proportions of different strains in a sample. Using simulations with a reference database of >3 million SARS-CoV-2 genomes, we show that the estimated proportions accurately reflect the true proportions given sufficiently high sequencing depth and that the phylogenetic imputation is highly accurate and substantially improves the reference database.

Download Full-text

A Novel Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

International Journal of Ambient Computing and Intelligence ◽

10.4018/ijaci.2016070103 ◽

2016 ◽

Vol 7 (2) ◽

pp. 47-74 ◽

Cited By ~ 17

Author(s):

Duggirala Raja Kishor ◽

N.B. Venkateswarlu

Keyword(s):

Em Algorithm ◽

Expectation Maximization ◽

Execution Time ◽

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Performance Criteria ◽

The Novel ◽

The Em Algorithm ◽

Synthetic Datasets

Expectation Maximization (EM) is a widely employed mixture model-based data clustering algorithm and produces exceptionally good results. However, many researchers reported that the EM algorithm requires huge computational efforts than other clustering algorithms. This paper presents an algorithm for the novel hybridization of EM and K-Means techniques for achieving better clustering performance (NovHbEMKM). This algorithm first performs K-Means and then using these results it performs EM and K-Means in the alternative iterations. Along with the NovHbEMKM, experiments are carried out with the algorithms for EM, EM using the results of K-Means and Cluster package of Purdue University. Experiments are carried out with datasets from UCI ML repository and synthetic datasets. Execution time, Clustering Fitness and Sum of Squared Errors (SSE) are computed as performance criteria. In all the experiments the proposed NovHbEMKM algorithm is taking less execution time by producing results with higher clustering fitness and lesser SSE than other algorithms including the Cluster package.

Download Full-text

An EM algorithm for obtaining maximum likelihood estimates in the multi-phenotype variance components linkage model

Annals of Human Genetics ◽

10.1046/j.1469-1809.2000.6440349.x ◽

2000 ◽

Vol 64 (4) ◽

pp. 349-362 ◽

Cited By ~ 10

Author(s):

S. J. ITURRIA ◽

J. BLANGERO

Keyword(s):

Maximum Likelihood ◽

Em Algorithm ◽

Variance Components ◽

Maximum Likelihood Estimates ◽

Linkage Model

Download Full-text

Finite mixture of compositional regression with gaussian errors

Revista Colombiana de Estadística ◽

10.15446/rce.v41n1.63152 ◽

2018 ◽

Vol 41 (1) ◽

pp. 75-86

Author(s):

Taciana Shimizu ◽

Francisco Louzada ◽

Adriano Suzuki

Keyword(s):

Maximum Likelihood ◽

Em Algorithm ◽

Regression Model ◽

Simulation Study ◽

Regression Models ◽

Finite Mixture ◽

Maximum Likelihood Estimates ◽

The Real ◽

Compositional Structure ◽

Volleyball Players

In this paper, we consider to evaluate the efficiency of volleyball players according to the performance of attack, block and serve, but considering the compositional structure of the data related to the fundaments. The finite mixture of regression models better fitted the data in comparison with the usual regression model. The maximum likelihood estimates are obtained via an EM algorithm. A simulation study revels that the estimates are closer to the real values, the estimators are asymptotically unbiased for the parameters. A real Brazilian volleyball dataset related to the efficiency of the players is considered for the analysis.

Download Full-text

Fast maximum likelihood estimation of parameters for square root and Bessel processes

Studies in Nonlinear Dynamics & Econometrics ◽

10.1515/snde-2019-0079 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Kevin Fergusson

Keyword(s):

Maximum Likelihood ◽

Interest Rate ◽

Likelihood Estimation ◽

Maximum Likelihood Estimates ◽

Square Root ◽

Bessel Processes ◽

Estimation Of Parameters ◽

Volatility Derivatives ◽

Speed Up ◽

Index Time Series

AbstractExplicit formulae for maximum likelihood estimates of the parameters of square root processes and Bessel processes and first and second order approximate sufficient statistics are supplied. Applications of the estimation formulae to simulated interest rate and index time series are supplied, demonstrating the accuracy of the approximations and the extreme speed-up in estimation time. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as in hedging interest rate, index and volatility derivatives based on such models, as well as modelling credit risk, mortality rates, population size and voting behaviour.

Download Full-text

Fast Computation of the EM Algorithm for Mixture Models

10.5772/intechopen.101249 ◽

2021 ◽

Author(s):

Masahiro Kuroda

Keyword(s):

Maximum Likelihood ◽

Em Algorithm ◽

Mixture Models ◽

Computation Time ◽

Likelihood Estimation ◽

Heterogeneous Data ◽

Maximum Likelihood Estimates ◽

The Em Algorithm ◽

Clustering And Classification ◽

Number Of Iterations

Mixture models become increasingly popular due to their modeling flexibility and are applied to the clustering and classification of heterogeneous data. The EM algorithm is largely used for the maximum likelihood estimation of mixture models because the algorithm is stable in convergence and simple in implementation. Despite such advantages, it is pointed out that the EM algorithm is local and has slow convergence as the main drawback. To avoid the local convergence of the EM algorithm, multiple runs from several different initial values are usually used. Then the algorithm may take a large number of iterations and long computation time to find the maximum likelihood estimates. The speedup of computation of the EM algorithm is available for these problems. We give the algorithms to accelerate the convergence of the EM algorithm and apply them to mixture model estimation. Numerical experiments examine the performance of the acceleration algorithms in terms of the number of iterations and computation time.

Download Full-text

Large-Sample Inference of EM Based on Maximum Likelihood Estimates

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1049-1050.1343 ◽

2014 ◽

Vol 1049-1050 ◽

pp. 1343-1346

Author(s):

Yong Li

Keyword(s):

Data Analysis ◽

Missing Data ◽

Maximum Likelihood ◽

Em Algorithm ◽

Information Matrix ◽

Maximum Likelihood Estimates ◽

Large Sample ◽

Missing Data Analysis ◽

And Inversion

EM algorithm is a very popular algorithm in missing data analysis. However,The variance of the estimator from EM is intractable. In this paper, we propose the supplemented EM algorithm for computing the variance that do not require computation and inversion of the information matrix.

Download Full-text

Asymptotic properties and expectation-maximization algorithm for maximum likelihood estimates of the parameters from Weibull-Logarithmic model

Applied Mathematics-A Journal of Chinese Universities ◽

10.1007/s11766-016-3391-2 ◽

2016 ◽

Vol 31 (4) ◽

pp. 425-438 ◽

Cited By ~ 3

Author(s):

Wen-hao Gui ◽

Huai-nian Zhang

Keyword(s):

Maximum Likelihood ◽

Expectation Maximization ◽

Asymptotic Properties ◽

Expectation Maximization Algorithm ◽

Maximum Likelihood Estimates ◽

Logarithmic Model

Download Full-text

Parameter Estimation of a Multistate Model for an Aging Piece of Equipment under Condition-Based Maintenance

Mathematical Problems in Engineering ◽

10.1155/2012/347675 ◽

2012 ◽

Vol 2012 ◽

pp. 1-19 ◽

Cited By ~ 2

Author(s):

Qihong Duan ◽

Xiang Chen ◽

Dengfu Zhao ◽

Zheng Zhao

Keyword(s):

Parameter Estimation ◽

Maximum Likelihood ◽

Model Calculation ◽

Expectation Maximization ◽

Expectation Maximization Algorithm ◽

Condition Based Maintenance ◽

Maximum Likelihood Estimates ◽

Model Parameters ◽

Simulation Studies ◽

Multistate Model

We study a multistate model for an aging piece of equipment under condition-based maintenance and apply an expectation maximization algorithm to obtain maximum likelihood estimates of the model parameters. Because of the monitoring discontinuity, we cannot observe any state's duration. The observation consists of the equipment's state at an inspection or right after a repair. Based on a proper construction of stochastic processes involved in the model, calculation of some probabilities and expectations becomes tractable. Using these probabilities and expectations, we can apply an expectation maximization algorithm to estimate the parameters in the model. We carry out simulation studies to test the accuracy and the efficiency of the algorithm.

Download Full-text

A Comparison of Maximum Likelihood Estimations for Normal Mean under Right Censoring

Calcutta Statistical Association Bulletin ◽

10.1177/0008068320976785 ◽

2020 ◽

Vol 72 (2) ◽

pp. 122-132

Author(s):

Junfeng Liu ◽

Xiaoxia Zhang

Keyword(s):

Maximum Likelihood ◽

Em Algorithm ◽

Expectation Maximization ◽

Asymptotic Variance ◽

Likelihood Estimation ◽

Right Censoring ◽

Log Likelihood ◽

Framework Approach ◽

Normal Mean ◽

Maximum Likelihood Estimations

For efficiently estimating the normal mean ([Formula: see text]) under right censoring (threshold =[Formula: see text], [Formula: see text] is known), we compare two approaches within the maximum likelihood estimation (MLE) framework. Approach I is a hierarchical MLE for which only the empirical censoring probability is utilized. Approach II is the direct MLE for which expectation-maximization (EM) algorithm is applied to all individual observations. We use discrete approximation to explain that the asymptotic variance of Approach II estimate equals the inverse Fisher information calculated from the full log-likelihood. We prove that Approach II gives a uniformly smaller asymptotic variance than Approach I and the variance ratio is a decreasing function of [Formula: see text]. We further prove some supportive results and graphically demonstrate that EM algorithm monotonically converges to the unique MLE.

Download Full-text