Hybridization of Expectation-Maximization and K-Means Algorithms for Better Clustering Performance

2016 ◽  
Vol 16 (2) ◽  
pp. 16-34 ◽  
Author(s):  
D. Raja Kishor ◽  
N. B. Venkateswarlu

The present work proposes a hybridization of the Expectation-Maximization (EM) and K-means techniques in an attempt to speed up the clustering process. Even though K-means and EM approach clustering differently, K-means can be viewed as an approximate way to obtain maximum likelihood estimates for the means. Along with the proposed hybridization algorithm, the present work also experiments with the standard EM algorithm. Six datasets, three of which are synthetic, are used for the experiments. Clustering Fitness and Sum of Squared Errors (SSE) are computed to measure clustering performance. In all experiments, the proposed hybrid of EM and K-means consistently takes less execution time, with acceptable Clustering Fitness and lower SSE, than the standard EM algorithm. The proposed algorithm also produces better clustering results than the Cluster package of Purdue University.
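
The hybrid idea can be sketched in one dimension: a cheap K-means pass supplies initial means, which EM then refines into a full Gaussian mixture. This is an illustrative sketch only (synthetic data, hypothetical function names), not the authors' implementation.

```python
import math, random

def kmeans_1d(xs, k=2, iters=10):
    """Plain K-means on scalars; returns the k cluster centers."""
    centers = random.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[j].append(x)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def em_gmm_1d(xs, means, iters=25):
    """EM for a 1-D Gaussian mixture, started from the given means."""
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w / math.sqrt(2 * math.pi * v)
                    * math.exp(-(x - m) ** 2 / (2 * v))
                    for w, m, v in zip(weights, means, variances)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: re-estimate weights, means, and variances
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj, 1e-6)
    return weights, means, variances

random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(6, 1) for _ in range(200)]
init = kmeans_1d(data)                   # cheap K-means pass first
w, m, v = em_gmm_1d(data, sorted(init))  # EM refines the K-means means
```

Because K-means already places the means near the cluster centers, EM needs far fewer of its expensive soft-assignment iterations, which is the source of the speed-up the abstract reports.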

2022 ◽  
Author(s):  
Lenore Pipes ◽  
Zihao Chen ◽  
Svetlana Afanaseva ◽  
Rasmus Nielsen

Wastewater surveillance has become essential for monitoring the spread of SARS-CoV-2. The quantification of SARS-CoV-2 RNA in wastewater correlates with the COVID-19 caseload in a community. However, estimating the proportions of different SARS-CoV-2 strains has remained technically difficult. We present a method for estimating the relative proportions of SARS-CoV-2 strains from wastewater samples. The method uses an initial step to remove unlikely strains, imputation of missing nucleotides using the global SARS-CoV-2 phylogeny, and an Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates of the proportions of different strains in a sample. Using simulations with a reference database of >3 million SARS-CoV-2 genomes, we show that the estimated proportions accurately reflect the true proportions given sufficiently high sequencing depth, and that the phylogenetic imputation is highly accurate and substantially improves the reference database.
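
The core EM step for mixture proportions is compact once per-read likelihoods are in hand. The sketch below is illustrative only: it assumes a precomputed read-to-strain likelihood matrix and omits the paper's strain-filtering and phylogenetic-imputation stages.

```python
def em_proportions(lik, iters=200):
    """EM for mixing proportions.

    lik[r][s] = likelihood of read r under strain s (assumed precomputed).
    Update: pi_s is the average posterior responsibility of strain s.
    """
    k = len(lik[0])
    pi = [1.0 / k] * k
    for _ in range(iters):
        new = [0.0] * k
        for row in lik:
            denom = sum(p * l for p, l in zip(pi, row))
            for s in range(k):
                new[s] += pi[s] * row[s] / denom  # E-step responsibility
        pi = [n / len(lik) for n in new]          # M-step: renormalize
    return pi

# Toy example: two strains, 100 reads; each read weakly favors one strain.
lik = [[0.9, 0.1]] * 75 + [[0.1, 0.9]] * 25
props = em_proportions(lik)
```

For this toy matrix the maximum likelihood proportion of strain 0 works out to 0.8125 (not simply 0.75, because each read only probabilistically identifies its strain), and the iteration converges there.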


2016 ◽  
Vol 7 (2) ◽  
pp. 47-74 ◽  
Author(s):  
Duggirala Raja Kishor ◽  
N.B. Venkateswarlu

Expectation-Maximization (EM) is a widely used mixture model-based data clustering algorithm that produces exceptionally good results. However, many researchers have reported that the EM algorithm requires far more computational effort than other clustering algorithms. This paper presents an algorithm for a novel hybridization of the EM and K-means techniques to achieve better clustering performance (NovHbEMKM). The algorithm first performs K-means and then, using those results, performs EM and K-means in alternating iterations. Along with NovHbEMKM, experiments are carried out with the standard EM algorithm, EM initialized with K-means results, and the Cluster package of Purdue University, using datasets from the UCI ML repository and synthetic datasets. Execution time, Clustering Fitness, and Sum of Squared Errors (SSE) are used as performance criteria. In all experiments, the proposed NovHbEMKM algorithm takes less execution time while producing results with higher Clustering Fitness and lower SSE than the other algorithms, including the Cluster package.
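
Of the two quality criteria, SSE is simple to state; a minimal sketch with hypothetical 1-D assignments follows (the paper's Clustering Fitness measure has its own definition and is not reproduced here):

```python
def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum((x - centroids[l]) ** 2 for x, l in zip(points, labels))

points = [0.0, 0.2, 5.0, 5.4]
labels = [0, 0, 1, 1]
cents = [0.1, 5.2]
# squared distances: 0.01 + 0.01 + 0.04 + 0.04, about 0.10 in total
total = sse(points, labels, cents)
```

Lower SSE means tighter clusters, which is why the abstracts report it alongside execution time.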


2018 ◽  
Vol 41 (1) ◽  
pp. 75-86
Author(s):  
Taciana Shimizu ◽  
Francisco Louzada ◽  
Adriano Suzuki

In this paper, we evaluate the efficiency of volleyball players according to their performance in attack, block, and serve, taking into account the compositional structure of the data related to these fundamentals. A finite mixture of regression models fits the data better than the usual regression model. The maximum likelihood estimates are obtained via an EM algorithm. A simulation study reveals that the estimates are close to the true values and that the estimators are asymptotically unbiased for the parameters. A real Brazilian volleyball dataset on player efficiency is considered for the analysis.
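
A finite mixture of regressions fitted by EM can be sketched in a few lines for the two-component, single-covariate case. This is a hedged toy (noiseless synthetic lines, starting values chosen near the truth), not the authors' model; a real analysis would use multiple starts.

```python
import math

def wls_line(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    W = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / W
    ybar = sum(w * y for w, y in zip(ws, ys)) / W
    b = (sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs)))
    return ybar - b * xbar, b

def em_mix_reg(xs, ys, params, iters=50):
    """EM for a mixture of linear regressions; params = [(a, b, var), ...]."""
    k = len(params)
    pis = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each regression line for each point
        resp = []
        for x, y in zip(xs, ys):
            dens = [pi / math.sqrt(2 * math.pi * v)
                    * math.exp(-(y - a - b * x) ** 2 / (2 * v))
                    for pi, (a, b, v) in zip(pis, params)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: weighted regression per component
        new = []
        for j in range(k):
            ws = [r[j] for r in resp]
            a, b = wls_line(xs, ys, ws)
            v = max(sum(w * (y - a - b * x) ** 2
                        for w, x, y in zip(ws, xs, ys)) / sum(ws), 1e-6)
            new.append((a, b, v))
            pis[j] = sum(ws) / len(xs)
        params = new
    return pis, params

# Two hidden regimes: y = 2x and y = 5 - x, observed without labels.
xs = [0.0, 0.5, 1.0, 1.5, 2.0] * 2
ys = [2 * x for x in xs[:5]] + [5 - x for x in xs[5:]]
pis, fits = em_mix_reg(xs, ys, [(0.0, 1.8, 0.25), (5.0, -0.8, 0.25)])
```

EM recovers the two lines without knowing which points came from which regime, which is exactly the latent-group structure the mixture-of-regressions model exploits.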


2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Kevin Fergusson

Explicit formulae for maximum likelihood estimates of the parameters of square root processes and Bessel processes, together with first- and second-order approximate sufficient statistics, are supplied. Applications of the estimation formulae to simulated interest rate and index time series demonstrate the accuracy of the approximations and the extreme speed-up in estimation time. This significantly improved run time for parameter estimation has many applications where ex-ante forecasts are required frequently and immediately, such as hedging interest rate, index and volatility derivatives based on such models, as well as modelling credit risk, mortality rates, population size and voting behaviour.
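
The paper's explicit MLE formulae are not reproduced here; as a hedged stand-in, the sketch below fits the square root (CIR-type) model dr = kappa*(theta - r) dt + sigma*sqrt(r) dW by ordinary least squares on its Euler discretization, a standard quasi-maximum-likelihood approximation that is likewise closed-form and fast.

```python
import math, random

def fit_cir(r, dt):
    """OLS on the Euler scheme: y = kappa*theta*x1 - kappa*x2 + noise,
    with y = (r[t+1]-r[t])/sqrt(r[t]), x1 = dt/sqrt(r[t]), x2 = dt*sqrt(r[t])."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(len(r) - 1):
        sq = math.sqrt(r[t])
        y, x1, x2 = (r[t + 1] - r[t]) / sq, dt / sq, dt * sq
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12          # solve the 2x2 normal equations
    c1 = (s22 * b1 - s12 * b2) / det
    c2 = (s11 * b2 - s12 * b1) / det
    kappa = -c2
    theta = c1 / kappa
    rss = sum(((r[t + 1] - r[t]) / math.sqrt(r[t])
               - c1 * dt / math.sqrt(r[t]) - c2 * dt * math.sqrt(r[t])) ** 2
              for t in range(len(r) - 1))
    sigma = math.sqrt(rss / ((len(r) - 1) * dt))
    return kappa, theta, sigma

# Simulate a path with known parameters, then recover them.
random.seed(7)
kappa, theta, sigma, dt = 1.0, 0.05, 0.1, 1.0 / 252
r = [theta]
for _ in range(20000):
    drift = kappa * (theta - r[-1]) * dt
    diff = sigma * math.sqrt(max(r[-1], 1e-12) * dt) * random.gauss(0, 1)
    r.append(max(r[-1] + drift + diff, 1e-12))
est_kappa, est_theta, est_sigma = fit_cir(r, dt)
```

As in the paper's experiments, the long-run mean and volatility are recovered very accurately, while the mean-reversion speed kappa is the statistically hardest parameter and carries the widest sampling error.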


2021 ◽  
Author(s):  
Masahiro Kuroda

Mixture models have become increasingly popular due to their modeling flexibility and are applied to the clustering and classification of heterogeneous data. The EM algorithm is widely used for the maximum likelihood estimation of mixture models because it is stable in convergence and simple to implement. Despite these advantages, its main drawbacks are convergence to local maxima and slow convergence. To avoid local convergence, multiple runs from several different initial values are usually used, and the algorithm may then take a large number of iterations and a long computation time to find the maximum likelihood estimates. Speeding up the computation of the EM algorithm addresses these problems. We give algorithms that accelerate the convergence of the EM algorithm and apply them to mixture model estimation. Numerical experiments examine the performance of the acceleration algorithms in terms of the number of iterations and computation time.
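
One classical accelerator of this kind is Aitken/Steffensen extrapolation, shown here on a scalar EM map: estimating the mixing weight p of a known two-component mixture. The EM update is a linearly convergent fixed-point map, and the delta-squared step roughly squares the convergence order. This is an illustrative sketch of one acceleration scheme, not the chapter's full algorithms.

```python
def em_update(p, lik):
    """One EM step for the mixing weight, given per-item likelihood pairs
    (likelihood under component A, likelihood under component B)."""
    return sum(p * a / (p * a + (1 - p) * b) for a, b in lik) / len(lik)

def accelerated_em(p, lik, iters=5):
    for _ in range(iters):
        p1 = em_update(p, lik)
        p2 = em_update(p1, lik)
        denom = p2 - 2 * p1 + p
        if abs(denom) < 1e-12:           # sequence already converged
            return p2
        p = p - (p1 - p) ** 2 / denom    # Aitken delta-squared step
    return p

# 60 items favoring component A, 40 favoring B (likelihoods 0.9 vs 0.1).
lik = [(0.9, 0.1)] * 60 + [(0.1, 0.9)] * 40
p_hat = accelerated_em(0.5, lik)  # MLE for this toy data is 0.625
```

Plain EM approaches 0.625 geometrically; the accelerated iteration reaches it to machine precision in a handful of steps, mirroring the iteration-count savings the numerical experiments measure.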


2014 ◽  
Vol 1049-1050 ◽  
pp. 1343-1346
Author(s):  
Yong Li

The EM algorithm is very popular in missing data analysis. However, the variance of the estimator from EM is intractable. In this paper, we propose the supplemented EM algorithm for computing this variance, which does not require computation and inversion of the information matrix.
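
The supplemented-EM idea (in the Meng-Rubin style) can be illustrated in the scalar case: the asymptotic variance is recovered from the complete-data information and the numerically differentiated EM map, with no direct computation or inversion of the observed information matrix. The mixing-weight example below is a hypothetical toy, not the paper's application.

```python
def em_update(p, lik):
    """One EM step for a two-component mixing weight p."""
    return sum(p * a / (p * a + (1 - p) * b) for a, b in lik) / len(lik)

def sem_variance(lik, p0=0.5, iters=100, h=1e-5):
    p = p0
    for _ in range(iters):             # run EM to convergence
        p = em_update(p, lik)
    # DM: slope of the EM map at the MLE, by central finite difference
    dm = (em_update(p + h, lik) - em_update(p - h, lik)) / (2 * h)
    # Complete-data information for a Bernoulli mixing weight
    i_com = len(lik) / (p * (1 - p))
    # Scalar SEM identity: observed information = i_com * (1 - DM)
    return p, 1.0 / (i_com * (1 - dm))

lik = [(0.9, 0.1)] * 70 + [(0.1, 0.9)] * 30
p_hat, var_hat = sem_variance(lik)
```

For this toy data the MLE is p = 0.75, and the SEM variance agrees with the inverse observed information computed directly from the log-likelihood, even though the code never forms that quantity.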


2012 ◽  
Vol 2012 ◽  
pp. 1-19 ◽  
Author(s):  
Qihong Duan ◽  
Xiang Chen ◽  
Dengfu Zhao ◽  
Zheng Zhao

We study a multistate model for an aging piece of equipment under condition-based maintenance and apply an expectation maximization algorithm to obtain maximum likelihood estimates of the model parameters. Because of the monitoring discontinuity, we cannot observe any state's duration. The observation consists of the equipment's state at an inspection or right after a repair. Based on a proper construction of stochastic processes involved in the model, calculation of some probabilities and expectations becomes tractable. Using these probabilities and expectations, we can apply an expectation maximization algorithm to estimate the parameters in the model. We carry out simulation studies to test the accuracy and the efficiency of the algorithm.
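
A toy analogue (not the paper's condition-based-maintenance model) shows the key difficulty and its EM remedy: a symmetric two-state Markov chain with stay-probability p whose state is seen only every second step, so intermediate states, like the unobserved durations above, must be averaged over.

```python
def em_stay_prob(n_same, n_diff, p=0.6, iters=200):
    """EM for the stay-probability p of a symmetric two-state chain
    observed only at every second step (the middle state is hidden)."""
    for _ in range(iters):
        q = 1 - p
        # E-step: expected "stay" transitions per observed pair of states.
        # Same endpoints: either stay-stay (prob p^2) or switch-switch (q^2).
        stays_same = 2 * p * p / (p * p + q * q)
        # Different endpoints: stay-switch or switch-stay, one stay each.
        stays_diff = 1.0
        # M-step: expected stays over total transitions (2 per pair).
        p = (n_same * stays_same + n_diff * stays_diff) / (2 * (n_same + n_diff))
    return p

# 68 of 100 inspection pairs show no net change; the MLE is p = 0.8.
p_hat = em_stay_prob(68, 32)
```

The E-step probabilities over the hidden middle state play the same role as the paper's tractable probabilities and expectations for unobserved sojourns between inspections.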


2020 ◽  
Vol 72 (2) ◽  
pp. 122-132
Author(s):  
Junfeng Liu ◽  
Xiaoxia Zhang

For efficiently estimating the normal mean ([Formula: see text]) under right censoring (threshold = [Formula: see text], [Formula: see text] is known), we compare two approaches within the maximum likelihood estimation (MLE) framework. Approach I is a hierarchical MLE in which only the empirical censoring probability is utilized. Approach II is the direct MLE, in which the expectation-maximization (EM) algorithm is applied to all individual observations. We use a discrete approximation to explain that the asymptotic variance of the Approach II estimate equals the inverse Fisher information calculated from the full log-likelihood. We prove that Approach II gives a uniformly smaller asymptotic variance than Approach I and that the variance ratio is a decreasing function of [Formula: see text]. We further prove some supportive results and graphically demonstrate that the EM algorithm monotonically converges to the unique MLE.
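
A minimal sketch of Approach II, assuming unit variance: each E-step imputes a right-censored value by the truncated-normal mean E[X | X > c], and the M-step re-averages. The data and names here are illustrative, not from the paper.

```python
import math, random

def norm_pdf(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def em_censored_mean(obs, n_cens, c, mu=0.0, iters=100):
    """EM for the mean of N(mu, 1) with n_cens values right-censored at c."""
    n = len(obs) + n_cens
    for _ in range(iters):
        a = c - mu
        # E-step: conditional mean of a censored value, E[X | X > c]
        imput = mu + norm_pdf(a) / (1 - norm_cdf(a))
        # M-step: ordinary average over observed and imputed values
        mu = (sum(obs) + n_cens * imput) / n
    return mu

random.seed(1)
xs = [random.gauss(1.0, 1.0) for _ in range(20000)]
c = 1.5
obs = [x for x in xs if x <= c]   # values below the threshold are observed
mu_hat = em_censored_mean(obs, len(xs) - len(obs), c)
```

Each iteration can only increase the observed-data likelihood, which is the monotone convergence to the unique MLE that the paper demonstrates graphically.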

