EM algorithms
Recently Published Documents

TOTAL DOCUMENTS: 129 (five years: 13)
H-INDEX: 25 (five years: 1)

Author(s): S. Nickolas, K. Shobha

Data pre-processing plays a vital role in the data mining life cycle for achieving quality outcomes. This paper experimentally demonstrates the importance of data pre-processing for highly accurate classifier outcomes by imputing missing values with a novel imputation method, CLUSTPRO, selecting highly correlated features with Correlation-based Variable Selection (CVS), and handling imbalanced data with the Synthetic Minority Over-sampling Technique (SMOTE). The proposed CLUSTPRO method uses Random Forest (RF) and Expectation Maximization (EM) algorithms to impute missing values. The imputed results are evaluated with standard evaluation metrics, and CLUSTPRO outperforms existing state-of-the-art imputation methods. The combined approach of imputation, feature selection, and imbalanced data handling yields an improvement in classification accuracy (AUC) of 40%–50% over results obtained without any pre-processing.
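A minimal sketch of this kind of pre-processing pipeline is shown below, using off-the-shelf scikit-learn and imbalanced-learn components as stand-ins: an RF-based iterative imputer approximates the RF/EM imputation step (CLUSTPRO itself is not described in code here), a simple correlation filter approximates CVS, and SMOTE rebalances the classes. The function name, threshold, and feature handling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from imblearn.over_sampling import SMOTE

def preprocess(X: pd.DataFrame, y: np.ndarray, corr_threshold: float = 0.9):
    # 1) Impute missing values with a Random-Forest-based iterative imputer
    #    (a stand-in for the RF/EM-based CLUSTPRO imputation).
    imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50),
                               max_iter=10, random_state=0)
    X_imp = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

    # 2) Correlation-based variable selection: drop one feature of every pair
    #    whose absolute pairwise correlation exceeds the threshold.
    corr = X_imp.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    X_sel = X_imp.drop(columns=drop)

    # 3) Handle class imbalance with SMOTE before training a classifier.
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_sel, y)
    return X_bal, y_bal
```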


2021, Vol 14 (2), pp. 4-12
Author(s): Svetlana Evdokimova, Aleksandr Zhuravlev, Tatyana Novikova

This paper analyzes the buyers of the BigCar store, which sells spare parts for trucks, using clustering methods. The k-means, g-means, and EM algorithms and the construction of Kohonen networks are considered; the Loginom Community analytical platform is used for their implementation. Based on three years of sales data, buyers are divided into 3 clusters by the k-means and EM algorithms and by a self-organizing Kohonen network. The EM algorithm with automatic determination of the number of clusters and the g-means algorithm were also applied, dividing buyers into 9 and 10 clusters, respectively. Analysis of the resulting clusters showed that the k-means and Kohonen results are better suited to increasing sales efficiency.
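The sketch below illustrates the fixed-k versus automatic-k comparison described above, using scikit-learn rather than the Loginom Community platform. The customer feature matrix (e.g. purchase frequency, total spend, recency over the three-year window) is a placeholder assumption, and BIC-based model selection stands in for the automatic determination of the number of clusters.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.random((500, 3))                 # placeholder customer features
X = StandardScaler().fit_transform(X)

# k-means with a fixed number of clusters (3, as in the paper).
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# EM for a Gaussian mixture; BIC-based model selection approximates
# the "automatic determination of the number of clusters".
candidates = [GaussianMixture(n_components=k, random_state=0).fit(X)
              for k in range(2, 11)]
best = min(candidates, key=lambda m: m.bic(X))
em_labels = best.predict(X)
print("EM selected", best.n_components, "clusters")
```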


2019, Vol 2019, pp. 1-10
Author(s): Yupeng Li, Jianhua Zhang, Ruisi He, Lei Tian, Hewen Wei

In this paper, the Gaussian mixture model (GMM) is introduced to channel multipath clustering. In the GMM framework, the expectation-maximization (EM) algorithm is usually used to estimate the model parameters, but EM often converges to a local optimum. To address this issue, a hybrid differential evolution and EM (DE-EM) algorithm is proposed. Specifically, DE is employed to initialize the GMM parameters, which are then refined with the EM algorithm. Thanks to the global search ability of DE, the hybrid DE-EM algorithm is more likely to reach the global optimum. Simulations demonstrate that the proposed DE-EM clustering algorithm significantly improves clustering performance.
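A self-contained sketch of the DE-then-EM idea follows: differential evolution searches for good initial component means (here by maximizing a simple equal-weight, unit-variance mixture likelihood, which is an assumption rather than the paper's exact objective), and scikit-learn's GaussianMixture EM then refines all GMM parameters from that starting point.

```python
import numpy as np
from scipy.optimize import differential_evolution
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data with three well-separated modes.
X = np.vstack([rng.normal(m, 0.5, size=(100, 2)) for m in ((0, 0), (3, 3), (0, 4))])
K, d = 3, X.shape[1]

def neg_loglik(flat_means):
    # Negative log-likelihood of an equal-weight, unit-variance GMM whose
    # component means are given by the DE candidate vector.
    means = flat_means.reshape(K, d)
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_comp = -0.5 * sq - 0.5 * d * np.log(2 * np.pi) - np.log(K)
    return -logsumexp(log_comp, axis=1).sum()

# Global search for initial means with differential evolution.
bounds = [(X.min(), X.max())] * (K * d)
result = differential_evolution(neg_loglik, bounds, seed=0, maxiter=100)

# Local refinement of all GMM parameters with EM, starting from the DE means.
gmm = GaussianMixture(n_components=K, means_init=result.x.reshape(K, d),
                      random_state=0).fit(X)
labels = gmm.predict(X)
```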


2019, Vol 11 (24), pp. 2994
Author(s): Naiallen Carolyne Rodrigues Lima Carvalho, Leonardo Sant’Anna Bins, Sidnei João Siqueira Sant’Anna

This paper addresses unsupervised classification strategies applied to Polarimetric Synthetic Aperture Radar (PolSAR) images. We analyze the performance of the complex Wishart distribution, a widely used model for multi-look PolSAR images, and the robustness of five stochastic distances (Bhattacharyya, Kullback-Leibler, Rényi, Hellinger, and Chi-square) between Wishart distributions. Two unsupervised classification strategies were chosen: the Stochastic Clustering (SC) algorithm, which is based on the K-means algorithm but uses a stochastic distance as the similarity metric, and the Expectation-Maximization (EM) algorithm for the Wishart Mixture Model. To assess the performance of all algorithms presented here, we performed a Monte Carlo simulation over a set of simulated PolSAR images. A second experiment was conducted in the study area of the Tapajós National Forest and its surroundings in the Brazilian Amazon Forest, with PolSAR images obtained by the PALSAR satellite. The results of both experiments suggest that the EM algorithm and SC with the Hellinger and Bhattacharyya distances provide better classification performance. We also analyze the initialization problem for the SC and EM algorithms and demonstrate how the initial centroid choice influences the final classification result.
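The sketch below shows the K-means-style Stochastic Clustering idea on PolSAR-like data: samples and centroids are small Hermitian covariance matrices, and assignment uses the symmetrized Kullback-Leibler distance between zero-mean complex Gaussian/Wishart models. This is one of the stochastic distances studied in the paper, but the look-number scaling and the authors' exact centroid update are omitted, so treat it as an illustrative assumption rather than the paper's algorithm.

```python
import numpy as np

def kl_distance(S1, S2):
    # Symmetrized Kullback-Leibler distance between zero-mean models with
    # covariance matrices S1 and S2 (log-det terms cancel in the symmetrization).
    p = S1.shape[0]
    return 0.5 * np.real(np.trace(np.linalg.solve(S1, S2) +
                                  np.linalg.solve(S2, S1))) - p

def stochastic_kmeans(covs, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = [covs[i] for i in rng.choice(len(covs), k, replace=False)]
    labels = np.zeros(len(covs), dtype=int)
    for _ in range(iters):
        # Assign each sample covariance matrix to the closest centroid
        # according to the stochastic distance.
        for i, C in enumerate(covs):
            labels[i] = int(np.argmin([kl_distance(C, Z) for Z in centroids]))
        # Update each centroid as the mean covariance of its cluster members.
        for j in range(k):
            members = [covs[i] for i in range(len(covs)) if labels[i] == j]
            if members:
                centroids[j] = np.mean(members, axis=0)
    return labels, centroids
```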

