The EM Algorithm
Recently Published Documents


TOTAL DOCUMENTS

794
(FIVE YEARS 37)

H-INDEX

61
(FIVE YEARS 0)

2021 ◽  
Vol 7 (4) ◽  
pp. 10-17
Author(s):  
M. Buranova ◽  
I Kartashevskiy

Accurate assessment of quality-of-service parameters in modern information and communication networks is an important task. This paper proposes the use of hyperexponential distributions to approximate an arbitrary probability density in the G/G/1 system, for the case where approximation by a system of the type H2/H2/1 is assumed. To determine the parameters of the hyperexponential probability density, we propose using the EM algorithm, which provides a fairly simple procedure for uncorrelated flows. We then present a variant of the EM algorithm for determining the parameters of the hyperexponential distribution when the analyzed flow exhibits correlation.
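For the uncorrelated case, the EM updates for a two-phase hyperexponential (H2) density have simple closed forms. The sketch below is an illustrative numpy implementation for i.i.d. samples only — not the correlated-flow variant the paper develops — and the quantile-based initialization of the rates is our own assumption:

```python
import numpy as np

def em_hyperexponential(x, n_iter=200):
    """Fit a two-phase hyperexponential (H2) density
    f(x) = p1*l1*exp(-l1*x) + p2*l2*exp(-l2*x) to i.i.d. samples by EM."""
    x = np.asarray(x, dtype=float)
    p = np.array([0.5, 0.5])
    lam = 1.0 / np.quantile(x, [0.25, 0.75])   # crude spread of starting rates
    for _ in range(n_iter):
        # E-step: responsibility of each phase for each sample
        dens = p * lam * np.exp(-np.outer(x, lam))      # shape (n, 2)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of the weights and rates
        p = r.mean(axis=0)
        lam = r.sum(axis=0) / (r * x[:, None]).sum(axis=0)
    return p, lam
```

At an EM fixed point the fitted mixture mean `(p / lam).sum()` matches the sample mean, which gives a quick sanity check on a fit.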



2021 ◽  
Author(s):  
Masahiro Kuroda

Mixture models have become increasingly popular due to their modeling flexibility and are applied to the clustering and classification of heterogeneous data. The EM algorithm is widely used for maximum likelihood estimation of mixture models because it is stable in convergence and simple to implement. Despite these advantages, its main drawbacks are that it converges only locally and that convergence can be slow. To avoid local convergence, multiple runs from several different initial values are usually used, and the algorithm may then take a large number of iterations and a long computation time to find the maximum likelihood estimates. Speeding up the computation of the EM algorithm addresses these problems. We give algorithms that accelerate the convergence of the EM algorithm and apply them to mixture model estimation. Numerical experiments examine the performance of the acceleration algorithms in terms of the number of iterations and computation time.
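As a minimal illustration of EM acceleration (the paper's own acceleration algorithms may differ), Aitken's delta-squared extrapolation can be applied to the scalar EM iterates of the classic genetic linkage example of Dempster, Laird, and Rubin, whose MLE is known to be roughly 0.6268:

```python
def em_update(theta, y=(125, 18, 20, 34)):
    """One EM step for the classic genetic linkage data (Dempster et al., 1977):
    multinomial cell probabilities (1/2 + t/4, (1-t)/4, (1-t)/4, t/4)."""
    y1, y2, y3, y4 = y
    x = y1 * (theta / 4) / (0.5 + theta / 4)   # E-step: expected split of cell 1
    return (x + y4) / (x + y2 + y3 + y4)       # M-step: binomial MLE

def aitken(seq):
    """Aitken delta-squared extrapolation of the last three scalar EM iterates."""
    p0, p1, p2 = seq[-3:]
    return p2 - (p2 - p1) ** 2 / ((p2 - p1) - (p1 - p0))
```

Three EM steps from 0.5 plus a single extrapolation land closer to the MLE than the plain iterates, showing how extrapolation trades a little algebra for many EM iterations.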



Author(s):  
Jakob Raymaekers ◽  
Peter Rousseeuw

We propose a data-analytic method for detecting cellwise outliers. Given a robust covariance matrix, outlying cells (entries) in a row are found by the cellFlagger technique, which combines lasso regression with a stepwise application of constructed cutoff values. The penalty term of the lasso has a physical interpretation as the total distance that suspicious cells need to move in order to bring their row into the fold. For estimating a cellwise robust covariance matrix, we construct a detection-imputation method that alternates between flagging outlying cells and updating the covariance matrix, as in the EM algorithm. The proposed methods are illustrated by simulations and on real data about volatile organic compounds in children.
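A much-simplified caricature of such a detection-imputation alternation (without the lasso-based cellFlagger and without the robust initial covariance that the paper requires) flags cells with large standardized residuals and replaces them by their Gaussian conditional means:

```python
import numpy as np

def detect_impute(X0, cutoff=3.0, n_iter=5):
    """Toy detection-imputation loop: alternately flag cells whose standardized
    residual exceeds `cutoff` and replace them by their Gaussian conditional
    mean given the row's unflagged cells."""
    X0 = np.asarray(X0, dtype=float)
    X = X0.copy()
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        sd = np.sqrt(np.diag(cov))
        flags = np.abs(X0 - mu) / sd > cutoff      # flag against the raw data
        X = X0.copy()
        for i in range(len(X)):
            f = flags[i]
            if not f.any():
                continue
            if f.all():
                X[i] = mu                          # no clean cells left in this row
            else:
                o = ~f
                beta = cov[np.ix_(f, o)] @ np.linalg.pinv(cov[np.ix_(o, o)])
                X[i, f] = mu[f] + beta @ (X0[i, o] - mu[o])
    return X, flags
```

The conditional-mean step is what exploits correlation between columns: a flagged cell is pulled toward the value its row's clean cells predict, not merely toward the column mean.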



Author(s):  
Lu Zhao ◽  
Xiaowei Xu ◽  
Runping Hou ◽  
Wangyuan Zhao ◽  
Hai Zhong ◽  
...  

Abstract Subtype classification plays a guiding role in the clinical diagnosis and treatment of non-small-cell lung cancer (NSCLC). However, due to the gigapixel scale of whole slide images (WSIs) and the absence of definitive morphological features, most automatic subtype classification methods for NSCLC require manually delineated regions of interest (ROIs) on WSIs. In this paper, a weakly supervised framework is proposed for accurate subtype classification while freeing pathologists from pixel-level annotation. Reflecting the characteristics of histopathological images, we design a two-stage structure with ROI localization and subtype classification. We first develop a method called MR-EM-CNN (multi-resolution expectation-maximization convolutional neural network) to locate ROIs for subsequent subtype classification. The EM algorithm is introduced to select discriminative image patches for training a patch-wise network, with only WSI-wise labels available. A multi-resolution mechanism is designed for fine localization, similar to the coarse-to-fine process of manual pathological analysis. In the second stage, we build a novel hierarchical attention multi-scale network (HMS) for subtype classification. HMS can flexibly capture multi-scale features driven by the attention module and implements hierarchical feature interaction. Experiments on the 1002-patient Cancer Genome Atlas dataset achieved an AUC of 0.9602 for ROI localization and an AUC of 0.9671 for subtype classification. The proposed method shows superiority over other algorithms in the subtype classification of NSCLC, and the framework can also be extended to other classification tasks with WSIs.
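The EM-based patch selection can be sketched in miniature with a linear classifier standing in for the patch-wise CNN; the warm start, the top-k selection rule, and the plain logistic regression here are our illustrative assumptions, not details of MR-EM-CNN:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def em_patch_selection(bags, labels, top_k=3, n_rounds=5, lr=0.5, epochs=200):
    """EM-style discriminative patch selection for weakly labeled bags.
    E-step: keep the top-k highest-scoring patches of each positive bag.
    M-step: refit the patch classifier on the kept patches with bag labels."""
    d = bags[0].shape[1]
    w, b = np.zeros(d), 0.0
    for t in range(n_rounds):
        Xs, ys = [], []
        for B, y in zip(bags, labels):
            if y == 1 and t > 0:
                # E-step: select the most discriminative patches
                idx = np.argsort(B @ w + b)[-top_k:]
            else:
                idx = np.arange(len(B))   # warm start / negative bags: use all
            Xs.append(B[idx])
            ys.append(np.full(len(idx), y, dtype=float))
        X, y = np.vstack(Xs), np.concatenate(ys)
        # M-step: logistic regression by plain gradient descent
        for _ in range(epochs):
            g = sigmoid(X @ w + b) - y
            w -= lr * (X.T @ g) / len(y)
            b -= lr * g.mean()
    return w, b
```

Only bag-level (WSI-wise) labels enter the training loop; the per-patch supervision is manufactured by the E-step, which is the core weakly supervised idea.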



2021 ◽  
Author(s):  
◽  
Yuki Fujita

<p>The goal of this research is to investigate associations between the presence of fish species, space, and time in a selected set of areas in New Zealand waters. In particular, we use fish abundance indices on the Chatham Rise from scientific surveys in 2002, 2011, 2012, and 2013. The data are collected in annual bottom trawl surveys carried out by the National Institute of Water and Atmospheric Research (NIWA). This research applies clustering via finite mixture models, which gives a likelihood-based foundation for the analysis. We use the methods developed by Pledger and Arnold (2014) to cluster species into common groups, conditional on the measured covariates (body size, depth, and water temperature). This project is the first to apply these methods incorporating covariates, and we use simple binary presence/absence data rather than abundances. The models are fitted using the Expectation-Maximization (EM) algorithm. The performance of the models is evaluated by a simulation study. We discuss the advantages and the disadvantages of the EM algorithm. We then introduce clustglm (Pledger et al., 2015), a newly developed R function that implements this clustering methodology, and use it to analyse the real-life presence/absence data. The results are analysed and interpreted from a biological point of view. We present a variety of visualisations of the models to assist in their interpretation. We found that depth is the most important factor in explaining the data.</p>
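A minimal version of such a model — EM for a finite mixture of independent Bernoullis over binary presence/absence profiles, without the covariates that clustglm supports — can be sketched as:

```python
import numpy as np

def em_bernoulli_mixture(X, k=2, n_iter=100, seed=0):
    """EM for a finite mixture of independent Bernoullis: cluster species
    (rows) from binary presence/absence profiles (columns = sites)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(k, 1.0 / k)
    theta = rng.uniform(0.25, 0.75, size=(k, d))   # per-cluster presence probs
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        logp = np.log(w) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted proportions, clipped away from 0 and 1
        nk = r.sum(axis=0)
        w = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return w, theta, r
```

The responsibilities `r` give a soft clustering of species; taking their argmax yields hard group assignments.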





2021 ◽  
Author(s):  
◽  
Faezeh Frouzesh

<p>The use of mixture models in statistical analysis is increasing for datasets with heterogeneity and/or redundancy in the data. They are likelihood-based models, and maximum likelihood estimates of the parameters are obtained using the expectation-maximization (EM) algorithm. Because the likelihood surface is multi-modal, the EM algorithm is highly dependent on its starting points, and poorly chosen initial points may lead only to a local maximum rather than the global maximum. In this thesis, different methods of choosing initial points for the EM algorithm are evaluated, and two procedures that make intelligent choices of possible starting points and fast evaluations of their usefulness are presented. Furthermore, several approaches to selecting the best-fitting model from a set of models for a given dataset are investigated, and some lemmas and theorems are presented to illustrate the information criteria. This work introduces two novel heuristic methods for choosing the best starting points for the EM algorithm, named the Combined method and Hybrid PSO (Particle Swarm Optimisation). The Combined method combines two clustering methods and finds better starting points for the EM algorithm than the other initialisation methods compared. Hybrid PSO couples Particle Swarm Optimisation, as a global optimisation approach, with the EM algorithm as a local search, making the result independent of the starting points. Finally, Hybrid PSO is compared with different methods of choosing starting points for the EM algorithm.</p>
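One simple "intelligent choice" strategy in this spirit — not the thesis's Combined method or Hybrid PSO — is the short-run EM (emEM) idea: score many random starts with a few cheap EM steps, then run full EM only from the most promising start. A 1-D Gaussian-mixture sketch:

```python
import numpy as np

def em_step(x, w, mu, var):
    """One EM iteration for a 1-D Gaussian mixture; returns updated parameters
    and the log-likelihood of the parameters it started from."""
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    loglik = np.log(dens.sum(axis=1)).sum()
    r = dens / dens.sum(axis=1, keepdims=True)
    nk = r.sum(axis=0)
    w = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = np.maximum((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return w, mu, var, loglik

def short_run_em(x, k=2, n_candidates=20, short_steps=5, long_steps=100, seed=0):
    """'emEM': score many random starts with a few cheap EM steps, then run
    full EM only from the most promising candidate."""
    rng = np.random.default_rng(seed)
    best, best_ll = None, -np.inf
    for _ in range(n_candidates):
        w, mu, var = np.full(k, 1.0 / k), rng.choice(x, k, replace=False), np.full(k, x.var())
        for _ in range(short_steps):
            w, mu, var, ll = em_step(x, w, mu, var)
        if ll > best_ll:
            best, best_ll = (w, mu, var), ll
    w, mu, var = best
    for _ in range(long_steps):
        w, mu, var, ll = em_step(x, w, mu, var)
    return w, mu, var
```

Most of the compute goes into the one promising run rather than being spread evenly over all restarts, which is the point of making an intelligent initial choice.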





Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2164
Author(s):  
Héctor J. Gómez ◽  
Diego I. Gallardo ◽  
Karol I. Santoro

In this paper, we present an extension of the truncated positive normal (TPN) distribution to model positive data with high kurtosis. The new model is defined as the quotient of two random variables: a TPN-distributed numerator and a power of a standard uniform distribution in the denominator. The resulting model has greater kurtosis than the TPN distribution. We study some properties of the distribution, such as its moments, asymmetry, and kurtosis. Parameter estimation is based on the method of moments, and maximum likelihood estimation uses the expectation-maximization algorithm. We performed simulation studies to assess parameter recovery and illustrate the model with a real data application related to body weight. The computational implementation of this work is included in the tpn package for R.
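The construction can be simulated directly. The sketch below assumes the slash-type parameterization Z = X / U^(1/q), with X truncated positive normal and U standard uniform (the paper's exact power parameterization may differ), and checks empirically that the quotient has heavier tails than the TPN itself:

```python
import numpy as np

def rtpn(n, mu, sigma, rng):
    """Sample the truncated positive normal (TPN) by simple rejection."""
    out = np.empty(0)
    while out.size < n:
        draw = rng.normal(mu, sigma, 2 * n)
        out = np.concatenate([out, draw[draw > 0]])
    return out[:n]

def rslash_tpn(n, mu, sigma, q, rng):
    """Assumed slash-type quotient Z = X / U**(1/q): a TPN numerator over a
    power of a standard uniform, which thickens the right tail."""
    x = rtpn(n, mu, sigma, rng)
    u = rng.random(n)
    return x / u ** (1.0 / q)

def kurtosis(z):
    """Plain (non-excess) sample kurtosis."""
    z = z - z.mean()
    return (z ** 4).mean() / (z ** 2).mean() ** 2
```

Since E[U^(-m/q)] = q/(q-m) for m < q, the quotient inflates higher-order moments more than lower-order ones, which is why the kurtosis rises above that of the TPN.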



Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2834
Author(s):  
José Antonio Roldán-Nofuentes ◽  
Saad Bouh Regad

The average kappa coefficient of a binary diagnostic test is a parameter that measures the average beyond-chance agreement between the diagnostic test and the gold standard. This parameter depends on the accuracy of the diagnostic test and on the disease prevalence. This article studies the comparison of the average kappa coefficients of two binary diagnostic tests when the gold standard is not applied to all individuals in a random sample. In this situation, known as partial disease verification, the disease status of some individuals is missing. Assuming that the missing-data mechanism is missing at random, the comparison of the average kappa coefficients is solved by applying two computational methods: the EM algorithm and the SEM algorithm. The parameters are estimated with the EM algorithm, and their variances and covariances with the SEM algorithm. Simulation experiments were carried out to study the sizes and powers of the hypothesis tests, showing that the proposed method has good asymptotic behavior. A function has been written in R to solve the proposed problem, and the results have been applied to the diagnosis of Alzheimer's disease.
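Under the MAR assumption, the E-step distributes each unverified count across disease statuses according to the current conditional probabilities P(D | T1, T2), and the M-step renormalizes the completed table. A sketch for the joint cell probabilities of two tests (the kappa-coefficient comparison itself and the SEM variance step are omitted):

```python
import numpy as np

def em_partial_verification(n_ver, n_unver, n_iter=200):
    """EM for partial disease verification under MAR: n_ver[t1, t2, d] are
    counts with verified disease status d; n_unver[t1, t2] lack the gold
    standard. Returns estimated joint probabilities p[t1, t2, d]."""
    p = (n_ver + 0.5) / (n_ver + 0.5).sum()            # smoothed start
    for _ in range(n_iter):
        # E-step: split unverified counts by the current P(D | T1, T2)
        cond = p / p.sum(axis=2, keepdims=True)
        filled = n_ver + n_unver[:, :, None] * cond
        # M-step: renormalize the completed table
        p = filled / filled.sum()
    return p
```

Because verification is allowed to depend on the test results but not on the disease status, the completed-table estimates remain consistent even when positives are verified far more often than negatives.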


