finite mixture models
Recently Published Documents


2021, Vol. 2021, pp. 1-12
Author(s): A. S. Al-Moisheer

Finite mixture models provide a flexible tool for handling heterogeneous data. This paper introduces a new mixture model, the mixture of Lindley and lognormal distributions (MLLND). First, the model is formulated and some of its statistical properties are studied. Next, maximum likelihood estimation of the model parameters is considered, and the performance of the estimators is evaluated via simulation. The flexibility of the proposed mixture distribution is also demonstrated by showing that it fits a well-known real data set of 128 bladder cancer patients better than several mixture and nonmixture distributions. The Kolmogorov-Smirnov test and several information criteria are used to compare the models fitted to the real data set. Finally, the results are verified using several graphical methods.
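To make the model concrete, here is a minimal Python sketch of a two-component Lindley-lognormal mixture: its density, CDF, the log-likelihood that maximum likelihood estimation would maximize, and the one-sample Kolmogorov-Smirnov statistic used to compare fits. It assumes the standard one-parameter Lindley density theta^2/(theta+1) * (1+x) * exp(-theta*x) and a mixing weight `p` on the Lindley component; the paper's exact parameterization may differ, and all function names, parameter values, and the sample are hypothetical.

```python
import math


def lindley_pdf(x, theta):
    # One-parameter Lindley density for x > 0.
    return theta ** 2 / (theta + 1) * (1 + x) * math.exp(-theta * x)


def lindley_cdf(x, theta):
    return 1 - (1 + theta * x / (theta + 1)) * math.exp(-theta * x)


def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))


def mllnd_pdf(x, p, theta, mu, sigma):
    # Assumed parameterization: weight p on Lindley, 1 - p on lognormal.
    return p * lindley_pdf(x, theta) + (1 - p) * lognormal_pdf(x, mu, sigma)


def mllnd_cdf(x, p, theta, mu, sigma):
    return p * lindley_cdf(x, theta) + (1 - p) * lognormal_cdf(x, mu, sigma)


def log_likelihood(data, p, theta, mu, sigma):
    # The quantity an ML estimator would maximize over (p, theta, mu, sigma).
    return sum(math.log(mllnd_pdf(x, p, theta, mu, sigma)) for x in data)


def ks_statistic(data, cdf):
    # One-sample Kolmogorov-Smirnov distance between the empirical CDF and `cdf`.
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    params = (0.4, 0.5, 1.5, 0.8)  # hypothetical (p, theta, mu, sigma)
    sample = [0.8, 2.1, 3.5, 4.9, 7.3, 10.2, 15.6]  # hypothetical, not the bladder cancer data
    print(log_likelihood(sample, *params))
    print(ks_statistic(sample, lambda x: mllnd_cdf(x, *params)))
```

In practice the log-likelihood would be maximized numerically, for example by passing its negative to `scipy.optimize.minimize`, with `p` constrained to (0, 1) and `theta`, `sigma` kept positive.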


2021, pp. 0193841X2110656
Author(s): Zachary K. Collier, Haobai Zhang, Bridgette Johnson

Background: Finite mixture models cluster individuals into latent subgroups based on observed traits. However, inaccurate enumeration of clusters can have lasting implications for policy decisions and resource allocation. Applied and methodological researchers agree on no single best model fit statistic, and different measures can suggest different numbers of latent clusters. Objectives: The purpose of this article is to evaluate and compare different cluster enumeration techniques. Research Design: Study I shows that recently proposed resampling methods do not yield a single number of clusters on which all fit statistics agree. As an alternative, we recommend the pre-processing method in Study II. Both studies used nationally representative data on working memory, cognitive flexibility, and inhibitory control. Conclusions: The data plus priors method shows promise for resolving inconsistencies among fit measures and for helping applied researchers who use finite mixture models.
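The enumeration problem can be illustrated with a small sketch: fit one-dimensional Gaussian mixtures with K = 1 to 4 components by EM and report AIC and BIC for each K. This is a generic illustration, not the resampling or data plus priors methods the article evaluates; the synthetic data, the quantile-based initialisation, the iteration count, and the function names are assumptions. On real data the two criteria can favour different K, which is the ambiguity the article addresses.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(data, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM; return the final log-likelihood."""
    n = len(data)
    xs = sorted(data)
    mean_all = sum(data) / n
    sd_all = max(1e-3, math.sqrt(sum((x - mean_all) ** 2 for x in data) / n))
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # start at evenly spaced quantiles
    sds = [sd_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior membership probabilities, computed in log space.
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = logsumexp(logs)
            resp.append([math.exp(v - total) for v in logs])
        # M-step: weighted updates of mixing weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))
    return sum(
        logsumexp([math.log(w) + log_normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)])
        for x in data
    )


def information_criteria(loglik, k, n):
    n_params = 3 * k - 1  # k means, k sds, k - 1 free mixing weights
    return {
        "AIC": -2 * loglik + 2 * n_params,
        "BIC": -2 * loglik + n_params * math.log(n),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data drawn from two well-separated normal subgroups.
    data = [rng.gauss(0, 1) for _ in range(150)] + [rng.gauss(5, 1) for _ in range(150)]
    for k in range(1, 5):
        ic = information_criteria(fit_gmm_1d(data, k), k, len(data))
        print(k, round(ic["AIC"], 1), round(ic["BIC"], 1))
```

BIC penalizes each extra parameter by log(n) rather than 2, so with larger samples it tends to favour fewer clusters than AIC.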


2021, pp. 105713
Author(s): Robert B. Durand, William H. Greene, Mark N. Harris, Joye Khoo

PLoS ONE, 2021, Vol. 16 (11), pp. e0260748
Author(s): Ibrahim Al-Sumaih, Michael Donnelly, Ciaran O’Neill

Background: Serum 25(OH)D recorded in survey data varies with both observed and unobserved respondent characteristics. The aim of this study was to identify latent population subgroups and to examine how the relationships between serum 25(OH)D and observable characteristics vary across these groups. Methods: This study explored the role of unobserved heterogeneity in the associations between surveyed 25(OH)D and various factors, using a sample (n = 2,641) drawn from the Saudi Health Interview Survey (2013). Linear regression and finite mixture models (FMM) were estimated and compared, and the number of latent classes in the FMM was chosen by BIC. Result: Three latent classes were identified: class I (39.82%), class II (41.03%), and class III (19.15%), with mean 25(OH)D levels of 22.79, 34.88, and 57.45 ng/ml, respectively. Distinct patterns of association with nutrition, behaviour, and socio-demographic variables were found across classes; these were not revealed by the pooled linear regression. Conclusion: FMM has the potential to provide additional insight into the relationship between 25(OH)D levels and observable characteristics, and it should be considered more widely as a method of investigation in this area.
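A finite mixture of linear regressions, in which each latent class has its own intercept, slope, and error variance, is the kind of model that lets associations differ across classes in a way a pooled regression cannot. Below is a minimal EM sketch with a single covariate that picks the number of classes by BIC, as the study did. The single covariate, the synthetic data, and the function names are assumptions for illustration; the study's actual specification and software are not described in the abstract.

```python
import math
import random


def log_normal_pdf(y, mean, sd):
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def fit_mixture_regression(x, y, k, iters=100):
    """EM for a k-class mixture of simple regressions y = a_j + b_j * x + error_j."""
    n = len(y)
    ys = sorted(y)
    a = [ys[int((j + 0.5) * n / k)] for j in range(k)]  # intercepts start at quantiles of y
    b = [0.0] * k
    s = [max(1e-3, (ys[-1] - ys[0]) / 4)] * k
    w = [1.0 / k] * k
    for _ in range(iters):
        # E-step: class-membership probabilities for each observation.
        resp = []
        for xi, yi in zip(x, y):
            logs = [math.log(w[j]) + log_normal_pdf(yi, a[j] + b[j] * xi, s[j]) for j in range(k)]
            top = max(logs)
            e = [math.exp(v - top) for v in logs]
            tot = sum(e)
            resp.append([v / tot for v in e])
        # M-step: weighted least squares within each class.
        for j in range(k):
            r = [ri[j] for ri in resp]
            nj = sum(r) + 1e-10
            w[j] = nj / n
            mx = sum(ri * xi for ri, xi in zip(r, x)) / nj
            my = sum(ri * yi for ri, yi in zip(r, y)) / nj
            sxx = sum(ri * (xi - mx) ** 2 for ri, xi in zip(r, x))
            sxy = sum(ri * (xi - mx) * (yi - my) for ri, xi, yi in zip(r, x, y))
            b[j] = sxy / sxx if sxx > 1e-12 else 0.0
            a[j] = my - b[j] * mx
            var = sum(ri * (yi - a[j] - b[j] * xi) ** 2 for ri, xi, yi in zip(r, x, y)) / nj
            s[j] = math.sqrt(max(var, 1e-6))
    loglik = 0.0
    for xi, yi in zip(x, y):
        logs = [math.log(w[j]) + log_normal_pdf(yi, a[j] + b[j] * xi, s[j]) for j in range(k)]
        top = max(logs)
        loglik += top + math.log(sum(math.exp(v - top) for v in logs))
    n_params = 4 * k - 1  # intercept, slope, sd per class + (k - 1) weights
    return {"weights": w, "intercepts": a, "slopes": b, "loglik": loglik,
            "bic": -2 * loglik + n_params * math.log(n)}


if __name__ == "__main__":
    rng = random.Random(7)
    x = [rng.uniform(0, 10) for _ in range(400)]
    # Hypothetical data: two latent classes with different intercepts and slopes.
    y = [(1 + 0.3 * xi if i % 2 else 8 + 0.8 * xi) + rng.gauss(0, 0.5) for i, xi in enumerate(x)]
    fits = {k: fit_mixture_regression(x, y, k) for k in (1, 2, 3)}
    best_k = min(fits, key=lambda k: fits[k]["bic"])
    print(best_k, [round(v, 2) for v in fits[best_k]["slopes"]])
```

With K = 1 the fit reduces to ordinary least squares, which is the pooled regression the study compares against.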


2021
Author(s): Yuki Fujita

<p>The goal of this research is to investigate associations between the presence of fish species, space, and time in a selected set of areas in New Zealand waters. In particular, we use fish abundance indices on the Chatham Rise from scientific surveys in 2002, 2011, 2012, and 2013. The data were collected in annual bottom trawl surveys carried out by the National Institute of Water and Atmospheric Research (NIWA). This research applies clustering via finite mixture models, which provide a likelihood-based foundation for the analysis. We use the methods developed by Pledger and Arnold (2014) to cluster species into common groups, conditional on the measured covariates (body size, depth, and water temperature). This project is the first to apply these methods with covariates, and we use simple binary presence/absence data rather than abundances. The models are fitted using the Expectation-Maximization (EM) algorithm, and their performance is evaluated by a simulation study. We discuss the advantages and disadvantages of the EM algorithm. We then introduce clustglm (Pledger et al., 2015), a newly developed R function that implements this clustering methodology, and use it to analyse the real-life presence/absence data. The results are analysed and interpreted from a biological point of view, and a variety of visualisations of the models is presented to assist in their interpretation. We found that depth is the most important factor in explaining the data.</p>
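As a rough illustration of clustering presence/absence data with a finite mixture fitted by EM, the sketch below groups the rows (species) of a binary matrix into clusters that share column-specific presence probabilities. It is a deliberately simplified latent-class model without covariates, not the Pledger and Arnold (2014) formulation and not the clustglm interface; the data, the hypothetical function `fit_bernoulli_mixture`, and the farthest-first initialisation are assumptions.

```python
import math
import random


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_bernoulli_mixture(Y, k, iters=100, eps=1e-6):
    """Cluster the rows of a 0/1 matrix Y into k groups by EM.

    Returns (mixing weights, per-cluster column probabilities, row memberships).
    """
    n, m = len(Y), len(Y[0])
    # Farthest-first initialisation: seed clusters with mutually distant rows.
    seeds = [0]
    while len(seeds) < k:
        def hamming_to_seeds(i):
            return min(sum(u != v for u, v in zip(Y[i], Y[c])) for c in seeds)
        seeds.append(max(range(n), key=hamming_to_seeds))
    theta = [[0.25 + 0.5 * v for v in Y[c]] for c in seeds]
    pi = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability that each row belongs to each cluster.
        z = []
        for row in Y:
            logs = [math.log(pi[r]) + sum(math.log(t if y else 1 - t) for y, t in zip(row, theta[r]))
                    for r in range(k)]
            total = _logsumexp(logs)
            z.append([math.exp(v - total) for v in logs])
        # M-step: update mixing weights and clipped column presence probabilities.
        for r in range(k):
            nr = sum(zi[r] for zi in z) + 1e-10
            pi[r] = nr / n
            theta[r] = [min(1 - eps, max(eps, sum(zi[r] * row[j] for zi, row in zip(z, Y)) / nr))
                        for j in range(m)]
    return pi, theta, z


if __name__ == "__main__":
    rng = random.Random(3)
    # Hypothetical data: 10 "species" mostly present in the first 15 samples, 10 in the last 15.
    Y = [[int(rng.random() < (0.9 if (j < 15) == (i < 10) else 0.1)) for j in range(30)]
         for i in range(20)]
    pi, theta, z = fit_bernoulli_mixture(Y, 2)
    print([max(range(2), key=lambda r: zi[r]) for zi in z])
```

The memberships `z` are soft (fuzzy); a hard clustering, as printed above, assigns each row to its most probable cluster.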


2021
Author(s): Daniel Fernández Martínez

<p>Many of the methods for reducing the dimensionality of data matrices are based on mathematical techniques. In general, these techniques do not allow statistical inference or the selection of an appropriate model via information criteria, because there is no underlying probability model. Furthermore, ordinal data are very common (e.g. Likert or Braun-Blanquet scales), yet the clustering methods in common use treat ordered categorical variables as nominal or continuous rather than as truly ordinal. Recently, a group of likelihood-based finite mixture models for binary or count data has been developed (Pledger and Arnold, 2014). This thesis extends this idea and establishes novel likelihood-based multivariate methods for data reduction of a matrix containing ordinal data. The new approach applies fuzzy clustering via finite mixtures to the ordered stereotype model (Fernández et al., 2014a). Fuzzy allocation of rows and columns to the corresponding clusters is achieved with the EM algorithm, and Bayesian model fitting is obtained with a reversible jump MCMC sampler. The performance of the two approaches for one-dimensional clustering is compared. Simulation studies and three real data sets are used to illustrate these approaches and to present novel data visualisation tools that depict the fuzziness of the clustering results for ordinal data. Additionally, a simulation study is set up to establish empirically a relationship between our likelihood-based methodology and the performance of eleven information criteria in common use. Finally, on the same data set, clusterings obtained by treating the data as counts and by categorising them as ordinal are compared, and the results are analysed and presented.</p>
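For readers unfamiliar with the ordered stereotype model, the sketch below computes its category probabilities and the fuzzy (posterior) cluster memberships of one row under a row-clustering version of the model. It assumes the commonly used form log(P(y = k) / P(y = 1)) = mu_k + phi_k * (alpha_r + beta_j), with mu_1 = 0 and 0 = phi_1 <= phi_2 <= ... <= phi_q = 1; the parameter values and function names are hypothetical, and parameter estimation (EM or reversible jump MCMC) is not shown.

```python
import math


def stereotype_probs(mu, phi, eta):
    """Category probabilities P(y = 1..q) under the ordered stereotype model.

    Assumes mu[0] == 0 and 0 == phi[0] <= phi[1] <= ... <= phi[-1] == 1,
    so category 1 is the baseline.
    """
    logits = [m + f * eta for m, f in zip(mu, phi)]
    top = max(logits)
    expd = [math.exp(v - top) for v in logits]
    total = sum(expd)
    return [v / total for v in expd]


def fuzzy_memberships(row, pi, alpha, beta, mu, phi):
    """Posterior probability that `row` (categories coded 1..q) belongs to each row cluster."""
    logs = []
    for r in range(len(pi)):
        lp = math.log(pi[r])
        for j, y in enumerate(row):
            # Linear predictor combines the row-cluster effect and the column effect.
            lp += math.log(stereotype_probs(mu, phi, alpha[r] + beta[j])[y - 1])
        logs.append(lp)
    top = max(logs)
    expd = [math.exp(v - top) for v in logs]
    total = sum(expd)
    return [v / total for v in expd]


if __name__ == "__main__":
    mu = [0.0, 0.5, 0.2, -0.5]   # hypothetical intercepts, q = 4 categories
    phi = [0.0, 0.4, 0.7, 1.0]   # ordered score parameters
    alpha = [-1.5, 1.5]          # two row clusters (sum to zero)
    beta = [0.3, -0.3, 0.0]      # three column effects (sum to zero)
    print(fuzzy_memberships([4, 4, 3], [0.5, 0.5], alpha, beta, mu, phi))
```

Because the phi scores are ordered, a larger linear predictor shifts probability towards higher categories, so a row of mostly high responses receives more membership in the cluster with the larger alpha.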

