scholarly journals Constrained parsimonious model-based clustering

2021 ◽  
Vol 32 (1) ◽  
Author(s):  
Luis A. García-Escudero ◽  
Agustín Mayo-Iscar ◽  
Marco Riani

AbstractA new methodology for constrained parsimonious model-based clustering is introduced, where some tuning parameter allows to control the strength of these constraints. The methodology includes the 14 parsimonious models that are often applied in model-based clustering when assuming normal components as limit cases. This is done in a natural way by filling the gap among models and providing a smooth transition among them. The methodology provides mathematically well-defined problems and is also useful to prevent us from obtaining spurious solutions. Novel information criteria are proposed to help the user in choosing parameters. The interest of the proposed methodology is illustrated through simulation studies and a real-data application on COVID data.

2012 ◽  
Vol 51 (02) ◽  
pp. 178-186 ◽  
Author(s):  
B. Hofner ◽  
M. Schmid ◽  
A. Mayr

SummaryObjectives: Component-wise boosting algorithms have evolved into a popular estimation scheme in biomedical regression settings. The iteration number of these algorithms is the most important tuning parameter to optimize their performance. To date, no fully automated strategy for determining the optimal stopping iteration of boosting algorithms has been proposed.Methods: We propose a fully data-driven sequential stopping rule for boosting algorithms. It combines resampling methods with a modified version of an earlier stopping approach that depends on AIC-based information criteria. The new “subsampling after AIC” stopping rule is applied to component-wise gradient boosting algorithms.Results: The newly developed sequential stopping rule outperformed earlier approaches if applied to both simulated and real data. Specifically, it improved purely AIC-based methods when used for the microarray-based prediction of the recurrence of meta-stases for stage II colon cancer patients.Conclusions: The proposed sequential stopping rule for boosting algorithms can help to identify the optimal stopping iteration already during the fitting process of the algorithm, at least for the most common loss functions.


2021 ◽  
Vol 16 ◽  
Author(s):  
Zhaoyang Liu ◽  
Hongsheng Yin ◽  
Shutao Chen ◽  
Hui Liu ◽  
Jia Meng ◽  
...  

Background: m6A methylation is a ubiquitous post-transcriptional modification that exists in mammals. MeRIP-seq technology makes the acquisition of m6A data in the whole transcriptome under different conditions realizable. The specific regulation of the enzyme will present co-methylation module on m6A methylation level data. Thus, mining the co-methylation module from which can help to unveil the mechanism of m<sup>6</sup>A methylation modification and its mechanism in the occurrence and development of complex diseases such as cancer. Objective: To develop a clustering algorithm that can effectively realize the mining of m6 co-methylation module. Method: In this study, a novel beta mixture model-based clustering algorithm named MBMM was proposed, which is based on the EM framework and introduces the method of moment estimating in M-step for parameter estimation to tackle the high-dimensional small sample m6A data. Simulation research was employed to evaluate the clustering performance of the proposed algorithm, and by which the co-methylation module mining was done based on real data. Biological significance correlation analysis was employed to explore whether the clustering results are co-methylation modules. Results and Conclusion: Simulation research demonstrated that MBMM performed out than other clustering algorithms. In real data, seven co-methylation modules were found by MBMM. Six m6A-related pathways specific analysis showed that six co-methylation modules were enriched in the pathway and were different. Five enzymes substrate-specific analysis revealed that seven co-methylation modules expressed varying degrees of enrichment. Gene Ontology enrichment analysis indicated that these modules may be regulated by enzymes while having potential functional specificity.


Mathematics ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 597 ◽  
Author(s):  
Vincent Vandewalle

In model based clustering, it is often supposed that only one clustering latent variable explains the heterogeneity of the whole dataset. However, in many cases several latent variables could explain the heterogeneity of the data at hand. Finding such class variables could result in a richer interpretation of the data. In the continuous data setting, a multi-partition model based clustering is proposed. It assumes the existence of several latent clustering variables, each one explaining the heterogeneity of the data with respect to some clustering subspace. It allows to simultaneously find the multi-partitions and the related subspaces. Parameters of the model are estimated through an EM algorithm relying on a probabilistic reinterpretation of the factorial discriminant analysis. A model choice strategy relying on the BIC criterion is proposed to select to number of subspaces and the number of clusters by subspace. The obtained results are thus several projections of the data, each one conveying its own clustering of the data. Model’s behavior is illustrated on simulated and real data.


Author(s):  
Charles Bouveyron ◽  
Gilles Celeux ◽  
T. Brendan Murphy ◽  
Adrian E. Raftery

2020 ◽  
pp. 509-529
Author(s):  
G.J. McLachlan ◽  
S.I. Rathnayake ◽  
S.X. Lee

Sign in / Sign up

Export Citation Format

Share Document