2017, Vol 29 (4), pp. 990-1020
Author(s): Hien D. Nguyen, Geoffrey J. McLachlan, Pierre Orban, Pierre Bellec, Andrew L. Janke

Mixture of autoregressions (MoAR) models provide a model-based approach to the clustering of time series data. The maximum likelihood (ML) estimation of MoAR models requires evaluating products of large numbers of densities of normal random variables. In practical scenarios, these products converge to zero as the length of the time series increases, and thus the ML estimation of MoAR models becomes infeasible without the use of numerical tricks. We propose a maximum pseudolikelihood (MPL) estimation approach as an alternative to the use of numerical tricks. The MPL estimator is proved to be consistent and can be computed with an EM (expectation-maximization) algorithm. Simulations are used to assess the performance of the MPL estimator against that of the ML estimator in cases where the latter could be computed. An application to the clustering of time series data arising from a resting state fMRI experiment is presented as a demonstration of the methodology.
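
To make the numerical problem concrete, the sketch below compares a raw product of normal densities with the equivalent sum of log densities, and shows the log-sum-exp trick typically used to combine mixture components. It is a hypothetical illustration, not the paper's MPL estimator: each component is assumed to be a Gaussian AR(1) model conditioned on the first observation, and the function names and the sine-wave series are made up for the example.

```python
import math

# Hypothetical sketch: each MoAR component is assumed to be a Gaussian AR(1)
# model, y_t = phi * y_{t-1} + e_t with e_t ~ N(0, sd^2), conditioning on y_0.


def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def ar1_loglik(series, phi, sd):
    """Conditional log-likelihood of one AR(1) component: a sum of log densities."""
    return sum(normal_logpdf(y, phi * y_prev, sd) for y_prev, y in zip(series, series[1:]))


def ar1_lik_naive(series, phi, sd):
    """The same likelihood as a raw product of densities (prone to underflow)."""
    product = 1.0
    for y_prev, y in zip(series, series[1:]):
        product *= math.exp(normal_logpdf(y, phi * y_prev, sd))
    return product


def mixture_loglik(series, weights, phis, sds):
    """Log-likelihood of one series under a mixture of AR(1) components.

    Log-sum-exp trick: subtract the largest component term before
    exponentiating, so the sum over components never underflows to zero.
    """
    terms = [math.log(w) + ar1_loglik(series, phi, sd)
             for w, phi, sd in zip(weights, phis, sds)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


series = [math.sin(0.3 * t) for t in range(1000)]
print(ar1_lik_naive(series, 0.5, 1.0))  # 0.0: the product has underflowed
print(ar1_loglik(series, 0.5, 1.0))     # finite and negative
print(mixture_loglik(series, [0.4, 0.6], [0.5, 0.9], [1.0, 1.0]))
```

With sd = 1 every density is at most 1/sqrt(2π) ≈ 0.399, so a product of 999 such terms is far below the smallest positive double; the log-domain version stays finite at any series length.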


2008, Vol 26 (1), pp. 78-89
Author(s): Sylvia Frühwirth-Schnatter, Sylvia Kaufmann

2021, pp. 133-178
Author(s): Magy Seif El-Nasr, Truong Huy Nguyen Dinh, Alessandro Canossa, Anders Drachen

This chapter discusses different clustering methods and their application to game data. In particular, it details K-means, Fuzzy C-Means, Hierarchical Clustering, Archetypal Analysis, and Model-based clustering techniques. It discusses the advantages and disadvantages of each method and when to use one method over another. It also presents ways to visualize the results to make sense of the resulting clusters, and explains how to evaluate such clusters and how to apply the algorithms to a game dataset. The chapter includes labs that delve deeper into the application of these algorithms to real game data.
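
As a minimal sketch of the first method the chapter lists, the code below implements Lloyd's algorithm for K-means in plain Python. It is not the chapter's lab code: the function names, the random initialization from k data points, and the toy "player feature" vectors are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Returns (labels, centroids). Initial centroids are k distinct points
    drawn at random; the seed makes a run reproducible.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are fixed too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


# Made-up 2-D "player feature" vectors forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the result depends on the initial centroids, practical implementations usually restart from several seeds (or use a smarter initialization) and keep the solution with the lowest within-cluster sum of squares.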


2020, Vol 13 (2), pp. 178-187
Author(s): Farzane Ahmadi, Ali-Reza Abadi, Zahra Bazi, Abolfazl Movafagh

Background: Aging is an organized biological process that is regulated by highly interconnected pathways between different cells and tissues in the living organism. Identifying genes that behave similarly across tissues at different ages may help to uncover the general mechanisms of aging or to support more effective therapeutic decisions. Objective: Given the wide application of model-based clustering techniques, the aim is to compare the performance of the Mixture of Multivariate Normal Distributions (MMNDs) with that of the Mixture of Matrix-Variate Normal Distributions (MMVNDs) for clustering time series gene expression data. Methods: In this study, aging-related expression data from NCBI's Gene Expression Omnibus were processed to obtain suitable data. A set of common genes that were differentially expressed between different tissues was selected and then clustered using the two methods. Finally, the biological significance of the clusters was evaluated with Enricher. Results: MMVNDs were more effective at finding co-expressed genes. Six clusters of genes were identified using MMVNDs. According to the functional analysis, most genes in clusters 1 to 6 are related, respectively, to the B-cell receptor and IgG immunoglobulin complex, the proliferating cell nuclear antigen complex, the metabolic pathways of iron, fat, and body mass control, defense against bacteria, cancer development, and chronic kidney failure. Conclusion: The results showed that most aging-related biological changes between tissues are related to specific components of immune cells. In addition, applying MMVNDs can improve the ability to find similar genes.
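
For reference, a mixture of matrix-variate normal distributions can be written in its standard textbook form. The notation here is an assumption, since the abstract does not give the paper's parameterization: X is an n × p observation matrix, and component g has mixing weight π_g, mean matrix M_g, n × n row covariance U_g and p × p column covariance V_g.

```latex
f(\mathbf{X}) = \sum_{g=1}^{G} \pi_g \, \phi(\mathbf{X}; \mathbf{M}_g, \mathbf{U}_g, \mathbf{V}_g),
\qquad
\phi(\mathbf{X}; \mathbf{M}, \mathbf{U}, \mathbf{V}) =
\frac{\exp\!\left\{-\tfrac{1}{2}\operatorname{tr}\!\left[\mathbf{V}^{-1}(\mathbf{X}-\mathbf{M})^{\top}\mathbf{U}^{-1}(\mathbf{X}-\mathbf{M})\right]\right\}}
{(2\pi)^{np/2}\,|\mathbf{V}|^{n/2}\,|\mathbf{U}|^{p/2}}
```

Equivalently, vec(X) ~ N_{np}(vec(M_g), V_g ⊗ U_g) within component g. Each matrix-variate component is therefore a multivariate normal on vec(X) whose covariance is restricted to Kronecker form. It needs n(n+1)/2 + p(p+1)/2 covariance parameters (one fewer after fixing the scale ambiguity between U_g and V_g), compared with np(np+1)/2 for an unrestricted multivariate normal on vec(X).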


Author(s): Siva Rajesh Kasa, Sakyajit Bhattacharya, Vaibhav Rajan

Motivation: The identification of sub-populations of patients with similar characteristics, called patient subtyping, is important for realizing the goals of precision medicine. Accurate subtyping is crucial for tailoring therapeutic strategies that can potentially reduce mortality and morbidity. Model-based clustering, such as Gaussian mixture models, provides a principled and interpretable methodology that is widely used to identify subtypes. However, such models impose identical marginal distributions on each variable; these assumptions restrict their modeling flexibility and degrade clustering performance. Results: In this paper, we use the statistical framework of copulas to decouple the modeling of the marginals from the modeling of the dependencies between them. Current copula-based methods cannot scale to high dimensions because of challenges in parameter inference. We develop HD-GMCM, which addresses these challenges and, to our knowledge, is the first copula-based clustering method that can fit high-dimensional data. Our experiments on real high-dimensional gene-expression and clinical datasets show that HD-GMCM outperforms state-of-the-art model-based clustering methods, by virtue of modeling non-Gaussian data and being robust to outliers through the use of Gaussian mixture copulas. We present a case study on lung cancer data from TCGA. Clusters obtained from HD-GMCM can be interpreted in terms of the dependencies they model, which offers a new way of characterizing subtypes. Empirically, such modeling not only uncovers latent structure that leads to better clustering but also yields clinically meaningful subtypes in terms of patient survival rates. Availability and implementation: An implementation of HD-GMCM in R is available at: https://bitbucket.org/cdal/hdgmcm/. Supplementary information: Supplementary data are available at Bioinformatics online.
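
The idea of decoupling marginals from dependencies can be illustrated with the simplest copula construction. The sketch below maps each variable to standard-normal scores through its ranks and then measures dependence on those scores. This is the rank-based step of a plain Gaussian copula, not HD-GMCM, which uses Gaussian mixture copulas and its own inference scheme. The function names and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(column):
    """Map a sample to standard-normal scores through its ranks.

    Ranks are rescaled to r / (n + 1) so inv_cdf never sees 0 or 1; ties are
    broken by position, which is adequate for a sketch.
    """
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    std_normal = NormalDist()
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def gaussian_copula_correlation(rows):
    """Correlation matrix of the normal scores of each column (variable)."""
    columns = [normal_scores(list(col)) for col in zip(*rows)]
    p = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(p)] for i in range(p)]


x = [0.3, 1.2, -0.5, 2.0, 0.9, -1.4]
y = [math.exp(v) for v in x]  # a strictly increasing transform of x
rows = list(zip(x, y))
print(normal_scores(x) == normal_scores(y))  # True: the marginal change is invisible
print(gaussian_copula_correlation(rows)[0][1])
```

Because the scores depend only on ranks, any strictly increasing change to a marginal (here, exponentiating a variable) leaves the estimated dependence untouched. The marginals can then be modeled separately, which is the flexibility the abstract attributes to copulas.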

