The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering

Journal of Classification ◽

10.1007/s00357-009-9037-9 ◽

2009 ◽

Vol 26 (3) ◽

pp. 249-277 ◽

Author(s):

Fionn Murtagh

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Model Based ◽

Data Application ◽

Download Full-text

Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA

Advances in Data Analysis and Classification ◽

10.1007/s11634-013-0133-7 ◽

2013 ◽

Vol 7 (3) ◽

pp. 281-300 ◽

Author(s):

Anastasios Bellas ◽

Charles Bouveyron ◽

Marie Cottrell ◽

Jérôme Lacaille

Keyword(s):

Data Streams ◽

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Probabilistic Pca ◽

Download Full-text

Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data

10.1101/2019.12.13.876326 ◽

2019 ◽

Author(s):

Siva Rajesh Kasa ◽

Vaibhav Rajan

Keyword(s):

Automatic Differentiation ◽

High Dimensional Data ◽

Penalized Likelihood ◽

Gaussian Mixture ◽

High Dimensional ◽

Adjusted Rand Index ◽

Penalty Term ◽

Model Based Clustering ◽

Model Based ◽

Leibler Divergence

AbstractWe study two practically important cases of model based clustering using Gaussian Mixture Models: (1) when there is misspecification and (2) on high dimensional data, in the light of recent advances in Gradient Descent (GD) based optimization using Automatic Differentiation (AD). Our simulation studies show that EM has better clustering performance, measured by Adjusted Rand Index, compared to GD in cases of misspecification, whereas on high dimensional data GD outperforms EM. We observe that both with EM and GD there are many solutions with high likelihood but poor cluster interpretation. To address this problem we design a new penalty term for the likelihood based on the Kullback Leibler divergence between pairs of fitted components. Closed form expressions for the gradients of this penalized likelihood are difficult to derive but AD can be done effortlessly, illustrating the advantage of AD-based optimization. Extensions of this penalty for high dimensional data and for model selection are discussed. Numerical experiments on synthetic and real datasets demonstrate the efficacy of clustering using the proposed penalized likelihood approach.

Download Full-text

Model-based clustering of high-dimensional data: A review

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2012.12.008 ◽

2014 ◽

Vol 71 ◽

pp. 52-78 ◽

Author(s):

Charles Bouveyron ◽

Camille Brunet-Saumard

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Download Full-text

Model-based clustering of high-dimensional data: Variable selection versus facet determination

International Journal of Approximate Reasoning ◽

10.1016/j.ijar.2012.08.001 ◽

2013 ◽

Vol 54 (1) ◽

pp. 196-215 ◽

Author(s):

Leonard K.M. Poon ◽

Nevin L. Zhang ◽

Tengfei Liu ◽

April H. Liu

Keyword(s):

Variable Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Download Full-text

Model-based regression clustering for high-dimensional data: application to functional data

Advances in Data Analysis and Classification ◽

10.1007/s11634-016-0242-1 ◽

2016 ◽

Vol 11 (2) ◽

pp. 243-279 ◽

Author(s):

Emilie Devijver

Keyword(s):

Functional Data ◽

High Dimensional Data ◽

High Dimensional ◽

Model Based ◽

Data Application

Download Full-text

HDclassif: AnRPackage for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data

Journal of Statistical Software ◽

10.18637/jss.v046.i06 ◽

2012 ◽

Vol 46 (6) ◽

Author(s):

Laurent Bergé ◽

Charles Bouveyron ◽

Stéphane Girard

Keyword(s):

Discriminant Analysis ◽

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Download Full-text

Model-based Clustering of High-Dimensional Data in Astrophysics

EAS Publications Series ◽

10.1051/eas/1677006 ◽

2016 ◽

Vol 77 ◽

pp. 91-119 ◽

Author(s):

C. Bouveyron

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Download Full-text

Feature selection algorithms for very high dimensional data and mixed data

10.32657/10356/41404 ◽

2008 ◽

Author(s):

Wen Yin Tang

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Algorithms ◽

Download Full-text

Extensions to Quantile Regression Forests for Very High-Dimensional Data

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-319-06605-9_21 ◽

2014 ◽

pp. 247-258 ◽

Author(s):

Nguyen Thanh Tung ◽

Joshua Zhexue Huang ◽

Imran Khan ◽

Mark Junjie Li ◽

Graham Williams

Keyword(s):

Quantile Regression ◽

High Dimensional Data ◽

High Dimensional ◽

Download Full-text

Optimal properties of centroid-based classifiers for very high-dimensional data

The Annals of Statistics ◽

10.1214/09-aos736 ◽

2010 ◽

Vol 38 (2) ◽

pp. 1071-1093 ◽

Author(s):

Peter Hall ◽

Tung Pham

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Download Full-text