Model-based regression clustering for high-dimensional data: application to functional data

Classification and prediction problems using spectral data lead to high-dimensional data sets. Spectral data are, however, different from most other high-dimensional data sets in that information usually varies smoothly with wavelength, suggesting that fitted models should also vary smoothly with wavelength. Functional data analysis, widely used in the analysis of spectral data, meets this objective by changing perspective from the raw spectra to approximations using smooth basis functions. This paper explores linear regression and linear discriminant analysis fitted directly to the spectral data, imposing penalties on the values and roughness of the fitted coefficients, and shows by example that this can lead to better fits than existing standard methodologies.

Download Full-text

Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA

Advances in Data Analysis and Classification ◽

10.1007/s11634-013-0133-7 ◽

2013 ◽

Vol 7 (3) ◽

pp. 281-300 ◽

Cited By ~ 7

Author(s):

Anastasios Bellas ◽

Charles Bouveyron ◽

Marie Cottrell ◽

Jérôme Lacaille

Keyword(s):

Data Streams ◽

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Probabilistic Pca ◽

Model Based

Download Full-text

Relevant Attribute Discovery in High Dimensional Data: Application to Breast Cancer Gene Expressions

Rough Sets and Knowledge Technology - Lecture Notes in Computer Science ◽

10.1007/11795131_70 ◽

2006 ◽

pp. 482-489 ◽

Cited By ~ 11

Author(s):

Julio J. Valdés ◽

Alan J. Barton

Keyword(s):

Breast Cancer ◽

High Dimensional Data ◽

High Dimensional ◽

Cancer Gene ◽

Gene Expressions ◽

Data Application ◽

Relevant Attribute ◽

Breast Cancer Gene

Download Full-text

Variable selection for partially varying coefficient model based on modal regression under high dimensional data

Communication in Statistics- Theory and Methods ◽

10.1080/03610926.2020.1747081 ◽

2020 ◽

pp. 1-17

Author(s):

Yafeng Xia ◽

Lirong Zhang ◽

Aiping Zhang

Keyword(s):

Variable Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Varying Coefficient Model ◽

Varying Coefficient ◽

Model Based ◽

Modal Regression ◽

Selection For

Download Full-text

A Fuzzy Deep Model Based on Fuzzy Restricted Boltzmann Machines for High-dimensional Data Classification

IEEE Transactions on Fuzzy Systems ◽

10.1109/tfuzz.2019.2902111 ◽

2019 ◽

pp. 1-1 ◽

Cited By ~ 6

Author(s):

Shuang Feng ◽

C. L. Philip Chen ◽

Chun-Yang Zhang

Keyword(s):

High Dimensional Data ◽

Data Classification ◽

High Dimensional ◽

Restricted Boltzmann Machines ◽

Boltzmann Machines ◽

Model Based ◽

Deep Model

Download Full-text

Group classification based on high-dimensional data: application to differential scanning calorimetry plasma thermogram analysis of cervical cancer and control samples

Open Access Medical Statistics ◽

10.2147/oams.s40069 ◽

2013 ◽

pp. 1 ◽

Cited By ~ 5

Author(s):

Shesh Rai ◽

Pan ◽

Cambon ◽

Chaires ◽

Garbett

Keyword(s):

Cervical Cancer ◽

Differential Scanning Calorimetry ◽

High Dimensional Data ◽

Group Classification ◽

High Dimensional ◽

Scanning Calorimetry ◽

Data Application ◽

And Control ◽

Control Samples

Download Full-text

Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

The Annals of Applied Statistics ◽

10.1214/09-aoas279 ◽

2010 ◽

Vol 4 (1) ◽

pp. 396-421 ◽

Cited By ~ 28

Author(s):

Thomas Brendan Murphy ◽

Nema Dean ◽

Adrian E. Raftery

Keyword(s):

Discriminant Analysis ◽

Variable Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Food Authenticity ◽

Model Based

Download Full-text

Model-based Clustering using Automatic Differentiation: Confronting Misspecification and High-Dimensional Data

10.1101/2019.12.13.876326 ◽

2019 ◽

Author(s):

Siva Rajesh Kasa ◽

Vaibhav Rajan

Keyword(s):

Automatic Differentiation ◽

High Dimensional Data ◽

Penalized Likelihood ◽

Gaussian Mixture ◽

High Dimensional ◽

Adjusted Rand Index ◽

Penalty Term ◽

Model Based Clustering ◽

Model Based ◽

Leibler Divergence

AbstractWe study two practically important cases of model based clustering using Gaussian Mixture Models: (1) when there is misspecification and (2) on high dimensional data, in the light of recent advances in Gradient Descent (GD) based optimization using Automatic Differentiation (AD). Our simulation studies show that EM has better clustering performance, measured by Adjusted Rand Index, compared to GD in cases of misspecification, whereas on high dimensional data GD outperforms EM. We observe that both with EM and GD there are many solutions with high likelihood but poor cluster interpretation. To address this problem we design a new penalty term for the likelihood based on the Kullback Leibler divergence between pairs of fitted components. Closed form expressions for the gradients of this penalized likelihood are difficult to derive but AD can be done effortlessly, illustrating the advantage of AD-based optimization. Extensions of this penalty for high dimensional data and for model selection are discussed. Numerical experiments on synthetic and real datasets demonstrate the efficacy of clustering using the proposed penalized likelihood approach.

Download Full-text

Model-based clustering of high-dimensional data: A review

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2012.12.008 ◽

2014 ◽

Vol 71 ◽

pp. 52-78 ◽

Cited By ~ 155

Author(s):

Charles Bouveyron ◽

Camille Brunet-Saumard

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Model Based Clustering ◽

Model Based

Download Full-text