Top-down mining of frequent closed patterns from very high dimensional data

Information Sciences

10.1016/j.ins.2008.11.033

2009

Vol 179 (7)

pp. 899-924

Author(s):

Hongyan Liu

Xiaoyu Wang

Jun He

Jiawei Han

Dong Xin

...

Keyword(s):

High Dimensional Data

High Dimensional

Top Down

Mining Frequent Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach

Proceedings of the 2006 SIAM International Conference on Data Mining

10.1137/1.9781611972764.25

2006

Author(s):

Hongyan Liu

Jiawei Han

Dong Xin

Zheng Shao

Keyword(s):

High Dimensional Data

High Dimensional

Frequent Patterns

Top Down

Top-Down Mining of Interesting Patterns from Very High Dimensional Data

22nd International Conference on Data Engineering (ICDE'06)

10.1109/icde.2006.161

2006

Author(s):

Hongyan Liu

Jiawei Han

Dong Xin

Zheng Shao

Keyword(s):

High Dimensional Data

High Dimensional

Top Down

Feature selection algorithms for very high dimensional data and mixed data

10.32657/10356/41404

2008

Author(s):

Wen Yin Tang

Keyword(s):

Feature Selection

High Dimensional Data

High Dimensional

Selection Algorithms

Extensions to Quantile Regression Forests for Very High-Dimensional Data

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science

10.1007/978-3-319-06605-9_21

2014

pp. 247-258

Author(s):

Nguyen Thanh Tung

Joshua Zhexue Huang

Imran Khan

Mark Junjie Li

Graham Williams

Keyword(s):

Quantile Regression

High Dimensional Data

High Dimensional

Optimal properties of centroid-based classifiers for very high-dimensional data

The Annals of Statistics

10.1214/09-aos736

2010

Vol 38 (2)

pp. 1071-1093

Author(s):

Peter Hall

Tung Pham

Keyword(s):

High Dimensional Data

High Dimensional

Statistical analysis of very high-dimensional data sets of hierarchically structured binary variables with missing data: An application to marine corps readiness evaluations

Naval Research Logistics Quarterly

10.1002/nav.3800320310

1985

Vol 32 (3)

pp. 467-490

Author(s):

S. Zacks

W. H. Marlow

S. S. Brier

Keyword(s):

Statistical Analysis

Missing Data

High Dimensional Data

Marine Corps

High Dimensional

Data Sets

Binary Variables

A fast farthest neighbor search algorithm for very high dimensional data

2016 19th International Conference on Computer and Information Technology (ICCIT)

10.1109/iccitechn.2016.7860222

2016

Author(s):

Shafin Rahman

Mrigank Rochan

Keyword(s):

Search Algorithm

High Dimensional Data

High Dimensional

Neighbor Search

Using Evidence of Mixed Populations to Select Variables for Clustering Very High-Dimensional Data

Journal of the American Statistical Association

10.1198/jasa.2010.tm09404

2010

Vol 105 (490)

pp. 798-809

Author(s):

Yao-ban Chan

Peter Hall

Keyword(s):

High Dimensional Data

High Dimensional

Mixed Populations

A Preview on Subspace Clustering of High Dimensional Data

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY

10.24297/ijct.v6i3.4466

2013

Vol 6 (3)

pp. 441-448

Author(s):

Sajid Nagi

Dhruba Kumar Bhattacharyya

Jugal K. Kalita

Keyword(s):

Search Strategy

High Dimensional Data

Subspace Clustering

High Dimensional

Expression Data

Clustering Methods

Top Down

Data Points

Low Dimensional

Traditional clustering methods fall short on high dimensional data because they consider all dimensions of the dataset when discovering clusters, whereas only some of the dimensions are relevant. Clusters may therefore exist only within subspaces of the dataset. Using feature selection, we can remove irrelevant and redundant dimensions by analyzing the entire dataset. The problem of automatically identifying clusters that exist in multiple, possibly overlapping subspaces of high dimensional data, allowing better clustering of the data points, is known as subspace clustering. There are two major approaches to subspace clustering, distinguished by search strategy. Top-down algorithms find an initial clustering in the full set of dimensions and evaluate the subspaces of each cluster, iteratively improving the results. Bottom-up approaches start by finding low dimensional dense regions and then use them to form clusters. Based on a survey of subspace clustering, we identify the challenges and issues involved in clustering gene expression data.
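
The bottom-up strategy mentioned in this abstract can be illustrated with a small sketch. This is not the surveyed authors' method: it assumes an equal-width grid over data scaled to [0, 1), and the function names and the `bins` and `min_count` parameters are hypothetical choices made for illustration. It finds dense one-dimensional grid units first, then joins dimensions that have dense units into two-dimensional candidates and keeps the ones that are still dense. For brevity it stops at two dimensions.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, dims, bins=4, min_count=3):
    """Return the grid cells of subspace `dims` that hold at least `min_count` points.

    Assumption: every coordinate lies in [0, 1), so int(x * bins) is the cell index.
    """
    counts = Counter(tuple(int(p[d] * bins) for d in dims) for p in points)
    return {cell for cell, n in counts.items() if n >= min_count}


def bottom_up_subspaces(points, bins=4, min_count=3):
    """Map each subspace (tuple of dimension indices) to its dense grid cells."""
    n_dims = len(points[0])

    # Step 1: find dense regions in each single dimension.
    dense = {}
    for d in range(n_dims):
        cells = dense_units(points, (d,), bins, min_count)
        if cells:
            dense[(d,)] = cells

    # Step 2: only dimensions with dense 1-D units can form dense 2-D units,
    # so candidate pairs are built from those dimensions alone.
    result = dict(dense)
    for (a,), (b,) in combinations(sorted(dense), 2):
        cells = dense_units(points, (a, b), bins, min_count)
        if cells:
            result[(a, b)] = cells
    return result


# Hypothetical data: four points cluster in dimensions 0 and 1,
# while dimension 2 is spread out and carries no cluster.
points = [
    (0.10, 0.60, 0.05),
    (0.12, 0.62, 0.30),
    (0.15, 0.65, 0.55),
    (0.20, 0.70, 0.80),
    (0.80, 0.10, 0.95),
]
subspaces = bottom_up_subspaces(points)
```

On this toy data, dimension 2 never produces a dense unit, so it is pruned before any two-dimensional candidate is formed. A top-down method would instead start from a clustering over all three dimensions and then work out which dimensions matter for each cluster.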

The Remarkable Simplicity of Very High Dimensional Data: Application of Model-Based Clustering

Journal of Classification

10.1007/s00357-009-9037-9

2009

Vol 26 (3)

pp. 249-277

Author(s):

Fionn Murtagh

Keyword(s):

High Dimensional Data

High Dimensional

Model Based Clustering

Model Based

Data Application
