An Efficient Clustering Method for High-Dimensional Data Mining

Anomaly detection has recently attracted growing attention from data mining researchers as its use has risen steadily in practical domains such as product marketing, fraud detection, medical diagnosis, and fault detection. Outlier detection in high-dimensional data poses particular challenges because of the curse of dimensionality, under which distant and neighboring points come to look alike. Traditional algorithms and techniques for outlier detection were evaluated on the full feature space. These conventional methods concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a large number of dimensions. Finding anomalies in a high-dimensional data set becomes difficult and tedious when all subsets of projections need to be explored. Data points in high-dimensional data behave like similar observations because of an intrinsic property: the relative difference between the distances from a point to its nearest and farthest neighbors approaches zero as the number of dimensions tends to infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. The technique is presented as state of the art in that it opens a new line of research toward resolving the inherent problems of high-dimensional data, where outliers reside within clusters of different densities. A high-dimensional data set from the UCI Machine Learning Repository is used to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
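
The claim that points in high-dimensional data come to look alike can be checked numerically. The sketch below is an illustration only and is not taken from the paper: it assumes points drawn uniformly from the unit hypercube and uses a hypothetical helper, relative_contrast, that measures how much farther the farthest point is than the nearest one from a random query point. Under these assumptions the ratio should shrink as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points uniform points in the unit hypercube.

    Hypothetical helper for illustration; not part of the cited paper.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for increasing dimensionality.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(200, dims), 3))
```

A small contrast means that nearest and farthest neighbors are almost equally far away, which is why distance-based and density-based outlier scores lose discriminative power on the full feature space.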

Fuzzy comprehensive evaluation of physical education based on high dimensional data mining

Journal of Intelligent & Fuzzy Systems, 2018, Vol 35 (3), pp. 3065-3076
DOI: 10.3233/jifs-169661
Cited By ~ 3
Author(s): Zhihui Wang
Keyword(s): Data Mining, Physical Education, Comprehensive Evaluation, High Dimensional Data, Fuzzy Comprehensive Evaluation, High Dimensional

Generalizing rules by random forest-based learning classifier systems for high-dimensional data mining

Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '18), 2018
DOI: 10.1145/3205651.3208298
Author(s): Fumito Uwano, Koji Dobashi, Keiki Takadama, Tim Kovacs
Keyword(s): Data Mining, Random Forest, High Dimensional Data, Learning Classifier Systems, High Dimensional, Classifier Systems, Learning Classifier

Exploiting the anomaly detection for high dimensional data using descriptive approach of data mining

2013 4th International Conference on Computer and Communication Technology (ICCCT), 2013
DOI: 10.1109/iccct.2013.6749614
Cited By ~ 1
Author(s): Bharat Singh, Nidhi Kushwaha, O P Vyas
Keyword(s): Data Mining, Anomaly Detection, High Dimensional Data, High Dimensional, Descriptive Approach

Automatic subspace clustering of high dimensional data for data mining applications

ACM SIGMOD Record, 1998, Vol 27 (2), pp. 94-105
DOI: 10.1145/276305.276314
Cited By ~ 378
Author(s): Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan
Keyword(s): Data Mining, High Dimensional Data, Subspace Clustering, High Dimensional

Information-Theoretic Based Clustering Method for High-Dimensional Data

Journal of Physics: Conference Series, 2020, Vol 1533, pp. 022115
DOI: 10.1088/1742-6596/1533/2/022115
Author(s): Xuan Huang, Lixi Chen, Yinsong Ye
Keyword(s): High Dimensional Data, High Dimensional, Clustering Method, Information Theoretic

A Novel Convex Clustering Method for High-Dimensional Data Using Semiproximal ADMM

Mathematical Problems in Engineering, 2020, Vol 2020, pp. 1-12
DOI: 10.1155/2020/9216351
Author(s): Huangyue Chen, Lingchen Kong, Yan Li
Keyword(s): High Dimensional Data, Group Lasso, High Dimensional, Clustering Methods, Finite Sample, Clustering Method, Sparse Group Lasso, Clustering Model, Sample Error, Convex Clustering

Clustering is an important ingredient of unsupervised learning; classical clustering methods include K-means clustering and hierarchical clustering. These methods may suffer from instability because they tend to get trapped in locally optimal solutions of a nonconvex optimization model. In this paper, we propose a new convex clustering method for high-dimensional data based on the sparse group lasso penalty, which can simultaneously group observations and eliminate noninformative features. In this method, the number of clusters can be learned from the data instead of being given in advance as a parameter. We theoretically prove that the proposed method has desirable statistical properties, including a finite sample error bound and feature screening consistency. Furthermore, a semiproximal alternating direction method of multipliers is designed to solve the sparse group lasso convex clustering model, and its convergence analysis is established without any conditions. Finally, the effectiveness of the proposed method is thoroughly demonstrated through simulated experiments and real applications.
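
For orientation, the baseline convex clustering objective that is widely used in the literature can be written as below. This is a sketch of the general form only, not the exact model of the paper: the symbols (observations x_i, centroids u_i, weights w_ij, tuning parameter gamma, norm index q) are assumptions introduced here, and the paper's sparse group lasso penalty on features would be added on top of this baseline.

```latex
% Baseline convex clustering (general form; notation is assumed, not the paper's):
% each observation x_i in R^p gets its own centroid u_i, and the fusion
% penalty pulls centroids together so that equal centroids define a cluster.
\min_{u_1,\dots,u_n \in \mathbb{R}^p}\;
  \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
  \;+\; \gamma \sum_{1 \le i < j \le n} w_{ij}\, \lVert u_i - u_j \rVert_q ,
\qquad \gamma \ge 0,\; w_{ij} \ge 0,\; q \ge 1 .
```

Because the objective is convex in the centroids, it has no spurious local minima, and the number of distinct centroids at the solution (and hence the number of clusters) is governed by the tuning parameter rather than fixed in advance.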

A New Cell-Based Clustering Method for High-Dimensional Data Mining Applications

A new cell-based clustering method for large, high-dimensional data in data mining applications

Opening the Black Box of Feature Extraction: Incorporating Visualization into High-Dimensional Data Mining Processes

A Novel Density-based Technique for Outlier Detection of High Dimensional Data Utilizing Full Feature Space
