An efficient approach for dimensionality reduction and classification of high dimensional text documents

A major step in diagnosing faults on high-quality optical surfaces is the characterization of scratch and dig defects in products. This challenging operation is very important because it is directly linked to the quality of the produced optical component. A classification phase is mandatory to complete the diagnosis of optical devices, since a number of correctable defects are usually present alongside the potentially “abiding” ones. Unfortunately, the relevant data extracted from raw images during the defect detection phase are high dimensional. This can harm the behavior of artificial neural networks, which are otherwise well suited to performing such a challenging classification. Reducing the data to a smaller dimension can mitigate the problems related to high dimensionality. In this paper, we compare different dimensionality reduction techniques and evaluate their impact on classification performance.
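
The abstract does not name the reduction techniques it compares. As a minimal sketch, assuming PCA as one representative technique and a nearest-centroid classifier standing in for the neural network, the effect of the retained dimension k on test accuracy can be probed with NumPy. The helper names (`pca_fit`, `pca_transform`, `evaluate` and so on) and the synthetic data are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the training mean and the top-k principal axes (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T

def nearest_centroid_fit(Z, y):
    """Compute one centroid per class in the reduced space."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(Z, classes, centroids):
    """Assign each sample to the class whose centroid is closest."""
    d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def evaluate(X_train, y_train, X_test, y_test, k):
    """Test accuracy after reducing the data to k dimensions with PCA."""
    mean, comps = pca_fit(X_train, k)
    classes, cents = nearest_centroid_fit(pca_transform(X_train, mean, comps), y_train)
    pred = nearest_centroid_predict(pca_transform(X_test, mean, comps), classes, cents)
    return float((pred == y_test).mean())

# Synthetic stand-in data (hypothetical): the class signal lives in 5 of 500 dimensions.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 500))
X[:, :5] += 3.0 * y[:, None]
for k in (2, 10, 50):
    print(k, evaluate(X[:150], y[:150], X[150:], y[150:], k))
```

Sweeping k in this way is one simple means of measuring how much class information survives the reduction; another reducer can be compared by replacing `pca_fit` and `pca_transform`.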

Dimensionality reduction approach for high dimensional text documents

2016 International Conference on Engineering & MIS (ICEMIS)

10.1109/icemis.2016.7745364

2016

Cited By ~ 7

Author(s):

G. Suresh Reddy

Keyword(s):

Dimensionality Reduction

High Dimensional

Text Documents

Reduction Approach

Design and analysis of novel similarity measure for clustering and classification of high dimensional text documents

Proceedings of the 15th International Conference on Computer Systems and Technologies - CompSysTech '14

10.1145/2659532.2659615

2014

Cited By ~ 6

Author(s):

G. Suresh Reddy

T. V. Rajinikanth

A. Ananda Rao

Keyword(s):

Similarity Measure

High Dimensional

Text Documents

Clustering And Classification

Unsupervised Feature Selection Based on Ultrametricity and Sparse Training Data: A Case Study for the Classification of High-Dimensional Hyperspectral Data

Remote Sensing

10.3390/rs10101564

2018

Vol 10 (10)

pp. 1564

Cited By ~ 3

Author(s):

Patrick Bradley

Sina Keller

Martin Weinmann

Keyword(s):

Feature Selection

Dimensionality Reduction

Hyperspectral Data

Training Data

High Dimensional

Unsupervised Feature Selection

Feature Selection Techniques

The Given

In this paper, we investigate the potential of unsupervised feature selection techniques for classification tasks where only sparse training data are available. This is motivated by the fact that unsupervised feature selection techniques combine the advantages of standard dimensionality reduction techniques (which rely only on the given feature vectors and not on the corresponding labels) and supervised feature selection techniques (which retain a subset of the original set of features). Thus, feature selection becomes independent of the given classification task and, consequently, a subset of generally versatile features is retained. We present different techniques relying on the topology of the given sparse training data, which is described by an ultrametricity index. For this, we consider the Murtagh Ultrametricity Index (MUI), which is defined on the basis of triangles within the given data, and the Topological Ultrametricity Index (TUI), which is defined on the basis of a specific graph structure. In a case study addressing the classification of high-dimensional hyperspectral data based on sparse training data, we demonstrate the performance of the proposed unsupervised feature selection techniques in comparison to standard dimensionality reduction and supervised feature selection techniques on four commonly used benchmark datasets. The results reveal that supervised and unsupervised feature selection techniques lead to similar classification results, while the latter perform feature selection independently of the given classification task and thus deliver generally versatile features.
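
The abstract only states that the MUI is defined on triangles within the data. A minimal sketch of a triangle-based ultrametricity score follows, assuming the usual criterion that a triangle counts as ultrametric when it is isosceles with a small base (equivalently, its two largest angles are nearly equal) and assuming a 2-degree tolerance. The function names, the tolerance and the optional random subsampling of triangles are hypothetical; this is not necessarily the exact MUI formulation, and it does not reproduce the paper's feature-selection procedure.

```python
import itertools
import numpy as np

def triangle_angles(a, b, c):
    """Interior angles (degrees) opposite sides a, b, c via the law of cosines."""
    def angle(opposite, s1, s2):
        cos_val = (s1**2 + s2**2 - opposite**2) / (2 * s1 * s2)
        return np.degrees(np.arccos(np.clip(cos_val, -1.0, 1.0)))
    return angle(a, b, c), angle(b, a, c), angle(c, a, b)

def ultrametricity_index(X, tol_deg=2.0, max_triangles=None, seed=0):
    """Fraction of point triplets whose two largest angles differ by < tol_deg,
    i.e. (approximately) isosceles triangles with a small base."""
    triplets = list(itertools.combinations(range(X.shape[0]), 3))
    if max_triangles is not None and len(triplets) > max_triangles:
        # Optionally subsample triangles to bound the cubic cost.
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(triplets), size=max_triangles, replace=False)
        triplets = [triplets[i] for i in idx]
    hits, counted = 0, 0
    for i, j, k in triplets:
        a = np.linalg.norm(X[j] - X[k])  # side opposite point i
        b = np.linalg.norm(X[i] - X[k])  # side opposite point j
        c = np.linalg.norm(X[i] - X[j])  # side opposite point k
        if min(a, b, c) == 0:
            continue  # skip triangles with coincident points
        angles = sorted(triangle_angles(a, b, c))
        counted += 1
        if angles[2] - angles[1] < tol_deg:
            hits += 1
    return hits / counted if counted else 0.0

# Hypothetical use: compare the score of two candidate feature subsets.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 8))
print(ultrametricity_index(X[:, :4]), ultrametricity_index(X[:, 4:]))
```

An equilateral triangle scores 1.0 and three collinear points score 0.0, which is a quick sanity check on the angle criterion.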

Sensing aware dimensionality reduction for nearest neighbor classification of high dimensional signals

2012 IEEE Statistical Signal Processing Workshop (SSP)

10.1109/ssp.2012.6319716

2012

Author(s):

Zachary Sun

W. Clem Karl

Prakash Ishwar

Venkatesh Saligrama

Keyword(s):

Dimensionality Reduction

Nearest Neighbor

High Dimensional

Nearest Neighbor Classification

Neighbor Classification

Sparsity promoting dimensionality reduction for classification of high dimensional hyperspectral images

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

10.1109/icassp.2013.6638035

2013

Cited By ~ 6

Author(s):

Minshan Cui

Saurabh Prasad

Keyword(s):

Dimensionality Reduction

Hyperspectral Images

High Dimensional

Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer

Nature Communications

10.1038/s41467-020-20430-7

2021

Vol 12 (1)

Author(s):

Laura Cantini

Pooya Zakeri

Celine Hernandez

Aurelien Naldi

Denis Thieffry

...

Keyword(s):

Dimensionality Reduction

Ground Truth

Systematic Evaluation

High Dimensional

Biological Processes

Cancer Data

Benchmark Study

Practical Guidelines

Cell Data

High-dimensional multi-omics data are now standard in biology. They can greatly enhance our understanding of biological systems when effectively integrated. Joint Dimensionality Reduction (jDR) methods are among the most efficient approaches to achieving proper integration. However, with many jDR methods available, a comprehensive benchmark with practical guidelines is needed. We perform a systematic evaluation of nine representative jDR methods using three complementary benchmarks. First, we evaluate their performance in retrieving ground-truth sample clustering from simulated multi-omics datasets. Second, we use TCGA cancer data to assess their strength in predicting survival, clinical annotations and known pathways/biological processes. Finally, we assess their classification of multi-omics single-cell data. From these in-depth comparisons, we observe that intNMF performs best in clustering, while MCIA offers effective behavior across many contexts. The code developed for this benchmark study is implemented in a Jupyter notebook, multi-omics mix (momix), to foster reproducibility and support users and future developers.
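
The abstract names intNMF and MCIA but does not describe how jDR methods work. To illustrate the basic idea of a joint reduction (one set of sample factors shared across several omics blocks), here is a minimal sketch of a naive baseline, assuming block-wise standardisation followed by PCA on the concatenated blocks. It is not one of the benchmarked methods, and the function name `concat_pca_jdr` and the simulated data are hypothetical.

```python
import numpy as np

def concat_pca_jdr(blocks, k):
    """Naive joint dimensionality reduction over omics blocks that share samples.

    Each block (n_samples, n_features_b) is standardised per feature, divided by
    sqrt(n_features_b) so large blocks do not dominate, concatenated along the
    feature axis and reduced with PCA. Returns shared sample scores (n_samples, k)
    and one loading matrix (n_features_b, k) per block."""
    scaled = []
    for B in blocks:
        B = np.asarray(B, dtype=float)
        std = B.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        Z = (B - B.mean(axis=0)) / std
        scaled.append(Z / np.sqrt(B.shape[1]))
    X = np.hstack(scaled)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :k] * S[:k]
    # Split the shared loading vectors back into one matrix per block.
    splits = np.cumsum([b.shape[1] for b in scaled])[:-1]
    loadings = np.split(Vt[:k].T, splits, axis=0)
    return scores, loadings

# Simulated stand-in: 3 omics blocks over the same 40 samples, 2 groups.
rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 20)
blocks = [rng.normal(size=(40, p)) + 1.5 * groups[:, None] for p in (200, 80, 30)]
scores, loadings = concat_pca_jdr(blocks, k=2)
print(scores.shape, [L.shape for L in loadings])
```

The per-block loadings indicate which features of each omics layer drive a shared factor; dedicated jDR methods replace the plain concatenated PCA with models designed for multi-block data.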

Classification of Brainwaves for Sleep Stages by High-Dimensional FFT Features from EEG Signals

Applied Sciences

10.3390/app10051797

2020

Vol 10 (5)

pp. 1797

Cited By ~ 2

Author(s):

Mera Kartika Delimayanti

Bedy Purnama

Ngoc Giang Nguyen

Mohammad Reza Faisal

Kunti Robiatul Mahmudah

...

Keyword(s):

Machine Learning

Sleep Stage

Machine Learning Algorithms

High Dimensional

Sleep Stages

EEG Signals

Stage Classification

Sleep Stage Classification

Low Dimensional

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have combined low-dimensional fast Fourier transform (FFT) features with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features combined with a simple feature selection are effective for improving automated sleep stage classification.
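
As a minimal sketch of the pipeline the abstract describes, assuming 30-second scoring epochs sampled at 100 Hz and using a one-way ANOVA F score as a stand-in for the unspecified "simple feature selection", FFT features can be extracted and ranked with NumPy. The constants `FS_HZ` and `EPOCH_SEC`, the helper names and the synthetic signal are hypothetical.

```python
import numpy as np

FS_HZ = 100     # assumed EEG sampling rate (Hz)
EPOCH_SEC = 30  # assumed scoring-epoch length (s)

def fft_features(signal, fs=FS_HZ, epoch_sec=EPOCH_SEC):
    """Split a 1-D EEG signal into non-overlapping epochs and return the
    log-magnitude FFT spectrum of each epoch (one row per epoch)."""
    n = fs * epoch_sec
    n_epochs = len(signal) // n  # trailing partial epoch is dropped
    epochs = np.reshape(signal[: n_epochs * n], (n_epochs, n))
    spectrum = np.abs(np.fft.rfft(epochs, axis=1))  # n // 2 + 1 bins per epoch
    return np.log1p(spectrum)

def anova_f_scores(X, y):
    """One-way ANOVA F statistic for each feature column."""
    classes = np.unique(y)
    grand = X.mean(axis=0)
    ss_between = sum((y == c).sum() * (X[y == c].mean(axis=0) - grand) ** 2 for c in classes)
    ss_within = sum(((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0) for c in classes)
    df_b, df_w = len(classes) - 1, len(y) - len(classes)
    return (ss_between / df_b) / (ss_within / df_w + 1e-12)

def select_top_k(X, y, k):
    """Indices of the k features with the largest F score."""
    return np.argsort(anova_f_scores(X, y))[::-1][:k]

# Synthetic stand-in: each epoch's label decides a dominant 2 Hz or 12 Hz rhythm.
rng = np.random.default_rng(0)
t = np.arange(FS_HZ * EPOCH_SEC) / FS_HZ
labels = np.array([0, 1] * 4)
signal = np.concatenate([np.sin(2 * np.pi * (2 if c == 0 else 12) * t)
                         + 0.5 * rng.normal(size=t.size) for c in labels])
X = fft_features(signal)
top = select_top_k(X, labels, k=20)
print(X.shape, np.fft.rfftfreq(FS_HZ * EPOCH_SEC, d=1 / FS_HZ)[top[:5]])
```

With these assumed settings each epoch yields 1501 spectral bins, which shows how an FFT representation quickly reaches thousands of features and why a feature-selection step is paired with it.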
