Uncovering High-dimensional Structures of Projections from Dimensionality Reduction Methods

Dimensionality reduction plays an important role in the data processing of machine learning and data mining, which makes the processing of high-dimensional data more efficient. Dimensionality reduction can extract the low-dimensional feature representation of high-dimensional data, and an effective dimensionality reduction method can not only extract most of the useful information of the original data, but also realize the function of removing useless noise. The dimensionality reduction methods can be applied to all types of data, especially image data. Although the supervised learning method has achieved good results in the application of dimensionality reduction, its performance depends on the number of labeled training samples. With the growing of information from internet, marking the data requires more resources and is more difficult. Therefore, using unsupervised learning to learn the feature of data has extremely important research value. In this paper, an unsupervised multilayered variational auto-encoder model is studied in the text data, so that the high-dimensional feature to the low-dimensional feature becomes efficient and the low-dimensional feature can retain mainly information as much as possible. Low-dimensional feature obtained by different dimensionality reduction methods are used to compare with the dimensionality reduction results of variational auto-encoder (VAE), and the method can be significantly improved over other comparison methods.

Download Full-text

Comparison of Matrix Dimensionality Reduction Methods in Uncovering Latent Structures in the Data

Journal of Information & Knowledge Management ◽

10.1142/s0219649210002498 ◽

2010 ◽

Vol 09 (01) ◽

pp. 81-92 ◽

Cited By ~ 3

Author(s):

Ch. Aswani Kumar ◽

Ramaraj Palanisamy

Keyword(s):

Dimensionality Reduction ◽

Time Series Data ◽

Matrix Decomposition ◽

Decomposition Methods ◽

Singular Value ◽

Series Data ◽

High Dimensional ◽

Reduction Methods ◽

Latent Structures ◽

Value Decomposition

Matrix decomposition methods: Singular Value Decomposition (SVD) and Semi Discrete Decomposition (SDD) are proved to be successful in dimensionality reduction. However, to the best of our knowledge, no empirical results are presented and no comparison between these methods is done to uncover latent structures in the data. In this paper, we present how these methods can be used to identify and visualise latent structures in the time series data. Results on a high dimensional dataset demonstrate that SVD is more successful in uncovering the latent structures.

Download Full-text

Dimensionality Reduction Methods Used in Machine Learning

Műszaki Tudományos Közlemények ◽

10.33894/mtk-2020.13.27 ◽

2020 ◽

Vol 13 (1) ◽

pp. 148-151

Author(s):

Kristóf Muhi ◽

Zsolt Csaba Johanyák

Keyword(s):

Machine Learning ◽

Missing Data ◽

Dimensionality Reduction ◽

Feature Space ◽

Data Preprocessing ◽

Short Review ◽

High Dimensional ◽

Data Types ◽

Reduction Methods ◽

The Individual

AbstractIn most cases, a dataset obtained through observation, measurement, etc. cannot be directly used for the training of a machine learning based system due to the unavoidable existence of missing data, inconsistencies and high dimensional feature space. Additionally, the individual features can contain quite different data types and ranges. For this reason, a data preprocessing step is nearly always necessary before the data can be used. This paper gives a short review of the typical methods applicable in the preprocessing and dimensionality reduction of raw data.

Download Full-text

A Similar Distribution Discriminant Analysis with Orthogonal and Nearly Statistically Uncorrelated Characteristics

Mathematical Problems in Engineering ◽

10.1155/2019/3145973 ◽

2019 ◽

Vol 2019 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Zhibo Guo ◽

Ying Zhang

Keyword(s):

Discriminant Analysis ◽

Dimensionality Reduction ◽

Linear Discriminant Analysis ◽

High Dimensional Data ◽

Sensor Data ◽

High Dimensional ◽

Similar Distribution ◽

Face Database ◽

Linear Discriminant ◽

Reduction Methods

It is very difficult to process and analyze high-dimensional data directly. Therefore, it is necessary to learn a potential subspace of high-dimensional data through excellent dimensionality reduction algorithms to preserve the intrinsic structure of high-dimensional data and abandon the less useful information. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two popular dimensionality reduction methods for high-dimensional sensor data preprocessing. LDA contains two basic methods, namely, classic linear discriminant analysis and FS linear discriminant analysis. In this paper, a new method, called similar distribution discriminant analysis (SDDA), is proposed based on the similarity of samples’ distribution. Furthermore, the method of solving the optimal discriminant vector is given. These discriminant vectors are orthogonal and nearly statistically uncorrelated. The disadvantages of PCA and LDA are overcome, and the extracted features are more effective by using SDDA. The recognition performance of SDDA exceeds PCA and LDA largely. Some experiments on the Yale face database, FERET face database, and UCI multiple features dataset demonstrate that the proposed method is effective. The results reveal that SDDA obtains better performance than comparison dimensionality reduction methods.

Download Full-text

A Robust Supervised Variable Selection for Noisy High-Dimensional Data

BioMed Research International ◽

10.1155/2015/320385 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 6

Author(s):

Jan Kalina ◽

Anna Schlenker

Keyword(s):

Variable Selection ◽

Dimensionality Reduction ◽

Robust Statistics ◽

High Dimensional Data ◽

Real Data ◽

High Dimensional ◽

Adaptive Weights ◽

Novel Approach ◽

Reduction Methods ◽

Data Adaptive

The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. Particularly, redundancy is measured by a new regularized version of the coefficient of multiple correlation and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we perform the computations also for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.

Download Full-text

New dimensionality reduction methods for the representation of high dimensional ‘omics’ data

Expert Review of Molecular Diagnostics ◽

10.1586/erm.10.95 ◽

2011 ◽

Vol 11 (1) ◽

pp. 27-34 ◽

Cited By ~ 9

Author(s):

Christophe Bécavin ◽

Arndt Benecke

Keyword(s):

Dimensionality Reduction ◽

High Dimensional ◽

Omics Data ◽

Reduction Methods

Download Full-text

A Comparative Analysis of Dimensionality Reduction Methods for Genetic Programming to Solve High-Dimensional Symbolic Regression Problems

10.1109/smc52423.2021.9658595 ◽

2021 ◽

Author(s):

Lianjie Zhong ◽

Jinghui Zhong ◽

Chengyu Lu

Keyword(s):

Comparative Analysis ◽

Genetic Programming ◽

Dimensionality Reduction ◽

Symbolic Regression ◽

High Dimensional ◽

Reduction Methods ◽

Regression Problems

Download Full-text

Quality-based guidance for exploratory dimensionality reduction

Information Visualization ◽

10.1177/1473871612460526 ◽

2012 ◽

Vol 12 (1) ◽

pp. 44-64 ◽

Cited By ~ 12

Author(s):

Sara Johansson Fernstad ◽

Jane Shaw ◽

Jimmy Johansson

Keyword(s):

Dimensionality Reduction ◽

High Dimensional Data ◽

Quality Metrics ◽

Exploratory Analysis ◽

High Dimensional ◽

Data Sets ◽

Data Set ◽

Bacterial Populations ◽

Reduction Methods ◽

Individual Variables

High-dimensional data sets containing hundreds of variables are difficult to explore, as traditional visualization methods often are unable to represent such data effectively. This is commonly addressed by employing dimensionality reduction prior to visualization. Numerous dimensionality reduction methods are available. However, few reduction approaches take the importance of several structures into account and few provide an overview of structures existing in the full high-dimensional data set. For exploratory analysis, as well as for many other tasks, several structures may be of interest. Exploration of the full high-dimensional data set without reduction may also be desirable. This paper presents flexible methods for exploratory analysis and interactive dimensionality reduction. Automated methods are employed to analyse the variables, using a range of quality metrics, providing one or more measures of ‘interestingness’ for individual variables. Through ranking, a single value of interestingness is obtained, based on several quality metrics, that is usable as a threshold for the most interesting variables. An interactive environment is presented in which the user is provided with many possibilities to explore and gain understanding of the high-dimensional data set. Guided by this, the analyst can explore the high-dimensional data set and interactively select a subset of the potentially most interesting variables, employing various methods for dimensionality reduction. The system is demonstrated through a use-case analysing data from a DNA sequence-based study of bacterial populations.

Download Full-text

Speech Emotion Recognition Based on Sparse Representation

Archives of Acoustics ◽

10.2478/aoa-2013-0055 ◽

2013 ◽

Vol 38 (4) ◽

pp. 465-470 ◽

Cited By ~ 11

Author(s):

Jingjie Yan ◽

Xiaolan Wang ◽

Weiyi Gu ◽

LiLi Ma

Keyword(s):

Dimensionality Reduction ◽

Emotion Recognition ◽

Least Squares ◽

Partial Least Squares ◽

Partial Least Squares Regression ◽

Speech Emotion Recognition ◽

Least Squares Regression ◽

Computer Science Pedagogy ◽

Reduction Methods ◽

Analysis Computer

Abstract Speech emotion recognition is deemed to be a meaningful and intractable issue among a number of do- mains comprising sentiment analysis, computer science, pedagogy, and so on. In this study, we investigate speech emotion recognition based on sparse partial least squares regression (SPLSR) approach in depth. We make use of the sparse partial least squares regression method to implement the feature selection and dimensionality reduction on the whole acquired speech emotion features. By the means of exploiting the SPLSR method, the component parts of those redundant and meaningless speech emotion features are lessened to zero while those serviceable and informative speech emotion features are maintained and selected to the following classification step. A number of tests on Berlin database reveal that the recogni- tion rate of the SPLSR method can reach up to 79.23% and is superior to other compared dimensionality reduction methods.

Download Full-text

RHDSI: A Novel Dimensionality Reduction Based Algorithm on High Dimensional Feature Selection with Interactions

Information Sciences ◽

10.1016/j.ins.2021.06.096 ◽

2021 ◽

Author(s):

Rahi Jain ◽

Wei Xu

Keyword(s):

Feature Selection ◽

Dimensionality Reduction ◽

High Dimensional

Download Full-text