Unsupervised Text Feature Learning via Deep Variational Auto-encoder

2020
Vol 49 (3)
pp. 421-437
Author(s):  
Genggeng Liu ◽  
Lin Xie ◽  
Chi-Hua Chen

Dimensionality reduction plays an important role in data processing for machine learning and data mining, making the processing of high-dimensional data more efficient. Dimensionality reduction extracts a low-dimensional feature representation of high-dimensional data; an effective method not only extracts most of the useful information of the original data but also removes useless noise. Dimensionality reduction methods can be applied to all types of data, especially image data. Although supervised learning methods have achieved good results when applied to dimensionality reduction, their performance depends on the number of labeled training samples. As the amount of information on the Internet grows, labeling data requires more resources and becomes more difficult. Therefore, learning data features with unsupervised methods has great research value. In this paper, an unsupervised multilayer variational auto-encoder model is studied on text data, so that the mapping from high-dimensional to low-dimensional features becomes efficient and the low-dimensional features retain as much of the main information as possible. Low-dimensional features obtained by different dimensionality reduction methods are compared with the results of the variational auto-encoder (VAE), and the proposed method improves significantly over the comparison methods.
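
As a rough illustration of the approach described above, the following is a minimal multilayer VAE sketch in PyTorch for reducing text features to a low-dimensional code; the layer sizes and input representation are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal multilayer VAE sketch (PyTorch). Layer sizes are illustrative
# assumptions; inputs are assumed scaled to [0, 1] (e.g., normalized
# bag-of-words vectors).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=2000, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence to the standard normal prior.
    recon = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

After training, the encoder mean `mu` is taken as the low-dimensional feature representation of a document.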

Author(s):  
Akira Imakura ◽  
Momo Matsuda ◽  
Xiucai Ye ◽  
Tetsuya Sakurai

Dimensionality reduction methods that project high-dimensional data to a low-dimensional space by matrix trace optimization are widely used for clustering and classification. The matrix trace optimization problem leads to an eigenvalue problem for constructing a low-dimensional subspace that preserves certain properties of the original data. However, most existing methods use only a few eigenvectors to construct the low-dimensional space, which may lose information that is useful for successful classification. Herein, to overcome this information loss, we propose a novel complex moment-based supervised eigenmap that includes multiple eigenvectors for dimensionality reduction. Furthermore, the proposed method provides a general formulation for matrix trace optimization methods to be combined with ridge regression, which models the linear dependency between covariate variables and univariate labels. To reduce the computational complexity, we also propose an efficient parallel implementation of the proposed method. Numerical experiments indicate that the proposed method is competitive with existing dimensionality reduction methods in recognition performance. Additionally, the proposed method exhibits high parallel efficiency.
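
The trace optimization max_V tr(VᵀAV) subject to VᵀBV = I reduces to a generalized eigenvalue problem Av = λBv. The sketch below shows that reduction with LDA-style scatter matrices, keeping multiple eigenvectors; the paper's complex moment-based solver and ridge-regression coupling are not reproduced here.

```python
# Trace optimization as a generalized eigenvalue problem, keeping more
# eigenvectors than the usual n_classes - 1 to reduce information loss.
# LDA-style scatter matrices stand in for the paper's formulation.
import numpy as np
from scipy.linalg import eigh

def trace_opt_eigenmap(X, y, n_components):
    classes = np.unique(y)
    mean = X.mean(axis=0)
    Sw = np.zeros((X.shape[1], X.shape[1]))  # within-class scatter
    Sb = np.zeros_like(Sw)                   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # Generalized eigenproblem Sb v = lambda Sw v (small ridge for stability).
    w, V = eigh(Sb, Sw + 1e-6 * np.eye(Sw.shape[0]))
    return V[:, np.argsort(w)[::-1][:n_components]]  # largest eigenvalues
```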


Author(s):  
Xiaofeng Zhu ◽  
Cong Lei ◽  
Hao Yu ◽  
Yonggang Li ◽  
Jiangzhang Gan ◽  
...  

In this paper, we propose conducting Robust Graph Dimensionality Reduction (RGDR) by learning a transformation matrix that maps original high-dimensional data into its low-dimensional intrinsic space without the influence of outliers. To do this, we propose 1) adaptively learning three variables simultaneously, i.e., a reverse graph embedding of the original data, a transformation matrix, and a graph matrix preserving the local similarity of the original data in its low-dimensional intrinsic space; and 2) employing robust estimators to prevent outliers from affecting the optimization of these three matrices. As a result, the original data are cleaned by two strategies, i.e., a prediction of the original data based on the three resulting variables, and robust estimators, so that the transformation matrix can be learned from an accurately estimated intrinsic space with the help of the reverse graph embedding and the graph matrix. Moreover, we propose a new optimization algorithm for the resulting objective function and theoretically prove its convergence. Experimental results indicate that our proposed method outperforms all the comparison methods on different classification tasks.
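
RGDR's alternating optimization with robust estimators is involved; the sketch below shows only the simpler building block of learning a graph-preserving transformation matrix for a fixed similarity graph W (in the style of locality preserving projections), under the assumption that W has already been estimated. The reverse graph embedding and robust estimators of the paper are omitted.

```python
# Graph-preserving linear projection for a fixed similarity graph W:
# minimize tr(P^T X^T L X P) subject to a scale constraint, which again
# reduces to a generalized eigenvalue problem.
import numpy as np
from scipy.linalg import eigh

def graph_projection(X, W, n_components):
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # graph Laplacian
    A = X.T @ L @ X                             # similarity-preserving term
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1]) # regularized constraint matrix
    w, V = eigh(A, B)
    return V[:, np.argsort(w)[:n_components]]   # smallest eigenvalues
```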


2019
Vol 8 (S3)
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is one of the major areas of research, and clustering is one of its main functionalities. High dimensionality is one of the main issues in clustering, and dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of dimensionality reduction techniques, namely t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data have been reduced to low-dimensional data using both techniques. Cluster analysis has been performed on the high-dimensional data as well as on the low-dimensional data sets obtained through t-SNE and PPCA, with varying numbers of clusters. Mean squared error, time, and space have been considered as parameters for comparison. The results show that the time taken to convert high-dimensional data into low-dimensional data using PPCA is higher than with t-SNE, while the storage space required by the data set reduced through PPCA is less than that required by the data set reduced through t-SNE.
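
A sketch of a comparison protocol in this spirit, using scikit-learn (whose PCA implements the probabilistic PCA model of Tipping and Bishop for its likelihood scoring, standing in for PPCA here); the paper's data sets and exact MSE/storage measurements are not reproduced.

```python
# Compare t-SNE and (probabilistic) PCA on reduction time, embedding size,
# and downstream clustering. The digits data set is an illustrative stand-in.
import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

X, _ = load_digits(return_X_y=True)
for name, reducer in [("t-SNE", TSNE(n_components=2)),
                      ("PPCA", PCA(n_components=2))]:
    t0 = time.time()
    Z = reducer.fit_transform(X)          # high-dim -> low-dim
    elapsed = time.time() - t0
    labels = KMeans(n_clusters=10, n_init=10).fit_predict(Z)
    print(f"{name}: {elapsed:.2f}s, embedding bytes = {Z.nbytes}")
```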


2019
Vol 2019
pp. 1-10
Author(s):  
Zhibo Guo ◽  
Ying Zhang

It is very difficult to process and analyze high-dimensional data directly. Therefore, it is necessary to learn a potential subspace of high-dimensional data through good dimensionality reduction algorithms that preserve its intrinsic structure and discard less useful information. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two popular dimensionality reduction methods for preprocessing high-dimensional sensor data. LDA comprises two basic methods, namely classic linear discriminant analysis and FS linear discriminant analysis. In this paper, a new method, called similar distribution discriminant analysis (SDDA), is proposed based on the similarity of the samples' distributions, and a method for solving the optimal discriminant vectors is given. These discriminant vectors are orthogonal and nearly statistically uncorrelated. SDDA overcomes the disadvantages of PCA and LDA, and the extracted features are more effective. The recognition performance of SDDA greatly exceeds that of PCA and LDA. Experiments on the Yale face database, the FERET face database, and the UCI multiple features dataset demonstrate that the proposed method is effective, and the results reveal that SDDA performs better than the comparison dimensionality reduction methods.
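
SDDA itself is not specified in the abstract, so the sketch below only reproduces the PCA and LDA baselines the paper compares against, on a stand-in face data set (Olivetti, chosen for availability; it is not one of the paper's databases).

```python
# Baseline recognition pipeline: reduce with PCA or LDA, then classify
# with 1-nearest-neighbour in the reduced space.
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = fetch_olivetti_faces(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      stratify=y, random_state=0)
for name, reducer in [("PCA", PCA(n_components=50)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=30))]:
    Ztr = reducer.fit(Xtr, ytr).transform(Xtr)  # PCA ignores the labels
    Zte = reducer.transform(Xte)
    acc = KNeighborsClassifier(1).fit(Ztr, ytr).score(Zte, yte)
    print(name, acc)
```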


2015
Vol 2015
pp. 1-10
Author(s):  
Jan Kalina ◽  
Anna Schlenker

The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, suitable for high-dimensional data observed in two or more different groups. The various available versions of the MRMR approach are designed to search for variables with the largest relevance for a classification task while controlling the redundancy of the selected set of variables. However, the usual relevance and redundancy criteria have the disadvantage of being too sensitive to outlying measurements and/or inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics: redundancy is measured by a new regularized version of the coefficient of multiple correlation, and relevance is measured by a highly robust correlation coefficient based on least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise and outliers, we also perform the computations for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
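
For orientation, here is classical greedy MRMR with plain Pearson correlation standing in for both criteria; the paper's MRRMRR would replace the relevance measure with a robust correlation based on least weighted squares and the redundancy measure with a regularized coefficient of multiple correlation.

```python
# Greedy MRMR: pick the variable maximizing (relevance - mean redundancy
# against the already-selected set) at each step.
import numpy as np

def mrmr(X, y, k):
    n_features = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy  # max relevance, min redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```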


2013
Vol 677
pp. 502-507
Author(s):  
Kang Hua Hui ◽  
Chun Li Li ◽  
Xiao Rong Feng ◽  
Xue Yang Wang

In this paper, a new method is proposed that can be considered a combination of sparse representation based classification (SRC) and the KNN classifier. In detail, under the assumption that a locally linear embedding exists, the proposed method achieves classification via non-negative locally sparse representation, combining the reconstruction property and sparsity of SRC with the discriminative power of KNN. Compared to SRC, the proposed method has clear discriminative power and is more suitable for real image data, since it avoids preconditions that are difficult to satisfy. Moreover, it is well suited to classifying low-dimensional data produced by dimensionality reduction methods, especially those that obtain low-dimensional, neighborhood-preserving embeddings of high-dimensional data. Experiments on MNIST are also presented, which support the above arguments.
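
A minimal sketch of the locally sparse classification idea: reconstruct a test sample non-negatively from its k nearest training neighbours and assign the class with the smallest reconstruction residual. This follows the general SRC+KNN pattern described above, not necessarily the paper's exact formulation.

```python
# Non-negative locally sparse classification over a KNN neighbourhood.
import numpy as np
from scipy.optimize import nnls

def classify(x, X_train, y_train, k=30):
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]               # local neighbourhood (KNN part)
    A, labels = X_train[idx].T, y_train[idx]
    coef, _ = nnls(A, x)                      # non-negative local coding (SRC part)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        residuals[c] = np.linalg.norm(x - A[:, mask] @ coef[mask])
    return min(residuals, key=residuals.get)  # smallest class-wise residual
```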


Sensors
2020
Vol 20 (17)
pp. 4718
Author(s):  
Zheng Zhuo ◽  
Zhong Zhou

Recently, there have been rapid advances in high-resolution remote sensing image retrieval, which plays an important role in remote sensing data management and utilization. For content-based remote sensing image retrieval, low-dimensional, representative and discriminative features are essential to ensure good retrieval accuracy and speed. Dimensionality reduction is one of the important ways to improve feature quality in image retrieval, and LargeVis is an effective algorithm specifically designed for big data visualization. Here, an extended LargeVis (E-LargeVis) dimensionality reduction method for high-resolution remote sensing image retrieval is proposed. It realizes dimensionality reduction for single high-dimensional samples by modeling, with support vector regression, the implicit mapping between high-dimensional data and its LargeVis low-dimensional embedding. An effective high-resolution remote sensing image retrieval method is then proposed to obtain more representative and discriminative deep features. First, fully connected layer features are extracted using a channel attention-based ResNet50 backbone. Then, E-LargeVis reduces the dimensionality of the fully connected features to obtain a low-dimensional discriminative representation. Finally, the L2 distance is computed as the similarity measure to retrieve high-resolution remote sensing images. Experimental results on four high-resolution remote sensing image datasets, including UCM, RS19, RSSCN7, and AID, show that for various convolutional neural network architectures, the proposed E-LargeVis effectively improves retrieval performance, far exceeding other dimensionality reduction methods.
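
A sketch of the out-of-sample idea behind E-LargeVis: learn an explicit high-to-low-dimensional mapping with support vector regression, one regressor per output dimension. t-SNE stands in for LargeVis here, since LargeVis is not part of scikit-learn, and the data shapes are illustrative.

```python
# Fit SVR regressors from high-dimensional features to a precomputed
# low-dimensional embedding, giving an explicit mapping for new samples.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.svm import SVR

X_train = np.random.rand(500, 128)               # stand-in deep features
Z_train = TSNE(n_components=2).fit_transform(X_train)
regressors = [SVR().fit(X_train, Z_train[:, d]) for d in range(2)]

def embed(x_new):
    # Out-of-sample embedding for a single high-dimensional sample.
    return np.array([r.predict(x_new.reshape(1, -1))[0] for r in regressors])
```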


2021
pp. 1-19
Author(s):  
Guo Niu ◽  
Zhengming Ma ◽  
Haoqing Chen ◽  
Xue Su

Manifold learning plays an important role in nonlinear dimensionality reduction, but many manifold learning algorithms cannot offer an explicit expression for handling out-of-sample (new) data. Recently, many improved algorithms have introduced a fixed function into the objective function of manifold learning to learn such an expression. However, in manifold learning the relationship between the high-dimensional data and its low-dimensional representation is a local homeomorphic mapping, so these improved algorithms actually change or damage the intrinsic structure of manifold learning and are no longer true manifold learning. In this paper, a novel manifold learning method based on polynomial approximation (PAML) is proposed, which learns a polynomial approximation of manifold learning from the dimensionality reduction results of manifold learning and the original high-dimensional data. In particular, we establish a polynomial representation of the high-dimensional data using the Kronecker product and learn an optimal transformation matrix for this representation. This matrix gives an explicit, optimal nonlinear mapping between the high-dimensional data and its low-dimensional representation, and can be used directly to handle new data. Compared with substituting a fixed linear or nonlinear relationship for the manifold relationship, our method learns the optimal polynomial approximation of manifold learning without changing its objective function (i.e., it keeps the intrinsic structure of manifold learning). We conduct experiments on eight data sets against advanced algorithms published in recent years to demonstrate the benefits of our algorithm.
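
A sketch of PAML's core step under simplifying assumptions: build a degree-2 polynomial representation of each sample via the Kronecker product, then fit a transformation matrix W mapping that representation to an existing manifold embedding. LocallyLinearEmbedding stands in for an arbitrary manifold learning method, and the paper's exact construction may differ.

```python
# Polynomial approximation of a manifold embedding: least-squares fit of
# W from degree-2 polynomial features to the embedding coordinates.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def poly_features(X):
    # [1, x, vec(x kron x)] for each row x
    quad = np.array([np.kron(x, x) for x in X])
    return np.hstack([np.ones((len(X), 1)), X, quad])

X = np.random.rand(300, 10)                    # stand-in high-dimensional data
Z = LocallyLinearEmbedding(n_components=2).fit_transform(X)
P = poly_features(X)
W, *_ = np.linalg.lstsq(P, Z, rcond=None)      # explicit polynomial mapping

def embed_new(x_new):
    return poly_features(x_new.reshape(1, -1)) @ W  # out-of-sample embedding
```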


2010
Vol 7 (1)
pp. 127-138
Author(s):  
Zhao Zhang ◽  
Ye Ning

Dimensionality reduction is an important preprocessing step in high-dimensional data analysis, aiming to reduce dimensionality without losing intrinsic information. Here, a semi-supervised nonlinear dimensionality reduction method called KNDR is considered for wood defect recognition. In this setting, domain knowledge in the form of pairwise constraints is used to specify whether pairs of instances belong to the same class or different classes. KNDR can project the data onto a set of 'useful' features and preserve the structure of labeled and unlabeled data, as well as the constraints defined in the embedding space, under which the projections of the original data can be effectively partitioned from each other. We demonstrate the practical usefulness of KNDR for data visualization and wood defect recognition through extensive experiments. Experimental results show that it achieves similar or even higher performance than some existing methods.
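
KNDR's exact objective is not given in the abstract; the sketch below shows the common pattern of folding must-link/cannot-link pairwise constraints into an affinity matrix before a spectral embedding, as one plausible reading of the approach.

```python
# Constraint-guided spectral embedding: adjust RBF affinities with pairwise
# constraints, then solve the Laplacian generalized eigenproblem.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def constrained_embedding(X, must_link, cannot_link, n_components=2, gamma=1.0):
    W = np.exp(-gamma * cdist(X, X, 'sqeuclidean'))  # RBF affinities
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0   # same class: maximal affinity
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0   # different class: no affinity
    D = np.diag(W.sum(axis=1))
    L = D - W                     # graph Laplacian
    w, V = eigh(L, D)
    return V[:, np.argsort(w)[1:n_components + 1]]  # skip trivial eigenvector
```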


2012
Vol 12 (1)
pp. 44-64
Author(s):  
Sara Johansson Fernstad ◽  
Jane Shaw ◽  
Jimmy Johansson

High-dimensional data sets containing hundreds of variables are difficult to explore, as traditional visualization methods are often unable to represent such data effectively. This is commonly addressed by employing dimensionality reduction prior to visualization. Numerous dimensionality reduction methods are available; however, few take the importance of several structures into account, and few provide an overview of the structures existing in the full high-dimensional data set. For exploratory analysis, as well as for many other tasks, several structures may be of interest, and exploration of the full high-dimensional data set without reduction may also be desirable. This paper presents flexible methods for exploratory analysis and interactive dimensionality reduction. Automated methods are employed to analyse the variables using a range of quality metrics, providing one or more measures of 'interestingness' for individual variables. Through ranking, a single interestingness value based on several quality metrics is obtained for each variable, usable as a threshold for selecting the most interesting variables. An interactive environment is presented in which the user is given many possibilities to explore and gain understanding of the high-dimensional data set. Guided by this, the analyst can explore the data and interactively select a subset of the potentially most interesting variables, employing various methods for dimensionality reduction. The system is demonstrated through a use case analysing data from a DNA sequence-based study of bacterial populations.
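
A sketch of the ranking step described above: score each variable with several quality metrics, aggregate the per-metric ranks into a single interestingness value, and keep the top variables. The two metrics shown (variance and deviation from normality) are illustrative stand-ins for the paper's quality metrics.

```python
# Rank-aggregate several per-variable quality metrics into one
# interestingness score and select the top-ranked variables.
import numpy as np
from scipy.stats import skew

def select_interesting(X, n_keep):
    metrics = np.column_stack([
        X.var(axis=0),             # spread
        np.abs(skew(X, axis=0)),   # deviation from normality
    ])
    # Rank variables under each metric (higher value = higher rank).
    ranks = metrics.argsort(axis=0).argsort(axis=0)
    interestingness = ranks.mean(axis=1)   # single value per variable
    return np.argsort(interestingness)[::-1][:n_keep]
```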

