Dimensionality reduction for tensor data based on projection distance minimization and Hilbert-Schmidt independence criterion maximization

2021 ◽  
Vol 40 (5) ◽  
pp. 10307-10322
Author(s):  
Weichao Gan ◽  
Zhengming Ma ◽  
Shuyu Liu

Tensor data are becoming increasingly common in machine learning. Compared with vector data, tensor data suffer more severely from the curse of dimensionality. The motivation of this paper is to combine the Hilbert-Schmidt Independence Criterion (HSIC) with tensor algebra to create a new dimensionality reduction algorithm for tensor data. The paper makes three contributions. (1) An HSIC-based algorithm is proposed in which the dimension-reduced tensor is determined by maximizing the HSIC between the dimension-reduced and high-dimensional tensors. (2) A tensor algebra-based algorithm is proposed in which the high-dimensional tensors are projected onto a subspace and the projection coordinates are taken as the dimension-reduced tensors. The subspace is determined by minimizing the distance between the high-dimensional tensor data and their projections onto the subspace. (3) By combining the two algorithms, a new dimensionality reduction algorithm, called PDMHSIC, is proposed, in which the dimensionality reduction must satisfy two criteria at the same time: HSIC maximization and subspace projection distance minimization. The proposed algorithm is a new attempt to combine HSIC with other algorithms, and it achieves better experimental results on 8 commonly used datasets than 7 other well-known algorithms.
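The HSIC quantity that contribution (1) maximizes can be illustrated with the standard biased empirical estimator, HSIC(X, Y) = tr(KHLH) / (n - 1)^2, where K and L are kernel Gram matrices and H is the centering matrix. The sketch below is not the PDMHSIC algorithm itself: it works on vectorized samples, and the Gaussian kernel, the bandwidth `sigma`, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gaussian-kernel Gram matrix for the rows of X (kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared pairwise distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between paired samples X (n x p) and Y (n x q)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    K = gaussian_gram(X, sigma)
    L = gaussian_gram(Y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 1))
    print("dependent  :", hsic(x, x))
    print("independent:", hsic(x, rng.normal(size=(200, 1))))
```

A larger value indicates stronger statistical dependence, which is why maximizing HSIC between the reduced and original data encourages the reduced representation to retain information about the original.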

2021 ◽  
pp. 1-18
Author(s):  
Ting Gao ◽  
Zhengming Ma ◽  
Wenxu Gao ◽  
Shuyu Liu

This paper makes three contributions. (1) A tensor version of LLE (Locally Linear Embedding) is derived and presented. LLE is one of the best-known manifold learning algorithms, and improvements to it have appeared steadily since its proposal. However, all of these improvements apply only to vector data, not tensor data. The proposed tensor LLE can also serve as a bridge for transferring these improvements from vector data to tensor data. (2) A framework for tensor dimensionality reduction based on the tensor mode product is proposed, in which the mode matrices can be determined according to specific criteria. (3) A novel dimensionality reduction algorithm for tensor data based on LLE and mode product (LLEMP-TDR) is proposed, in which LLE is used as the criterion for determining the mode matrices. Benefiting from local LLE and global mode product, the proposed LLEMP-TDR can preserve both local and global features of high-dimensional tensor data during dimensionality reduction. Experimental results on data clustering and classification tasks demonstrate that the method performs better than 5 other related, recently published algorithms.
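The mode product underlying contribution (2) can be sketched as follows. Multiplying a tensor along mode n by a matrix of shape (J, I_n) replaces that mode's dimension I_n with J, so applying one matrix per mode shrinks the whole tensor. The function names and the random mode matrices are assumptions for illustration; in LLEMP-TDR the mode matrices are chosen by an LLE-based criterion, which is not reproduced here.

```python
import numpy as np

def mode_product(T, U, mode):
    """Mode-`mode` product of tensor T with matrix U of shape (J, T.shape[mode])."""
    out = np.tensordot(U, T, axes=(1, mode))   # contracted axis becomes axis 0
    return np.moveaxis(out, 0, mode)           # move it back to position `mode`

def reduce_tensor(T, mode_matrices):
    """Apply one mode matrix per mode, in order."""
    for mode, U in enumerate(mode_matrices):
        T = mode_product(T, U, mode)
    return T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = rng.normal(size=(4, 5, 6))
    # Hypothetical mode matrices; a real method would learn these.
    Us = [rng.normal(size=(2, 4)), rng.normal(size=(3, 5)), rng.normal(size=(2, 6))]
    print(reduce_tensor(T, Us).shape)
```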


2021 ◽  
Vol 12 ◽  
Author(s):  
Jianping Zhao ◽  
Na Wang ◽  
Haiyun Wang ◽  
Chunhou Zheng ◽  
Yansen Su

Dimensionality reduction of high-dimensional data is crucial for single-cell RNA sequencing (scRNA-seq) visualization and clustering. One prominent challenge in scRNA-seq studies comes from the dropout events, which lead to zero-inflated data. To address this issue, in this paper, we propose a scRNA-seq data dimensionality reduction algorithm based on a hierarchical autoencoder, termed SCDRHA. The proposed SCDRHA consists of two core modules, where the first module is a deep count autoencoder (DCA) that is used to denoise data, and the second module is a graph autoencoder that projects the data into a low-dimensional space. Experimental results demonstrate that SCDRHA has better performance than existing state-of-the-art algorithms on dimension reduction and noise reduction in five real scRNA-seq datasets. Besides, SCDRHA can also dramatically improve the performance of data visualization and cell clustering.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Sai Kiranmayee Samudrala ◽  
Jaroslaw Zola ◽  
Srinivas Aluru ◽  
Baskar Ganapathysubramanian

Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
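Spectral dimensionality reduction techniques typically share a common pattern: build a pairwise matrix over the data points, normalize or center it, and take the leading eigenvectors as low-dimensional coordinates. The serial sketch below uses classical multidimensional scaling as one assumed representative of that family; it is not the parallel framework described in the abstract, and the function name is hypothetical.

```python
import numpy as np

def classical_mds(X, k):
    """Embed the rows of X into k dimensions via classical MDS."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Step 1: squared pairwise Euclidean distances.
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Step 2: double centering turns distances into a Gram matrix.
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ D2 @ H
    # Step 3: the top-k eigenpairs give the embedding coordinates.
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 100))
    print(classical_mds(X, 2).shape)
```

The distance matrix and the eigendecomposition grow with the square of the number of points, which is the kind of cost that motivates a parallel implementation for datasets with millions of points.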


2018 ◽  
Vol 30 (12) ◽  
pp. 3281-3308
Author(s):  
Hong Zhu ◽  
Li-Zhi Liao ◽  
Michael K. Ng

We study a multi-instance (MI) learning dimensionality-reduction algorithm through sparsity and orthogonality, which is especially useful for high-dimensional MI data sets. We develop a novel algorithm to handle both sparsity and orthogonality constraints that existing methods do not handle well simultaneously. Our main idea is to formulate an optimization problem where the sparse term appears in the objective function and the orthogonality term is formed as a constraint. The resulting optimization problem can be solved by using approximate augmented Lagrangian iterations as the outer loop and inertial proximal alternating linearized minimization (iPALM) iterations as the inner loop. The main advantage of this method is that both sparsity and orthogonality can be satisfied in the proposed algorithm. We show the global convergence of the proposed iterative algorithm. We also demonstrate that the proposed algorithm can achieve high sparsity and orthogonality requirements, which are very important for dimensionality reduction. Experimental results on both synthetic and real data sets show that the proposed algorithm can obtain learning performance comparable to that of other tested MI learning algorithms.

