3D Tensor Auto-encoder with Application to Video Compression

Author(s):  
Yang Li ◽  
Guangcan Liu ◽  
Yubao Sun ◽  
Qingshan Liu ◽  
Shengyong Chen

Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes tensor objects directly through the network. The weights of each layer are represented by three small matrices, and thus the number of parameters in 3DTAE is just O(n^(1/3)). The compact nature of 3DTAE fits well the needs of video compression. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
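To make the parameter saving concrete, here is a minimal sketch (in Python/NumPy, which the article does not prescribe) of a single 3DTAE-style encoder layer whose weight is factored into three small matrices acting along the tensor modes. The matrix sizes, activation, and layer shape are illustrative assumptions, not the authors' architecture.

```python
# A minimal sketch of one 3DTAE-style layer (assumptions: mode-wise products
# U1, U2, U3 act on a video tensor of shape (T, H, W); names are illustrative).
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply a 3D tensor by a matrix along the given mode (0, 1, or 2)."""
    t = np.moveaxis(tensor, mode, 0)              # bring the mode to the front
    shape = t.shape
    t = matrix @ t.reshape(shape[0], -1)          # matrix acts on that mode
    return np.moveaxis(t.reshape(matrix.shape[0], *shape[1:]), 0, mode)

def tensor_layer(x, U1, U2, U3, activation=np.tanh):
    """One encoder layer: three small factor matrices instead of one big dense weight."""
    out = mode_product(x, U1, 0)
    out = mode_product(out, U2, 1)
    out = mode_product(out, U3, 2)
    return activation(out)

# Example: a 16x32x32 clip mapped to an 8x8x8 core tensor with only
# 8*16 + 8*32 + 8*32 = 640 weights, versus O(n) for a dense layer.
video = np.random.rand(16, 32, 32)
U1, U2, U3 = np.random.randn(8, 16), np.random.randn(8, 32), np.random.randn(8, 32)
core = tensor_layer(video, U1, U2, U3)
print(core.shape)  # (8, 8, 8)
```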

2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
Fuding Xie ◽  
Yutao Fan ◽  
Ming Zhou

Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality. This paper introduces a dimensionality reduction technique that uses weighted connections between neighborhoods to improve the K-Isomap method, attempting to perfectly preserve the relationships between neighborhoods during dimensionality reduction. The validity of the proposal is tested on three typical examples that are widely employed in manifold-based algorithms. The experimental results show that the local topology of the dataset is well preserved when the proposed method transforms a dataset in high-dimensional space into a new low-dimensional dataset.
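For reference, the following hedged sketch runs the standard Isomap baseline (via scikit-learn) on the Swiss roll, one of the typical manifold examples such evaluations use; the paper's weighted-neighborhood modification of K-Isomap is not reproduced here, and the neighborhood size is an illustrative choice.

```python
# Baseline Isomap on the Swiss roll (a standard manifold test case).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1500, random_state=0)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1500, 2) low-dimensional representation
```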


2020 ◽  
Author(s):  
Alexander Jung

We propose networked exponential families for non-parametric machine learning from massive network-structured datasets ("big data over networks"). High-dimensional data points are interpreted as the realizations of a random process distributed according to some exponential family. Networked exponential families allow us to jointly leverage the information contained in high-dimensional data points and their network structure. For data points representing individuals, we obtain perfectly personalized models which enable high-precision medicine or more general recommendation systems. We learn the parameters of networked exponential families using the network Lasso, which implicitly pools (or clusters) the data points according to the intrinsic network structure and a local likelihood function. Our main theoretical result characterizes how the accuracy of the network Lasso depends on the network structure and the information geometry of the node-wise exponential families. The network Lasso can be implemented as highly scalable message passing over the data network. Such message passing is appealing for federated machine learning relying on edge computing. The proposed method is also privacy preserving in the sense that no raw data but only parameter estimates are shared among different nodes.
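A minimal toy sketch of the pooling effect of the network Lasso penalty, assuming Gaussian node-wise models with scalar parameters and plain subgradient descent on a small chain graph; the graph, loss, step size, and variable names are illustrative and are not the paper's message-passing implementation.

```python
# Toy network Lasso: node-wise squared loss plus a total-variation penalty over edges.
import numpy as np

def network_lasso_step(w, data, edges, lam, lr=0.01):
    """One subgradient step on sum_i (w_i - mean(x_i))^2 + lam * sum_(i,j) |w_i - w_j|."""
    grad = np.zeros_like(w)
    for i, x_i in enumerate(data):
        grad[i] += 2.0 * (w[i] - np.mean(x_i))    # local (node-wise) loss
    for i, j in edges:
        s = np.sign(w[i] - w[j])                  # TV penalty pools neighboring nodes
        grad[i] += lam * s
        grad[j] -= lam * s
    return w - lr * grad

# Chain graph with two groups of nodes whose local data have different means.
rng = np.random.default_rng(0)
data = [rng.normal(0.0, 1.0, 20) for _ in range(5)] + \
       [rng.normal(5.0, 1.0, 20) for _ in range(5)]
edges = [(i, i + 1) for i in range(9)]
w = np.zeros(10)
for _ in range(2000):
    w = network_lasso_step(w, data, edges, lam=0.5)
print(np.round(w, 2))  # parameters roughly pooled within each group of nodes
```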


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Wan-Yu Deng ◽  
Dan Liu ◽  
Ying-Ying Dong

Due to missing values, incomplete datasets are ubiquitous in multimodal settings, yet complete data is a prerequisite of most existing multimodal data fusion methods. For incomplete multimodal high-dimensional data, we propose a feature selection and classification method. Our method focuses on extracting the most relevant features from the high-dimensional features and then improving the classification accuracy. The experimental results show that our method produces considerably better performance on incomplete multimodal data, such as the ADNI and Office datasets, compared to the complete-data case.
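The abstract does not name the authors' concrete algorithm, so the following hedged sketch only illustrates the general pipeline shape for this setting: simple mean imputation for missing values, univariate feature selection, and a linear classifier, all from scikit-learn; the data, missingness rate, and model choices are assumptions.

```python
# Generic pipeline for incomplete high-dimensional data: impute, select, classify.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 500)), rng.normal(0.5, 1, (100, 500))])
y = np.array([0] * 100 + [1] * 100)
X[rng.random(X.shape) < 0.2] = np.nan          # simulate missing entries

clf = make_pipeline(
    SimpleImputer(strategy="mean"),            # handle missing values
    SelectKBest(f_classif, k=50),              # keep the most relevant features
    LinearSVC(),
)
print(cross_val_score(clf, X, y, cv=5).mean())
```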


Author(s):  
M. Pavithra ◽  
R.M.S. Parvathi

Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. The proposed method combines the "kernel trick" with "Collective Neighbour Clustering", which takes as input measures of correspondence between pairs of data points. Real-valued hub scores are exchanged between data points until a high-quality set of patterns and corresponding clusters gradually emerges [2]. We validate our theory by demonstrating that hubness is a high-quality measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster patterns [4]. Experimental results demonstrate the good performance of our proposed algorithms in manifold settings, mainly focused on large quantities of overlapping noise. The proposed methods are designed mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes [6]. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself [10]. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on the implications of our model for future research. High-dimensional data arise naturally in many domains and have regularly presented a great challenge for traditional data-mining techniques, both in terms of effectiveness and efficiency [7]. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper we take a novel perspective on the problem of clustering high-dimensional data [8]. Instead of attempting to avoid the curse of dimensionality by observing a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of some inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbour lists of other points, can be successfully exploited in clustering. We validate our hypothesis by proposing several hubness-based clustering algorithms and testing them on high-dimensional data. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise [9].
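A minimal sketch of the hubness idea described above, assuming hubness is measured as k-occurrence counts and that the top hubs are used directly as cluster prototypes; this naive variant is only one of several hubness-based strategies the abstract mentions, and the data and parameters are illustrative.

```python
# Hubness scores and a naive hubs-as-prototypes clustering.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hubness_scores(X, k=10):
    """Count how often each point appears in other points' k-nearest-neighbour lists."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return np.bincount(idx[:, 1:].ravel(), minlength=len(X))  # skip self-neighbour

def hub_prototype_clustering(X, n_clusters, k=10):
    """Assign each point to its nearest high-hubness prototype."""
    scores = hubness_scores(X, k)
    hubs = np.argsort(scores)[::-1][:n_clusters]
    dists = np.linalg.norm(X[:, None, :] - X[hubs][None, :, :], axis=2)
    return hubs, dists.argmin(axis=1)

# Toy usage on high-dimensional Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (100, 50)) for m in (0, 4)])
hubs, labels = hub_prototype_clustering(X, n_clusters=2)
print(hubs, np.bincount(labels))
```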


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Yujia Sun ◽  
Jan Platoš

This study focuses on high-dimensional text data clustering, given the inability of K-means to process high-dimensional data and its need to specify the number of clusters and randomly select the initial centers. We propose a Stacked-Random Projection dimensionality reduction framework and an enhanced K-means algorithm, DPC-K-means, based on an improved density peaks algorithm. The improved density peaks algorithm determines the number of clusters and the initial clustering centers for K-means. Our proposed algorithm is validated on seven text datasets. Experimental results show that this algorithm is suitable for clustering text data and corrects the deficiencies of K-means.
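A hedged sketch of the seeding idea: density-peak scores pick the initial centers for K-means on projected data. The Stacked-Random Projection framework is replaced here by a single scikit-learn random projection for brevity, the cutoff quantile is an assumption, and the number of clusters is fixed rather than inferred from the decision graph.

```python
# Density-peaks seeding for K-means on randomly projected high-dimensional data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics import pairwise_distances

def density_peak_centers(X, n_centers, cutoff_quantile=0.02):
    D = pairwise_distances(X)
    d_c = np.quantile(D[D > 0], cutoff_quantile)     # density cutoff distance
    rho = (D < d_c).sum(axis=1) - 1                  # local density (exclude self)
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]           # points with higher density
        delta[i] = D[i].max() if len(higher) == 0 else D[i, higher].min()
    score = rho * delta                              # density-peak score
    return np.argsort(score)[::-1][:n_centers]

# Toy usage: project 200-D data to 20-D, then seed K-means with density peaks.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (50, 200)) for m in (0, 5, 10)])
X_low = GaussianRandomProjection(n_components=20, random_state=0).fit_transform(X)
centers_idx = density_peak_centers(X_low, n_centers=3)
km = KMeans(n_clusters=3, init=X_low[centers_idx], n_init=1).fit(X_low)
print(np.bincount(km.labels_))
```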


Author(s):  
Yanping Lu ◽  
Shaozi Li

This chapter aims at developing effective particle swarm optimization (PSO) approaches for two problems commonly encountered in high-dimensional data clustering, namely the variable weighting problem in soft projected clustering when the number of clusters k is known, and the problem of automatically determining the number of clusters k. Each problem is formulated as minimizing a nonlinear continuous objective function subject to bound constraints. Special treatments of encoding schemes and search strategies are also proposed to tailor PSO to these two problems. Experimental results on both synthetic and real high-dimensional data show that the two proposed algorithms greatly improve cluster quality. In addition, the results of the new algorithms are much less dependent on the initial cluster centroids. The experimental results indicate the promising potential of PSO for clustering high-dimensional data.
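A minimal sketch of PSO applied to a bound-constrained objective of the kind described above; the objective here is a toy feature-weighted within-cluster dispersion, and the encoding, swarm parameters, and data are illustrative assumptions rather than the chapter's formulation.

```python
# Toy PSO for learning feature weights under bound constraints.
import numpy as np

rng = np.random.default_rng(0)

def weighted_dispersion(weights, X, labels, centers):
    """Toy objective: sum of feature-weighted squared distances to cluster centers."""
    w = weights / weights.sum()
    return sum(np.sum(w * (X[labels == c] - centers[c]) ** 2)
               for c in range(len(centers)))

def pso_minimize(objective, dim, bounds=(0.01, 1.0), n_particles=20, iters=100,
                 inertia=0.7, c1=1.5, c2=1.5):
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)             # enforce bound constraints
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Toy usage: 2 clusters in 5-D where 3 features are high-variance noise.
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
X[:, 2:] = rng.normal(0, 3, (60, 3))
labels = np.array([0] * 30 + [1] * 30)
centers = np.array([X[labels == c].mean(axis=0) for c in range(2)])
best_w = pso_minimize(lambda w: weighted_dispersion(w, X, labels, centers), dim=5)
print(np.round(best_w / best_w.sum(), 2))
```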


2010 ◽  
Vol 7 (1) ◽  
pp. 127-138 ◽  
Author(s):  
Zhao Zhang ◽  
Ye Ning

Dimensionality reduction is an important preprocessing step in high-dimensional data analysis that aims to reduce dimensionality without losing intrinsic information. A semi-supervised nonlinear dimensionality reduction method, called KNDR, is considered for wood defect recognition. In this setting, domain knowledge in the form of pairwise constraints is used to specify whether pairs of instances belong to the same class or to different classes. KNDR can project the data onto a set of 'useful' features while preserving the structure of labeled and unlabeled data as well as the constraints defined in the embedding space, under which the projections of the original data can be effectively partitioned from each other. We demonstrate the practical usefulness of KNDR for data visualization and wood defect recognition through extensive experiments. Experimental results show that it achieves similar or even higher performance than some existing methods.
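To illustrate how pairwise constraints can steer a projection, here is a hedged sketch of a simple linear variant: directions are chosen to spread cannot-link pairs and contract must-link pairs. KNDR itself is kernel-based and nonlinear, so this is only an illustration of the constraint idea, and all names and parameters are assumptions.

```python
# Simple constraint-guided linear projection using must-link/cannot-link pairs.
import numpy as np

def constrained_projection(X, must_link, cannot_link, dim=2, alpha=1.0):
    """Find directions that pull must-link pairs together and push cannot-link pairs apart."""
    n, d = X.shape
    S = np.zeros((d, d))
    for i, j in cannot_link:            # spread cannot-link pairs
        diff = (X[i] - X[j])[:, None]
        S += diff @ diff.T
    for i, j in must_link:              # contract must-link pairs
        diff = (X[i] - X[j])[:, None]
        S -= alpha * diff @ diff.T
    vals, vecs = np.linalg.eigh(S)      # top eigenvectors give the projection
    W = vecs[:, np.argsort(vals)[::-1][:dim]]
    return X @ W

# Toy usage: two noisy classes in 10-D with a handful of constraints.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 10)), rng.normal(2, 1, (20, 10))])
must = [(0, 1), (20, 21)]
cannot = [(0, 20), (1, 21)]
Y = constrained_projection(X, must, cannot)
print(Y.shape)  # (40, 2)
```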

