3D Tensor Auto-encoder with Application to Video Compression

Author(s):  
Yang Li ◽  
Guangcan Liu ◽  
Yubao Sun ◽  
Qingshan Liu ◽  
Shengyong Chen

Auto-encoders have been widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes tensor objects directly through the network. The weights of each layer are represented by three small matrices, and thus the number of parameters in 3DTAE is just O(n^(1/3)). The compact nature of 3DTAE fits well the needs of video compression. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
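To make the parameter saving concrete, here is a minimal sketch (in Python/NumPy, which the article does not prescribe) of a single 3DTAE-style encoder layer whose weight is factored into three small matrices acting along the tensor modes. The matrix sizes, activation, and layer shape are illustrative assumptions, not the authors' architecture.

```python
# A minimal sketch of one 3DTAE-style layer (assumptions: mode-wise products
# U1, U2, U3 act on a video tensor of shape (T, H, W); names are illustrative).
import numpy as np

def mode_product(tensor, matrix, mode):
    """Multiply a 3D tensor by a matrix along the given mode (0, 1, or 2)."""
    t = np.moveaxis(tensor, mode, 0)              # bring the mode to the front
    shape = t.shape
    t = matrix @ t.reshape(shape[0], -1)          # matrix acts on that mode
    return np.moveaxis(t.reshape(matrix.shape[0], *shape[1:]), 0, mode)

def tensor_layer(x, U1, U2, U3, activation=np.tanh):
    """One encoder layer: three small factor matrices instead of one big dense weight."""
    out = mode_product(x, U1, 0)
    out = mode_product(out, U2, 1)
    out = mode_product(out, U3, 2)
    return activation(out)

# Example: a 16x32x32 clip mapped to an 8x8x8 core tensor with only
# 8*16 + 8*32 + 8*32 = 640 weights, versus O(n) for a dense layer.
video = np.random.rand(16, 32, 32)
U1, U2, U3 = np.random.randn(8, 16), np.random.randn(8, 32), np.random.randn(8, 32)
core = tensor_layer(video, U1, U2, U3)
print(core.shape)  # (8, 8, 8)
```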

2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
Fuding Xie ◽  
Yutao Fan ◽  
Ming Zhou

Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality. This paper introduces a dimensionality reduction technique that uses weighted connections between neighborhoods to improve the K-Isomap method, attempting to perfectly preserve the relationships between neighborhoods during dimensionality reduction. The validity of the proposal is tested on three typical examples that are widely employed in manifold-based algorithms. The experimental results show that the local topology of the dataset is well preserved when the proposed method transforms a dataset in high-dimensional space into a new low-dimensional dataset.
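For reference, the following hedged sketch runs the standard Isomap baseline (via scikit-learn) on the Swiss roll, one of the typical manifold examples such evaluations use; the paper's weighted-neighborhood modification of K-Isomap is not reproduced here, and the neighborhood size is an illustrative choice.

```python
# Baseline Isomap on the Swiss roll (a standard manifold test case).
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=1500, random_state=0)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (1500, 2) low-dimensional representation
```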


2020 ◽  
Author(s):  
Alexander Jung

We propose networked exponential families for non-parametric machine learning from massive network-structured datasets ("big data over networks"). High-dimensional data points are interpreted as the realizations of a random process distributed according to some exponential family. Networked exponential families allow us to jointly leverage the information contained in high-dimensional data points and their network structure. For data points representing individuals, we obtain perfectly personalized models which enable high-precision medicine or more general recommendation systems. We learn the parameters of networked exponential families using the network Lasso, which implicitly pools (or clusters) the data points according to the intrinsic network structure and a local likelihood function. Our main theoretical result characterizes how the accuracy of the network Lasso depends on the network structure and the information geometry of the node-wise exponential families. The network Lasso can be implemented as highly scalable message passing over the data network. Such message passing is appealing for federated machine learning relying on edge computing. The proposed method is also privacy preserving in the sense that no raw data but only parameter estimates are shared among different nodes.
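A minimal toy sketch of the pooling effect of the network Lasso penalty, assuming Gaussian node-wise models with scalar parameters and plain subgradient descent on a small chain graph; the graph, loss, step size, and variable names are illustrative and are not the paper's message-passing implementation.

```python
# Toy network Lasso: node-wise squared loss plus a total-variation penalty over edges.
import numpy as np

def network_lasso_step(w, data, edges, lam, lr=0.01):
    """One subgradient step on sum_i (w_i - mean(x_i))^2 + lam * sum_(i,j) |w_i - w_j|."""
    grad = np.zeros_like(w)
    for i, x_i in enumerate(data):
        grad[i] += 2.0 * (w[i] - np.mean(x_i))    # local (node-wise) loss
    for i, j in edges:
        s = np.sign(w[i] - w[j])                  # TV penalty pools neighboring nodes
        grad[i] += lam * s
        grad[j] -= lam * s
    return w - lr * grad

# Chain graph with two groups of nodes whose local data have different means.
rng = np.random.default_rng(0)
data = [rng.normal(0.0, 1.0, 20) for _ in range(5)] + \
       [rng.normal(5.0, 1.0, 20) for _ in range(5)]
edges = [(i, i + 1) for i in range(9)]
w = np.zeros(10)
for _ in range(2000):
    w = network_lasso_step(w, data, edges, lam=0.5)
print(np.round(w, 2))  # parameters roughly pooled within each group of nodes
```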


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Wan-Yu Deng ◽  
Dan Liu ◽  
Ying-Ying Dong

Due to missing values, incomplete datasets are ubiquitous in multimodal settings, yet complete data is a prerequisite of most existing multimodal data fusion methods. For incomplete multimodal high-dimensional data, we propose a feature selection and classification method. Our method focuses on extracting the most relevant features from the high-dimensional features and then improving the classification accuracy. The experimental results show that our method produces considerably better performance on incomplete multimodal data, such as the ADNI and Office datasets, compared to the complete-data case.
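The abstract does not name the authors' concrete algorithm, so the following hedged sketch only illustrates the general pipeline shape for this setting: simple mean imputation for missing values, univariate feature selection, and a linear classifier, all from scikit-learn; the data, missingness rate, and model choices are assumptions.

```python
# Generic pipeline for incomplete high-dimensional data: impute, select, classify.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 500)), rng.normal(0.5, 1, (100, 500))])
y = np.array([0] * 100 + [1] * 100)
X[rng.random(X.shape) < 0.2] = np.nan          # simulate missing entries

clf = make_pipeline(
    SimpleImputer(strategy="mean"),            # handle missing values
    SelectKBest(f_classif, k=50),              # keep the most relevant features
    LinearSVC(),
)
print(cross_val_score(clf, X, y, cv=5).mean())
```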


Author(s):  
M. Pavithra ◽  
R.M.S. Parvathi

Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. The proposed method combines the "kernel trick" with "Collective Neighbour Clustering", which takes as input measures of correspondence between pairs of data points. Real-valued hub scores are exchanged between data points until a high-quality set of patterns and corresponding clusters gradually emerges [2]. We validate our theory by demonstrating that hubness is a high-quality measure of point centrality within a high-dimensional data cluster, and by proposing several hubness-based clustering algorithms, showing that major hubs can be used effectively as cluster prototypes or as guides during the search for centroid-based cluster patterns [4]. Experimental results demonstrate the good performance of our proposed algorithms in manifold settings, mainly focused on large quantities of overlapping noise. The proposed methods are designed mostly for detecting approximately hyperspherical clusters and need to be extended to properly handle clusters of arbitrary shapes [6]. For this purpose, we provide an overview of approaches that use quality metrics in high-dimensional data visualization and propose a systematization based on a thorough literature review. We carefully analyze the papers and derive a set of factors for discriminating the quality metrics, visualization techniques, and the process itself [10]. The process is described through a reworked version of the well-known information visualization pipeline. We demonstrate the usefulness of our model by applying it to several existing approaches that use quality metrics, and we provide reflections on the implications of our model for future research. High-dimensional data arise naturally in many domains and have regularly presented a great challenge for traditional data-mining techniques, both in terms of effectiveness and efficiency [7]. Clustering becomes difficult due to the increasing sparsity of such data, as well as the increasing difficulty in distinguishing distances between data points. In this paper we take a novel perspective on the problem of clustering high-dimensional data [8]. Instead of attempting to avoid the curse of dimensionality by observing a lower-dimensional feature subspace, we embrace dimensionality by taking advantage of some inherently high-dimensional phenomena. More specifically, we show that hubness, i.e., the tendency of high-dimensional data to contain points (hubs) that frequently occur in k-nearest-neighbour lists of other points, can be successfully exploited in clustering. We validate our hypothesis by proposing several hubness-based clustering algorithms and testing them on high-dimensional data. Experimental results demonstrate good performance of our algorithms in multiple settings, particularly in the presence of large quantities of noise [9].
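A minimal sketch of the hubness idea described above, assuming hubness is measured as k-occurrence counts and that the top hubs are used directly as cluster prototypes; this naive variant is only one of several hubness-based strategies the abstract mentions, and the data and parameters are illustrative.

```python
# Hubness scores and a naive hubs-as-prototypes clustering.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def hubness_scores(X, k=10):
    """Count how often each point appears in other points' k-nearest-neighbour lists."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return np.bincount(idx[:, 1:].ravel(), minlength=len(X))  # skip self-neighbour

def hub_prototype_clustering(X, n_clusters, k=10):
    """Assign each point to its nearest high-hubness prototype."""
    scores = hubness_scores(X, k)
    hubs = np.argsort(scores)[::-1][:n_clusters]
    dists = np.linalg.norm(X[:, None, :] - X[hubs][None, :, :], axis=2)
    return hubs, dists.argmin(axis=1)

# Toy usage on high-dimensional Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (100, 50)) for m in (0, 4)])
hubs, labels = hub_prototype_clustering(X, n_clusters=2)
print(hubs, np.bincount(labels))
```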


2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Yujia Sun ◽  
Jan Platoš

This study focuses on high-dimensional text data clustering, given the inability of K-means to process high-dimensional data and its need to specify the number of clusters and randomly select the initial centers. We propose a Stacked-Random Projection dimensionality reduction framework and an enhanced K-means algorithm, DPC-K-means, based on an improved density peaks algorithm. The improved density peaks algorithm determines the number of clusters and the initial clustering centers for K-means. Our proposed algorithm is validated on seven text datasets. Experimental results show that this algorithm is suitable for clustering text data and corrects the deficiencies of K-means.
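A hedged sketch of the seeding idea: density-peak scores pick the initial centers for K-means on projected data. The Stacked-Random Projection framework is replaced here by a single scikit-learn random projection for brevity, the cutoff quantile is an assumption, and the number of clusters is fixed rather than inferred from the decision graph.

```python
# Density-peaks seeding for K-means on randomly projected high-dimensional data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection
from sklearn.metrics import pairwise_distances

def density_peak_centers(X, n_centers, cutoff_quantile=0.02):
    D = pairwise_distances(X)
    d_c = np.quantile(D[D > 0], cutoff_quantile)     # density cutoff distance
    rho = (D < d_c).sum(axis=1) - 1                  # local density (exclude self)
    delta = np.zeros(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]           # points with higher density
        delta[i] = D[i].max() if len(higher) == 0 else D[i, higher].min()
    score = rho * delta                              # density-peak score
    return np.argsort(score)[::-1][:n_centers]

# Toy usage: project 200-D data to 20-D, then seed K-means with density peaks.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (50, 200)) for m in (0, 5, 10)])
X_low = GaussianRandomProjection(n_components=20, random_state=0).fit_transform(X)
centers_idx = density_peak_centers(X_low, n_centers=3)
km = KMeans(n_clusters=3, init=X_low[centers_idx], n_init=1).fit(X_low)
print(np.bincount(km.labels_))
```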


Author(s):  
Yanping Lu ◽  
Shaozi Li

This chapter aims at developing effective particle swarm optimization (PSO) approaches for two problems commonly encountered in high-dimensional data clustering, namely the variable weighting problem in soft projected clustering when the number of clusters k is known, and the problem of automatically determining the number of clusters k. Each problem is formulated as minimizing a nonlinear continuous objective function subject to bound constraints. Special treatments of encoding schemes and search strategies are also proposed to tailor PSO to these two problems. Experimental results on both synthetic and real high-dimensional data show that the two proposed algorithms greatly improve cluster quality. In addition, the results of the new algorithms are much less dependent on the initial cluster centroids. The experimental results indicate the promising potential of PSO for clustering high-dimensional data.
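A minimal sketch of PSO applied to a bound-constrained objective of the kind described above; the objective here is a toy feature-weighted within-cluster dispersion, and the encoding, swarm parameters, and data are illustrative assumptions rather than the chapter's formulation.

```python
# Toy PSO for learning feature weights under bound constraints.
import numpy as np

rng = np.random.default_rng(0)

def weighted_dispersion(weights, X, labels, centers):
    """Toy objective: sum of feature-weighted squared distances to cluster centers."""
    w = weights / weights.sum()
    return sum(np.sum(w * (X[labels == c] - centers[c]) ** 2)
               for c in range(len(centers)))

def pso_minimize(objective, dim, bounds=(0.01, 1.0), n_particles=20, iters=100,
                 inertia=0.7, c1=1.5, c2=1.5):
    lo, hi = bounds
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)             # enforce bound constraints
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

# Toy usage: 2 clusters in 5-D where 3 features are high-variance noise.
X = np.vstack([rng.normal(0, 1, (30, 5)), rng.normal(3, 1, (30, 5))])
X[:, 2:] = rng.normal(0, 3, (60, 3))
labels = np.array([0] * 30 + [1] * 30)
centers = np.array([X[labels == c].mean(axis=0) for c in range(2)])
best_w = pso_minimize(lambda w: weighted_dispersion(w, X, labels, centers), dim=5)
print(np.round(best_w / best_w.sum(), 2))
```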


2010 ◽  
Vol 7 (1) ◽  
pp. 127-138 ◽  
Author(s):  
Zhao Zhang ◽  
Ye Ning

Dimensionality reduction is an important preprocessing step in high-dimensional data analysis that aims to reduce dimensionality without losing intrinsic information. A semi-supervised nonlinear dimensionality reduction method, called KNDR, is considered for wood defect recognition. In this setting, domain knowledge in the form of pairwise constraints is used to specify whether pairs of instances belong to the same class or to different classes. KNDR can project the data onto a set of 'useful' features while preserving the structure of labeled and unlabeled data as well as the constraints defined in the embedding space, under which the projections of the original data can be effectively partitioned from each other. We demonstrate the practical usefulness of KNDR for data visualization and wood defect recognition through extensive experiments. Experimental results show that it achieves similar or even higher performance than some existing methods.
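To illustrate how pairwise constraints can steer a projection, here is a hedged sketch of a simple linear variant: directions are chosen to spread cannot-link pairs and contract must-link pairs. KNDR itself is kernel-based and nonlinear, so this is only an illustration of the constraint idea, and all names and parameters are assumptions.

```python
# Simple constraint-guided linear projection using must-link/cannot-link pairs.
import numpy as np

def constrained_projection(X, must_link, cannot_link, dim=2, alpha=1.0):
    """Find directions that pull must-link pairs together and push cannot-link pairs apart."""
    n, d = X.shape
    S = np.zeros((d, d))
    for i, j in cannot_link:            # spread cannot-link pairs
        diff = (X[i] - X[j])[:, None]
        S += diff @ diff.T
    for i, j in must_link:              # contract must-link pairs
        diff = (X[i] - X[j])[:, None]
        S -= alpha * diff @ diff.T
    vals, vecs = np.linalg.eigh(S)      # top eigenvectors give the projection
    W = vecs[:, np.argsort(vals)[::-1][:dim]]
    return X @ W

# Toy usage: two noisy classes in 10-D with a handful of constraints.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 10)), rng.normal(2, 1, (20, 10))])
must = [(0, 1), (20, 21)]
cannot = [(0, 20), (1, 21)]
Y = constrained_projection(X, must, cannot)
print(Y.shape)  # (40, 2)
```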

