SpHMC: Spectral Hamiltonian Monte Carlo

Author(s):  
Haoyi Xiong ◽  
Kafeng Wang ◽  
Jiang Bian ◽  
Zhanxing Zhu ◽  
Cheng-Zhong Xu ◽  
...  

Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) methods have been widely used to sample from certain probability distributions, incorporating (kernel) density derivatives and/or given datasets. Instead of exploring new samples from kernel spaces, this work proposes a novel SGHMC sampler, namely Spectral Hamiltonian Monte Carlo (SpHMC), that produces high-dimensional sparse representations of given datasets through sparse sensing and SGHMC. Inspired by compressed sensing, we assume all given samples are low-dimensional measurements of certain high-dimensional sparse vectors, while a continuous probability distribution exists in this high-dimensional space. Specifically, given a dictionary for sparse coding, SpHMC first derives a novel likelihood evaluator of the probability distribution from the loss function of LASSO, then samples from the high-dimensional distribution using stochastic Langevin dynamics with derivatives of the log-likelihood and Metropolis–Hastings sampling. In addition, new samples in the low-dimensional measurement space can be regenerated using the sampled high-dimensional vectors and the dictionary. Extensive experiments have been conducted to evaluate the proposed algorithm on real-world datasets. Performance comparisons on three real-world applications demonstrate the superior performance of SpHMC over baseline methods.
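
To make the sampling mechanics concrete, here is a minimal sketch of Langevin updates over a LASSO-induced log-density, assuming a hypothetical random dictionary D and illustrative values for the step size and penalties; the Metropolis–Hastings correction that SpHMC also applies is omitted here for brevity (see the RMLD entry below for that step).

```python
# Minimal sketch (not the authors' code): Langevin sampling of sparse codes
# theta under the LASSO-induced log-density
#   log p(theta | x) ~ -||x - D @ theta||^2 / (2*sigma2) - lam*||theta||_1.
# The dictionary D, lam, sigma2, and step size eps are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 64                                  # measurement dim, sparse-code dim
D = rng.normal(size=(n, d)) / np.sqrt(n)       # hypothetical dictionary
x = D @ (rng.normal(size=d) * (rng.random(d) < 0.1))  # one synthetic measurement
lam, sigma2, eps = 0.1, 0.05, 1e-3

def grad_log_p(theta):
    # Gradient of the Gaussian likelihood term plus the subgradient
    # of the L1 (Laplacian prior) term.
    return D.T @ (x - D @ theta) / sigma2 - lam * np.sign(theta)

theta, samples = np.zeros(d), []
for _ in range(5000):
    theta = (theta + 0.5 * eps * grad_log_p(theta)
             + np.sqrt(eps) * rng.normal(size=d))
    samples.append(theta.copy())

x_new = D @ samples[-1]    # regenerate a low-dimensional sample from a code
```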

Author(s):  
Yuanfu Lu ◽  
Chuan Shi ◽  
Linmei Hu ◽  
Zhiyuan Liu

Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider the heterogeneous relations in HINs, they usually employ a single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). Exploring real-world networks with thorough mathematical analysis, we present two structure-related measures that consistently distinguish heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of these relations, RHINE employs different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.
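
The abstract does not spell out the two relation-specific models, so the following is only a hedged sketch of how affiliation and interaction relations could be scored differently: a plain Euclidean-proximity score for ARs and a translation-style score for IRs. The losses actually used in RHINE may differ.

```python
# Hedged sketch of two relation-specific scores: Euclidean proximity for
# affiliation relations (ARs) and a translation-style score for interaction
# relations (IRs). RHINE's actual losses and training may differ.
import numpy as np

def ar_score(x_u, x_v):
    # ARs: an affiliated node should simply be close to its "parent" node.
    return np.sum((x_u - x_v) ** 2)

def ir_score(x_u, r, x_v):
    # IRs: peer nodes are related through a learned translation vector r.
    return np.sum((x_u + r - x_v) ** 2)

rng = np.random.default_rng(0)
x_paper, x_conf, r_publish = (rng.normal(size=8) for _ in range(3))
print(ar_score(x_paper, x_conf), ir_score(x_paper, r_publish, x_conf))
```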


2015 ◽  
Vol 2015 ◽  
pp. 1-17 ◽  
Author(s):  
Huiwu Luo ◽  
Yuan Yan Tang ◽  
Lina Yang

Processing hyperspectral images (HSI) is computationally complex, not only because of the high-dimensional information but also because of the highly correlated data structure, so effective processing and analysis of HSI faces many difficulties. Dimensionality reduction has proven to be a powerful tool for high-dimensional data analysis, and local Fisher's linear discriminant analysis (LFDA) is an effective method for HSI processing. In this paper, a novel approach, called PD-LFDA, is proposed to overcome the weaknesses of LFDA. PD-LFDA emphasizes the probability distribution (PD) in LFDA: the maximum distance is replaced with the local variance in the construction of the weight matrix, and the class prior probability is applied to compute the affinity matrix. The proposed approach increases the discriminant ability of the transformed features in the low-dimensional space. Experimental results on the Indian Pines 1992 data indicate that the proposed approach significantly outperforms the traditional alternatives.
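
As a hedged illustration of the described modification, the toy function below builds an affinity matrix whose heat-kernel bandwidth comes from the local variance of each neighborhood and whose within-class weights are scaled by class priors; the exact PD-LFDA construction may differ.

```python
# Toy sketch of an LFDA-style affinity matrix in which the heat-kernel
# bandwidth comes from the local variance of each neighborhood and the
# within-class weights are scaled by class prior probabilities.
import numpy as np

def affinity_matrix(X, y, k=7):
    n = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)   # pairwise distances
    priors = {c: np.mean(y == c) for c in np.unique(y)}   # class priors
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(D[i])[1:k + 1]                    # k nearest neighbors
        local_var = np.var(D[i, nn]) + 1e-12              # local variance
        for j in nn:
            if y[i] == y[j]:                              # same-class pairs only
                W[i, j] = priors[y[i]] * np.exp(-D[i, j] ** 2 / local_var)
    return np.maximum(W, W.T)                             # symmetrize

X = np.random.randn(60, 5)
y = np.repeat([0, 1, 2], 20)
print(affinity_matrix(X, y).shape)    # (60, 60)
```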


2021 ◽  
Vol 15 (2) ◽  
pp. 1-21
Author(s):  
Kafeng Wang ◽  
Haoyi Xiong ◽  
Jiang Bian ◽  
Zhanxing Zhu ◽  
Qian Gao ◽  
...  

Stochastic Gradient Langevin Dynamics (SGLD) has been widely used for Bayesian sampling from certain probability distributions, incorporating derivatives of the log-posterior. Given the derivative of the log-posterior, SGLD methods generate samples from the distribution by acting as a thermostat dynamics that traverses the gradient flows of the log-posterior with controllable perturbation. Even when the density is not known, existing solutions can first learn kernel density models from the given datasets, then produce new samples by running SGLD over the kernel density derivatives. In this work, instead of exploring new samples from kernel spaces, a novel SGLD sampler, namely Randomized Measurement Langevin Dynamics (RMLD), is proposed to sample high-dimensional sparse representations from the spectral domain of a given dataset. Specifically, given a random measurement matrix for sparse coding, RMLD first derives a novel likelihood evaluator of the probability distribution from the loss function of LASSO, then samples from the high-dimensional distribution using stochastic Langevin dynamics with derivatives of the log-likelihood and Metropolis–Hastings sampling. In addition, new samples in the low-dimensional measurement space can be regenerated using the sampled high-dimensional vectors and the measurement matrix. The algorithm analysis shows that RMLD in effect projects a given dataset into a high-dimensional Gaussian distribution with a Laplacian prior, then draws new sparse representations from the dataset by performing SGLD over that distribution. Extensive experiments have been conducted to evaluate the proposed algorithm using real-world datasets. Performance comparisons on three real-world applications demonstrate the superior performance of RMLD over baseline methods.
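
As a complement to the SpHMC sketch above, the following hedged snippet adds the Metropolis–Hastings acceptance test on top of the Langevin proposal (a MALA-style step), again under an assumed random measurement matrix Phi and illustrative hyperparameters rather than the paper's settings.

```python
# Sketch of a Metropolis–Hastings-corrected Langevin step over the
# LASSO-induced log-density, with a random measurement matrix Phi.
import numpy as np

rng = np.random.default_rng(1)
n, d, lam, sigma2, eps = 16, 64, 0.1, 0.05, 1e-3
Phi = rng.normal(size=(n, d)) / np.sqrt(n)     # random measurement matrix
x = Phi @ (rng.normal(size=d) * (rng.random(d) < 0.1))

def log_p(t):
    return -np.sum((x - Phi @ t) ** 2) / (2 * sigma2) - lam * np.abs(t).sum()

def grad(t):
    return Phi.T @ (x - Phi @ t) / sigma2 - lam * np.sign(t)

def log_q(t_to, t_from):
    # Gaussian density of the Langevin proposal t_from -> t_to.
    mu = t_from + 0.5 * eps * grad(t_from)
    return -np.sum((t_to - mu) ** 2) / (2 * eps)

theta = np.zeros(d)
for _ in range(2000):
    prop = theta + 0.5 * eps * grad(theta) + np.sqrt(eps) * rng.normal(size=d)
    log_alpha = log_p(prop) + log_q(theta, prop) - log_p(theta) - log_q(prop, theta)
    if np.log(rng.random()) < log_alpha:
        theta = prop                           # accept the proposal
```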


2014 ◽  
Vol 10 (S306) ◽  
pp. 68-71
Author(s):  
Giuseppe Vinci ◽  
Peter Freeman ◽  
Jeffrey Newman ◽  
Larry Wasserman ◽  
Christopher Genovese

Abstract The incredible variety of galaxy shapes cannot be summarized by human-defined discrete classes of shapes without a possibly large loss of information. Dictionary learning and sparse coding allow us to reduce the high-dimensional space of shapes to a manageable low-dimensional continuous vector space. Statistical inference can then be carried out in the reduced space via probability distribution estimation and manifold estimation.
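
A minimal sketch of the described pipeline, using scikit-learn's dictionary learner on synthetic stand-in data rather than real galaxy images:

```python
# Learn a dictionary from high-dimensional "shape" vectors, then use the
# sparse codes as a reduced continuous representation for later inference.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

X = np.random.randn(500, 256)                 # stand-in shape vectors
dico = MiniBatchDictionaryLearning(n_components=32, alpha=1.0,
                                   transform_algorithm='lasso_lars',
                                   random_state=0)
codes = dico.fit(X).transform(X)              # sparse low-dim representation
print(codes.shape, np.mean(codes != 0))       # (500, 32), fraction of nonzeros
```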


2021 ◽  
Author(s):  
Duluxan Sritharan ◽  
Shu Wang ◽  
Sahand Hormoz

Most high-dimensional datasets are thought to be inherently low-dimensional; that is, datapoints are constrained to lie on a low-dimensional manifold embedded in a high-dimensional ambient space. Here we study the viability of two approaches from differential geometry for estimating the Riemannian curvature of these low-dimensional manifolds. The intrinsic approach relates curvature to the Laplace-Beltrami operator using the heat-trace expansion, and is agnostic to how a manifold is embedded in the high-dimensional space. The extrinsic approach relates the ambient coordinates of a manifold's embedding to its curvature using the Second Fundamental Form and the Gauss-Codazzi equation. Keeping in mind the practical constraints of real-world datasets, such as small sample sizes and measurement noise, we found that estimating curvature is feasible, even for simple low-dimensional toy manifolds, only when the extrinsic approach is used. To test the applicability of the extrinsic approach to real-world data, we computed the curvature of a well-studied manifold of image patches and recapitulated its topological classification as a Klein bottle. Lastly, we applied the approach to single-cell transcriptomic sequencing (scRNAseq) datasets of blood, gastrulation, and brain cells, revealing for the first time the intrinsic curvature of scRNAseq manifolds.
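
To illustrate the extrinsic idea on a toy example, the sketch below fits a local quadratic (the second fundamental form of a curve) to a neighborhood of a noisy circle of radius R and reads off the curvature, which should be close to 1/R; this is an illustration of the general approach, not the paper's estimator.

```python
# Toy extrinsic curvature estimate: local PCA gives a tangent/normal frame,
# and a quadratic fit of the normal coordinate over the tangent coordinate
# recovers the curvature (v = u^2 / (2R) near a point of a circle).
import numpy as np

rng = np.random.default_rng(0)
R = 2.0
t = rng.uniform(0, 2 * np.pi, 2000)
pts = R * np.column_stack([np.cos(t), np.sin(t)]) + 0.005 * rng.normal(size=(2000, 2))

p = pts[0]
nn = pts[np.argsort(np.linalg.norm(pts - p, axis=1))[:50]]   # local patch
nn = nn - nn.mean(axis=0)
_, _, Vt = np.linalg.svd(nn, full_matrices=False)            # local PCA frame
u = nn @ Vt[0]                    # tangent coordinate
v = nn @ Vt[1]                    # normal coordinate (height over tangent)
a = np.polyfit(u, v, 2)[0]        # v ~ a*u^2 + ...
print("estimated curvature:", abs(2 * a), "true:", 1 / R)
```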


Author(s):  
Sayak Dey ◽  
Swagatam Das ◽  
Rammohan Mallipeddi

Classical clustering methods usually face tough challenges when the number of features is large compared to the number of items to be partitioned. We propose a Sparse MinMax k-Means Clustering approach by reformulating the objective of the MinMax k-Means algorithm (a variant of classical k-Means that minimizes the maximum intra-cluster variance instead of the sum of intra-cluster variances) into a new weighted between-cluster sum of squares (BCSS) form. We impose sparse regularization on these weights to make the method suitable for high-dimensional clustering. We thus retain the advantages of the MinMax k-Means algorithm in high-dimensional spaces to generate good-quality clusters. The efficacy of the proposal is showcased through comparison against a few representative clustering methods on several real-world datasets.
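
For intuition, here is a toy sketch of the MinMax k-Means reweighting idea that the proposal builds on: clusters are reweighted by their intra-cluster variance so that the worst cluster dominates the objective. The paper's sparse-weight BCSS reformulation itself is not shown.

```python
# Toy MinMax k-Means: after each assignment step, reweight clusters by their
# intra-cluster variance raised to a power p, emphasizing the worst cluster.
import numpy as np

def minmax_kmeans(X, k, p=0.5, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]    # initial centers
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        d2 = ((X[:, None] - C[None]) ** 2).sum(-1) # squared distances (n, k)
        lab = np.argmin(w * d2, axis=1)            # weighted assignment
        for j in range(k):
            if np.any(lab == j):
                C[j] = X[lab == j].mean(axis=0)    # center update
        var = np.array([d2[lab == j, j].sum() for j in range(k)]) + 1e-12
        w = var ** p / np.sum(var ** p)            # emphasize worst cluster
    return lab, C

X = np.vstack([np.random.randn(50, 4) + m for m in (0, 5, 10)])
labels, centers = minmax_kmeans(X, k=3)
```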


Author(s):  
J. L. Cagney ◽  
S. S. Rao

Abstract The modeling of manufacturing errors in mechanisms is a significant task in validating practical designs, and probability distributions for the errors can simulate manufacturing variations and real-world operation. This paper presents the mechanical error analysis of universal joint drivelines. Each error is simulated using a probability distribution; that is, a design of the mechanism is created by assigning random values to the errors. Each design is then evaluated by comparing the output error with a limiting value, and the reliability of the universal joint is estimated, treating a design as a failure whenever the output error exceeds the specified limit. In addition, the synthesis problem, which involves allocating tolerances (errors) for minimum manufacturing cost without violating a specified accuracy requirement on the output, is also considered. Three probability distributions (normal, Weibull, and beta) were used to simulate the random values of the errors. The similarity of the results given by the three distributions suggests that the normal distribution would be acceptable for modeling the tolerances in most cases.
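
A hedged Monte Carlo sketch of the reliability estimate described above, with a hypothetical stand-in for the driveline's output-error function:

```python
# Monte Carlo reliability: draw manufacturing errors from a chosen
# distribution, propagate them through an (assumed) output-error function,
# and count how often the output stays within the limit.
import numpy as np

rng = np.random.default_rng(0)
N, limit = 100_000, 0.02

def output_error(e1, e2, e3):
    # Hypothetical stand-in for the universal-joint output-error model.
    return np.abs(e1 + 0.5 * e2 - 0.3 * e3)

# Normal tolerances (zero mean, given std); Weibull/beta draws fit in the
# same way via rng.weibull and rng.beta.
e = [rng.normal(0.0, 0.01, N) for _ in range(3)]
reliability = np.mean(output_error(*e) <= limit)
print(f"estimated reliability: {reliability:.4f}")
```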


2017 ◽  
Vol 27 (1) ◽  
pp. 169-180 ◽  
Author(s):  
Marton Szemenyei ◽  
Ferenc Vajda

Abstract Dimension reduction and feature selection are fundamental tools for machine learning and data mining. Most existing methods, however, assume that objects are represented by a single vectorial descriptor. In reality, some description methods assign unordered sets or graphs of vectors to a single object, where each vector is assumed to have the same number of dimensions, but is drawn from a different probability distribution. Moreover, some applications (such as pose estimation) may require the recognition of individual vectors (nodes) of an object. In such cases it is essential that the nodes within a single object remain distinguishable after dimension reduction. In this paper we propose new discriminant analysis methods that are able to satisfy two criteria at the same time: separating between classes and between the nodes of an object instance. We analyze and evaluate our methods on several different synthetic and real-world datasets.
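
As one hedged reading of the two-criterion objective, the sketch below builds a between-scatter matrix from both class labels and node labels and solves the resulting generalized eigenvalue problem; the authors' actual formulation may combine the criteria differently.

```python
# Two-criterion discriminant sketch: between-scatter mixes class separation
# with node-type separation; a generalized eigenproblem gives the projection.
import numpy as np
from scipy.linalg import eigh

def two_level_lda(X, class_y, node_y, out_dim=2, alpha=0.5):
    p = X.shape[1]
    Sb, Sw = np.zeros((p, p)), np.eye(p) * 1e-6   # small ridge for stability
    mu = X.mean(axis=0)
    for labels, wgt in [(class_y, 1 - alpha), (node_y, alpha)]:
        for c in np.unique(labels):
            Xc = X[labels == c]
            d = (Xc.mean(axis=0) - mu)[:, None]
            Sb += wgt * len(Xc) * (d @ d.T)       # push group means apart
            Z = Xc - Xc.mean(axis=0)
            Sw += wgt * Z.T @ Z                   # keep groups compact
    vals, vecs = eigh(Sb, Sw)                     # generalized eigenproblem
    return vecs[:, np.argsort(vals)[::-1][:out_dim]]

X = np.random.randn(90, 6)
proj = two_level_lda(X, np.repeat([0, 1, 2], 30), np.tile([0, 1, 2], 30))
print((X @ proj).shape)    # (90, 2)
```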


Author(s):  
Yang Fang ◽  
Xiang Zhao ◽  
Zhen Tan

Network Embedding (NE) is an important method for learning representations of a network in a low-dimensional space. Conventional NE models focus on capturing the structure and semantic information of vertices while neglecting such information for edges. In this work, we propose a novel NE model named BimoNet to capture both the structure and semantic information of edges. BimoNet is composed of two parts, i.e., the bi-mode embedding part and the deep neural network part. In the bi-mode embedding part, the first mode, named add-mode, is used to express the entity-shared features of edges, and the second mode, named subtract-mode, is employed to represent the entity-specific features of edges. These features reflect the semantic information. In the deep neural network part, we first regard the edges in a network as nodes and the vertices as links, which does not change the overall structure of the whole network. We then take the nodes' adjacency matrix as the input of the deep neural network, as this yields similar representations for nodes with similar structure. Afterwards, by jointly optimizing the objective functions of these two parts, BimoNet preserves both the semantic and structure information of edges. In experiments, we evaluate BimoNet on three real-world datasets and the task of relation extraction, and BimoNet is demonstrated to outperform state-of-the-art baseline models consistently and significantly.
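
The bi-mode embedding part is easy to illustrate; the sketch below shows only the add-mode/subtract-mode edge representation, not BimoNet's deep-network part or its joint training.

```python
# Bi-mode edge representation: add-mode captures features shared by an
# edge's endpoints, subtract-mode captures their difference.
import numpy as np

def edge_embedding(h, t):
    add_mode = h + t          # entity-shared features
    sub_mode = h - t          # entity-specific features
    return np.concatenate([add_mode, sub_mode])

h, t = np.random.randn(16), np.random.randn(16)
print(edge_embedding(h, t).shape)   # (32,)
```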


2021 ◽  
pp. 1-12
Author(s):  
JinFang Sheng ◽  
Huaiyu Zuo ◽  
Bin Wang ◽  
Qiong Li

In a complex network system, the structure of the network is an extremely important element for the analysis of the system, and the study of community detection algorithms is key to exploring that structure. Traditional community detection algorithms represent the network using an adjacency matrix based on observations, which may contain redundant information or noise that interferes with the detection results. In this paper, we propose a community detection algorithm based on density clustering. To improve the performance of density clustering, we consider an algorithmic framework for learning continuous representations of network nodes in a low-dimensional space. The network structure is effectively preserved through network embedding, and density clustering is applied in the embedded low-dimensional space to compute the similarity of nodes in the network, which in turn reveals the implied structure of a given network. Experiments show that the algorithm outperforms other advanced community detection algorithms on real-world networks in multiple domains as well as on synthetic networks, especially when the noise in the network data is high.
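
An illustrative embed-then-density-cluster pipeline in the spirit of the algorithm, with a spectral embedding standing in for the paper's network-embedding step and DBSCAN for its density clustering; both component choices are assumptions, not the authors'.

```python
# Embed the network into a low-dimensional space, then density-cluster the
# embedded nodes to recover communities (-1 marks noise points).
import networkx as nx
from sklearn.cluster import DBSCAN
from sklearn.manifold import SpectralEmbedding

G = nx.karate_club_graph()                       # small benchmark network
A = nx.to_numpy_array(G)                         # adjacency as affinity
emb = SpectralEmbedding(n_components=2, affinity='precomputed').fit_transform(A)
labels = DBSCAN(eps=0.05, min_samples=3).fit_predict(emb)
print({n: c for n, c in zip(G.nodes, labels)})   # node -> community id
```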

