Dimensionality Reduction of Complex Metastable Systems via Kernel Embeddings of Transition Manifolds

2020 · Vol 31 (1)
Author(s): Andreas Bittracher, Stefan Klus, Boumediene Hamzi, Péter Koltai, Christof Schütte

Abstract We present a novel kernel-based machine learning algorithm for identifying the low-dimensional geometry of the effective dynamics of high-dimensional multiscale stochastic systems. Recently, the authors developed a mathematical framework for the computation of optimal reaction coordinates of such systems that is based on learning a parameterization of a low-dimensional transition manifold in a certain function space. In this article, we enhance this approach by embedding and learning this transition manifold in a reproducing kernel Hilbert space, exploiting the favorable properties of kernel embeddings. Under mild assumptions on the kernel, the manifold structure is shown to be preserved under the embedding, and distortion bounds can be derived. This leads to a more robust and more efficient algorithm compared to the previous parameterization approaches.
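To make the embedding step concrete, here is a minimal sketch of how empirical kernel mean embeddings of short simulation bursts could be compared via the kernel trick; the burst data, the Gaussian kernel, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian kernel matrix between row sets a (m x d) and b (n x d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def embedded_distances(bursts, sigma=1.0):
    """Pairwise RKHS distances between empirical kernel mean embeddings.

    bursts: list of (M, d) arrays; bursts[i] holds the endpoints of M short
    trajectories started at test point x_i.  The mean embedding of x_i is
    mu_i = (1/M) * sum_j k(y_ij, .), and ||mu_i - mu_l||^2 expands into
    kernel averages via the kernel trick.
    """
    n = len(bursts)
    G = np.zeros((n, n))                # G[i, l] = <mu_i, mu_l>
    for i in range(n):
        for l in range(i, n):
            G[i, l] = G[l, i] = rbf(bursts[i], bursts[l], sigma).mean()
    d2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G
    return np.sqrt(np.maximum(d2, 0.0))

# downstream: feed these distances to any distance-based manifold learner
# (e.g., classical MDS or diffusion maps) to parameterize the reaction coordinate.
```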

2019 · Vol 9 (3) · pp. 677-719
Author(s): Xiuyuan Cheng, Alexander Cloninger, Ronald R Coifman

Abstract The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful against certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions on the kernel, as long as $\|p-q\| \sqrt{n} \to \infty $, and a finite-sample lower bound on the testing power is obtained. Applications to the flow cytometry and diffusion MRI datasets that motivated the proposed approach to comparing distributions are demonstrated.
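The statistic lends itself to a short sketch. The version below assumes reference points with precomputed local covariances and reads the statistic as the squared difference of mean affinity profiles over the reference set; the paper's exact normalization and covariance estimation may differ.

```python
import numpy as np

def aniso_affinity(data, refs, ref_covs):
    """Asymmetric affinity between n data points and n_R reference points:
    k(x, z_r) = exp(-0.5 * (x - z_r)^T C_r^{-1} (x - z_r)),
    with C_r a local covariance estimated around reference point z_r."""
    K = np.empty((len(data), len(refs)))
    for r, (z, C) in enumerate(zip(refs, ref_covs)):
        diff = data - z                                  # (n, d)
        K[:, r] = np.exp(-0.5 * np.einsum('nd,de,ne->n',
                                          diff, np.linalg.inv(C), diff))
    return K

def mmd_stat(X, Y, refs, ref_covs):
    """Squared difference of the mean affinity profiles of X ~ p and Y ~ q."""
    wx = aniso_affinity(X, refs, ref_covs).mean(axis=0)  # witness of p
    wy = aniso_affinity(Y, refs, ref_covs).mean(axis=0)  # witness of q
    return ((wx - wy) ** 2).sum()
```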


Author(s): Jianzhong Wang

Let $\mathcal{X} = X \cup X^{\prime}$ be a data set in $\mathbb{R}^D$, where $X$ is the training set and $X^{\prime}$ is the test one. Many unsupervised learning algorithms based on kernel methods have been developed to provide a dimensionality reduction (DR) embedding for a given training set $X$ ($|X| = n$) that maps the high-dimensional data $X$ to its low-dimensional feature representation $Y \subset \mathbb{R}^d$. However, these algorithms do not straightforwardly produce a DR of the test set $X^{\prime}$. An out-of-sample extension method provides a DR of $X^{\prime}$ using an extension of the existent embedding $X \to Y$, instead of re-computing the DR embedding for the whole set $\mathcal{X}$. Among the various out-of-sample DR extension methods, those based on the Nyström approximation are very attractive. Many papers have developed such out-of-sample extension algorithms and shown their validity by numerical experiments. However, the mathematical theory of the DR extension still needs further consideration. Utilizing reproducing kernel Hilbert space (RKHS) theory, this paper develops a preliminary mathematical analysis of out-of-sample DR extension operators. It treats an out-of-sample DR extension operator as an extension of the identity on the RKHS defined on $X$. The Nyström-type DR extension then turns out to be an orthogonal projection. We also present conditions for the exact DR extension and give an estimate of the extension error.
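A minimal sketch of the Nyström-type extension analyzed here, assuming a kernel-eigenmap embedding (e.g., uncentered kernel PCA); variable names are ours, not the paper's.

```python
import numpy as np

def nystrom_extension(K_train, K_test_train, d):
    """Extend a kernel-eigenmap DR embedding to out-of-sample points.

    K_train:       (n, n) kernel matrix on the training set X
    K_test_train:  (m, n) kernel matrix between test points and X
    d:             target embedding dimension
    """
    lam, V = np.linalg.eigh(K_train)
    idx = np.argsort(lam)[::-1][:d]          # top-d spectrum
    lam, V = lam[idx], V[:, idx]
    Y_train = V                              # training embedding
    # Nystrom formula: f_k(x) = lam_k^{-1} * sum_i k(x, x_i) * V[i, k]
    Y_test = K_test_train @ (V / lam)
    return Y_train, Y_test
```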


2006 · Vol 18 (12) · pp. 3097-3118
Author(s): Matthias O. Franz, Bernhard Schölkopf

Volterra and Wiener series are perhaps the best-understood nonlinear system representations in signal processing. Although both approaches have enjoyed a certain popularity in the past, their application has been limited to rather low-dimensional and weakly nonlinear systems due to the exponential growth of the number of terms that have to be estimated. We show that Volterra and Wiener series can be represented implicitly as elements of a reproducing kernel Hilbert space by using polynomial kernels. The estimation complexity of the implicit representation is linear in the input dimensionality and independent of the degree of nonlinearity. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
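The implicit representation can be sketched as kernel ridge regression with an inhomogeneous polynomial kernel, whose feature space contains every Volterra monomial up to the chosen degree; this is a schematic illustration under our own naming, not the authors' estimator.

```python
import numpy as np

def poly_kernel(A, B, degree=3):
    """Inhomogeneous polynomial kernel (1 + <a, b>)^p.  Its feature space
    spans all monomials up to degree p, i.e., all Volterra terms."""
    return (1.0 + A @ B.T) ** degree

def fit_implicit_volterra(X, y, degree=3, ridge=1e-6):
    """Kernel ridge regression: the dual coefficients represent a degree-p
    Volterra operator implicitly, at cost linear in the input dimension
    rather than exponential in the degree."""
    K = poly_kernel(X, X, degree)
    return np.linalg.solve(K + ridge * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, degree=3):
    """Evaluate the implicitly represented Volterra system on new inputs."""
    return poly_kernel(X_new, X_train, degree) @ alpha
```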


2021 · Vol 2021 (12) · pp. 124009
Author(s): Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari

Abstract For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS methods. This is true even in the wide network limit, for a different scaling of the initialization. How can we reconcile the above claims? For which tasks do NNs outperform RKHS methods? If covariates are nearly isotropic, RKHS methods suffer from the curse of dimensionality, while NNs can overcome it by learning the best low-dimensional representation. Here we show that this curse of dimensionality becomes milder if the covariates display the same low-dimensional structure as the target function, and we precisely characterize this tradeoff. Building on these results, we present the spiked covariates model, which captures in a unified framework both behaviors observed in earlier work. We hypothesize that such a latent low-dimensional structure is present in image classification. We test this hypothesis numerically by showing that specific perturbations of the training distribution degrade the performance of RKHS methods much more significantly than that of NNs.
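An illustrative data-generation sketch for one version of the spiked covariates model: the covariates carry strong variance along a low-dimensional subspace, and the target depends only on that subspace. All parameters and the specific target function are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d0, n = 200, 5, 1000        # ambient dim, spike dim, sample size
snr = 5.0                      # spike strength relative to the isotropic bulk

# covariates = isotropic noise plus a strong low-dimensional spike along U;
# the target depends on the spiked coordinates only
U, _ = np.linalg.qr(rng.standard_normal((d, d0)))   # orthonormal spike basis
z = rng.standard_normal((n, d0)) * snr              # spiked coordinates
X = z @ U.T + rng.standard_normal((n, d))           # observed covariates
y = np.tanh(z.sum(axis=1))                          # low-dim nonlinear target
```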


Author(s): Luting Yang, Jianyi Yang, Shaolei Ren

The contextual bandit is a classic multi-armed bandit setting in which side information (i.e., context) is available before arm selection. A standard assumption is that exact contexts are perfectly known prior to arm selection and that only a single feedback signal is returned. In this work, we focus on multi-feedback bandit learning with probabilistic contexts, where a bundle of contexts is revealed to the agent, along with their corresponding probabilities, at the beginning of each round. This models scenarios where, for example, contexts are drawn from the probability output of a neural network and the reward function is jointly determined by multiple feedback signals. We propose a kernelized learning algorithm based on the upper confidence bound to choose the optimal arm in a reproducing kernel Hilbert space for each context bundle. Moreover, we theoretically establish an upper bound on the cumulative regret with respect to an oracle that knows the optimal arm given the probabilistic contexts, and show that the bound grows sublinearly with time. Our simulation on machine learning model recommendation further validates the sub-linearity of our cumulative regret and demonstrates that our algorithm outperforms the approach that selects arms based on the most probable context.
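A rough sketch of how such a kernelized UCB score could be computed for one arm given a probabilistic context bundle, using a GP-UCB-style posterior and a probability-weighted expectation; the paper's exact exploration bonus and weighting scheme may differ.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """Gaussian kernel matrix between row sets a (m x d) and b (t x d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def ucb_score(H, r, bundle, probs, beta=1.0, lam=1.0):
    """Kernel-UCB score for one arm under a probabilistic context bundle.

    H:      (t, d) past contexts at which this arm was played
    r:      (t,)   observed rewards
    bundle: (m, d) candidate contexts revealed this round
    probs:  (m,)   their probabilities (summing to 1)
    Returns the probability-weighted posterior mean plus exploration width.
    """
    Kinv = np.linalg.inv(rbf(H, H) + lam * np.eye(len(H)))
    k_star = rbf(bundle, H)                              # (m, t)
    mean = k_star @ Kinv @ r                             # posterior means
    var = 1.0 - np.einsum('mt,ts,ms->m', k_star, Kinv, k_star)
    ucb = mean + beta * np.sqrt(np.maximum(var, 0.0))
    return probs @ ucb       # expectation over the context bundle
```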


Author(s): Michael T Jury, Robert T W Martin

Abstract We extend the Lebesgue decomposition of positive measures with respect to Lebesgue measure on the complex unit circle to the non-commutative (NC) multi-variable setting of (positive) NC measures. These are positive linear functionals on a certain self-adjoint subspace of the Cuntz–Toeplitz $C^{\ast}$-algebra, the $C^{\ast}$-algebra of the left creation operators on the full Fock space. This theory is fundamentally connected to the representation theory of the Cuntz and Cuntz–Toeplitz $C^{\ast}$-algebras; any $\ast$-representation of the Cuntz–Toeplitz $C^{\ast}$-algebra is obtained (up to unitary equivalence) by applying a Gelfand–Naimark–Segal construction to a positive NC measure. Our approach combines the theory of Lebesgue decomposition of sesquilinear forms in Hilbert space, Lebesgue decomposition of row isometries, free semigroup algebra theory, NC reproducing kernel Hilbert space theory, and NC Hardy space theory.
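For orientation, the classical statement being generalized is the Lebesgue decomposition on the unit circle: any positive measure $\mu$ splits against normalized Lebesgue measure $m$ as
$$\mu = \mu_{\mathrm{ac}} + \mu_{\mathrm{s}}, \qquad \mu_{\mathrm{ac}} \ll m, \quad \mu_{\mathrm{s}} \perp m,$$
with $\mu_{\mathrm{ac}}$ absolutely continuous and $\mu_{\mathrm{s}}$ singular; the paper lifts this splitting to positive NC measures on the Fock-space setting described above.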


2021 · Vol 14 (2) · pp. 201-214
Author(s): Danilo Croce, Giuseppe Castellucci, Roberto Basili

In recent years, Deep Learning methods have become very popular in classification tasks for Natural Language Processing (NLP); this is mainly due to their ability to reach high performance by relying on very simple input representations, i.e., raw tokens. One of the drawbacks of deep architectures is the large amount of annotated data required for effective training. In Machine Learning, this problem is usually mitigated through semi-supervised methods or, more recently, through Transfer Learning in the context of deep architectures. One recent promising method for enabling semi-supervised learning in deep architectures has been formalized within Semi-Supervised Generative Adversarial Networks (SS-GANs) in the context of Computer Vision. In this paper, we adopt the SS-GAN framework to enable semi-supervised learning in the context of NLP. We demonstrate how an SS-GAN can boost the performance of simple architectures when operating on expressive, low-dimensional embeddings; these are derived by combining the unsupervised approximation of linguistic Reproducing Kernel Hilbert Spaces and the so-called Universal Sentence Encoders. We experimentally evaluate the proposed approach on a semantic classification task, i.e., Question Classification, considering different sizes of training material and different numbers of target classes. By applying such an adversarial schema to a simple Multi-Layer Perceptron, a classifier trained on a subset derived from 1% of the original training material achieves 92% accuracy. Moreover, on a complex classification schema, e.g., involving 50 classes, the proposed method outperforms state-of-the-art alternatives such as BERT.
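The discriminator side of an SS-GAN objective over fixed sentence embeddings can be sketched as a (K+1)-class loss in the style of Salimans et al.; layer sizes, the generator, and the training loop below are our own omitted-for-brevity assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

K, EMB = 50, 512     # number of real classes; embedding size (e.g., USE vectors)

# discriminator over fixed sentence embeddings: K real classes + 1 fake class
D = torch.nn.Sequential(
    torch.nn.Linear(EMB, 256), torch.nn.LeakyReLU(),
    torch.nn.Linear(256, K + 1),
)

def d_loss(x_lab, y_lab, x_unlab, x_fake):
    """SS-GAN discriminator loss.

    x_lab, y_lab: labeled embeddings and their class ids in [0, K)
    x_unlab:      unlabeled real embeddings
    x_fake:       generator outputs in embedding space
    """
    sup = F.cross_entropy(D(x_lab), y_lab)              # supervised term
    # unsupervised terms: real data should NOT land in the fake class ...
    p_real = 1.0 - F.softmax(D(x_unlab), dim=1)[:, K]
    # ... and generated data SHOULD land in the fake class
    p_fake = F.softmax(D(x_fake), dim=1)[:, K]
    unsup = -(torch.log(p_real + 1e-8).mean() + torch.log(p_fake + 1e-8).mean())
    return sup + unsup
```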


Author(s): Nicolas Nagel, Martin Schäfer, Tino Ullrich

Abstract We provide a new upper bound for sampling numbers $(g_n)_{n\in \mathbb {N}}$ associated with the compact embedding of a separable reproducing kernel Hilbert space into the space of square integrable functions. There are universal constants $C,c>0$ (which are specified in the paper) such that
$$g^2_n \le \frac{C\log (n)}{n}\sum \limits _{k\ge \lfloor cn \rfloor } \sigma _k^2,\quad n\ge 2,$$
where $(\sigma _k)_{k\in \mathbb {N}}$ is the sequence of singular numbers (approximation numbers) of the Hilbert–Schmidt embedding $\mathrm {Id}:H(K) \rightarrow L_2(D,\varrho _D)$. The algorithm which realizes the bound is a least squares algorithm based on a specific set of sampling nodes. These are constructed out of a random draw in combination with a down-sampling procedure coming from the celebrated proof of Weaver's conjecture, which was shown to be equivalent to the Kadison–Singer problem. Our result is non-constructive, since we only show the existence of a linear sampling operator realizing the above bound. The general result can, for instance, be applied to the well-known situation of $H^s_{\text {mix}}(\mathbb {T}^d)$ in $L_2(\mathbb {T}^d)$ with $s>1/2$. We obtain the asymptotic bound
$$g_n \le C_{s,d}\,n^{-s}\log (n)^{(d-1)s+1/2},$$
which improves on very recent results by shortening the gap between upper and lower bounds to $\sqrt{\log (n)}$. The result implies that, for dimensions $d>2$, no sparse grid sampling recovery method performs asymptotically optimally.
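The least-squares step of the recovery algorithm is simple to sketch once the sampling nodes are fixed; the hard, non-constructive part is the Weaver-type down-sampling that selects them. Names and shapes below are illustrative assumptions.

```python
import numpy as np

def least_squares_recovery(f_at_nodes, basis_at_nodes, basis_at_grid):
    """Recover f from its values at the selected sampling nodes.

    f_at_nodes:     (n,)   samples f(x_1), ..., f(x_n)
    basis_at_nodes: (n, m) first m kernel eigenfunctions at the nodes
                    (m proportional to n, per the bound above)
    basis_at_grid:  (N, m) the same eigenfunctions on an evaluation grid
    """
    # fit coefficients of the truncated eigenfunction expansion by least squares
    coef, *_ = np.linalg.lstsq(basis_at_nodes, f_at_nodes, rcond=None)
    return basis_at_grid @ coef     # reconstruction on the grid
```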

