A Conformal Predictive System for Distribution Regression with Random Features

Author(s):  
Wei Zhang ◽  
Zhen He ◽  
Di Wang

Abstract Distribution regression is the regression setting in which the input objects are distributions. Many machine learning problems can be analysed in this framework, such as multi-instance learning and learning from noisy data. This paper builds a conformal predictive system (CPS) for distribution regression, where the system's prediction for a test input is a cumulative distribution function (CDF) of the corresponding test label. The CDF output by a CPS provides useful information about the test label: it can estimate the probability of any event related to the label, and it can be transformed into prediction intervals and point predictions via the corresponding quantiles. Furthermore, a CPS has the property of validity, as the predictive CDFs and the prediction intervals are statistically compatible with the realizations. This property is desirable for many risk-sensitive applications, such as weather forecasting. To the best of our knowledge, this is the first work to extend the CPS learning framework to distribution regression problems. We first embed the input distributions into a reproducing kernel Hilbert space using kernel mean embeddings approximated by random Fourier features, and then build a fast CPS on top of the embeddings. While inheriting the validity property of the CPS learning framework, our algorithm is simple, easy to implement, and fast. The proposed approach is tested on synthetic data sets and applied to the statistical postprocessing of ensemble forecasts, demonstrating its effectiveness on distribution regression problems.
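A minimal sketch of the pipeline this abstract describes, assuming an RBF kernel, a ridge regressor on the embeddings, and split-conformal calibration; the paper's exact CPS construction may differ in detail, and all function names below are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_mean_embedding(bags, D=100, gamma=1.0):
    """Approximate kernel mean embeddings of sample bags.

    Each bag is an (n_i, d) array of draws from one input distribution;
    the RBF kernel mean map is approximated by averaging the random
    Fourier feature map sqrt(2/D) * cos(W x + b) over the bag.
    """
    d = bags[0].shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))  # spectral draws
    b = rng.uniform(0, 2 * np.pi, size=D)                  # random phases
    return np.stack([np.sqrt(2.0 / D) * np.cos(bag @ W.T + b).mean(axis=0)
                     for bag in bags])

def conformal_cdf(Z_train, y_train, Z_cal, y_cal, z_test, lam=1e-3):
    """Split-conformal predictive CDF at a test embedding z_test."""
    A = Z_train.T @ Z_train + lam * np.eye(Z_train.shape[1])
    w = np.linalg.solve(A, Z_train.T @ y_train)  # ridge fit, training split
    scores = y_cal - Z_cal @ w                   # calibration residuals
    mu = z_test @ w
    # empirical CDF of the residuals, shifted to the test prediction
    return np.vectorize(lambda y: (np.sum(scores <= y - mu) + 1)
                        / (len(scores) + 1))
```

Quantiles of the returned CDF then yield the prediction intervals and point predictions mentioned in the abstract.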

2019 ◽  
Vol 52 (5) ◽  
pp. 693-723
Author(s):  
Lingqing Yao ◽  
Roussos Dimitrakopoulos ◽  
Michel Gamache

Abstract The present work proposes a new high-order simulation framework based on statistical learning. The training data consist of the sample data together with a training image, and the learning target is the underlying random field model of spatial attributes of interest. The learning process attempts to find a model with expected high-order spatial statistics that coincide with those observed in the available data, while the learning problem is approached within the statistical learning framework in a reproducing kernel Hilbert space (RKHS). More specifically, the required RKHS is constructed via a spatial Legendre moment (SLM) reproducing kernel that systematically incorporates the high-order spatial statistics. The target distributions of the random field are mapped into the SLM-RKHS to start the learning process, where solutions of the random field model amount to solving a quadratic programming problem. Case studies with a known data set in different initial settings show that sequential simulation under the new framework reproduces the high-order spatial statistics of the available data and resolves the potential conflicts between the training image and the sample data. This is due to the characteristics of the spatial Legendre moment kernel and the generalization capability of the proposed statistical learning framework. A three-dimensional case study at a gold deposit shows practical aspects of the proposed method in real-life applications.
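The SLM kernel itself is developed in the paper; as a minimal one-dimensional illustration of how Legendre moments induce a valid reproducing kernel (our construction, with a hypothetical `order` truncation, not the paper's spatial kernel): the normalized Legendre polynomials $\tilde P_n = \sqrt{(2n+1)/2}\,P_n$ are orthonormal in $L^2[-1,1]$, so $k(x,y) = \sum_n \tilde P_n(x)\tilde P_n(y)$ reproduces their span.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_kernel(x, y, order=10):
    """Truncated Legendre reproducing kernel on [-1, 1]."""
    k = 0.0
    for n in range(order + 1):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0               # coefficient vector selecting P_n
        scale = (2 * n + 1) / 2.0     # since <P_n, P_n> = 2 / (2n + 1)
        k += scale * legendre.legval(x, coeffs) * legendre.legval(y, coeffs)
    return k
```

The paper's spatial version incorporates high-order statistics across multiple points; this scalar sketch only shows the moment-to-kernel construction.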


2020 ◽  
Author(s):  
Konstantinos Slavakis ◽  
Masahiro Yukawa

This paper introduces a non-parametric learning framework to combat outliers in online, multi-output, and nonlinear regression tasks. A hierarchical-optimization problem underpins the learning task: search in a reproducing kernel Hilbert space (RKHS) for a function that minimizes a sample-average $\ell_p$-norm ($1 \leq p \leq 2$) error loss on data contaminated by noise and outliers, subject to side information that takes the form of affine constraints, defined as the set of minimizers of a quadratic loss on a finite number of faithful data devoid of noise and outliers. To surmount the computational obstacles posed by the choice of loss and the potentially infinite-dimensional RKHS, approximations of the $\ell_p$-norm loss, as well as a novel twist of the criterion of approximate linear dependency, are devised to keep the computational-complexity footprint of the proposed algorithm bounded over time. Numerical tests on datasets showcase the robust behavior of the advocated framework against different types of outliers, under a low computational load, while simultaneously satisfying the affine constraints, in contrast to state-of-the-art methods, which are constraint agnostic.
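The standard approximate-linear-dependency (ALD) criterion that the paper twists can be sketched as follows (the generic version from kernel adaptive filtering, not the paper's variant; names are ours): a new sample enters the dictionary only if its feature-space image lies far from the span of the current dictionary.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def ald_distance(dictionary, x, gamma=1.0):
    """Squared distance of phi(x) from span{phi(d) : d in dictionary}."""
    K = np.array([[rbf(a, b, gamma) for b in dictionary] for a in dictionary])
    k = np.array([rbf(a, x, gamma) for a in dictionary])
    alpha = np.linalg.solve(K + 1e-10 * np.eye(len(K)), k)  # jittered solve
    return rbf(x, x, gamma) - k @ alpha

# grow the dictionary only when ald_distance(...) exceeds a chosen threshold,
# which keeps the model's complexity footprint bounded over time
```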


2017 ◽  
Vol 43 (3) ◽  
pp. 567-592 ◽  
Author(s):  
Dong Nguyen ◽  
Jacob Eisenstein

Quantifying the degree of spatial dependence for linguistic variables is a key task for analyzing dialectal variation. However, existing approaches have important drawbacks. First, they are based on parametric models of dependence, which limits their power in cases where the underlying parametric assumptions are violated. Second, they are not applicable to all types of linguistic data: some approaches apply only to frequencies, others to Boolean indicators of whether a linguistic variable is present. We present a new method for measuring geographical language variation, which solves both of these problems. Our approach builds on Reproducing Kernel Hilbert Space (RKHS) representations for nonparametric statistics, and takes the form of a test statistic that is computed from pairs of individual geotagged observations without aggregation into predefined geographical bins. We compare this test with prior work using synthetic data as well as a diverse set of real data sets: a corpus of Dutch tweets, a Dutch syntactic atlas, and a data set of letters to the editor in North American newspapers. Our proposed test is shown to support robust inferences across a broad range of scenarios and types of data.
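For orientation, a closely related RKHS dependence statistic, the Hilbert-Schmidt Independence Criterion (HSIC), can likewise be computed from paired observations without geographic binning; this sketch is generic and is not the paper's own test statistic:

```python
import numpy as np

def hsic(X, Y, gamma_x=1.0, gamma_y=1.0):
    """Biased HSIC estimate between paired samples X (n, dx) and Y (n, dy)."""
    n = X.shape[0]
    sq = lambda Z: np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma_x * sq(X))           # kernel on linguistic features
    L = np.exp(-gamma_y * sq(Y))           # kernel on geographic coordinates
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# significance is typically assessed by permuting the rows of Y
# and recomputing the statistic
```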


2016 ◽  
Vol 14 (06) ◽  
pp. 809-827
Author(s):  
Ting Hu ◽  
Yuan Yao

This paper studies robust regression problems associated with the $\ell_p$-norm loss ($1 \leq p \leq 2$) and the $\varepsilon$-insensitive $\ell_p$-norm loss in a reproducing kernel Hilbert space. We establish a variance-expectation bound under an a priori noise condition on the conditional distribution, which is the key technique for bounding the error. Explicit learning rates are given under approximation-ability assumptions on the reproducing kernel Hilbert space.
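Assuming the standard definitions of these losses (our reading of the reconstructed formulas above), they compare as follows; the $\varepsilon$-insensitive variant ignores residuals of magnitude at most $\varepsilon$:

```python
import numpy as np

def lp_loss(residual, p=1.5):
    """l_p-norm loss |r|^p."""
    return np.abs(residual) ** p

def eps_insensitive_lp_loss(residual, p=1.5, eps=0.1):
    """epsilon-insensitive l_p loss: max(|r| - eps, 0)^p."""
    return np.maximum(np.abs(residual) - eps, 0.0) ** p
```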


Author(s):  
Michael T Jury ◽  
Robert T W Martin

Abstract We extend the Lebesgue decomposition of positive measures with respect to Lebesgue measure on the complex unit circle to the non-commutative (NC) multi-variable setting of (positive) NC measures. These are positive linear functionals on a certain self-adjoint subspace of the Cuntz–Toeplitz $C^{\ast}$-algebra, the $C^{\ast}$-algebra of the left creation operators on the full Fock space. This theory is fundamentally connected to the representation theory of the Cuntz and Cuntz–Toeplitz $C^{\ast}$-algebras; any $\ast$-representation of the Cuntz–Toeplitz $C^{\ast}$-algebra is obtained (up to unitary equivalence) by applying a Gelfand–Naimark–Segal construction to a positive NC measure. Our approach combines the theory of Lebesgue decomposition of sesquilinear forms in Hilbert space, Lebesgue decomposition of row isometries, free semigroup algebra theory, NC reproducing kernel Hilbert space theory, and NC Hardy space theory.
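For reference, the classical theorem being extended states that any positive Borel measure $\mu$ on the unit circle splits uniquely against normalized Lebesgue measure $m$:

```latex
\[
  \mu = \mu_{\mathrm{ac}} + \mu_{\mathrm{s}},
  \qquad \mu_{\mathrm{ac}} \ll m, \quad \mu_{\mathrm{s}} \perp m,
  \qquad d\mu_{\mathrm{ac}} = f\, dm \ \text{ for some } f \in L^{1}(m).
\]
```

The paper's NC analogue replaces $\mu$ by a positive functional on the self-adjoint subspace of the Cuntz–Toeplitz $C^{\ast}$-algebra described above.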

