Entropy-Penalized Semidefinite Programming

Author(s):  
Mikhail Krechetov ◽  
Jakub Marecek ◽  
Yury Maximov ◽  
Martin Takac

Low-rank methods for semidefinite programming (SDP) have attracted considerable interest recently, especially in machine learning applications. Their analysis often involves determinant-based or Schatten-norm penalties, which are difficult to implement in practice because of their high computational cost. In this paper, we propose Entropy-Penalized Semidefinite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm with (almost) linear time complexity for the gradient computation; this makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.
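To make the idea of an entropy penalty concrete, here is a minimal NumPy sketch of one plausible form: the entropy of the normalized spectrum of the SDP variable added to a linear objective, so that low-entropy (i.e., concentrated) spectra are rewarded. The exact penalty, scaling, and the names `entropy_penalty`/`penalized_objective` are illustrative assumptions, not the EP-SDP formulation of the paper.

```python
import numpy as np

def entropy_penalty(X, eps=1e-12):
    """Entropy of the normalized spectrum of a PSD matrix X.
    A small value indicates an (approximately) low-rank solution."""
    eigvals = np.clip(np.linalg.eigvalsh(X), 0.0, None)
    p = eigvals / (eigvals.sum() + eps)
    p = p[p > eps]
    return -np.sum(p * np.log(p))

def penalized_objective(C, X, mu):
    """SDP objective <C, X> plus an entropy term promoting low rank.
    Only a sketch; the paper's penalty and scaling may differ."""
    return np.trace(C @ X) + mu * entropy_penalty(X)
```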

Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and then introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, between reinforcement learning and operant conditioning, and between unsupervised learning and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback versus anticipatory and adaptive control. Finally, it argues that this framework for translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions at the level of robot learning and control with insights from biology.


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhikuan Zhao ◽  
Jack K. Fitzsimons ◽  
Patrick Rebentrost ◽  
Vedran Dunjko ◽  
Joseph F. Fitzsimons

Machine learning has recently emerged as a fruitful area for finding potential quantum computational advantage. Many quantum-enhanced machine learning algorithms critically hinge on the ability to efficiently produce states proportional to high-dimensional data points stored in a quantum-accessible memory. Even given query access to exponentially many entries stored in a database, the construction of which is considered a one-off overhead, it has been argued that the cost of preparing such amplitude-encoded states may offset any exponential quantum advantage. Here we prove, using smoothed analysis, that if the data analysis algorithm is robust against small entry-wise input perturbations, state preparation can always be achieved with a constant number of queries. This criterion is typically satisfied in realistic machine learning applications, where input data is subject to moderate noise. Our results are equally applicable to the recent seminal progress in quantum-inspired algorithms, where specially constructed databases suffice for polylogarithmic classical algorithms in the low-rank case. The consequence of our finding is that, for the purpose of practical machine learning, polylogarithmic processing time is possible under a general and flexible input model with quantum algorithms or, in the low-rank case, with quantum-inspired classical algorithms.
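As an illustration of the input model, the sketch below shows the classical analogue of amplitude encoding and the robustness property the smoothed analysis relies on: a small entry-wise perturbation of the data changes the encoded state only slightly. The function name `amplitude_encode` is assumed for illustration and is not taken from the paper.

```python
import numpy as np

def amplitude_encode(x):
    """Classical stand-in for amplitude encoding: the encoded state has
    amplitudes proportional to the entries of x (unit l2 norm)."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

# Robustness check: a small entry-wise perturbation of the data changes
# the encoded state only slightly.
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
noisy = x + 1e-3 * np.random.randn(x.size)
print(np.linalg.norm(amplitude_encode(x) - amplitude_encode(noisy)))
```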


Author(s):  
Michael Unser

Regularization addresses the ill-posedness of the training problem in machine learning or of the reconstruction of a signal from a limited number of measurements. The method is applicable whenever the problem is formulated as an optimization task. The standard strategy consists in augmenting the original cost functional by an energy that penalizes solutions with undesirable behavior. The effect of regularization is very well understood when the penalty involves a Hilbertian norm. Another popular configuration is the use of an $\ell_1$-norm (or some variant thereof) that favors sparse solutions. In this paper, we propose a higher-level formulation of regularization within the context of Banach spaces. We present a general representer theorem that characterizes the solutions of a remarkably broad class of optimization problems. We then use our theorem to recover a number of known results in the literature, such as the celebrated representer theorem of machine learning for reproducing kernel Hilbert spaces (RKHS), Tikhonov regularization, representer theorems for sparsity-promoting functionals, and the recovery of spikes, as well as a few new ones.
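For concreteness, the two classical configurations mentioned above can be sketched as follows: Tikhonov (Hilbertian-norm) regularization solved in closed form, and $\ell_1$ regularization solved with ISTA, whose soft-thresholding step produces the sparse solutions the penalty favors. This is a minimal NumPy sketch of standard methods, not the Banach-space framework of the paper.

```python
import numpy as np

def ridge(A, y, lam):
    """Tikhonov regularization: minimize ||Ax - y||^2 + lam * ||x||_2^2 (closed form)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def lasso_ista(A, y, lam, steps=500):
    """l1 regularization via ISTA: minimize 0.5*||Ax - y||^2 + lam * ||x||_1.
    Soft-thresholding yields the sparse solutions the l1 penalty favors."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        g = A.T @ (A @ x - y)
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x
```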


2020 ◽  
Author(s):  
Qing Tao

The extrapolation strategy introduced by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude on smooth convex objectives, has led to tremendous success in training machine learning models. In this paper, we theoretically study its strength for the convergence of individual iterates in general non-smooth convex optimization problems, which we call individual convergence. We prove that Nesterov's extrapolation makes the individual convergence of projected gradient methods optimal for general convex problems, which has been a challenging problem in the machine learning community. In light of this, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as a step towards the open question about SGD posed by Shamir (2012). Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. They can serve as an alternative to basic SGD, especially for machine learning problems where an individual output is needed to preserve the regularization structure while keeping an optimal rate of convergence. In particular, our method is an efficient tool for solving large-scale $l_1$-regularized hinge-loss learning problems. Several experiments on real data demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.
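The extrapolation step itself is standard; below is a minimal sketch of a projected gradient method with Nesterov-style momentum that returns the final (individual) iterate rather than an average. The step-size rule is fixed for simplicity, and the paper's specific modification that attains optimal individual convergence is not reproduced here.

```python
import numpy as np

def nesterov_projected_gradient(grad, project, x0, step, iters=1000):
    """Projected (sub)gradient method with Nesterov-style extrapolation.
    Returns the last iterate itself (the 'individual' output), not an average."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x = project(y - step * grad(y))                 # gradient step + projection
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)     # extrapolation step
        x_prev, t = x, t_next
    return x_prev
```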


2018 ◽  
Vol 39 (2) ◽  
pp. 545-578 ◽  
Author(s):  
Raghu Bollapragada ◽  
Richard H Byrd ◽  
Jorge Nocedal

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy of the gradient and Hessian approximations to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method and that samples the Hessian but not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance of inexact subsampled Newton methods on machine learning applications based on logistic regression.
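A minimal sketch of one inexact subsampled Newton-CG step for logistic regression is given below: the gradient is exact, the Hessian is subsampled, and the Newton system is solved approximately with a few CG iterations. The sampling fraction, the fixed number of CG iterations, and the absence of a line search are simplifying assumptions, not the paper's exact algorithm.

```python
import numpy as np

def logistic_grad_hess_vec(w, X, y):
    """Exact gradient and a Hessian-vector product built from a Hessian subsample."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y) / X.shape[0]
    # Subsample rows for the Hessian only; the gradient stays exact.
    idx = np.random.choice(X.shape[0], size=max(1, X.shape[0] // 10), replace=False)
    Xs, d = X[idx], p[idx] * (1.0 - p[idx])
    hess_vec = lambda v: Xs.T @ (d * (Xs @ v)) / idx.size
    return grad, hess_vec

def newton_cg_step(w, X, y, cg_iters=10):
    """One inexact Newton step: solve H s = -g approximately with conjugate gradients."""
    g, Hv = logistic_grad_hess_vec(w, X, y)
    s = np.zeros_like(w)
    r = -g - Hv(s)            # residual of H s = -g at s = 0
    p = r.copy()
    for _ in range(cg_iters):
        Hp = Hv(p)
        alpha = (r @ r) / (p @ Hp + 1e-12)
        s += alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r + 1e-12)
        p = r_new + beta * p
        r = r_new
    return w + s              # no line search in this sketch
```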


2022 ◽  
Vol 4 ◽  
Author(s):  
Kaiqi Zhang ◽  
Cole Hawkins ◽  
Zheng Zhang

A major challenge in many machine learning tasks is that the model's expressive power depends on the model size. Low-rank tensor methods are an efficient tool for handling the curse of dimensionality in many large-scale machine learning models. The major challenges in training a tensor learning model include how to process high-volume data, how to determine the tensor rank automatically, and how to estimate the uncertainty of the results. While existing tensor learning methods focus on a specific task, this paper proposes a generic Bayesian framework that can be employed to solve a broad class of tensor learning problems, such as tensor completion, tensor regression, and tensorized neural networks. We develop a low-rank tensor prior for automatic rank determination in nonlinear problems. Our method is implemented with both stochastic gradient Hamiltonian Monte Carlo (SGHMC) and Stein variational gradient descent (SVGD), and we compare the automatic rank determination and uncertainty quantification of these two solvers. We demonstrate that the proposed method can determine the tensor rank automatically and can quantify the uncertainty of the obtained results. We validate our framework on tensor completion tasks and on training tensorized neural networks.
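As a small illustration of the modeling idea (not the paper's prior or its SGHMC/SVGD solvers), the sketch below writes an unnormalized log-posterior for CP tensor completion with a per-rank shrinkage prior; driving the variance of a rank component toward zero is one simple way automatic rank determination can arise. All names and the prior form are assumptions for illustration.

```python
import numpy as np

def cp_reconstruct(factors):
    """Rank-R CP reconstruction of a 3-way tensor from factor matrices A, B, C."""
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def log_posterior(factors, lam, T, mask, sigma=0.1):
    """Unnormalized log-posterior for tensor completion with a per-rank
    shrinkage prior: a large lam[r] drives rank component r toward zero."""
    recon = cp_reconstruct(factors)
    loglik = -0.5 * np.sum(((T - recon) * mask) ** 2) / sigma ** 2
    logprior = sum(-0.5 * np.sum((F ** 2) * lam) for F in factors)
    return loglik + logprior
```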


10.29007/3qkh ◽  
2019 ◽  
Author(s):  
Jeremias Berg ◽  
Antti Hyttinen ◽  
Matti Järvisalo

We highlight important real-world optimization problems arising from data analysis and machine learning, representing somewhat atypical applications of SAT-based solver technology on which the SAT community could focus more attention. To address the current lack of heterogeneity in the benchmark sets available for evaluating MaxSAT solvers, we provide a benchmark library of MaxSAT instances encoding different data analysis and machine learning problems. In doing so, we also advocate extending MaxSAT solvers to accept real-valued weights for soft clauses as input, motivated by the presented problem domains in which real-valued costs play an integral role.
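As a concrete illustration of the real-valued-weight issue, the sketch below writes a weighted partial MaxSAT instance in the classic WCNF format. Since standard solvers expect integer weights, real-valued costs are scaled and rounded here, which is precisely the workaround that native support for real-valued weights would remove. The helper `write_wcnf` and the fixed-point scale are assumptions for illustration.

```python
def write_wcnf(path, hard, soft):
    """Write a weighted partial MaxSAT instance in classic WCNF format.
    `hard` is a list of clauses (lists of ints); `soft` is a list of
    (real_weight, clause) pairs. Real weights are scaled to integers."""
    scale = 1000                                        # fixed-point scaling
    weights = [max(1, round(w * scale)) for w, _ in soft]
    top = sum(weights) + 1                              # weight of hard clauses
    clauses = hard + [c for _, c in soft]
    n_vars = max(abs(lit) for c in clauses for lit in c)
    with open(path, 'w') as f:
        f.write(f"p wcnf {n_vars} {len(clauses)} {top}\n")
        for c in hard:
            f.write(f"{top} " + " ".join(map(str, c)) + " 0\n")
        for w, (_, c) in zip(weights, soft):
            f.write(f"{w} " + " ".join(map(str, c)) + " 0\n")
```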


2020 ◽  
Author(s):  
Ali Vakilian

Large volumes of available data have led to the emergence of new computational models for data analysis. One such model is captured by the notion of streaming algorithms: given a sequence of N items, the goal is to compute the value of a given function of the input items in a small number of passes, using an amount of space sublinear in N. Streaming algorithms have applications in many areas, such as networking and large-scale machine learning. Despite a huge amount of work in this area over the last two decades, several aspects of streaming algorithms remain poorly understood, such as (a) streaming algorithms for combinatorial optimization problems and (b) incorporating modern machine learning techniques into the design of streaming algorithms. In the first part of this thesis, we describe (essentially) optimal streaming algorithms for set cover and maximum coverage, two classic problems in combinatorial optimization. In the second part, we show how to augment classic streaming algorithms for the frequency estimation and low-rank approximation problems with machine learning oracles in order to improve their space-accuracy tradeoffs. The new algorithms combine the benefits of machine learning with the formal guarantees available through algorithm design theory.
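As an example of a learning-augmented streaming algorithm of the kind described in the second part, the sketch below augments a Count-Min sketch with a (hypothetical) heavy-hitter oracle: items the oracle predicts to be heavy are counted exactly, while the remaining items share the sketch, improving the space-accuracy tradeoff when the oracle is accurate. The class name and oracle interface are assumptions, not the thesis's actual algorithms.

```python
import numpy as np

class LearnedCountMin:
    """Count-Min sketch with a predicted-heavy-hitter oracle for frequency estimation."""
    def __init__(self, width, depth, is_heavy):
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.width, self.depth = width, depth
        self.is_heavy = is_heavy          # oracle: item -> bool (assumed given)
        self.exact = {}                   # exact counters for predicted heavy items

    def update(self, item, count=1):
        if self.is_heavy(item):
            self.exact[item] = self.exact.get(item, 0) + count
            return
        for d in range(self.depth):
            self.table[d, hash((d, item)) % self.width] += count

    def query(self, item):
        if item in self.exact:
            return self.exact[item]
        return min(self.table[d, hash((d, item)) % self.width] for d in range(self.depth))
```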


2021 ◽  
Vol 14 (6) ◽  
pp. 957-969
Author(s):  
Jinfei Liu ◽  
Jian Lou ◽  
Junxu Liu ◽  
Li Xiong ◽  
Jian Pei ◽  
...  

Data-driven machine learning has become ubiquitous. A marketplace for machine learning models connects data owners and model buyers, and can dramatically facilitate data-driven machine learning applications. In this paper, we take a formal data marketplace perspective and propose the first end-to-end model marketplace with differential privacy (Dealer), addressing the following questions: How should data owners' compensation functions and model buyers' price functions be formulated? How can the broker price a set of models to maximize revenue with an arbitrage-free guarantee, and train a set of models with maximum Shapley coverage under a manufacturing budget so as to remain competitive? For the former, we propose a compensation function for each data owner based on Shapley value and privacy sensitivity, and a price function for each model buyer based on Shapley coverage sensitivity and noise sensitivity. Both privacy sensitivity and noise sensitivity are measured by the level of differential privacy. For the latter, we formulate two optimization problems for model pricing and model training, and propose efficient dynamic programming algorithms. Experimental results on a real chess dataset and on synthetic datasets justify the design of Dealer and verify the efficiency and effectiveness of the proposed algorithms.
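The Shapley-value quantities underlying both the compensation functions and the coverage notion can be approximated by sampling permutations of the data owners. The sketch below shows a generic Monte Carlo estimator, where `utility(subset)` is assumed to return the value (e.g., accuracy) of a model trained on that subset; it is illustrative only, not Dealer's actual algorithms.

```python
import random

def monte_carlo_shapley(owners, utility, samples=200):
    """Monte Carlo estimate of each data owner's Shapley value.
    For each random permutation, an owner's marginal contribution is the
    utility gain from adding that owner to the preceding prefix."""
    phi = {o: 0.0 for o in owners}
    for _ in range(samples):
        order = owners[:]
        random.shuffle(order)
        prefix, prev = [], utility([])
        for o in order:
            prefix.append(o)
            cur = utility(prefix)
            phi[o] += (cur - prev) / samples
            prev = cur
    return phi
```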

