Nonconvex Sparse Regularization for Deep Neural Networks and Its Optimality

2021 ◽ pp. 1-42
Author(s): Ilsang Ohn, Yongdai Kim

Abstract: Recent theoretical studies proved that deep neural network (DNN) estimators obtained by minimizing empirical risk with a certain sparsity constraint can attain optimal convergence rates for regression and classification problems. However, the sparsity constraint requires knowing certain properties of the true model, which are not available in practice, and computation is difficult because of the discrete nature of the constraint. In this letter, we propose a novel penalized estimation method for sparse DNNs that resolves both problems. We establish an oracle inequality for the excess risk of the proposed sparse-penalized DNN estimator and derive convergence rates for several learning tasks. In particular, we prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems. For computation, we develop an efficient gradient-based optimization algorithm that guarantees monotonic reduction of the objective function.
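
As a rough illustration of the penalized-estimation idea (not the authors' exact method or penalty), the sketch below runs a thresholded gradient scheme for a clipped-L1 (nonconvex) sparsity penalty on a plain linear model; the penalty form, threshold tau, step size eta, and problem sizes are illustrative choices.

```python
# Illustrative sketch only: sparse-penalized estimation with a clipped-L1
# (nonconvex) penalty, optimized by gradient steps followed by a thresholding
# update. All constants (lam, tau, eta) are made-up values.
import numpy as np

def clipped_l1(w, lam, tau):
    """Nonconvex sparsity penalty: lam * sum_i min(|w_i|, tau)."""
    return lam * np.minimum(np.abs(w), tau).sum()

def threshold_update(w, lam, tau, eta):
    """Approximate proximal (thresholding) step for the clipped-L1 penalty:
    coordinates whose magnitude exceeds tau are left unchanged (the penalty is
    locally flat there); the rest are soft-thresholded."""
    soft = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)
    return np.where(np.abs(w) > tau, w, soft)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50)
w_true[:5] = 1.0                                   # sparse ground truth
y = X @ w_true + 0.1 * rng.normal(size=200)

w, lam, tau, eta = np.zeros(50), 0.05, 0.5, 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)              # gradient of the least-squares empirical risk
    w = threshold_update(w - eta * grad, lam, tau, eta)

obj = 0.5 * np.mean((X @ w - y) ** 2) + clipped_l1(w, lam, tau)
print("penalized objective:", obj, "| nonzero weights:", np.count_nonzero(w))
```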

2014 ◽ Vol 30 (6) ◽ pp. 1247-1271
Author(s): Cheng Li, Wenxin Jiang, Martin A. Tanner

The Gibbs posterior is a useful tool for risk minimization, which adopts a Bayesian framework and can incorporate convenient computational algorithms such as Markov chain Monte Carlo. We derive risk bounds for the Gibbs posterior using some general nonasymptotic inequalities, which can be used to derive nearly optimal convergence rates and select models to optimally balance the approximation errors and the stochastic errors. These inequalities are formulated in a very general way that does not require the empirical risk to be a usual sample average over independent observations. We apply this framework to study the convergence rate of the GMM (generalized method of moments) risk and derive an oracle inequality for the ranking risk, where models are selected based on the Gibbs posterior with a nonadditive empirical risk.
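
A minimal sketch of the Gibbs-posterior idea under toy assumptions (a scalar parameter, a squared-error empirical risk, a standard-normal prior, and inverse temperature lam = 1), sampled with random-walk Metropolis-Hastings. The toy uses a sample-average risk for brevity, whereas the framework above also covers nonadditive risks such as the ranking risk.

```python
# Toy sketch: the Gibbs posterior is proportional to exp(-n * lam * R_n(theta))
# times the prior, sampled here by random-walk Metropolis-Hastings.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100)
n, lam = len(data), 1.0

def empirical_risk(theta):
    return np.mean((data - theta) ** 2)            # R_n(theta), a sample average in this toy

def log_gibbs(theta):
    return -n * lam * empirical_risk(theta) - 0.5 * theta ** 2   # log Gibbs posterior up to a constant

theta, samples = 0.0, []
for _ in range(5000):
    proposal = theta + 0.2 * rng.normal()
    if np.log(rng.uniform()) < log_gibbs(proposal) - log_gibbs(theta):
        theta = proposal
    samples.append(theta)

print("Gibbs-posterior mean of theta:", np.mean(samples[1000:]))
```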


2020
Author(s): Yuxuan Du, Min-Hsiu Hsieh, Tongliang Liu, Shan You, Dacheng Tao

Abstract: The quantum neural network (QNN), or equivalently a variational quantum circuit with a gradient-based classical optimizer, has been broadly applied in many experimental proposals for noisy intermediate-scale quantum (NISQ) devices. However, the learning capability of QNNs remains largely unknown because of the nonconvex optimization landscape, measurement error, and the unavoidable gate noise introduced by NISQ machines. In this study, we theoretically explore the learnability of QNNs from the perspectives of trainability and generalization. In particular, we derive the convergence performance of QNNs in the NISQ setting and identify classes of computationally hard concepts that can be efficiently learned by QNNs. Our results demonstrate that large gate noise, few quantum measurements, and deep circuits lead to poor convergence rates of QNNs toward the minimizer of the empirical risk. Moreover, we prove that any concept class that is efficiently learnable by a restricted quantum statistical query (QSQ) learning model can also be efficiently learned by a QNN. Since the restricted QSQ learning model can tackle certain problems, such as parity learning, with a runtime speedup, our result suggests that QNNs established on NISQ devices will retain the quantum advantage. Our work provides theoretical guidance for developing advanced QNNs and opens up avenues for exploring quantum advantages using NISQ devices.
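
The toy sketch below, which is not the paper's model, illustrates the NISQ-style ingredients mentioned above on a single-qubit variational circuit: finite measurement shots, a depolarizing-noise factor, and parameter-shift gradients. The values of p_noise, shots, and the learning rate are invented for the example.

```python
# Toy sketch: noisy, finite-shot gradient descent on a one-parameter variational
# circuit RY(theta)|0>, using the parameter-shift rule for gradients.
import numpy as np

rng = np.random.default_rng(2)
p_noise, shots = 0.05, 200

def expval_z(theta):
    """Noisy, finite-shot estimate of <Z> after RY(theta)|0>."""
    p_up = np.cos(theta / 2.0) ** 2                      # ideal Prob(outcome +1)
    p_up = (1 - p_noise) * p_up + p_noise * 0.5          # depolarizing-noise mixing
    outcomes = rng.choice([1.0, -1.0], size=shots, p=[p_up, 1 - p_up])
    return outcomes.mean()

def loss(theta):
    return expval_z(theta)                               # minimize <Z>; the optimum is theta = pi

def parameter_shift_grad(theta):
    return 0.5 * (loss(theta + np.pi / 2) - loss(theta - np.pi / 2))

theta, eta = 0.3, 0.2
for _ in range(100):
    theta -= eta * parameter_shift_grad(theta)

print("theta after training:", theta, "| final <Z>:", expval_z(theta))
```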


Author(s): Radu Boţ, Guozhi Dong, Peter Elbau, Otmar Scherzer

Abstract: Recently, there has been great interest in analysing dynamical flows where the stationary limit is the minimiser of a convex energy. Particular flows of great interest have been continuous limits of Nesterov's algorithm and of the fast iterative shrinkage-thresholding algorithm, respectively. In this paper, we approach the solutions of linear ill-posed problems by dynamical flows. Because the squared norm of the residual of a linear operator equation is a convex functional, the theoretical results from convex analysis for energy-minimising flows are applicable. However, in the restricted situation of this paper they can often be significantly improved. Moreover, since we show that the proposed flows for minimising the norm of the residual of a linear operator equation are optimal regularisation methods and that they provide optimal convergence rates for the regularised solutions, the given rates can be considered benchmarks for further studies in convex analysis.
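
As a minimal illustration of regularisation by a dynamical flow (only the simplest first-order flow, not the Nesterov-type flows studied in the paper), the sketch below discretises the gradient flow x'(t) = -A^T (A x(t) - y) of the residual functional 0.5 * ||A x - y||^2 by explicit Euler steps and uses the stopping time as the regularisation parameter; the operator, noise level, and step size are invented for the example.

```python
# Minimal sketch: explicit Euler discretization of the residual-minimizing
# gradient flow for a linear ill-posed problem, with early stopping as the
# regularization mechanism.
import numpy as np

rng = np.random.default_rng(3)
n = 50
A = np.triu(np.ones((n, n))) / n                 # a mildly ill-conditioned operator
x_true = np.sin(np.linspace(0, 3, n))
y = A @ x_true + 1e-3 * rng.normal(size=n)       # noisy data

x = np.zeros(n)
dt = 0.5 / np.linalg.norm(A, 2) ** 2             # step size below the stability limit
for _ in range(2000):
    x = x - dt * A.T @ (A @ x - y)               # one Euler step of the flow

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error at the stopping time:", rel_err)
```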


2015 ◽ Vol 2015 ◽ pp. 1-8
Author(s): Mingchen Yao, Chao Zhang, Wei Wu

Many generalization results in learning theory are established under the assumption that samples are independent and identically distributed (i.i.d.). However, numerous learning tasks in practical applications involve time-dependent data. In this paper, we propose a theoretical framework to analyze the generalization performance of the empirical risk minimization (ERM) principle for sequences of time-dependent samples (TDS). In particular, we first present a generalization bound of the ERM principle for TDS. By introducing some auxiliary quantities, we then give a further analysis of the generalization properties and the asymptotic behavior of the ERM principle for TDS.
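
A toy sketch of ERM on time-dependent samples, not taken from the paper: one-step-ahead least-squares prediction for an AR(1) sequence, where the pairs (x_t, x_{t+1}) are clearly not i.i.d.; all constants are illustrative.

```python
# Toy sketch: empirical risk minimization (squared loss) over dependent,
# time-ordered samples generated by an AR(1) process.
import numpy as np

rng = np.random.default_rng(4)
T, phi = 2000, 0.8
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal()          # dependent AR(1) samples

inputs, targets = x[:-1], x[1:]
theta_hat = inputs @ targets / (inputs @ inputs)  # ERM solution (least squares)
empirical_risk = np.mean((targets - theta_hat * inputs) ** 2)

print("estimated AR coefficient:", theta_hat, "| empirical risk:", empirical_risk)
```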


Author(s): Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang

This paper has two main goals: (a) establish several statistical properties—consistency, asymptotic distributions, and convergence rates—of stationary solutions and values of a class of coupled nonconvex and nonsmooth empirical risk-minimization problems and (b) validate these properties by a noisy amplitude-based phase-retrieval problem, the latter being of much topical interest. Derived from available data via sampling, these empirical risk-minimization problems are the computational workhorse of a population risk model that involves the minimization of an expected value of a random functional. When these minimization problems are nonconvex, the computation of their globally optimal solutions is elusive. Together with the fact that the expectation operator cannot be evaluated for general probability distributions, it becomes necessary to justify whether the stationary solutions of the empirical problems are practical approximations of the stationary solution of the population problem. When these two features, general distribution and nonconvexity, are coupled with nondifferentiability that often renders the problems “non-Clarke regular,” the task of the justification becomes challenging. Our work aims to address such a challenge within an algorithm-free setting. The resulting analysis is, therefore, different from much of the analysis in the recent literature that is based on local search algorithms. Furthermore, supplementing the classical global minimizer-centric analysis, our results offer a promising step to close the gap between computational optimization and asymptotic analysis of coupled, nonconvex, nonsmooth statistical estimation problems, expanding the former with statistical properties of the practically obtained solution and providing the latter with a more practical focus pertaining to computational tractability.
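
For the phase-retrieval example mentioned above, here is a minimal sketch (not the authors' procedure) of subgradient descent on the nonconvex, nonsmooth amplitude-based empirical risk (1/n) * sum_i (|a_i^T x| - b_i)^2, warm-started near the truth for simplicity; problem sizes and the step size are arbitrary.

```python
# Illustrative sketch: subgradient descent on the amplitude-based
# phase-retrieval empirical risk, a coupled nonconvex and nonsmooth problem.
import numpy as np

rng = np.random.default_rng(5)
n, d = 400, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = np.abs(A @ x_true) + 0.01 * rng.normal(size=n)      # noisy amplitude measurements

x = x_true + 0.5 * rng.normal(size=d)                   # warm start near the truth
for _ in range(200):
    z = A @ x
    subgrad = 2 * A.T @ ((np.abs(z) - b) * np.sign(z)) / n   # a subgradient of the risk
    x -= 0.25 * subgrad

err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))  # account for the sign ambiguity
print("distance to the true signal (up to sign):", err)
```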


Author(s): Huijun Guo, Junke Kou

This paper considers wavelet estimation of a regression function based on a negatively associated sample. We provide upper-bound estimates of the [Formula: see text] risk of linear and nonlinear wavelet estimators in Besov spaces. When the random sample reduces to the independent case, our convergence rates coincide with the optimal convergence rates of classical nonparametric regression estimation.
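
A toy sketch of a linear wavelet estimator: at resolution level j, projecting onto the Haar scaling functions amounts to averaging the responses within each dyadic bin. The sample below is drawn i.i.d. purely for illustration, whereas the setting above allows negatively associated samples, and the target function and constants are invented.

```python
# Toy sketch: a linear Haar-wavelet regression estimator on [0, 1], computed as
# bin averages over the 2**j dyadic intervals at resolution level j.
import numpy as np

rng = np.random.default_rng(6)
n, j = 500, 4                                     # sample size and resolution level
X = rng.uniform(size=n)
f = lambda x: np.sin(2 * np.pi * x)               # unknown regression function (illustrative)
Y = f(X) + 0.2 * rng.normal(size=n)

bins = np.minimum((X * 2 ** j).astype(int), 2 ** j - 1)   # dyadic interval containing each X_i
estimate = np.array([Y[bins == k].mean() if np.any(bins == k) else 0.0
                     for k in range(2 ** j)])

x_grid = (np.arange(2 ** j) + 0.5) / 2 ** j       # bin midpoints
mse = np.mean((estimate - f(x_grid)) ** 2)
print("approximate MSE of the linear Haar estimator:", mse)
```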

