Entropic gradient descent algorithms and wide flat minima

2021 ◽  
Vol 2021 (12) ◽  
pp. 124015
Author(s):  
Fabrizio Pittorino ◽  
Carlo Lucibello ◽  
Christoph Feinauer ◽  
Gabriele Perugini ◽  
Carlo Baldassi ◽  
...  

Abstract The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities than sharp minima. In this work we first discuss the relationship between alternative measures of flatness: the local entropy, which is useful for analysis and algorithm development, and the local energy, which is easier to compute and was shown empirically, in extensive tests on state-of-the-art networks, to be the best predictor of generalization capabilities. We show semi-analytically in simple controlled scenarios that these two measures correlate strongly with each other and with generalization. We then extend the analysis to the deep learning scenario through extensive numerical validation. We study two algorithms, entropy-stochastic gradient descent and replicated-stochastic gradient descent, that explicitly include the local entropy in the optimization objective. We devise a training schedule by which we consistently find flatter minima (using both flatness measures) and improve the generalization error for common architectures (e.g. ResNet, EfficientNet).
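As a rough, non-authoritative illustration of the replicated-SGD idea mentioned above (several coupled copies of the network descending together, which biases the search toward wide flat minima), here is a minimal PyTorch-style sketch; the number of replicas, the coupling strength gamma, and the learning rate are placeholder values, not those used in the paper.

```python
# Minimal sketch of a replicated-SGD update (assumed form, not the paper's
# exact algorithm); `replicas` are identical copies of the same nn.Module.
import torch

def replicated_sgd_step(replicas, loss_fn, batch, lr=0.05, gamma=1e-3):
    x, y = batch
    # Parameter-wise average over replicas (the "center of mass").
    with torch.no_grad():
        center = [torch.mean(torch.stack(ps), dim=0)
                  for ps in zip(*(r.parameters() for r in replicas))]
    for r in replicas:
        r.zero_grad()
        loss_fn(r(x), y).backward()
        with torch.no_grad():
            for p, c in zip(r.parameters(), center):
                # Plain SGD step plus an elastic pull toward the center, which
                # favors regions where many replicas fit (high local entropy).
                p -= lr * (p.grad + gamma * (p - c))
```

A typical schedule increases gamma over training so that the replicas gradually collapse onto a single flat solution.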

Quantum ◽  
2020 ◽  
Vol 4 ◽  
pp. 263 ◽  
Author(s):  
Jonas M. Kübler ◽  
Andrew Arrasmith ◽  
Lukasz Cincio ◽  
Patrick J. Coles

Variational hybrid quantum-classical algorithms (VHQCAs) have the potential to be useful in the era of near-term quantum computing. However, there has recently been concern regarding the number of measurements needed for VHQCAs to converge. Here, we address this concern by investigating the classical optimizer in VHQCAs. We introduce a novel optimizer called individual Coupled Adaptive Number of Shots (iCANS). This adaptive optimizer frugally selects the number of measurements (i.e., number of shots) both for a given iteration and for a given partial derivative in a stochastic gradient descent. We numerically simulate the performance of iCANS for the variational quantum eigensolver and for variational quantum compiling, with and without noise. In all cases, and especially in the noisy case, iCANS tends to outperform state-of-the-art optimizers for VHQCAs. We therefore believe this adaptive optimizer will be useful for realistic VHQCA implementations, where the number of measurements is limited.
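To make the per-partial-derivative shot allocation concrete, here is a heavily simplified sketch in the spirit of iCANS; the actual shot-update rule in the paper couples the variance, the gradient magnitude, the learning rate and a Lipschitz bound, and grad_estimator below is a hypothetical stand-in for a shot-based gradient measurement.

```python
# Simplified, iCANS-inspired shot-frugal gradient step (illustrative only).
import numpy as np

def shot_frugal_step(theta, grad_estimator, shots, lr=0.1, s_min=2, s_max=1000):
    """theta: parameter vector (np.ndarray); shots[i]: current shot budget for
    the i-th partial derivative; grad_estimator(theta, i, s) is assumed to
    return the (mean, variance) of that derivative estimated from s shots."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        g, var = grad_estimator(theta, i, shots[i])
        grad[i] = g
        # Spend more shots where the estimate is noisy relative to the signal;
        # iCANS derives the exact rule from a Lipschitz bound on the cost.
        shots[i] = int(np.clip(var / max(g ** 2, 1e-12), s_min, s_max))
    return theta - lr * grad, shots
```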


2019 ◽  
Vol 9 (4) ◽  
pp. 851-873 ◽  
Author(s):  
Jing An ◽  
Jianfeng Lu ◽  
Lexing Ying

Abstract We propose stochastic modified equations (SMEs) for modelling asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous-time limit, as well as the SME's precise prediction of ASGD trajectories under various forcing terms. As an application, we propose an optimal mini-batching strategy for ASGD by solving the optimal control problem of the associated SME.
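For intuition, a Langevin-type SME of the kind described here can be simulated with an Euler-Maruyama scheme. The generic form below, dX = -∇f(X) dt + √η σ dW, is only a sketch of the idea; the paper's ASGD-specific SME contains additional delay-dependent drift and diffusion terms.

```python
# Euler-Maruyama simulation of a generic Langevin-type SME (illustrative form,
# not the paper's exact ASGD equation): dX = -grad_f(X) dt + sqrt(eta)*sigma dW.
import numpy as np

def simulate_sme(grad_f, x0, eta=0.01, sigma=0.1, n_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = eta                        # time step matched to the learning rate
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - grad_f(x) * dt + np.sqrt(eta) * sigma * np.sqrt(dt) * noise
        path.append(x.copy())
    return np.array(path)

# Toy example: quadratic objective f(x) = 0.5 * ||x||^2, so grad_f(x) = x.
trajectory = simulate_sme(lambda x: x, x0=[2.0, -1.0])
```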


2020 ◽  
Vol 34 (04) ◽  
pp. 6219-6226
Author(s):  
Jun Wang ◽  
Zhi-Hua Zhou

Differentially private learning tackles tasks where the data are private and the learning process is subject to differential privacy requirements. In real applications, however, some public data are generally available in addition to private data, and it is interesting to consider how to exploit them. In this paper, we study a common situation where a small amount of public data can be used when solving the Empirical Risk Minimization problem over a private database. Specifically, we propose Private-Public Stochastic Gradient Descent, which utilizes such public information to adjust parameters in differentially private stochastic gradient descent and fine-tunes the final result with model reuse. Our method preserves differential privacy for the private database, and an empirical study validates its superiority over existing approaches.
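The building block here is the standard differentially private SGD update (per-example gradient clipping plus Gaussian noise). The sketch below adds a simple blend with a gradient computed on public data to illustrate the private-plus-public idea; the mixing weight w, clip bound C and noise scale sigma are illustrative and do not reproduce the paper's PPSGD specification.

```python
# DP-SGD-style step augmented with a public-data gradient (illustrative blend,
# not the exact PPSGD algorithm from the paper).
import numpy as np

def private_public_sgd_step(theta, private_grads, public_grad,
                            lr=0.1, C=1.0, sigma=1.0, w=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Clip each per-example private gradient to L2 norm at most C, then average.
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in private_grads]
    avg = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound yields differential privacy.
    noisy = avg + rng.normal(0.0, sigma * C / len(private_grads), size=avg.shape)
    # Blend the noisy private gradient with the (non-private) public gradient.
    return theta - lr * ((1 - w) * noisy + w * public_grad)
```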


2002 ◽  
Vol 8 (2) ◽  
pp. 103-121 ◽  
Author(s):  
Nicolas Meuleau ◽  
Marco Dorigo

In this article, we study the relationship between the two techniques known as ant colony optimization (ACO) and stochastic gradient descent. More precisely, we show that some empirical ACO algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of ACO algorithms. We then use this insight to explore the mutual contributions of the two techniques.
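One way to read the correspondence described above: the pheromone update of an ACO iteration can be written as a stochastic gradient (score-function) step on the expected solution quality in pheromone space. The sketch below is an assumed, schematic rendering of that idea; sample_path and quality are hypothetical problem-specific functions, not part of the original article.

```python
# Schematic "gradient-in-pheromone-space" update (illustrative; sample_path
# and quality are hypothetical placeholders for a concrete problem).
import numpy as np

def pheromone_gradient_step(tau, sample_path, quality, lr=0.1, n_ants=20):
    """tau: pheromone matrix; sample_path(tau) is assumed to sample an ant's
    path with probabilities built from tau and to return that path together
    with the gradient of its log-probability with respect to tau."""
    grad = np.zeros_like(tau)
    for _ in range(n_ants):
        path, dlogp_dtau = sample_path(tau)
        grad += quality(path) * dlogp_dtau      # score-function estimator
    tau = tau + lr * grad / n_ants              # stochastic gradient ascent
    return np.clip(tau, 1e-6, None)             # keep pheromones positive
```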


Author(s):  
Marco Mele ◽  
Cosimo Magazzino ◽  
Nicolas Schneider ◽  
Floriana Nicolai

Abstract Although the literature on the relationship between economic growth and CO2 emissions is extensive, the use of machine learning (ML) tools remains in its infancy. In this paper, we assess this nexus for Italy using innovative algorithms, with yearly data for the 1960–2017 period. We develop three distinct models: batch gradient descent (BGD), stochastic gradient descent (SGD), and a multilayer perceptron (MLP). Despite the phase of low Italian economic growth, the predictive models indicate that CO2 emissions increased. Compared with the observed statistical data, the algorithms show a correlation between low growth and a larger increase in CO2, which contradicts the main strand of the literature. Based on this outcome, adequate policy recommendations are provided.
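The first two models differ only in how much data each gradient step sees. A toy sketch of that distinction, on a generic linear regression rather than the paper's Italian 1960–2017 series, is given below; the learning rate and epoch count are arbitrary.

```python
# Toy contrast between batch and stochastic gradient descent on linear
# regression (illustrative; not the paper's models or data).
import numpy as np

def batch_gd(X, y, lr=0.01, epochs=500):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)      # gradient over the full sample
    return w

def stochastic_gd(X, y, lr=0.01, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):          # one observation at a time
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w
```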

