Entropic gradient descent algorithms and wide flat minima

2021 ◽  
Vol 2021 (12) ◽  
pp. 124015
Author(s):  
Fabrizio Pittorino ◽  
Carlo Lucibello ◽  
Christoph Feinauer ◽  
Gabriele Perugini ◽  
Carlo Baldassi ◽  
...  

Abstract The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities than sharp minima. In this work we first discuss the relationship between alternative measures of flatness: the local entropy, which is useful for analysis and algorithm development, and the local energy, which is easier to compute and was shown empirically, in extensive tests on state-of-the-art networks, to be the best predictor of generalization capabilities. We show semi-analytically in simple controlled scenarios that these two measures correlate strongly with each other and with generalization. We then extend the analysis to the deep learning scenario through extensive numerical validation. We study two algorithms, entropy-stochastic gradient descent and replicated-stochastic gradient descent, that explicitly include the local entropy in the optimization objective. We devise a training schedule by which we consistently find flatter minima (using both flatness measures) and improve the generalization error for common architectures (e.g. ResNet, EfficientNet).
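As a rough, non-authoritative illustration of the replicated-SGD idea mentioned above (several coupled copies of the network descending together, which biases the search toward wide flat minima), here is a minimal PyTorch-style sketch; the number of replicas, the coupling strength gamma, and the learning rate are placeholder values, not those used in the paper.

```python
# Minimal sketch of a replicated-SGD update (assumed form, not the paper's
# exact algorithm); `replicas` are identical copies of the same nn.Module.
import torch

def replicated_sgd_step(replicas, loss_fn, batch, lr=0.05, gamma=1e-3):
    x, y = batch
    # Parameter-wise average over replicas (the "center of mass").
    with torch.no_grad():
        center = [torch.mean(torch.stack(ps), dim=0)
                  for ps in zip(*(r.parameters() for r in replicas))]
    for r in replicas:
        r.zero_grad()
        loss_fn(r(x), y).backward()
        with torch.no_grad():
            for p, c in zip(r.parameters(), center):
                # Plain SGD step plus an elastic pull toward the center, which
                # favors regions where many replicas fit (high local entropy).
                p -= lr * (p.grad + gamma * (p - c))
```

A typical schedule increases gamma over training so that the replicas gradually collapse onto a single flat solution.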

Quantum ◽  
2020 ◽  
Vol 4 ◽  
pp. 263 ◽  
Author(s):  
Jonas M. Kübler ◽  
Andrew Arrasmith ◽  
Lukasz Cincio ◽  
Patrick J. Coles

Variational hybrid quantum-classical algorithms (VHQCAs) have the potential to be useful in the era of near-term quantum computing. However, there has recently been concern regarding the number of measurements needed for VHQCAs to converge. Here, we address this concern by investigating the classical optimizer in VHQCAs. We introduce a novel optimizer called individual Coupled Adaptive Number of Shots (iCANS). This adaptive optimizer frugally selects the number of measurements (i.e., number of shots) both for a given iteration and for a given partial derivative in a stochastic gradient descent. We numerically simulate the performance of iCANS for the variational quantum eigensolver and for variational quantum compiling, with and without noise. In all cases, and especially in the noisy case, iCANS tends to outperform state-of-the-art optimizers for VHQCAs. We therefore believe this adaptive optimizer will be useful for realistic VHQCA implementations, where the number of measurements is limited.
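To make the per-partial-derivative shot allocation concrete, here is a heavily simplified sketch in the spirit of iCANS; the actual shot-update rule in the paper couples the variance, the gradient magnitude, the learning rate and a Lipschitz bound, and grad_estimator below is a hypothetical stand-in for a shot-based gradient measurement.

```python
# Simplified, iCANS-inspired shot-frugal gradient step (illustrative only).
import numpy as np

def shot_frugal_step(theta, grad_estimator, shots, lr=0.1, s_min=2, s_max=1000):
    """theta: parameter vector (np.ndarray); shots[i]: current shot budget for
    the i-th partial derivative; grad_estimator(theta, i, s) is assumed to
    return the (mean, variance) of that derivative estimated from s shots."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        g, var = grad_estimator(theta, i, shots[i])
        grad[i] = g
        # Spend more shots where the estimate is noisy relative to the signal;
        # iCANS derives the exact rule from a Lipschitz bound on the cost.
        shots[i] = int(np.clip(var / max(g ** 2, 1e-12), s_min, s_max))
    return theta - lr * grad, shots
```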


2019 ◽  
Vol 9 (4) ◽  
pp. 851-873 ◽  
Author(s):  
Jing An ◽  
Jianfeng Lu ◽  
Lexing Ying

Abstract We propose stochastic modified equations (SMEs) for modelling asynchronous stochastic gradient descent (ASGD) algorithms. The resulting SME of Langevin type extracts more information about the ASGD dynamics and elucidates the relationship between different types of stochastic gradient algorithms. We show the convergence of ASGD to the SME in the continuous-time limit, as well as the SME's precise prediction of ASGD trajectories under various forcing terms. As an application, we propose an optimal mini-batching strategy for ASGD by solving the optimal control problem of the associated SME.
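For intuition, a Langevin-type SME of the kind described here can be simulated with an Euler-Maruyama scheme. The generic form below, dX = -∇f(X) dt + √η σ dW, is only a sketch of the idea; the paper's ASGD-specific SME contains additional delay-dependent drift and diffusion terms.

```python
# Euler-Maruyama simulation of a generic Langevin-type SME (illustrative form,
# not the paper's exact ASGD equation): dX = -grad_f(X) dt + sqrt(eta)*sigma dW.
import numpy as np

def simulate_sme(grad_f, x0, eta=0.01, sigma=0.1, n_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = eta                        # time step matched to the learning rate
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - grad_f(x) * dt + np.sqrt(eta) * sigma * np.sqrt(dt) * noise
        path.append(x.copy())
    return np.array(path)

# Toy example: quadratic objective f(x) = 0.5 * ||x||^2, so grad_f(x) = x.
trajectory = simulate_sme(lambda x: x, x0=[2.0, -1.0])
```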


2020 ◽  
Vol 34 (04) ◽  
pp. 6219-6226
Author(s):  
Jun Wang ◽  
Zhi-Hua Zhou

Differentially private learning tackles tasks where the data are private and the learning process is subject to differential privacy requirements. In real applications, however, some public data are generally available in addition to private data, and it is interesting to consider how to exploit them. In this paper, we study a common situation where a small amount of public data can be used when solving the Empirical Risk Minimization problem over a private database. Specifically, we propose Private-Public Stochastic Gradient Descent, which utilizes such public information to adjust parameters in differentially private stochastic gradient descent and fine-tunes the final result with model reuse. Our method preserves differential privacy for the private database, and an empirical study validates its superiority over existing approaches.
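The building block here is the standard differentially private SGD update (per-example gradient clipping plus Gaussian noise). The sketch below adds a simple blend with a gradient computed on public data to illustrate the private-plus-public idea; the mixing weight w, clip bound C and noise scale sigma are illustrative and do not reproduce the paper's PPSGD specification.

```python
# DP-SGD-style step augmented with a public-data gradient (illustrative blend,
# not the exact PPSGD algorithm from the paper).
import numpy as np

def private_public_sgd_step(theta, private_grads, public_grad,
                            lr=0.1, C=1.0, sigma=1.0, w=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Clip each per-example private gradient to L2 norm at most C, then average.
    clipped = [g / max(1.0, np.linalg.norm(g) / C) for g in private_grads]
    avg = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound yields differential privacy.
    noisy = avg + rng.normal(0.0, sigma * C / len(private_grads), size=avg.shape)
    # Blend the noisy private gradient with the (non-private) public gradient.
    return theta - lr * ((1 - w) * noisy + w * public_grad)
```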


2002 ◽  
Vol 8 (2) ◽  
pp. 103-121 ◽  
Author(s):  
Nicolas Meuleau ◽  
Marco Dorigo

In this article, we study the relationship between the two techniques known as ant colony optimization (ACO) and stochastic gradient descent. More precisely, we show that some empirical ACO algorithms approximate stochastic gradient descent in the space of pheromones, and we propose an implementation of stochastic gradient descent that belongs to the family of ACO algorithms. We then use this insight to explore the mutual contributions of the two techniques.
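One way to read the correspondence described above: the pheromone update of an ACO iteration can be written as a stochastic gradient (score-function) step on the expected solution quality in pheromone space. The sketch below is an assumed, schematic rendering of that idea; sample_path and quality are hypothetical problem-specific functions, not part of the original article.

```python
# Schematic "gradient-in-pheromone-space" update (illustrative; sample_path
# and quality are hypothetical placeholders for a concrete problem).
import numpy as np

def pheromone_gradient_step(tau, sample_path, quality, lr=0.1, n_ants=20):
    """tau: pheromone matrix; sample_path(tau) is assumed to sample an ant's
    path with probabilities built from tau and to return that path together
    with the gradient of its log-probability with respect to tau."""
    grad = np.zeros_like(tau)
    for _ in range(n_ants):
        path, dlogp_dtau = sample_path(tau)
        grad += quality(path) * dlogp_dtau      # score-function estimator
    tau = tau + lr * grad / n_ants              # stochastic gradient ascent
    return np.clip(tau, 1e-6, None)             # keep pheromones positive
```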


Author(s):  
Marco Mele ◽  
Cosimo Magazzino ◽  
Nicolas Schneider ◽  
Floriana Nicolai

Abstract Although the literature on the relationship between economic growth and CO2 emissions is extensive, the use of machine learning (ML) tools remains in its infancy. In this paper, we assess this nexus for Italy using innovative algorithms, with yearly data for the 1960–2017 period. We develop three distinct models: batch gradient descent (BGD), stochastic gradient descent (SGD), and a multilayer perceptron (MLP). Despite the phase of low Italian economic growth, the predictive models indicate that CO2 emissions increased. Compared with the observed statistical data, the algorithms show a correlation between low growth and a larger increase in CO2, which contradicts the main strand of the literature. Based on this outcome, adequate policy recommendations are provided.
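The first two models differ only in how much data each gradient step sees. A toy sketch of that distinction, on a generic linear regression rather than the paper's Italian 1960–2017 series, is given below; the learning rate and epoch count are arbitrary.

```python
# Toy contrast between batch and stochastic gradient descent on linear
# regression (illustrative; not the paper's models or data).
import numpy as np

def batch_gd(X, y, lr=0.01, epochs=500):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)      # gradient over the full sample
    return w

def stochastic_gd(X, y, lr=0.01, epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):          # one observation at a time
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return w
```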

