Stochastic gradient descent for hybrid quantum-classical optimization

Within the context of hybrid quantum-classical optimization, gradient descent based optimizers typically require the evaluation of expectation values with respect to the outcome of parameterized quantum circuits. In this work, we explore the consequences of the prior observation that estimation of these quantities on quantum hardware results in a form of stochastic gradient descent optimization. We formalize this notion, which allows us to show that in many relevant cases, including VQE, QAOA and certain quantum classifiers, estimating expectation values with k measurement outcomes results in optimization algorithms whose convergence properties can be rigorously well understood, for any value of k. In fact, even using single measurement outcomes for the estimation of expectation values is sufficient. Moreover, in many settings the required gradients can be expressed as linear combinations of expectation values -- originating, e.g., from a sum over local terms of a Hamiltonian, a parameter shift rule, or a sum over data-set instances -- and we show that in these cases k-shot expectation value estimation can be combined with sampling over terms of the linear combination, to obtain ``doubly stochastic'' gradient descent optimizers. For all algorithms we prove convergence guarantees, providing a framework for the derivation of rigorous optimization results in the context of near-term quantum devices. Additionally, we explore numerically these methods on benchmark VQE, QAOA and quantum-enhanced machine learning tasks and show that treating the stochastic settings as hyper-parameters allows for state-of-the-art results with significantly fewer circuit executions and measurements.

Download Full-text

On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/556 ◽

2019 ◽

Author(s):

Yi Xu ◽

Zhuoning Yuan ◽

Sen Yang ◽

Rong Jin ◽

Tianbao Yang

Keyword(s):

Convex Optimization ◽

Gradient Descent ◽

Optimization Problems ◽

Upper Bounds ◽

Convex Minimization ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

First Order ◽

Convex Optimization Problems ◽

Learning Tasks

Extrapolation is a well-known technique for solving convex optimization and variational inequalities and recently attracts some attention for non-convex optimization. Several recent works have empirically shown its success in some machine learning tasks. However, it has not been analyzed for non-convex minimization and there still remains a gap between the theory and the practice. In this paper, we analyze gradient descent and stochastic gradient descent with extrapolation for finding an approximate first-order stationary point in smooth non-convex optimization problems. Our convergence upper bounds show that the algorithms with extrapolation can be accelerated than without extrapolation.

Download Full-text

Deep Convolutional Spiking Neural Networks for Image Classification

10.18122/td.1782.boisestate ◽

2021 ◽

Author(s):

Ruthvik Vaila

Keyword(s):

Neural Network ◽

Neural Networks ◽

Artificial Neural Networks ◽

Gradient Descent ◽

Stochastic Gradient ◽

Spiking Neural Networks ◽

Stochastic Gradient Descent ◽

Data Set ◽

Learning Capabilities ◽

Artificial Neural

Spiking neural networks are biologically plausible counterparts of artificial neural networks. Artificial neural networks are usually trained with stochastic gradient descent (SGD) and spiking neural networks are trained with bioinspired spike timing dependent plasticity (STDP). Spiking networks could potentially help in reducing power usage owing to their binary activations. In this work, we use unsupervised STDP in the feature extraction layers of a neural network with instantaneous neurons to extract meaningful features. The extracted binary feature vectors are then classified using classification layers containing neurons with binary activations. Gradient descent (backpropagation) is used only on the output layer to perform training for classification. Surrogate gradients are proposed to perform backpropagation with binary gradients. The accuracies obtained for MNIST and the balanced EMNIST data set compare favorably with other approaches. The effect of the stochastic gradient descent (SGD) approximations on learning capabilities of our network are also explored. We also studied catastrophic forgetting and its effect on spiking neural networks (SNNs). For the experiments regarding catastrophic forgetting, in the classification sections of the network we use a modified synaptic intelligence that we refer to as cost per synapse metric as a regularizer to immunize the network against catastrophic forgetting in a Single-Incremental-Task scenario (SIT). In catastrophic forgetting experiments, we use MNIST and EMNIST handwritten digits datasets that were divided into five and ten incremental subtasks respectively. We also examine behavior of the spiking neural network and empirically study the effect of various hyperparameters on its learning capabilities using the software tool SPYKEFLOW that we developed. We employ MNIST, EMNIST and NMNIST data sets to produce our results.

Download Full-text

An Adaptive Optimizer for Measurement-Frugal Variational Algorithms

Quantum ◽

10.22331/q-2020-05-11-263 ◽

2020 ◽

Vol 4 ◽

pp. 263 ◽

Cited By ~ 6

Author(s):

Jonas M. Kübler ◽

Andrew Arrasmith ◽

Lukasz Cincio ◽

Patrick J. Coles

Keyword(s):

Quantum Computing ◽

Partial Derivative ◽

Gradient Descent ◽

State Of The Art ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Near Term ◽

Variational Algorithms

Variational hybrid quantum-classical algorithms (VHQCAs) have the potential to be useful in the era of near-term quantum computing. However, recently there has been concern regarding the number of measurements needed for convergence of VHQCAs. Here, we address this concern by investigating the classical optimizer in VHQCAs. We introduce a novel optimizer called individual Coupled Adaptive Number of Shots (iCANS). This adaptive optimizer frugally selects the number of measurements (i.e., number of shots) both for a given iteration and for a given partial derivative in a stochastic gradient descent. We numerically simulate the performance of iCANS for the variational quantum eigensolver and for variational quantum compiling, with and without noise. In all cases, and especially in the noisy case, iCANS tends to out-perform state-of-the-art optimizers for VHQCAs. We therefore believe this adaptive optimizer will be useful for realistic VHQCA implementations, where the number of measurements is limited.

Download Full-text

Parallel Implementation on FPGA of Support Vector Machines Using Stochastic Gradient Descent

Electronics ◽

10.3390/electronics8060631 ◽

2019 ◽

Vol 8 (6) ◽

pp. 631 ◽

Cited By ~ 7

Author(s):

Felipe F. Lopes ◽

João Canas Ferreira ◽

Marcelo A. C. Fernandes

Keyword(s):

Support Vector Machines ◽

Gradient Descent ◽

Parallel Implementation ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Data Set ◽

Viable Solution ◽

Vector Machines ◽

Field Programmable

Sequential Minimal Optimization (SMO) is the traditional training algorithm for Support Vector Machines (SVMs). However, SMO does not scale well with the size of the training set. For that reason, Stochastic Gradient Descent (SGD) algorithms, which have better scalability, are a better option for massive data mining applications. Furthermore, even with the use of SGD, training times can become extremely large depending on the data set. For this reason, accelerators such as Field-programmable Gate Arrays (FPGAs) are used. This work describes an implementation in hardware, using FPGA, of a fully parallel SVM using Stochastic Gradient Descent. The proposed FPGA implementation of an SVM with SGD presents speedups of more than 10,000× relative to software implementations running on a quad-core processor and up to 319× compared to state-of-the-art FPGA implementations while requiring fewer hardware resources. The results show that the proposed architecture is a viable solution for highly demanding problems such as those present in big data analysis.

Download Full-text

Linear Support Vector Machine (SVM) with Stochastic Gradient Descent (SGD) training and multinomial Nave Bayes (NB) in News Classification

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i4.360363 ◽

2019 ◽

Vol 7 (4) ◽

pp. 360-363

Author(s):

Feroz Ahmed ◽

Shabina Ghafir

Keyword(s):

Support Vector Machine ◽

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Linear Support Vector Machine

Download Full-text

Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - ACL-IJCNLP '09 ◽

10.3115/1687878.1687946 ◽

2009 ◽

Cited By ~ 45

Author(s):

Yoshimasa Tsuruoka ◽

Jun'ichi Tsujii ◽

Sophia Ananiadou

Keyword(s):

Gradient Descent ◽

Linear Models ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Log Linear

Download Full-text

Cloning Safe Driving Behavior for Self-Driving Cars using Convolutional Neural Networks

Recent Patents on Computer Science ◽

10.2174/2213275911666181106160002 ◽

2019 ◽

Vol 12 (2) ◽

pp. 120-127 ◽

Cited By ~ 5

Author(s):

Wael Farag

Keyword(s):

Gradient Descent ◽

Autonomous Driving ◽

Driving Behavior ◽

Training Data ◽

Stochastic Gradient Descent ◽

Data Set ◽

Safe Driving ◽

Processing Pipeline ◽

Self Driving Cars ◽

And Training

Background: In this paper, a Convolutional Neural Network (CNN) to learn safe driving behavior and smooth steering manoeuvring, is proposed as an empowerment of autonomous driving technologies. The training data is collected from a front-facing camera and the steering commands issued by an experienced driver driving in traffic as well as urban roads. Methods: This data is then used to train the proposed CNN to facilitate what it is called “Behavioral Cloning”. The proposed Behavior Cloning CNN is named as “BCNet”, and its deep seventeen-layer architecture has been selected after extensive trials. The BCNet got trained using Adam’s optimization algorithm as a variant of the Stochastic Gradient Descent (SGD) technique. Results: The paper goes through the development and training process in details and shows the image processing pipeline harnessed in the development. Conclusion: The proposed approach proved successful in cloning the driving behavior embedded in the training data set after extensive simulations.

Download Full-text