Effect of barren plateaus on gradient-free optimization

Quantum, 2021, Vol. 5, p. 558
Author(s): Andrew Arrasmith, M. Cerezo, Piotr Czarnik, Lukasz Cincio, Patrick J. Coles

Barren plateau landscapes correspond to gradients that vanish exponentially in the number of qubits. Such landscapes have been demonstrated for variational quantum algorithms and quantum neural networks with either deep circuits or global cost functions. For obvious reasons, it is expected that gradient-based optimizers will be significantly affected by barren plateaus. However, whether or not gradient-free optimizers are impacted is a topic of debate, with some arguing that gradient-free approaches are unaffected by barren plateaus. Here we show that, indeed, gradient-free optimizers do not solve the barren plateau problem. Our main result proves that cost function differences, which are the basis for making decisions in gradient-free optimization, are exponentially suppressed in a barren plateau. Hence, without exponential precision, gradient-free optimizers will not make progress in the optimization. We numerically confirm this by training in a barren plateau with several gradient-free optimizers (the Nelder-Mead, Powell, and COBYLA algorithms), and show that the number of shots required in the optimization grows exponentially with the number of qubits.
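
To illustrate the shot-noise argument, the following is a minimal sketch (not the authors' code): a toy cost stands in for a variational circuit, with landscape variations that shrink like 2^-n by assumption, so a finite-shot estimate swamps the cost differences a gradient-free optimizer relies on. The function `noisy_cost` and the exponential scaling are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a toy cost whose variations are
# exponentially suppressed in the number of qubits n, estimated with a
# finite number of shots, then handed to gradient-free optimizers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def noisy_cost(theta, n_qubits, shots=1000):
    # True cost varies on a scale ~2^-n (barren-plateau-like suppression).
    true_cost = 0.5 + 2.0 ** (-n_qubits) * np.sin(theta).sum() / len(theta)
    # Finite-shot estimate: sampling noise of order 1/sqrt(shots).
    return rng.binomial(shots, true_cost) / shots

for n_qubits in (4, 8, 16):
    for method in ("Nelder-Mead", "Powell", "COBYLA"):
        res = minimize(noisy_cost, x0=np.zeros(6), args=(n_qubits,),
                       method=method, options={"maxiter": 200})
        print(n_qubits, method, round(res.fun, 4))
```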

2019
Author(s): Takuya Isomura, Karl Friston

This work considers a class of biologically plausible cost functions for neural networks, where the same cost function is minimised by both neural activity and plasticity. We show that such cost functions can be cast as a variational bound on model evidence under an implicit generative model. Using generative models based on Markov decision processes (MDP), we show, analytically, that neural activity and plasticity perform Bayesian inference and learning, respectively, by maximising model evidence. Using mathematical and numerical analyses, we then confirm that biologically plausible cost functions—used in neural networks—correspond to variational free energy under some prior beliefs about the prevalence of latent states that generate inputs. These prior beliefs are determined by particular constants (i.e., thresholds) that define the cost function. This means that the Bayes optimal encoding of latent or hidden states is achieved when, and only when, the network’s implicit priors match the process that generates the inputs. Our results suggest that when a neural network minimises its cost function, it is implicitly minimising variational free energy under optimal or sub-optimal prior beliefs. This insight is potentially important because it suggests that any free parameter of a neural network’s cost function can itself be optimised—by minimisation with respect to variational free energy.
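
For reference, the standard definition of variational free energy as a bound on (negative log) model evidence, written in generic notation rather than the paper's:

```latex
% Variational free energy F as an upper bound on negative log evidence,
% for approximate posterior q(s) and generative model p(o, s).
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
     \;\ge\; -\ln p(o).
```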


2020
Author(s): Nalika Ulapane, Karthick Thiyagarajan, Sarath Kodagoda

Hyperparameter optimization is an essential task when applying machine learning techniques. Such optimization typically starts from an initial guess for the hyperparameter values, followed by minimization of some cost function via gradient-based methods. The initial values are crucial because gradient-based optimization can easily converge to local minima of the cost function being minimized. It is therefore common to initialize the hyperparameters several times and repeat the optimization to obtain the best solution. Repeated optimization can be computationally expensive for techniques such as Gaussian Process (GP) regression, which has O(n³) complexity, and the lack of a formal strategy for initializing hyperparameter values is an additional challenge. Historically, hyperparameter values have been reinitialized at random across many machine learning techniques, including GP; some recent work has instead proposed initialization strategies based on optimizing a meta loss (cost) function. To simplify the challenge of hyperparameter initialization, this paper introduces a data-dependent, deterministic initialization technique. We focus on the specific case of squared-exponential-kernel GP regression; the proposed technique is novel in being deterministic rather than random and, owing to that determinism, fast compared with optimizing some form of meta cost function as in previous works. Although the global suitability of this initialization technique is not proven in this paper, as a preliminary study its effectiveness is demonstrated on several synthetic and real-data nonlinear regression examples, suggesting that the technique may be effective for broader usage.
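
To give a concrete flavour of data-dependent, deterministic starting values for a squared-exponential GP, here is an illustrative heuristic (not necessarily the paper's technique): the median pairwise input distance for the length scale, plus moment-matched signal and noise variances.

```python
# Illustrative heuristic (not necessarily the paper's technique): a
# deterministic, data-dependent starting point for the hyperparameters of
# a squared-exponential (RBF) Gaussian Process regressor.
import numpy as np
from scipy.spatial.distance import pdist

def deterministic_gp_init(X, y):
    length_scale = np.median(pdist(X))      # median pairwise input distance
    signal_variance = np.var(y)             # match the targets' variance
    noise_variance = 0.1 * signal_variance  # assumed modest noise floor
    return length_scale, signal_variance, noise_variance

X = np.random.rand(50, 2)
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.randn(50)
print(deterministic_gp_init(X, y))
```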


2016, Vol. 113 (48), pp. E7655-E7662
Author(s): Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Alessandro Ingrosso, Carlo Lucibello, ...

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare—but extremely dense and accessible—regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
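
The replicated objective described above can be sketched as follows; this is a minimal sketch in which the loss, the number of replicas, and the quadratic coupling are illustrative choices, not the paper's exact construction.

```python
# Sketch of a "robust ensemble"-style objective: several replicas of the
# original loss plus a coupling that centres the replicas on a reference
# (driving) assignment. The quadratic coupling is an illustrative choice.
import numpy as np

def replicated_cost(loss, replicas, centre, gamma):
    """loss: callable mapping a weight vector to a scalar cost.
    replicas: list of weight vectors; centre: the driving assignment."""
    data_term = sum(loss(w) for w in replicas)
    coupling = gamma * sum(np.sum((w - centre) ** 2) for w in replicas)
    return data_term + coupling

# Example with a toy quadratic loss and three replicas.
loss = lambda w: np.sum((w - 1.0) ** 2)
centre = np.zeros(3)
replicas = [centre + 0.1 * np.random.randn(3) for _ in range(3)]
print(replicated_cost(loss, replicas, centre, gamma=0.5))
```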


2020, Vol. 32 (11), pp. 2085-2121
Author(s): Takuya Isomura, Karl Friston

This letter considers a class of biologically plausible cost functions for neural networks, where the same cost function is minimized by both neural activity and plasticity. We show that such cost functions can be cast as a variational bound on model evidence under an implicit generative model. Using generative models based on partially observed Markov decision processes (POMDP), we show that neural activity and plasticity perform Bayesian inference and learning, respectively, by maximizing model evidence. Using mathematical and numerical analyses, we establish the formal equivalence between neural network cost functions and variational free energy under some prior beliefs about latent states that generate inputs. These prior beliefs are determined by particular constants (e.g., thresholds) that define the cost function. This means that the Bayes optimal encoding of latent or hidden states is achieved when the network's implicit priors match the process that generates its inputs. This equivalence is potentially important because it suggests that any hyperparameter of a neural network can itself be optimized—by minimization with respect to variational free energy. Furthermore, it enables one to characterize a neural network formally, in terms of its prior beliefs.
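
In generic notation (not the paper's), a discrete generative model of the partially observed Markov kind referred to above factorizes as:

```latex
% Generic factorisation of a partially observed Markov generative model:
% hidden states s_t evolve under a Markov transition and generate
% observations o_t via a likelihood mapping.
p(o_{1:T}, s_{1:T}) = p(s_1) \prod_{t=1}^{T} p(o_t \mid s_t)
                      \prod_{t=2}^{T} p(s_t \mid s_{t-1}).
```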


Quantum, 2019, Vol. 3, p. 214
Author(s): Edward Grant, Leonard Wossnig, Mateusz Ostaszewski, Marcello Benedetti

Parametrized quantum circuits initialized with random parameter values are characterized by barren plateaus, where the gradient becomes exponentially small in the number of qubits. In this technical note we theoretically motivate and empirically validate an initialization strategy which can resolve the barren plateau problem for practical applications. The technique involves randomly selecting some of the initial parameter values, then choosing the remaining values so that the circuit is a sequence of shallow blocks that each evaluates to the identity. This initialization limits the effective depth of the circuits used to calculate the first parameter update, so that they cannot be stuck in a barren plateau at the start of training. In turn, this makes some of the most compact ansätze usable in practice, which was not possible before even for rather basic problems. We show empirically that variational quantum eigensolvers and quantum neural networks initialized using this strategy can be trained using a gradient-based method.
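
A minimal sketch of the identity-block idea follows, assuming an ansatz in which the second half of each block applies the mirrored, inverse gate sequence of the first half; the function name and block layout are hypothetical.

```python
# Minimal sketch of identity-block initialization: the first half of each
# block gets random angles and the second half gets the negated angles in
# reverse order, so each block's net unitary is the identity. Assumes an
# ansatz whose second half mirrors the first -- an illustrative layout.
import numpy as np

def identity_block_init(num_blocks, params_per_half, seed=0):
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(num_blocks):
        first_half = rng.uniform(0.0, 2.0 * np.pi, size=params_per_half)
        second_half = -first_half[::-1]  # undoes the first half
        blocks.append(np.concatenate([first_half, second_half]))
    return np.concatenate(blocks)

theta0 = identity_block_init(num_blocks=4, params_per_half=6)
print(theta0.shape)  # (48,) parameters; each shallow block starts as the identity
```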


Author(s): Giovanni Acampora, Roberto Schiattarella

Quantum computers have become a reality thanks to the efforts of several major companies in developing technologies that harness quantum effects for computation, paving the way towards the design of efficient quantum algorithms in different application domains, from finance and chemistry to artificial and computational intelligence. However, there are still technological limitations that prevent the correct design of quantum algorithms, compromising the achievement of the so-called quantum advantage. Specifically, a major limitation in the design of a quantum algorithm is its proper mapping to a specific quantum processor so that the underlying physical constraints are satisfied. This hard problem, known as circuit mapping, is a critical task in the quantum world, and it needs to be addressed efficiently for quantum computers to work correctly and productively. To bridge this gap, this paper introduces a first circuit-mapping approach based on deep neural networks, which opens a completely new scenario in which the correct execution of quantum algorithms is supported by classical machine learning techniques. As shown in the experimental section, the proposed approach speeds up current state-of-the-art mapping algorithms when used on 5-qubit IBM Q processors, while maintaining suitable mapping accuracy.
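
To make the circuit-mapping constraint concrete, the toy check below illustrates the problem only, not the paper's neural-network approach; the 5-qubit line-topology coupling map is hypothetical.

```python
# Toy illustration of the circuit-mapping constraint (not the paper's
# neural-network method): a two-qubit gate is executable only if the
# logical qubits it acts on are mapped to physically coupled qubits.
# Hypothetical 5-qubit line topology: 0-1-2-3-4.
COUPLING = {(0, 1), (1, 2), (2, 3), (3, 4)}

def is_valid_mapping(two_qubit_gates, logical_to_physical):
    for a, b in two_qubit_gates:
        p, q = logical_to_physical[a], logical_to_physical[b]
        if (p, q) not in COUPLING and (q, p) not in COUPLING:
            return False  # this gate would need extra SWAPs under the mapping
    return True

gates = [(0, 1), (1, 2)]
print(is_valid_mapping(gates, {0: 0, 1: 1, 2: 2}))               # True: both gates land on coupled pairs
print(is_valid_mapping(gates + [(0, 2)], {0: 0, 1: 1, 2: 2}))    # False: physical qubits 0 and 2 are not coupled
```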


Author(s): Roque Corral, Fernando Gisbert

A methodology to minimize blade secondary losses by modifying turbine end-walls is presented. The optimization is addressed using a gradient-based method, where the gradient is computed with an adjoint code and the secondary kinetic energy is used as the cost function. The adjoint code is implemented on the basis of the discrete formulation of a parallel multigrid unstructured-mesh Navier-Stokes solver. The results of the optimization of two end-walls of a low-pressure turbine row are shown.
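
Schematically, an adjoint-driven design loop of the kind described above looks as follows; this is a sketch only, in which `flow_solve`, `adjoint_gradient`, and the fixed-step update are placeholders rather than the authors' solver.

```python
# Schematic adjoint-based design loop (placeholders, not the authors' code):
# one flow solve evaluates the cost (secondary kinetic energy), and one
# adjoint solve gives its gradient with respect to all end-wall parameters.
def optimize_endwall(x0, flow_solve, cost, adjoint_gradient,
                     step=1e-2, iterations=50):
    x = list(x0)                           # end-wall shape parameters
    for _ in range(iterations):
        state = flow_solve(x)              # primal Navier-Stokes solve
        j = cost(state)                    # secondary kinetic energy
        grad = adjoint_gradient(x, state)  # dJ/dx from a single adjoint solve
        x = [xi - step * gi for xi, gi in zip(x, grad)]  # steepest-descent step
    return x, j
```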

