Effect of barren plateaus on gradient-free optimization

Quantum, 2021, Vol. 5, p. 558
Author(s): Andrew Arrasmith, M. Cerezo, Piotr Czarnik, Lukasz Cincio, Patrick J. Coles

Barren plateau landscapes correspond to gradients that vanish exponentially in the number of qubits. Such landscapes have been demonstrated for variational quantum algorithms and quantum neural networks with either deep circuits or global cost functions. For obvious reasons, it is expected that gradient-based optimizers will be significantly affected by barren plateaus. However, whether or not gradient-free optimizers are impacted is a topic of debate, with some arguing that gradient-free approaches are unaffected by barren plateaus. Here we show that, indeed, gradient-free optimizers do not solve the barren plateau problem. Our main result proves that cost function differences, which are the basis for making decisions in gradient-free optimization, are exponentially suppressed in a barren plateau. Hence, without exponential precision, gradient-free optimizers will not make progress in the optimization. We numerically confirm this by training in a barren plateau with several gradient-free optimizers (the Nelder-Mead, Powell, and COBYLA algorithms), and show that the number of shots required in the optimization grows exponentially with the number of qubits.
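
To illustrate the shot-noise argument, the following is a minimal sketch (not the authors' code): a toy cost stands in for a variational circuit, with landscape variations that shrink like 2^-n by assumption, so a finite-shot estimate swamps the cost differences a gradient-free optimizer relies on. The function `noisy_cost` and the exponential scaling are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a toy cost whose variations are
# exponentially suppressed in the number of qubits n, estimated with a
# finite number of shots, then handed to gradient-free optimizers.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def noisy_cost(theta, n_qubits, shots=1000):
    # True cost varies on a scale ~2^-n (barren-plateau-like suppression).
    true_cost = 0.5 + 2.0 ** (-n_qubits) * np.sin(theta).sum() / len(theta)
    # Finite-shot estimate: sampling noise of order 1/sqrt(shots).
    return rng.binomial(shots, true_cost) / shots

for n_qubits in (4, 8, 16):
    for method in ("Nelder-Mead", "Powell", "COBYLA"):
        res = minimize(noisy_cost, x0=np.zeros(6), args=(n_qubits,),
                       method=method, options={"maxiter": 200})
        print(n_qubits, method, round(res.fun, 4))
```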

2019
Author(s): Takuya Isomura, Karl Friston

This work considers a class of biologically plausible cost functions for neural networks, where the same cost function is minimised by both neural activity and plasticity. We show that such cost functions can be cast as a variational bound on model evidence under an implicit generative model. Using generative models based on Markov decision processes (MDP), we show, analytically, that neural activity and plasticity perform Bayesian inference and learning, respectively, by maximising model evidence. Using mathematical and numerical analyses, we then confirm that biologically plausible cost functions—used in neural networks—correspond to variational free energy under some prior beliefs about the prevalence of latent states that generate inputs. These prior beliefs are determined by particular constants (i.e., thresholds) that define the cost function. This means that the Bayes optimal encoding of latent or hidden states is achieved when, and only when, the network’s implicit priors match the process that generates the inputs. Our results suggest that when a neural network minimises its cost function, it is implicitly minimising variational free energy under optimal or sub-optimal prior beliefs. This insight is potentially important because it suggests that any free parameter of a neural network’s cost function can itself be optimised—by minimisation with respect to variational free energy.
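
For reference, the standard definition of variational free energy as a bound on (negative log) model evidence, written in generic notation rather than the paper's:

```latex
% Variational free energy F as an upper bound on negative log evidence,
% for approximate posterior q(s) and generative model p(o, s).
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
     \;\ge\; -\ln p(o).
```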


2020
Author(s): Nalika Ulapane, Karthick Thiyagarajan, Sarath Kodagoda

Hyperparameter optimization is an essential task when applying machine learning techniques. Such optimization typically starts from an initial guess for the hyperparameter values, followed by minimization of some cost function via gradient-based methods. The initial values are crucial because gradient-based optimization can easily converge to local minima of the cost function being minimized. It is therefore common to initialize the hyperparameters several times and repeat the optimization to obtain the best solution. Repeated optimization can be computationally expensive for techniques such as Gaussian Process (GP) regression, which has O(n³) complexity, and the lack of a formal strategy for initializing hyperparameter values is an additional challenge. Historically, hyperparameter values have been reinitialized at random across many machine learning techniques, including GP; some recent work has instead proposed initialization strategies based on optimizing a meta loss (cost) function. To simplify the challenge of hyperparameter initialization, this paper introduces a data-dependent, deterministic initialization technique. We focus on the specific case of squared-exponential-kernel GP regression; the proposed technique is novel in being deterministic rather than random and, owing to that determinism, fast compared with optimizing some form of meta cost function as in previous works. Although the global suitability of this initialization technique is not proven in this paper, as a preliminary study its effectiveness is demonstrated on several synthetic and real-data nonlinear regression examples, suggesting that the technique may be effective for broader usage.
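
To give a concrete flavour of data-dependent, deterministic starting values for a squared-exponential GP, here is an illustrative heuristic (not necessarily the paper's technique): the median pairwise input distance for the length scale, plus moment-matched signal and noise variances.

```python
# Illustrative heuristic (not necessarily the paper's technique): a
# deterministic, data-dependent starting point for the hyperparameters of
# a squared-exponential (RBF) Gaussian Process regressor.
import numpy as np
from scipy.spatial.distance import pdist

def deterministic_gp_init(X, y):
    length_scale = np.median(pdist(X))      # median pairwise input distance
    signal_variance = np.var(y)             # match the targets' variance
    noise_variance = 0.1 * signal_variance  # assumed modest noise floor
    return length_scale, signal_variance, noise_variance

X = np.random.rand(50, 2)
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.randn(50)
print(deterministic_gp_init(X, y))
```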


2016, Vol. 113 (48), pp. E7655-E7662
Author(s): Carlo Baldassi, Christian Borgs, Jennifer T. Chayes, Alessandro Ingrosso, Carlo Lucibello, ...

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare—but extremely dense and accessible—regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
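
The replicated objective described above can be sketched as follows; this is a minimal sketch in which the loss, the number of replicas, and the quadratic coupling are illustrative choices, not the paper's exact construction.

```python
# Sketch of a "robust ensemble"-style objective: several replicas of the
# original loss plus a coupling that centres the replicas on a reference
# (driving) assignment. The quadratic coupling is an illustrative choice.
import numpy as np

def replicated_cost(loss, replicas, centre, gamma):
    """loss: callable mapping a weight vector to a scalar cost.
    replicas: list of weight vectors; centre: the driving assignment."""
    data_term = sum(loss(w) for w in replicas)
    coupling = gamma * sum(np.sum((w - centre) ** 2) for w in replicas)
    return data_term + coupling

# Example with a toy quadratic loss and three replicas.
loss = lambda w: np.sum((w - 1.0) ** 2)
centre = np.zeros(3)
replicas = [centre + 0.1 * np.random.randn(3) for _ in range(3)]
print(replicated_cost(loss, replicas, centre, gamma=0.5))
```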


2020, Vol. 32 (11), pp. 2085-2121
Author(s): Takuya Isomura, Karl Friston

This letter considers a class of biologically plausible cost functions for neural networks, where the same cost function is minimized by both neural activity and plasticity. We show that such cost functions can be cast as a variational bound on model evidence under an implicit generative model. Using generative models based on partially observed Markov decision processes (POMDP), we show that neural activity and plasticity perform Bayesian inference and learning, respectively, by maximizing model evidence. Using mathematical and numerical analyses, we establish the formal equivalence between neural network cost functions and variational free energy under some prior beliefs about latent states that generate inputs. These prior beliefs are determined by particular constants (e.g., thresholds) that define the cost function. This means that the Bayes optimal encoding of latent or hidden states is achieved when the network's implicit priors match the process that generates its inputs. This equivalence is potentially important because it suggests that any hyperparameter of a neural network can itself be optimized—by minimization with respect to variational free energy. Furthermore, it enables one to characterize a neural network formally, in terms of its prior beliefs.
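
In generic notation (not the paper's), a discrete generative model of the partially observed Markov kind referred to above factorizes as:

```latex
% Generic factorisation of a partially observed Markov generative model:
% hidden states s_t evolve under a Markov transition and generate
% observations o_t via a likelihood mapping.
p(o_{1:T}, s_{1:T}) = p(s_1) \prod_{t=1}^{T} p(o_t \mid s_t)
                      \prod_{t=2}^{T} p(s_t \mid s_{t-1}).
```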


Quantum, 2019, Vol. 3, p. 214
Author(s): Edward Grant, Leonard Wossnig, Mateusz Ostaszewski, Marcello Benedetti

Parametrized quantum circuits initialized with random parameter values are characterized by barren plateaus, where the gradient becomes exponentially small in the number of qubits. In this technical note we theoretically motivate and empirically validate an initialization strategy which can resolve the barren plateau problem for practical applications. The technique involves randomly selecting some of the initial parameter values, then choosing the remaining values so that the circuit is a sequence of shallow blocks that each evaluates to the identity. This initialization limits the effective depth of the circuits used to calculate the first parameter update, so that they cannot be stuck in a barren plateau at the start of training. In turn, this makes some of the most compact ansätze usable in practice, which was not possible before even for rather basic problems. We show empirically that variational quantum eigensolvers and quantum neural networks initialized using this strategy can be trained using a gradient-based method.
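
A minimal sketch of the identity-block idea follows, assuming an ansatz in which the second half of each block applies the mirrored, inverse gate sequence of the first half; the function name and block layout are hypothetical.

```python
# Minimal sketch of identity-block initialization: the first half of each
# block gets random angles and the second half gets the negated angles in
# reverse order, so each block's net unitary is the identity. Assumes an
# ansatz whose second half mirrors the first -- an illustrative layout.
import numpy as np

def identity_block_init(num_blocks, params_per_half, seed=0):
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(num_blocks):
        first_half = rng.uniform(0.0, 2.0 * np.pi, size=params_per_half)
        second_half = -first_half[::-1]  # undoes the first half
        blocks.append(np.concatenate([first_half, second_half]))
    return np.concatenate(blocks)

theta0 = identity_block_init(num_blocks=4, params_per_half=6)
print(theta0.shape)  # (48,) parameters; each shallow block starts as the identity
```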


Author(s): Giovanni Acampora, Roberto Schiattarella

Quantum computers have become a reality thanks to the efforts of several major companies in developing technologies that harness quantum effects for computation, paving the way towards the design of efficient quantum algorithms in different application domains, from finance and chemistry to artificial and computational intelligence. However, there are still technological limitations that prevent the correct design of quantum algorithms, compromising the achievement of the so-called quantum advantage. Specifically, a major limitation in the design of a quantum algorithm is its proper mapping to a specific quantum processor so that the underlying physical constraints are satisfied. This hard problem, known as circuit mapping, is a critical task in the quantum world, and it needs to be addressed efficiently for quantum computers to work correctly and productively. To bridge this gap, this paper introduces a first circuit-mapping approach based on deep neural networks, which opens a completely new scenario in which the correct execution of quantum algorithms is supported by classical machine learning techniques. As shown in the experimental section, the proposed approach speeds up current state-of-the-art mapping algorithms when used on 5-qubit IBM Q processors, while maintaining suitable mapping accuracy.
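
To make the circuit-mapping constraint concrete, the toy check below illustrates the problem only, not the paper's neural-network approach; the 5-qubit line-topology coupling map is hypothetical.

```python
# Toy illustration of the circuit-mapping constraint (not the paper's
# neural-network method): a two-qubit gate is executable only if the
# logical qubits it acts on are mapped to physically coupled qubits.
# Hypothetical 5-qubit line topology: 0-1-2-3-4.
COUPLING = {(0, 1), (1, 2), (2, 3), (3, 4)}

def is_valid_mapping(two_qubit_gates, logical_to_physical):
    for a, b in two_qubit_gates:
        p, q = logical_to_physical[a], logical_to_physical[b]
        if (p, q) not in COUPLING and (q, p) not in COUPLING:
            return False  # this gate would need extra SWAPs under the mapping
    return True

gates = [(0, 1), (1, 2)]
print(is_valid_mapping(gates, {0: 0, 1: 1, 2: 2}))               # True: both gates land on coupled pairs
print(is_valid_mapping(gates + [(0, 2)], {0: 0, 1: 1, 2: 2}))    # False: physical qubits 0 and 2 are not coupled
```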


Author(s): Roque Corral, Fernando Gisbert

A methodology to minimize blade secondary losses by modifying turbine end-walls is presented. The optimization is addressed using a gradient-based method, where the gradient is computed with an adjoint code and the secondary kinetic energy is used as the cost function. The adjoint code is implemented on the basis of the discrete formulation of a parallel multigrid unstructured-mesh Navier-Stokes solver. The results of the optimization of two end-walls of a low-pressure turbine row are shown.
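
Schematically, an adjoint-driven design loop of the kind described above looks as follows; this is a sketch only, in which `flow_solve`, `adjoint_gradient`, and the fixed-step update are placeholders rather than the authors' solver.

```python
# Schematic adjoint-based design loop (placeholders, not the authors' code):
# one flow solve evaluates the cost (secondary kinetic energy), and one
# adjoint solve gives its gradient with respect to all end-wall parameters.
def optimize_endwall(x0, flow_solve, cost, adjoint_gradient,
                     step=1e-2, iterations=50):
    x = list(x0)                           # end-wall shape parameters
    for _ in range(iterations):
        state = flow_solve(x)              # primal Navier-Stokes solve
        j = cost(state)                    # secondary kinetic energy
        grad = adjoint_gradient(x, state)  # dJ/dx from a single adjoint solve
        x = [xi - step * gi for xi, gi in zip(x, grad)]  # steepest-descent step
    return x, j
```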

