Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes

2016
Vol 113 (48)
pp. E7655-E7662
Author(s):
Carlo Baldassi
Christian Borgs
Jennifer T. Chayes
Alessandro Ingrosso
Carlo Lucibello
...  

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare—but extremely dense and accessible—regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
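
As a rough illustration of the replicated-cost scheme described above, the sketch below runs plain gradient descent on a sum of coupled replicas of a toy loss. The loss function, coupling strength gamma, and replica count are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def replicated_descent(loss_grad, dim, n_replicas=5, gamma=0.1,
                       lr=0.01, steps=1000, seed=0):
    """Gradient descent on a sum of replicated losses coupled to a center.

    Each replica follows its own loss gradient plus an elastic pull toward
    the center (the 'driving assignment'); the center tracks the replica mean.
    """
    rng = np.random.default_rng(seed)
    replicas = rng.normal(size=(n_replicas, dim))
    center = replicas.mean(axis=0)
    for _ in range(steps):
        for a in range(n_replicas):
            g = loss_grad(replicas[a]) + gamma * (replicas[a] - center)
            replicas[a] -= lr * g
        center = replicas.mean(axis=0)  # the center follows the replicas
    return center

# Toy rough landscape: gradient of w^2 + sin(5w), applied componentwise.
loss_grad = lambda w: 2 * w + 5 * np.cos(5 * w)
w_star = replicated_descent(loss_grad, dim=10)
```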

2020
Vol 30 (6)
pp. 1645-1663
Author(s):
Ömer Deniz Akyildiz
Dan Crisan
Joaquín Míguez

We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques.
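
The sketch below illustrates the zeroth-order, sampling-based idea in its simplest form: a single sampler (rather than a bank of them), particles weighted by exp(-β × cost) evaluated on random subsets of the cost's components, then resampling and jitter. The target density, batch size, and jitter scale are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def smc_minimize(cost_components, dim, n_particles=200, n_iters=50,
                 beta=5.0, jitter=0.1, batch=10, seed=0):
    """Zeroth-order minimization: weight particles by exp(-beta * cost),
    resample, and perturb. Only small random subsets of the cost's
    components are evaluated at each iteration."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(size=(n_particles, dim))
    for _ in range(n_iters):
        idx = rng.choice(len(cost_components), size=batch, replace=False)
        costs = np.array([sum(cost_components[i](p) for i in idx)
                          for p in particles])
        w = np.exp(-beta * (costs - costs.min()))
        w /= w.sum()
        keep = rng.choice(n_particles, size=n_particles, p=w)   # resample
        particles = particles[keep] + jitter * rng.normal(size=(n_particles, dim))
    full = np.array([sum(f(p) for f in cost_components) for p in particles])
    return particles[full.argmin()]

# Cost made of many quadratic components; their sum has a single global minimum.
components = [(lambda p, c=c: np.sum((p - c) ** 2))
              for c in np.random.default_rng(1).normal(size=(100, 3))]
x_min = smc_minimize(components, dim=3)
```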


1995
Vol 09 (18)
pp. 1165-1174
Author(s):
Diego Zanin
Riccardo Zecchina

The learning and generalization properties of a modified learning cost function for Neural Network models are discussed. We show that the introduction of a “cross-talk” term allows for an improvement of performance based on the control of the convergence subspaces of the network outputs. In the case of an unbiased distribution of binary patterns, we derive analytically the learning performance of the single-layer architecture, whereas we investigate numerically the generalization capabilities. An enhancement of computational performance is observed for multi-classification purposes, and also for imperfectly classified training sets.


Quantum
2021
Vol 5
pp. 558
Author(s):
Andrew Arrasmith
M. Cerezo
Piotr Czarnik
Lukasz Cincio
Patrick J. Coles

Barren plateau landscapes correspond to gradients that vanish exponentially in the number of qubits. Such landscapes have been demonstrated for variational quantum algorithms and quantum neural networks with either deep circuits or global cost functions. For obvious reasons, it is expected that gradient-based optimizers will be significantly affected by barren plateaus. However, whether or not gradient-free optimizers are impacted is a topic of debate, with some arguing that gradient-free approaches are unaffected by barren plateaus. Here we show that, indeed, gradient-free optimizers do not solve the barren plateau problem. Our main result proves that cost function differences, which are the basis for making decisions in a gradient-free optimization, are exponentially suppressed in a barren plateau. Hence, without exponential precision, gradient-free optimizers will not make progress in the optimization. We numerically confirm this by training in a barren plateau with several gradient-free optimizers (Nelder-Mead, Powell, and COBYLA algorithms), and show that the number of shots required in the optimization grows exponentially with the number of qubits.
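
A back-of-the-envelope sketch of the shot-cost implication: if cost differences shrink like 2^-n with the number of qubits n, and shot noise for a bounded observable decays like 1/√N, then resolving those differences requires N growing like 4^n. The suppression rate and confidence factor below are illustrative assumptions, not the paper's exact bounds.

```python
import numpy as np

def shots_to_resolve(n_qubits, suppression=2.0, confidence_sigmas=3):
    """Shots needed so that shot noise (~1/sqrt(N) for a bounded observable)
    falls below a cost difference that shrinks like suppression**-n."""
    delta = suppression ** (-n_qubits)           # typical cost-function difference
    return int(np.ceil((confidence_sigmas / delta) ** 2))

for n in range(2, 12, 2):
    print(f"{n:2d} qubits -> ~{shots_to_resolve(n):.2e} shots")
# The required shot count grows exponentially (here roughly 4**n), so a
# gradient-free optimizer cannot cheaply distinguish nearby points in a plateau.
```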


Author(s):  
Po Ting Lin
Wei-Hao Lu
Shu-Ping Lin

In the past few years, researchers have begun to investigate the existence of arbitrary uncertainties in design optimization problems. Most traditional reliability-based design optimization (RBDO) methods transform the design space to the standard normal space for reliability analysis, but may not work well when the random variables are arbitrarily distributed, because the transformation to the standard normal space cannot be determined or the distribution type is unknown. The methods of Ensemble of Gaussian-based Reliability Analyses (EoGRA) and Ensemble of Gradient-based Transformed Reliability Analyses (EGTRA) have been developed to estimate the joint probability density function using an ensemble of kernel functions. EoGRA performs a series of Gaussian-based kernel reliability analyses and merges them to compute the reliability of the design point. EGTRA transforms the design space to a single-variate design space along the constraint gradient, where the kernel reliability analyses become much less costly. In this paper, a series of comprehensive investigations were performed to study the similarities and differences between EoGRA and EGTRA. The results showed that EGTRA performs accurate and effective reliability analyses for both linear and nonlinear problems. When the constraints are highly nonlinear, EGTRA may encounter some difficulty but can still be effective when started from deterministic optimal points. On the other hand, the sensitivity analyses of EoGRA may be ineffective when the random distribution lies completely inside the feasible space or the infeasible space. However, EoGRA can find acceptable design points when starting from deterministic optimal points. Moreover, EoGRA is capable of delivering the estimated failure probability of each constraint during the optimization process, which may be convenient for some applications.
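
A minimal sketch of the kernel-ensemble idea behind EoGRA-style estimates: each observed sample of the arbitrarily distributed variables becomes the center of a Gaussian kernel, and the failure probability is estimated from Monte Carlo draws out of that ensemble. The limit-state function, bandwidth, and data below are illustrative, not the authors' formulation.

```python
import numpy as np

def failure_probability(samples, limit_state, bandwidth=0.2, n_mc=100_000, seed=0):
    """Estimate P(g(X) < 0) with an ensemble of Gaussian kernels: each observed
    sample is a kernel center, and each Monte Carlo draw comes from a randomly
    chosen kernel (i.e., sampling from a kernel density estimate)."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.integers(len(samples), size=n_mc)]
    draws = centers + bandwidth * rng.normal(size=centers.shape)
    return np.mean(limit_state(draws) < 0.0)

# Illustrative limit state: failure when x0 + x1 exceeds 3 (g < 0 means failure).
g = lambda x: 3.0 - x[:, 0] - x[:, 1]
data = np.random.default_rng(1).lognormal(size=(500, 2))   # non-Gaussian inputs
print(failure_probability(data, g))
```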


2021
Vol 174
pp. 114711
Author(s):
Tien Huu Do
Duc Minh Nguyen
Giannis Bekoulis
Adrian Munteanu
Nikos Deligiannis

Author(s):  
Xiumin Li
Qing Chen
Fangzheng Xue

In recent years, an increasing number of studies have demonstrated that networks in the brain can self-organize into a critical state where dynamics exhibit a mixture of ordered and disordered patterns. This critical branching phenomenon is termed neuronal avalanches. It has been hypothesized that the homeostatic balance between stability and plasticity at this critical state may be optimal for performing diverse neural computational tasks. However, the critical region for high performance is narrow and sensitive for spiking neural networks (SNNs). In this paper, we investigated the role of the critical state in neural computations based on liquid-state machines, a biologically plausible computational neural network model for real-time computing. The computational performance of an SNN operating at the critical state, in particular with spike-timing-dependent plasticity for updating synaptic weights, is investigated. The network is found to show the best computational performance when it is subjected to critical dynamic states. Moreover, the active-neuron-dominant structure refined from synaptic learning can remarkably enhance the robustness of the critical state and further improve computational accuracy. These results may have important implications for the modelling of spiking neural networks with optimal computational performance. This article is part of the themed issue ‘Mathematical methods in medicine: neuroscience, cardiology and pathology’.
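
For reference, a minimal sketch of the standard pair-based spike-timing-dependent plasticity rule used for updating synaptic weights; the amplitudes and time constants are illustrative defaults, not the values used in the paper.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_max=1.0):
    """Standard pair-based STDP: potentiate when the presynaptic spike
    precedes the postsynaptic one, depress otherwise (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:                                    # pre before post -> strengthen
        w += a_plus * np.exp(-dt / tau_plus)
    else:                                         # post before pre -> weaken
        w -= a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w, 0.0, w_max))

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))   # potentiation
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))   # depression
```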


2016
Vol 25 (06)
pp. 1650033
Author(s):
Hossam Faris
Ibrahim Aljarah
Nailah Al-Madi
Seyedali Mirjalili

Evolutionary Neural Networks have proven beneficial on challenging datasets, mainly due to their strong ability to avoid local optima. Stochastic operators in such techniques reduce the probability of stagnation in local solutions and help them outperform conventional training algorithms such as Back Propagation (BP) and Levenberg-Marquardt (LM). According to the No-Free-Lunch (NFL) theorem, however, no single optimization technique can solve all optimization problems. This means that a Neural Network trained by a new algorithm has the potential to solve a new set of problems or outperform the current techniques on existing problems. This motivates our attempt to investigate the efficiency of the recently proposed Evolutionary Algorithm called the Lightning Search Algorithm (LSA) in training Neural Networks, for the first time in the literature. The LSA-based trainer is benchmarked on 16 popular medical diagnosis problems and compared to BP, LM, and 6 other evolutionary trainers. The quantitative and qualitative results show that the LSA-based trainer achieves not only better avoidance of local solutions but also faster convergence than the other algorithms employed. In addition, the statistical tests conducted show that the LSA-based trainer is significantly superior to the current algorithms on the majority of datasets.
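
For orientation, the sketch below shows the generic population-based idea that such evolutionary trainers share: mutate candidate weight vectors, keep the fitter ones, and repeat. It is not the LSA update rule; the toy network, fitness, and parameters are illustrative assumptions.

```python
import numpy as np

def evolve_weights(fitness, dim, pop_size=30, generations=200,
                   sigma=0.3, seed=0):
    """Generic population-based trainer: mutate candidate weight vectors,
    keep the fittest half of parents and children, and repeat. Stochastic
    moves help escape local optima that trap gradient-based training."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        children = pop + sigma * rng.normal(size=pop.shape)
        both = np.vstack([pop, children])
        scores = np.array([fitness(w) for w in both])
        pop = both[np.argsort(scores)[-pop_size:]]       # keep the fittest
    return pop[np.argmax([fitness(w) for w in pop])]

# Tiny illustrative task: fit a one-hidden-unit network to y = x (MSE fitness).
x, y = np.linspace(-1, 1, 20), np.linspace(-1, 1, 20)
def fitness(w):
    pred = w[2] * np.tanh(w[0] * x + w[1]) + w[3]
    return -np.mean((pred - y) ** 2)

best = evolve_weights(fitness, dim=4)
```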

