Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes

2016
Vol 113 (48)
pp. E7655-E7662
Author(s):
Carlo Baldassi
Christian Borgs
Jennifer T. Chayes
Alessandro Ingrosso
Carlo Lucibello
...  

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare—but extremely dense and accessible—regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
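
As a rough illustration of the replicated-cost scheme described above, the sketch below runs plain gradient descent on a sum of coupled replicas of a toy loss. The loss function, coupling strength gamma, and replica count are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def replicated_descent(loss_grad, dim, n_replicas=5, gamma=0.1,
                       lr=0.01, steps=1000, seed=0):
    """Gradient descent on a sum of replicated losses coupled to a center.

    Each replica follows its own loss gradient plus an elastic pull toward
    the center (the 'driving assignment'); the center tracks the replica mean.
    """
    rng = np.random.default_rng(seed)
    replicas = rng.normal(size=(n_replicas, dim))
    center = replicas.mean(axis=0)
    for _ in range(steps):
        for a in range(n_replicas):
            g = loss_grad(replicas[a]) + gamma * (replicas[a] - center)
            replicas[a] -= lr * g
        center = replicas.mean(axis=0)  # the center follows the replicas
    return center

# Toy rough landscape: gradient of w^2 + sin(5w), applied componentwise.
loss_grad = lambda w: 2 * w + 5 * np.cos(5 * w)
w_star = replicated_descent(loss_grad, dim=10)
```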

2020
Vol 30 (6)
pp. 1645-1663
Author(s):
Ömer Deniz Akyildiz
Dan Crisan
Joaquín Míguez

We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques.
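
The sketch below illustrates the zeroth-order, sampling-based idea in its simplest form: a single sampler (rather than a bank of them), particles weighted by exp(-β × cost) evaluated on random subsets of the cost's components, then resampling and jitter. The target density, batch size, and jitter scale are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def smc_minimize(cost_components, dim, n_particles=200, n_iters=50,
                 beta=5.0, jitter=0.1, batch=10, seed=0):
    """Zeroth-order minimization: weight particles by exp(-beta * cost),
    resample, and perturb. Only small random subsets of the cost's
    components are evaluated at each iteration."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(size=(n_particles, dim))
    for _ in range(n_iters):
        idx = rng.choice(len(cost_components), size=batch, replace=False)
        costs = np.array([sum(cost_components[i](p) for i in idx)
                          for p in particles])
        w = np.exp(-beta * (costs - costs.min()))
        w /= w.sum()
        keep = rng.choice(n_particles, size=n_particles, p=w)   # resample
        particles = particles[keep] + jitter * rng.normal(size=(n_particles, dim))
    full = np.array([sum(f(p) for f in cost_components) for p in particles])
    return particles[full.argmin()]

# Cost made of many quadratic components; their sum has a single global minimum.
components = [(lambda p, c=c: np.sum((p - c) ** 2))
              for c in np.random.default_rng(1).normal(size=(100, 3))]
x_min = smc_minimize(components, dim=3)
```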


1995
Vol 09 (18)
pp. 1165-1174
Author(s):
Diego Zanin
Riccardo Zecchina

The learning and generalization properties of a modified learning cost function for Neural Network models are discussed. We show that the introduction of a “cross-talk” term allows for an improvement of performance based on the control of the convergence subspaces of the network outputs. In the case of an unbiased distribution of binary patterns, we derive analytically the learning performance of the single-layer architecture, whereas we investigate numerically the generalization capabilities. An enhancement of computational performance is observed for multi-classification purposes, and also for imperfectly classified training sets.


Quantum
2021
Vol 5
pp. 558
Author(s):
Andrew Arrasmith
M. Cerezo
Piotr Czarnik
Lukasz Cincio
Patrick J. Coles

Barren plateau landscapes correspond to gradients that vanish exponentially in the number of qubits. Such landscapes have been demonstrated for variational quantum algorithms and quantum neural networks with either deep circuits or global cost functions. For obvious reasons, it is expected that gradient-based optimizers will be significantly affected by barren plateaus. However, whether or not gradient-free optimizers are impacted is a topic of debate, with some arguing that gradient-free approaches are unaffected by barren plateaus. Here we show that, indeed, gradient-free optimizers do not solve the barren plateau problem. Our main result proves that cost function differences, which are the basis for making decisions in a gradient-free optimization, are exponentially suppressed in a barren plateau. Hence, without exponential precision, gradient-free optimizers will not make progress in the optimization. We numerically confirm this by training in a barren plateau with several gradient-free optimizers (Nelder-Mead, Powell, and COBYLA algorithms), and show that the number of shots required in the optimization grows exponentially with the number of qubits.
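
A back-of-the-envelope sketch of the shot-cost implication: if cost differences shrink like 2^-n with the number of qubits n, and shot noise for a bounded observable decays like 1/√N, then resolving those differences requires N growing like 4^n. The suppression rate and confidence factor below are illustrative assumptions, not the paper's exact bounds.

```python
import numpy as np

def shots_to_resolve(n_qubits, suppression=2.0, confidence_sigmas=3):
    """Shots needed so that shot noise (~1/sqrt(N) for a bounded observable)
    falls below a cost difference that shrinks like suppression**-n."""
    delta = suppression ** (-n_qubits)           # typical cost-function difference
    return int(np.ceil((confidence_sigmas / delta) ** 2))

for n in range(2, 12, 2):
    print(f"{n:2d} qubits -> ~{shots_to_resolve(n):.2e} shots")
# The required shot count grows exponentially (here roughly 4**n), so a
# gradient-free optimizer cannot cheaply distinguish nearby points in a plateau.
```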


Author(s):  
Po Ting Lin
Wei-Hao Lu
Shu-Ping Lin

In the past few years, researchers have begun to investigate the existence of arbitrary uncertainties in design optimization problems. Most traditional reliability-based design optimization (RBDO) methods transform the design space to the standard normal space for reliability analysis, but may not work well when the random variables are arbitrarily distributed, because the transformation to the standard normal space cannot be determined or the distribution type is unknown. The methods of Ensemble of Gaussian-based Reliability Analyses (EoGRA) and Ensemble of Gradient-based Transformed Reliability Analyses (EGTRA) have been developed to estimate the joint probability density function using an ensemble of kernel functions. EoGRA performs a series of Gaussian-based kernel reliability analyses and merges them to compute the reliability of the design point. EGTRA transforms the design space to a single-variate design space along the constraint gradient, where the kernel reliability analyses become much less costly. In this paper, a series of comprehensive investigations were performed to study the similarities and differences between EoGRA and EGTRA. The results showed that EGTRA performs accurate and effective reliability analyses for both linear and nonlinear problems. When the constraints are highly nonlinear, EGTRA may encounter some difficulty but can still be effective when started from deterministic optimal points. On the other hand, the sensitivity analyses of EoGRA may be ineffective when the random distribution lies completely inside the feasible space or the infeasible space. However, EoGRA can find acceptable design points when starting from deterministic optimal points. Moreover, EoGRA is capable of delivering the estimated failure probability of each constraint during the optimization process, which may be convenient for some applications.
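
A minimal sketch of the kernel-ensemble idea behind EoGRA-style estimates: each observed sample of the arbitrarily distributed variables becomes the center of a Gaussian kernel, and the failure probability is estimated from Monte Carlo draws out of that ensemble. The limit-state function, bandwidth, and data below are illustrative, not the authors' formulation.

```python
import numpy as np

def failure_probability(samples, limit_state, bandwidth=0.2, n_mc=100_000, seed=0):
    """Estimate P(g(X) < 0) with an ensemble of Gaussian kernels: each observed
    sample is a kernel center, and each Monte Carlo draw comes from a randomly
    chosen kernel (i.e., sampling from a kernel density estimate)."""
    rng = np.random.default_rng(seed)
    centers = samples[rng.integers(len(samples), size=n_mc)]
    draws = centers + bandwidth * rng.normal(size=centers.shape)
    return np.mean(limit_state(draws) < 0.0)

# Illustrative limit state: failure when x0 + x1 exceeds 3 (g < 0 means failure).
g = lambda x: 3.0 - x[:, 0] - x[:, 1]
data = np.random.default_rng(1).lognormal(size=(500, 2))   # non-Gaussian inputs
print(failure_probability(data, g))
```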


2021
Vol 174
pp. 114711
Author(s):
Tien Huu Do
Duc Minh Nguyen
Giannis Bekoulis
Adrian Munteanu
Nikos Deligiannis

Author(s):  
Xiumin Li
Qing Chen
Fangzheng Xue

In recent years, an increasing number of studies have demonstrated that networks in the brain can self-organize into a critical state where dynamics exhibit a mixture of ordered and disordered patterns. This critical branching phenomenon is termed neuronal avalanches. It has been hypothesized that the homeostatic balance between stability and plasticity at this critical state may be optimal for performing diverse neural computational tasks. However, the critical region for high performance is narrow and sensitive for spiking neural networks (SNNs). In this paper, we investigated the role of the critical state in neural computations based on liquid-state machines, a biologically plausible computational neural network model for real-time computing. The computational performance of an SNN operating at the critical state, in particular with spike-timing-dependent plasticity for updating synaptic weights, is investigated. The network is found to show the best computational performance when it is subjected to critical dynamic states. Moreover, the active-neuron-dominant structure refined from synaptic learning can remarkably enhance the robustness of the critical state and further improve computational accuracy. These results may have important implications for the modelling of spiking neural networks with optimal computational performance. This article is part of the themed issue ‘Mathematical methods in medicine: neuroscience, cardiology and pathology’.
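
For reference, a minimal sketch of the standard pair-based spike-timing-dependent plasticity rule used for updating synaptic weights; the amplitudes and time constants are illustrative defaults, not the values used in the paper.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012,
                tau_plus=20.0, tau_minus=20.0, w_max=1.0):
    """Standard pair-based STDP: potentiate when the presynaptic spike
    precedes the postsynaptic one, depress otherwise (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:                                    # pre before post -> strengthen
        w += a_plus * np.exp(-dt / tau_plus)
    else:                                         # post before pre -> weaken
        w -= a_minus * np.exp(dt / tau_minus)
    return float(np.clip(w, 0.0, w_max))

print(stdp_update(0.5, t_pre=10.0, t_post=15.0))   # potentiation
print(stdp_update(0.5, t_pre=15.0, t_post=10.0))   # depression
```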


2016
Vol 25 (06)
pp. 1650033
Author(s):
Hossam Faris
Ibrahim Aljarah
Nailah Al-Madi
Seyedali Mirjalili

Evolutionary Neural Networks have proven beneficial on challenging datasets, mainly due to their strong ability to avoid local optima. Stochastic operators in such techniques reduce the probability of stagnation in local solutions and help them outperform conventional training algorithms such as Back Propagation (BP) and Levenberg-Marquardt (LM). According to the No-Free-Lunch (NFL) theorem, however, no single optimization technique can solve all optimization problems. This means that a Neural Network trained by a new algorithm has the potential to solve a new set of problems or outperform the current techniques on existing problems. This motivates our attempt to investigate the efficiency of the recently proposed Evolutionary Algorithm called the Lightning Search Algorithm (LSA) in training Neural Networks, for the first time in the literature. The LSA-based trainer is benchmarked on 16 popular medical diagnosis problems and compared to BP, LM, and 6 other evolutionary trainers. The quantitative and qualitative results show that the LSA-based trainer achieves not only better avoidance of local solutions but also faster convergence than the other algorithms employed. In addition, the statistical tests conducted show that the LSA-based trainer is significantly superior to the current algorithms on the majority of datasets.
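
For orientation, the sketch below shows the generic population-based idea that such evolutionary trainers share: mutate candidate weight vectors, keep the fitter ones, and repeat. It is not the LSA update rule; the toy network, fitness, and parameters are illustrative assumptions.

```python
import numpy as np

def evolve_weights(fitness, dim, pop_size=30, generations=200,
                   sigma=0.3, seed=0):
    """Generic population-based trainer: mutate candidate weight vectors,
    keep the fittest half of parents and children, and repeat. Stochastic
    moves help escape local optima that trap gradient-based training."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        children = pop + sigma * rng.normal(size=pop.shape)
        both = np.vstack([pop, children])
        scores = np.array([fitness(w) for w in both])
        pop = both[np.argsort(scores)[-pop_size:]]       # keep the fittest
    return pop[np.argmax([fitness(w) for w in pop])]

# Tiny illustrative task: fit a one-hidden-unit network to y = x (MSE fitness).
x, y = np.linspace(-1, 1, 20), np.linspace(-1, 1, 20)
def fitness(w):
    pred = w[2] * np.tanh(w[0] * x + w[1]) + w[3]
    return -np.mean((pred - y) ** 2)

best = evolve_weights(fitness, dim=4)
```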

