The Eighty Five Percent Rule for Optimal Learning

2018 ◽  
Author(s):  
Robert C. Wilson ◽  
Amitai Shenhav ◽  
Mark Straccia ◽  
Jonathan D. Cohen

Abstract Researchers and educators have long wrestled with the question of how best to teach their clients, be they human, animal or machine. Here we focus on the role of a single variable, the difficulty of training, and examine its effect on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks, in which ambiguous stimuli must be sorted into one of two classes. For all of these gradient-descent based learning algorithms we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and biologically plausible neural networks thought to describe human and animal learning.
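
The seemingly precise figure of 15.87% is not arbitrary: under the paper's Gaussian-noise assumptions, the optimal training error rate is the standard normal cumulative distribution function evaluated at −1, Φ(−1) ≈ 0.1587. A one-line check in Python (assuming SciPy is available):

    from scipy.stats import norm

    # Under Gaussian decision noise, the optimal training error
    # rate is the standard normal CDF evaluated at -1.
    optimal_error = norm.cdf(-1.0)          # ~0.1587
    optimal_accuracy = 1.0 - optimal_error  # ~0.8413
    print(f"optimal error rate: {optimal_error:.4f}")
    print(f"optimal accuracy:   {optimal_accuracy:.4f}")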

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Robert C. Wilson ◽  
Amitai Shenhav ◽  
Mark Straccia ◽  
Jonathan D. Cohen

Abstract Researchers and educators have long wrestled with the question of how best to teach their clients, be they humans, non-human animals or machines. Here, we examine the effect of a single variable, the difficulty of training, on the rate of learning. In many situations we find that there is a sweet spot in which training is neither too easy nor too hard, and where learning progresses most quickly. We derive conditions for this sweet spot for a broad class of learning algorithms in the context of binary classification tasks. For all of these stochastic gradient-descent based learning algorithms, we find that the optimal error rate for training is around 15.87% or, conversely, that the optimal training accuracy is about 85%. We demonstrate the efficacy of this ‘Eighty Five Percent Rule’ for artificial neural networks used in AI and biologically plausible neural networks thought to describe animal learning.
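
To make the setup concrete, here is a minimal simulation sketch (our illustration, not the authors' code; the dimensionality, learning rate and trial count are assumptions). A logistic unit is trained by stochastic gradient descent on a binary classification task in which each trial's difficulty is chosen so that the learner's current error rate matches a fixed target; the script then reports how well the learned weights align with the true decision direction:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)

    def train(target_error, dim=50, trials=500, lr=0.02):
        u = np.zeros(dim); u[0] = 1.0        # true decision direction (unknown to learner)
        w = rng.normal(size=dim)
        w /= np.linalg.norm(w)               # keep unit norm, so alignment = w @ u
        for _ in range(trials):
            a = np.clip(w @ u, 0.05, 0.999)  # current alignment (floored for stability)
            # Choose the trial difficulty so that the learner's predicted
            # error rate, Phi(-difficulty * alignment), equals the target.
            difficulty = -norm.ppf(target_error) / a
            y = rng.choice([-1.0, 1.0])                    # true label
            x = y * difficulty * u + rng.normal(size=dim)  # noisy stimulus
            p = 1.0 / (1.0 + np.exp(-np.clip(w @ x, -30, 30)))  # P(label = +1)
            w += lr * ((y + 1) / 2 - p) * x  # SGD step on the logistic loss
            w /= np.linalg.norm(w)
        return w @ u

    for err in (0.05, 0.1587, 0.30, 0.45):
        print(f"target error {err:.1%}: final alignment {train(err):.3f}")

Qualitatively, target error rates near 15.87% (85% accuracy) tend to yield the fastest alignment, while much easier or much harder training regimes learn more slowly.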


Author(s):  
TAO WANG ◽  
XIAOLIANG XING ◽  
XINHUA ZHUANG

In this paper, we describe an optimal learning algorithm for designing one-layer neural networks by means of global minimization. Taking the properties of a well-defined neural network into account, we derive a cost function that quantitatively measures the goodness of the network. The connection weights are determined by the gradient descent rule so as to minimize the cost function. The optimal learning algorithm is formulated as either an unconstrained or a constrained minimization problem. It ensures the realization of each desired associative mapping with the best noise-reduction ability in the sense of optimization. We also investigate analytically the storage capacity of the neural network, the degree of noise reduction for a desired associative mapping, and the convergence of the learning algorithm. Finally, results from a large number of computer experiments are presented.
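
As a rough illustration of this recipe (the quadratic cost and bipolar patterns below are our assumptions, not the paper's exact formulation), one can design a one-layer network by gradient descent on a cost that measures how well each desired associative mapping is realized:

    import numpy as np

    rng = np.random.default_rng(0)

    # Desired associative mappings: input pattern X[i] should map to Y[i].
    X = rng.choice([-1.0, 1.0], size=(8, 16))   # 8 bipolar input patterns
    Y = rng.choice([-1.0, 1.0], size=(8, 4))    # their desired outputs
    W = rng.normal(scale=0.1, size=(16, 4))     # one-layer connection weights

    lr = 0.05
    for epoch in range(500):
        error = X @ W - Y            # residual of each desired mapping
        grad = X.T @ error / len(X)  # gradient of the cost 0.5 * ||XW - Y||^2
        W -= lr * grad               # gradient descent rule

    recalled = np.sign(X @ W)        # recall the stored associations
    print("recall accuracy:", (recalled == Y).mean())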


2021 ◽  
Author(s):  
Anasse HANAFI ◽  
Mohammed BOUHORMA ◽  
Lotfi ELAACHAK

Abstract Machine learning (ML) is a large field of study that overlaps with, and inherits ideas from, many related fields such as artificial intelligence (AI). The main focus of the field is learning from previous experience. Classification in ML is a supervised learning method in which a computer program learns from the data given to it and makes new classifications. There are many different types of classification tasks in ML, each with dedicated modeling approaches. For example, classification predictive modeling involves assigning a class label to input samples; binary classification refers to predicting one of two classes; and multi-class classification involves predicting one of more than two categories. Recurrent Neural Networks (RNNs) are very powerful sequence models for classification problems. In this paper, however, we use RNNs as generative models: they learn the sequences of a problem and can then generate entirely new sequences for the problem domain. The hope is to gain better control over the output of the generated text, because it is not always possible to learn the exact distribution of the data, either implicitly or explicitly.
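
As a minimal illustration of the generative use of an RNN (a sketch in PyTorch; the toy corpus, model size and sampling temperature are placeholder assumptions, not the paper's setup), the model below is trained to predict the next character and is then sampled to generate an entirely new sequence:

    import torch
    import torch.nn as nn

    text = "hello world. hello machine learning. hello recurrent networks. " * 50
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text])

    class CharRNN(nn.Module):
        def __init__(self, vocab, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, hidden)
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.head = nn.Linear(hidden, vocab)

        def forward(self, x, h=None):
            out, h = self.rnn(self.embed(x), h)
            return self.head(out), h

    model = CharRNN(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=3e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Train: predict the next character at every position of a random chunk.
    for step in range(200):
        i = torch.randint(0, len(data) - 65, (1,)).item()
        chunk = data[i : i + 65]
        x, y = chunk[:-1].unsqueeze(0), chunk[1:]
        logits, _ = model(x)
        loss = loss_fn(logits.squeeze(0), y)
        opt.zero_grad(); loss.backward(); opt.step()

    # Generate: sample a new sequence one character at a time.
    idx = torch.tensor([[stoi["h"]]])
    h, out = None, "h"
    for _ in range(100):
        logits, h = model(idx, h)
        probs = torch.softmax(logits[0, -1] / 0.8, dim=0)  # temperature 0.8
        idx = torch.multinomial(probs, 1).unsqueeze(0)
        out += chars[idx.item()]
    print(out)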


Entropy ◽  
2020 ◽  
Vol 22 (1) ◽  
pp. 101
Author(s):  
Rita Fioresi ◽  
Pratik Chaudhari ◽  
Stefano Soatto

This paper is a step towards developing a geometric understanding of a popular algorithm for training deep neural networks, namely stochastic gradient descent (SGD). We build upon a recent result which observed that the noise in SGD while training typical networks is highly non-isotropic. That motivated a deterministic model in which the trajectories of the dynamical system are described via geodesics of a family of metrics arising from a certain diffusion matrix, namely the covariance of the stochastic gradients in SGD. Our model is analogous to models in general relativity: the role played by the electromagnetic field in the latter is played in the former by the gradient of the loss function of a deep network.
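
The central object here, the covariance of the stochastic gradients, is easy to estimate empirically. The toy sketch below (a stand-in linear-regression model, not the deep networks studied in the paper) computes per-sample gradients and inspects the eigenvalues of their covariance; the wide eigenvalue spread illustrates the kind of non-isotropic noise the paper builds on:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy linear regression: per-sample loss 0.5 * (w @ x_i - y_i)^2.
    n, d = 512, 20
    X = rng.normal(size=(n, d)) * rng.uniform(0.1, 3.0, size=d)  # anisotropic features
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
    w = rng.normal(size=d)

    # Per-sample gradients: g_i = (w @ x_i - y_i) * x_i.
    G = (X @ w - y)[:, None] * X

    # Diffusion matrix: the covariance of the stochastic gradients.
    D = np.cov(G, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(D))[::-1]
    print("top 5 eigenvalues:   ", np.round(eig[:5], 2))
    print("bottom 5 eigenvalues:", np.round(eig[-5:], 4))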


Author(s):  
Rehab M. Duwairi ◽  
Saad A. Al-Zboon ◽  
Rami A. Al-Dwairi ◽  
Ahmad Obaidi

The rapid development of artificial neural network techniques, especially convolutional neural networks, has encouraged researchers to adapt such techniques to the medical domain, specifically to provide assistive tools that help professionals diagnose patients. The main problem faced by researchers in the medical domain is the lack of available annotated datasets that can be used to train and evaluate large and complex deep neural networks. In this paper, to assist researchers who are interested in applying deep learning techniques to aid ophthalmologists in diagnosing eye-related diseases, we provide an optical coherence tomography (OCT) dataset created in collaboration with ophthalmologists from the King Abdullah University Hospital, Irbid, Jordan. This dataset consists of 21,991 OCT images distributed over seven eye diseases in addition to normal images (no disease), namely Choroidal Neovascularisation, Full Macular Hole (Full Thickness), Partial Macular Hole, Central Serous Retinopathy, Geographic Atrophy, Macular Retinal Oedema, and Vitreomacular Traction. To the best of our knowledge, this dataset is the largest of its kind, where the images belong to actual patients from Jordan and the annotation was carried out by ophthalmologists. Two classification tasks were applied to this dataset: a binary classification to distinguish between images of healthy eyes (normal) and images of diseased eyes (abnormal), and a multi-class classification in which the deep neural network is trained to distinguish between the seven diseases listed above in addition to the normal case. In both classification tasks, the U-Net neural network was modified and subsequently utilised. This modification adds a block of layers to the original U-Net model so that it can handle classification, since the original network is designed for image segmentation. The binary classification achieved 84.90% accuracy and 69.50% quadratic weighted kappa, whereas the multi-class classification achieved 63.68% accuracy and 66.06% quadratic weighted kappa.
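
A sketch of the kind of modification described (in PyTorch; the encoder depth and layer sizes are placeholders, not the authors' exact architecture): a classification block, global average pooling followed by a fully connected layer, is attached to a U-Net-style encoder so that the network outputs class scores rather than a segmentation map.

    import torch
    import torch.nn as nn

    def conv_block(cin, cout):
        # Standard U-Net double-convolution block.
        return nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
        )

    class UNetClassifier(nn.Module):
        """U-Net-style encoder with an added classification block."""
        def __init__(self, n_classes=8):     # 7 diseases + normal
            super().__init__()
            self.enc1 = conv_block(1, 32)    # OCT scans treated as grayscale
            self.enc2 = conv_block(32, 64)
            self.enc3 = conv_block(64, 128)
            self.pool = nn.MaxPool2d(2)
            # Added block: global pooling + dense layer producing class scores.
            self.classify = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, n_classes),
            )

        def forward(self, x):
            x = self.pool(self.enc1(x))
            x = self.pool(self.enc2(x))
            x = self.enc3(x)
            return self.classify(x)

    model = UNetClassifier()
    scores = model(torch.randn(2, 1, 224, 224))  # batch of 2 dummy scans
    print(scores.shape)                          # torch.Size([2, 8])

For the binary task, the final layer would have two outputs instead of eight.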


2004 ◽  
Author(s):  
Lyle E. Bourne ◽  
Alice F. Healy ◽  
James A. Kole ◽  
William D. Raymond
