Multi-timescale biological learning algorithms train spiking neuronal network motor control

2021 ◽  
Author(s):  
Daniel Hasegan ◽  
Matt Deible ◽  
Christopher Earl ◽  
David D'Onofrio ◽  
Hananel Hazan ◽  
...  

Biological learning operates at multiple interlocking timescales, from long evolutionary stretches down to the relatively short span of an individual's life. While each process has been simulated individually as a basic learning algorithm in the context of spiking neuronal networks (SNNs), the integration of the two has remained limited. In this study, we first train SNNs separately using spike-timing-dependent reinforcement learning (STDP-RL) and evolutionary (EVOL) learning algorithms to solve the CartPole reinforcement learning (RL) control problem. We then develop an interleaved algorithm, inspired by biological evolution, that combines EVOL and STDP-RL learning in sequence. We use the NEURON simulator with NetPyNE to create an SNN interfaced with the CartPole environment from OpenAI's Gym. In CartPole, the goal is to balance a vertical pole by moving the cart left or right along a one-dimensional track. Our SNN contains multiple populations of neurons organized in three layers: a sensory layer, an association/hidden layer, and a motor layer, where neurons are connected by excitatory (AMPA/NMDA) and inhibitory (GABA) synapses. The association and motor layers each contain one excitatory (E) population and two inhibitory (I) populations with different synaptic time constants. Each neuron is an event-based integrate-and-fire model with plastic connections between excitatory neurons. In our SNN, the environment activates sensory neurons tuned to specific features of the game state. We split the motor population into subsets representing each movement choice; the subset with more spiking over an interval determines the action. During STDP-RL, we supply intermediary evaluations (reward/punishment) of each action by judging the effectiveness of a move (e.g., moving the cart toward a position where the pole is balanced). During EVOL, updates consist of adding together many random perturbations of the connection weights, with each set of perturbations weighted by the total episodic reward it achieves when applied on its own. We evaluate the performance of each algorithm after training and through the creation of sensory/motor action maps that delineate the network's transformation of sensory inputs into higher-order representations and eventual motor decisions. Both EVOL and STDP-RL training produce SNNs capable of moving the cart left and right and keeping the pole vertical. Compared to the STDP-RL and EVOL algorithms operating on their own, our interleaved training paradigm produced enhanced robustness in performance, with different strategies revealed through analysis of the sensory/motor mappings. Analysis of synaptic weight matrices also shows distributed vs. clustered representations after the EVOL and STDP-RL algorithms, respectively. These weight differences also manifest as diffuse vs. synchronized firing patterns. Our modeling opens up new capabilities for SNNs in RL and could serve as a testbed for neurobiologists aiming to understand multi-timescale learning mechanisms and dynamics in neuronal circuits.
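
The EVOL update described above amounts to an evolution-strategies-style, reward-weighted sum of weight perturbations. A minimal Python sketch of that idea follows; the evaluate_episode function, noise scale, and learning rate are illustrative assumptions, not the authors' NEURON/NetPyNE implementation.

import numpy as np

def evol_update(weights, evaluate_episode, n_perturb=50, sigma=0.1, lr=0.05, rng=None):
    # One EVOL-style update: sample many random weight perturbations, score each by
    # the episodic reward it earns when applied on its own, then add them together
    # weighted by (normalized) reward. evaluate_episode is assumed to run one
    # CartPole episode with the given weights and return its total reward.
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal((n_perturb, weights.size))
    rewards = np.array([evaluate_episode(weights + sigma * eps.reshape(weights.shape))
                        for eps in noise])
    advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # reward weighting
    step = (lr / (n_perturb * sigma)) * (noise.T @ advantages)        # weighted sum of perturbations
    return weights + step.reshape(weights.shape)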

Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

Machine learning algorithms are capable of managing multi-dimensional data in dynamic environments. Despite these vital features, some challenges remain to be overcome. Machine learning algorithms still require additional mechanisms or procedures for predicting a large number of new classes while preserving privacy. These deficiencies show that the reliable use of a machine learning algorithm depends on human experts, because raw data may complicate the learning process and generate inaccurate results. The interpretation of outcomes therefore demands expertise in machine learning mechanisms, which is a significant challenge for machine learning algorithms. Machine learning techniques also suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplicity, and a central weakness is their vulnerability to errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by making predictions with an improved algorithm.


2007 ◽  
Vol 19 (6) ◽  
pp. 1468-1502 ◽  
Author(s):  
Răzvan V. Florian

The persistent modification of synaptic efficacy as a function of the relative timing of pre- and postsynaptic spikes is a phenomenon known as spike-timing-dependent plasticity (STDP). Here we show that the modulation of STDP by a global reward signal leads to reinforcement learning. We first analytically derive learning rules involving reward-modulated spike-timing-dependent synaptic and intrinsic plasticity by applying a reinforcement learning algorithm to the stochastic spike response model of spiking neurons. These rules have several features in common with plasticity mechanisms experimentally found in the brain. We then demonstrate, in simulations of networks of integrate-and-fire neurons, the efficacy of two simple learning rules involving modulated STDP. One rule is a direct extension of the standard STDP model (modulated STDP), and the other involves an eligibility trace stored at each synapse that keeps a decaying memory of recent pre- and postsynaptic spike pairings (modulated STDP with eligibility trace). The latter rule permits learning even if the reward signal is delayed. The proposed rules are able to solve the XOR problem with both rate-coded and temporally coded input and to learn a target output firing-rate pattern. These learning rules are biologically plausible, may be used for training generic artificial spiking neural networks regardless of the neural model used, and suggest the experimental investigation in animals of the existence of reward-modulated STDP.
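
The modulated-STDP-with-eligibility-trace rule has a simple generic form: pre- and postsynaptic spike traces implement the STDP window, an eligibility trace keeps a decaying memory of recent pairings, and a global reward signal converts that trace into a weight change. The Python sketch below shows that general form; the time constants and amplitudes are illustrative, not the values derived in the paper.

def rmstdp_step(w, elig, pre_spike, post_spike, pre_trace, post_trace, reward,
                a_plus=1.0, a_minus=1.0, tau_trace=20.0, tau_elig=200.0,
                lr=1e-3, dt=1.0):
    # Exponentially decaying pre/post spike traces define the STDP window.
    pre_trace += -pre_trace * dt / tau_trace + pre_spike
    post_trace += -post_trace * dt / tau_trace + post_spike
    # STDP term: potentiate when a post spike follows recent pre activity,
    # depress when a pre spike follows recent post activity.
    stdp = a_plus * pre_trace * post_spike - a_minus * post_trace * pre_spike
    # The eligibility trace stores a decaying memory of recent spike pairings.
    elig += -elig * dt / tau_elig + stdp
    # The global reward signal gates the actual weight update, so learning
    # still works when the reward arrives with a delay.
    w += lr * reward * elig
    return w, elig, pre_trace, post_trace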


1999 ◽  
Vol 11 (8) ◽  
pp. 2017-2060 ◽  
Author(s):  
Csaba Szepesvári ◽  
Michael L. Littman

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
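
To make the synchronous/asynchronous distinction concrete, the sketch below contrasts a synchronous sweep of the Bellman optimality backup over all state-action pairs with the sampled, single-pair update used by ordinary Q-learning; the theorem's usefulness is that convergence of the former can be used to establish convergence of the latter. The tensor shapes are assumptions made for the sketch.

import numpy as np

def synchronous_sweep(Q, P, R, gamma):
    # Synchronous update: every (s, a) pair is backed up at once from the previous Q.
    # P[s, a, s'] are transition probabilities, R[s, a] are expected rewards.
    return R + gamma * np.einsum('ijk,k->ij', P, Q.max(axis=1))

def asynchronous_update(Q, s, a, r, s_next, alpha, gamma):
    # Asynchronous (sampled) update: only the visited (s, a) pair changes,
    # as in ordinary Q-learning.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q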


2020 ◽  
Vol 8 (6) ◽  
pp. 4333-4338

This paper presents a thorough comparative analysis of various reinforcement learning algorithms used by autonomous mobile robots for optimal path finding, and we propose a new algorithm, Iterative SARSA, for the same task. The main objective of the paper is to differentiate between Q-learning and SARSA and to modify the latter. These algorithms use either the on-policy or the off-policy method of reinforcement learning: for the on-policy method we use the SARSA algorithm, and for the off-policy method we use the Q-learning algorithm. These algorithms also have a significant effect on finding the shortest possible path for the robot. Based on the results obtained, we conclude that our algorithm outperforms the current standard reinforcement learning algorithms.
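
For reference, the on-policy/off-policy distinction the paper builds on comes down to the bootstrap target: Q-learning backs up from the greedy next action, while SARSA backs up from the action the behaviour policy actually takes next. The tabular Python sketch below shows only that standard difference; the Iterative SARSA modification itself is not reproduced here.

def q_learning_target(Q, r, s_next, gamma):
    # Off-policy target: bootstrap from the greedy action in the next state.
    return r + gamma * max(Q[s_next].values())

def sarsa_target(Q, r, s_next, a_next, gamma):
    # On-policy target: bootstrap from the action actually taken next.
    return r + gamma * Q[s_next][a_next]

def td_update(Q, s, a, target, alpha):
    # Shared tabular update; only the target above differs between the two methods.
    Q[s][a] += alpha * (target - Q[s][a])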


2012 ◽  
Vol 22 ◽  
pp. 113-118 ◽  
Author(s):  
Víctor Ricardo Cruz-Álvarez ◽  
Enrique Hidalgo-Peña ◽  
Hector-Gabriel Acosta-Mesa

A common problem when working with mobile robots is that the programming phase can be a long, expensive, and demanding process for programmers. Reinforcement learning algorithms offer one of the most general frameworks for learning. This work presents an approach that uses the Q-Learning algorithm on a Lego robot so that it learns by itself how to follow a black line drawn on a white surface, using Matlab [5] as the programming environment.
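
A tabular Q-learning line follower largely reduces to choosing a state discretization and a reward signal for the light sensor. The fragment below is a hypothetical illustration of those two design choices, not the paper's Matlab code; the sensor thresholds and action set are assumptions.

import random

ACTIONS = ["forward", "turn_left", "turn_right"]

def sensor_to_state(light_reading, dark_threshold=40, bright_threshold=60):
    # Discretize a single light-sensor reading into three coarse states.
    if light_reading < dark_threshold:
        return "on_line"
    if light_reading > bright_threshold:
        return "off_line"
    return "edge"

def choose_action(Q, state, epsilon=0.1):
    # Epsilon-greedy selection over the tabular Q-values.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

def reward(state):
    # Simple shaping: positive while the dark line is detected, negative otherwise.
    return 1.0 if state == "on_line" else -1.0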


2001 ◽  
Vol 12 (10) ◽  
pp. 1513-1523 ◽  
Author(s):  
M. ANDRECUT ◽  
M. K. ALI

In this paper we discuss the application of reinforcement learning algorithms to the problem of autonomous robot navigation. We show that autonomous navigation using standard delayed reinforcement learning algorithms is an ill-posed problem, and we present a more efficient algorithm whose convergence speed is greatly improved. The proposed algorithm (Deep-Sarsa) is based on a combination of Depth-First Search (a graph-searching algorithm) and Sarsa (a delayed reinforcement learning algorithm).
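
The abstract names the combination but not its details, so the following is only one plausible reading: states are expanded depth-first, and each traversed transition receives an on-policy Sarsa backup using the action the depth-first policy will take next. The actions and step helpers are assumptions made for the sketch.

def dfs_sarsa_episode(start, actions, step, Q, alpha=0.5, gamma=0.9, max_steps=100):
    # actions(s) lists the actions available in s; step(s, a) returns (reward, next_state).
    visited = {start}
    s = start
    a = actions(s)[0]                       # depth-first style: take the first available action
    for _ in range(max_steps):
        r, s_next = step(s, a)
        candidates = actions(s_next) or [a]
        a_next = candidates[0]              # the action this exploration policy will actually take next
        Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])  # Sarsa backup
        if s_next in visited:
            break
        visited.add(s_next)
        s, a = s_next, a_next
    return Q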


Robotica ◽  
2019 ◽  
Vol 38 (9) ◽  
pp. 1558-1575
Author(s):  
Vahid Azimirad ◽  
Mohammad Fattahi Sani

In this paper, the behavioral learning of robots through spiking neural networks is studied, with a network architecture based on the thalamo-cortico-thalamic circuitry of the mammalian brain. To capture a variety of neuronal behaviors, the Izhikevich single-neuron model is used. One thousand and ninety spiking neurons are considered in the network. The spiking model of the proposed architecture is derived and prepared for the robot learning problem. The reinforcement learning algorithm is based on spike-timing-dependent plasticity with dopamine release as a reward; it strengthens the synaptic weights of the neurons involved in the robot's proper performance. Sensory and motor neurons are placed in the thalamus and the cortical module, respectively. The inputs to the thalamo-cortico-thalamic circuitry are signals related to the distance of the target from the robot, and the outputs are the velocities of the actuators. A target-attraction task, in which dopamine is released when the robot catches the target, is used to validate the proposed method. Simulation studies, as well as an experimental implementation, are carried out on a mobile robot named Tabrizbot. The experimental studies show that after successful learning, the mean time to catch the target decreases by about 36%. These results demonstrate that, through the proposed method, the thalamo-cortical structure can be trained successfully to learn to perform various robotic tasks.
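
The single-neuron dynamics referenced above follow the standard Izhikevich model; one Euler step of that model, with commonly used regular-spiking parameters rather than the values chosen in the paper, can be sketched as follows.

def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
    # v: membrane potential (mV), u: recovery variable, I: input current.
    v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
    u += dt * a * (b * v - u)
    fired = v >= 30.0
    if fired:
        v, u = c, u + d   # reset after a spike
    return v, u, fired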


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Chunyuan Zhang ◽  
Qi Song ◽  
Zeng Meng

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called the minibatch recursive least squares Q-learning (MRLS-Q). Different from the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q are more similar to those of DQNs with only one input layer and one linear output layer. It uses the experience replay and the minibatch training mode and uses the agent’s states rather than the agent’s state-action pairs as the inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems as well. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it can achieve better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and investigate the influences of its hyperparameters experimentally.
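
The key ingredient of MRLS-Q is a recursive-least-squares update of a linear Q-function over state features. The sketch below shows a generic single-sample RLS update of that kind; it is not the paper's averaged-RLS or minibatch variant, and the shared covariance matrix is a simplifying assumption.

import numpy as np

def rls_q_update(W, P, x, a, r, x_next, gamma=0.99, lam=0.999):
    # W: (n_actions, n_features) linear output weights; P: (n_features, n_features)
    # inverse-covariance matrix; x, x_next: state feature vectors; lam: forgetting factor.
    target = r + gamma * (W @ x_next).max()          # ordinary Q-learning target
    err = target - W[a] @ x                          # TD error for the taken action
    k = P @ x / (lam + x @ P @ x)                    # RLS gain
    W[a] += k * err                                  # update only the taken action's row
    P[:] = (P - np.outer(k, x @ P)) / lam            # recursive covariance update
    return W, P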


Author(s):  
Zhaoyang Yang ◽  
Kathryn Merrick ◽  
Hussein Abbass ◽  
Lianwen Jin

In this paper, we propose a deep reinforcement learning algorithm to learn multiple tasks concurrently. A new network architecture is proposed in the algorithm that reduces the number of parameters needed per task by more than 75% compared to typical single-task deep reinforcement learning algorithms. The proposed algorithm and network fuse images with sensor data and were tested on up to 12 movement-based control tasks on a simulated Pioneer 3AT robot equipped with a camera and range sensors. Results show that the proposed algorithm and network can learn skills that are as good as those learned by a comparable single-task learning algorithm. Results also show that learning performance remains consistent even as the number of tasks and the number of constraints on the tasks increase.
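
The per-task parameter saving comes from sharing the image and sensor encoders across tasks and adding only a small head per task. The PyTorch sketch below is an illustrative shared-trunk, multi-head layout of that kind, not the paper's exact network; layer sizes are assumptions.

import torch
import torch.nn as nn

class SharedMultiTaskPolicy(nn.Module):
    def __init__(self, n_tasks, n_actions, sensor_dim=16):
        super().__init__()
        # One convolutional image encoder and one sensor encoder shared by all tasks.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (batch, 32)
        )
        self.sensor_encoder = nn.Sequential(nn.Linear(sensor_dim, 32), nn.ReLU())
        # Only these small per-task heads grow with the number of tasks.
        self.heads = nn.ModuleList([nn.Linear(32 + 32, n_actions) for _ in range(n_tasks)])

    def forward(self, image, sensors, task_id):
        fused = torch.cat([self.image_encoder(image), self.sensor_encoder(sensors)], dim=1)
        return self.heads[task_id](fused)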


2021 ◽  
pp. 327-337

The article describes tasks in the oil and gas sector that can be solved with machine learning algorithms. These tasks include the study of well interference, the classification of wells according to their technological and geophysical characteristics, the assessment of the effectiveness of ongoing and planned geological and technical measures, the forecasting of oil production for individual wells and of total oil production for a group of wells, the forecasting of the base level of oil production, the forecasting of reservoir pressures, and mapping. For each task, the features of building machine learning models and examples of input data are described. All of the above tasks are regression or classification problems. Of particular interest is the problem of well placement optimisation. Such a task cannot be solved directly by a single neural network; it belongs to the problems of optimal control theory, which are usually solved using dynamic programming methods. A paper is considered in which field management and well placement are based on a reinforcement learning algorithm using Markov chains and Bellman's optimality equation, and the disadvantages of that approach are identified. To eliminate them, a new reinforcement learning approach based on the Alpha Zero algorithm is proposed. This algorithm is best known in the field of game-playing artificial intelligence, having beaten world champions in chess and Go, and it combines the properties of dynamic and stochastic programming. The article discusses the principle of operation of the algorithm in detail and identifies common features that make it possible to consider it a promising solution for the problem of optimising the placement of a grid of wells.
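
At the core of the Alpha Zero family is tree search guided by a learned policy prior and value estimate; actions at each node are chosen by the PUCT rule. The Python fragment below sketches only that selection step, with an assumed node structure (per-child visit count N, total value W, and network prior P), and is not tied to the well-placement formulation.

import math

def puct_select(node, c_puct=1.5):
    # node.children maps each action to a child with fields N (visits), W (total value), P (prior).
    total_visits = sum(child.N for child in node.children.values())
    def score(child):
        q = child.W / child.N if child.N > 0 else 0.0                     # mean value so far
        u = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)    # prior-weighted exploration bonus
        return q + u
    return max(node.children.items(), key=lambda kv: score(kv[1]))[0]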

