Unsupervised Learning and Clustered Connectivity Enhance Reinforcement Learning in Spiking Neural Networks

2021 ◽  
Vol 15 ◽  
Author(s):  
Philipp Weidel ◽  
Renato Duarte ◽  
Abigail Morrison

Reinforcement learning is a paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. To partition an environment into discrete states, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields specified ad hoc by the researcher. This is problematic as a model for how an organism can learn appropriate behavioral sequences in unknown environments, since it fails to account for the unsupervised and self-organized nature of the required representations. Additionally, this approach presupposes knowledge on the part of the researcher about how the environment should be partitioned and represented, and it scales poorly with the size or complexity of the environment. To address these issues and gain insights into how the brain generates its own task-relevant mappings, we propose a learning architecture that combines unsupervised learning on the input projections with biologically motivated clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus the network self-organizes to produce clearly distinguishable activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.
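
The architecture can be illustrated with a small sketch. The following is a minimal NumPy toy, not the authors' spiking implementation: competitive Hebbian learning stands in for unsupervised learning on the input projections, a hard winner-take-all cluster stands in for the clustered representation layer, and a reward-modulated delta rule stands in for reinforcement learning on the output projections. All sizes, learning rates, and update rules are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_clusters, n_per_cluster, n_actions = 64, 8, 16, 3
n_rep = n_clusters * n_per_cluster

W_in = rng.normal(scale=0.1, size=(n_clusters, n_in))   # input -> cluster weights (hypothetical sizes)
W_out = np.zeros((n_actions, n_rep))                    # representation -> action weights

def represent(x):
    """Map an input to a clustered activity pattern (winning cluster fully active)."""
    winner = int(np.argmax(W_in @ x))
    r = np.zeros(n_rep)
    r[winner * n_per_cluster:(winner + 1) * n_per_cluster] = 1.0
    return r, winner

def unsupervised_step(x, lr=0.05):
    """Competitive Hebbian update: pull the winning cluster's weights toward x."""
    _, winner = represent(x)
    W_in[winner] += lr * (x - W_in[winner])

def rl_step(x, reward_fn, lr=0.1, eps=0.1):
    """Epsilon-greedy readout with a reward-modulated update on the output projections."""
    r, _ = represent(x)
    q = W_out @ r
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q))
    reward = reward_fn(a)                       # environment feedback (assumed callable)
    W_out[a] += lr * (reward - q[a]) * r        # delta-rule update toward the observed reward
    return a, reward
```

Separating the unsupervised and reward-driven updates in this way mirrors the paper's division of labor: the representation layer organizes itself before and independently of the reward signal.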

2020 ◽  
Author(s):  
Philipp Weidel ◽  
Renato Duarte ◽  
Abigail Morrison

Reinforcement learning is a learning paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. However, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields. This is problematic, as such approaches either scale badly as the environment grows in size or complexity, or presuppose knowledge of how the environment should be partitioned. Here, we propose a learning architecture that combines unsupervised learning on the input projections with clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus the network self-organizes to produce task-relevant activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.


2015 ◽  
Vol 25 (3) ◽  
pp. 471-482 ◽  
Author(s):  
Bartłomiej Śnieżyński

In this paper, we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied for strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge may be analyzed by humans, which allows tracking of the learning process.
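
As a rough illustration of the contrast the abstract draws, the following sketch shows an agent that learns a strategy by classification rather than by value updates: episodes whose delayed total reward exceeds a threshold are turned into labeled (state, action) examples for a decision tree. The environment interface, the threshold rule, and all names are illustrative assumptions, not the paper's farmer-pest setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class ClassifierAgent:
    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.clf = DecisionTreeClassifier()
        self.X, self.y = [], []          # accumulated (state, action) training examples
        self.fitted = False

    def act(self, state, explore=0.1):
        """Pick an action with the classifier, falling back to random exploration."""
        if not self.fitted or np.random.random() < explore:
            return np.random.randint(self.n_actions)
        return int(self.clf.predict([state])[0])

    def record_episode(self, states, actions, total_reward, threshold=0.0):
        """Handle delayed rewards: label every step of a successful episode positively."""
        if total_reward > threshold:
            self.X.extend(states)
            self.y.extend(actions)

    def learn(self):
        """Refit the classifier once examples for more than one action exist."""
        if len(set(self.y)) > 1:
            self.clf.fit(self.X, self.y)
            self.fitted = True
```

A decision tree also illustrates the paper's closing point: its learned rules can be read off directly, so the acquired knowledge remains open to human inspection.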


2020 ◽  
Author(s):  
Milena Rmus ◽  
Samuel McDougle ◽  
Anne Collins

Reinforcement learning (RL) models have advanced our understanding of how animals learn and make decisions, and how the brain supports some aspects of learning. However, the neural computations that are explained by RL algorithms fall short of explaining many sophisticated aspects of human decision making, including the generalization of learned information, one-shot learning, and the synthesis of task information in complex environments. Instead, these aspects of instrumental behavior are assumed to be supported by the brain’s executive functions (EF). We review recent findings that highlight the importance of EF in learning. Specifically, we advance the theory that EF sets the stage for canonical RL computations in the brain, providing inputs that broaden their flexibility and applicability. Our theory has important implications for how to interpret RL computations in the brain and behavior.


2017 ◽  
Author(s):  
Richard Gast ◽  
Patrick Faion ◽  
Kai Standvoss ◽  
Andrea Suckro ◽  
Brian Lewis ◽  
...  

In a constantly changing environment, the brain has to make sense of dynamic patterns of sensory input. These patterns can refer to stimuli with a complex and hierarchical structure which has to be inferred from the neural activity of sensory areas in the brain. Such areas have been found to be locally recurrent in structure as well as hierarchically organized within a given sensory domain. While there is a great body of work identifying neural representations of various sensory stimuli at different hierarchical levels, less is known about the nature of these representations. In this work, we propose a model that describes a way to encode and decode sensory stimuli based on the activity patterns of multiple, recurrently connected neural populations with different receptive fields. We demonstrate the ability of our model to learn and recognize complex, dynamic stimuli using birdsongs as exemplary data. These birdsongs can be described by a two-level hierarchical structure, i.e. as sequences of syllables. Our model matches this hierarchy by learning single syllables at the first level and sequences of these syllables at the top level. Model performance on recognition tasks is investigated for an increasing number of syllables or songs to recognize and compared to state-of-the-art machine learning approaches. Finally, we discuss the implications of our model for the understanding of sensory pattern processing in the brain. We conclude that the employed encoding and decoding mechanisms might capture general computational principles of how the brain extracts relevant information from the activity of recurrently connected neural populations.
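
As a schematic of the two-level hierarchy described above, the following sketch recognizes syllables from feature frames at the first level and songs as syllable sequences at the top level. Nearest-template matching stands in for the paper's recurrently connected neural populations; all names and data structures are illustrative assumptions.

```python
import numpy as np

class HierarchicalRecognizer:
    def __init__(self, syllable_templates, songs):
        # syllable_templates: dict mapping syllable label -> feature vector
        # songs: dict mapping song label -> tuple of syllable labels
        self.templates = syllable_templates
        self.songs = songs

    def decode_syllable(self, frame):
        """Level 1: classify one feature frame by its nearest syllable template."""
        return min(self.templates,
                   key=lambda s: np.linalg.norm(frame - self.templates[s]))

    def decode_song(self, frames):
        """Level 2: map the decoded syllable sequence to a known song, if any."""
        seq = tuple(self.decode_syllable(f) for f in frames)
        return next((name for name, s in self.songs.items() if s == seq), None)
```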


2021 ◽  
Author(s):  
Zhenhui Ye

In this paper, we aim to design a deep reinforcement learning (DRL) based control solution that navigates a swarm of unmanned aerial vehicles (UAVs) to fly around an unexplored target area and provide optimal communication coverage for the ground mobile users. Compared with existing DRL-based solutions that mainly solve the problem with global observation and centralized training, a practical and efficient Decentralized Training and Decentralized Execution (DTDE) framework is desirable to train and deploy each UAV in a distributed manner. To this end, we propose a novel DRL approach named Deep Recurrent Graph Network (DRGN) that makes use of a Graph Attention Network-based Flying Ad-hoc Network (GAT-FANET) to achieve inter-UAV communication and a Gated Recurrent Unit (GRU) to record historical information. We conducted extensive experiments to define an appropriate structure for GAT-FANET and examine the performance of DRGN. The simulation results show that the proposed model outperforms four state-of-the-art DRL-based approaches and four heuristic baselines, and demonstrate the scalability, transferability, robustness, and interpretability of DRGN.
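
A hedged PyTorch sketch of the DRGN idea may help: each UAV attends over the observations of neighbors within communication range (standing in for GAT-FANET) and carries history in a GRU before emitting action logits. Layer sizes, the masked-softmax attention, and all names are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class DRGNAgent(nn.Module):
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.query = nn.Linear(obs_dim, hidden_dim)
        self.key = nn.Linear(obs_dim, hidden_dim)
        self.value = nn.Linear(obs_dim, hidden_dim)
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)   # recurrent memory of past steps
        self.policy = nn.Linear(hidden_dim, n_actions)

    def forward(self, own_obs, neighbor_obs, adj_mask, h):
        # own_obs: (obs_dim,); neighbor_obs: (N, obs_dim); adj_mask: (N,) bool;
        # h: (hidden_dim,) recurrent state. Attention is restricted to neighbors
        # currently in communication range, loosely mimicking a flying ad-hoc
        # network; this sketch assumes at least one neighbor is in range.
        q = self.query(own_obs)                          # (hidden,)
        k = self.key(neighbor_obs)                       # (N, hidden)
        v = self.value(neighbor_obs)                     # (N, hidden)
        scores = (k @ q) / k.shape[-1] ** 0.5            # scaled dot-product scores
        scores = scores.masked_fill(~adj_mask, float("-inf"))
        attn = torch.softmax(scores, dim=0)
        agg = attn @ v                                   # attention-weighted aggregate
        h = self.gru(agg.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
        return self.policy(h), h
```

Because every agent runs the same module on purely local inputs, the sketch is compatible with the decentralized training and execution the abstract argues for.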


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3650 ◽  
Author(s):  
Keyu Wu ◽  
Mahdi Esfahani ◽  
Shenghai Yuan ◽  
Han Wang

It is crucial for robots to steer autonomously and safely in complex environments without colliding with obstacles. Compared to conventional methods, deep reinforcement learning-based methods are able to learn from past experiences automatically and enhance the generalization capability to cope with unseen circumstances. Therefore, we propose an end-to-end deep reinforcement learning algorithm in this paper to improve the performance of autonomous steering in complex environments. By embedding a branching noisy dueling architecture, the proposed model is capable of deriving steering commands directly from raw depth images with high efficiency. Specifically, our learning-based approach extracts the feature representation from depth inputs through convolutional neural networks and maps it to both linear and angular velocity commands simultaneously through different streams of the network. Moreover, the training framework is also meticulously designed to improve the learning efficiency and effectiveness. It is worth noting that the developed system is readily transferable from virtual training scenarios to real-world deployment without any fine-tuning by utilizing depth images. The proposed method is evaluated and compared with a series of baseline methods in various virtual environments. Experimental results demonstrate the superiority of the proposed model in terms of average reward, learning efficiency, success rate, and computational time. Moreover, a variety of real-world experiments are also conducted which reveal the high adaptability of our model to both static and dynamic obstacle-cluttered environments. A video of our experiments is available at https://youtu.be/yixnmFXIKf4 and http://v.youku.com/vshow/idXMzg1ODYwMzM5Ng.
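
The branching dueling idea can be sketched as follows: a CNN encodes the raw depth image, and separate advantage streams map the shared feature to linear- and angular-velocity action values under a dueling decomposition. The input resolution and layer sizes are illustrative assumptions, and the paper's noisy layers are replaced by ordinary linear layers for brevity.

```python
import torch
import torch.nn as nn

class BranchingDuelingNet(nn.Module):
    def __init__(self, n_linear_actions, n_angular_actions):
        super().__init__()
        self.encoder = nn.Sequential(            # depth image assumed 1 x 84 x 84
            nn.Conv2d(1, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 64 * 7 * 7                        # flattened feature size for 84x84 input
        self.value = nn.Linear(feat, 1)          # state value shared by both branches
        self.adv_linear = nn.Linear(feat, n_linear_actions)
        self.adv_angular = nn.Linear(feat, n_angular_actions)

    def forward(self, depth):
        z = self.encoder(depth)
        v = self.value(z)
        # Dueling decomposition per branch: Q = V + (A - mean(A)).
        a_lin = self.adv_linear(z)
        a_ang = self.adv_angular(z)
        q_lin = v + a_lin - a_lin.mean(dim=1, keepdim=True)
        q_ang = v + a_ang - a_ang.mean(dim=1, keepdim=True)
        return q_lin, q_ang                      # one Q-vector per action branch
```

Splitting the output into two branches keeps the action space tractable: the network scores linear and angular velocities independently instead of enumerating their Cartesian product.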


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3327 ◽  
Author(s):  
Will Serrano

This paper presents a Deep Learning (DL) Cluster Structure for Management Decisions that emulates the way the brain learns and makes choices by combining different learning algorithms. The proposed model is based on Random Neural Network (RNN) Reinforcement Learning for fast local decisions and on Deep Learning for long-term memory. The Deep Learning Cluster Structure has been applied in the Cognitive Packet Network (CPN) for routing decisions based on Quality of Service (QoS) metrics (Delay, Loss and Bandwidth) and Cyber Security keys (User, Packet and Node); it includes a layer of DL management clusters (QoS, Cyber and CEO) that takes the final routing decision based on the inputs from the DL QoS clusters and the RNN Reinforcement Learning algorithm. The model has been validated under different network sizes and scenarios. The simulation results are promising: the presented DL Cluster management structure, as a mechanism to transmit, learn and make packet routing decisions, is a step closer to emulating the way the brain transmits information, learns the environment and makes decisions.
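
The division of labor described above can be sketched loosely in Python: a fast reinforcement-learning estimate per next hop, updated from a management-level reward that combines the QoS metrics. The metric weighting and the update rule are illustrative assumptions and do not reproduce the paper's Random Neural Network formulation.

```python
import numpy as np

class RoutingAgent:
    def __init__(self, neighbors, lr=0.2):
        self.q = {n: 0.0 for n in neighbors}   # fast RL value estimate per next hop
        self.lr = lr

    def qos_reward(self, delay, loss, bandwidth, w=(0.5, 0.3, 0.2)):
        """Management layer: fold QoS metrics into one reward (higher is better)."""
        return -w[0] * delay - w[1] * loss + w[2] * bandwidth

    def choose(self, eps=0.1):
        """Epsilon-greedy next-hop selection for a packet."""
        if np.random.random() < eps:
            return np.random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, hop, delay, loss, bandwidth):
        """Move the chosen hop's estimate toward the observed QoS reward."""
        r = self.qos_reward(delay, loss, bandwidth)
        self.q[hop] += self.lr * (r - self.q[hop])
```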


2020 ◽  
pp. 2050009 ◽  
Author(s):  
IMRE VARGA

In this work, the spreading of information in a vehicular ad hoc network (VANET) is investigated by agent-based simulation. The proposed model is complex, containing two major states with several minor substates, as well as both reversible and irreversible state changes. According to our results, spreading is fast and widespread, and the system can remain in an up-to-date phase without external control. We show a very simple way to keep the system in this up-to-date state while dramatically reducing communication costs. Furthermore, the topology of both the instantaneous and the aggregate communication networks in this self-organized system is analyzed, and scale-free networks are found.
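
A minimal agent-based sketch of such a spreading process, under heavy assumptions: agents become informed on contact within a radius, and an informed agent's copy may revert to an outdated state, loosely mirroring the reversible state changes mentioned. The contact radius and forgetting rate are illustrative, not the paper's parameters.

```python
import random

def step(positions, informed, radius=1.0, forget_p=0.01):
    """One simulation step: contact spreads information; agents may become outdated."""
    n = len(positions)
    newly = set()
    for i in range(n):
        if i in informed:
            continue
        for j in informed:                       # reversible spread: contact informs i
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            if dx * dx + dy * dy <= radius * radius:
                newly.add(i)
                break
    informed |= newly
    # Reversible state change: an informed agent's information may become outdated.
    informed -= {i for i in informed if random.random() < forget_p}
    return informed
```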


Author(s):  
Caroline A. Miller ◽  
Laura L. Bruce

The first visual cortical axons arrive in the cat superior colliculus by the time of birth. Adultlike receptive fields develop slowly over several weeks following birth. The developing cortical axons go through a sequence of changes before acquiring their adultlike morphology and function. To determine how these axons interact with neurons in the colliculus, cortico-collicular axons were labeled with biocytin (an anterograde neuronal tracer) and studied with electron microscopy.

Deeply anesthetized animals received 200-500 nl injections of biocytin (Sigma; 5% in phosphate buffer) in the lateral suprasylvian visual cortical area. After a 24 hr survival time, the animals were deeply anesthetized and perfused with 0.9% phosphate buffered saline, followed by fixation with a solution of 1.25% glutaraldehyde and 1.0% paraformaldehyde in 0.1 M phosphate buffer. The brain was sectioned transversely on a vibratome at 50 μm. The tissue was processed immediately to visualize the biocytin.

