A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

An integration of both the Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated with the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can reveal indeed an integrated version of Hebbian and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the used neural representation. This generality, along with the reported results, supports adopting the introduced approach to benefit from the biologically plausible synaptic models in a wide range of intuitive signal processing.

Download Full-text

Predicting Human Mobility with Reinforcement-Learning-Based Long-Term Periodicity Modeling

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3469860 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-23

Author(s):

Shuo Tao ◽

Jingang Jiang ◽

Defu Lian ◽

Kai Zheng ◽

Enhong Chen

Keyword(s):

Reinforcement Learning ◽

Human Mobility ◽

Recurrent Network ◽

Mobility Prediction ◽

Learning Framework ◽

Temporal Features ◽

Wide Range ◽

Spatio Temporal ◽

Historical Trajectory

Mobility prediction plays an important role in a wide range of location-based applications and services. However, there are three problems in the existing literature: (1) explicit high-order interactions of spatio-temporal features are not systemically modeled; (2) most existing algorithms place attention mechanisms on top of recurrent network, so they can not allow for full parallelism and are inferior to self-attention for capturing long-range dependence; (3) most literature does not make good use of long-term historical information and do not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model, predicting each user’s next destination based on her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and historical trajectory, MoveNet can use an attention mechanism to capture the user’s long-term regularity in a more efficient way. Based on MoveNet, to model long-term periodicity more effectively, we add the reinforcement learning layer and named RLMoveNet. RLMoveNet regards the human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as the regularization part to drive the model to pay attention to the behavior with periodic actions, which can help us make the algorithm more effective. We evaluate both of them with three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, and simultaneously achieves faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which proves that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.

Download Full-text

Reinforcement Learning With Modulated Spike Timing–Dependent Synaptic Plasticity

Journal of Neurophysiology ◽

10.1152/jn.00364.2007 ◽

2007 ◽

Vol 98 (6) ◽

pp. 3648-3665 ◽

Cited By ~ 69

Author(s):

Michael A. Farries ◽

Adrienne L. Fairhall

Keyword(s):

Synaptic Plasticity ◽

Reinforcement Learning ◽

General Purpose ◽

Spike Timing ◽

Neural Population ◽

Minimal Set ◽

Population Responses ◽

Wide Range ◽

Output Neurons ◽

Novel Model

Spike timing–dependent synaptic plasticity (STDP) has emerged as the preferred framework linking patterns of pre- and postsynaptic activity to changes in synaptic strength. Although synaptic plasticity is widely believed to be a major component of learning, it is unclear how STDP itself could serve as a mechanism for general purpose learning. On the other hand, algorithms for reinforcement learning work on a wide variety of problems, but lack an experimentally established neural implementation. Here, we combine these paradigms in a novel model in which a modified version of STDP achieves reinforcement learning. We build this model in stages, identifying a minimal set of conditions needed to make it work. Using a performance-modulated modification of STDP in a two-layer feedforward network, we can train output neurons to generate arbitrarily selected spike trains or population responses. Furthermore, a given network can learn distinct responses to several different input patterns. We also describe in detail how this model might be implemented biologically. Thus our model offers a novel and biologically plausible implementation of reinforcement learning that is capable of training a neural population to produce a very wide range of possible mappings between synaptic input and spiking output.

Download Full-text

A Multi-Agent Reinforcement Learning Framework for Lithium-ion Battery Scheduling Problems

Energies ◽

10.3390/en13081982 ◽

2020 ◽

Vol 13 (8) ◽

pp. 1982

Author(s):

Yu Sui ◽

Shiming Song

Keyword(s):

Reinforcement Learning ◽

Scheduling Algorithm ◽

Lithium Ion ◽

Thermal Control ◽

Use Cases ◽

Limiting Factor ◽

Scheduling Problems ◽

Battery Lifetime ◽

Learning Framework ◽

Wide Range

This paper presents a reinforcement learning framework for solving battery scheduling problems in order to extend the lifetime of batteries used in electrical vehicles (EVs), cellular phones, and embedded systems. Battery pack lifetime has often been the limiting factor in many of today’s smart systems, from mobile devices and wireless sensor networks to EVs. Smart charge-discharge scheduling of battery packs is essential to obtain super linear gain of overall system lifetime, due to the recovery effect and nonlinearity in the battery characteristics. Additionally, smart scheduling has also been shown to be beneficial for optimizing the system’s thermal profile and minimizing chances of irreversible battery damage. The recent rapidly-growing community and development infrastructure have added deep reinforcement learning (DRL) to the available tools for designing battery management systems. Through leveraging the representation powers of deep neural networks and the flexibility and versatility of reinforcement learning, DRL offers a powerful solution to both roofline analysis and real-world deployment on complicated use cases. This work presents a DRL-based battery scheduling framework to solve battery scheduling problems, with high flexibility to fit various battery models and application scenarios. Through the discussion of this framework, comparisons have also been made between conventional heuristics-based methods and DRL. The experiments demonstrate that DRL-based scheduling framework achieves battery lifetime comparable to the best weighted-k round-robin (kRR) heuristic scheduling algorithm. In the meantime, the framework offers much greater flexibility in accommodating a wide range of battery models and use cases, including thermal control and imbalanced battery.

Download Full-text

A Hebbian-Based Reinforcement Learning Framework for Spike-Timing-Dependent Synapses

Artificial Neural Networks – ICANN 2010 - Lecture Notes in Computer Science ◽

10.1007/978-3-642-15822-3_21 ◽

2010 ◽

pp. 160-169 ◽

Cited By ~ 2

Author(s):

Karim El-Laithy ◽

Martin Bogdan

Keyword(s):

Reinforcement Learning ◽

Spike Timing ◽

Learning Framework

Download Full-text

Inferring Linkage Disequilibrium Between a Polymorphic Marker Locus and a Trait Locus in Natural Populations

Genetics ◽

10.1093/genetics/156.1.457 ◽

2000 ◽

Vol 156 (1) ◽

pp. 457-467 ◽

Cited By ~ 1

Author(s):

Z W Luo ◽

S H Tao ◽

Z-B Zeng

Keyword(s):

Linkage Disequilibrium ◽

Allele Frequency ◽

Random Mating ◽

Natural Populations ◽

Polymorphic Marker ◽

Marker Locus ◽

Model Parameters ◽

Phenotypic Variance ◽

Wide Range ◽

Trait Locus

Abstract Three approaches are proposed in this study for detecting or estimating linkage disequilibrium between a polymorphic marker locus and a locus affecting quantitative genetic variation using the sample from random mating populations. It is shown that the disequilibrium over a wide range of circumstances may be detected with a power of 80% by using phenotypic records and marker genotypes of a few hundred individuals. Comparison of ANOVA and regression methods in this article to the transmission disequilibrium test (TDT) shows that, given the genetic variance explained by the trait locus, the power of TDT depends on the trait allele frequency, whereas the power of ANOVA and regression analyses is relatively independent from the allelic frequency. The TDT method is more powerful when the trait allele frequency is low, but much less powerful when it is high. The likelihood analysis provides reliable estimation of the model parameters when the QTL variance is at least 10% of the phenotypic variance and the sample size of a few hundred is used. Potential use of these estimates in mapping the trait locus is also discussed.

Download Full-text

Hierarchical Reinforcement Learning Framework for Secure UAV Communication in the Presence of Multiple UAV Adaptive Eavesdroppers

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344970 ◽

2020 ◽

Author(s):

Liu Jue ◽

Yang Weiwei

Keyword(s):

Reinforcement Learning ◽

Hierarchical Reinforcement Learning ◽

Learning Framework

Download Full-text

Antifungal Activity of Propyl Disulfide from Neem (Azadirachta indica) in Vapor and Agar Diffusion Assays against Anthracnose Pathogens (Colletotrichum gloeosporioides and Colletotrichum acutatum) in Mango Fruit

Microorganisms ◽

10.3390/microorganisms9040839 ◽

2021 ◽

Vol 9 (4) ◽

pp. 839

Author(s):

Muhammad Rafiullah Khan ◽

Vanee Chonhenchob ◽

Chongxing Huang ◽

Panitee Suwanamornlert

Keyword(s):

Antifungal Activity ◽

Mycelial Growth ◽

Colletotrichum Gloeosporioides ◽

Colletotrichum Acutatum ◽

Model Parameters ◽

Pathogenicity Test ◽

Agar Diffusion ◽

Wide Range ◽

Significant Difference ◽

Diffusion Assay

Microorganisms causing anthracnose diseases have a medium to a high level of resistance to the existing fungicides. This study aimed to investigate neem plant extract (propyl disulfide, PD) as an alternative to the current fungicides against mango’s anthracnose. Microorganisms were isolated from decayed mango and identified as Colletotrichum gloeosporioides and Colletotrichum acutatum. Next, a pathogenicity test was conducted and after fulfilling Koch’s postulates, fungi were reisolated from these symptomatic fruits and we thus obtained pure cultures. Then, different concentrations of PD were used against these fungi in vapor and agar diffusion assays. Ethanol and distilled water were served as control treatments. PD significantly (p ≤ 0.05) inhibited more of the mycelial growth of these fungi than both controls. The antifungal activity of PD increased with increasing concentrations. The vapor diffusion assay was more effective in inhibiting the mycelial growth of these fungi than the agar diffusion assay. A good fit (R2, 0.950) of the experimental data in the Gompertz growth model and a significant difference in the model parameters, i.e., lag phase (λ), stationary phase (A) and mycelial growth rate, further showed the antifungal efficacy of PD. Therefore, PD could be the best antimicrobial compound against a wide range of microorganisms.

Download Full-text

Goal-driven active learning

Autonomous Agents and Multi-Agent Systems ◽

10.1007/s10458-021-09527-5 ◽

2021 ◽

Vol 35 (2) ◽

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Process ◽

Real World ◽

Imitation Learning ◽

Learning Approaches ◽

Wide Range ◽

Fixed Set ◽

Complex Decision Making ◽

Complex Decision

AbstractDeep reinforcement learning methods have achieved significant successes in complex decision-making problems. In fact, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch—when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.

Download Full-text

Modelling the Inflation and Elastic Instabilities of Rubber-Like Spherical and Cylindrical Shells Using a New Generalised Neo-Hookean Strain Energy Function

Journal of Elasticity ◽

10.1007/s10659-021-09823-x ◽

2021 ◽

Author(s):

Afshin Anssari-Benam ◽

Andrea Bucchi ◽

Giuseppe Saccomandi

Keyword(s):

Experimental Data ◽

Energy Function ◽

Limit Point ◽

Strain Energy ◽

Cylindrical Shells ◽

Strain Energy Function ◽

Model Parameters ◽

Wide Range ◽

Cylindrical Tubes ◽

Gent Model

AbstractThe application of a newly proposed generalised neo-Hookean strain energy function to the inflation of incompressible rubber-like spherical and cylindrical shells is demonstrated in this paper. The pressure ($P$ P ) – inflation ($\lambda $ λ or $v$ v ) relationships are derived and presented for four shells: thin- and thick-walled spherical balloons, and thin- and thick-walled cylindrical tubes. Characteristics of the inflation curves predicted by the model for the four considered shells are analysed and the critical values of the model parameters for exhibiting the limit-point instability are established. The application of the model to extant experimental datasets procured from studies across 19th to 21st century will be demonstrated, showing favourable agreement between the model and the experimental data. The capability of the model to capture the two characteristic instability phenomena in the inflation of rubber-like materials, namely the limit-point and inflation-jump instabilities, will be made evident from both the theoretical analysis and curve-fitting approaches presented in this study. A comparison with the predictions of the Gent model for the considered data is also demonstrated and is shown that our presented model provides improved fits. Given the simplicity of the model, its ability to fit a wide range of experimental data and capture both limit-point and inflation-jump instabilities, we propose the application of our model to the inflation of rubber-like materials.

Download Full-text

Adaptive Reinforcement Learning Framework for NOMA-UAV Networks

IEEE Communications Letters ◽

10.1109/lcomm.2021.3093385 ◽

2021 ◽

pp. 1-1

Author(s):

Syed Khurram Mahmud ◽

Yuanwei Liu ◽

Yue Chen ◽

Kok Keong Chai

Keyword(s):

Reinforcement Learning ◽

Learning Framework

Download Full-text