Reinforcement learning using associative memory networks

2021 ◽  
Author(s):  
Ricardo Salmon

It is shown that associative memory networks can solve immediate and general reinforcement learning (RL) problems by combining techniques from associative neural networks and reinforcement learning, in particular Q-learning. The modified model is shown to outperform native RL techniques on a stochastic grid world task by developing correct policies. In addition, we formulate an analogous method that adds feature extraction as dimensionality reduction and eligibility traces as a further mechanism for addressing the credit assignment problem. Unlike pure RL methods, the network is based on associative memory principles such as distribution of information, pattern completion, Hebbian learning, and noise tolerance (limit cycles, one-to-many associations, chaos, etc.). Because of this, it can be argued that the model has more cognitive explanatory power than other RL or hybrid models. It may be an effective tool for bridging the gap between biological memory models and computational memory models.
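Two of the associative memory principles named in this abstract, Hebbian learning and pattern completion, can be illustrated with a minimal Hopfield-style sketch. This is not the model from the paper: the function names, the bipolar (+1/-1) encoding, the synchronous update rule, and the two stored patterns are all assumptions chosen for illustration.

```python
def hebbian_weights(patterns):
    """Build outer-product (Hebbian) weights from bipolar (+1/-1) patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, probe, max_steps=10):
    """Synchronously update all units until the state stops changing."""
    state = list(probe)
    for _ in range(max_steps):
        new_state = [
            1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w
        ]
        if new_state == state:
            break
        state = new_state
    return state


# Two orthogonal patterns (hypothetical example data).
patterns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
w = hebbian_weights(patterns)

# Corrupt one element of the first pattern, then let the network complete it.
noisy = list(patterns[0])
noisy[0] = -noisy[0]
restored = recall(w, noisy)
```

In this small example the corrupted probe settles back onto the first stored pattern, which is the noise tolerance the abstract refers to. How such a memory is coupled to Q-learning is specific to the paper and is not shown here.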


1995 ◽  
Vol 2 ◽  
pp. 287-318 ◽  
Author(s):  
P. Cichosz

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative that only approximates TD(lambda) but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning.
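The idea of truncating the TD(lambda) return can be sketched as a backward recursion over the last m steps of experience. This is a generic truncated lambda-return under stated assumptions, not necessarily the exact TTD procedure of the paper: the function name, argument layout, and the choice to bootstrap from the value of the final state are illustrative.

```python
def truncated_lambda_return(rewards, next_values, gamma, lam):
    """Truncated lambda-return for the first step of an m-step window.

    rewards[k]     is r_{t+k}          for k = 0..m-1
    next_values[k] is V(s_{t+k+1})     for k = 0..m-1
    The return is built backwards with
        z_k = r_k + gamma * (lam * z_{k+1} + (1 - lam) * V(s_{k+1})),
    and the recursion is seeded with the value of the last state in the window.
    """
    if not rewards or len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must be non-empty and equal length")
    z = next_values[-1]  # bootstrap from the value of the final state
    for r, v in zip(reversed(rewards), reversed(next_values)):
        z = r + gamma * (lam * z + (1.0 - lam) * v)
    return z


# Hypothetical two-step window.
rewards = [1.0, 2.0]
next_values = [10.0, 20.0]
td0_target = truncated_lambda_return(rewards, next_values, gamma=0.5, lam=0.0)
two_step = truncated_lambda_return(rewards, next_values, gamma=0.5, lam=1.0)
blended = truncated_lambda_return(rewards, next_values, gamma=0.5, lam=0.5)
```

With lam = 0 the result reduces to the one-step TD(0) target, and with lam = 1 it becomes the m-step return bootstrapped from the last state; intermediate values blend the two. The cost is one pass over a fixed-length window, with no per-state trace to maintain.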


Author(s):  
Faxin Qi ◽  
Xiangrong Tong ◽  
Lei Yu ◽  
Yingjie Wang

With the development of the Internet and the progress of human-centered computing (HCC), man-machine collaborative work has become increasingly popular. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static, and so cannot respond to dynamic changes in user trust and preferences in a timely manner. In fact, after a user receives a recommendation, there is a difference between the actual evaluation and the expected evaluation, and this difference is correlated with the trust value. Based on the dynamics of trust and the way trust between users changes, this paper proposes a trust boost method based on reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on a user's trust. In addition, a reinforcement learning method, Deep Q-Learning (DQN), is studied to simulate the process of learning a user's preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, responds quickly to changes in users' preferences. Compared with other methods, our method achieves better recommendation accuracy.
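The recursive least squares step mentioned in this abstract can be sketched as follows. This is the textbook RLS update for a linear model, not the paper's trust model: the class name, the feature layout, the forgetting factor, and the initial covariance scale `delta` are assumptions, and how evaluation differences map onto features and targets is specific to the paper.

```python
class RecursiveLeastSquares:
    """Online least-squares estimate of theta in y ~ theta . x."""

    def __init__(self, n_features, forgetting=1.0, delta=1000.0):
        self.theta = [0.0] * n_features
        self.forgetting = forgetting
        # Inverse-covariance estimate P starts as delta * I (weak prior).
        self.P = [
            [delta if i == j else 0.0 for j in range(n_features)]
            for i in range(n_features)
        ]

    def update(self, x, y):
        """Fold in one observation (x, y); return the prediction error before the update."""
        n = len(x)
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.forgetting + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - sum(t * xi for t, xi in zip(self.theta, x))
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # P stays symmetric, so x^T P equals Px transposed.
        self.P = [
            [(self.P[i][j] - gain[i] * Px[j]) / self.forgetting for j in range(n)]
            for i in range(n)
        ]
        return error


# Hypothetical noiseless data generated from y = 2*x0 - 1*x1.
rls = RecursiveLeastSquares(n_features=2)
for x in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]:
    rls.update(x, 2.0 * x[0] - 1.0 * x[1])
```

A forgetting factor below 1 would weight recent observations more heavily, which is one common way to track a quantity that drifts over time, as trust is assumed to do here.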


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower needed for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to a policy. In an unknown environment, hand-formulating rules to help UAVs choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by the proposed algorithm has a greater probability of choosing the optimal action than one following the classic Q-learning algorithm. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.


Author(s):  
Maximilian Moll ◽  
Leonhard Kunczik

In recent history, reinforcement learning (RL) has proved its capability for solving complex decision problems by mastering several games. Increased computational power and advances in approximation with neural networks (NN) paved the way for RL's successful applications. Even though RL can tackle more complex problems nowadays, it still relies on computational power and runtime. Quantum computing promises to address these issues through its capability to encode information and its potential quadratic speedup in runtime. We compare tabular Q-learning and Q-learning using either a quantum or a classical approximation architecture on the frozen lake problem. Furthermore, the three algorithms are analyzed in terms of iterations until convergence to the optimal behavior, memory usage, and runtime. Within the paper, NNs are used for approximation in the classical domain, while in the quantum domain variational quantum circuits, a quantum hybrid approximation method, are used. Our simulations show that a quantum approximator is beneficial in terms of memory usage and provides better sample complexity than NNs; however, it still lacks the computational speed to be competitive.
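The tabular Q-learning baseline in this comparison can be sketched on a small frozen-lake-style grid. Everything below is an assumption made for illustration rather than the paper's setup: the 4x4 map, deterministic (non-slippery) moves, a uniformly random behaviour policy with random start cells, and the hyperparameters. The neural and quantum approximators are not shown.

```python
import random

# Hypothetical map: S start, F frozen, H hole, G goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
SIZE = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up


def step(state, action):
    """Deterministic move; returns (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)  # walls keep the agent in place
    col = min(max(col + d_col, 0), SIZE - 1)
    cell = GRID[row][col]
    return row * SIZE + col, (1.0 if cell == "G" else 0.0), cell in "HG"


def train(episodes=5000, alpha=0.5, gamma=0.9, seed=0):
    """Off-policy Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(SIZE * SIZE)]
    starts = [s for s in range(SIZE * SIZE) if GRID[s // SIZE][s % SIZE] in "SF"]
    for _ in range(episodes):
        state = rng.choice(starts)  # random start cells, an assumption
        for _ in range(100):  # cap on episode length
            action = rng.randrange(4)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, start=0, max_steps=16):
    """Follow the greedy policy from the start cell."""
    path, state = [start], start
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
```

The table here holds 16 x 4 values, which is the memory cost that grows with the state space and that function approximators, classical or quantum, aim to reduce.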

