QUANTUM COMPUTATION FOR ACTION SELECTION USING REINFORCEMENT LEARNING

This paper proposes a novel action selection method based on quantum computation and reinforcement learning (RL). Inspired by the advantages of quantum computation, the state/action in a RL system is represented with quantum superposition state. The probability of action eigenvalue is denoted by probability amplitude, which is updated according to rewards. And the action selection is carried out by observing quantum state according to collapse postulate of quantum measurement. The results of simulated experiments show that quantum computation can be effectively used to action selection and decision making through speeding up learning. This method also makes a good tradeoff between exploration and exploitation for RL using probability characteristics of quantum theory.

Download Full-text

Variability in Action Selection Relates to Striatal Dopamine 2/3 Receptor Availability in Humans: A PET Neuroimaging Study Using Reinforcement Learning and Active Inference Models

Cerebral Cortex ◽

10.1093/cercor/bhz327 ◽

2020 ◽

Vol 30 (6) ◽

pp. 3573-3589 ◽

Cited By ~ 1

Author(s):

Rick A Adams ◽

Michael Moutoussis ◽

Matthew M Nour ◽

Tarik Dahoun ◽

Declan Lewis ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Model Comparison ◽

Behavioral Model ◽

Action Selection ◽

Striatal Dopamine ◽

Active Inference ◽

Inference Models ◽

Positron Emission ◽

Dopamine Signaling

Abstract Choosing actions that result in advantageous outcomes is a fundamental function of nervous systems. All computational decision-making models contain a mechanism that controls the variability of (or confidence in) action selection, but its neural implementation is unclear—especially in humans. We investigated this mechanism using two influential decision-making frameworks: active inference (AI) and reinforcement learning (RL). In AI, the precision (inverse variance) of beliefs about policies controls action selection variability—similar to decision ‘noise’ parameters in RL—and is thought to be encoded by striatal dopamine signaling. We tested this hypothesis by administering a ‘go/no-go’ task to 75 healthy participants, and measuring striatal dopamine 2/3 receptor (D2/3R) availability in a subset (n = 25) using [11C]-(+)-PHNO positron emission tomography. In behavioral model comparison, RL performed best across the whole group but AI performed best in participants performing above chance levels. Limbic striatal D2/3R availability had linear relationships with AI policy precision (P = 0.029) as well as with RL irreducible decision ‘noise’ (P = 0.020), and this relationship with D2/3R availability was confirmed with a ‘decision stochasticity’ factor that aggregated across both models (P = 0.0006). These findings are consistent with occupancy of inhibitory striatal D2/3Rs decreasing the variability of action selection in humans.

Download Full-text

An Action Selection Method Using Degree of Cooperation in a Multi-agent Reinforcement Learning System

Journal of Robotics Networking and Artificial Life ◽

10.2991/jrnal.2014.1.3.13 ◽

2014 ◽

Vol 1 (3) ◽

pp. 231 ◽

Cited By ~ 1

Author(s):

Masanori Kawamura ◽

Kunikazu Kobayashi

Keyword(s):

Reinforcement Learning ◽

Action Selection ◽

Selection Method ◽

Learning System ◽

Multi Agent

Download Full-text

Action-Selection Method for Reinforcement Learning Based on Cuckoo Search Algorithm

Arabian Journal for Science and Engineering ◽

10.1007/s13369-017-2873-8 ◽

2017 ◽

Vol 43 (12) ◽

pp. 6771-6785 ◽

Cited By ~ 3

Author(s):

Bilal H. Abed-alguni

Keyword(s):

Reinforcement Learning ◽

Search Algorithm ◽

Cuckoo Search ◽

Action Selection ◽

Cuckoo Search Algorithm ◽

Selection Method

Download Full-text

The Missing Link Between Memory and Reinforcement Learning

Frontiers in Psychology ◽

10.3389/fpsyg.2020.560080 ◽

2020 ◽

Vol 11 ◽

Author(s):

Christian Balkenius ◽

Trond A. Tjøstheim ◽

Birger Johansson ◽

Annika Wallin ◽

Peter Gärdenfors

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Sensory Information ◽

Sufficient Evidence ◽

Extended Model ◽

Missing Link ◽

Memory Mechanism ◽

State Action ◽

Vicarious Trial And Error ◽

Over Time

Reinforcement learning systems usually assume that a value function is defined over all states (or state-action pairs) that can immediately give the value of a particular state or action. These values are used by a selection mechanism to decide which action to take. In contrast, when humans and animals make decisions, they collect evidence for different alternatives over time and take action only when sufficient evidence has been accumulated. We have previously developed a model of memory processing that includes semantic, episodic and working memory in a comprehensive architecture. Here, we describe how this memory mechanism can support decision making when the alternatives cannot be evaluated based on immediate sensory information alone. Instead we first imagine, and then evaluate a possible future that will result from choosing one of the alternatives. Here we present an extended model that can be used as a model for decision making that depends on accumulating evidence over time, whether that information comes from the sequential attention to different sensory properties or from internal simulation of the consequences of making a particular choice. We show how the new model explains both simple immediate choices, choices that depend on multiple sensory factors and complicated selections between alternatives that require forward looking simulations based on episodic and semantic memory structures. In this framework, vicarious trial and error is explained as an internal simulation that accumulates evidence for a particular choice. We argue that a system like this forms the “missing link” between more traditional ideas of semantic and episodic memory, and the associative nature of reinforcement learning.

Download Full-text

Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation

Information ◽

10.3390/info10110341 ◽

2019 ◽

Vol 10 (11) ◽

pp. 341 ◽

Cited By ~ 2

Author(s):

Hu ◽

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Transfer Learning ◽

Action Function ◽

State Action ◽

Markov Decision ◽

Decision Making System ◽

Multi Agent ◽

Function Approximator ◽

Multi Robot

Multi-Robot Confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of the advanced algorithms. Recently, a few advanced algorithms have been able to produce considerably complex levels in the context of the robot confrontation system when the agents are facing multiple opponents. Meanwhile, the current confrontation decision-making system suffers from difficulties in optimization and generalization. In this paper, a fuzzy reinforcement learning (RL) and the curriculum transfer learning are applied to the micromanagement for robot confrontation system. Firstly, an improved Qlearning in the semi-Markov decision-making process is designed to train the agent and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents. We use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function. Then, a method of fuzzy logic is used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.

Download Full-text

Deep Reinforcement Learning via Past-Success Directed Exploration

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019979 ◽

2019 ◽

Vol 33 ◽

pp. 9979-9980

Author(s):

Xiaoming Liu ◽

Zhixiong Xu ◽

Lei Cao ◽

Xiliang Chen ◽

Kai Kang

Keyword(s):

Online Learning ◽

Adaptive Control ◽

Reinforcement Learning ◽

Learning Process ◽

Control Method ◽

Learning Algorithms ◽

Action Selection ◽

Experimental Results ◽

Continuous Control ◽

Exploration And Exploitation

The balance between exploration and exploitation has always been a core challenge in reinforcement learning. This paper proposes “past-success exploration strategy combined with Softmax action selection”(PSE-Softmax) as an adaptive control method for taking advantage of the characteristics of the online learning process of the agent to adapt exploration parameters dynamically. The proposed strategy is tested on OpenAI Gym with discrete and continuous control tasks, and the experimental results show that PSE-Softmax strategy delivers better performance than deep reinforcement learning algorithms with basic exploration strategies.

Download Full-text

CODE-SPECIFIC LEARNING RULES IMPROVE ACTION SELECTION BY POPULATIONS OF SPIKING NEURONS

International Journal of Neural Systems ◽

10.1142/s0129065714500026 ◽

2014 ◽

Vol 24 (05) ◽

pp. 1450002 ◽

Cited By ~ 29

Author(s):

JOHANNES FRIEDRICH ◽

ROBERT URBANCZIK ◽

WALTER SENN

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Action Selection ◽

Population Based ◽

Population Coding ◽

Spiking Neurons ◽

Learning Rules ◽

Speed Up ◽

Population Reinforcement ◽

Specific Learning

Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.

Download Full-text

Fog Computing Enabled Locality Based Product Demand Prediction and Decision Making Using Reinforcement Learning

Electronics ◽

10.3390/electronics10030227 ◽

2021 ◽

Vol 10 (3) ◽

pp. 227

Author(s):

Gone Neelakantam ◽

Djeane Debora Onthoni ◽

Prasan Kumar Sahoo

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Fog Computing ◽

Weather Condition ◽

Principal Component ◽

High Demand ◽

State Action ◽

Product Demand ◽

Demand Prediction ◽

Decision Making Model

Wastage of perishable and non-perishable products due to manual monitoring in shopping malls creates huge revenue loss in supermarket industry. Besides, internal and external factors such as calendar events and weather condition contribute to excess wastage of products in different regions of supermarket. It is a challenging job to know about the wastage of the products manually in different supermarkets region-wise. Therefore, the supermarket management needs to take appropriate decision and action to prevent the wastage of products. The fog computing data centers located in each region can collect, process and analyze data for demand prediction and decision making. In this paper, a product-demand prediction model is designed using integrated Principal Component Analysis (PCA) and K-means Unsupervised Learning (UL) algorithms and a decision making model is developed using State-Action-Reward-State-Action (SARSA) Reinforcement Learning (RL) algorithm. Our proposed method can cluster the products into low, medium, and high-demand product by learning from the designed features. Taking the derived cluster model, decision making for distributing low-demand to high-demand product can be made using SARSA. Experimental results show that our proposed method can cluster the datasets well with a Silhouette score of ≥60%. Besides, our adopted SARSA-based decision making model outperforms over Q-Learning, Monte-Carlo, Deep Q-Network (DQN), and Actor-Critic algorithms in terms of maximum cumulative reward, average cumulative reward and execution time.

Download Full-text

Forced ε-Greedy, an Expansion to the ε-Greedy Action Selection Method

10.3233/faia210070 ◽

2021 ◽

Author(s):

George Angelopoulos ◽

Dimitris Metafas

Keyword(s):

Reinforcement Learning ◽

Action Selection ◽

Selection Method ◽

Board Game ◽

Selection Methods ◽

Training Process ◽

Greedy Method ◽

Learning Methods ◽

Q Learning ◽

Time Required

Reinforcement Learning methods such as Q Learning, make use of action selection methods, in order to train an agent to perform a task. As the complexity of the task grows, so does the time required to train the agent. In this paper Q Learning is applied onto the board game Dominion, and Forced ε-greedy, an expansion to the ε-greedy action selection method is introduced. As shown in this paper the Forced ε-greedy method achieves to accelerate the training process and optimize its results, especially as the complexity of the task grows.

Download Full-text

Sistem Pendukung Keputusan SNMPTN Jalur Undangan Dengan Metode Electre

Jurasik (Jurnal Riset Sistem Informasi dan Teknik Informatika) ◽

10.30645/jurasik.v3i0.63 ◽

2018 ◽

Vol 3 ◽

pp. 14

Author(s):

Lidia K Simanjuntak ◽

Tessa Y M Sihite ◽

Mesran Mesran ◽

Nuning Kurniasih ◽

Yuhandri Yuhandri

Keyword(s):

Decision Making ◽

Decision Support ◽

Decision Support System ◽

Support System ◽

Action Selection ◽

Multi Criteria Decision Making ◽

Prospective Students ◽

College Entrance ◽

Electre Method ◽

Selection Of

All colleges each year organize the selection of new admissions. Acceptance of prospective students in universities as education providers is done by selecting prospective students based on achievement in school and college entrance selection. To select the best student candidates based on predetermined criteria, then use Multi-Criteria Decision Making (MCDM) or commonly called decision support system. One method in MCDM is the Elimination Et Choix Traduisant la Reality (ELECTRE). The ELECTRE method is the best method of action selection. The ELECTRE method to obtain the best alternative by eliminating alternative that do not fit the criteria and can be applied to the decision SNMPTN invitation path.

Download Full-text