Evolutionary Optimization of Neural Networks for Reinforcement Learning Algorithms

This paper proposes a new reinforcement learning algorithm that can learn, using neural networks and CMAC, a mapping function between highdimensional sensors and the motors of an autonomous robot. Conventional reinforcement learning algorithms require a lot of memory because they use lookup tables to describe high-dimensional mapping functions. Researchers have therefore tried to develop reinforcement learning algorithms that can learn the high-dimensional mapping functions. We apply the proposed method to an autonomous robot navigation problem and a multi-link robot arm reaching problem, and we evaluate the effectiveness of the method.

Download Full-text

Network Parameter Setting for Reinforcement Learning Approaches Using Neural Networks

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2011.p0822 ◽

2011 ◽

Vol 15 (7) ◽

pp. 822-930 ◽

Cited By ~ 2

Author(s):

Kazuaki Yamada ◽

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Mobile Robot ◽

Learning Algorithms ◽

Mapping Function ◽

Learning Approaches ◽

Autonomous Mobile Robot ◽

Q Learning ◽

Initial Values ◽

Continuous State

Reinforcement learning approaches are attracting attention as a technique for constructing a trial-anderror mapping function between sensors and motors of an autonomous mobile robot. Conventional reinforcement learning approaches use a look-up table to express the mapping function between grid state and grid action spaces. The grid size greatly adversely affects the learning performance of reinforcement learning algorithms. To avoid this, researchers have proposed reinforcement learning algorithms using neural networks to express the mapping function between continuous state space and action. A designer, however, must set the number of middle neurons and initial values of weight parameters appropriately to improve the approximate accuracy of neural networks. This paper proposes a new method that automatically sets the number ofmiddle neurons and initial values of weight parameters based on the dimension number of the sensor space. The feasibility of proposed method is demonstrated using an autonomous mobile robot navigation problem and is evaluated by comparing it with two types of Q-learning as follows: Q-learning using RBF networks and Q-learning using neural networks whose parameters are set by a designer.

Download Full-text

Cognitive Radio Networks with Reinforcement Learning Algorithms for Spectrum Allocation: A Survey

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2020/211952020 ◽

2020 ◽

Vol 9 (5) ◽

pp. 8371-8384

Keyword(s):

Reinforcement Learning ◽

Cognitive Radio ◽

Cognitive Radio Networks ◽

Learning Algorithms ◽

Radio Networks ◽

Spectrum Allocation

Download Full-text

Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

ACM Transactions on Internet of Things ◽

10.1145/3424739 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-25

Author(s):

Yongsen Ma ◽

Sheheryar Arshad ◽

Swetha Muniraju ◽

Eric Torkildson ◽

Enrico Rantala ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reinforcement Learning ◽

Activity Recognition ◽

Deep Neural Networks ◽

State Machine ◽

Recognition Algorithm ◽

The State ◽

Neural Architecture ◽

Learning Agent

In recent years, Channel State Information (CSI) measured by WiFi is widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from history classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. The proposed design has 97% average accuracy when testing devices and persons are not seen during training. The proposed design is also evaluated by two public datasets with accuracy of 80% and 83%. The proposed design needs very little human efforts for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.

Download Full-text

Multi-Agent Reinforcement Learning: A Review of Challenges and Applications

Applied Sciences ◽

10.3390/app11114948 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4948

Author(s):

Lorenzo Canese ◽

Gian Carlo Cardarilli ◽

Luca Di Di Nunzio ◽

Rocco Fazzolari ◽

Daniele Giardino ◽

...

Keyword(s):

Reinforcement Learning ◽

Mathematical Models ◽

Learning Algorithms ◽

Single Agent ◽

Critical Issues ◽

Multi Agent ◽

Pros And Cons ◽

Application Fields

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.

Download Full-text

Benchmarking reinforcement learning algorithms for demand response applications

2020 IEEE PES Innovative Smart Grid Technologies Europe (ISGT-Europe) ◽

10.1109/isgt-europe47291.2020.9248800 ◽

2020 ◽

Author(s):

Brida V. Mbuwir ◽

Carlo Manna ◽

Fred Spiessens ◽

Geert Deconinck

Keyword(s):

Reinforcement Learning ◽

Demand Response ◽

Learning Algorithms

Download Full-text

Reinforcement Learning Algorithms: Analysis and Applications

10.1007/978-3-030-41188-6 ◽

2021 ◽

Keyword(s):

Reinforcement Learning ◽

Learning Algorithms

Download Full-text

Experimental evaluation of model-free reinforcement learning algorithms for continuous HVAC control

Applied Energy ◽

10.1016/j.apenergy.2021.117164 ◽

2021 ◽

Vol 298 ◽

pp. 117164

Author(s):

Marco Biemann ◽

Fabian Scheller ◽

Xiufeng Liu ◽

Lizhen Huang

Keyword(s):

Reinforcement Learning ◽

Experimental Evaluation ◽

Learning Algorithms ◽

Model Free ◽

Hvac Control

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text