Fully distributed actor-critic architecture for multitask deep reinforcement learning

Abstract: We propose a fully distributed actor-critic architecture, named diffusion-distributed-actor-critic (Diff-DAC), with application to multitask reinforcement learning (MRL). During the learning process, agents communicate their value and policy parameters to their neighbours, diffusing the information across a network of agents with no need for a central station. Each agent can only access data from its local task, but aims to learn a common policy that performs well for the whole set of tasks. The architecture is scalable, since the computational and communication cost per agent depends on the number of neighbours rather than the overall number of agents. We derive Diff-DAC from duality theory and provide novel insights into the actor-critic framework, showing that it is actually an instance of the dual-ascent method. We prove almost sure convergence of Diff-DAC to a common policy under general assumptions that hold even for deep neural network approximations. Under more restrictive assumptions, we also prove that this common policy is a stationary point of an approximation of the original problem. Numerical results on multitask extensions of common continuous control benchmarks demonstrate that Diff-DAC stabilises learning and has a regularising effect that induces higher performance and better generalisation properties than previous architectures.
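The diffusion idea in the abstract, where each agent updates on its own data and then exchanges parameters only with its neighbours, can be illustrated with a minimal sketch. This is not the Diff-DAC algorithm itself: the function name `diffusion_step`, the plain gradient step, and the uniform averaging weights are all assumptions made for illustration.

```python
def diffusion_step(params, grads, neighbours, step_size=0.1):
    """One adapt-then-combine round over a network of agents (illustrative).

    params:     list of parameter vectors (lists of floats), one per agent
    grads:      list of local gradient vectors, one per agent
    neighbours: neighbours[k] lists the agents that agent k hears from,
                including k itself (hypothetical network description)
    """
    # Adapt: each agent takes a gradient step using only its local data.
    adapted = [
        [w - step_size * g for w, g in zip(params[k], grads[k])]
        for k in range(len(params))
    ]
    # Combine: each agent averages the adapted parameters of its neighbourhood.
    # Uniform weights are an assumption; other combination rules are possible.
    combined = []
    for k in range(len(params)):
        group = neighbours[k]
        dim = len(adapted[k])
        combined.append(
            [sum(adapted[j][i] for j in group) / len(group) for i in range(dim)]
        )
    return combined


# Ring of four agents; with zero gradients only the combine step acts,
# so repeated rounds pull every agent towards the network average.
ring = [[3, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 0]]
state = [[0.0], [1.0], [2.0], [3.0]]
zero_grads = [[0.0]] * 4
for _ in range(50):
    state = diffusion_step(state, zero_grads, ring)
```

Each agent touches only the entries listed in its own neighbourhood, so the per-agent cost grows with the number of neighbours rather than with the size of the network.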


Optimising Performance for NB-IoT UE Devices through Data Driven Models

Journal of Sensor and Actuator Networks ◽

10.3390/jsan10010021 ◽

2021 ◽

Vol 10 (1) ◽

pp. 21

Author(s):

Omar Nassef ◽

Toktam Mahmoodi ◽

Foivos Michelinakis ◽

Kashif Mahmood ◽

Ahmed Elmokashfi

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Gradient Descent ◽

Deep Neural Network ◽

Narrow Band ◽

Learning Algorithm ◽

Base Station ◽

User Equipment ◽

Data Driven ◽

Superior Performance

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered from a Configuration Advocate to improve energy consumption, delay, throughput, or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and a genetic algorithm is adopted synchronously with machine and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the Deep Neural Network in predicting intermediary environmental states. They also show that the genetic reinforcement learning algorithm achieves superior performance optimisation.


Incentive-based demand response for smart grid with reinforcement learning and deep neural network

Applied Energy ◽

10.1016/j.apenergy.2018.12.061 ◽

2019 ◽

Vol 236 ◽

pp. 937-949 ◽

Cited By ~ 60

Author(s):

Renzhi Lu ◽

Seung Ho Hong

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Smart Grid ◽

Demand Response ◽

Deep Neural Network


Application of deep neural network and deep reinforcement learning in wireless communication

PLoS ONE ◽

10.1371/journal.pone.0235447 ◽

2020 ◽

Vol 15 (7) ◽

pp. e0235447 ◽

Cited By ~ 2

Author(s):

Ming Li ◽

Hui Li

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Wireless Communication ◽

Deep Neural Network


Share Market Prediction using Deep Neural Network

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c6447.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 8619-8622

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Reinforcement Learning ◽

Stock Market ◽

Deep Neural Network ◽

Learning Technology ◽

Financial Investment ◽

The Future ◽

Future Prediction

Because of the market's complexity and volatility, people constantly face challenges in understanding the state of the share market and forecasting its future. The stock market is a very important aspect of any financial investment, so it is necessary to study and understand its price fluctuations. This paper describes a stock market prediction model that uses the Recurrent Digital natural Network (RDNN). The model is designed using three important machine learning concepts: the recurrent neural network (RNN), the multilayer perceptron (MLP), and reinforcement learning (RL). Deep learning is used to automatically extract important features of the stock market, and reinforcement learning on these features supports future prediction of the stock market. The system uses historical stock market data to understand dynamic market behavior when making decisions in an unknown environment. The paper describes both this understanding of the dynamic stock market and the deep learning technology used to predict future stock prices.


Image Classification Using Reinforcement Learning

Russian Digital Libraries Journal ◽

10.26907/1562-5419-2020-23-6-1172-1191 ◽

2020 ◽

Vol 23 (6) ◽

pp. 1172-1191

Author(s):

Artem Aleksandrovich Elizarov ◽

Evgenii Viktorovich Razinkov

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Machine Learning ◽

Computer Vision ◽

Reinforcement Learning ◽

Image Classification ◽

Deep Neural Network ◽

Learning Algorithms ◽

Further Development

Reinforcement learning has recently been developing actively as a branch of machine learning. As a consequence, attempts are being made to use reinforcement learning for solving computer vision problems, in particular image classification. Computer vision tasks are currently among the most pressing tasks of artificial intelligence. The article proposes a method for image classification in the form of a deep neural network using reinforcement learning. The idea of the developed method comes down to solving a contextual multi-armed bandit problem using various strategies for balancing exploitation and exploration, together with reinforcement learning algorithms. Strategies such as ε-greedy, ε-softmax, ε-decay-softmax, and the UCB1 method, and reinforcement learning algorithms such as DQN, REINFORCE, and A2C are considered. The influence of various parameters on the efficiency of the method is analysed, and options for further development of the method are proposed.
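Two of the selection strategies named in the abstract, epsilon-greedy and UCB1, can be sketched for a plain (non-contextual) bandit as follows. The function names and the way value estimates and pull counts are passed in are assumptions for illustration; the article's own network-based method is not reproduced here.

```python
import math
import random


def epsilon_greedy(values, epsilon, rng=random):
    """Pick a random arm with probability epsilon, else the best-looking arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))  # explore
    return max(range(len(values)), key=lambda a: values[a])  # exploit


def ucb1(values, counts):
    """Pick the arm with the highest UCB1 score.

    values[a] is the running mean reward of arm a and counts[a] is how many
    times it has been pulled. Arms never pulled are tried first.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(values)),
        key=lambda a: values[a] + math.sqrt(2.0 * math.log(total) / counts[a]),
    )
```

With `epsilon` set to 0 the first function is purely greedy, while UCB1 needs no exploration parameter because its bonus term shrinks as an arm is pulled more often.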


Continuous Control of an Underground Loader Using Deep Reinforcement Learning

Machines ◽

10.3390/machines9100216 ◽

2021 ◽

Vol 9 (10) ◽

pp. 216

Author(s):

Sofi Backman ◽

Daniel Lindmark ◽

Kenneth Bodin ◽

Martin Servin ◽

Joakim Mörk ◽

...

Keyword(s):

Reinforcement Learning ◽

Deep Neural Network ◽

Depth Camera ◽

Continuous Control ◽

Network Approach ◽

Energy Usage ◽

Neural Network Approach ◽

Loading Cycle ◽

Simulated Environment ◽

Multi Agent

The reinforcement learning control of an underground loader was investigated in a simulated environment by using a multi-agent deep neural network approach. At the start of each loading cycle, one agent selects the dig position from a depth camera image of a pile of fragmented rock. A second agent is responsible for continuous control of the vehicle, with the goal of filling the bucket at the selected loading point while avoiding collisions, getting stuck, or losing ground traction. This relies on motion and force sensors, as well as on a camera and lidar. Using a soft actor–critic algorithm, the agents learn policies for efficient bucket filling over many subsequent loading cycles, with a clear ability to adapt to the changing environment. The best results—on average, 75% of the max capacity—were obtained when including a penalty for energy usage in the reward.


A Deep Learning Algorithm for the Max-Cut Problem Based on Pointer Network Structure with Supervised Learning and Reinforcement Learning Strategies

Mathematics ◽

10.3390/math8020298 ◽

2020 ◽

Vol 8 (2) ◽

pp. 298 ◽

Cited By ~ 2

Author(s):

Shenshen Gu ◽

Yue Yang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Reinforcement Learning ◽

Combinatorial Optimization ◽

Supervised Learning ◽

Learning Strategies ◽

Large Scale ◽

Deep Neural Network ◽

Max Cut Problem ◽

Cut Problems

The Max-cut problem is a well-known combinatorial optimization problem with many real-world applications. However, the problem has been proven to be non-deterministic polynomial-hard (NP-hard), which means that exact solution algorithms are not suitable for large-scale instances, as obtaining a solution is too time-consuming. Designing heuristic algorithms is therefore a promising but challenging direction for solving large-scale Max-cut problems effectively. To address this challenge, we propose a method that combines a pointer network with two deep learning strategies (supervised learning and reinforcement learning). A pointer network is a sequence-to-sequence deep neural network, which can extract data features in a purely data-driven way to discover the hidden laws behind data. Drawing on the characteristics of the Max-cut problem, we designed the input and output mechanisms of the pointer network model, trained the model with supervised learning and with reinforcement learning, and evaluated its performance. Our experiments show that the model can be applied effectively to large-scale Max-cut problems. The experimental results also suggest that the new method will encourage broader exploration of deep neural networks for large-scale combinatorial optimization problems.
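To make the objective concrete, the sketch below evaluates the value of a cut and improves it with a simple one-flip local search. This is a basic heuristic chosen for illustration, not the pointer-network method of the paper; the edge-list format and the function names are assumptions.

```python
def cut_value(edges, side):
    """Total weight of edges whose endpoints lie on opposite sides.

    edges: list of (u, v, weight) tuples; side[u] is 0 or 1 for each node.
    """
    return sum(w for u, v, w in edges if side[u] != side[v])


def local_search(edges, n):
    """Flip one node at a time while doing so strictly increases the cut."""
    side = [0] * n
    best = cut_value(edges, side)
    improved = True
    while improved:
        improved = False
        for node in range(n):
            side[node] ^= 1  # tentatively move the node to the other side
            value = cut_value(edges, side)
            if value > best:
                best = value
                improved = True
            else:
                side[node] ^= 1  # undo the flip
    return side, best


# Unit-weight triangle: at most two of the three edges can be cut.
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
partition, value = local_search(triangle, 3)
```

Local search of this kind stops at a local optimum, which is one reason learned heuristics are explored for large instances.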
