optimal action
Recently Published Documents


TOTAL DOCUMENTS

81
(FIVE YEARS 39)

H-INDEX

10
(FIVE YEARS 2)

2021 ◽  
Vol 11 (23) ◽  
pp. 11162
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

A Deep Q-Network (DQN) controls a virtual agent at the level of a player using only screenshots as inputs. Replay memory selects a limited number of experience replays according to an arbitrary batch size and updates them using the associated Q-function. Hence, relatively few experience replays of different states are utilized when the number of states is fixed and the states of the randomly selected transitions become identical or similar. The DQN may not be applicable in environments where the learning process must use more experience replays than the limited batch size allows. In addition, because it is unknown whether each action can be executed, the amount of repetitive learning increases as more non-executable actions are selected. In this study, an enhanced DQN framework is proposed to resolve the batch size problem and to reduce the learning time of a DQN in an environment with numerous non-executable actions. In the proposed framework, non-executable actions are filtered out to reduce the number of selectable actions when identifying the optimal action for the current state. The proposed method was validated in Gomoku, a strategy board game in which applying a traditional DQN would be difficult.
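The filtering step described above can be illustrated with a short, hypothetical sketch in which non-executable actions (e.g. already occupied Gomoku cells) are masked out before the greedy action is taken from the Q-values. The board encoding, the mask, and the epsilon-greedy wrapper are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(q_values, executable_mask, epsilon=0.1):
    """Epsilon-greedy selection restricted to executable actions.

    q_values:        1-D array of Q-values, one per action (e.g. per board cell).
    executable_mask: boolean array, True where the action can actually be executed
                     (e.g. the Gomoku cell is still empty).
    """
    if rng.random() < epsilon:
        # Explore only among executable actions.
        return int(rng.choice(np.flatnonzero(executable_mask)))
    masked_q = np.where(executable_mask, q_values, -np.inf)  # filter out invalid actions
    return int(np.argmax(masked_q))

# Toy usage on a 3x3 board flattened to 9 actions; cells 0 and 4 are already occupied.
q = rng.normal(size=9)
mask = np.ones(9, dtype=bool)
mask[[0, 4]] = False
print(select_action(q, mask))
```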


2021 ◽  
Vol 50 (3) ◽  
pp. 507-521
Author(s):  
Atif Mehmood ◽  
Inam ul Hasan Shaikh ◽  
Ahsan Ali

Deep reinforcement learning is a fast-growing technique for solving complex real-world problems within a simple mathematical framework. It involves an agent, actions, an environment, and a reward: the agent interacts with the environment and takes an optimal action aiming to maximize the total reward. This paper proposes the deep deterministic policy gradient technique for handling the complex continuous action space of three-wheeled omnidirectional mobile robots. Trajectory tracking for three-wheeled omnidirectional mobile robots is a difficult task because the orientation of the wheels makes the robot rotate around its own axis rather than follow the trajectory. A deep deterministic policy gradient (DDPG) algorithm has been designed to train in environments with a continuous action space to follow the trajectory, by training the neural networks defined for the policy and value function to maximize the reward function defined for trajectory tracking. The DDPG agent and environment are created in the Reinforcement Learning Toolbox in MATLAB 2019, while the Deep Network Designer is used for the actor and critic network design. Results illustrate the effectiveness of the technique, with the tracking error converging approximately to zero.
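The paper implements the agent with MATLAB's Reinforcement Learning Toolbox; as a language-neutral illustration of what a DDPG update involves, the sketch below shows the critic regression target, the deterministic policy-gradient actor loss, and the soft target-network updates in PyTorch. The state and action dimensions, network sizes, and hyperparameters are placeholders, not values from the paper.

```python
import copy
import torch
import torch.nn as nn

# Placeholder dimensions: state = pose/tracking errors, action = three wheel velocities.
STATE_DIM, ACTION_DIM, GAMMA, TAU = 6, 3, 0.99, 0.005

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One DDPG step on a batch of transitions (all arguments are 2-D tensors)."""
    with torch.no_grad():  # bootstrapped critic target
        y = r + GAMMA * (1 - done) * target_critic(torch.cat([s2, target_actor(s2)], dim=1))
    critic_loss = nn.functional.mse_loss(critic(torch.cat([s, a], dim=1)), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()  # maximize Q(s, pi(s))
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for target, online in ((target_actor, actor), (target_critic, critic)):
        for t, p in zip(target.parameters(), online.parameters()):
            t.data.mul_(1 - TAU).add_(TAU * p.data)  # soft target update
```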


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6187
Author(s):  
Yeonggul Jang ◽  
Byunghwan Jeon

Accurate identification of the coronary ostia from 3D coronary computed tomography angiography (CCTA) is an essential prerequisite for automatically tracking and segmenting the three main coronary arteries. In this paper, we propose a novel deep reinforcement learning (DRL) framework to localize the two coronary ostia from 3D CCTA. An optimal action policy is determined using a fully explicit spatial-sequential encoding policy network applied to 2.5D Markovian states with three past histories. The proposed network is trained using a dueling DRL framework on the CAT08 dataset. The experimental results show that our method is more efficient and accurate than the other methods. Floating-point operations (FLOPs) are calculated to measure computational efficiency; the proposed method requires 2.5M FLOPs, about 10 times fewer than 3D box-based methods. In terms of accuracy, the proposed method yields errors of 2.22 ± 1.12 mm and 1.94 ± 0.83 mm on the left and right coronary ostia, respectively. The proposed method can be applied to tasks that identify other target objects by changing the target locations in the ground-truth data. Further, it can be utilized as a pre-processing step for coronary artery tracking methods.
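The abstract mentions training in a dueling DRL framework; the snippet below is a generic dueling Q-network head in PyTorch, not the authors' architecture. The six discrete actions (one-voxel moves along ±x, ±y, ±z) and the feature dimension are assumptions for illustration, and the encoder over the 2.5D state is omitted.

```python
import torch
import torch.nn as nn

class DuelingQHead(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, feat_dim=128, n_actions=6):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, features):
        v = self.value(features)                    # state value, shape (B, 1)
        a = self.advantage(features)                # advantages, shape (B, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)  # identifiable Q-values

# Usage: `features` would come from an encoder over the 2.5D state
# (orthogonal CCTA slices plus past history), which is not shown here.
q = DuelingQHead()(torch.randn(4, 128))
print(q.shape)  # torch.Size([4, 6])
```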


2021 ◽  
Vol 2 ◽  
Author(s):  
Ashlesha Akella ◽  
Chin-Teng Lin

In formation control, a robot (or an agent) learns to align itself in a particular spatial alignment. However, in some scenarios, it is also vital to learn temporal alignment along with spatial alignment. An effective control system encompasses flexibility, precision, and timeliness. Existing reinforcement learning algorithms excel at learning to select an action given a state; however, executing an optimal action at an appropriate time remains challenging. Building a reinforcement learning agent that can learn an optimal time to act along with an optimal action can address this challenge. Neural networks in which timing relies on dynamic changes in the activity of a population of neurons have been shown to be a more effective representation of time. In this work, we trained a reinforcement learning agent to create its own representation of time using a neural network with a population of recurrently connected nonlinear firing-rate neurons. Trained using a reward-based recursive least-squares algorithm, the agent learned to produce a neural trajectory that peaks at the “time-to-act”; thus, it learns “when” to act. A few control system applications also require the agent to temporally scale its action, so we trained the agent to temporally scale its action for different speed inputs. Furthermore, given one state, the agent could learn to plan multiple future actions, that is, multiple times to act, without needing to observe a new state.
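As a rough illustration of the kind of recurrent firing-rate dynamics the abstract describes, the sketch below Euler-integrates a standard rate model whose linear readout would, after training, peak at the desired time-to-act. The equations, random weights, and speed input are generic assumptions; this is not the paper's trained network or its reward-based recursive least-squares rule.

```python
import numpy as np

def simulate_firing_rate_network(T=200, N=100, dt=1.0, tau=10.0, speed_input=1.0, seed=0):
    """Euler simulation of a recurrent firing-rate network (illustrative only).

    Dynamics of a standard rate model:
        tau * dx/dt = -x + W @ r + w_in * speed_input,  r = tanh(x)
    A trained readout w_out would make the output z(t) peak at the time-to-act;
    here W, w_in, and w_out are random placeholders.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0, 1.5 / np.sqrt(N), (N, N))   # recurrent weights
    w_in = rng.normal(0, 1.0, N)                  # input weights (scaled by speed input)
    w_out = rng.normal(0, 1.0 / np.sqrt(N), N)    # linear readout
    x = rng.normal(0, 0.5, N)
    z = np.empty(T)
    for t in range(T):
        r = np.tanh(x)
        x = x + (dt / tau) * (-x + W @ r + w_in * speed_input)
        z[t] = w_out @ r                          # readout: the "neural trajectory"
    return z

print(simulate_firing_rate_network()[:5])
```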


Author(s):  
Zack Fitzsimmons ◽  
Edith Hemaspaandra

The computational study of election problems generally focuses on questions related to the winner or set of winners of an election. But social preference functions such as the Kemeny rule output a full ranking of the candidates (a consensus). We study the complexity of consensus-related questions, with a particular focus on Kemeny and its qualitative version, Slater. The simplest of these questions is the problem of determining whether a ranking is a consensus, and we show that this problem is coNP-complete. We also study the natural question of the complexity of manipulative actions that have a specific consensus as a goal. Though determining whether a ranking is a Kemeny consensus is hard, the optimal action for manipulators is simply to vote their desired consensus. We provide evidence that this simplicity is caused by the combination of election system (Kemeny), manipulative action (manipulation), and manipulative goal (consensus). In the process, we provide the first completeness results at the second level of the polynomial hierarchy for electoral manipulation and for optimal solution recognition.
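For readers unfamiliar with the Kemeny rule, the short sketch below computes the Kemeny score of a ranking (its total pairwise disagreement with the votes) and brute-forces a consensus for a toy profile. Scoring one ranking is easy; certifying that no ranking scores lower is what the hardness result above concerns. The example profile is purely illustrative.

```python
from itertools import combinations, permutations

def kemeny_score(ranking, votes):
    """Total pairwise disagreement between one ranking and a profile of votes.

    ranking, votes: sequences of candidates, most preferred first.
    A Kemeny consensus is a ranking minimizing this score.
    """
    pos = {c: i for i, c in enumerate(ranking)}
    score = 0
    for vote in votes:
        vpos = {c: i for i, c in enumerate(vote)}
        for a, b in combinations(ranking, 2):
            # Count the pair as a disagreement if the two orders differ.
            if (pos[a] < pos[b]) != (vpos[a] < vpos[b]):
                score += 1
    return score

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
# Brute-force consensus for a tiny profile (exponential in the number of candidates).
best = min(permutations("abc"), key=lambda r: kemeny_score(r, votes))
print(best, kemeny_score(best, votes))
```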


2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Rongyao Yuan ◽  
Yang Yang ◽  
Chao Su ◽  
Shaopei Hu ◽  
Heng Zhang ◽  
...  

Magnetorheological (MR) dampers are intelligent vibration-damping devices that can change the damping of the MR material within milliseconds. Traditional semiactive control strategies cannot fully exploit the ability of MR dampers to dissipate energy and reduce vibration under different currents, and it is difficult to control the MR dampers accurately. In this paper, a semiactive control strategy based on reinforcement learning (RL) is proposed, which relies on exploration to learn the optimal action value of the MR dampers, the applied current, at each step of the operation. During damping control, the learned optimal action value for each step is input into the MR dampers so that they provide the optimal damping force to the structure. Applying this strategy to a two-story frame structure was found to provide more accurate control of the MR dampers, significantly improving their damping effect.
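As a rough sketch of what such an RL-based semiactive strategy might look like, the code below runs tabular Q-learning over a discretized set of candidate currents. The state discretization, the current levels, and the reward (penalizing structural response) are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

# Hypothetical discretization: structural-response states x candidate currents (amperes).
N_STATES = 50
CURRENTS = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = np.zeros((N_STATES, len(CURRENTS)))
rng = np.random.default_rng(0)

def choose_current(state):
    """Epsilon-greedy choice of the current applied to the MR damper."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(CURRENTS)))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    """Standard Q-learning update. The reward would penalize structural response
    (e.g. displacement and acceleration of the frame); that choice is an assumption."""
    td_target = reward + GAMMA * np.max(Q[next_state])
    Q[state, action] += ALPHA * (td_target - Q[state, action])

# Toy usage: one interaction step.
a = choose_current(state=3)
update(state=3, action=a, reward=-0.2, next_state=7)
print(CURRENTS[a], Q[3, a])
```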


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) to help UAVs choose the correct action in each state according to the policy. In an unknown environment, formulating rules to help UAVs choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work, we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV has a greater probability of choosing the optimal action under the policy learned by the proposed algorithm than under the classic Q-learning algorithm in the agricultural plant protection environment. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
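The similar-state-matching idea can be sketched as follows: when the agent encounters a state, it reuses the Q-values of the most similar previously visited state instead of starting from scratch. The state features, the Euclidean similarity metric, and the threshold below are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def most_similar_state(state, known_states, threshold=1.0):
    """Index of the closest previously visited state, or None if none is close enough."""
    if not known_states:
        return None
    dists = [np.linalg.norm(np.asarray(state) - s) for s in known_states]
    i = int(np.argmin(dists))
    return i if dists[i] <= threshold else None

class SimilarStateQLearner:
    def __init__(self, n_actions, alpha=0.1, gamma=0.9):
        self.states, self.Q = [], []   # parallel lists: state vector -> row of Q-values
        self.n_actions, self.alpha, self.gamma = n_actions, alpha, gamma

    def _index(self, state):
        i = most_similar_state(state, self.states)
        if i is None:                  # unseen region: start a new entry,
            self.states.append(np.asarray(state, float))
            self.Q.append(np.zeros(self.n_actions))
            i = len(self.states) - 1
        return i                       # otherwise reuse the similar state's Q-values

    def act(self, state):
        return int(np.argmax(self.Q[self._index(state)]))

    def update(self, s, a, r, s2):
        i, j = self._index(s), self._index(s2)
        self.Q[i][a] += self.alpha * (r + self.gamma * self.Q[j].max() - self.Q[i][a])

# Toy usage with 2-D state features (e.g. normalized UAV position).
agent = SimilarStateQLearner(n_actions=4)
a = agent.act([0.2, 0.5])
agent.update([0.2, 0.5], a, r=1.0, s2=[0.25, 0.5])
```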


2021 ◽  
Vol 13 (2) ◽  
pp. 57-80
Author(s):  
Arunita Kundaliya ◽  
D.K. Lobiyal

In resource-constrained Wireless Sensor Networks (WSNs), enhancing the network lifetime has been one of the most significant challenges for researchers. Researchers have been exploiting machine learning techniques, in particular reinforcement learning, to achieve efficient solutions in the WSN domain. The objective of this paper is to apply Q-learning, a reinforcement learning technique, to enhance the lifetime of the network by developing distributed routing protocols. Q-learning is an attractive choice for routing due to its low computational requirements and low additional memory demands. To enable the agent running at each node to take an optimal action, the approach considers a node's residual energy, hop length to the sink, and transmission power. The residual energy and hop length parameters are used to calculate the Q-value, which in turn is used to decide the optimal next hop for routing. The proposed protocols' performance is evaluated through NS3 simulations and compared with the AODV protocol in terms of network lifetime, throughput, and end-to-end delay.
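A minimal sketch of how such a Q-value-based next-hop choice might work is given below: the reward combines a neighbour's normalized residual energy and its hop distance to the sink, and a standard Q-learning update ranks the neighbours. The weighting, normalization, and toy topology are assumptions, not the paper's exact formulation.

```python
def reward(residual_energy, max_energy, hop_length, max_hops, w_e=0.6, w_h=0.4):
    """Composite reward favouring neighbours with more remaining energy and
    fewer hops to the sink. The weights and normalization are assumptions."""
    return w_e * (residual_energy / max_energy) + w_h * (1.0 - hop_length / max_hops)

def update_q(q_table, node, neighbor, r, alpha=0.1, gamma=0.9):
    """Q-learning update for choosing `neighbor` as the next hop from `node`.

    q_table: dict mapping node -> dict of neighbor -> Q-value.
    The bootstrapped term uses the best Q-value known at the neighbor."""
    best_next = max(q_table.get(neighbor, {}).values(), default=0.0)
    q = q_table.setdefault(node, {}).get(neighbor, 0.0)
    q_table[node][neighbor] = q + alpha * (r + gamma * best_next - q)
    return q_table[node][neighbor]

def choose_next_hop(q_table, node):
    """Greedy routing decision: the neighbour with the highest learned Q-value."""
    return max(q_table[node], key=q_table[node].get)

# Toy usage: node 1 has neighbours 2 and 3.
q = {1: {2: 0.0, 3: 0.0}}
update_q(q, 1, 2, reward(0.8, 1.0, 2, 10))
update_q(q, 1, 3, reward(0.3, 1.0, 1, 10))
print(choose_next_hop(q, 1))
```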

