Reinforcement learning for optimal path length of nanobots using dynamic programming

Author(s):  
Amruta Lambe


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiaojun Zhu ◽  
Yinghao Liang ◽  
Hanxu Sun ◽  
Xueqian Wang ◽  
Bin Ren

Purpose: Most manufacturing plants choose the easy option of completely separating human operators from robots to prevent accidents, but this dramatically limits the quality and speed expected from human–robot collaboration. Ensuring human safety once a person has entered a robot's workspace is not an easy task, and the unstructured nature of such working environments makes it even harder. The purpose of this paper is to propose a real-time robot collision avoidance method to alleviate this problem.

Design/methodology/approach: A model is trained to learn direct control commands from raw depth images through a self-supervised reinforcement learning algorithm. To reduce sample inefficiency and improve safety during initial training, a virtual reality platform is used to simulate a natural working environment and generate obstacle avoidance data for training. To ensure a smooth transfer to a real robot, an automatic domain randomization technique generates randomly distributed environmental parameters during the obstacle avoidance simulation of virtual robots in the virtual environment, contributing to better performance in the real environment.

Findings: The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot safety-aware and learn how to divert its trajectory to avoid accidents with humans within the workspace.

Research limitations/implications: The method has been tested both in simulation and with a real UR3 robot in several practical applications. The results indicate that the proposed approach can effectively make the robot aware of safety and learn how to change its trajectory to avoid accidents with persons within the workspace.

Originality/value: This paper provides a novel collision avoidance framework that allows robots to work alongside human operators in unstructured and complex environments. The method uses end-to-end policy training to extract the optimal path directly from the visual inputs for the scene.
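The automatic domain randomization step is described only at a high level in the abstract. As a rough, hedged illustration, the sketch below samples simulation parameters from ranges that widen as training succeeds; the parameter names (light_intensity, obstacle_speed, camera_noise_std), their ranges, and the widening rule are assumptions made for illustration, not details taken from the paper.

```python
import random

# Hypothetical automatic domain randomization sketch: each environment
# parameter starts near a nominal range, and the range widens whenever
# the policy performs well enough, so simulated scenes gradually cover
# more of the variation expected in the real workspace.
PARAM_BOUNDS = {
    "light_intensity":  (0.8, 1.2),   # nominal range (assumed units)
    "obstacle_speed":   (0.0, 0.5),   # m/s (assumed)
    "camera_noise_std": (0.0, 0.02),  # depth-image noise level (assumed)
}

def sample_environment(widen: float) -> dict:
    """Sample one randomized simulation environment.

    widen in [0, 1] scales how far beyond the nominal range we sample.
    """
    params = {}
    for name, (lo, hi) in PARAM_BOUNDS.items():
        span = (hi - lo) * widen
        params[name] = random.uniform(lo - span, hi + span)
    return params

# Example: widen the ranges slowly as long as episodes keep succeeding.
widen = 0.0
for episode in range(1000):
    env_params = sample_environment(widen)
    # success = run_obstacle_avoidance_episode(env_params)  # provided by the simulator (hypothetical)
    success = True  # placeholder so the sketch runs standalone
    if success and widen < 1.0:
        widen += 0.001
```

In the paper's pipeline, environments randomized in this spirit are generated inside the virtual reality simulator before the learned policy is transferred to the real robot.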


2021 ◽  
Author(s):  
Yunfan Su

Vehicular ad hoc network (VANET) is a promising technique that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to fully exploit the performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that the reinforcement learning method achieves performance similar to that of the dynamic programming method, while both outperform the greedy method.
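Relative value iteration is a standard dynamic programming routine for average-reward Markov decision processes. The sketch below is a minimal, generic implementation on a toy MDP with known transition probabilities and rewards; the matrices, state/action sizes, and reference state are invented for illustration and are not the VANET channel allocation model from the thesis.

```python
import numpy as np

def relative_value_iteration(P, R, ref_state=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration for an average-reward MDP.

    P: transition probabilities, shape (A, S, S); R: rewards, shape (S, A).
    Returns the estimated optimal average reward (gain) and a greedy policy.
    """
    A, S, _ = P.shape
    h = np.zeros(S)  # relative value (bias) function
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = R[s, a] + sum over s' of P[a, s, s'] * h[s']
        Q = R + np.einsum("asx,x->as", P, h).T
        h_new = Q.max(axis=1)
        gain = h_new[ref_state]     # value at the reference state estimates the gain
        h_new = h_new - gain        # keep the iterates bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmax(axis=1)
    return gain, policy

# Toy example (invented numbers): 2 states, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # transitions under action 0
              [[0.5, 0.5], [0.6, 0.4]]])   # transitions under action 1
R = np.array([[1.0, 0.0],    # rewards in state 0 for actions 0 and 1
              [2.0, 0.5]])   # rewards in state 1 for actions 0 and 1
gain, policy = relative_value_iteration(P, R)
print(gain, policy)
```

The model-free method mentioned in the abstract would replace the known P and R with quantities estimated from sampled interactions, updating an action-value function in the same spirit.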


2016 ◽  
Vol 78 (6-6) ◽  
Author(s):  
R. N. Farah ◽  
Amira Shahirah ◽  
N. Irwan ◽  
R. L. Zuraida

The challenging part of path planning for an Unmanned Ground Vehicle (UGV) is conducting reactive navigation. Reactive navigation is implemented on a sensor-based UGV: the UGV characterizes its environment by collecting sensor information and uses it to construct its path plan. The UGV in this research is known as the Mobile Guard UGV-Truck for Surveillance (MG-TruckS). The Modified Virtual Semi Circle (MVSC) method helps the MG-TruckS reach its predetermined goal point successfully without any collision. MVSC is divided into two phases, an obstacles detection phase and an obstacles avoidance phase, to compute an optimal path plan. MVSC produces a shorter path length and smoother velocity, and reaches the predetermined goal point successfully.
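The abstract only names the two MVSC phases, so the sketch below is a generic illustration of that structure: a detection phase that keeps obstacles inside a forward-facing semicircular sensing region, followed by an avoidance phase that steers toward a clear bearing. The radius, angular resolution, and clearance threshold are assumptions for illustration, not the published MVSC formulation.

```python
import math

def detect_obstacles(obstacles, pose, radius=5.0):
    """Detection phase (assumed form): keep obstacles inside a forward
    semicircle of the given radius around the vehicle pose (x, y, heading)."""
    x, y, heading = pose
    detected = []
    for ox, oy in obstacles:
        dx, dy = ox - x, oy - y
        dist = math.hypot(dx, dy)
        bearing = math.atan2(dy, dx) - heading
        bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
        if dist <= radius and abs(bearing) <= math.pi / 2:
            detected.append((dist, bearing))
    return detected

def choose_heading(detected, goal_bearing, clearance=0.4):
    """Avoidance phase (assumed form): steer at the goal bearing if it is
    clear, otherwise pick the nearest obstacle-free bearing in the semicircle."""
    candidates = [goal_bearing] + [math.radians(10 * i) for i in range(-9, 10)]
    for b in sorted(candidates, key=lambda c: abs(c - goal_bearing)):
        if all(abs(b - ob) > clearance for _, ob in detected):
            return b
    return goal_bearing  # no clear direction found; fall back to the goal

# Example: one obstacle directly ahead forces a small turn away from it.
pose = (0.0, 0.0, 0.0)
obstacles = [(2.0, 0.0)]
print(choose_heading(detect_obstacles(obstacles, pose), goal_bearing=0.0))
```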

