Learning Reward Function with Matching Network for Mapless Navigation

Sensors
2020
Vol 20 (13)
pp. 3664
Author(s):  
Qichen Zhang ◽  
Meiqiang Zhu ◽  
Liang Zou ◽  
Ming Li ◽  
Yong Zhang

Deep reinforcement learning (DRL) has been successfully applied to mapless navigation. An important issue in DRL is designing a reward function for evaluating the actions of agents. However, designing a robust and suitable reward function depends greatly on the designer's experience and intuition. To address this concern, we consider employing reward shaping from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on a matching network (MN). The MN-based reward function gains experience by pre-training on trajectories from different navigation tasks and accelerates the training of DRL on new tasks. The proposed reward function leaves the optimal policy of DRL unchanged. Simulation results on two static maps show that, with the learned reward function, DRL converges in fewer iterations than with state-of-the-art mapless navigation methods. The proposed method also performs well on dynamic maps with partially moving obstacles. Even when the test maps differ from the training maps, the proposed strategy completes the navigation tasks without additional training.
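The property that the shaping leaves the optimal policy unchanged is the hallmark of potential-based reward shaping (Ng, Harada & Russell, 1999). Below is a minimal sketch of that mechanism; the `mn_potential` stub is a hypothetical stand-in for the pre-trained matching network, not the paper's model:

```python
import numpy as np

GAMMA = 0.99  # discount factor of the underlying DRL task

def mn_potential(state):
    """Hypothetical stand-in for the matching-network score Phi(s).

    In the paper this value would come from a matching network
    pre-trained on trajectories from similar navigation tasks; here a
    toy distance-to-goal potential keeps the sketch runnable.
    """
    goal = np.array([1.0, 1.0])
    return -np.linalg.norm(np.asarray(state, dtype=float) - goal)

def shaped_reward(state, next_state, env_reward):
    """Potential-based shaping: F = gamma * Phi(s') - Phi(s).

    Adding F to the environment reward provably leaves the optimal
    policy unchanged.
    """
    return env_reward + GAMMA * mn_potential(next_state) - mn_potential(state)
```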

Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
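For readers unfamiliar with reward machines, the sketch below is a minimal, hypothetical Python encoding of one: a finite-state machine whose transitions fire on high-level propositions and emit rewards. It is illustrative only, not the authors' implementation:

```python
class RewardMachine:
    """Minimal reward machine: states, labeled transitions, rewards.

    `delta` maps (rm_state, proposition) -> (next_rm_state, reward);
    unlisted propositions leave the RM state unchanged with 0 reward.
    """
    def __init__(self, initial, delta, terminal):
        self.state = initial
        self.delta = delta
        self.terminal = terminal

    def step(self, proposition):
        next_state, reward = self.delta.get(
            (self.state, proposition), (self.state, 0.0))
        self.state = next_state
        return reward, self.state in self.terminal

# "Get coffee, then deliver it to the office": u0 -coffee-> u1 -office-> u2
rm = RewardMachine(
    initial="u0",
    delta={("u0", "coffee"): ("u1", 0.0),
           ("u1", "office"): ("u2", 1.0)},
    terminal={"u2"},
)
reward, done = rm.step("coffee")   # 0.0, False
reward, done = rm.step("office")   # 1.0, True
```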


Author(s):  
Ziming Li ◽  
Julia Kiseleva ◽  
Maarten De Rijke

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsensical replies. To alleviate the first problem, we extend a recently proposed adversarial dialogue generation method into an adversarial imitation learning solution. Then, within the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that provides a more accurate reward signal for generator training. We evaluate the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model generates higher-quality responses and achieves better overall performance than the state of the art.
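As context for the reward model, one common way adversarial inverse reinforcement learning derives a denser signal from a discriminator is the form r = log D − log(1 − D); whether the paper uses exactly this form is an assumption, so the sketch below is illustrative only:

```python
import torch

def airl_reward(discriminator_logit):
    """AIRL-style reward from a discriminator logit f(s, a).

    With D = sigmoid(f), log D - log(1 - D) recovers f, which is a
    denser and more stable training signal than the raw probability
    used in plain GAN-style adversarial dialogue generation.
    """
    d = torch.sigmoid(discriminator_logit)
    return torch.log(d + 1e-8) - torch.log(1.0 - d + 1e-8)

reward = airl_reward(torch.tensor(0.7))  # positive: reply looks human-like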


2021
Vol 103 (4)
Author(s):  
Bartomeu Rubí ◽  
Bernardo Morcego ◽  
Ramon Pérez

A deep reinforcement learning approach for solving the quadrotor path following and obstacle avoidance problem is proposed in this paper. The problem is solved with two agents: one for the path following task and another for the obstacle avoidance task. A novel structure is proposed, in which the action computed by the obstacle avoidance agent becomes the state of the path following agent. Compared to traditional deep reinforcement learning approaches, the proposed method makes the training process outcomes interpretable, is faster, and can be safely trained on the real quadrotor. Both agents implement the Deep Deterministic Policy Gradient (DDPG) algorithm. The path following agent was developed in a previous work. The obstacle avoidance agent uses the information provided by a low-cost LIDAR to detect obstacles around the vehicle. Since the LIDAR has a narrow field of view, an approach for providing the agent with a memory of previously seen obstacles is developed. A detailed description of the process of defining the state vector, the reward function, and the action of this agent is given. The agents are programmed in Python/TensorFlow and are trained and tested on the RotorS/Gazebo platform. Simulation results prove the validity of the proposed approach.
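The distinctive structural choice, feeding the obstacle avoidance (OA) agent's action into the path following (PF) agent's state, can be sketched as below; the stub agent and helper names (`StubAgent`, `build_oa_state`, `build_pf_state`) are hypothetical stand-ins for the trained DDPG policies, not the authors' code:

```python
import numpy as np

class StubAgent:
    """Hypothetical stand-in for a trained DDPG policy."""
    def __init__(self, action_dim):
        self.action_dim = action_dim
    def act(self, state):
        return np.zeros(self.action_dim)  # a real agent returns pi(state)

def build_oa_state(lidar_scan, obstacle_memory):
    # The OA state joins the current scan with a memory of previously
    # seen obstacles, compensating for the LIDAR's narrow field of view.
    return np.concatenate([lidar_scan, obstacle_memory])

def build_pf_state(path_error, avoidance_action):
    # The OA agent's action is appended to the PF agent's state.
    return np.concatenate([path_error, np.atleast_1d(avoidance_action)])

def control_step(oa_agent, pf_agent, lidar_scan, obstacle_memory, path_error):
    avoidance_action = oa_agent.act(build_oa_state(lidar_scan, obstacle_memory))
    return pf_agent.act(build_pf_state(path_error, avoidance_action))

cmd = control_step(StubAgent(1), StubAgent(4),
                   lidar_scan=np.ones(16), obstacle_memory=np.zeros(8),
                   path_error=np.zeros(3))
```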


Author(s):  
Qiming Fu ◽  
Quan Liu ◽  
Shan Zhong ◽  
Heng Luo ◽  
Hongjie Wu ◽  
...  

In reinforcement learning (RL), the exploration/exploitation (E/E) dilemma is a crucial issue: it can be described as the trade-off between exploring the environment to find more profitable actions and exploiting the best empirical action for the current state. We focus on the single-trajectory RL problem, where an agent interacts with a partially unknown MDP over a single trajectory, and address the E/E dilemma in this setting. Given the reward function, we seek a good E/E strategy for MDPs drawn from some MDP distribution. This is achieved by selecting, from a large set of candidates, the strategy with the best mean performance over a potential MDP distribution, which is done by exploiting single trajectories drawn from many MDPs. In this paper, we make the following contributions: (1) we discuss the strategy-selector algorithm based on formula sets and polynomial functions; (2) we provide theoretical and experimental regret analyses of the learned strategy under a given MDP distribution; and (3) we compare these methods experimentally with a state-of-the-art Bayesian RL method.
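The core selection step, scoring each candidate E/E strategy by its mean single-trajectory return over MDPs sampled from the assumed distribution, might look like the following sketch; the `sample_mdp` and `run_trajectory` callables are hypothetical hooks, not the paper's code:

```python
def select_strategy(candidate_strategies, sample_mdp, run_trajectory, n_mdps=100):
    """Score each candidate E/E strategy by its mean single-trajectory
    return over MDPs drawn from the assumed distribution, and keep the
    best one. `sample_mdp` and `run_trajectory` are hypothetical hooks.
    """
    mdps = [sample_mdp() for _ in range(n_mdps)]

    def mean_return(strategy):
        return sum(run_trajectory(strategy, mdp) for mdp in mdps) / n_mdps

    return max(candidate_strategies, key=mean_return)
```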


Author(s):  
Xiangteng He ◽  
Yuxin Peng ◽  
Junjie Zhao

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the subtle visual distinctions between them. There are two pivotal problems: discovering which regions are discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems by relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture driven by a semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which") and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. The semantic reward function drives StackDRL to fully learn discriminative and conceptual visual information by jointly combining an attention-based reward and a category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor of labeling and greatly strengthens the usability and scalability of our approach. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, StackDRL achieves the best categorization accuracy.
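A minimal sketch of how a semantic reward might jointly combine an attention-based term with a category-based term is given below; the specific terms and the equal weighting `alpha` are assumptions, not the StackDRL formulation:

```python
import torch

def semantic_reward(attention_map, region_mask, class_logits, label, alpha=0.5):
    """Combine an attention-based term (saliency mass covered by the
    chosen region) with a category-based term (log-probability of the
    true class given the region). The weighting `alpha` is an assumption.
    """
    attn_reward = (attention_map * region_mask).sum() / (attention_map.sum() + 1e-8)
    cat_reward = torch.log_softmax(class_logits, dim=-1)[label]
    return alpha * attn_reward + (1.0 - alpha) * cat_reward
```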


2021
Vol 13 (2)
pp. 115-133
Author(s):  
Pengcheng Cao ◽  
Weiwei Liu ◽  
Guangjie Liu ◽  
Jiangtao Zhai ◽  
Xiao-Peng Ji ◽  
...  

To conceal the very existence of communication, noise-based wireless covert channels modulate secret messages into artificial noise, which is added to the normal wireless signal. Although state-of-the-art work based on constellation modulation has made the composite and legitimate signals indistinguishable, reliability suffers from the dense distribution of covert constellation points. In this study, the authors design a wireless covert channel based on a dither analog chaotic code (DACC) to improve reliability without damaging undetectability. The DACC plays the role of an error-correcting code. In the modulation, analog variables converted from secret messages are encoded into joint codewords by the chaotic mapping and dither derivation of the DACC; the joint codewords are then mapped to artificial noise. Simulation results show that the proposed scheme achieves better reliability than the state-of-the-art scheme while maintaining similar undetectability.
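As a rough illustration of chaotic-code modulation (an assumption, not the paper's DACC), the toy sketch below iterates a logistic map over the analog values and adds a small dither before the codeword would be mapped onto artificial-noise symbols:

```python
import numpy as np

def chaotic_encode(analog_values, n_iter=3, dither_scale=0.05, seed=0):
    """Toy chaotic-code modulation (an assumption, not the paper's DACC).

    Each analog value in (0, 1) is iterated through the logistic map
    x <- 4x(1 - x); a small dither is then added, and the resulting
    joint codeword would be mapped onto artificial-noise symbols.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(analog_values, dtype=float)
    for _ in range(n_iter):
        x = 4.0 * x * (1.0 - x)  # chaotic (logistic) mapping
    return x + dither_scale * rng.standard_normal(x.shape)
```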


In multi-core systems, applications co-executing in multi-programmed mode interfere with each other during execution, creating resource bottlenecks that degrade performance. Conventional approaches to reducing interference over a given set of resources cannot guarantee performance in a conflicting application environment. In this paper, we make an in-depth analysis of benchmark application interference on shared resources and identify application sets that can be executed under a designated policy to mitigate interference effects. In this work, we have performed profiling and analysis of applications on the state-of-the-art simulator gem5. Finally, we conclude that performance can be improved through the designated policy. The simulation results show the scope for a new scheduler to improve performance in such systems.


Sensors
2021
Vol 21 (16)
pp. 5643
Author(s):  
Wenqiang Zu ◽  
Hongyu Yang ◽  
Renyu Liu ◽  
Yulong Ji

Guiding an aircraft to 4D waypoints at a certain heading is a multi-dimensional goal aircraft guidance problem. To enhance performance, this study proposes a multi-layer RL approach to solve this problem. The approach assists the autopilot in an ATC simulator in guiding an aircraft to 4D waypoints at a certain latitude, longitude, altitude, heading, and arrival time. Specifically, a multi-layer RL method is designed to simplify the neural network structure and reduce the state dimensions. A shaped reward function that combines a potential function with the Dubins path method is applied. Experiments are conducted, and the simulation results reveal that the proposed method can significantly improve convergence efficiency and trajectory performance. Further, the results indicate possible applications in team aircraft guidance tasks, since an aircraft can directly approach a goal without waiting in a specific pattern, thereby overcoming a limitation of current ATC simulators.
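One plausible reading of the shaped reward, a potential function defined as the negative Dubins path length to the goal plugged into standard potential-based shaping, is sketched below; the `dubins_length` placeholder is an assumption that a real implementation would replace with a true Dubins computation:

```python
import math

GAMMA = 0.99  # discount factor

def dubins_length(pose, goal_pose):
    """Placeholder for the true Dubins shortest-path length.

    A real implementation would respect the aircraft's minimum turn
    radius; straight-line distance is used here only to keep the
    sketch runnable.
    """
    return math.dist(pose[:2], goal_pose[:2])

def shaped_reward(env_reward, pose, next_pose, goal_pose):
    """Potential-based shaping with Phi(s) = -Dubins distance to goal."""
    phi = lambda p: -dubins_length(p, goal_pose)
    return env_reward + GAMMA * phi(next_pose) - phi(pose)
```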


2021
pp. 635-667
Author(s):  
Adil Khan ◽  
Asad Masood Khattak ◽  
Muhammad Zubair Asghar ◽  
Muhammad Naeem ◽  
Aziz Ud Din
