Value Iteration Networks

We introduce the value iteration network (VIN): a fully differentiable neural network with a "planning module" embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains. This paper is a significantly abridged version, targeted at the IJCAI audience, of the original NIPS 2016 paper with the same title, available here: https://arxiv.org/abs/1602.02867
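
As a rough illustration of why value iteration maps onto a convolutional architecture, the sketch below runs plain (non-learned) value iteration on a small grid world. Each update reads only a cell's local neighborhood and takes a max over actions, which is the pattern that a convolution followed by a max over channels can express. The grid layout, reward values, discount factor, and the name `grid_value_iteration` are assumptions made for illustration; this is not the VIN architecture or the authors' code.

```python
# Deterministic 4-neighbor moves: up, down, left, right.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def grid_value_iteration(reward, goal, gamma=0.9, iterations=50):
    """Plain value iteration on a 2D grid (illustrative sketch).

    reward[r][c] is the reward received on entering cell (r, c).
    goal is a terminal cell whose value is held at 0.
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if (r, c) == goal:
                    continue  # terminal state keeps value 0
                q_values = []
                for dr, dc in MOVES:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        nr, nc = r, c  # moving off the grid leaves the agent in place
                    # Local computation: only the neighbor's reward and value are read.
                    q_values.append(reward[nr][nc] + gamma * value[nr][nc])
                new_value[r][c] = max(q_values)  # max over actions
        value = new_value
    return value


# Hypothetical 3x3 example: reward 1 for entering the top-right goal cell.
example_reward = [
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
example_values = grid_value_iteration(example_reward, goal=(0, 2))
for row in example_values:
    print([round(v, 3) for v in row])
```

In this toy setting the values decay geometrically with distance from the goal, so following the largest neighboring value traces a shortest path.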

Optimal tracking control based on reinforcement learning value iteration algorithm for time-delayed nonlinear systems with external disturbances and input constraints

Information Sciences ◽

10.1016/j.ins.2020.11.057 ◽

2021 ◽

Vol 554 ◽

pp. 84-98

Author(s):

Mehdi Mohammadi ◽

Mohammad Mehdi Arefi ◽

Peyman Setoodeh ◽

Okyay Kaynak

Keyword(s):

Reinforcement Learning ◽

Nonlinear Systems ◽

Tracking Control ◽

Iteration Algorithm ◽

Value Iteration ◽

Input Constraints ◽

External Disturbances ◽

Optimal Tracking ◽

Optimal Tracking Control ◽

Value Iteration Algorithm

Towards Accurate Deceptive Opinions Detection Based on Word Order-Preserving CNN

Mathematical Problems in Engineering ◽

10.1155/2018/2410206 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Siyuan Zhao ◽

Zhiwei Xu ◽

Limin Liu ◽

Mengjie Guo ◽

Jing Yun

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Convolutional Neural Network ◽

Language Processing ◽

Word Order ◽

Text Analysis ◽

Important Application ◽

Detection Mechanism ◽

Short Text

Convolutional neural networks (CNNs) have revolutionized the field of natural language processing and are considerably efficient at the semantic analysis that underlies difficult natural language processing problems in a variety of domains. Deceptive opinion detection is an important application of existing CNN models. Detection mechanisms based on CNN models have better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short and vary in type and content. To identify deceptive opinions effectively, we need to study their characteristics comprehensively and explore novel characteristics beyond the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding word order characteristics in its convolution layer and pooling layer, which makes the network more suitable for short text classification and deceptive opinion detection. TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.

Approximate Value Iteration in the Reinforcement Learning Context. Application to Electrical Power System Control.

International Journal of Emerging Electric Power Systems ◽

10.2202/1553-779x.1066 ◽

2005 ◽

Vol 3 (1) ◽

Cited By ~ 14

Author(s):

Damien Ernst ◽

Mevludin Glavic ◽

Pierre Geurts ◽

Louis Wehenkel

Keyword(s):

Reinforcement Learning ◽

Power System ◽

Control Problem ◽

Learning Algorithm ◽

Electrical Power ◽

Complex Case ◽

Iteration Algorithm ◽

Value Iteration ◽

Learning Context ◽

Power System Control

In this paper we explain how to design intelligent agents that process the information acquired from interaction with a system to learn a good control policy, and we show how the methodology can be applied to control devices intended to damp electrical power oscillations. The control problem is formalized as a discrete-time optimal control problem, and the information acquired from interaction with the system is a set of samples, where each sample is composed of four elements: a state, the action taken in this state, the instantaneous reward observed, and the successor state of the system. To process this information we consider reinforcement learning algorithms that determine an approximation of the so-called Q-function by mimicking the behavior of the value iteration algorithm. Simulations are first carried out on a benchmark power system modeled with two state variables. Then we present a more complex case study on a four-machine power system where the reinforcement learning algorithm controls a Thyristor Controlled Series Capacitor (TCSC) intended to damp power system oscillations.
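
The idea of approximating the Q-function from four-element samples by mimicking value iteration can be sketched as follows. This is a minimal tabular version written for illustration: it averages the one-step targets for each (state, action) pair, whereas the paper's algorithms may use other approximation architectures. The function name `batch_q_iteration`, the toy samples, and the parameter values are all assumptions.

```python
def batch_q_iteration(samples, actions, gamma=0.95, iterations=100):
    """Approximate Q from (state, action, reward, next_state) samples.

    Tabular sketch: every iteration recomputes Q(s, a) as the average of
    reward + gamma * max_a' Q(next_state, a') over the samples for (s, a).
    Pairs that never appear in the samples are treated as having Q = 0.
    """
    q = {}
    for _ in range(iterations):
        sums, counts = {}, {}
        for state, action, reward, next_state in samples:
            best_next = max(q.get((next_state, a), 0.0) for a in actions)
            target = reward + gamma * best_next
            key = (state, action)
            sums[key] = sums.get(key, 0.0) + target
            counts[key] = counts.get(key, 0) + 1
        q = {key: sums[key] / counts[key] for key in sums}
    return q


def greedy_action(q, state, actions):
    """Pick the action with the largest estimated Q-value in a state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical samples from a tiny chain: s0 -> s1 -> end.
toy_samples = [
    ("s0", "go", 0.0, "s1"),
    ("s0", "stay", 0.0, "s0"),
    ("s1", "go", 1.0, "end"),
]
toy_actions = ["go", "stay"]
toy_q = batch_q_iteration(toy_samples, toy_actions, gamma=0.5)
print(greedy_action(toy_q, "s0", toy_actions))
```

Because the update only needs the recorded samples, no model of the system dynamics is required, which matches the sample-based setting the abstract describes.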

A Model-Based Factored Bayesian Reinforcement Learning Approach

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.1092 ◽

2014 ◽

Vol 513-517 ◽

pp. 1092-1095

Author(s):

Bo Wu ◽

Yan Peng Feng ◽

Hong Yan Zheng

Keyword(s):

Reinforcement Learning ◽

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Practical Applications ◽

Model Based ◽

Online Planning ◽

Bayesian Reinforcement Learning ◽

Bayesian Inference Method ◽

Unknown Structure

Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the exponential growth in the number of learning parameters is the main impediment to online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. First, we exploit a factored representation of the states to reduce the number of learning parameters, and adopt a Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach effectively improves learning efficiency in large-scale state spaces.

RL-CNN: Reinforcement Learning-designed Convolutional Neural Network for Urban Traffic Flow Estimation

2021 International Wireless Communications and Mobile Computing (IWCMC) ◽

10.1109/iwcmc51323.2021.9498948 ◽

2021 ◽

Author(s):

Mostafa Karimzadeh ◽

Alessandro Esposito ◽

Zhongliang Zhao ◽

Torsten Braun ◽

Susana Sargento

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Convolutional Neural Network ◽

Traffic Flow ◽

Urban Traffic ◽

Flow Estimation ◽

Traffic Flow Estimation

Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

Journal of Dynamic Systems Measurement and Control ◽

10.1115/1.4032875 ◽

2016 ◽

Vol 138 (6) ◽

Author(s):

Thai Duong ◽

Duong Nguyen-Huu ◽

Thinh Nguyen

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Transition Probability ◽

Transition Probability Matrix ◽

Rate Of Change ◽

Optimal Decision ◽

Iteration Algorithm ◽

Value Iteration ◽

Markov Decision ◽

Value Iteration Algorithm

The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment characterized by a time-invariant transition probability matrix. However, in many real-world scenarios this assumption is not justified, so the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem in nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm with respect to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
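
To make the setting concrete, the sketch below applies one value-iteration sweep per time step while the transition model is allowed to change between steps. It is a simplified illustration only: it uses a discounted criterion rather than the average-reward measure studied in the paper, and the names `bellman_backup`, `value_iteration_drifting`, and `transition_at` are hypothetical.

```python
def bellman_backup(value, transitions, rewards, gamma):
    """One value-iteration sweep over all states.

    transitions[a][s][s2] is P(s2 | s, a); rewards[s][a] is the immediate reward.
    """
    n_states = len(value)
    n_actions = len(transitions)
    return [
        max(
            rewards[s][a]
            + gamma * sum(transitions[a][s][s2] * value[s2] for s2 in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def value_iteration_drifting(transition_at, rewards, gamma=0.9, steps=200):
    """Run one backup per time step under a possibly time-varying model.

    transition_at(t) returns the transition probabilities in force at step t,
    so a constant function recovers ordinary stationary value iteration.
    """
    value = [0.0] * len(rewards)
    for t in range(steps):
        value = bellman_backup(value, transition_at(t), rewards, gamma)
    return value


# Hypothetical two-state example: action 0 stays, action 1 switches state.
stay = [[1.0, 0.0], [0.0, 1.0]]
switch = [[0.0, 1.0], [1.0, 0.0]]
toy_rewards = [[0.0, 0.0], [1.0, 0.0]]  # reward 1 for staying in state 1


def slowly_changing(t):
    # Assumed drift: the "switch" action fails with a probability that shrinks over time.
    fail = 0.2 / (1 + t)
    noisy_switch = [[fail, 1.0 - fail], [1.0 - fail, fail]]
    return [stay, noisy_switch]


print(value_iteration_drifting(slowly_changing, toy_rewards))
```

Passing a slower or faster drift through `transition_at` gives a simple way to observe how the rate of change of the environment affects the values the iteration tracks.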

nso-HSVI: A Not-So-Optimistic Heuristic Search Value Iteration Algorithm for POMDPs

2014 IEEE 26th International Conference on Tools with Artificial Intelligence ◽

10.1109/ictai.2014.108 ◽

2014 ◽

Author(s):

Feng Liu ◽

Haibo Li ◽

Chongjun Wang

Keyword(s):

Heuristic Search ◽

Iteration Algorithm ◽

Value Iteration ◽

Value Iteration Algorithm

Pemanfaatan Asynchronous Advantage Actor-Critic Dalam Pembuatan AI Game Bot Pada Game Arcade (Using Asynchronous Advantage Actor-Critic to Build an AI Game Bot for Arcade Games)

Journal of Intelligent System and Computation ◽

10.52985/insyst.v1i2.82 ◽

2019 ◽

Vol 1 (2) ◽

pp. 74-84

Author(s):

Evan Kusuma Susanto ◽

Yosi Kristian

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Reinforcement Learning ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Trial And Error ◽

Short Term ◽

Term Memory ◽

Memory Network ◽

Long Short Term Memory

Asynchronous Advantage Actor-Critic (A3C) is a deep reinforcement learning algorithm developed by Google DeepMind. The algorithm can be used to build an artificial intelligence architecture that masters many different kinds of games through trial and error, learning from the game's screen display and the score obtained from its actions without human intervention. An A3C network consists of a Convolutional Neural Network (CNN) at the front, a Long Short-Term Memory network (LSTM) in the middle, and an Actor-Critic network at the back. The CNN summarizes the screen output image by extracting the important features present on the screen. The LSTM remembers previous game states. The Actor-Critic network determines the best action to take when faced with a given condition. The experiments show that this method is quite effective and can beat beginner players in the 5 games used for testing.

Prediction of breast cancer distant recurrence using natural language processing and knowledge-guided convolutional neural network

Artificial Intelligence in Medicine ◽

10.1016/j.artmed.2020.101977 ◽

2020 ◽

Vol 110 ◽

pp. 101977 ◽

Cited By ~ 1

Author(s):

Hanyin Wang ◽

Yikuan Li ◽

Seema A Khan ◽

Yuan Luo

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Convolutional Neural Network ◽

Language Processing ◽

Distant Recurrence

A Deep Paraphrase Identification Model Interacting Semantics with Syntax

Complexity ◽

10.1155/2020/9757032 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Leilei Kong ◽

Zhongyuan Han ◽

Yong Han ◽

Haoliang Qi

Keyword(s):

Neural Network ◽

Natural Language ◽

Convolutional Neural Network ◽

Semantic Representation ◽

Experimental Results ◽

Plagiarism Detection ◽

Linguistic Features ◽

Syntactic Structures ◽

Syntactic Features ◽

Identification Model

Paraphrase identification is central to many natural language applications. Based on the insight that a successful paraphrase identification model needs to adequately capture the semantics of the language objects as well as their interactions, we present a deep paraphrase identification model interacting semantics with syntax (DPIM-ISS). DPIM-ISS introduces linguistic features, manifested as syntactic features, to produce more explicit structures, and encodes the semantic representation of a sentence on different syntactic structures by letting semantics interact with syntax. DPIM-ISS then learns the paraphrase pattern from this representation using a convolutional neural network with a convolution-pooling structure. Experiments are conducted on the Microsoft Research Paraphrase (MSRP) corpus, the PAN 2010 corpus, and the PAN 2012 corpus for paraphrase plagiarism detection. The experimental results demonstrate that DPIM-ISS outperforms classical word-matching approaches, syntax-similarity approaches, convolutional neural network-based models, and some deep paraphrase identification models.
