Generative Adversarial Imitation Learning from Failed Experiences (Student Abstract)

2020 ◽  
Vol 34 (10) ◽  
pp. 13997-13998
Author(s):  
Jiacheng Zhu ◽  
Jiahao Lin ◽  
Meng Wang ◽  
Yingfeng Chen ◽  
Changjie Fan ◽  
...  

Imitation learning provides a family of promising methods that learn policies directly from expert demonstrations. As a model-free and on-line imitation learning method, generative adversarial imitation learning (GAIL) generalizes well to unseen situations and can handle complex problems. In this paper, we propose a novel variant of GAIL called GAIL from failed experiences (GAILFE). GAILFE allows an agent to utilize failed experiences in the training process. Moreover, a constrained optimization objective is formalized in GAILFE to balance learning from given demonstrations against learning from self-generated failed experiences. Empirically, compared with GAIL, GAILFE improves sample efficiency and learning speed across different tasks.
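
The abstract does not spell out the constrained objective, but the core mechanism can be sketched: failed experiences enter the discriminator as additional self-generated negatives, with a multiplier balancing them against the demonstration term. A minimal PyTorch sketch, with hypothetical dimensions and a stand-in multiplier `lam` (not the authors' implementation):

```python
# Toy sketch of a GAILFE-style discriminator update (illustrative only;
# the paper's exact constrained objective is not given in the abstract).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # hypothetical dimensions

disc = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.Tanh(),
    nn.Linear(64, 1))                      # logit: expert vs. non-expert
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

def disc_step(expert_sa, policy_sa, failed_sa, lam=0.5):
    """One discriminator update.

    expert_sa / policy_sa / failed_sa: (batch, STATE_DIM + ACTION_DIM)
    lam: multiplier trading off the demonstration term against the
         failed-experience term (stands in for the constrained objective).
    """
    loss_expert = bce(disc(expert_sa), torch.ones(len(expert_sa), 1))
    loss_policy = bce(disc(policy_sa), torch.zeros(len(policy_sa), 1))
    # Failed experiences are extra negatives the agent generated itself.
    loss_failed = bce(disc(failed_sa), torch.zeros(len(failed_sa), 1))
    loss = loss_expert + loss_policy + lam * loss_failed
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Dummy batches, just to show the call signature.
e = torch.randn(32, STATE_DIM + ACTION_DIM)
p = torch.randn(32, STATE_DIM + ACTION_DIM)
f = torch.randn(32, STATE_DIM + ACTION_DIM)
print(disc_step(e, p, f))
```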


2021 ◽  
Author(s):  
Yunfan Su

A vehicular ad hoc network (VANET) is a promising technique that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to fully exploit the performance of vehicular networks. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. After obtaining the transition probabilities and time intervals, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that our reinforcement learning method achieves performance similar to that of dynamic programming, while both outperform the greedy method.
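
The relative value iteration step itself is standard for average-reward MDPs. A minimal NumPy sketch on a toy transition model (not the thesis's VANET channel model, and it assumes unit-time decision epochs rather than the semi-Markov time intervals the thesis computes):

```python
# Generic relative value iteration (RVI) for an average-reward MDP.
import numpy as np

n_states, n_actions = 4, 2
rng = np.random.default_rng(0)

# P[a, s, s'] = transition probabilities; R[s, a] = expected reward.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)
R = rng.random((n_states, n_actions))

def rvi(P, R, ref_state=0, tol=1e-8, max_iter=10_000):
    h = np.zeros(n_states)           # relative value function
    for _ in range(max_iter):
        # Q[s, a] = r(s, a) + E[h(s') | s, a]
        Q = R + np.stack([P[a] @ h for a in range(n_actions)], axis=1)
        h_new = Q.max(axis=1) - Q[ref_state].max()   # subtract reference
        if np.abs(h_new - h).max() < tol:
            break
        h = h_new
    gain = Q[ref_state].max()        # long-run average reward estimate
    policy = Q.argmax(axis=1)
    return gain, policy

gain, policy = rvi(P, R)
print("average reward:", gain, "policy:", policy)
```

In the semi-Markov setting, the expected sojourn time between decision epochs would also enter the update; the sketch omits that refinement.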


2015 ◽  
Vol 114 (3) ◽  
pp. 1577-1592 ◽  
Author(s):  
Barbara La Scaleia ◽  
Myrka Zago ◽  
Francesco Lacquaniti

Two control schemes have been hypothesized for the manual interception of fast visual targets. In model-free on-line control, extrapolation of target motion is based on continuous visual information, without resorting to physical models. In model-based control, instead, a prior model of target motion predicts the future spatiotemporal trajectory. To distinguish between the two hypotheses in the case of projectile motion, we asked participants to hit a ball that rolled down an incline at 0.2 g and then fell in air at 1 g along a parabola. The starting position was varied so that ball velocity and trajectory differed between trials. Motion on the incline was always visible, whereas parabolic motion was either visible or occluded. We found that participants were equally successful at hitting the falling ball in the visible and occluded conditions. Moreover, across trials the intersection points were distributed along the parabolic trajectories of the ball, indicating that subjects were able to extrapolate an extended segment of the target trajectory. Remarkably, this trend was observed even at the very first repetition of movements. These results are consistent with the hypothesis of model-based control, but not with on-line control. Indeed, ball path and speed during the occlusion could not be extrapolated solely from the kinematic information obtained during the preceding visible phase. The only way to extrapolate ball motion correctly during the occlusion was to assume that the ball, when hidden from view, would fall under gravity and air drag. Such an assumption had to be derived from prior experience.
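
The model-based account implies an internal forward simulation of the occluded flight from the last visible state. A minimal sketch of such an extrapolation, assuming quadratic air drag with an illustrative coefficient:

```python
# Model-based extrapolation during the occlusion: given the ball's state
# when it leaves view, integrate forward under gravity plus (assumed
# quadratic) air drag. The drag coefficient is a hypothetical value.
import numpy as np

G = 9.81          # gravity, m/s^2
K_DRAG = 0.05     # lumped drag coefficient, 1/m (illustrative)

def extrapolate(pos, vel, duration, dt=1e-3):
    """Predict the ball's 2-D position after `duration` s of occluded flight."""
    pos, vel = np.array(pos, float), np.array(vel, float)
    for _ in range(int(duration / dt)):
        speed = np.linalg.norm(vel)
        acc = np.array([0.0, -G]) - K_DRAG * speed * vel  # gravity + drag
        vel += acc * dt                                   # Euler integration
        pos += vel * dt
    return pos

# Example: ball leaves view at 2 m height moving 3 m/s horizontally,
# occluded for 0.4 s.
print(extrapolate(pos=[0.0, 2.0], vel=[3.0, 0.0], duration=0.4))
```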


2019 ◽  
Vol 11 (3) ◽  
pp. 298 ◽  
Author(s):  
Linyi Liu ◽  
Yingying Dong ◽  
Wenjiang Huang ◽  
Xiaoping Du ◽  
Juhua Luo ◽  
...  

Current methods for monitoring the prevalence of wheat powdery mildew require sufficient sample data to obtain accurate and stably validated results. However, it is difficult to collect data on wheat powdery mildew in some regions, and this limitation in sampling restricts the accuracy of monitoring the regional prevalence of the disease. In this study, an instance-based transfer learning method, TrAdaBoost, was applied to improve monitoring accuracy with limited field samples by using auxiliary samples from another region. By taking the representativeness of the auxiliary samples' contributions into account when adjusting their weights, an optimized TrAdaBoost algorithm, named OpTrAdaBoost, was developed to map regional wheat powdery mildew. The algorithm does this by: (1) producing the uncertainty associated with each prediction based on the similarities, and calculating the representativeness contribution of all auxiliary samples by taking into account the overall uncertainty of the wheat powdery mildew map; (2) calculating the errors of the weak learners during the training process and using boosting to filter out unreliable auxiliary samples by adjusting their weights; (3) combining all weak learners according to the weights of the training instances to build a strong learner that classifies disease severity. OpTrAdaBoost was tested on a dataset with 39 study-area samples and 106 auxiliary samples. The overall monitoring accuracy was 82%, and the kappa coefficient was 0.72. Moreover, OpTrAdaBoost performed better than other algorithms commonly used to monitor wheat powdery mildew at the regional level. Experimental results demonstrate that OpTrAdaBoost is effective in improving the accuracy of monitoring wheat powdery mildew with limited field samples.
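
OpTrAdaBoost's representativeness weighting is not detailed in the abstract, but the underlying TrAdaBoost update it modifies is standard: misclassified auxiliary (source-region) samples are down-weighted as less relevant, while misclassified target samples are up-weighted AdaBoost-style. A sketch with scikit-learn decision stumps as weak learners (data shapes mirror the paper's 106 auxiliary and 39 study-area samples; the features are dummies):

```python
# Standard TrAdaBoost weight update (the base of OpTrAdaBoost; the paper's
# uncertainty-based representativeness term is not reproduced here).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tradaboost(X_aux, y_aux, X_tgt, y_tgt, n_rounds=10):
    n_a, n_t = len(X_aux), len(X_tgt)
    X = np.vstack([X_aux, X_tgt])
    y = np.concatenate([y_aux, y_tgt])
    w = np.ones(n_a + n_t) / (n_a + n_t)
    beta_aux = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_a) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        clf = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        # Weighted error on the target portion only.
        err = (w[n_a:] * (clf.predict(X_tgt) != y_tgt)).sum() / w[n_a:].sum()
        err = min(max(err, 1e-10), 0.49)          # keep the update well-defined
        beta_t = err / (1.0 - err)
        miss = clf.predict(X) != y
        # Auxiliary samples: down-weight when misclassified (less relevant).
        w[:n_a] *= np.power(beta_aux, miss[:n_a])
        # Target samples: up-weight when misclassified (AdaBoost-style).
        w[n_a:] *= np.power(beta_t, -miss[n_a:].astype(float))
        w /= w.sum()
        learners.append(clf); betas.append(beta_t)
    return learners, betas

# Usage with toy data of the same sample counts as the study.
rng = np.random.default_rng(0)
X_aux, y_aux = rng.random((106, 5)), rng.integers(0, 2, 106)
X_tgt, y_tgt = rng.random((39, 5)), rng.integers(0, 2, 39)
learners, betas = tradaboost(X_aux, y_aux, X_tgt, y_tgt)
```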


2015 ◽  
Vol 787 ◽  
pp. 843-847
Author(s):  
Leo Raju ◽  
R.S. Milton ◽  
S. Sakthiyanandan

In this paper, two solar photovoltaic (PV) systems are considered: one in the department with a capacity of 100 kW and the other in the hostel with a capacity of 200 kW. Each has a battery and a load. The capital cost and energy savings of conventional methods are compared, showing that dependency on grid energy is reduced when the solar micro-grid elements operate in a distributed environment. In the smart grid framework, grid energy consumption is further reduced by optimally scheduling the battery using Reinforcement Learning. Individual unit optimization is done with a model-free reinforcement learning method called Q-Learning, and it is compared with distributed operation of the solar micro-grid using a Multi Agent Reinforcement Learning method called Joint Q-Learning. The energy planning is designed according to the predicted solar PV energy production and the observed load patterns of the department and the hostel. A simulation model was developed using Python programming.
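
A toy tabular Q-Learning loop for battery scheduling, with a hypothetical state discretization and a crude stand-in environment (the paper's PV and load profiles, and the Joint Q-Learning extension, are not reproduced):

```python
# Minimal Q-Learning sketch: learn when to charge/discharge a battery
# so that energy imported from the grid is minimized.
import numpy as np

N_SOC = 5                 # discretized battery state-of-charge levels
ACTIONS = (-1, 0, +1)     # discharge / idle / charge
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((N_SOC, len(ACTIONS)))

def step(soc, a_idx):
    """Stand-in environment: reward is the negated grid import."""
    action = ACTIONS[a_idx]
    next_soc = int(np.clip(soc + action, 0, N_SOC - 1))
    load, pv = rng.random(), rng.random()          # dummy profiles
    grid_import = max(load - pv - max(-action, 0), 0.0)
    return next_soc, -grid_import                  # less import = better

soc = N_SOC // 2
for t in range(50_000):
    # Epsilon-greedy action selection.
    a = rng.integers(len(ACTIONS)) if rng.random() < EPS else Q[soc].argmax()
    nxt, r = step(soc, a)
    # Standard Q-learning temporal-difference update.
    Q[soc, a] += ALPHA * (r + GAMMA * Q[nxt].max() - Q[soc, a])
    soc = nxt

print(np.round(Q, 2))
```

Joint Q-Learning would extend this by letting the department and hostel agents learn over joint actions rather than each optimizing its own table in isolation.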


2019 ◽  
Vol 1 (2) ◽  
pp. 99-120 ◽  
Author(s):  
Tongtao Zhang ◽  
Heng Ji ◽  
Avirup Sil

We propose a new framework for entity and event extraction based on generative adversarial imitation learning, an inverse reinforcement learning method using a generative adversarial network (GAN). We assume that instances and labels vary in difficulty, so the gains and penalties (rewards) are expected to be diverse. We utilize discriminators to estimate proper rewards according to the difference between the labels produced by the ground truth (expert) and by the extractor (agent). Our experiments demonstrate that the proposed framework outperforms state-of-the-art methods.
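
A minimal sketch of the discriminator-estimated reward, assuming a hypothetical feature encoder and label-set size: the reward is high where the discriminator judges the extractor's (instance, label) pair expert-like, so easy and hard instances earn rewards of different magnitude. The paper's exact reward formulation is not given in the abstract; this uses a common GAIL-style logit reward:

```python
# Illustrative discriminator-based reward for labeling decisions.
import torch
import torch.nn as nn

FEAT_DIM, N_LABELS = 128, 10   # hypothetical encoder output / label set size

disc = nn.Sequential(
    nn.Linear(FEAT_DIM + N_LABELS, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid())    # D(x, y): prob. label came from expert

def reward(features, label_ids):
    """Reward for the extractor's label choices: high where the
    discriminator believes the (instance, label) pair is expert-like."""
    onehot = torch.nn.functional.one_hot(label_ids, N_LABELS).float()
    d = disc(torch.cat([features, onehot], dim=-1)).squeeze(-1)
    return torch.log(d + 1e-8) - torch.log(1 - d + 1e-8)  # GAIL-style logit

feats = torch.randn(4, FEAT_DIM)      # 4 candidate mentions/triggers
labels = torch.tensor([0, 3, 3, 9])   # the extractor's label decisions
print(reward(feats, labels))
```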

