Persistent Patrol in Stochastic Environments with Limited Sensors

Author(s): Vu Anh Huynh, John Enright, Emilio Frazzoli

2010, Vol 4 (4), pp. 407-421
Author(s): Hal Caswell, Michael G. Neubert, Christine M. Hunter

2011, Vol 139 (7), pp. 2276-2289
Author(s): Arthur A. Small, Jason B. Stefik, Johannes Verlinde, Nathaniel C. Johnson

Abstract A decision algorithm is presented that improves the productivity of data collection activities in stochastic environments. The algorithm was developed in the context of an aircraft field campaign organized to collect data in situ from boundary layer clouds. Required lead times meant that aircraft deployments had to be scheduled in advance, based on imperfect forecasts of whether conditions would meet specified requirements. Given an overall cap on the number of flights, daily fly/no-fly decisions were traditionally made through a discussion-intensive process involving heuristic analysis of weather forecasts by a group of skilled human investigators. The alternative automated decision process uses self-organizing maps to convert weather forecasts into quantified probabilities of suitable conditions, together with a dynamic programming procedure to compute the opportunity cost of spending one of the scarce flights in the limited budget. Applied to conditions prevailing during the 2009 Routine ARM Aerial Facility (AAF) Clouds with Low Optical Water Depths (CLOWD) Optical Radiative Observations (RACORO) campaign of the U.S. Department of Energy's Atmospheric Radiation Measurement Program, the algorithm shows a 21% increase in data yield and a 66% improvement in skill over the traditional heuristic decision process. The algorithmic approach promises to free up investigators' cognitive resources, reduce stress on flight crews, and increase productivity in a range of data collection applications.
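The dynamic programming step described in the abstract amounts to comparing each day's expected data yield against the opportunity cost of spending a flight from the fixed budget. Below is a minimal sketch of that recursion, assuming a simplified model in which a flight on a suitable day yields one unit of data; the probabilities p[t] stand in for the self-organizing-map output, and the function names and reward model are illustrative, not the authors' implementation.

```python
from functools import lru_cache

def plan_flights(p, budget):
    """Backward-induction fly/no-fly planner (illustrative sketch).

    p      : p[t] is the forecast probability that day t offers suitable
             conditions (standing in for the SOM-derived probabilities)
    budget : total number of flights allowed over the campaign

    Model: flying on day t yields one unit of data with probability p[t]
    and consumes one flight either way. V(t, b) is the expected yield
    from day t onward with b flights remaining.
    """
    T = len(p)

    @lru_cache(maxsize=None)
    def V(t, b):
        if t == T or b == 0:
            return 0.0
        no_fly = V(t + 1, b)              # hold the flight in reserve
        fly = p[t] + V(t + 1, b - 1)      # spend a flight today
        return max(fly, no_fly)

    # Fly on day t iff today's expected yield beats the opportunity
    # cost of the flight: p[t] >= V(t+1, b) - V(t+1, b-1).
    decisions, b = [], budget
    for t in range(T):
        if b > 0 and p[t] + V(t + 1, b - 1) >= V(t + 1, b):
            decisions.append("fly")
            b -= 1
        else:
            decisions.append("no-fly")
    return decisions

# Example: a 5-day window, 2 flights, hypothetical daily probabilities.
print(plan_flights([0.2, 0.7, 0.4, 0.9, 0.3], budget=2))
# -> ['no-fly', 'fly', 'no-fly', 'fly', 'no-fly']
```

The example commits flights to the two most promising days, skipping a marginal day early in the window because the remaining budget is worth more later.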


Author(s): Jan Leike, Tor Lattimore, Laurent Orseau, Marcus Hutter

We discuss some recent results on Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markovian, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) its value converges asymptotically in mean to the optimal value, and (2) under a recoverability assumption, its regret is sublinear. We conclude with a discussion of optimality in reinforcement learning.
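As a concrete illustration of the scheme analyzed in the abstract, the sketch below runs Thompson sampling over a toy class of environments: sample a model from the posterior, follow that model's optimal policy for a stretch of steps, and update the posterior after every percept. The class is truncated to two stationary bandits so the example runs end to end, whereas the paper's setting covers countable classes of non-Markovian, partially observable environments; the fixed resampling interval stands in for the paper's effective horizon, and the `Bandit` class and its methods are illustrative inventions, not the authors' construction.

```python
import random

class Bandit:
    """Toy stationary environment: Bernoulli reward per action."""
    def __init__(self, success_probs):
        self.success_probs = success_probs

    def optimal_action(self):
        # Optimal policy under this model: pull its best arm.
        return max(range(len(self.success_probs)),
                   key=lambda a: self.success_probs[a])

    def likelihood(self, action, reward):
        # Probability this model assigns to the observed reward.
        p = self.success_probs[action]
        return p if reward == 1 else 1.0 - p

    def step(self, action):
        return 1 if random.random() < self.success_probs[action] else 0

def thompson(models, true_env, horizon, resample_every):
    """Thompson sampling: sample a model from the posterior, commit to
    its optimal policy for a while, update the posterior each step."""
    weights = [1.0 / len(models)] * len(models)   # uniform prior
    total, sampled = 0, None
    for t in range(horizon):
        if t % resample_every == 0:
            # Draw one environment from the current posterior.
            sampled = random.choices(models, weights=weights)[0]
        a = sampled.optimal_action()
        r = true_env.step(a)
        total += r
        # Bayes update: reweight each model by the likelihood of the
        # observed reward, then renormalize.
        weights = [w * m.likelihood(a, r) for w, m in zip(weights, models)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return total

models = [Bandit([0.2, 0.8]), Bandit([0.8, 0.2])]
random.seed(0)
print(thompson(models, true_env=models[0], horizon=1000, resample_every=50))
```

The posterior concentrates on the true model as evidence accumulates, so later resamples almost always pick the correct environment and the agent's policy stabilizes, which is the mechanism behind the asymptotic optimality result stated above.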


2012, Vol 31 (1), pp. 23-28

Author(s): Hiromu Ito, Takashi Uehara, Satoru Morita, Kei-ichi Tainaka, Jin Yoshimura

2000, Vol 155 (6), pp. 724-734

Author(s): Frédéric Menu, Jean-Philippe Roebuck, Muriel Viala
