Autonomous Control of Urban Storm Water Networks Using Reinforcement Learning

We investigate the real-time and autonomous operation of a 12 km2 urban storm water network, which has been retrofitted with sensors and control valves. Specifically, we evaluate reinforcement learning, a technique rooted in deep learning, as a system-level control methodology. The controller opens and closes valves in the system, which enhances the performance in the storm water network by coordinating the discharges amongst spatially distributed storm water assets (i.e. detention basins and wetlands). A reinforcement learning control algorithm is implemented to control the storm water network across an urban watershed. Results show that control of valves using reinforcement learning shows great potential, but extensive research still needs to be conducted to develop a fundamental understanding of control robustness. We specifically discuss the role and importance of the reward function (i.e. heuristic control objective), which guides the autonomous controller towards achieving the desired water shed scale response.

Download Full-text

Hierarchical Reinforcement Learning Considering Stochastic Wind Disturbance for Power Line Maintenance Robot

10.21203/rs.3.rs-783306/v1 ◽

2021 ◽

Author(s):

Xiaoliang Zheng ◽

Gongping Wu

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Power Line ◽

Gradient Algorithm ◽

State Function ◽

Wind Disturbance ◽

Local Approach ◽

Reward Function ◽

Hierarchical Reinforcement Learning ◽

Autonomous Operation

Abstract Robot intelligence includes motion intelligence and cognitive intelligence. Aiming at the motion intelligence, a hierarchical reinforcement learning architecture considering stochastic wind disturbance is proposed for the decision-making of the power line maintenance robot with autonomous operation. This architecture uses the prior information of the mechanism knowledge and empirical data to improve the safety and efficiency of the robot operation. In this architecture, the high-level policy selection and the low-level motion control at global and local levels are considered comprehensively under the condition of stochastic wind disturbance. Firstly, the operation task is decomposed into three sub-policies: global obstacle avoidance, local approach and local tightening, and each sub-policy is learned. Then, a master policy is learned to select the operation sub-policy in the current state. The dual deep Q network algorithm is used for the master policy, while the deep deterministic policy gradient algorithm is used for the operation policy. In order to improve the training efficiency, the global obstacle avoidance sub-policy takes the random forest composed of dynamic environmental decision tree as the expert algorithm for imitation learning. The architecture is applied to a power line maintenance scenario, the state function and reward function of each policy are designed, and all policies are trained in an asynchronous and parallel computing environment. It is proved that this architecture can realize stable and safe autonomous operating decision for the power line maintenance robot subjected to stochastic wind disturbance.

Download Full-text

Reinforcement and Imitation Learning Applied to Autonomous Aerial Robot Control

10.5753/wtdr_ctdr.2020.14956 ◽

2020 ◽

Author(s):

Gabriel Moraes Barros ◽

Esther Colombini

Keyword(s):

Reinforcement Learning ◽

Autonomous Systems ◽

Control Policy ◽

Primary Objective ◽

Imitation Learning ◽

Level Control ◽

Reward Function ◽

Long Time ◽

Learning Reinforcement ◽

Function Approximator

In robotics, the ultimate goal of reinforcement learning is to endow robots with the ability to learn, improve, adapt, and reproduce tasks with dynamically changing constraints based on exploration and autonomous learning. Reinforcement Learning (RL) aims at addressing this problem by enabling a robot to learn behaviors through trial-and-error. With RL, a Neural Network can be trained as a function approximator to directly map states to actuator commands making any predefined control structure not-needed for training. However, the knowledge required to converge these methods is usually built from scratch. Learning may take a long time, not to mention that RL algorithms need a stated reward function. Sometimes, it is not trivial to define one. Often it is easier for a teacher, human or intelligent agent, do demonstrate the desired behavior or how to accomplish a given task. Humans and other animals have a natural ability to learn skills from observation, often from merely seeing these skills’ effects: without direct knowledge of the underlying actions. The same principle exists in Imitation Learning, a practical approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. In this scenario, this work’s primary objective is to design an agent that can successfully imitate a prior acquired control policy using Imitation Learning. The chosen algorithm is GAIL since we consider that it is the proper algorithm to tackle this problem by utilizing expert (state, action) trajectories. As reference expert trajectories, we implement state-of-the-art on and off-policy methods PPO and SAC. Results show that the learned policies for all three methods can solve the task of low-level control of a quadrotor and that all can account for generalization on the original tasks.

Download Full-text

Motion Planning with Energy Reduction for a Floating Robotic Platform Under Disturbances and Measurement Noise Using Reinforcement Learning

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213018600059 ◽

2018 ◽

Vol 27 (04) ◽

pp. 1860005 ◽

Cited By ~ 1

Author(s):

Konstantinos Tziortziotis ◽

Nikolaos Tziortziotis ◽

Kostas Vlachos ◽

Konstantinos Blekas

Keyword(s):

Reinforcement Learning ◽

Energy Consumption ◽

Degrees Of Freedom ◽

Optimal Path ◽

Iteration Scheme ◽

Measurement Noise ◽

Target Point ◽

Level Control ◽

Reward Function ◽

Marine Platform

This paper investigates the use of reinforcement learning for the navigation of an over-actuated, i.e. more control inputs than degrees of freedom, marine platform in unknown environment. The proposed approach uses an online least-squared policy iteration scheme for value function approximation in order to estimate optimal policy, in conjunction with a low-level control system that controls the magnitude of the linear velocity, and the orientation of the platform. Primary goal of the proposed scheme is the reduction of the consumed energy. To that end, we propose a variable reward function that depends on the energy consumption of the platform. We evaluate our approach in a complex and realistic simulation environment and report results concerning its performance on estimating optimal navigation policies under different environmental disturbances, and position GPS measurement noise. The proposed framework is compared, in terms of energy consumption, to a baseline approach based on virtual potential fields. The results show that the marine platform successfully discovers the target point following a sub-optimal path, maintaining reduced energy consumption.

Download Full-text

Evaluating a Sewershed Urban Storm Water Model for Variability in Parameter Sensitivity and Resolution Effects

10.31224/osf.io/u5tsz ◽

2020 ◽

Author(s):

Zhaokai Dong ◽

Daniel Bain ◽

Murat Akcakaya ◽

Carla Ng

Keyword(s):

Pipe Flow ◽

Peak Flow ◽

Model Performance ◽

Parameter Sensitivity ◽

Geometric Parameters ◽

Storm Water ◽

Parameter Estimates ◽

Kappa Statistics ◽

Thiessen Polygon ◽

Urban Storm

A high-quality parameter set is essential for reliable stormwater models. Model performance can be improved by optimizing initial parameter estimates. Parameter sensitivity analysis is a robust way to distinguish the influence of parameters on model output and efficiently target the most important parameters to modify. This study evaluates efficient construction of a sewershed model using relatively low-resolution (e.g., 30 meter DEM) data and explores model sensitivity to parameters and regional characteristics using the EPA’s Storm Water Management Model (SWMM). A SWMM model was developed for a sewershed in the City of Pittsburgh, where stormwater management is a critical concern. We assumed uniform or log-normal distributions for parameters and used Monte Carlo simulations to explore and rank the influence of parameters on predicted surface runoff, peak flow, maximum pipe flow and model performance, as measured using the Nash–Sutcliffe efficiency metric. By using the Thiessen polygon approach for sub-catchment delineations, we substantially simplified the parameterization of the areas and hydraulic parameters. Despite this simplification, our approach provided good agreement with monitored pipe flow (Nash–Sutcliffe efficiency: 0.41 – 0.85). Total runoff and peak flow were very sensitive to the model discretization. The size of the polygons (modeled subcatchment areas) and imperviousness had the most influence on both outputs. The imperviousness, infiltration and Manning’s roughness (in the pervious area) contributed strongly to the Nash-Sutcliffe efficiency (70%), as did pipe geometric parameters (92%). Parameter rank sets were compared by using kappa statistics between any two model elements to identify generalities. Within our relatively large (9.7 km^2) sewershed, optimizing parameters for the highly impervious (>50%) areas and larger pipes lower in the network contributed most to improving Nash–Sutcliffe efficiency. The geometric parameters influence the water quantity distribution and flow conveyance, while imperviousness determines the subcatchment subdivision and influences surface water generation. Application of the Thiessen polygon approach can simplify the construction of large-scale urban storm water models, but the model is sensitive to the sewer network configuration and care must be taken in parameterizing areas (polygons) with heterogenous land uses.

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

AbstractIn real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, and (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.

Download Full-text

Integrating Production Planning with Truck-Dispatching Decisions through Reinforcement Learning While Managing Uncertainty

Minerals ◽

10.3390/min11060587 ◽

2021 ◽

Vol 11 (6) ◽

pp. 587

Author(s):

Joao Pedro de Carvalho ◽

Roussos Dimitrakopoulos

Keyword(s):

Reinforcement Learning ◽

Discrete Event ◽

Mining Operations ◽

Fixed Sequence ◽

Q Learning ◽

Reward Function ◽

Copper Gold ◽

Mining Complex ◽

Learning Reinforcement ◽

Operational Plan

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.

Download Full-text

Impacts of urban storm-water drainage channels on a northern Australian mangrove forest

Trees ◽

10.1007/s00468-001-0143-5 ◽

2001 ◽

Vol 16 (2-3) ◽

pp. 195-203 ◽

Cited By ~ 1

Author(s):

Anja Moritz-Zimmermann ◽

Keith A. McGuinness ◽

Manfred Küppers

Keyword(s):

Mangrove Forest ◽

Storm Water ◽

Water Drainage ◽

Drainage Channels ◽

Urban Storm

Download Full-text

Deep Inverse Reinforcement Learning for Reward Function Identification in Bidding Models

IEEE Transactions on Power Systems ◽

10.1109/tpwrs.2021.3076296 ◽

2021 ◽

pp. 1-1

Author(s):

Hongye Guo ◽

Qixin Chen ◽

Qing Xia ◽

Chongqing Kang

Keyword(s):

Reinforcement Learning ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Function Identification ◽

Bidding Models

Download Full-text

Reinforcement Learning Approaches in Social Robotics

Sensors ◽

10.3390/s21041292 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1292

Author(s):

Neziha Akalin ◽

Amy Loutfi

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Social Robotics ◽

Research Field ◽

Social Robots ◽

Learning Approaches ◽

Reward Function ◽

Optimal Behavior ◽

Learning Challenges ◽

Starting Point

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial-and-error with its environment to discover an optimal behavior. Since interaction is a key component in both reinforcement learning and social robotics, it can be a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper is focused particularly on studies that include social physical robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to a survey, we categorize existent reinforcement learning approaches based on the used method and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization includes three major themes: interactive reinforcement learning, intrinsically motivated methods, and task performance-driven methods. The benefits and challenges of reinforcement learning in social robotics, evaluation methods of the papers regarding whether or not they use subjective and algorithmic measures, a discussion in the view of real-world reinforcement learning challenges and proposed solutions, the points that remain to be explored, including the approaches that have thus far received less attention is also given in the paper. Thus, this paper aims to become a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.

Download Full-text