Health Aware Planning Under Uncertainty for Collaborating Heterogeneous Teams of Mobile Agents

We consider the problem of solving hybrid discrete-continuous Markov Decision Processes (MDPs) that are often encountered in computing optimal policies for complex multi-agent missions with both continuous vehicle dynamics and discrete mission-state transition models, in the presence of potential health degradations and failures of individual agents. A comprehensive Health Aware Planning (HAP) framework is proposed that establishes feedback between mission planning and vehicle-level learning-focused adaptive controllers through models of agent health and capabilities that are learned online. The HAP framework accounts for the predicted likelihood of vehicle health degradations, captured through probabilistic state-dependent models that are integrated into the MDP formulation. This proactive ability to anticipate health degradation and plan accordingly enables the HAP approach to consistently outperform planners that change their policies only after failures have occurred (reactive planners). The approach is tested on a large-scale (≈ 10^10 state–action pairs) long-duration (persistent) target tracking scenario using a novel on-trajectory planning algorithm, and demonstrated to sustain higher mission performance by reducing the number of failures and re-assessing Unmanned Aerial Vehicle (UAV) capabilities.

HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty

The International Journal of Robotics Research ◽

10.1177/0278364920937074 ◽

2020 ◽

pp. 027836492093707

Author(s):

Panpan Cai ◽

Yuanfu Luo ◽

David Hsu ◽

Wee Sun Lee

Keyword(s):

Real Time ◽

Computational Cost ◽

Search Tree ◽

Planning Under Uncertainty ◽

State Action ◽

Online Planning ◽

Planning Algorithm ◽

One Step ◽

Robotic Tasks ◽

High Computational Cost

Robust planning under uncertainty is critical for robots in uncertain, dynamic environments, but incurs high computational cost. State-of-the-art online search algorithms, such as DESPOT, have vastly improved the computational efficiency of planning under uncertainty and made it a valuable tool for robotics in practice. This work takes one step further by leveraging both CPU and GPU parallelization in order to achieve real-time online planning performance for complex tasks with large state, action, and observation spaces. Specifically, Hybrid Parallel DESPOT (HyP-DESPOT) is a massively parallel online planning algorithm that integrates CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT tree search by simultaneously traversing multiple independent paths using multi-core CPUs; it performs parallel Monte Carlo simulations at the leaf nodes of the search tree using GPUs. HyP-DESPOT provably converges in finite time under moderate conditions and guarantees near-optimality of the solution. Experimental results show that HyP-DESPOT speeds up online planning by up to a factor of several hundred in several challenging robotic tasks in simulation, compared with the original DESPOT algorithm. It also exhibits real-time performance on a robot vehicle navigating among many pedestrians.

Perseus: Randomized Point-based Value Iteration for POMDPs

Journal of Artificial Intelligence Research ◽

10.1613/jair.1659 ◽

2005 ◽

Vol 24 ◽

pp. 195-220 ◽

Cited By ~ 209

Author(s):

M. T.J. Spaan ◽

N. Vlassis

Keyword(s):

Large Scale ◽

Iteration Algorithm ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Finite Set ◽

Partially Observable ◽

Set Of Points ◽

Action Spaces ◽

Belief Set

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
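
The backup stage described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Tiger problem, the fixed grid of belief points, the discount factor, and all function names (`backup`, `perseus_stage`, `value`) are assumptions chosen for the example. The original method collects beliefs by random exploration rather than on a grid.

```python
import random

GAMMA = 0.95  # assumed discount factor

# Tiger problem, a standard toy POMDP used here only as an assumed example.
# States: 0 = tiger left, 1 = tiger right.
# Actions: 0 = listen, 1 = open left, 2 = open right. Observations: 0 = hear left, 1 = hear right.
S, A, O = 2, 3, 2
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]  # R[a][s]

def T(a, s, s2):
    """Transition probability: listening keeps the state, opening a door resets it uniformly."""
    return (1.0 if s == s2 else 0.0) if a == 0 else 0.5

def Z(a, s2, o):
    """Observation probability: listening is 85% accurate, opening a door is uninformative."""
    return (0.85 if o == s2 else 0.15) if a == 0 else 0.5

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def value(b, V):
    """Value of belief b under the set of alpha vectors V."""
    return max(dot(b, alpha) for alpha in V)

def backup(b, V):
    """Point-based Bellman backup: the alpha vector that is best at belief b."""
    best = None
    for a in range(A):
        g_a = list(R[a])
        for o in range(O):
            # Project each alpha vector back through (a, o) and keep the best one for b.
            cands = [[sum(Z(a, s2, o) * T(a, s, s2) * alpha[s2] for s2 in range(S))
                      for s in range(S)] for alpha in V]
            g = max(cands, key=lambda c: dot(b, c))
            g_a = [x + GAMMA * y for x, y in zip(g_a, g)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best

def perseus_stage(B, V, rng):
    """One backup stage: back up randomly chosen points until every point in B has improved."""
    V_new, todo = [], list(B)
    while todo:
        b = rng.choice(todo)
        alpha = backup(b, V)
        if dot(b, alpha) < value(b, V):
            # The backup did not help at b, so keep the best old vector for b instead.
            alpha = max(V, key=lambda v: dot(b, v))
        V_new.append(alpha)
        # Drop every point whose value is already at least as good as before.
        todo = [p for p in todo if value(p, V_new) < value(p, V)]
    return V_new

if __name__ == "__main__":
    rng = random.Random(0)
    B = [[i / 10, 1 - i / 10] for i in range(11)]           # assumed belief grid
    V = [[min(min(r) for r in R) / (1 - GAMMA)] * S]        # pessimistic initial value
    for _ in range(30):
        V = perseus_stage(B, V, rng)
    print(len(V), "alpha vectors; value at uniform belief:", value([0.5, 0.5], V))
```

Each stage typically returns far fewer vectors than there are belief points, because one backed-up vector often improves many points at once and removes them all from the to-do list.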

DESPOT: Online POMDP Planning with Regularization

Journal of Artificial Intelligence Research ◽

10.1613/jair.5328 ◽

2017 ◽

Vol 58 ◽

pp. 231-266 ◽

Cited By ~ 27

Author(s):

Nan Ye ◽

Adhiraj Somani ◽

David Hsu ◽

Wee Sun Lee

Keyword(s):

Autonomous Driving ◽

Vehicle Control ◽

Planning Under Uncertainty ◽

Driving System ◽

Online Planning ◽

Markov Decision ◽

Planning Algorithm ◽

Regret Bound ◽

Partially Observable ◽

Autonomous Driving System

The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the "curse of dimensionality" and the "curse of history". To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the "execution" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.
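
The regularized objective mentioned in the abstract can be written compactly. The notation below is assumed for illustration and does not appear in this listing: \Pi_D is the set of policies derivable from the DESPOT D, \hat{V}_\pi(b_0) is the average discounted return of policy \pi over the K sampled scenarios \phi_1, ..., \phi_K starting from the initial belief b_0, |\pi| is the policy size, and \lambda \ge 0 is the regularization weight.

```latex
\max_{\pi \in \Pi_{D}} \Bigl\{ \hat{V}_{\pi}(b_0) - \lambda \, \lvert \pi \rvert \Bigr\},
\qquad
\hat{V}_{\pi}(b_0) = \frac{1}{K} \sum_{k=1}^{K} V_{\pi, \phi_k}
```

Here V_{\pi, \phi_k} denotes the discounted return of \pi when executed under scenario \phi_k. A larger \lambda favors smaller policies, which is how the penalty guards against overfitting to the sampled scenarios.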

Placenta-Expanded Stromal Cell Therapy in a Rodent Model of Simulated Weightlessness

Cells ◽

10.3390/cells10040940 ◽

2021 ◽

Vol 10 (4) ◽

pp. 940

Author(s):

Linda Rubinstein ◽

Amber M. Paul ◽

Charles Houseman ◽

Metadel Abegaz ◽

Steffy Tabares Ruiz ◽

...

Keyword(s):

Health Risks ◽

Stromal Cells ◽

Therapeutic Potential ◽

Medical Intervention ◽

Hindlimb Unloading ◽

Simulated Weightlessness ◽

Potential Health ◽

Long Duration ◽

Induced Changes ◽

Splenic Atrophy

Long duration spaceflight poses potential health risks to astronauts during flight and re-adaptation after return to Earth. There is an emerging need for NASA to provide successful and reliable therapeutics for long duration missions when capability for medical intervention will be limited. Clinically relevant, human placenta-derived therapeutic stromal cells (PLX-PAD) are a promising therapeutic alternative. We found that treatment of adult female mice with PLX-PAD near the onset of simulated weightlessness by hindlimb unloading (HU, 30 d) was well-tolerated and partially mitigated decrements caused by HU. Specifically, PLX-PAD treatment rescued HU-induced thymic atrophy, and mitigated HU-induced changes in percentages of circulating neutrophils, but did not rescue changes in the percentages of lymphocytes, monocytes, natural killer (NK) cells, T-cells and splenic atrophy. Further, PLX-PAD partially mitigated HU effects on the expression of select cytokines in the hippocampus. In contrast, PLX-PAD failed to protect bone and muscle from HU-induced effects, suggesting that the mechanisms which regulate the structure of these mechanosensitive tissues in response to disuse are discrete from those that regulate the immune- and central nervous system (CNS). These findings support the therapeutic potential of placenta-derived stromal cells for select physiological deficits during simulated spaceflight. Multiple countermeasures are likely needed for comprehensive protection from the deleterious effects of prolonged spaceflight.

Temporal concatenation for Markov decision processes

Probability in the Engineering and Informational Sciences ◽

10.1017/s0269964821000206 ◽

2021 ◽

pp. 1-28

Author(s):

Ruiyang Song ◽

Kuang Xu

Keyword(s):

Markov Decision Processes ◽

Large Scale ◽

Optimal Solution ◽

Upper Bounds ◽

Black Box ◽

Decision Processes ◽

Optimal Solutions ◽

Wide Range ◽

Markov Decision ◽

Speed Up

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
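
The heuristic can be illustrated on a toy problem. The sketch below is an assumption-laden example, not the paper's code: the three-state MDP, the two-way split, the zero terminal value for each sub-problem, and the function names (`backward_induction`, `temporal_concatenation`, `evaluate`) are all hypothetical. Backward induction stands in for the "black box" solver.

```python
# Toy MDP (assumed): action 0 = "harvest" (stay put), action 1 = "advance" (move one state right).
# P[a][s][t] is the probability of moving from s to t under action a; R[a][s] is the reward.
P = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # harvest: stay in place
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # advance: 0 -> 1 -> 2, state 2 absorbing
]
R = [
    [1.0, 1.0, 5.0],  # harvest pays a little everywhere, a lot in state 2
    [0.0, 0.0, 5.0],  # advancing costs the immediate reward in states 0 and 1
]

def backward_induction(P, R, horizon):
    """Exact finite-horizon solver with zero terminal value.

    Returns (policy, values): policy[t][s] is the optimal action at step t in state s,
    and values[s] is the optimal total reward from state s at step 0.
    """
    n_actions, n_states = len(R), len(R[0])
    v = [0.0] * n_states
    policy = []
    for _ in range(horizon):
        q = [[R[a][s] + sum(P[a][s][t] * v[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        step = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
        v = [q[s][step[s]] for s in range(n_states)]
        policy.append(step)
    policy.reverse()  # steps were computed last-to-first
    return policy, v

def temporal_concatenation(P, R, horizon, split):
    """Solve [0, split) and [split, horizon) independently and glue the two policies together."""
    first, _ = backward_induction(P, R, split)
    second, _ = backward_induction(P, R, horizon - split)
    return first + second

def evaluate(P, R, policy):
    """Total expected reward of a time-dependent policy, for each start state."""
    n_states = len(R[0])
    v = [0.0] * n_states
    for step in reversed(policy):
        v = [R[step[s]][s] + sum(P[step[s]][s][t] * v[t] for t in range(n_states))
             for s in range(n_states)]
    return v

if __name__ == "__main__":
    _, optimal = backward_induction(P, R, 4)
    glued = evaluate(P, R, temporal_concatenation(P, R, 4, 2))
    print("optimal:     ", optimal)  # [10.0, 15.0, 20.0]
    print("concatenated:", glued)    # [4.0, 15.0, 20.0]
```

From state 0 the full-horizon solution advances twice and then collects the large reward, while each two-step sub-problem is too short to justify advancing, so the concatenated policy harvests throughout. The gap of 6 at state 0 is the kind of regret the abstract refers to; each sub-problem is cheaper to solve than the full horizon.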

The non-linear effects of the Fed asset purchases

Studies in Nonlinear Dynamics & Econometrics ◽

10.1515/snde-2020-0022 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Alessio Anzuini

Keyword(s):

Financial Markets ◽

Large Scale ◽

Balance Sheet ◽

Monetary Economics ◽

American Economic Review ◽

Vector Autoregressive ◽

Portfolio Balance ◽

State Dependent ◽

Linear Effects ◽

Asset Purchases

The Federal Reserve responded to the great financial crisis by deploying new monetary policy tools, the most notable of which was the expansion of its balance sheet. In a recent paper, Weale, M., and T. Wieladek. 2016. “What Are the Macroeconomic Effects of Asset Purchases?” Journal of Monetary Economics 79 (C): 81–93 show that the asset purchases were effective in stimulating economic activity as well as inflation and asset prices. Here I show that their results are state dependent: large-scale asset purchases are effective only when financial markets are impaired. Financial markets are under stress when the effective risk-bearing capacity of the financial sector is drastically reduced, i.e. when the excess bond premium (EBP) of Gilchrist, S., and E. Zakrajšek. 2012. “Credit Spreads and Business Cycle Fluctuations.” The American Economic Review 102 (4): 1692–1720 exceeds a certain threshold. Using an estimated threshold vector autoregressive model conditional on the EBP regime, I show that an increase in the balance sheet has expansionary effects on GDP and inflation when EBP is high, but not when it is low (as its effects become mostly insignificant). I argue that a high EBP can be interpreted as a proxy for market dysfunction, so that unconventional policy is particularly effective only when this transmission channel is active. This suggests that models of the transmission of unconventional policies based on asset purchases should focus on the market functioning channel as well as the portfolio balance one.

The Two-Party System in British Politics

American Political Science Review ◽

10.2307/1952027 ◽

1953 ◽

Vol 47 (2) ◽

pp. 337-358 ◽

Cited By ~ 6

Author(s):

Leslie Lipson

Keyword(s):

Economic Policy ◽

Party System ◽

Large Scale ◽

British Politics ◽

State Planning ◽

Party Government ◽

Contemporary State ◽

Long Duration ◽

Effective System ◽

Laissez Faire

Britain may fairly be called the classic home of two-party government. This claim is justifiable because of some characteristics for which the system, as employed in Britain, is distinctive. Chief among these is its long duration. Although there is room for disagreement among historians about the time and circumstances of its birth, it would be difficult to deny that two-party government was established earlier, has lasted longer, and at the present time is probably more firmly rooted there than in any contemporary state. Indeed, the practice of simplifying the complexities of politics into a contest for office between a pair of major claimants has endured in Britain through a catalogue of changes which would assuredly have wrecked a less effective system. In that country it has survived the evolution from an oligarchy of aristocrats to a democracy of the whole people; the transfer of power from monarchy to parliament and then from parliament to cabinet; the rise of large-scale industry with its social aftermath; the switch in economic policy from mercantilism to laissez faire and from this to state planning; and withal, the expansion and subsequent shrinkage of Britain's international might.

Implementing post-trial access plans for HIV prevention research

Journal of Medical Ethics ◽

10.1136/medethics-2017-104637 ◽

2018 ◽

Vol 44 (5) ◽

pp. 354-358 ◽

Cited By ~ 5

Author(s):

Amy Paul ◽

Maria W Merritt ◽

Jeremy Sugarman

Keyword(s):

Hiv Prevention ◽

Large Scale ◽

Healthcare Systems ◽

System Level ◽

Planning Under Uncertainty ◽

Work Related ◽

Prevention Trials ◽

Practical Experiences ◽

Post Trial ◽

Local Healthcare

Ethics guidance increasingly recognises that researchers and sponsors have obligations to consider provisions for post-trial access (PTA) to interventions that are found to be beneficial in research. Yet, there is little information regarding whether and how such plans can actually be implemented. Understanding practical experiences of developing and implementing these plans is critical to both optimising their implementation and informing conceptual work related to PTA. This viewpoint is informed by experiences with developing and implementing PTA plans for six large-scale multicentre HIV prevention trials supported by the HIV Prevention Trials Network. These experiences suggest that planning and implementing PTA often involve challenges of planning under uncertainty and confronting practical barriers to accessing healthcare systems. Even in relatively favourable circumstances where a tested intervention medication is approved and available in the local healthcare system, system-level barriers can threaten the viability of PTA plans. The aggregate experience across these HIV prevention trials suggests that simply referring participants to local healthcare systems for PTA will not necessarily result in continued access to beneficial interventions for trial participants. Serious commitments to PTA will require additional efforts to learn from future approaches, measuring the success of PTA plans with dedicated follow-up and further developing normative guidance to help research stakeholders navigate the complex practical challenges of realising PTA.

Dynamic Pricing and Routing for Same-Day Delivery

Transportation Science ◽

10.1287/trsc.2019.0958 ◽

2020 ◽

Vol 54 (4) ◽

pp. 1016-1033 ◽

Cited By ~ 3

Author(s):

Marlin W. Ulmer

Keyword(s):

Dynamic Pricing ◽

Function Approximation ◽

Computational Study ◽

Routing Problem ◽

Value Function Approximation ◽

Routing Policy ◽

State Dependent ◽

Markov Decision ◽

Fixed Prices ◽

Number Of Customers

An increasing number of e-commerce retailers offer same-day delivery. To deliver the ordered goods, providers dynamically dispatch a fleet of vehicles transporting the goods from the warehouse to the customers. In many cases, retailers offer different delivery deadline options, from four-hour delivery up to next-hour delivery. Due to the deadlines, vehicles often only deliver a few orders per trip. The overall number of served orders within the delivery horizon is small and the revenue low. As a result, many companies currently struggle to conduct same-day delivery cost-efficiently. In this paper, we show how dynamic pricing is able to substantially increase both revenue and the number of customers we are able to serve the same day. To this end, we present an anticipatory pricing and routing policy (APRP) method that incentivizes customers to select delivery deadline options that are efficient for the fleet to fulfill. This maintains the fleet’s flexibility to serve more future orders. We model the respective pricing and routing problem as a Markov decision process (MDP). To apply APRP, the state-dependent opportunity costs per customer and option are required. To this end, we use a guided offline value function approximation (VFA) based on state space aggregation. The VFA approximates the opportunity cost for every state and delivery option with respect to the fleet’s flexibility. As an offline method, APRP is able to determine suitable prices instantly when a customer orders. In an extensive computational study, we compare APRP with a policy based on fixed prices and with conventional temporal and geographical pricing policies. APRP outperforms the benchmark policies significantly, leading to both a higher revenue and more customers served the same day.