Analysis of partially observed recursive tile systems

Introduction: In some phase I trial settings, there is uncertainty in assessing whether a given patient meets the criteria for dose-limiting toxicity. Methods: We present a design that accommodates dose-limiting toxicity outcomes that are assessed with uncertainty for some patients. Our approach could be used in many available phase I trial designs, but we focus on the continual reassessment method because of its popularity. We assume that for some patients, instead of the usual binary dose-limiting toxicity outcome, we observe a physician-assessed, patient-specific probability of dose-limiting toxicity. Data augmentation is used to estimate the posterior probabilities of dose-limiting toxicity at each dose level based on both the fully observed and partially observed patient outcomes. A simulation study is used to assess the performance of the design relative to using the continual reassessment method on the true dose-limiting toxicity outcomes (available only in the simulation setting) and relative to simple thresholding approaches. Results: Among the designs using the partially observed outcomes, our proposed design has the best overall performance in terms of the probability of selecting the correct maximum tolerated dose and the number of patients treated at the maximum tolerated dose. Conclusion: Incorporating uncertainty in dose-limiting toxicity assessment can improve the performance of the continual reassessment method design.
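The abstract does not give the model details, so the following is only a rough illustration of how partially observed outcomes could enter a continual reassessment method posterior. It assumes a one-parameter power model, a normal prior with standard deviation 1.34, and a hypothetical skeleton and target rate, none of which come from the source. Instead of sampling the latent binary outcome as data augmentation would, it marginalises that outcome on a grid, which is a deterministic stand-in for the same idea.

```python
import math


def crm_posterior_tox(skeleton, doses, outcomes, prior_sd=1.34, grid_size=2001):
    """Posterior mean DLT probability per dose (assumed power model).

    outcomes[i] is 0 or 1 for a fully observed patient, or a value strictly
    between 0 and 1 giving the physician-assessed probability of DLT for a
    partially observed patient.
    """
    # Grid over the model parameter a, where p_d(a) = skeleton[d] ** exp(a).
    grid = [-4.0 + 8.0 * k / (grid_size - 1) for k in range(grid_size)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2  # normal prior, up to a constant
        for d, q in zip(doses, outcomes):
            p = skeleton[d] ** math.exp(a)
            # Marginalise the latent binary outcome:
            # q = 1 gives p, q = 0 gives 1 - p, q = 0.5 carries no information.
            log_w += math.log(q * p + (1.0 - q) * (1.0 - p))
        weights.append(math.exp(log_w))
    total = sum(weights)
    return [
        sum(w * skeleton[d] ** math.exp(a) for a, w in zip(grid, weights)) / total
        for d in range(len(skeleton))
    ]


def recommend_dose(post_tox, target=0.25):
    """Index of the dose whose posterior DLT probability is closest to target."""
    return min(range(len(post_tox)), key=lambda d: abs(post_tox[d] - target))


if __name__ == "__main__":
    skeleton = [0.05, 0.12, 0.25, 0.40]   # hypothetical prior guesses
    doses = [0, 1, 2, 2, 2]               # dose index given to each patient
    outcomes = [0, 0, 1, 0, 0.7]          # last patient is partially observed
    post = crm_posterior_tox(skeleton, doses, outcomes)
    print(post, recommend_dose(post))
```

A thresholding approach would instead round the value 0.7 to 1 before fitting, discarding the physician's stated uncertainty.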


Selective network discovery via deep reinforcement learning on embedded spaces

Applied Network Science ◽ 10.1007/s41109-021-00365-8 ◽ 2021 ◽ Vol 6 (1)

Author(s): Peter Morales ◽ Rajmonda Sulo Caceres ◽ Tina Eliassi-Rad

Keyword(s): Reinforcement Learning ◽ Learning Algorithm ◽ Sequential Decision ◽ Network Discovery ◽ Learning Tasks ◽ Partially Observed ◽ Decision Making Problem ◽ Resource Collection ◽ Improved Performance ◽ Discovery Algorithms

Abstract: Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
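To make the selective harvesting task concrete, here is a minimal sketch of a greedy online heuristic on a partially observed graph. It is not NAC and is not taken from the paper: the function name, the scoring rule, and the toy graph are all assumptions chosen for illustration. Each step spends one unit of budget to probe the frontier vertex whose already-probed neighbours most often carry the target attribute.

```python
def selective_harvest(adjacency, has_attribute, seed, budget):
    """Greedy selective harvesting (illustrative baseline, not NAC).

    adjacency: dict mapping vertex -> list of neighbours (assumed undirected).
    has_attribute: dict mapping vertex -> bool, revealed only when probed.
    Returns the probe order and the number of probed vertices with the attribute.
    """
    probed = [seed]
    probed_set = {seed}
    frontier = set(adjacency[seed])  # vertices seen but not yet probed
    reward = int(has_attribute[seed])

    def score(v):
        # Fraction of v's already-probed neighbours that carry the attribute.
        seen = [u for u in adjacency[v] if u in probed_set]
        return sum(has_attribute[u] for u in seen) / len(seen) if seen else 0.0

    for _ in range(budget):
        if not frontier:
            break
        # Sorting makes tie-breaking deterministic (smallest vertex id wins).
        v = max(sorted(frontier), key=score)
        frontier.discard(v)
        probed.append(v)
        probed_set.add(v)
        reward += int(has_attribute[v])
        # Probing v reveals its neighbours, which extend the frontier.
        frontier.update(u for u in adjacency[v] if u not in probed_set)
    return probed, reward


if __name__ == "__main__":
    adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
    has_attribute = {0: True, 1: True, 2: False, 3: True, 4: False}
    print(selective_harvest(adjacency, has_attribute, seed=0, budget=3))
```

A heuristic like this reacts only to what it has already observed; the abstract's argument is that an offline-learned policy and reward model can plan ahead when rewards are sparse or changing.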
