Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

Author(s):  
Arpita Biswas ◽  
Gaurav Aggarwal ◽  
Pradeep Varakantham ◽  
Milind Tambe

In many public health settings, it is important for patients to adhere to health programs, for example by taking medications and attending periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a few months. To avoid such disengagements, it is important to provide timely interventions. Such interventions are often expensive and can be provided to only a small fraction of the beneficiaries. We model this scenario as a restless multi-armed bandit (RMAB) problem, where each beneficiary is assumed to transition from one state to another depending on the intervention. Moreover, since the transition probabilities are unknown a priori, we propose a Whittle index-based Q-learning mechanism and show that it converges to the optimal solution. Our method improves over existing learning-based methods for RMABs on multiple benchmarks from the literature and also on the maternal healthcare dataset.
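The idea of ranking beneficiaries by a learned index can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a discounted tabular Q-learning update per arm, uses the difference Q(s, 1) - Q(s, 0) as a stand-in for the index, and omits exploration. The names `q_update`, `index_estimate`, and `select_arms`, and all numeric values, are hypothetical.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step for a single arm (beneficiary)."""
    best_next = max(q[(next_state, a)] for a in (0, 1))
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

def index_estimate(q, state):
    """Assumed index proxy: learned gain of intervening (1) over not intervening (0)."""
    return q[(state, 1)] - q[(state, 0)]

def select_arms(q_tables, states, budget):
    """Pick the `budget` arms whose current states have the largest index estimates."""
    ranked = sorted(
        range(len(states)),
        key=lambda i: index_estimate(q_tables[i], states[i]),
        reverse=True,
    )
    return sorted(ranked[:budget])

# Three arms, all Q-values start at zero (hypothetical toy setup).
q_tables = [defaultdict(float) for _ in range(3)]
# Arm 2 was intervened on in state 0 and responded with reward 1.
q_update(q_tables[2], state=0, action=1, reward=1.0, next_state=1)
# With a budget of one intervention, the arm with the highest estimate is chosen.
chosen = select_arms(q_tables, states=[0, 0, 0], budget=1)
```

Keeping one Q-table per arm reflects the RMAB assumption that arms evolve independently; only the budget constraint couples them at selection time.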

2016 ◽  
Vol 31 (1) ◽  
pp. 59-76 ◽  
Author(s):  
Yann-Michaël de Hauwere ◽  
Sam Devlin ◽  
Daniel Kudenko ◽  
Ann Nowé

Potential-based reward shaping is a commonly used approach in reinforcement learning to direct exploration based on prior knowledge. In both single-agent and multi-agent settings, this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed-ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or are involved in a different interaction situation. This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse. This means that, unknown to the agents a priori, the interactions between the agents only occur sporadically in certain regions of the state space. During these interactions, agents need to coordinate in order to reach the global optimal solution. We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning), an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before the actual problems occur, allowing them to be solved in a timely manner. We evaluate our approach on a range of gridworld problems, as well as a simulation of air traffic control.
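For readers unfamiliar with the technique, potential-based shaping adds the term gamma * Phi(s') - Phi(s) to the environment reward. The sketch below shows that standard form only; the corridor task, the potential `phi`, and the function names are hypothetical, and the context-aware selection of potentials described in the abstract is not implemented here.

```python
def shaping_term(phi, state, next_state, gamma=0.9):
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi(next_state) - phi(state)

def shaped_reward(reward, phi, state, next_state, gamma=0.9):
    """Environment reward plus the shaping term."""
    return reward + shaping_term(phi, state, next_state, gamma)

# Hypothetical potential for a 1-D corridor whose goal is at position 4:
# states closer to the goal have higher (less negative) potential.
GOAL = 4

def phi(state):
    return -abs(GOAL - state)

moving_closer = shaped_reward(0.0, phi, state=2, next_state=3)  # positive bonus
moving_away = shaped_reward(0.0, phi, state=2, next_state=1)    # negative bonus
```

A context-aware variant could, under the same assumptions, keep one potential function per context and pass the one matching the agent's current subgoal as `phi`.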


2006 ◽  
Vol 15 (05) ◽  
pp. 803-821 ◽  
Author(s):  
PING YAN ◽  
MINGYUE DING ◽  
CHANGWEN ZHENG

In this paper, the route-planning problems of Unmanned Aerial Vehicles (UAVs) in uncertain and adversarial environments are addressed, including not only single-mission route planning in an environment known a priori, but also route replanning in partially known and mission-changeable environments. A mission-adaptable hybrid route-planning algorithm based on a flight roadmap is proposed, which combines existing global and local methods (the Dijkstra algorithm, SAS and D*) into a two-level framework. The environment information and the constraints for the UAV are integrated into the procedure of building the flight roadmap and searching for routes. The route-planning algorithm utilizes domain-specific knowledge and operates in real time with near-optimal solution quality, which is important in uncertain and adversarial environments. Other planners do not provide all of this functionality, namely real-time planning and replanning, near-optimal solution quality, and the ability to model complex 3D constraints.
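As a rough illustration of the global search step, the sketch below runs the Dijkstra algorithm over a small roadmap graph. It is not the hybrid planner from the paper: the roadmap, its node names, and its edge costs are hypothetical, and SAS, D*, and the UAV constraints are left out.

```python
import heapq

def dijkstra(graph, source, target):
    """Return (cost, path) of a cheapest route from source to target."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical flight roadmap: waypoints as nodes, edge weights as route costs.
roadmap = {
    "A": {"B": 2.0, "C": 5.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
    "D": {},
}
cost, route = dijkstra(roadmap, "A", "D")
```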


2021 ◽  
pp. 154-170
Author(s):  
Lachlan J. Gibson ◽  
Peter Jacko ◽  
Yoni Nazarathy

2019 ◽  
Vol 44 (4) ◽  
pp. 251-266 ◽  
Author(s):  
Chunxi Tan ◽  
Ruijian Han ◽  
Rougang Ye ◽  
Kani Chen

Personalized recommendation systems that adapt to each learner’s own learning pace have been widely adopted in the e-learning field. With full utilization of learning behavior data, psychometric assessment models keep track of the learner’s proficiency on knowledge points, and a well-designed recommendation strategy then selects a sequence of actions to maximize the learner’s learning efficiency. This article proposes a novel adaptive recommendation strategy under the framework of reinforcement learning. The proposed strategy is realized by deep Q-learning algorithms, which are among the techniques that contributed to the success of AlphaGo Zero in achieving super-human performance in the game of Go. The proposed algorithm incorporates early stopping to account for the possibility that learners may choose to stop learning. It can properly deal with missing data and can handle more individual-specific features for better recommendations. The recommendation strategy guides individual learners along efficient learning paths that vary from person to person. The authors showcase concrete examples with numeric analysis of substantive learning scenarios to further demonstrate the power of the proposed method.


1997 ◽  
Vol 08 (05) ◽  
pp. 1013-1024 ◽  
Author(s):  
Moshe Sipper ◽  
Marco Tomassini

Cellular programming is a coevolutionary algorithm by which parallel cellular systems evolve to solve computational tasks. The evolving system is a massively parallel, locally interconnected grid of cells, where each cell operates according to a local interaction rule. If this rule is identical for all cells, the system is referred to as uniform; otherwise, it is non-uniform. This paper describes an experiment that addresses the following question: Employing a local coevolutionary process to solve a hard problem, known as density classification, can an optimal uniform solution be found? Since our approach involves the evolution of non-uniform CAs, where cellular rules are initially assigned at random, such convergence to uniformity cannot be expected a priori to emerge easily. The question is of both theoretical and practical interest. As for the latter, one major advantage of local evolutionary processes is their amenability to parallel implementation, using commercially available parallel machines or specialized hardware. Our experiment shows that when such local evolution is applied to the density problem, the optimal solution can be found.
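To make the density classification task concrete: a one-dimensional binary cellular automaton should settle to all 1s if the initial configuration contains more 1s than 0s, and to all 0s otherwise. The sketch below is an illustration only, assuming a uniform CA with periodic boundaries and a hand-written local majority rule; it does not reproduce the evolved rules from the paper, and all function names are hypothetical.

```python
def majority_bit(cells):
    """Correct answer for the task: 1 if more than half of the cells are 1."""
    return 1 if 2 * sum(cells) > len(cells) else 0

def step(cells, rule, radius=1):
    """Apply the same local rule to every cell (uniform CA, periodic boundary)."""
    n = len(cells)
    return [
        rule(tuple(cells[(i + k) % n] for k in range(-radius, radius + 1)))
        for i in range(n)
    ]

def local_majority(neighborhood):
    """Hand-written rule: each cell adopts the majority value of its neighborhood."""
    return 1 if 2 * sum(neighborhood) > len(neighborhood) else 0

def classify(cells, rule, radius=1, max_steps=100):
    """Run the CA until all cells agree; return that value, or None if it never settles."""
    for _ in range(max_steps):
        if len(set(cells)) == 1:
            return cells[0]
        cells = step(cells, rule, radius)
    return None

easy = [1, 1, 1, 0, 1, 1, 0]          # isolated 0s are absorbed in one step
stuck = [1, 1, 1, 0, 0, 0, 0, 1, 1]   # solid blocks are stable under local majority
easy_result = classify(easy, local_majority)
stuck_result = classify(stuck, local_majority)
```

The second configuration shows why the task is hard for purely local rules: the local majority rule freezes into blocks and never reaches a global answer, even though the correct classification is 1.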


Author(s):  
Leonid I. Perlovsky

This paper establishes close relationships between fundamental problems in the philosophical and mathematical theories of mind. It reviews the mathematical concepts of intelligence, including pattern recognition algorithms, neural networks and rule systems. Mathematical difficulties, manifested as the combinatorial complexity of algorithms, are related to the roles of a priori knowledge and adaptive learning, the same issues that have shaped the two-thousand-year-old debate on the origins of the universal concepts of mind. Combining philosophical and mathematical analyses enables tracing current mathematical difficulties to the contradiction between Aristotelian logic and the Aristotelian theory of mind (Forms). Aristotelian logic is shown to be the culprit for the current mathematical difficulties. I will also discuss connections to Gödel’s theorems. The conclusion is that fuzzy logic is a fundamental requirement for combining adaptivity and apriority. Relating the mathematical and the philosophical helps clarify both and helps analyze future research directions of the mathematics of intelligence.


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 6567-6567 ◽  
Author(s):  
R. L. Comis ◽  
D. D. Colaizzi ◽  
J. D. Miller

6567 Background: A web-based survey of attitudes and awareness of cancer survivors (Ca surv.) towards CCT was performed from 3–4, 2005. The survey was developed jointly by the Coalition of Cancer Cooperative Groups and Michigan State University (MSU) and executed by MSU and Knowledge Networks (KN). Methods: Ca surv. were obtained from a panel of 40,000 adults through KN based on a US household probability sample who agree to weekly surveys in exchange for a free MSN box and ISP service. 2,029 panel members reported a cancer diagnosis (dx); 1,788/2,029 (88%) agreed to participate. Results: About 10% of Ca surv. are aware of CCT opportunities at the time of dx. 73% become aware through a physician (ASCO 2006: 6061). Ca surv. were asked to rank the most trusted sources of health care information from a list of 23 categories on a 0 (least) to 10 (most) scale. Physicians scored the highest (8.6), followed by information from the NCI (8.4) and reports from societies of cancer physicians/researchers (8.3). Although not significantly different from each other, all were significantly different from the other 20 sources (p < 0.05). CCT-aware Ca surv. were asked whether the physician discouraged, was neutral or encouraged participation, or made a little, moderate or great deal of effort to educate them and find a CCT. Enrollment (%) was directly related (p < 0.01) to the perceived physician involvement as follows: Encouragement: discouraged (0); neutral (16); encouraged (84); Educate: little (22); moderate (41); great deal (64); Find trial: little (23); moderate (39); great deal (82). Of the 90% of Ca surv. who were not aware of CCT, 65% indicated that they would be somewhat or very receptive to enrollment if they had been made aware of an opportunity. Conclusion: Ca surv. are not CCT averse a priori. The physician is the most trusted, primary source of awareness and influence in decisions concerning CCT. Although there are myriad reported barriers to CCT participation, increased CCT participation hinges upon physician commitment and communication; conversely, a lack thereof may be the greatest barrier to increased CCT participation. No significant financial relationships to disclose.


Author(s):  
Vladimir Mikhailovich Levin ◽  
Ammar Abdulazez Yahya

The Bayesian classifier is a priori the optimal solution for minimizing the total error in problems of statistical pattern recognition. The article suggests using the classifier as a regular tool to increase the reliability of defect recognition in power oil-filled transformers based on the results of the analysis of gases dissolved in oil. The wide application of the Bayesian method for solving tasks of technical diagnostics of electrical equipment is limited by the problem of the multidimensional distribution of random parameters (features) and the nonlinearity of classification. The application of a generalized feature of a defect in the form of a nonlinear function of the transformer state parameters is proposed. This simultaneously reduces the dimension of the initial space of the controlled parameters and significantly improves the stochastic properties of the random distribution of the generalized feature. A special algorithm has been developed to perform statistical calculations and the procedure for recognizing the current technical condition of the transformer using the generated decision rule. The presented research results illustrate the possibility of the practical application of the developed method in the conditions of real operation of power transformers.
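The minimum-error decision rule mentioned above assigns an observation to the class with the largest product of prior probability and class-conditional likelihood. The following sketch illustrates that rule for a single scalar feature. The Gaussian likelihoods, the class names, and every numeric parameter are assumptions made for illustration; they are not taken from the article.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as an assumed likelihood model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_classify(x, classes):
    """Return the class maximizing prior * likelihood (minimum-error rule)."""
    return max(
        classes,
        key=lambda c: classes[c]["prior"]
        * gaussian_pdf(x, classes[c]["mean"], classes[c]["std"]),
    )

# Hypothetical two-class setup for a scalar generalized feature.
classes = {
    "normal": {"prior": 0.9, "mean": 0.0, "std": 1.0},
    "defect": {"prior": 0.1, "mean": 3.0, "std": 1.0},
}

label_low = bayes_classify(0.5, classes)   # close to the "normal" mean
label_mid = bayes_classify(2.0, classes)   # the high prior still favors "normal"
label_high = bayes_classify(3.0, classes)  # likelihood outweighs the low prior
```

Because the prior for the defect class is small in this toy setup, the decision boundary sits closer to the defect mean than to the midpoint between the two means.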


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 145
Author(s):  
Hongdi Liu ◽  
Hongtao Zhang ◽  
Yuan He ◽  
Yong Sun

Modern adaptive radars can switch work modes to perform various missions and simultaneously use pulse parameter agility in each mode to improve survivability, which leads to a multiplicative increase in decision-making complexity and declining performance of existing jamming methods. In this paper, a two-level jamming decision-making framework is developed, based on which a dual Q-learning (DQL) model is proposed to optimize the jamming strategy and a dynamic method for jamming effectiveness evaluation is designed to update the model. Specifically, the jamming procedure is modeled as a finite Markov decision process. On this basis, the high-dimensional jamming action space is decomposed into two low-dimensional subspaces containing the jamming mode and the pulse parameters, respectively, and two specialized, interacting Q-learning models are then built to obtain the optimal solution. Moreover, the jamming effectiveness is evaluated by measuring the distance between indicator vectors to obtain feedback for the DQL model, where the indicators are dynamically weighted to adapt to the environment. The experiments demonstrate the advantage of the proposed method in learning the radar's joint strategy of mode switching and parameter agility: it improves the average jamming-to-signal ratio (JSR) by 4.05% while reducing the convergence time by 34.94% compared with the normal Q-learning method.


2014 ◽  
Vol 26 (4) ◽  
pp. 761-780 ◽  
Author(s):  
Guoqiang Zhong ◽  
Mohamed Cheriet

We present a supervised model for tensor dimensionality reduction, which is called large margin low rank tensor analysis (LMLRTA). In contrast to traditional vector representation-based dimensionality reduction methods, LMLRTA can take any order of tensors as input. And unlike previous tensor dimensionality reduction methods, which can learn only the low-dimensional embeddings with a priori specified dimensionality, LMLRTA can automatically and jointly learn the dimensionality and the low-dimensional representations from data. Moreover, LMLRTA delivers low rank projection matrices, while it encourages data of the same class to be close and of different classes to be separated by a large margin of distance in the low-dimensional tensor space. LMLRTA can be optimized using an iterative fixed-point continuation algorithm, which is guaranteed to converge to a local optimal solution of the optimization problem. We evaluate LMLRTA on an object recognition application, where the data are represented as 2D tensors, and a face recognition application, where the data are represented as 3D tensors. Experimental results show the superiority of LMLRTA over state-of-the-art approaches.

