Markov Decision Problem
Recently Published Documents


TOTAL DOCUMENTS: 43 (five years: 6)

H-INDEX: 9 (five years: 0)

Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4245
Author(s):  
Yair Bar David ◽  
Tal Geller ◽  
Ilai Bistritz ◽  
Irad Ben-Gal ◽  
Nicholas Bambos ◽  
...  

Wireless body area networks (WBANs) have strong potential in the field of health monitoring. However, the energy consumption required for accurate monitoring limits the time between battery charges of the wearable sensors, which is a key performance factor (and can be critical in the case of implantable devices). In this paper, we study the inherent trade-off between the power consumption of the sensors and the probability of misclassifying a patient’s health state. We formulate this trade-off as a dynamic problem in which, at each step, we can choose to activate a subset of sensors that provide noisy measurements of the patient’s health state. We assume that the (unknown) health state follows a Markov chain, so our problem is formulated as a partially observable Markov decision problem (POMDP). We show that all the past measurements can be summarized as a belief state on the true health state of the patient, which allows tackling the POMDP as an MDP on the belief state. We then empirically study the performance of a greedy one-step look-ahead policy compared to the optimal policy obtained by solving the dynamic program. For that purpose, we use an open-source Continuous Glucose Monitoring (CGM) dataset of 232 patients over six months and extract the transition matrix and sensor accuracies from the data. We find that the greedy policy saves ≈50% of the energy costs while increasing the misclassification costs by less than 2% compared to the most accurate policy possible, which always activates all sensors. Our sensitivity analysis reveals that the greedy policy remains nearly optimal across different cost parameters and a varying number of sensors. The results also have practical importance: while the optimal policy is too complex to compute and store, a greedy one-step look-ahead policy is easy to implement in WBAN systems.
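
The belief-state machinery and the greedy policy described above are straightforward to prototype. Below is a minimal sketch, assuming a known transition matrix `P` and one confusion matrix per sensor (`confusion[s][i, r]` being the probability that sensor `s` reads `r` when the true state is `i`); the function names, cost structure, and MAP classification rule are illustrative assumptions, not the paper's published code.

```python
import numpy as np
from itertools import chain, combinations, product

def belief_update(belief, readings, confusion, active):
    """Bayes update of the belief given the readings of the active sensors."""
    like = np.ones_like(belief)
    for s, r in zip(active, readings):
        like = like * confusion[s][:, r]           # P(reading r | true state)
    post = belief * like
    return post / post.sum()

def expected_cost(belief, active, confusion, energy, C):
    """Energy cost of the active subset plus expected misclassification cost,
    where C[i, j] is the cost of declaring state j when the true state is i."""
    cost = sum(energy[s] for s in active)
    alphabets = [range(confusion[s].shape[1]) for s in active]
    for readings in product(*alphabets):           # enumerate joint readings
        like = np.ones_like(belief)
        for s, r in zip(active, readings):
            like = like * confusion[s][:, r]
        joint = belief * like                      # P(true state, readings)
        if joint.sum() > 0:
            j = int(np.argmax(joint))              # MAP health-state estimate
            cost += float(joint @ C[:, j])
    return cost

def greedy_policy(belief, P, confusion, energy, C):
    """One-step look-ahead: propagate the belief through the chain, then
    activate the sensor subset minimizing the expected one-step cost."""
    predicted = belief @ P
    n = len(confusion)
    subsets = chain.from_iterable(combinations(range(n), k)
                                  for k in range(n + 1))
    return min(subsets, key=lambda a: expected_cost(predicted, a, confusion,
                                                    energy, C))
```

At each step the device would call `greedy_policy`, activate the chosen sensors, and then call `belief_update` with their readings before the next step.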



2021 ◽  
Author(s):  
Peter Bayer ◽  
Johan Dubbeldam ◽  
Mark Broom

This paper develops and analyzes a Markov chain model for the treatment of cancer. Cancer therapy is modeled as the patient's Markov decision problem, with the objective of maximizing the patient's discounted expected quality of life years. Patients choose the number of treatment rounds they wish to administer based on the progression of the disease as well as their own preferences. We obtain a powerful analytic decision tool by which patients may select their preferred treatment strategy. In a second model, patients may also choose the timing of treatment rounds. By delaying a round of therapy, the patient forgoes its benefits for a time in order to postpone its side effects. We obtain an analytic tool that allows numerical approximation of the optimal delay times.
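
As a sketch of the kind of dynamic program underlying such a decision tool, the snippet below runs value iteration on a toy discounted MDP whose states are disease-progression grades and whose actions are "no treatment" and "administer a round of therapy". The transition matrices, rewards, and discount factor are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def value_iteration(P, r, gamma=0.97, tol=1e-9):
    """Solve a discounted MDP: P[a] is the transition matrix under action a,
    r[a] the per-period quality-of-life reward. Returns values and policy."""
    V = np.zeros(P[0].shape[0])
    while True:
        Q = np.array([r[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy two-state example: state 0 = stable, state 1 = progressed.
P = [np.array([[0.90, 0.10], [0.00, 1.00]]),   # action 0: no treatment
     np.array([[0.97, 0.03], [0.30, 0.70]])]   # action 1: treat (may remit)
r = [np.array([1.00, 0.30]),                   # full quality, no side effects
     np.array([0.80, 0.15])]                   # therapy lowers this period's quality
V, policy = value_iteration(P, r)
print(policy)   # e.g., treat only in the progressed state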



2021 ◽  
Vol 11 (3) ◽  
pp. 995
Author(s):  
Netanel Hasidi ◽  
Meir Kalech

Troubleshooting is the process of diagnosing and repairing a system that is behaving abnormally. It involves performing various diagnostic and repair actions, which may incur costs, and traditional troubleshooting algorithms aim to minimize the costs incurred until the system is fixed. Prognosis deals with predicting future failures. We propose to integrate prognosis and diagnosis techniques to solve troubleshooting problems. This integration enables (1) better fault isolation and (2) more intelligent decisions about which repair actions to employ in order to minimize troubleshooting costs over time. In particular, we consider an anticipatory troubleshooting challenge in which we aim to minimize the costs incurred to fix the system over time while reasoning about both current and future failures. Anticipatory troubleshooting raises two main dilemmas: the fix-replace dilemma and the replace-healthy dilemma. The fix-replace dilemma is the question of how to repair a faulty component: by fixing it or by replacing it with a new one. The replace-healthy dilemma is the question of whether a healthy component should be replaced with a new one in order to prevent it from failing in the future. We propose to solve these dilemmas by modeling them as a Markov decision problem and reasoning about future failures using techniques from the survival analysis literature. The resulting algorithm was evaluated experimentally, showing that the proposed anticipatory troubleshooting algorithms yield lower overall costs than troubleshooting algorithms that do not reason about future faults.
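
One way to make the fix-replace dilemma concrete is to score each action by its immediate cost plus the expected cost of a future failure, using a survival model for component lifetimes. The sketch below uses a Weibull lifetime (a common choice in the survival analysis literature); the cost figures and parameters are illustrative assumptions, not the paper's model.

```python
import numpy as np

def weibull_fail_prob(age, horizon, shape, scale):
    """P(component fails within `horizon`, given it survived to `age`),
    under a Weibull lifetime model."""
    S = lambda t: np.exp(-(t / scale) ** shape)
    return 1.0 - S(age + horizon) / S(age)

def fix_or_replace(age, fix_cost, replace_cost, downtime_cost,
                   horizon, shape, scale):
    """Fix-replace dilemma: a fix keeps the component's age (and hazard),
    while a replacement resets the age to zero."""
    p_fixed = weibull_fail_prob(age, horizon, shape, scale)
    p_new = weibull_fail_prob(0.0, horizon, shape, scale)
    cost_fix = fix_cost + p_fixed * downtime_cost
    cost_replace = replace_cost + p_new * downtime_cost
    return "fix" if cost_fix <= cost_replace else "replace"

# A sufficiently aged component with increasing hazard (shape > 1) favors replacement.
print(fix_or_replace(age=7.0, fix_cost=10.0, replace_cost=40.0,
                     downtime_cost=200.0, horizon=1.0, shape=2.0, scale=8.0))
```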



Author(s):  
Jean Walrand

This chapter is concerned with making successive decisions in the presence of uncertainty. The decisions affect the cost at each step but also the "state" of the system. We start with a simple example: choosing a route with uncertain travel times. We then examine a more general model: controlling a Markov chain. Section 13.1 presents a model of route selection when the travel times are random. Section 13.2 shows one formulation where one plans the trip long in advance. Section 13.3 explains how the problem changes if one is able to adjust the route based on real-time information; that section introduces the main ideas of stochastic dynamic programming. Section 13.4 discusses a generalization of the route-planning problem: a Markov decision problem. Section 13.5 solves the problem when the horizon is infinite.
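
A toy version of the plan-in-advance formulation (Section 13.2) fits in a few lines: the value of a node is the minimum, over outgoing edges, of the expected travel time plus the value of the next node. The graph and travel-time distributions below are made up for illustration.

```python
# Each edge carries a list of equally likely travel times (a crude distribution).
graph = {
    "A": [("B", [2, 4]), ("C", [1, 9])],
    "B": [("D", [3, 3])],
    "C": [("D", [1, 7])],
    "D": [],
}

def expected_time(node, dest, memo=None):
    """Backward recursion: V(dest) = 0, V(n) = min over edges of E[time] + V(next)."""
    memo = {} if memo is None else memo
    if node == dest:
        return 0.0
    if node not in memo:
        memo[node] = min(sum(ts) / len(ts) + expected_time(nxt, dest, memo)
                         for nxt, ts in graph[node])
    return memo[node]

print(expected_time("A", "D"))   # 6.0: route A -> B -> D beats A -> C -> D
```

The adaptive formulation of Section 13.3 would instead redo the minimization at each node, after the travel times already experienced have been observed.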



Author(s):  
Siqi Mu ◽  
Zhangdui Zhong

With the diversity of communication technologies and the heterogeneity of computation resources at the network edge, both the edge cloud and peer devices (collaborators) can be scavenged to provide computation resources for resource-limited Internet-of-Things (IoT) devices. In this paper, a novel cooperative computing paradigm is proposed in which the computation resources of the IoT device, opportunistically idle collaborators, and a dedicated edge cloud are fully exploited. Collaborators provide computation assistance when idle and offloading assistance when busy. Considering channel randomness and the opportunistic computation resource sharing of collaborators, we study the stochastic offloading control for an IoT device, i.e., how much of the computation load is processed locally, offloaded to the edge cloud, and offloaded to a collaborator. The problem is formulated as a finite-horizon Markov decision problem with the objective of minimizing the expected total energy consumption of the IoT device and the collaborator, subject to a hard computation deadline constraint. The optimal offloading policy is derived based on stochastic optimization theory, which demonstrates that the energy consumption can be reduced by a proportional factor through cooperative computing. More energy is saved with better wireless channel conditions or higher computation energy efficiency of collaborators. Simulation results validate the optimality of the proposed policy and the efficiency of cooperative computing between end devices and the edge cloud, compared to several other offloading schemes.
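
To illustrate the flavor of such a finite-horizon offloading control, the sketch below runs backward induction for a device that, in each slot, observes a random channel and splits the remaining task units between local computation and a single offload target (standing in for the collaborator/edge cloud), subject to a hard deadline. All parameters, and the reduction to one offload target, are illustrative assumptions.

```python
import numpy as np

T, L = 4, 6                    # time slots until the deadline, total task units
E_LOCAL = 3.0                  # energy per unit computed locally
CHANNELS = [(0.5, 1.0), (0.5, 4.0)]   # (probability, energy per offloaded unit)
CAP = 2                        # at most CAP units can be offloaded per slot

V = np.full(L + 1, np.inf)     # V[l]: expected energy-to-go with l units left
V[0] = 0.0                     # at the deadline, only zero leftover is feasible
for _ in range(T):             # backward induction over the T slots
    V_new = np.empty(L + 1)
    for l in range(L + 1):
        exp_cost = 0.0
        for p, e_tx in CHANNELS:   # split chosen after observing the channel
            best = min(k * e_tx + m * E_LOCAL + V[l - k - m]
                       for k in range(min(l, CAP) + 1)    # offloaded units
                       for m in range(l - k + 1))         # local units
            exp_cost += p * best
        V_new[l] = exp_cost
    V = V_new
print(V[L])   # minimum expected energy to finish L units within T slots
```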



Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6566
Author(s):  
Essia Hamouda

Overloaded network devices are an increasing problem, especially in resource-limited networks, given the continuous and rapid growth in the number of wireless devices and the huge volume of data they generate. An admission and routing control policy at a network device can be used to balance the goals of maximizing throughput and ensuring sufficient resources for high-priority flows. In this paper we formulate the admission and routing control problem for two types of flows, one with higher priority than the other, as a Markov decision problem. We characterize the optimal admission and routing policy and show that it is a state-dependent threshold-type policy. Furthermore, we conduct extensive numerical experiments to gain more insight into the behavior of the optimal policy under different system parameters. While dynamic programming can be used to solve such problems, the large state space makes it intractable and too resource-intensive to run on wireless devices. Therefore, we propose a fast heuristic that exploits the structure of the optimal policy. We show empirically that the heuristic performs very well, with an average reward deviation of 1.4% from the optimal policy while being orders of magnitude faster. We further generalize the heuristic to the general case of a system with n (n > 2) types of flows.
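
The structural result (a state-dependent threshold policy) is exactly what makes a fast heuristic possible: instead of solving the dynamic program, a device can store a small threshold table. The sketch below shows the shape of such a rule; the thresholds and state encoding are illustrative assumptions, not the paper's heuristic.

```python
def admit(priority, queue_len, n_high, capacity, thresholds):
    """State-dependent threshold rule: high-priority flows are admitted while
    capacity remains; low-priority flows only while occupancy is below a
    threshold that shrinks as the number of high-priority flows grows."""
    if queue_len >= capacity:
        return False
    if priority == "high":
        return True
    cutoff = thresholds[min(n_high, len(thresholds) - 1)]
    return queue_len < cutoff

# Thresholds decreasing in the high-priority load, as the structure suggests.
thresholds = [8, 6, 4, 2, 0]
print(admit("low", queue_len=5, n_high=1, capacity=10, thresholds=thresholds))  # True
print(admit("low", queue_len=5, n_high=3, capacity=10, thresholds=thresholds))  # False
```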



2020 ◽  
Vol 54 (6) ◽  
pp. 1676-1696 ◽  
Author(s):  
John Miller ◽  
Yu (Marco) Nie ◽  
Xiaobo Liu

Online freight exchange (OFEX) platforms serve the purpose of matching demand and supply for freight in real time. This paper studies a truck routing problem that aims to leverage the power of an OFEX platform. The OFEX routing problem is formulated as a Markov decision problem, which we solve by finding the bidding strategy at each possible location and time along the route that maximizes the expected profit. At the core of the OFEX routing problem is a combined pricing and bidding model that simultaneously (1) considers the probability of winning a load at a given bid price and current market competition, (2) anticipates the future profit corresponding to the current decision, and (3) prioritizes the bidding order among possible load options. Results from numerical experiments constructed using real-world data from a Chinese OFEX platform indicate that the proposed routing model could (1) improve a truck’s expected profit substantially, compared with the benchmark solutions built to represent the state of the practice, and (2) enhance the robustness of the overall profitability against the impact of market competition and spatial variations.
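
The heart of such a combined pricing-and-bidding model is the expected profit-to-go of a bid: the win probability times (revenue minus haul cost plus the value of arriving at the destination), plus the lose probability times the value of staying put. The sketch below grid-searches the bid under a logistic win-probability curve; the curve and every number are illustrative assumptions, not calibrated to the paper's data.

```python
import numpy as np

def win_prob(bid, market_price, sensitivity=0.05):
    """Illustrative logistic model: bids below the prevailing market price
    are more likely to win the load."""
    return 1.0 / (1.0 + np.exp(sensitivity * (bid - market_price)))

def best_bid(market_price, haul_cost, v_dest, v_stay, bids):
    """Maximize expected profit-to-go over candidate bids."""
    def expected(b):
        p = win_prob(b, market_price)
        return p * (b - haul_cost + v_dest) + (1.0 - p) * v_stay
    return max(bids, key=expected)

bids = np.linspace(500, 1500, 101)
print(best_bid(market_price=1000.0, haul_cost=600.0,
               v_dest=300.0, v_stay=250.0, bids=bids))
```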


