Markov Decision Problem
Recently Published Documents


TOTAL DOCUMENTS: 43 (five years: 6)

H-INDEX: 9 (five years: 0)

Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4245
Author(s):  
Yair Bar David ◽  
Tal Geller ◽  
Ilai Bistritz ◽  
Irad Ben-Gal ◽  
Nicholas Bambos ◽  
...  

Wireless body area networks (WBANs) have strong potential in the field of health monitoring. However, the energy consumption required for accurate monitoring limits the time between battery charges of the wearable sensors, which is a key performance factor (and can be critical in the case of implantable devices). In this paper, we study the inherent trade-off between the power consumption of the sensors and the probability of misclassifying a patient’s health state. We formulate this trade-off as a dynamic problem in which, at each step, we can choose to activate a subset of sensors that provide noisy measurements of the patient’s health state. We assume that the (unknown) health state follows a Markov chain, so our problem is formulated as a partially observable Markov decision problem (POMDP). We show that all the past measurements can be summarized as a belief state on the true health state of the patient, which allows tackling the POMDP as an MDP on the belief state. We then empirically study the performance of a greedy one-step look-ahead policy compared to the optimal policy obtained by solving the dynamic program. For that purpose, we use an open-source Continuous Glucose Monitoring (CGM) dataset of 232 patients over six months and extract the transition matrix and sensor accuracies from the data. We find that the greedy policy saves ≈50% of the energy costs while increasing the misclassification costs by less than 2% compared to the most accurate policy possible, which always activates all sensors. Our sensitivity analysis reveals that the greedy policy remains nearly optimal across different cost parameters and a varying number of sensors. The results also have practical importance: while the optimal policy is too complex to compute and store, a greedy one-step look-ahead policy is easy to implement in WBAN systems.
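
The belief-state machinery and the greedy policy described above are straightforward to prototype. Below is a minimal sketch, assuming a known transition matrix `P` and one confusion matrix per sensor (`confusion[s][i, r]` being the probability that sensor `s` reads `r` when the true state is `i`); the function names, cost structure, and MAP classification rule are illustrative assumptions, not the paper's published code.

```python
import numpy as np
from itertools import chain, combinations, product

def belief_update(belief, readings, confusion, active):
    """Bayes update of the belief given the readings of the active sensors."""
    like = np.ones_like(belief)
    for s, r in zip(active, readings):
        like = like * confusion[s][:, r]           # P(reading r | true state)
    post = belief * like
    return post / post.sum()

def expected_cost(belief, active, confusion, energy, C):
    """Energy cost of the active subset plus expected misclassification cost,
    where C[i, j] is the cost of declaring state j when the true state is i."""
    cost = sum(energy[s] for s in active)
    alphabets = [range(confusion[s].shape[1]) for s in active]
    for readings in product(*alphabets):           # enumerate joint readings
        like = np.ones_like(belief)
        for s, r in zip(active, readings):
            like = like * confusion[s][:, r]
        joint = belief * like                      # P(true state, readings)
        if joint.sum() > 0:
            j = int(np.argmax(joint))              # MAP health-state estimate
            cost += float(joint @ C[:, j])
    return cost

def greedy_policy(belief, P, confusion, energy, C):
    """One-step look-ahead: propagate the belief through the chain, then
    activate the sensor subset minimizing the expected one-step cost."""
    predicted = belief @ P
    n = len(confusion)
    subsets = chain.from_iterable(combinations(range(n), k)
                                  for k in range(n + 1))
    return min(subsets, key=lambda a: expected_cost(predicted, a, confusion,
                                                    energy, C))
```

At each step the device would call `greedy_policy`, activate the chosen sensors, and then call `belief_update` with their readings before the next step.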



2021 ◽  
Author(s):  
Peter Bayer ◽  
Johan Dubbeldam ◽  
Mark Broom

This paper develops and analyzes a Markov chain model for the treatment of cancer. Cancer therapy is modeled as the patient's Markov decision problem, with the objective of maximizing the patient's discounted expected quality of life years. Patients choose the number of treatment rounds they wish to administer based on the progression of the disease as well as their own preferences. We obtain a powerful analytic decision tool by which patients may select their preferred treatment strategy. In a second model, patients may also choose the timing of treatment rounds. By delaying a round of therapy, the patient forgoes its benefits for a time in order to postpone its side effects. We obtain an analytic tool that allows numerical approximation of the optimal delay times.
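
As a sketch of the kind of dynamic program underlying such a decision tool, the snippet below runs value iteration on a toy discounted MDP whose states are disease-progression grades and whose actions are "no treatment" and "administer a round of therapy". The transition matrices, rewards, and discount factor are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def value_iteration(P, r, gamma=0.97, tol=1e-9):
    """Solve a discounted MDP: P[a] is the transition matrix under action a,
    r[a] the per-period quality-of-life reward. Returns values and policy."""
    V = np.zeros(P[0].shape[0])
    while True:
        Q = np.array([r[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy two-state example: state 0 = stable, state 1 = progressed.
P = [np.array([[0.90, 0.10], [0.00, 1.00]]),   # action 0: no treatment
     np.array([[0.97, 0.03], [0.30, 0.70]])]   # action 1: treat (may remit)
r = [np.array([1.00, 0.30]),                   # full quality, no side effects
     np.array([0.80, 0.15])]                   # therapy lowers this period's quality
V, policy = value_iteration(P, r)
print(policy)   # e.g., treat only in the progressed state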



2021 ◽  
Vol 11 (3) ◽  
pp. 995
Author(s):  
Netanel Hasidi ◽  
Meir Kalech

Troubleshooting is the process of diagnosing and repairing a system that is behaving abnormally. It involves performing various diagnostic and repair actions, which may incur costs, and traditional troubleshooting algorithms aim to minimize the costs incurred until the system is fixed. Prognosis deals with predicting future failures. We propose to integrate prognosis and diagnosis techniques to solve troubleshooting problems. This integration enables (1) better fault isolation and (2) more intelligent decisions about which repair actions to employ in order to minimize troubleshooting costs over time. In particular, we consider an anticipatory troubleshooting challenge in which we aim to minimize the costs incurred to fix the system over time while reasoning about both current and future failures. Anticipatory troubleshooting raises two main dilemmas: the fix-replace dilemma and the replace-healthy dilemma. The fix-replace dilemma is the question of how to repair a faulty component: by fixing it or by replacing it with a new one. The replace-healthy dilemma is the question of whether a healthy component should be replaced with a new one in order to prevent it from failing in the future. We propose to solve these dilemmas by modeling them as a Markov decision problem and reasoning about future failures using techniques from the survival analysis literature. The resulting algorithm was evaluated experimentally, showing that the proposed anticipatory troubleshooting algorithms yield lower overall costs than troubleshooting algorithms that do not reason about future faults.
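
One way to make the fix-replace dilemma concrete is to score each action by its immediate cost plus the expected cost of a future failure, using a survival model for component lifetimes. The sketch below uses a Weibull lifetime (a common choice in the survival analysis literature); the cost figures and parameters are illustrative assumptions, not the paper's model.

```python
import numpy as np

def weibull_fail_prob(age, horizon, shape, scale):
    """P(component fails within `horizon`, given it survived to `age`),
    under a Weibull lifetime model."""
    S = lambda t: np.exp(-(t / scale) ** shape)
    return 1.0 - S(age + horizon) / S(age)

def fix_or_replace(age, fix_cost, replace_cost, downtime_cost,
                   horizon, shape, scale):
    """Fix-replace dilemma: a fix keeps the component's age (and hazard),
    while a replacement resets the age to zero."""
    p_fixed = weibull_fail_prob(age, horizon, shape, scale)
    p_new = weibull_fail_prob(0.0, horizon, shape, scale)
    cost_fix = fix_cost + p_fixed * downtime_cost
    cost_replace = replace_cost + p_new * downtime_cost
    return "fix" if cost_fix <= cost_replace else "replace"

# A sufficiently aged component with increasing hazard (shape > 1) favors replacement.
print(fix_or_replace(age=7.0, fix_cost=10.0, replace_cost=40.0,
                     downtime_cost=200.0, horizon=1.0, shape=2.0, scale=8.0))
```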



Author(s):  
Jean Walrand

This chapter is concerned with making successive decisions in the presence of uncertainty. The decisions affect the cost at each step but also the "state" of the system. We start with a simple example: choosing a route with uncertain travel times. We then examine a more general model: controlling a Markov chain. Section 13.1 presents a model of route selection when the travel times are random. Section 13.2 shows one formulation where one plans the trip long in advance. Section 13.3 explains how the problem changes if one is able to adjust the route based on real-time information; that section introduces the main ideas of stochastic dynamic programming. Section 13.4 discusses a generalization of the route-planning problem: a Markov decision problem. Section 13.5 solves the problem when the horizon is infinite.
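
A toy version of the plan-in-advance formulation (Section 13.2) fits in a few lines: the value of a node is the minimum, over outgoing edges, of the expected travel time plus the value of the next node. The graph and travel-time distributions below are made up for illustration.

```python
# Each edge carries a list of equally likely travel times (a crude distribution).
graph = {
    "A": [("B", [2, 4]), ("C", [1, 9])],
    "B": [("D", [3, 3])],
    "C": [("D", [1, 7])],
    "D": [],
}

def expected_time(node, dest, memo=None):
    """Backward recursion: V(dest) = 0, V(n) = min over edges of E[time] + V(next)."""
    memo = {} if memo is None else memo
    if node == dest:
        return 0.0
    if node not in memo:
        memo[node] = min(sum(ts) / len(ts) + expected_time(nxt, dest, memo)
                         for nxt, ts in graph[node])
    return memo[node]

print(expected_time("A", "D"))   # 6.0: route A -> B -> D beats A -> C -> D
```

The adaptive formulation of Section 13.3 would instead redo the minimization at each node, after the travel times already experienced have been observed.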



Author(s):  
Siqi Mu ◽  
Zhangdui Zhong

With the diversity of communication technologies and the heterogeneity of computation resources at the network edge, both the edge cloud and peer devices (collaborators) can be scavenged to provide computation resources for resource-limited Internet-of-Things (IoT) devices. In this paper, a novel cooperative computing paradigm is proposed in which the computation resources of the IoT device, opportunistically idle collaborators, and a dedicated edge cloud are fully exploited. Collaborators provide computation assistance when idle and offloading assistance when busy. Considering channel randomness and the opportunistic computation resource sharing of collaborators, we study the stochastic offloading control for an IoT device, i.e., how much of the computation load is processed locally, offloaded to the edge cloud, and offloaded to a collaborator. The problem is formulated as a finite-horizon Markov decision problem with the objective of minimizing the expected total energy consumption of the IoT device and the collaborator, subject to a hard computation deadline constraint. The optimal offloading policy is derived based on stochastic optimization theory, which demonstrates that the energy consumption can be reduced by a proportional factor through cooperative computing. More energy is saved with better wireless channel conditions or higher computation energy efficiency of collaborators. Simulation results validate the optimality of the proposed policy and the efficiency of cooperative computing between end devices and the edge cloud, compared to several other offloading schemes.
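
To illustrate the flavor of such a finite-horizon offloading control, the sketch below runs backward induction for a device that, in each slot, observes a random channel and splits the remaining task units between local computation and a single offload target (standing in for the collaborator/edge cloud), subject to a hard deadline. All parameters, and the reduction to one offload target, are illustrative assumptions.

```python
import numpy as np

T, L = 4, 6                    # time slots until the deadline, total task units
E_LOCAL = 3.0                  # energy per unit computed locally
CHANNELS = [(0.5, 1.0), (0.5, 4.0)]   # (probability, energy per offloaded unit)
CAP = 2                        # at most CAP units can be offloaded per slot

V = np.full(L + 1, np.inf)     # V[l]: expected energy-to-go with l units left
V[0] = 0.0                     # at the deadline, only zero leftover is feasible
for _ in range(T):             # backward induction over the T slots
    V_new = np.empty(L + 1)
    for l in range(L + 1):
        exp_cost = 0.0
        for p, e_tx in CHANNELS:   # split chosen after observing the channel
            best = min(k * e_tx + m * E_LOCAL + V[l - k - m]
                       for k in range(min(l, CAP) + 1)    # offloaded units
                       for m in range(l - k + 1))         # local units
            exp_cost += p * best
        V_new[l] = exp_cost
    V = V_new
print(V[L])   # minimum expected energy to finish L units within T slots
```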



Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6566
Author(s):  
Essia Hamouda

Overloaded network devices are an increasing problem, especially in resource-limited networks, given the continuous and rapid growth in the number of wireless devices and the huge volume of data they generate. An admission and routing control policy at a network device can be used to balance the goals of maximizing throughput and ensuring sufficient resources for high-priority flows. In this paper we formulate the admission and routing control problem for two types of flows, one with higher priority than the other, as a Markov decision problem. We characterize the optimal admission and routing policy and show that it is a state-dependent threshold-type policy. Furthermore, we conduct extensive numerical experiments to gain more insight into the behavior of the optimal policy under different system parameters. While dynamic programming can be used to solve such problems, the large state space makes it intractable and too resource-intensive to run on wireless devices. Therefore, we propose a fast heuristic that exploits the structure of the optimal policy. We show empirically that the heuristic performs very well, with an average reward deviation of 1.4% from the optimal policy while being orders of magnitude faster. We further generalize the heuristic to the general case of a system with n (n > 2) types of flows.
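
The structural result (a state-dependent threshold policy) is exactly what makes a fast heuristic possible: instead of solving the dynamic program, a device can store a small threshold table. The sketch below shows the shape of such a rule; the thresholds and state encoding are illustrative assumptions, not the paper's heuristic.

```python
def admit(priority, queue_len, n_high, capacity, thresholds):
    """State-dependent threshold rule: high-priority flows are admitted while
    capacity remains; low-priority flows only while occupancy is below a
    threshold that shrinks as the number of high-priority flows grows."""
    if queue_len >= capacity:
        return False
    if priority == "high":
        return True
    cutoff = thresholds[min(n_high, len(thresholds) - 1)]
    return queue_len < cutoff

# Thresholds decreasing in the high-priority load, as the structure suggests.
thresholds = [8, 6, 4, 2, 0]
print(admit("low", queue_len=5, n_high=1, capacity=10, thresholds=thresholds))  # True
print(admit("low", queue_len=5, n_high=3, capacity=10, thresholds=thresholds))  # False
```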



2020 ◽  
Vol 54 (6) ◽  
pp. 1676-1696 ◽  
Author(s):  
John Miller ◽  
Yu (Marco) Nie ◽  
Xiaobo Liu

Online freight exchange (OFEX) platforms serve the purpose of matching demand and supply for freight in real time. This paper studies a truck routing problem that aims to leverage the power of an OFEX platform. The OFEX routing problem is formulated as a Markov decision problem, which we solve by finding the bidding strategy at each possible location and time along the route that maximizes the expected profit. At the core of the OFEX routing problem is a combined pricing and bidding model that simultaneously (1) considers the probability of winning a load at a given bid price and current market competition, (2) anticipates the future profit corresponding to the current decision, and (3) prioritizes the bidding order among possible load options. Results from numerical experiments constructed using real-world data from a Chinese OFEX platform indicate that the proposed routing model could (1) improve a truck’s expected profit substantially, compared with the benchmark solutions built to represent the state of the practice, and (2) enhance the robustness of the overall profitability against the impact of market competition and spatial variations.
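
The heart of such a combined pricing-and-bidding model is the expected profit-to-go of a bid: the win probability times (revenue minus haul cost plus the value of arriving at the destination), plus the lose probability times the value of staying put. The sketch below grid-searches the bid under a logistic win-probability curve; the curve and every number are illustrative assumptions, not calibrated to the paper's data.

```python
import numpy as np

def win_prob(bid, market_price, sensitivity=0.05):
    """Illustrative logistic model: bids below the prevailing market price
    are more likely to win the load."""
    return 1.0 / (1.0 + np.exp(sensitivity * (bid - market_price)))

def best_bid(market_price, haul_cost, v_dest, v_stay, bids):
    """Maximize expected profit-to-go over candidate bids."""
    def expected(b):
        p = win_prob(b, market_price)
        return p * (b - haul_cost + v_dest) + (1.0 - p) * v_stay
    return max(bids, key=expected)

bids = np.linspace(500, 1500, 101)
print(best_bid(market_price=1000.0, haul_cost=600.0,
               v_dest=300.0, v_stay=250.0, bids=bids))
```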


