greedy policy
Recently Published Documents


TOTAL DOCUMENTS

30
(FIVE YEARS 16)

H-INDEX

5
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Seyed Ali Hosseini ◽  
Karim Salahshoor

Systems are continually subjected to faults or malfunctions caused by aging or sudden events, which can degrade operating performance and even lead to operational failure, a critical concern in safety-critical systems. This problem is the main motivation for adopting a Fault-Tolerant strategy to maintain system performance in the presence of faults. An attractive property of Fault-Tolerant Controllers (FTCs) is adaptability to system changes as they evolve throughout system operation. In this paper, a Q-learning algorithm with a greedy policy was used to realize FTC adaptability. Several fault scenarios are then introduced in a Continuous Stirred Tank Heater (CSTH) to compare the closed-loop performance of the developed Q-learning-based FTC against a conventional PID controller and an RL-based FTC. The obtained results show the effectiveness of the Q-learning-based FTC in the different fault scenarios.
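As a rough illustration of the approach described above, the sketch below shows tabular Q-learning with an ε-greedy action selection. The state/action sizes, learning rate, and reward handling are placeholders and are not taken from the paper's CSTH setup.

```python
import numpy as np

# Minimal tabular Q-learning with an epsilon-greedy policy (illustrative only).
# In a fault-tolerant control setting, states would come from discretized process
# measurements and actions from controller adjustments; both are placeholders here.
n_states, n_actions = 50, 5
alpha, gamma, epsilon = 0.1, 0.95, 0.1      # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def choose_action(state):
    """Epsilon-greedy: mostly exploit the current Q estimates, occasionally explore."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    """One-step Q-learning update toward the greedy (max) bootstrap target."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```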


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4245
Author(s):  
Yair Bar David ◽  
Tal Geller ◽  
Ilai Bistritz ◽  
Irad Ben-Gal ◽  
Nicholas Bambos ◽  
...  

Wireless body area networks (WBANs) have strong potential in the field of health monitoring. However, the energy consumption required for accurate monitoring determines the time between battery charges of the wearable sensors, which is a key performance factor (and can be critical in the case of implantable devices). In this paper, we study the inherent trade-off between the power consumption of the sensors and the probability of misclassifying a patient’s health state. We formulate this trade-off as a dynamic problem, in which at each step, we can choose to activate a subset of sensors that provide noisy measurements of the patient’s health state. We assume that the (unknown) health state follows a Markov chain, so our problem is formulated as a partially observable Markov decision problem (POMDP). We show that all the past measurements can be summarized as a belief state on the true health state of the patient, which allows tackling the POMDP problem as an MDP on the belief state. Then, we empirically study the performance of a greedy one-step look-ahead policy compared to the optimal policy obtained by solving the dynamic program. For that purpose, we use an open-source Continuous Glucose Monitoring (CGM) dataset of 232 patients over six months and extract the transition matrix and sensor accuracies from the data. We find that the greedy policy saves ≈50% of the energy costs while reducing the misclassification costs by less than 2% compared to the most accurate policy possible that always activates all sensors. Our sensitivity analysis reveals that the greedy policy remains nearly optimal across different cost parameters and a varying number of sensors. The results also have practical importance, because while the optimal policy is too complicated, a greedy one-step look-ahead policy can be easily implemented in WBAN systems.
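A minimal sketch of the greedy one-step look-ahead idea follows: at each step the policy pushes the belief through the Markov chain and picks the sensor subset minimizing activation energy plus expected misclassification cost. The transition matrix, sensor accuracies, and cost weights below are illustrative assumptions, not the values extracted from the CGM dataset.

```python
import itertools
import numpy as np

# Sketch of a greedy one-step look-ahead policy on the belief state.
n_states = 3                                   # e.g., low / normal / high
P = np.array([[0.85, 0.10, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.10, 0.85]])             # health-state transition matrix (assumed)
accuracies = [0.9, 0.8]                        # per-sensor probability of a correct reading
energy = [1.0, 0.6]                            # per-sensor activation cost
misclass_cost = 10.0                           # cost weight of a wrong classification

def obs_likelihood(acc, obs, state):
    # Symmetric noise model: correct reading with prob. acc, otherwise uniform error.
    return acc if obs == state else (1.0 - acc) / (n_states - 1)

def greedy_one_step(belief):
    """Choose the sensor subset minimizing energy + expected misclassification cost."""
    predicted = belief @ P                     # belief after the Markov transition
    best_subset, best_cost = (), np.inf
    for r in range(len(accuracies) + 1):
        for subset in itertools.combinations(range(len(accuracies)), r):
            cost = sum(energy[i] for i in subset)
            # Average the posterior MAP error over all joint sensor observations.
            for obs in itertools.product(range(n_states), repeat=len(subset)):
                post = predicted.copy()
                for i, o in zip(subset, obs):
                    post = post * np.array([obs_likelihood(accuracies[i], o, s)
                                            for s in range(n_states)])
                p_obs = post.sum()
                if p_obs > 0:
                    cost += p_obs * misclass_cost * (1.0 - (post / p_obs).max())
            if cost < best_cost:
                best_subset, best_cost = subset, cost
    return best_subset, best_cost

print(greedy_one_step(np.array([0.2, 0.6, 0.2])))
```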


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3513
Author(s):  
Min Guo ◽  
Xing Huang ◽  
Wei Wang ◽  
Bing Liang ◽  
Yanbing Yang ◽  
...  

In the Industrial Internet, computing- and power-limited mobile devices (MDs) in the production process can hardly support computation-intensive or time-sensitive applications. As a new computing paradigm, mobile edge computing (MEC) can meet the latency and computation requirements by handling tasks close to the MDs. However, the limited battery capacity of MDs causes unreliable task offloading in MEC, which increases the system overhead and reduces the economic efficiency of manufacturing in actual production. To make the offloading scheme adaptive to this uncertain mobile environment, this paper considers the reliability of MDs, defined as the residual energy after completing a computation task. In more detail, we first investigate task offloading in MEC with reliability as an important criterion. To optimize the system overhead caused by task offloading, we then construct mathematical models for two computing modes, namely local computing and remote computing, and formulate task offloading as a mixed-integer non-linear programming (MINLP) problem. To solve the optimization problem effectively, we further propose a heuristic algorithm based on a greedy policy (HAGP). The algorithm obtains the optimal CPU cycle frequency for local computing and the optimal transmission power for remote computing by alternating optimization (AO), and then uses the greedy policy to make the offloading decision for each MD that minimizes the system overhead across the two modes, under the constraint of limited wireless channels. Finally, multiple simulation experiments verify the advantages of HAGP; the results confirm that accounting for the task-offloading reliability of MDs reduces the system overhead and saves energy, prolonging battery life and supporting more computation tasks.
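A hedged sketch of the greedy offloading decision step is shown below. It assumes each MD's minimal local overhead (after choosing its CPU cycle frequency) and minimal remote overhead (after choosing its transmission power) have already been computed, and greedily offloads the devices with the largest overhead savings until the wireless channels run out; the overhead values are made up for illustration and this is not the paper's exact model.

```python
# Greedy channel assignment under a limited number of wireless channels (sketch).
def greedy_offload(local_overhead, remote_overhead, n_channels):
    """Offload the devices with the largest overhead savings, up to n_channels."""
    savings = [(local_overhead[i] - remote_overhead[i], i)
               for i in range(len(local_overhead))]
    savings.sort(reverse=True)                     # biggest saving first
    offload = set()
    for gain, i in savings:
        if gain > 0 and len(offload) < n_channels:
            offload.add(i)                         # remote computing
    total = sum(remote_overhead[i] if i in offload else local_overhead[i]
                for i in range(len(local_overhead)))
    return offload, total

# Example usage with made-up overheads for five devices and two channels.
print(greedy_offload([4.0, 2.5, 3.2, 1.8, 5.0], [2.0, 2.8, 1.5, 1.0, 3.5], 2))
```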


2021 ◽  
Vol 15 (5) ◽  
pp. 1-23
Author(s):  
Jianxiong Guo ◽  
Weili Wu

The influence maximization problem attempts to find a small subset of nodes that maximizes the expected influence spread, and it has been researched intensively. Previous work assumed that each user in the selected seed set is activated successfully and then spreads the influence. In real scenarios, however, not all users in the seed set are willing to be influencers. Based on that, we associate each user with a probability of being activated as a seed, and we may attempt to activate her multiple times. In this article, we study the adaptive influence maximization with multiple activations (Adaptive-IMMA) problem: in each iteration we select a node and observe whether she accepts to be a seed; if yes, we wait and observe the influence diffusion process; if no, we can attempt to activate her again at a higher cost or select another node as a seed. We model the multiple activations mathematically and define the problem on the integer lattice. We propose a new concept, adaptive dr-submodularity, and show that Adaptive-IMMA is the problem of maximizing an adaptive monotone, dr-submodular function under an expected knapsack constraint. The adaptive dr-submodular maximization problem is not covered by any existing studies, so we summarize its properties and study its approximability comprehensively, a non-trivial generalization of existing analyses of adaptive submodularity. Moreover, to overcome the difficulty of estimating the expected influence spread, we combine our adaptive greedy policy with sampling techniques, reducing the time complexity without losing the approximation ratio. Finally, we conduct experiments on several real datasets to evaluate the effectiveness and efficiency of the proposed policies.
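The following sketch illustrates an adaptive greedy selection loop with multiple activation attempts under a budget. The marginal-gain estimator, acceptance probabilities, and cost function are placeholders; the actual algorithm estimates the influence spread with sampling techniques rather than the stubs used here.

```python
import random

# Adaptive greedy seeding with repeated activation attempts (sketch).
def adaptive_greedy(nodes, accept_prob, estimate_marginal_gain, attempt_cost,
                    budget, rng=random):
    seeds, spent = set(), 0.0
    attempts = {v: 0 for v in nodes}
    while spent < budget:
        # Pick the node with the best estimated gain per attempt cost.
        best, best_ratio = None, 0.0
        for v in nodes:
            if v in seeds:
                continue
            cost = attempt_cost(v, attempts[v])      # re-attempts cost more
            ratio = estimate_marginal_gain(v, seeds) / cost
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None or spent + attempt_cost(best, attempts[best]) > budget:
            break
        spent += attempt_cost(best, attempts[best])
        attempts[best] += 1
        if rng.random() < accept_prob[best]:         # observe whether she accepts
            seeds.add(best)                          # then observe the diffusion
    return seeds

# Toy usage with a stub gain estimator and linearly increasing re-attempt cost.
nodes = ["a", "b", "c"]
accept = {"a": 0.9, "b": 0.5, "c": 0.7}
gain = lambda v, seeds: {"a": 3.0, "b": 5.0, "c": 4.0}[v] / (1 + len(seeds))
cost = lambda v, k: 1.0 + 0.5 * k
print(adaptive_greedy(nodes, accept, gain, cost, budget=3.0))
```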


Author(s):  
Maury Bramson ◽  
Bernardo D’Auria ◽  
Neil Walton

Consider a switched queueing network with general routing among its queues. The MaxWeight policy assigns available service by maximizing the objective function ∑_j Q_j·σ_j among the different feasible service options, where Q_j denotes the queue size and σ_j denotes the amount of service to be executed at queue j. MaxWeight is a greedy policy that does not depend on knowledge of arrival rates and is straightforward to implement. These properties and its simple formulation suggest MaxWeight as a serious candidate for implementation in the setting of switched queueing networks; MaxWeight has been extensively studied in the context of communication networks. However, a fluid model variant of MaxWeight was previously shown not to be maximally stable. Here, we prove that MaxWeight itself is not in general maximally stable. We also prove MaxWeight is maximally stable in a much more restrictive setting, and that a weighted version of MaxWeight, where the weighting depends on the traffic intensity, is always stable.
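A minimal sketch of the MaxWeight rule follows: given current queue sizes and a finite set of feasible service vectors, it selects the vector maximizing ∑_j Q_j·σ_j. The queue sizes and feasible service set below are illustrative only.

```python
import numpy as np

# MaxWeight in its basic form: among the feasible service vectors, pick the one
# maximizing sum_j Q_j * sigma_j. Real switched networks derive the feasible set
# from scheduling constraints (e.g., which queues can be served simultaneously).
def maxweight(queues, service_options):
    scores = [np.dot(queues, sigma) for sigma in service_options]
    return service_options[int(np.argmax(scores))]

queues = np.array([5, 2, 7])                  # current queue sizes Q_j
service_options = [np.array([1, 0, 0]),       # feasible service vectors sigma
                   np.array([0, 1, 1]),
                   np.array([0, 0, 2])]
print(maxweight(queues, service_options))     # -> [0 0 2]  (weight 14)
```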


Author(s):  
Yulin Shao ◽  
Qi Cao ◽  
Soung Chang Liew ◽  
He Chen

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Xinglin Yu ◽  
Yuhu Wu ◽  
Xi-Ming Sun ◽  
Wenya Zhou

Balancing exploration and exploitation in reinforcement learning is a common dilemma and often a time-consuming task. In this paper, a novel exploration policy for Q-learning, called the Memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of selecting random actions can be effectively reduced, which speeds up learning. The principle of this policy is analyzed in a maze scenario, and its theoretical convergence is established via dynamic programming.
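The abstract does not spell out the mechanism. One possible reading, sketched below under that assumption, is an ε-greedy Q-learning agent that stores transitions and replays them to refine its Q-values, so that the exploration rate can be decayed faster; this is an interpretation, not the authors' exact algorithm.

```python
import random
from collections import deque

# Speculative sketch of a "memory-greedy" scheme: epsilon-greedy Q-learning with
# transition storage and playback, plus a decaying exploration rate.
class MemoryGreedyAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 epsilon=0.3, epsilon_decay=0.995, replay_size=32):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.memory = deque(maxlen=10_000)
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.epsilon_decay = epsilon, epsilon_decay
        self.replay_size = replay_size

    def act(self, state):
        # Epsilon-greedy action selection over the current Q estimates.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[state][a])

    def learn(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))
        # Replay stored transitions so Q improves faster than with online updates alone.
        batch = random.sample(list(self.memory),
                              min(self.replay_size, len(self.memory)))
        for s, a, r, s2 in batch:
            target = r + self.gamma * max(self.Q[s2])
            self.Q[s][a] += self.alpha * (target - self.Q[s][a])
        self.epsilon *= self.epsilon_decay   # rely less on random exploration over time
```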

