greedy policy
Recently Published Documents


TOTAL DOCUMENTS

30
(FIVE YEARS 16)

H-INDEX

5
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Seyed Ali Hosseini ◽  
Karim Salahshoor

Systems are continually subjected to faults or malfunctions caused by aging or sudden events, which can degrade operating performance and even lead to operational failure, a critical concern in safety-critical systems. This problem is the main motivation for adopting a Fault-Tolerant strategy to maintain system performance in the presence of faults. An attractive property of Fault-Tolerant Controllers (FTCs) is adaptability to system changes as they evolve throughout system operation. In this paper, a Q-learning algorithm with a greedy policy was used to realize FTC adaptability. Several fault scenarios are then introduced in a Continuous Stirred Tank Heater (CSTH) to compare the closed-loop performance of the developed Q-learning-based FTC against a conventional PID controller and an RL-based FTC. The obtained results show the effectiveness of the Q-learning-based FTC in the different fault scenarios.
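As a rough illustration of the approach described above, the sketch below shows tabular Q-learning with an ε-greedy action selection. The state/action sizes, learning rate, and reward handling are placeholders and are not taken from the paper's CSTH setup.

```python
import numpy as np

# Minimal tabular Q-learning with an epsilon-greedy policy (illustrative only).
# In a fault-tolerant control setting, states would come from discretized process
# measurements and actions from controller adjustments; both are placeholders here.
n_states, n_actions = 50, 5
alpha, gamma, epsilon = 0.1, 0.95, 0.1      # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def choose_action(state):
    """Epsilon-greedy: mostly exploit the current Q estimates, occasionally explore."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def update(state, action, reward, next_state):
    """One-step Q-learning update toward the greedy (max) bootstrap target."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```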


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4245
Author(s):  
Yair Bar David ◽  
Tal Geller ◽  
Ilai Bistritz ◽  
Irad Ben-Gal ◽  
Nicholas Bambos ◽  
...  

Wireless body area networks (WBANs) have strong potential in the field of health monitoring. However, the energy consumption required for accurate monitoring determines the time between battery charges of the wearable sensors, which is a key performance factor (and can be critical in the case of implantable devices). In this paper, we study the inherent trade-off between the power consumption of the sensors and the probability of misclassifying a patient’s health state. We formulate this trade-off as a dynamic problem, in which at each step, we can choose to activate a subset of sensors that provide noisy measurements of the patient’s health state. We assume that the (unknown) health state follows a Markov chain, so our problem is formulated as a partially observable Markov decision problem (POMDP). We show that all the past measurements can be summarized as a belief state on the true health state of the patient, which allows tackling the POMDP problem as an MDP on the belief state. Then, we empirically study the performance of a greedy one-step look-ahead policy compared to the optimal policy obtained by solving the dynamic program. For that purpose, we use an open-source Continuous Glucose Monitoring (CGM) dataset of 232 patients over six months and extract the transition matrix and sensor accuracies from the data. We find that the greedy policy saves ≈50% of the energy costs while reducing the misclassification costs by less than 2% compared to the most accurate policy possible that always activates all sensors. Our sensitivity analysis reveals that the greedy policy remains nearly optimal across different cost parameters and a varying number of sensors. The results also have practical importance, because while the optimal policy is too complicated, a greedy one-step look-ahead policy can be easily implemented in WBAN systems.
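A minimal sketch of the greedy one-step look-ahead idea follows: at each step the policy pushes the belief through the Markov chain and picks the sensor subset minimizing activation energy plus expected misclassification cost. The transition matrix, sensor accuracies, and cost weights below are illustrative assumptions, not the values extracted from the CGM dataset.

```python
import itertools
import numpy as np

# Sketch of a greedy one-step look-ahead policy on the belief state.
n_states = 3                                   # e.g., low / normal / high
P = np.array([[0.85, 0.10, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.10, 0.85]])             # health-state transition matrix (assumed)
accuracies = [0.9, 0.8]                        # per-sensor probability of a correct reading
energy = [1.0, 0.6]                            # per-sensor activation cost
misclass_cost = 10.0                           # cost weight of a wrong classification

def obs_likelihood(acc, obs, state):
    # Symmetric noise model: correct reading with prob. acc, otherwise uniform error.
    return acc if obs == state else (1.0 - acc) / (n_states - 1)

def greedy_one_step(belief):
    """Choose the sensor subset minimizing energy + expected misclassification cost."""
    predicted = belief @ P                     # belief after the Markov transition
    best_subset, best_cost = (), np.inf
    for r in range(len(accuracies) + 1):
        for subset in itertools.combinations(range(len(accuracies)), r):
            cost = sum(energy[i] for i in subset)
            # Average the posterior MAP error over all joint sensor observations.
            for obs in itertools.product(range(n_states), repeat=len(subset)):
                post = predicted.copy()
                for i, o in zip(subset, obs):
                    post = post * np.array([obs_likelihood(accuracies[i], o, s)
                                            for s in range(n_states)])
                p_obs = post.sum()
                if p_obs > 0:
                    cost += p_obs * misclass_cost * (1.0 - (post / p_obs).max())
            if cost < best_cost:
                best_subset, best_cost = subset, cost
    return best_subset, best_cost

print(greedy_one_step(np.array([0.2, 0.6, 0.2])))
```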


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3513
Author(s):  
Min Guo ◽  
Xing Huang ◽  
Wei Wang ◽  
Bing Liang ◽  
Yanbing Yang ◽  
...  

In the Industrial Internet, computing- and power-limited mobile devices (MDs) in the production process can hardly support computation-intensive or time-sensitive applications. As a new computing paradigm, mobile edge computing (MEC) can meet the latency and computation requirements by handling tasks close to the MDs. However, the limited battery capacity of MDs causes unreliable task offloading in MEC, which increases the system overhead and reduces the economic efficiency of manufacturing in actual production. To make the offloading scheme adaptive to this uncertain mobile environment, this paper considers the reliability of MDs, defined as the residual energy after completing a computation task. In more detail, we first investigate task offloading in MEC with reliability as an important criterion. To optimize the system overhead caused by task offloading, we then construct mathematical models for two computing modes, namely local computing and remote computing, and formulate task offloading as a mixed-integer non-linear programming (MINLP) problem. To solve the optimization problem effectively, we further propose a heuristic algorithm based on a greedy policy (HAGP). The algorithm obtains the optimal CPU cycle frequency for local computing and the optimal transmission power for remote computing by alternating optimization (AO), and then uses the greedy policy to make the offloading decision for each MD that minimizes the system overhead across the two modes, under the constraint of limited wireless channels. Finally, multiple simulation experiments verify the advantages of HAGP; the results confirm that accounting for the task-offloading reliability of MDs reduces the system overhead and saves energy, prolonging battery life and supporting more computation tasks.
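A hedged sketch of the greedy offloading decision step is shown below. It assumes each MD's minimal local overhead (after choosing its CPU cycle frequency) and minimal remote overhead (after choosing its transmission power) have already been computed, and greedily offloads the devices with the largest overhead savings until the wireless channels run out; the overhead values are made up for illustration and this is not the paper's exact model.

```python
# Greedy channel assignment under a limited number of wireless channels (sketch).
def greedy_offload(local_overhead, remote_overhead, n_channels):
    """Offload the devices with the largest overhead savings, up to n_channels."""
    savings = [(local_overhead[i] - remote_overhead[i], i)
               for i in range(len(local_overhead))]
    savings.sort(reverse=True)                     # biggest saving first
    offload = set()
    for gain, i in savings:
        if gain > 0 and len(offload) < n_channels:
            offload.add(i)                         # remote computing
    total = sum(remote_overhead[i] if i in offload else local_overhead[i]
                for i in range(len(local_overhead)))
    return offload, total

# Example usage with made-up overheads for five devices and two channels.
print(greedy_offload([4.0, 2.5, 3.2, 1.8, 5.0], [2.0, 2.8, 1.5, 1.0, 3.5], 2))
```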


2021 ◽  
Vol 15 (5) ◽  
pp. 1-23
Author(s):  
Jianxiong Guo ◽  
Weili Wu

The influence maximization problem attempts to find a small subset of nodes that maximizes the expected influence spread, and it has been researched intensively. Previous work assumed that each user in the selected seed set is activated successfully and then spreads the influence. In real scenarios, however, not all users in the seed set are willing to be influencers. Based on that, we associate each user with a probability of being activated as a seed, and we may attempt to activate her multiple times. In this article, we study the adaptive influence maximization with multiple activations (Adaptive-IMMA) problem: in each iteration we select a node and observe whether she accepts to be a seed; if yes, we wait and observe the influence diffusion process; if no, we can attempt to activate her again at a higher cost or select another node as a seed. We model the multiple activations mathematically and define the problem on the integer lattice. We propose a new concept, adaptive dr-submodularity, and show that Adaptive-IMMA is the problem of maximizing an adaptive monotone, dr-submodular function under an expected knapsack constraint. The adaptive dr-submodular maximization problem is not covered by any existing studies, so we summarize its properties and study its approximability comprehensively, a non-trivial generalization of existing analyses of adaptive submodularity. Moreover, to overcome the difficulty of estimating the expected influence spread, we combine our adaptive greedy policy with sampling techniques, reducing the time complexity without losing the approximation ratio. Finally, we conduct experiments on several real datasets to evaluate the effectiveness and efficiency of the proposed policies.
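The following sketch illustrates an adaptive greedy selection loop with multiple activation attempts under a budget. The marginal-gain estimator, acceptance probabilities, and cost function are placeholders; the actual algorithm estimates the influence spread with sampling techniques rather than the stubs used here.

```python
import random

# Adaptive greedy seeding with repeated activation attempts (sketch).
def adaptive_greedy(nodes, accept_prob, estimate_marginal_gain, attempt_cost,
                    budget, rng=random):
    seeds, spent = set(), 0.0
    attempts = {v: 0 for v in nodes}
    while spent < budget:
        # Pick the node with the best estimated gain per attempt cost.
        best, best_ratio = None, 0.0
        for v in nodes:
            if v in seeds:
                continue
            cost = attempt_cost(v, attempts[v])      # re-attempts cost more
            ratio = estimate_marginal_gain(v, seeds) / cost
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None or spent + attempt_cost(best, attempts[best]) > budget:
            break
        spent += attempt_cost(best, attempts[best])
        attempts[best] += 1
        if rng.random() < accept_prob[best]:         # observe whether she accepts
            seeds.add(best)                          # then observe the diffusion
    return seeds

# Toy usage with a stub gain estimator and linearly increasing re-attempt cost.
nodes = ["a", "b", "c"]
accept = {"a": 0.9, "b": 0.5, "c": 0.7}
gain = lambda v, seeds: {"a": 3.0, "b": 5.0, "c": 4.0}[v] / (1 + len(seeds))
cost = lambda v, k: 1.0 + 0.5 * k
print(adaptive_greedy(nodes, accept, gain, cost, budget=3.0))
```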


Author(s):  
Maury Bramson ◽  
Bernardo D’Auria ◽  
Neil Walton

Consider a switched queueing network with general routing among its queues. The MaxWeight policy assigns available service by maximizing the objective function ∑_j Q_j·σ_j among the different feasible service options, where Q_j denotes the queue size and σ_j denotes the amount of service to be executed at queue j. MaxWeight is a greedy policy that does not depend on knowledge of arrival rates and is straightforward to implement. These properties and its simple formulation suggest MaxWeight as a serious candidate for implementation in the setting of switched queueing networks; MaxWeight has been extensively studied in the context of communication networks. However, a fluid model variant of MaxWeight was previously shown not to be maximally stable. Here, we prove that MaxWeight itself is not in general maximally stable. We also prove MaxWeight is maximally stable in a much more restrictive setting, and that a weighted version of MaxWeight, where the weighting depends on the traffic intensity, is always stable.
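A minimal sketch of the MaxWeight rule follows: given current queue sizes and a finite set of feasible service vectors, it selects the vector maximizing ∑_j Q_j·σ_j. The queue sizes and feasible service set below are illustrative only.

```python
import numpy as np

# MaxWeight in its basic form: among the feasible service vectors, pick the one
# maximizing sum_j Q_j * sigma_j. Real switched networks derive the feasible set
# from scheduling constraints (e.g., which queues can be served simultaneously).
def maxweight(queues, service_options):
    scores = [np.dot(queues, sigma) for sigma in service_options]
    return service_options[int(np.argmax(scores))]

queues = np.array([5, 2, 7])                  # current queue sizes Q_j
service_options = [np.array([1, 0, 0]),       # feasible service vectors sigma
                   np.array([0, 1, 1]),
                   np.array([0, 0, 2])]
print(maxweight(queues, service_options))     # -> [0 0 2]  (weight 14)
```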


Author(s):  
Yulin Shao ◽  
Qi Cao ◽  
Soung Chang Liew ◽  
He Chen

2021 ◽  
Vol 1 (1) ◽  
Author(s):  
Xinglin Yu ◽  
Yuhu Wu ◽  
Xi-Ming Sun ◽  
Wenya Zhou

Balancing exploration and exploitation in reinforcement learning is a common dilemma and often a time-consuming task. In this paper, a novel exploration policy for Q-learning, called the Memory-greedy policy, is proposed to accelerate learning. Through memory storage and playback, the probability of selecting random actions can be effectively reduced, which speeds up learning. The principle of this policy is analyzed in a maze scenario, and its theoretical convergence is established via dynamic programming.
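The abstract does not spell out the mechanism. One possible reading, sketched below under that assumption, is an ε-greedy Q-learning agent that stores transitions and replays them to refine its Q-values, so that the exploration rate can be decayed faster; this is an interpretation, not the authors' exact algorithm.

```python
import random
from collections import deque

# Speculative sketch of a "memory-greedy" scheme: epsilon-greedy Q-learning with
# transition storage and playback, plus a decaying exploration rate.
class MemoryGreedyAgent:
    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 epsilon=0.3, epsilon_decay=0.995, replay_size=32):
        self.Q = [[0.0] * n_actions for _ in range(n_states)]
        self.memory = deque(maxlen=10_000)
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.epsilon_decay = epsilon, epsilon_decay
        self.replay_size = replay_size

    def act(self, state):
        # Epsilon-greedy action selection over the current Q estimates.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.Q[state][a])

    def learn(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))
        # Replay stored transitions so Q improves faster than with online updates alone.
        batch = random.sample(list(self.memory),
                              min(self.replay_size, len(self.memory)))
        for s, a, r, s2 in batch:
            target = r + self.gamma * max(self.Q[s2])
            self.Q[s][a] += self.alpha * (target - self.Q[s][a])
        self.epsilon *= self.epsilon_decay   # rely less on random exploration over time
```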

