Optimality of Multichannel Myopic Sensing in the Presence of Sensing Error for Opportunistic Spectrum Access

2013, Vol 2013, pp. 1-12
Author(s): Xiaofeng Jiang, Hongsheng Xi

This study considers the problem of optimizing the performance of opportunistic spectrum access. A user with limited sensing capacity has opportunistic access to a communication system with multiple channels: in each time slot the user can sense only a subset of the channels and decides whether to access them based on the sensing outcomes, which may be corrupted by sensing errors. A reward is obtained when the user accesses a channel, and the objective is to maximize the expected (discounted or average) reward accrued over an infinite horizon. The problem is formulated as a partially observable Markov decision process. The study establishes the optimality of the simple and robust myopic policy, which maximizes only the immediate reward, and shows that it is optimal in cases of practical interest.
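
To make the policy concrete, here is a minimal sketch of one slot of a myopic sensing rule under the standard two-state (idle/busy) Gilbert-Elliott channel model with imperfect sensing. The transition probabilities p01, p11, the false-alarm/miss-detection rates pf, pm, and the unit-reward convention are illustrative assumptions, not the authors' model parameters or implementation.

```python
import numpy as np

def myopic_step(belief, p01, p11, k, pf, pm, rng):
    """One slot of a myopic sensing policy (illustrative sketch).

    belief : prob. each channel is idle at the start of the slot
    p01    : P(busy -> idle) per slot; p11: P(idle -> idle) per slot
    k      : sensing budget (number of channels sensed per slot)
    pf, pm : false-alarm and miss-detection probabilities
    """
    # Myopic rule: sense the k channels with the highest idle probability,
    # i.e. maximize only the immediate expected reward.
    sensed = np.argsort(belief)[-k:]

    # Simulated true states and noisy sensing outcomes (placeholders for
    # the physical channel and detector).
    idle = rng.random(belief.size) < belief
    obs_idle = np.where(idle, rng.random(belief.size) > pf,
                              rng.random(belief.size) < pm)

    # Unit reward for each sensed channel accessed while truly idle.
    reward = np.sum(obs_idle[sensed] & idle[sensed])

    # Bayesian posterior for sensed channels, accounting for sensing error,
    # followed by one Markov transition to get the prior for the next slot.
    post = belief.copy()
    b = belief[sensed]
    p_obs_idle = b * (1 - pf) + (1 - b) * pm          # P(observe idle)
    post[sensed] = np.where(obs_idle[sensed],
                            b * (1 - pf) / p_obs_idle,
                            b * pf / (1 - p_obs_idle))
    next_belief = p11 * post + p01 * (1 - post)
    return sensed, reward, next_belief
```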

2014, Vol 926-930, pp. 2867-2870
Author(s): Yu Meng Wang, Liang Shen, Xiang Gao, Cheng Long Xu, Xiao Ya Li, et al.

This paper studies distributed multiuser opportunistic spectrum access based on the partially observable Markov decision process (POMDP) framework. Because secondary users (SUs) observe a similar spectrum environment, each user applying its own single-user policy may choose the same channel, which leads to collisions. Building on previous work, we propose a more flexible and adaptive policy named "threshold-deciding". First, the SU selects a channel at random; second, it decides whether to sense that channel by comparing the channel's availability probability with a given threshold. The policy not only reduces collisions among SUs but also reduces the time and energy spent on sensing. Simulation results show a performance improvement of up to 100% over the existing random policy, demonstrating the advantage of the proposed policy.
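
A rough sketch of the two-step "threshold-deciding" idea described above, for a single SU. The availability beliefs and the threshold value are placeholders; this is an illustration of the decision rule, not the paper's implementation.

```python
import random

def threshold_deciding(avail_prob, threshold, rng=random):
    """Sketch of the 'threshold-deciding' rule for one secondary user.

    avail_prob : believed availability probability of each channel
    threshold  : minimum availability required before spending energy on sensing
    Returns the channel index to sense, or None to stay idle this slot.
    """
    # Step 1: pick a channel at random, which reduces collisions among SUs
    # that would otherwise all converge on the same "best" channel.
    channel = rng.randrange(len(avail_prob))

    # Step 2: sense it only if its availability belief clears the threshold;
    # otherwise skip the slot and save sensing time and energy.
    if avail_prob[channel] >= threshold:
        return channel
    return None
```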


Author(s): Karel Horák, Branislav Bošanský, Krishnendu Chatterjee

Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
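
For reference, the indefinite-horizon (Goal-POMDP) objective described above can be written as follows, where G is the target set, c(s, a) > 0 is the per-transition cost, and τ_G is the (random) hitting time of G; the notation here is ours, not the paper's.

```latex
V^{*}(b_0) \;=\; \min_{\pi} \; \mathbb{E}^{\pi}\!\left[\, \sum_{t=0}^{\tau_G - 1} c(s_t, a_t) \;\middle|\; b_0 \right],
\qquad \tau_G \;=\; \inf\{\, t \ge 0 : s_t \in G \,\}.
```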


2001, Vol 15, pp. 351-381
Author(s): J. Baxter, P. L. Bartlett, L. Weaver

In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter, beta, which has a natural interpretation as a bias-variance trade-off; that it requires no knowledge of the underlying state; and that it can be applied to infinite state, control, and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm and with a conjugate-gradient algorithm that uses gradient information to bracket maxima in line searches. Experimental results illustrate both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem and practical aspects of the algorithms on a number of more realistic problems.
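
A compressed sketch of a GPOMDP-style gradient estimator followed by a plain stochastic-gradient-ascent step. The env, policy, and grad_log_pi interfaces are assumed placeholders (a parameterized stochastic policy with a computable score function), and beta is the single bias-variance parameter mentioned above; this is an illustration, not the authors' code.

```python
import numpy as np

def gpomdp_gradient(env, policy, grad_log_pi, theta, beta, horizon, rng):
    """Single-trajectory GPOMDP-style gradient estimate (illustrative sketch).

    beta in [0, 1) trades bias against variance: larger beta gives lower
    bias but higher variance in the average-reward gradient estimate.
    """
    z = np.zeros_like(theta)      # eligibility trace
    grad = np.zeros_like(theta)   # running gradient estimate
    obs = env.reset()
    for t in range(1, horizon + 1):
        action = policy(theta, obs, rng)
        next_obs, reward = env.step(action)
        z = beta * z + grad_log_pi(theta, obs, action)   # discounted score trace
        grad += (reward * z - grad) / t                  # running average of r * z
        obs = next_obs
    return grad

def gradient_ascent(env, policy, grad_log_pi, theta, beta=0.9,
                    step_size=0.01, horizon=10_000, iters=100,
                    rng=np.random.default_rng()):
    """Plain stochastic gradient ascent on the average reward."""
    for _ in range(iters):
        theta = theta + step_size * gpomdp_gradient(
            env, policy, grad_log_pi, theta, beta, horizon, rng)
    return theta
```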


2011, Vol 10 (06), pp. 1175-1197
Author(s): John Goulionis, D. Stengos

This paper treats the infinite-horizon discounted-cost control problem for partially observable Markov decision processes. Sondik studied the class of finitely transient policies and showed that their value functions over an infinite horizon are piecewise linear (p.w.l.) and can be computed exactly by solving a system of linear equations. However, finite transience is a stronger condition than is needed to ensure p.w.l. value functions. We introduce instead the class of periodic policies, whose value functions also turn out to be p.w.l. Moreover, we examine a condition more general than both finite transience and periodicity that still ensures p.w.l. value functions. We apply these ideas to a replacement problem under Markovian deterioration, investigate periodic policies for it, and give numerical examples.
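
The piecewise-linear structure referred to above means the value function is the lower envelope of finitely many linear functions ("alpha-vectors") over the belief simplex; written in our notation for the cost-minimization setting:

```latex
V(b) \;=\; \min_{\alpha \in \Gamma} \; \sum_{s \in S} \alpha(s)\, b(s),
\qquad \Gamma \ \text{a finite set of } \alpha\text{-vectors, one linear function per vector}.
```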

