A Sequential Assignment Match Process with General Renewal Arrival Times

1995 ◽  
Vol 9 (3) ◽  
pp. 475-492 ◽  
Author(s):  
Israel David

This work studies sequential assignment match processes, in which random offers arrive sequentially according to a renewal process; when an offer arrives, it must either be assigned to one of a given set of waiting candidates or rejected. Each candidate, as well as each offer, is characterized by an attribute. If the offer is assigned to a candidate that it matches, a reward R is received; if it is assigned to a candidate that it does not match, a reward r ≤ R is received; and if it is rejected, there is no reward. There is an arbitrary discount function, which corresponds to the process terminating after a random lifetime. Using continuous-time dynamic programming, we show that if this lifetime is decreasing in failure rate and candidates have distinct attributes, then the policy that maximizes total expected discounted reward has a very simple form that is easily determined from the optimal single-candidate policy. If the lifetime is increasing in failure rate, the optimal policy can be determined recursively: a solution algorithm is presented that involves scalar rather than functional equations. The model originated in the study of optimal donor-recipient assignment in live-organ transplants. Some other applications are mentioned as well.
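
For concreteness, the sketch below simulates one run of such a match process. It is a hypothetical illustration rather than the paper's algorithm: Poisson arrivals stand in for the general renewal process, the exponential lifetime gives a constant failure rate, the policy is a naive "assign only on exact match" rule, and R, r, and all rates are made-up values.

```python
import random

# Illustrative parameters (not from the paper): match / mismatch rewards
# with r <= R, an offer arrival rate, and a process-lifetime rate.
R, r = 10.0, 4.0
arrival_rate = 1.0
death_rate = 0.2

def simulate(candidates, attrs=(0, 1, 2, 3, 4), seed=None):
    """candidates: list of attribute labels of the waiting candidates."""
    rng = random.Random(seed)
    t, total = 0.0, 0.0
    lifetime = rng.expovariate(death_rate)   # random termination time
    while candidates:
        t += rng.expovariate(arrival_rate)   # next offer epoch
        if t >= lifetime:                    # process has terminated
            break
        offer = rng.choice(attrs)            # attribute of the new offer
        if offer in candidates:              # exact match: collect R
            candidates.remove(offer)
            total += R
        # under this naive policy a non-matching offer is rejected,
        # forgoing the smaller reward r
    return total

print(simulate([0, 1, 2], seed=42))
```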

1987 ◽  
Vol 1 (2) ◽  
pp. 189-202 ◽  
Author(s):  
Rhonda Righter

Resources are to be allocated sequentially to activities to maximize the total expected return, where the return from an allocation is the product of the value of the resource and the value of the activity. The set of activities and their values are given ahead of time, but the resources arrive according to a Poisson process and their values are independent random variables that are observed upon arrival. It is assumed that either there is a single random deadline for all activities, which is equivalent to discounting the returns, or the activities have independent random deadlines. The model has applications in machine scheduling, packet switching, and kidney allocation for transplantation. It is known that the optimal policy in the discounted case has a very simple form that does not depend on the activity values. We show that this is also true when the deadlines are independent, and in this case the solution can be expressed in terms of solutions to single-activity models. These results also hold when there are batch arrivals of resources. The effect of pooling separate identical single-activity systems into a combined system is investigated for both models. When activities have independent deadlines, it is optimal to reject a resource in the combined system if and only if it is optimal to reject it in the single-activity system. However, when returns are discounted, it is sometimes optimal to accept a resource in the combined system that would be rejected in the single-activity system.
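
The "very simple form" of the policy can be pictured as a set of breakpoints on the resource value that do not depend on the activity values. The sketch below is a hypothetical illustration of that structure only: the breakpoints are supplied as made-up inputs rather than computed from the single-activity models.

```python
def assign_resource(x, breakpoints):
    """Hypothetical threshold rule.  breakpoints b[0] >= b[1] >= ... hold
    one value per remaining activity (activities sorted by decreasing
    value); a resource of value x goes to the i-th most valuable activity
    for the smallest i with x >= b[i].  Returns that index, or None to
    reject the resource."""
    for i, b in enumerate(breakpoints):
        if x >= b:
            return i
    return None

# Illustrative use: three remaining activities, made-up breakpoints.
breakpoints = [7.0, 4.0, 1.5]
print(assign_resource(6.0, breakpoints))   # -> 1: second-best activity
print(assign_resource(0.5, breakpoints))   # -> None: reject
```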


2021 ◽  
Author(s):  
Yunfan Su

Vehicular ad hoc networks (VANETs) are a promising technique that improves traffic safety and transportation efficiency and provides a comfortable driving experience. However, due to the rapid growth of applications that demand channel resources, efficient channel allocation schemes are required to make full use of the network's capacity. In this thesis, two reinforcement learning (RL)-based channel allocation methods are proposed for a cognitive-enabled VANET environment to maximize a long-term average system reward. First, we present a model-based dynamic programming method, which requires calculating the transition probabilities and the time intervals between decision epochs. Once these are obtained, a relative value iteration (RVI) algorithm is used to find the asymptotically optimal policy. Then, we propose a model-free reinforcement learning method, in which an agent interacts with the environment iteratively and learns from the feedback to approximate the optimal policy. Simulation results show that the reinforcement learning method achieves performance similar to that of the dynamic programming method, while both outperform the greedy method.
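
For reference, the sketch below shows a generic relative value iteration loop for a tabular average-reward MDP with transition tensor P[a, s, s'] and expected rewards rewards[a, s]. It is a minimal stand-in assuming a unichain model and uniform decision epochs; the semi-Markov timing of the thesis (random intervals between epochs) is omitted.

```python
import numpy as np

def rvi(P, rewards, ref_state=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration for an average-reward MDP.
    P: (A, S, S) transition tensor; rewards: (A, S) expected rewards."""
    A, S, _ = P.shape
    h = np.zeros(S)                      # relative value function
    v = np.zeros(S)
    for _ in range(max_iter):
        q = rewards + P @ h              # one-step lookahead, shape (A, S)
        v = q.max(axis=0)                # greedy backup over actions
        h_new = v - v[ref_state]         # renormalize at a reference state
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    gain = v[ref_state]                  # long-run average reward estimate
    policy = q.argmax(axis=0)            # asymptotically optimal actions
    return gain, h, policy
```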


2018 ◽  
Vol 11 (4) ◽  
pp. 1177-1190 ◽  
Author(s):  
Pushpendra Semwal

In this paper we investigate the existence and uniqueness of common fixed points for certain contractive-type mappings. As an application, the existence and uniqueness of common solutions for a system of functional equations arising in dynamic programming are discussed using our results.
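
As a loose illustration of the connection, the functional equations of a discounted dynamic program have a contractive Bellman operator, so Banach-style successive approximation converges to their unique solution. The sketch below iterates such an operator on a made-up three-state problem; it is a generic textbook construction, not the mappings studied in the paper.

```python
import numpy as np

def bellman(V, P, r, gamma):
    # (T V)(s) = max_a [ r[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s'] ]
    return (r + gamma * (P @ V)).max(axis=0)

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # 2 actions, 3 states
r = rng.random((2, 3))                       # made-up rewards
gamma = 0.9                                  # contraction modulus

V = np.zeros(3)
for _ in range(1000):
    V_next = bellman(V, P, r, gamma)
    if np.max(np.abs(V_next - V)) < 1e-10:   # (approximate) fixed point
        break
    V = V_next
print(V)   # unique solution of V = T V, by the Banach principle
```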


2011 ◽  
Vol 25 (4) ◽  
pp. 477-485 ◽  
Author(s):  
Rhonda Righter

We extend the classic sequential stochastic assignment problem to include arrivals of workers. When workers are all of the same type, we show that the socially optimal policy is the same as the individually optimal policy for which workers are given priority according to last come–first served. This result also holds under several variants in the model assumptions. When workers have different types, we show that the socially optimal policy is determined by thresholds such that more valuable jobs are given to more valuable workers, but now the individually optimal policy is no longer socially optimal. We also show that the overall value increases when worker or job values become more variable.
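
A hypothetical sketch of the threshold structure in the heterogeneous case: each waiting worker, ranked by value, has a threshold, and an arriving job goes to the most valuable worker whose threshold it clears, so more valuable jobs go to more valuable workers. The thresholds here are made-up inputs; the paper characterizes them, this code merely applies them.

```python
def match_job(job_value, worker_values, thresholds):
    """worker_values: waiting workers' values, sorted descending.
    thresholds: t[0] >= t[1] >= ..., one per waiting worker (made up).
    Returns the index of the worker who takes the job, or None."""
    for i in range(len(worker_values)):
        if job_value >= thresholds[i]:
            return i          # best worker whose threshold is cleared
    return None               # job not valuable enough for any worker

workers = [8.0, 3.0]              # two workers waiting
thresholds = [6.0, 2.0]           # illustrative thresholds
print(match_job(7.0, workers, thresholds))   # -> 0: best worker
print(match_job(5.0, workers, thresholds))   # -> 1: second worker
```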

