Throughput Maximization for Full-Duplex Two-Way Relay with Finite Buffers

Author(s):  
Betene Anyugu Francis Lin

Optimal queueing control of multi-hop networks remains a challenging problem, especially in two-way relaying systems, even in the most straightforward scenarios. In this paper, we explore two-way relaying with a full-duplex decode-and-forward relay equipped with two finite buffers. Principally, we propose a novel multi-agent reinforcement learning scheme that maximizes the cumulative network throughput: based on the combination of the buffer states and the lossy links, a decision is generated as to whether the relay transmits, receives, or simultaneously receives and transmits information. Towards this objective, an analytic Markov decision process built on the queue state transitions and the lossy links is proposed to analyze this scheme, and the throughput and queueing delay are derived. Our numerical results reveal exciting insights. First, artificial intelligence based on reinforcement learning is optimal when the buffer length exceeds a certain threshold. Second, we demonstrate that reinforcement learning can boost transmission efficiency and prevent buffer overflow.
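
Below is a minimal sketch (not the authors' multi-agent implementation) of how such a buffer-aware decision policy could be learned with tabular Q-learning: a single agent observes the two buffer occupancies and chooses receive, transmit, or simultaneous receive-and-transmit. The slotted model, BUFFER_SIZE, LOSS_PROB, and the reward (packets delivered) are assumed placeholder values.

```python
# Minimal sketch, assuming a simplified slotted model: tabular Q-learning for a
# full-duplex relay choosing among receive, transmit, or simultaneous
# receive-and-transmit based on the two finite buffer occupancies.
# This is a single-agent simplification, not the paper's multi-agent scheme.
import random

BUFFER_SIZE = 5                      # assumed finite buffer length per direction
LOSS_PROB = 0.2                      # assumed per-link packet-loss probability
ACTIONS = ["receive", "transmit", "both"]
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1    # learning rate, discount, exploration

Q = {}                               # Q[(q1, q2)][action] -> value estimate

def q_values(state):
    return Q.setdefault(state, {a: 0.0 for a in ACTIONS})

def step(state, action):
    """One slot of the simplified two-way relay; reward = packets delivered."""
    q = list(state)
    delivered = 0
    if action in ("receive", "both"):
        for i in (0, 1):             # one candidate arrival per direction
            if q[i] < BUFFER_SIZE and random.random() > LOSS_PROB:
                q[i] += 1
    if action in ("transmit", "both"):
        for i in (0, 1):             # one candidate departure per direction
            if q[i] > 0 and random.random() > LOSS_PROB:
                q[i] -= 1
                delivered += 1
    return (q[0], q[1]), delivered

state = (0, 0)
for _ in range(50_000):
    qs = q_values(state)
    action = random.choice(ACTIONS) if random.random() < EPS else max(qs, key=qs.get)
    next_state, reward = step(state, action)
    best_next = max(q_values(next_state).values())
    qs[action] += ALPHA * (reward + GAMMA * best_next - qs[action])
    state = next_state

# learned decision (receive/transmit/both) per buffer-state pair
print({s: max(a, key=a.get) for s, a in sorted(Q.items())})
```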

2021
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

Abstract: We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping so that the agent acts optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare their sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians from recorded data of their treatment of patients diagnosed with sepsis.
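
The sketch below illustrates, on placeholder data, the general shape of a projected-subgradient step for a linear reward mapping in a contextual setting. The hinge-style loss, the `mu_greedy` stand-in for per-context optimal-policy feature expectations, and all dimensions are assumptions for illustration, not the paper's algorithm or its subgradient oracle.

```python
# Minimal sketch, assuming a linear reward mapping r(s, a; c) = phi(s, a)^T W c and
# toy data: one projected-subgradient step per sampled context, pushing the induced
# policy's feature expectations toward the expert's. Everything here is a stand-in.
import numpy as np

rng = np.random.default_rng(0)
d_context, d_features, n_contexts = 3, 4, 10
W = np.zeros((d_features, d_context))            # reward mapping to be learned

contexts = rng.normal(size=(n_contexts, d_context))
mu_expert = rng.normal(size=(n_contexts, d_features))   # expert feature expectations

def mu_greedy(W, c):
    """Stand-in for the feature expectations of the policy optimal under reward W @ c
    (in practice this would come from solving the context's MDP)."""
    return rng.normal(size=d_features)

step0 = 0.1
for t in range(1, 201):
    i = rng.integers(n_contexts)
    c = contexts[i]
    gap = mu_greedy(W, c) - mu_expert[i]         # mismatch between induced policy and expert
    # hinge-style loss max(0, gap^T (W c)) is convex in W for a fixed gap;
    # when the loss is active, a subgradient is the outer product gap c^T
    if float(gap @ (W @ c)) > 0:
        W -= (step0 / np.sqrt(t)) * np.outer(gap, c)
    W /= max(1.0, np.linalg.norm(W))             # project back onto the unit Frobenius ball

print(W)                                         # learned reward mapping (toy data)
```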


Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

Abstract: A Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely the quantum MDP (qMDP), which can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
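
As a point of reference for those dynamic-programming algorithms, the sketch below shows classical finite-horizon backward induction on an ordinary MDP; the toy transition kernel and reward table are assumed for illustration, and the quantum extension itself is not reproduced here.

```python
# Minimal sketch: classical finite-horizon backward induction (policy evaluation and
# greedy improvement) on an ordinary MDP, the baseline that qMDP algorithms extend.
# P and R below are assumed toy values.
import numpy as np

n_states, n_actions, horizon = 4, 2, 5
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state distribution
R = rng.normal(size=(n_states, n_actions))                        # immediate reward r(s, a)

V = np.zeros(n_states)                           # terminal values
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    Q = R + P @ V                                # Q[s, a] = r(s, a) + E[V(s') | s, a]
    policy[t] = Q.argmax(axis=1)                 # greedy action at stage t
    V = Q.max(axis=1)                            # stage-t optimal values

print(policy)                                    # optimal action per (stage, state)
print(V)                                         # optimal value of each initial state
```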


Author(s):  
Ranran Sun ◽  
Bin Yang ◽  
Siqi Ma ◽  
Yulong Shen ◽  
Xiaohong Jiang
