A Reinforcement Learning Algorithm for Data Collection in UAV-aided IoT Networks with Uncertain Time Windows

Cihan Tugrul Cicek
2017 ◽ Vol 47 (6) ◽ pp. 1367-1379
Zhen Zhang ◽ Dongbin Zhao ◽ Junwei Gao ◽ Dongqing Wang ◽ Yujie Dai

Science ◽ 2018 ◽ Vol 362 (6419) ◽ pp. 1140-1144
David Silver ◽ Thomas Hubert ◽ Julian Schrittwieser ◽ Ioannis Antonoglou ◽ Matthew Lai

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Jun Gao ◽ Wei Bi ◽ Xiaojiang Liu ◽ Junhui Li ◽ Shuming Shi

Neural generative models have become popular and achieved promising performance on short-text conversation tasks. They are generally trained to build a 1-to-1 mapping from the input post to its output response. However, a given post is often associated with multiple replies simultaneously in real applications. Previous research on this task mainly focuses on improving the relevance and informativeness of the top-ranked generated response for each post. Very few works study generating multiple accurate and diverse responses for the same post. In this paper, we propose a novel response generation model, which considers a set of responses jointly and generates multiple diverse responses simultaneously. A reinforcement learning algorithm is designed to solve our model. Experiments on two short-text conversation tasks validate that the multiple responses generated by our model obtain higher quality and greater diversity compared with various state-of-the-art generative models.

IEEE Access ◽ 2018 ◽ Vol 6 ◽ pp. 70223-70235
Zhen Zhang ◽ Dongqing Wang ◽ Dongbin Zhao ◽ Qiaoni Han ◽ Tingting Song
