Reinforcement Learning
Recently Published Documents


(FIVE YEARS 15474)



2021 · Vol 932
Jingran Qiu, Navid Mousavi, Kristian Gustavsson, Chunxiao Xu, Bernhard Mehlig

Marine micro-organisms must cope with complex flow patterns and even turbulence as they navigate the ocean. To survive, they must avoid predation and find efficient energy sources. A major difficulty in analysing possible survival strategies is that the time series of environmental cues in nonlinear flow is complex and depends on the decisions taken by the organism. One way of determining and evaluating optimal strategies is reinforcement learning. In a proof-of-principle study, Colabrese et al. (Phys. Rev. Lett., vol. 118, 2017, 158004) used this method to find out how a micro-swimmer in a vortex flow can navigate towards the surface as quickly as possible, given a fixed swimming speed. The swimmer measured its instantaneous swimming direction and the local flow vorticity in the laboratory frame, and reacted to these cues by swimming either left, right, up or down. However, a motile micro-organism usually measures the local flow rather than global information, and it can only react in relation to the local flow, because in general it cannot access global cues such as up or down in the laboratory frame. Here we analyse optimal strategies with local signals and actions that do not refer to the laboratory frame. We demonstrate that symmetry breaking is required to find such strategies. Using reinforcement learning, we analyse the emerging strategies for different sets of environmental cues that micro-organisms are known to measure.
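The learning setup described above, discrete cues mapped to discrete steering actions with a reward for reaching the surface, can be sketched with tabular Q-learning. Everything below (the state discretization, dynamics, and constants) is a hypothetical toy stand-in, not the environment of Colabrese et al.:

```python
import random

# Toy stand-in: states discretize local cues, actions are local steering
# choices, and reward is given at the "surface" (the top state).
N_STATES, N_ACTIONS = 4, 2
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

def step(state, action):
    """Action 1 moves one cue level up, action 0 one level down."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

random.seed(0)
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(1000):
    s = random.randrange(N_STATES)          # random starting cue level
    for _ in range(20):
        if random.random() < EPS:           # epsilon-greedy exploration
            a = random.randrange(N_ACTIONS)
        else:
            a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # standard Q-learning temporal-difference update
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

After training, `greedy` steers toward the surface (action 1) from every state; in the actual study the state encodes measured vorticity and swimming direction rather than depth, and the point of the paper is precisely which such cue sets make a good strategy learnable.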

2021 · Vol 2021 · pp. 1-7
Chen Wang, Xudong Li, Xiaolin Tao, Kai Ling, Quhui Liu

Navigation technology enables indoor robots to arrive at their destinations safely. In general, the variability of interior environments makes robotic navigation difficult and hurts performance. This paper proposes a transfer navigation algorithm and improves its generalization by leveraging deep reinforcement learning and a self-attention module. To simulate unfurnished indoor environments, we build the virtual indoor navigation (VIN) environment, in which we compare our model against its competitors. In the VIN environment, our method outperforms the other algorithms by adapting to an unseen indoor environment. The code of the proposed model and the virtual indoor navigation environment will be released.
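The abstract does not specify the module's architecture, but the self-attention idea it leverages can be sketched as plain scaled dot-product attention over a set of feature vectors (the learned Q/K/V projections are omitted, and all shapes are hypothetical):

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def self_attention(X):
    """Scaled dot-product self-attention with queries = keys = values = X."""
    d = len(X[0])
    out = []
    for q in X:
        # similarity of this query row to every key row, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        # output row: attention-weighted average of the value rows
        out.append([sum(w[j] * X[j][i] for j in range(len(X))) for i in range(d)])
    return out

# Each output row is a convex combination of the input rows, weighted by
# similarity to the query row.
Y = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

In a navigation policy network, such a module would typically sit between the observation encoder and the action head, letting the agent re-weight spatial features before deciding where to move.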

2021
Shicong Cen, Chen Cheng, Yuxin Chen, Yuting Wei, Yuejie Chi

Preconditioning and Regularization Enable Faster Reinforcement Learning

Natural policy gradient (NPG) methods, in conjunction with entropy regularization to encourage exploration, are among the most popular policy optimization algorithms in contemporary reinforcement learning. Despite their empirical success, the theoretical underpinnings of NPG methods remain limited. In "Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization", Cen, Cheng, Chen, Wei, and Chi develop nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on tabular discounted Markov decision processes. Assuming access to exact policy evaluation, the authors demonstrate that the algorithm converges linearly, at a rate independent of the dimension of the state-action space. Moreover, the algorithm is provably stable with respect to inexactness of policy evaluation. Accommodating a wide range of learning rates, this convergence result highlights the role of preconditioning and regularization in enabling fast convergence.
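Under softmax parameterization, the entropy-regularized NPG update analyzed in the paper takes a multiplicative form: the new policy satisfies pi(a|s) proportional to pi_old(a|s)^(1 - eta*tau) * exp(eta * Q_tau(s, a)), where Q_tau is the soft Q-function. The sketch below runs this update on a made-up two-state, two-action MDP with soft policy evaluation done by fixed-point iteration; all rewards, transitions, and constants are hypothetical:

```python
import math

GAMMA, TAU, ETA = 0.9, 0.1, 0.5            # discount, entropy weight, step size
R = [[1.0, 0.0], [0.0, 0.5]]               # R[s][a]: made-up rewards
P = [[[0.8, 0.2], [0.2, 0.8]],             # P[s][a][s']: made-up transitions
     [[0.5, 0.5], [0.9, 0.1]]]

def soft_eval(pi, sweeps=500):
    """(Approximately) exact soft policy evaluation by fixed-point iteration."""
    V = [0.0, 0.0]
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        for s in range(2):
            for a in range(2):
                Q[s][a] = R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(2))
        # soft value: expected Q plus the entropy bonus of the policy
        V = [sum(pi[s][a] * (Q[s][a] - TAU * math.log(pi[s][a])) for a in range(2))
             for s in range(2)]
    return Q

pi = [[0.5, 0.5], [0.5, 0.5]]              # uniform initial policy
for _ in range(50):                        # entropy-regularized NPG iterations
    Q = soft_eval(pi)
    new_pi = []
    for s in range(2):
        # multiplicative-weights form of the regularized NPG update
        logits = [(1 - ETA * TAU) * math.log(pi[s][a]) + ETA * Q[s][a]
                  for a in range(2)]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        z = sum(w)
        new_pi.append([x / z for x in w])
    pi = new_pi
```

On this toy instance the iterates settle quickly onto a policy that favors the high-reward action in each state, consistent with the linear convergence the paper proves; the dimension-free rate itself is a statement about general tabular MDPs, not something this two-state sketch can demonstrate.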

2022 · Vol 307 · pp. 118224
Zhichen Zeng, Dong Ni, Gang Xiao
