Two-stage population based training method for deep reinforcement learning

Author(s):  
Yinda Zhou ◽  
Weiming Liu ◽  
Bin Li
2021 ◽  
Vol 6 (2) ◽  
pp. 1950-1957
Author(s):  
Zhe Hu ◽  
Yu Zheng ◽  
Jia Pan

2021 ◽  
Vol 32 (2) ◽  
Author(s):  
Amir Erfan Eshratifar ◽  
David Eigen ◽  
Michael Gormish ◽  
Massoud Pedram

Aerospace ◽  
2021 ◽  
Vol 8 (10) ◽  
pp. 299
Author(s):  
Bin Yang ◽  
Pengxuan Liu ◽  
Jinglang Feng ◽  
Shuang Li

This paper presents a novel and robust two-stage pursuit strategy for incomplete-information impulsive space pursuit-evasion missions that accounts for the J2 perturbation. The strategy first divides the impulsive pursuit-evasion game into a far-distance rendezvous stage and a close-distance game stage according to the perception range of the evader. The far-distance rendezvous stage is transformed into a rendezvous trajectory optimization problem, and a new objective function is proposed to obtain the pursuit trajectory with the optimal terminal pursuit capability. For the close-distance game stage, a closed-loop pursuit approach is proposed that uses a reinforcement learning algorithm, the deep deterministic policy gradient (DDPG) algorithm, to solve and update the pursuit trajectory for the incomplete-information impulsive pursuit-evasion missions. The feasibility of the strategy and its robustness to different initial states of the pursuer and evader and to different evasion strategies are demonstrated in sun-synchronous orbit pursuit-evasion game scenarios. Monte Carlo tests show that the successful pursuit ratio of the proposed method is over 91% for all the given scenarios.
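
The split into two stages by the evader's perception range can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_stage`, the stage labels, the use of a plain Euclidean separation, and the treatment of the boundary case are all assumptions made for illustration only.

```python
import math


def select_stage(pursuer_pos, evader_pos, perception_range):
    """Pick the stage for the current pursuer-evader separation.

    Hypothetical helper: the abstract only says the stages are divided
    according to the evader's perception range.
    """
    separation = math.dist(pursuer_pos, evader_pos)
    # Assumption: the stage boundary is a simple distance threshold, and a
    # separation exactly equal to the range counts as the close stage.
    if separation > perception_range:
        return "far_distance_rendezvous"
    return "close_distance_game"
```

For example, `select_stage((0.0, 0.0, 0.0), (100.0, 0.0, 0.0), 50.0)` selects the far-distance rendezvous stage, because the separation of 100 exceeds the assumed perception range of 50.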


Author(s):  
Graham Kalton ◽  
Ismael Flores Cervantes ◽  
Carlos Arieira ◽  
Mike Kwanisai ◽  
Elizabeth Radin ◽  
...  

The units at the early stages of multi-stage area samples are generally sampled with probabilities proportional to their estimated sizes (PPES). With such a design, an overall equal probability (EP) sample design would yield a constant number of final stage units from each final stage cluster if the measures of size used in the PPES selection at each sampling stage were directly proportional to the number of final stage units. However, there are often sizable relative differences between the measures of size used in the PPES selections and the number of final stage units. Two common approaches for dealing with these differences are: (1) to retain a self-weighting sample design, allowing the sample sizes to vary across the sampled primary sampling units (PSUs), and (2) to retain the fixed sample size in each PSU and to compensate for the unequal selection probabilities by weighting adjustments in the analyses. This article examines these alternative designs in the context of two-stage sampling in which PSUs are sampled with PPES at the first stage, and an equal probability sample of final stage units is selected from each sampled PSU at the second stage. Two-stage sample designs of this type are used for household surveys in many countries. The discussion is illustrated with data from the Population-based HIV Impact Assessment surveys that were conducted using this design in several African countries.
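
The two approaches can be compared with a small worked example. The sketch below is only an illustration under stated assumptions, not code from the article: the function names and the sample numbers are hypothetical, sampling is without certainty selections (every first-stage probability stays below 1), and the self-weighting sample sizes are expected values that a real design would round.

```python
def first_stage_probs(sizes, n_psus):
    """PPES selection probability of each PSU: n_psus * size / total size."""
    total = sum(sizes)
    return [n_psus * s / total for s in sizes]


def fixed_take_design(sizes, actual, n_psus, take):
    """Approach (2): same sample size in every PSU, unequal weights."""
    probs = first_stage_probs(sizes, n_psus)
    # Overall probability = P(PSU selected) * take / actual units in the PSU.
    overall = [p * take / n for p, n in zip(probs, actual)]
    return [take] * len(sizes), [1 / o for o in overall]


def self_weighting_design(sizes, actual, n_psus, take):
    """Approach (1): equal overall probability, varying sample sizes."""
    probs = first_stage_probs(sizes, n_psus)
    f = n_psus * take / sum(sizes)  # target overall sampling fraction
    # A within-PSU rate of f / p gives p * (f / p) = f in every PSU.
    expected_takes = [f / p * n for p, n in zip(probs, actual)]
    return expected_takes, [1 / f] * len(sizes)


# Hypothetical data: estimated sizes used for PPES vs. actual unit counts.
estimated = [100, 200, 300, 400]
actual = [120, 200, 240, 400]
print(fixed_take_design(estimated, actual, n_psus=2, take=20))
print(self_weighting_design(estimated, actual, n_psus=2, take=20))
```

With these hypothetical numbers, the fixed-take design keeps 20 units per PSU but needs weights of about 30, 25, 20 and 25, while the self-weighting design keeps a common weight of about 25 and lets the expected sample sizes vary (about 24, 20, 16 and 20). Where the estimated and actual sizes agree, the two designs coincide.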


2011 ◽  
Vol 5 (5) ◽  
pp. 644-651 ◽  
Author(s):  
T. Jiang ◽  
D. Grace ◽  
Y. Liu
