Two-stage population based training method for deep reinforcement learning

This paper presents a novel and robust two-stage pursuit strategy for incomplete-information impulsive space pursuit-evasion missions that accounts for the J2 perturbation. The strategy first divides the impulsive pursuit-evasion game into a far-distance rendezvous stage and a close-distance game stage, with the boundary between them set by the evader's perception range. The far-distance rendezvous stage is cast as a rendezvous trajectory optimization problem, and a new objective function is proposed to obtain the pursuit trajectory with the optimal terminal pursuit capability. For the close-distance game stage, a closed-loop pursuit approach is proposed that uses a reinforcement learning algorithm, the deep deterministic policy gradient (DDPG) algorithm, to solve and update the pursuit trajectory under incomplete information. The feasibility of the strategy, and its robustness to different initial states of the pursuer and evader and to different evasion strategies, are demonstrated in sun-synchronous orbit pursuit-evasion game scenarios. Monte Carlo tests show that the successful pursuit ratio of the proposed method exceeds 91% in all the given scenarios.
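The stage decomposition described in this abstract can be illustrated with a minimal Python sketch. It assumes the switch happens exactly when the pursuer enters a spherical perception range around the evader, and it replaces both the optimized rendezvous trajectory and the trained DDPG actor with placeholders. The names `in_game_stage`, `pursuer_impulse`, `success_ratio`, and the toy `toward_evader` policy are hypothetical and do not come from the paper. The sketch models no orbital dynamics or J2 effects; it only shows how control passes from the pre-planned stage to the closed-loop stage, and how a Monte Carlo success ratio is computed.

```python
import math
from typing import Callable, Sequence, Tuple

# Positions in a common frame (e.g. km); the frame choice is an assumption.
Vec3 = Tuple[float, float, float]
# Hypothetical policy signature: maps (pursuer, evader) positions to a delta-v.
Policy = Callable[[Vec3, Vec3], Vec3]


def in_game_stage(r_p: Vec3, r_e: Vec3, perception_range: float) -> bool:
    """True once the pursuer is inside the evader's (assumed spherical) perception range."""
    return math.dist(r_p, r_e) <= perception_range


def pursuer_impulse(r_p: Vec3, r_e: Vec3, perception_range: float,
                    planned_dv: Vec3, policy: Policy) -> Vec3:
    """Choose the pursuer's next impulse (delta-v).

    Far-distance rendezvous stage: replay the impulse from the trajectory
    optimized offline. Close-distance game stage: query the learned policy
    (a stand-in for a trained DDPG actor) in closed loop.
    """
    if in_game_stage(r_p, r_e, perception_range):
        return policy(r_p, r_e)
    return planned_dv


def success_ratio(outcomes: Sequence[bool]) -> float:
    """Fraction of Monte Carlo runs in which the pursuit succeeded."""
    if not outcomes:
        raise ValueError("need at least one Monte Carlo run")
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    # Toy placeholder policy: a small impulse pointing at the evader (NOT DDPG).
    toward_evader: Policy = lambda r_p, r_e: tuple(0.001 * (e - p) for p, e in zip(r_p, r_e))
    planned = (0.0, 0.0, 0.0)
    print(pursuer_impulse((0, 0, 0), (300, 400, 0), 100.0, planned, toward_evader))  # far: planned impulse
    print(pursuer_impulse((0, 0, 0), (30, 40, 0), 100.0, planned, toward_evader))    # close: policy output
    print(success_ratio([True] * 10 + [False]))
```

Passing the policy in as a callable keeps the stage-switching logic separate from how the close-distance controller is trained, so any learned actor with this hypothetical signature can be used in place of the toy one.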


Postural Control of Two-Stage Inverted Pendulum Using Reinforcement Learning and Self-organizing Map

Adaptive and Natural Computing Algorithms - Lecture Notes in Computer Science, 2007, pp. 722-729

DOI: 10.1007/978-3-540-71629-7_81

Author(s): Jae-kang Lee, Tae-seok Oh, Yun-su Shin, Tae-jun Yoon, Il-hwan Kim

Keyword(s): Reinforcement Learning, Postural Control, Inverted Pendulum, Self Organizing Map, Two Stage, Self Organizing


Dealing with Inaccurate Measures of Size in Two-Stage Probability Proportional to Size Sample Designs: Applications in African Household Surveys

Journal of Survey Statistics and Methodology, 2020

DOI: 10.1093/jssam/smaa020

Author(s): Graham Kalton, Ismael Flores Cervantes, Carlos Arieira, Mike Kwanisai, Elizabeth Radin, ...

Keyword(s): Final Stage, Population Based, Equal Probability, Household Surveys, Sample Design, African Countries, Two Stage, Fixed Sample, Multi Stage, Weighting Adjustments

Abstract: The units at the early stages of multi-stage area samples are generally sampled with probabilities proportional to their estimated sizes (PPES). With such a design, an overall equal probability (EP) sample design would yield a constant number of final stage units from each final stage cluster if the measures of size used in the PPES selection at each sampling stage were directly proportional to the number of final stage units. However, there are often sizable relative differences between the measures of size used in the PPES selections and the number of final stage units. Two common approaches for dealing with these differences are: (1) to retain a self-weighting sample design, allowing the sample sizes to vary across the sampled primary sampling units (PSUs), and (2) to retain the fixed sample size in each PSU and to compensate for the unequal selection probabilities by weighting adjustments in the analyses. This article examines these alternative designs in the context of two-stage sampling in which PSUs are sampled with PPES at the first stage, and an equal probability sample of final stage units is selected from each sampled PSU at the second stage. Two-stage sample designs of this type are used for household surveys in many countries. The discussion is illustrated with data from the Population-based HIV Impact Assessment surveys that were conducted using this design in several African countries.
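The arithmetic behind the two approaches can be worked through in a short Python sketch. It assumes a PSU's first-stage inclusion probability is pi_i = n * MOS_i / sum(MOS), and that N_i is the number of final-stage units actually found in PSU i. Under approach (1), the within-PSU rate f / pi_i keeps every unit's overall probability at f, so the expected sample size N_i * f / pi_i varies. Under approach (2), taking m units per PSU gives an overall probability of pi_i * m / N_i, so the base weight N_i / (pi_i * m) varies. The class and function names, and the frame values, are hypothetical illustrations rather than anything taken from the article or the PHIA data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PSU:
    mos: float    # estimated measure of size used in the PPES selection
    n_final: int  # actual number of final-stage units (e.g. households) found


def ppes_probs(psus: Sequence[PSU], n_psu: int) -> List[float]:
    """First-stage inclusion probabilities pi_i = n * MOS_i / sum(MOS)."""
    total = sum(p.mos for p in psus)
    probs = [n_psu * p.mos / total for p in psus]
    if any(pi > 1 for pi in probs):
        raise ValueError("a PSU has pi > 1; treat it as a certainty selection")
    return probs


def self_weighting_sizes(psus: Sequence[PSU], n_psu: int, f: float) -> List[float]:
    """Approach (1): every unit has overall probability f (weight 1/f).

    The within-PSU rate is f / pi_i, so the expected sample size N_i * f / pi_i
    varies whenever N_i is not proportional to MOS_i.
    """
    probs = ppes_probs(psus, n_psu)
    if any(f > pi for pi in probs):
        raise ValueError("within-PSU rate f / pi_i would exceed 1")
    return [p.n_final * f / pi for p, pi in zip(psus, probs)]


def fixed_size_weights(psus: Sequence[PSU], n_psu: int, m: int) -> List[float]:
    """Approach (2): take m units in every sampled PSU.

    A unit's overall probability is pi_i * m / N_i, so its base weight
    N_i / (pi_i * m) varies across PSUs and must be used in the analysis.
    """
    probs = ppes_probs(psus, n_psu)
    if any(m > p.n_final for p in psus):
        raise ValueError("m exceeds the number of units in some PSU")
    return [p.n_final / (pi * m) for p, pi in zip(psus, probs)]


if __name__ == "__main__":
    # Hypothetical frame: estimated sizes vs. sizes found at listing.
    frame = [PSU(100, 120), PSU(200, 180), PSU(300, 330), PSU(400, 370)]
    # With n = 2 PSUs and f = 0.02, exactly proportional sizes would give 10 units per PSU.
    print(self_weighting_sizes(frame, n_psu=2, f=0.02))
    print(fixed_size_weights(frame, n_psu=2, m=10))
```

With these made-up numbers, and computing each value as if that PSU were selected, approach (1) gives expected sample sizes of about 12, 9, 11 and 9.25 units instead of a constant 10. Approach (2) keeps 10 units per PSU but produces base weights of about 60, 45, 55 and 46.25 instead of the constant 1/f = 50.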
