Combining a gradient-based method and an evolution strategy for multi-objective reinforcement learning

2020 ◽  
Vol 50 (10) ◽  
pp. 3301-3317
Author(s):  
Diqi Chen ◽  
Yizhou Wang ◽  
Wen Gao
Author(s):  
Stefan Schneider ◽  
Ramin Khalili ◽  
Adnan Manzoor ◽  
Haydar Qarawlus ◽  
Rafael Schellenberg ◽  
...  

IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 70223-70235 ◽  
Author(s):  
Zhen Zhang ◽  
Dongqing Wang ◽  
Dongbin Zhao ◽  
Qiaoni Han ◽  
Tingting Song

Author(s):  
STEFAN WIEGAND ◽  
CHRISTIAN IGEL ◽  
UWE HANDMANN

For face recognition from video streams, speed and accuracy are vital. The first decision, whether a preprocessed image region represents a human face or not, is often made by a feed-forward neural network (NN), e.g. in the Viisage-FaceFINDER® video surveillance system. We describe the optimisation of such an NN by a hybrid algorithm combining evolutionary multi-objective optimisation (EMO) and gradient-based learning. The evolved solutions perform considerably faster than an expert-designed architecture without loss of accuracy. We compare an EMO and a single-objective approach, both with online search strategy adaptation. It turns out that EMO is preferable to the single-objective approach in several respects.
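The abstract above treats speed and accuracy as competing objectives. As a minimal sketch of the general idea behind multi-objective selection (not the authors' algorithm), the following filters candidate networks down to the non-dominated set. The objective pairs and the names `dominates` and `pareto_front` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical objective pair: (classification error, evaluation time in ms).
# Both objectives are minimized in this sketch.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Made-up candidates, for illustration only.
candidates = [(0.05, 12.0), (0.05, 8.0), (0.08, 5.0), (0.10, 9.0)]
front = pareto_front(candidates)
```

Here `(0.05, 8.0)` and `(0.08, 5.0)` survive: neither is better than the other in both objectives, so a multi-objective method would keep both as trade-off solutions rather than collapse them to a single winner.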


Author(s):  
Akkhachai Phuphanin ◽  
Wipawee Usaha

Coverage control is crucial for the deployment of wireless sensor networks (WSNs). However, most coverage control schemes are based on single-objective optimization, such as maximizing coverage area alone, and do not consider other conflicting objectives such as energy consumption, the number of working nodes, and wasteful overlapping areas. This paper proposes a multi-objective optimization (MOO) coverage control scheme called Scalarized Q Multi-Objective Reinforcement Learning (SQMORL). The two objectives are to maximize area coverage and to minimize the overlapping area in order to reduce energy consumption. Performance is evaluated in both simulation and multi-agent lighting control testbed experiments. Simulation results show that SQMORL can obtain more efficient area coverage with fewer working nodes than other existing schemes. The hardware testbed results show that the SQMORL algorithm can find the optimal policy with good accuracy over repeated runs.
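The abstract above does not spell out how SQMORL scalarizes its objectives. As a rough sketch of one common approach, assuming a linear weighted sum over a per-objective Q-vector, a tabular update could look like the following. The weights, the reward vector, and the function names are all assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

State = int
Action = int
# One Q-value per objective for every (state, action) pair.
QTable = Dict[Tuple[State, Action], List[float]]


def scalarize(q_vec: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization (assumed here): weighted sum of objective values."""
    return sum(w * q for w, q in zip(weights, q_vec))


def greedy_action(q: QTable, state: State, actions: Sequence[Action],
                  weights: Sequence[float]) -> Action:
    """Pick the action with the largest scalarized Q-value."""
    return max(actions, key=lambda a: scalarize(q[(state, a)], weights))


def update(q: QTable, s: State, a: Action, reward_vec: Sequence[float],
           s_next: State, actions: Sequence[Action],
           weights: Sequence[float], alpha: float = 0.5,
           gamma: float = 0.9) -> None:
    """Update each objective's Q-value toward its own bootstrapped target."""
    a_next = greedy_action(q, s_next, actions, weights)
    for i, r in enumerate(reward_vec):
        target = r + gamma * q[(s_next, a_next)][i]
        q[(s, a)][i] += alpha * (target - q[(s, a)][i])


# Hypothetical two-state, two-action problem with two objectives:
# index 0 rewards coverage, index 1 penalizes overlap.
actions = [0, 1]
weights = (0.7, 0.3)
q: QTable = {(s, a): [0.0, 0.0] for s in (0, 1) for a in actions}
update(q, s=0, a=1, reward_vec=[1.0, -0.2], s_next=1,
       actions=actions, weights=weights)
best = greedy_action(q, 0, actions, weights)
```

Keeping a vector of Q-values and scalarizing only at action-selection time means the weights can be changed without relearning each objective's value estimates from scratch.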


Nanophotonics ◽  
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Sean Hooten ◽  
Raymond G. Beausoleil ◽  
Thomas Van Vaerenbergh

We present a proof-of-concept technique for the inverse design of electromagnetic devices motivated by the policy gradient method in reinforcement learning, named PHORCED (PHotonic Optimization using REINFORCE Criteria for Enhanced Design). This technique uses a probabilistic generative neural network interfaced with an electromagnetic solver to assist in the design of photonic devices, such as grating couplers. We show that PHORCED obtains better-performing grating coupler designs than local gradient-based inverse design via the adjoint method, while potentially providing faster convergence over competing state-of-the-art generative methods. As a further example of the benefits of this method, we implement transfer learning with PHORCED, demonstrating that a neural network trained to optimize 8° grating couplers can then be re-trained on grating couplers with alternate scattering angles while requiring >10× fewer simulations than control cases.
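The REINFORCE criterion mentioned in the abstract above estimates a gradient from sampled designs and their scores, without differentiating through the solver. The sketch below illustrates that estimator on a toy problem. It assumes a single Gaussian over one design parameter in place of the paper's generative neural network, and a hypothetical `merit` function in place of an electromagnetic solver, so it is an illustration of the estimator only, not of PHORCED itself.

```python
import random


def merit(x: float) -> float:
    """Hypothetical stand-in for a solver call; best design is at x = 2."""
    return -(x - 2.0) ** 2


def reinforce_step(mu: float, sigma: float, batch: int, lr: float,
                   rng: random.Random) -> float:
    """One ascent step on the mean of a Gaussian sampling distribution.

    Uses the score-function estimate
        grad_mu E[f(x)] ~ mean((f(x) - baseline) * (x - mu) / sigma**2)
    with the batch mean of f as a variance-reducing baseline.
    """
    samples = [rng.gauss(mu, sigma) for _ in range(batch)]
    scores = [merit(x) for x in samples]
    baseline = sum(scores) / batch
    grad = sum(
        (f - baseline) * (x - mu) / sigma ** 2
        for x, f in zip(samples, scores)
    ) / batch
    return mu + lr * grad


rng = random.Random(0)  # fixed seed so the run is repeatable
mu = 0.0
for _ in range(200):
    mu = reinforce_step(mu, sigma=0.5, batch=64, lr=0.05, rng=rng)
```

After the loop, `mu` should sit close to the optimum of `merit`. Only evaluations of `merit` are needed, which is why this family of methods can wrap a black-box solver.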

