Enhancing semantics with multi-objective reinforcement learning for video description

2021 ◽  
Author(s):  
Qinyu Li ◽  
Longyu Yang ◽  
Pengjie Tang ◽  
Hanli Wang
Author(s):  
Stefan Schneider ◽  
Ramin Khalili ◽  
Adnan Manzoor ◽  
Haydar Qarawlus ◽  
Rafael Schellenberg ◽  
...  

Author(s):  
Akkhachai Phuphanin ◽  
Wipawee Usaha

Coverage control is crucial for the deployment of wireless sensor networks (WSNs). However, most coverage control schemes are based on single-objective optimization, such as coverage area only, and do not consider other conflicting objectives such as energy consumption, the number of working nodes, and wasteful overlapping areas. This paper proposes a Multi-Objective Optimization (MOO) coverage control scheme called Scalarized Q Multi-Objective Reinforcement Learning (SQMORL). The two objectives are to maximize the area coverage and to minimize the overlapping area so as to reduce energy consumption. Performance evaluation is conducted in both simulation and multi-agent lighting control testbed experiments. Simulation results show that SQMORL obtains more efficient area coverage with fewer working nodes than other existing schemes. The hardware testbed results show that the SQMORL algorithm can find the optimal policy with good accuracy over repeated runs.
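A minimal sketch of the scalarized Q-learning idea behind SQMORL, assuming a tabular setting with two reward signals (area coverage and an overlap penalty). The environment interface, action set, scalarization weights, and hyperparameters are illustrative assumptions, not the authors' implementation.

# Sketch of scalarized Q-learning with two objectives (coverage, overlap).
# Weights, actions, and hyperparameters are illustrative assumptions only.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
WEIGHTS = (0.7, 0.3)          # assumed preference between the two objectives
ACTIONS = ["on", "off"]       # hypothetical node activation actions

# One Q-value vector per (state, action): [coverage term, overlap term]
Q = defaultdict(lambda: {a: [0.0, 0.0] for a in ACTIONS})

def scalarize(q_vec, weights=WEIGHTS):
    # Linear scalarization of a multi-objective Q-value vector.
    return sum(w * q for w, q in zip(weights, q_vec))

def choose_action(state):
    # Epsilon-greedy selection on the scalarized Q-values.
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: scalarize(Q[state][a]))

def update(state, action, rewards, next_state):
    # Per-objective Q-learning update; rewards = (coverage reward, -overlap penalty).
    best_next = max(ACTIONS, key=lambda a: scalarize(Q[next_state][a]))
    for i, r in enumerate(rewards):
        td_target = r + GAMMA * Q[next_state][best_next][i]
        Q[state][action][i] += ALPHA * (td_target - Q[state][action][i])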


2020 ◽  
Vol 50 (8) ◽  
pp. 2370-2383 ◽  
Author(s):  
Yao Qin ◽  
Hua Wang ◽  
Shanwen Yi ◽  
Xiaole Li ◽  
Linbo Zhai

Author(s):  
Tomohiro Yamaguchi ◽  
Shota Nagahama ◽  
Yoshihiro Ichikawa ◽  
Yoshimichi Honma ◽  
Keiki Takadama

This chapter describes solving multi-objective reinforcement learning (MORL) problems in which there are multiple conflicting objectives with unknown weights. Previous model-free MORL methods require a large number of calculations to collect a Pareto optimal set for each V/Q-value vector. In contrast, model-based MORL can reduce this calculation cost compared with model-free MORL. However, the previous model-based MORL method applies only to deterministic environments. To address these issues, this chapter proposes a novel model-based MORL method based on a reward occurrence probability (ROP) vector with unknown weights. Experimental results are reported for stochastic learning environments with up to 10 states, 3 actions, and 3 reward rules. They show that the proposed method collects all Pareto optimal policies, with a total learning time of about 214 seconds (10 states, 3 actions, 3 rewards). As future research directions, ways to speed up the method and how to use non-optimal policies are discussed.
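A minimal sketch of the Pareto filtering step implied above, assuming each candidate policy is summarized by a reward occurrence probability (ROP) vector with one entry per reward rule. The policy names and ROP values are hypothetical placeholders, not results from the chapter.

# Sketch: keep only policies whose ROP vector is not Pareto-dominated.
# The candidate policies and ROP values below are hypothetical.

def dominates(u, v):
    # True if ROP vector u Pareto-dominates v (>= everywhere, > somewhere).
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(policies):
    # Keep policies whose ROP vector is not dominated by any other candidate.
    return [
        (name, rop) for name, rop in policies
        if not any(dominates(other, rop) for _, other in policies if other != rop)
    ]

# Hypothetical policies with 3-element ROP vectors (one entry per reward rule).
candidates = [("pi1", (0.6, 0.2, 0.1)), ("pi2", (0.5, 0.3, 0.2)), ("pi3", (0.4, 0.1, 0.1))]
print(pareto_front(candidates))   # pi3 is dominated, so only pi1 and pi2 remain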

