Action Classification from Egocentric Videos Using Reinforcement Learning-based Pose Estimation

Author(s):  
Shunya Ohaga ◽  
Ren Togo ◽  
Takahiro Ogawa ◽  
Miki Haseyama

Author(s):  
Alexander Krull ◽  
Eric Brachmann ◽  
Sebastian Nowozin ◽  
Frank Michel ◽  
Jamie Shotton ◽  
...  

Author(s):  
Frederik Nørby Rasmussen ◽  
Sebastian Terp Andersen ◽  
Bjarne Grossmann ◽  
Evangelos Boukas ◽  
Lazaros Nalpantidis

Author(s):  
Xun Wang ◽  
Yan Tian ◽  
Xuran Zhao ◽  
Tao Yang ◽  
Judith Gelernter ◽  
...  

2020 ◽  
Vol 34 (07) ◽  
pp. 10835-10844
Author(s):  
Erik Gärtner ◽  
Aleksis Pirinen ◽  
Cristian Sminchisescu

Most 3D human pose estimation methods assume that input, whether images of a scene collected from one or several viewpoints or a video, is given. Consequently, they focus on producing estimates that leverage prior knowledge and measurements by fusing information spatially and/or temporally, whenever available. In this paper we address the problem of an active observer with the freedom to move and explore the scene spatially (in ‘time-freeze’ mode) and/or temporally, selecting informative viewpoints that improve its estimation accuracy. To this end, we introduce Pose-DRL, a fully trainable deep reinforcement learning-based active pose estimation architecture that learns to select appropriate views, in space and time, to feed an underlying monocular pose estimator. We evaluate our model using single- and multi-target estimators, with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates than strong multi-view baselines.
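
To make the active view-selection loop concrete, here is a minimal sketch in Python. An epsilon-greedy bandit learner stands in for the learned Pose-DRL policy, a noisy toy function stands in for the monocular pose estimator, and negative pose error serves as the reward. The camera count, joint count, noise model, and learning rule are all illustrative assumptions, not components of the paper's system.

```python
import numpy as np

# Sketch only: a bandit-style view selector as a stand-in for a learned
# RL policy that picks which camera feeds a monocular pose estimator.
rng = np.random.default_rng(0)
NUM_VIEWS = 8                                  # hypothetical camera count
NUM_JOINTS = 17                                # assumed joint count
TRUE_POSE = rng.normal(size=(NUM_JOINTS, 3))   # synthetic ground-truth pose

def monocular_estimate(view):
    """Toy monocular estimator: per-view noise level mimics occlusion,
    so some viewpoints are systematically more informative."""
    noise = 0.05 + 0.30 * ((view * 7) % NUM_VIEWS) / NUM_VIEWS
    return TRUE_POSE + rng.normal(0.0, noise, size=TRUE_POSE.shape)

def pose_error(estimate):
    """Mean per-joint position error against the synthetic ground truth."""
    return float(np.linalg.norm(estimate - TRUE_POSE, axis=1).mean())

# Epsilon-greedy view selection: the agent is rewarded with estimation
# accuracy, so it learns which viewpoints yield the best pose estimates.
q_values = np.zeros(NUM_VIEWS)
counts = np.zeros(NUM_VIEWS)
EPSILON = 0.1

for step in range(2000):
    if rng.random() < EPSILON:
        view = int(rng.integers(NUM_VIEWS))    # explore a random view
    else:
        view = int(np.argmax(q_values))        # exploit the best view so far
    reward = -pose_error(monocular_estimate(view))
    counts[view] += 1
    q_values[view] += (reward - q_values[view]) / counts[view]

best = int(np.argmax(q_values))
print(f"preferred view: {best}, mean error there: {-q_values[best]:.3f}")
```

A full system along the lines of the abstract would replace the bandit with a deep policy conditioned on image features and add the learned stopping and temporal-transition behaviors; this sketch only shows the reward-driven view-selection loop itself.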


2021 ◽  
Vol 13 (19) ◽  
pp. 3995
Author(s):  
Zhen Fan ◽  
Xiu Li ◽  
Yipeng Li

Most multi-view-based human pose estimation techniques assume the cameras are fixed. In dynamic scenes, however, the cameras should be able to move and seek the best views to avoid occlusions and to extract 3D information about the target collaboratively. In this paper, we address the problem of online view selection for a fixed number of cameras to estimate multi-person 3D poses actively. The proposed method exploits a distributed multi-agent deep reinforcement learning framework, where each camera is modeled as an agent, to optimize the actions of all the cameras. We develop an inter-agent communication protocol that transfers the cameras’ relative positions between agents for better collaboration. Experiments on the Panoptic dataset show that our method outperforms other view selection methods by a large margin given an identical number of cameras. To the best of our knowledge, our method is the first to address online active multi-view 3D pose estimation with multi-agent reinforcement learning.
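
As a rough illustration of the distributed framework described above, the sketch below models each camera as an agent on a 2D plane: a communication step shares every other camera's relative position with each agent, each agent then acts on its own state plus the received messages, and a shared reward stands in for pose-estimation quality. The agent count, the ring-shaped reward, and the greedy placeholder policy are assumptions for illustration only, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_AGENTS = 4                                      # hypothetical camera count
ACTIONS = np.array([[0, 0], [1, 0], [-1, 0],
                    [0, 1], [0, -1]], dtype=float)  # stay or step in the plane
TARGET = np.array([5.0, 5.0])                       # synthetic person location

positions = rng.uniform(0.0, 10.0, size=(NUM_AGENTS, 2))

def communicate(positions):
    """Message-passing step: each agent receives every other camera's
    position relative to its own, mirroring the protocol sketched above."""
    return [(np.delete(positions, i, axis=0) - positions[i]).ravel()
            for i in range(NUM_AGENTS)]

def team_reward(positions):
    """Toy shared reward: keep cameras on a ring of radius 3 around the
    target. A real system would score 3D pose accuracy instead."""
    dists = np.linalg.norm(positions - TARGET, axis=1)
    return -float(np.abs(dists - 3.0).mean())

def policy(own, msg):
    """Greedy placeholder for each agent's learned DRL policy: approach
    the ring while spreading away from the nearest communicated camera."""
    rel = msg.reshape(-1, 2)
    nearest = rel[np.argmin(np.linalg.norm(rel, axis=1))] + own
    def score(p):
        return (-abs(np.linalg.norm(p - TARGET) - 3.0)
                + 0.2 * np.linalg.norm(p - nearest))
    return max(ACTIONS, key=lambda a: score(own + 0.5 * a))

for step in range(10):
    msgs = communicate(positions)                    # share relative positions
    moves = [policy(positions[i], msgs[i]) for i in range(NUM_AGENTS)]
    positions = positions + 0.5 * np.array(moves)    # all agents act at once
    print(f"step {step}: team reward {team_reward(positions):.3f}")
```

In the actual method, the greedy heuristic would be a trained deep policy per agent and the reward would reflect multi-person 3D pose accuracy on scenes such as those in the Panoptic dataset; the sketch only shows the communicate-then-act structure.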

