Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection

2020, Vol 34 (07), pp. 10869-10876
Author(s): Yuchao Gu, Lijuan Wang, Ziqin Wang, Yun Liu, Ming-Ming Cheng, ...

Spatiotemporal information is essential for video salient object detection (VSOD) because object motion strongly attracts human attention. Previous VSOD methods usually rely on Long Short-Term Memory (LSTM) or 3D ConvNets (C3D), which can only encode motion information through step-by-step propagation in the temporal domain. The non-local mechanism was recently proposed to capture long-range dependencies directly. However, applying it to VSOD is not straightforward, because i) it fails to capture motion cues and tends to learn motion-independent global contexts; and ii) its computation and memory costs are prohibitive for dense video prediction tasks such as VSOD. To address these problems, we design a Constrained Self-Attention (CSA) operation that captures motion cues, based on the prior that objects always move along a continuous trajectory. We group a set of CSA operations in a pyramid structure (PCSA) to capture objects at various scales and speeds. Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenging datasets. Our code is available at https://github.com/guyuchao/PyramidCSA.
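The constrained-attention idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes that "constrained" means each query position attends only to a small, optionally dilated spatial window around the same position in every frame of a short clip, and that the pyramid is formed by running several dilations and concatenating the results. The function names, the `radius` and `dilation` parameters, and the `(T, H, W, C)` tensor layout are all hypothetical choices made for illustration.

```python
import numpy as np

def constrained_self_attention(q, k, v, radius=1, dilation=1):
    """Hypothetical sketch: q, k, v have shape (T, H, W, C).

    Each query attends only to keys inside a (2*radius+1)^2 window,
    spaced by `dilation`, around its own position in every frame.
    """
    T, H, W, C = q.shape
    pad = radius * dilation
    # Zero-pad the spatial dims so every shifted window is in bounds.
    k_pad = np.pad(k, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    v_pad = np.pad(v, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    # Validity mask: False wherever a window position falls on padding.
    valid = np.pad(np.ones((H, W), dtype=bool), pad)

    offsets = [(dy * dilation, dx * dilation)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]

    logits, values, masks = [], [], []
    for t in range(T):              # every frame contributes keys/values
        for dy, dx in offsets:      # every offset inside the window
            ys, xs = pad + dy, pad + dx
            k_shift = k_pad[t, ys:ys + H, xs:xs + W]   # (H, W, C)
            v_shift = v_pad[t, ys:ys + H, xs:xs + W]   # (H, W, C)
            # Scaled dot product between all queries and this shifted key map.
            logits.append(np.einsum('thwc,hwc->thw', q, k_shift) / np.sqrt(C))
            values.append(v_shift)
            masks.append(valid[ys:ys + H, xs:xs + W])

    logits = np.stack(logits, axis=-1)   # (T, H, W, N), N = T * window size
    values = np.stack(values, axis=0)    # (N, H, W, C)
    masks = np.stack(masks, axis=-1)     # (H, W, N)

    # Exclude padded positions, then apply a numerically stable softmax.
    logits = np.where(masks[None], logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of the windowed values for each query position.
    return np.einsum('thwn,nhwc->thwc', weights, values)

def pyramid_csa(q, k, v, dilations=(1, 2, 4)):
    """Hypothetical pyramid: one constrained attention per dilation, concatenated."""
    outs = [constrained_self_attention(q, k, v, radius=1, dilation=d)
            for d in dilations]
    return np.concatenate(outs, axis=-1)
```

The cost of this sketch grows with the window size and clip length rather than with the full number of spatiotemporal positions, which is the motivation the abstract gives for constraining the attention; larger dilations let a fixed-size window reach faster-moving or larger objects.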

2021, Vol 115, pp. 103672
Author(s): Zhaoying Liu, Xuesi Zhang, Tianpeng Jiang, Ting Zhang, Bo Liu, ...

Author(s): Zhiming Luo, Akshaya Mishra, Andrew Achkar, Justin Eichel, Shaozi Li, ...

Author(s): M. N. Favorskaya, L. C. Jain

Introduction: Saliency detection is a fundamental task in computer vision. Its ultimate aim is to localize the objects of interest that attract human visual attention relative to the rest of the image. A great variety of saliency models based on different approaches have been developed since the 1990s. In recent years, saliency detection has become an actively studied topic in the theory of Convolutional Neural Networks (CNNs). Many original solutions using CNNs have been proposed for salient object detection and even event detection. Purpose: A detailed survey of saliency detection methods in the deep learning era helps to understand the current capabilities of the CNN approach for visual analysis based on human eye tracking and digital image processing. Results: The survey reflects recent advances in saliency detection using CNNs. Different models available in the literature, such as static and dynamic 2D CNNs for salient object detection and 3D CNNs for salient event detection, are discussed in chronological order. It is worth noting that automatic salient event detection in long videos became possible using recently introduced 3D CNNs combined with 2D CNNs for salient audio detection. The article also gives a short description of public image and video datasets with annotated salient objects or events, as well as the metrics commonly used to evaluate results. Practical relevance: This survey contributes to the study of rapidly developing deep learning methods for saliency detection in images and videos.


Author(s): Zhengzheng Tu, Zhun Li, Chenglong Li, Yang Lang, Jin Tang
