Saliency Prediction on Omnidirectional Images with Brain-Like Shallow Neural Network

Author(s):  
Dandan Zhu ◽  
Yongqing Chen ◽  
Xiongkuo Min ◽  
Defang Zhao ◽  
Yucheng Zhu ◽  
...  
Author(s):  
Lai Jiang ◽  
Zhe Wang ◽  
Mai Xu ◽  
Zulin Wang

The transformed-domain features of images are effective in distinguishing salient from non-salient regions. In this paper, we propose a novel deep complex neural network, named Sal-DCNN, to predict image saliency by learning features in both the pixel and transformed domains. Before proposing Sal-DCNN, we analyze the saliency cues encoded in the discrete Fourier transform (DFT) domain and obtain the following findings: 1) the phase spectrum encodes most saliency cues; 2) a certain pattern of the amplitude spectrum is important for saliency prediction; 3) the transformed-domain spectrum is robust to noise and down-sampling for saliency prediction. According to these findings, we develop the structure of Sal-DCNN, which comprises two main stages: a complex dense encoder and a three-stream multi-domain decoder. Given the new Sal-DCNN structure, saliency maps can be predicted under the supervision of ground-truth fixation maps in both the pixel and transformed domains. Finally, the experimental results show that our Sal-DCNN method outperforms 8 other state-of-the-art methods for image saliency prediction on 3 databases.
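The abstract's first finding, that the phase spectrum encodes most saliency cues, can be illustrated with a minimal phase-only Fourier reconstruction sketch in NumPy. This is not the Sal-DCNN architecture itself; the function name, squared-response choice, and normalization are illustrative assumptions, not details from the paper.

```python
import numpy as np

def phase_spectrum_saliency(img):
    """Illustrative phase-only saliency: keep the DFT phase, discard
    the amplitude (set it to 1), and reconstruct. Regions that stand
    out from their surroundings survive the reconstruction."""
    f = np.fft.fft2(img)
    phase_only = np.exp(1j * np.angle(f))       # unit amplitude, original phase
    recon = np.real(np.fft.ifft2(phase_only))   # phase-only reconstruction
    sal = recon ** 2                            # squared response as saliency
    return sal / sal.max() if sal.max() > 0 else sal
```

On an image that is uniform except for one outlier pixel, the reconstruction peaks exactly at that pixel, which is the intuition behind the finding.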


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 147743-147754 ◽  
Author(s):  
Zhenhao Sun ◽  
Xu Wang ◽  
Qiudan Zhang ◽  
Jianmin Jiang

Author(s):  
Dandan Zhu ◽  
Yongqing Chen ◽  
Defang Zhao ◽  
Qiangqiang Zhou ◽  
Xiaokang Yang

2020 ◽  
Vol 34 (07) ◽  
pp. 12410-12417 ◽  
Author(s):  
Xinyi Wu ◽  
Zhenyao Wu ◽  
Jinglin Zhang ◽  
Lili Ju ◽  
Song Wang

The performance of predicting human fixations in videos has been greatly enhanced by the development of convolutional neural networks (CNNs). In this paper, we propose a novel end-to-end neural network, "SalSAC", for video saliency prediction, which uses CNN-LSTM-Attention as its basic architecture and exploits both static and dynamic information. To better represent the static information of each frame, we first extract multi-level features of the same size from different layers of the encoder CNN and compute the corresponding multi-level attentions; we then randomly shuffle these attention maps among levels and multiply them with the extracted multi-level features, respectively. In this way, we leverage attention consistency across different layers to improve the robustness of the network. On the dynamic side, we propose a correlation-based ConvLSTM to appropriately balance the influence of the current and preceding frames on the prediction. Experimental results on the DHF1K, Hollywood-2 and UCF Sports datasets show that SalSAC outperforms many existing state-of-the-art methods.
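The attention-shuffling step described above can be sketched in a few lines of NumPy: permute the per-level attention maps at random and reweight each level's features with an attention map drawn from a (possibly) different level. The function name and tensor layout (level list of `(C, H, W)` features with `(1, H, W)` attentions) are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def shuffle_attention(features, attentions, rng):
    """Randomly permute attention maps across levels, then reweight
    each level's feature map with the attention assigned to it.

    features   : list of arrays, each (C, H, W)
    attentions : list of arrays, each (1, H, W), one per level
    rng        : numpy Generator controlling the shuffle
    """
    order = rng.permutation(len(attentions))            # random level assignment
    return [feat * attentions[j]                        # broadcast over channels
            for feat, j in zip(features, order)]
```

During training the shuffle forces every level's features to remain useful under any level's attention, which is the attention-consistency effect the abstract refers to; at test time the maps would simply not be shuffled.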


2021 ◽  
Author(s):  
Dandan Zhu ◽  
Yongqing Chen ◽  
Xiongkuo Min ◽  
Yucheng Zhu ◽  
Guokai Zhang ◽  
...  
