Generating description with multi-feature and saliency maps of image

Author(s):  
Lisha Liu ◽  
Chunna Tian ◽  
Ruiguo Zhang ◽  
Yuxuan Ding
Author(s):  
Shafin Rahman ◽  
Sejuti Rahman ◽  
Omar Shahid ◽  
Md. Tahmeed Abdullah ◽  
Jubair Ahmed Sourov

2021 ◽  
Vol 11 (16) ◽  
pp. 7217
Author(s):  
Cristina Luna-Jiménez ◽  
Jorge Cristóbal-Martín ◽  
Ricardo Kleinlein ◽  
Manuel Gil-Martín ◽  
José M. Moya ◽  
...  

Spatial Transformer Networks are a powerful mechanism for learning the most informative regions of an image, but they could be more effective if supplied with images carrying embedded expert knowledge. This paper aims to improve the performance of conventional Spatial Transformers applied to Facial Expression Recognition. Building on the Spatial Transformers' capacity for spatial manipulation within networks, we propose extensions to these models in which effective attentional regions are captured using facial landmarks or facial visual saliency maps. This attentional information is then hardcoded to guide the Spatial Transformers toward the spatial transformations that best fit the proposed regions, yielding better recognition results. For this study, we use two datasets: AffectNet and FER-2013. On AffectNet, we achieve a 0.35% absolute improvement over the traditional Spatial Transformer, whereas on FER-2013 our solution gains 1.49% when models are fine-tuned with the AffectNet pre-trained weights.
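
A minimal PyTorch sketch of the core mechanism may help: a Spatial Transformer whose predicted affine parameters are steered toward a region derived from facial landmarks. The network sizes, the box parameterization (cx, cy, sx, sy), and the use of a soft guidance loss (rather than the paper's hardcoded regions) are illustrative assumptions, not the authors' exact design.

```python
# Sketch: a Spatial Transformer guided toward landmark-derived facial regions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedSTN(nn.Module):
    def __init__(self):
        super().__init__()
        # Localization network: regresses the 6 parameters of an affine map.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(32), nn.ReLU(), nn.Linear(32, 6),
        )

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)               # affine matrix per image
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False), theta

def landmark_guidance_loss(theta, boxes):
    """Penalize affine parameters that drift from the crop implied by a
    landmark bounding box (cx, cy, sx, sy in normalized [-1, 1] coords).
    The boxes themselves would come from a facial-landmark detector."""
    target = torch.zeros_like(theta)
    target[:, 0, 0], target[:, 1, 1] = boxes[:, 2], boxes[:, 3]  # scales
    target[:, 0, 2], target[:, 1, 2] = boxes[:, 0], boxes[:, 1]  # translation
    return F.mse_loss(theta, target)
```

Adding this term to the classification loss nudges the transformer toward the expert-defined region while still letting it refine the crop from data.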


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1280
Author(s):  
Hyeonseok Lee ◽  
Sungchan Kim

Explaining the predictions of deep neural networks makes the networks more understandable and trusted, enabling their use in mission-critical tasks. Recent progress in the learning capability of networks is primarily due to their enormous number of model parameters, so, unlike classical white-box models, their operations are usually hard to interpret. Generating saliency maps is therefore a popular approach to identifying the input features important to a model's prediction. Existing explanation methods typically use only the output of the model's last convolutional layer to generate a saliency map, ignoring the information contained in intermediate layers; the resulting explanations are coarse and of limited accuracy. Although accuracy can be improved by iteratively refining a saliency map, this is too time-consuming to be practical. To address these problems, we propose a novel approach that explains the model prediction by training an attentive surrogate network via knowledge distillation. The surrogate network aims to generate a fine-grained saliency map corresponding to the model prediction, using meaningful regional information present across all network layers. Experiments demonstrate that the saliency maps are the result of spatially attentive features learned through distillation, making them useful for fine-grained classification tasks. Moreover, the proposed method runs at 24.3 frames per second, orders of magnitude faster than existing methods.
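
As a rough illustration of the pipeline, the sketch below (PyTorch) distills a black-box model into a surrogate and reads saliency maps off the surrogate's intermediate features in one forward pass. The surrogate's (logits, feature list) interface, the temperature, and the channel-energy attention are assumptions for illustration, not the paper's exact design.

```python
# Sketch: distillation into an attentive surrogate, then single-pass saliency.
import torch
import torch.nn.functional as F

def spatial_attention(feat):
    """Collapse features (N, C, H, W) to a normalized attention map (N, H, W)."""
    att = feat.pow(2).mean(dim=1)                      # channel-wise energy
    return att / (att.flatten(1).sum(1).view(-1, 1, 1) + 1e-8)

def distill_loss(teacher, surrogate, x, T=4.0):
    with torch.no_grad():
        t_logits = teacher(x)                          # black-box model to explain
    s_logits, _ = surrogate(x)                         # surrogate: (logits, feats)
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T     # standard KD objective

def saliency_map(surrogate, x):
    """Average the attention of all layers, upsampled to the input size:
    one forward pass, no iterative refinement."""
    _, feats = surrogate(x)
    maps = [F.interpolate(spatial_attention(f).unsqueeze(1), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
            for f in feats]
    return torch.stack(maps).mean(0).squeeze(1)        # (N, H, W)
```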


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 970
Author(s):  
Miguel Ángel Martínez-Domingo ◽  
Juan Luis Nieves ◽  
Eva M. Valero

Saliency prediction is an important and challenging task in the computer vision community. Many models attempt to predict the salient regions of a scene from its RGB image values, and spectral imaging techniques may overcome the limitations encountered when only RGB images are used. However, the experimental study of saliency models based on spectral images is difficult because of the lack of available data to work with. This article presents the first eight-channel multispectral image database of outdoor urban scenes, together with gaze data recorded with an eye tracker from several observers performing different visualization tasks. In addition, the database is used to study whether the complexity of the images affects the saliency maps retrieved from the observers. Results show that greater image complexity does not correlate with larger differences in the obtained saliency maps.
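
For readers building on such gaze data, a common way to turn raw fixations into dense ground-truth saliency maps is to accumulate fixation points and blur them with a Gaussian; a small sketch follows. The sigma of roughly one degree of visual angle (here ~30 px) is an assumption, not a value from the article.

```python
# Sketch: converting eye-tracker fixations into a dense saliency map.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_saliency(fixations, height, width, sigma_px=30.0):
    """fixations: iterable of (x, y) pixel coordinates of fixation centers."""
    fix_map = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fix_map[yi, xi] += 1.0                   # accumulate fixation counts
    sal = gaussian_filter(fix_map, sigma=sigma_px)   # blur to a dense map
    return sal / sal.max() if sal.max() > 0 else sal # normalize to [0, 1]
```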


2013 ◽  
Vol 765-767 ◽  
pp. 1401-1405
Author(s):  
Chi Zhang ◽  
Wei Qiang Wang

Object-level saliency detection is an important branch of visual saliency. In this paper, we propose a novel method that performs object-level saliency detection on both images and videos in a unified way. Instead of the popular contrast assumption, we employ a more effective spatial-compactness assumption to measure saliency. In addition, we present a combination framework that integrates multiple saliency maps generated from different feature maps; the proposed algorithm automatically selects high-quality saliency maps according to a quality-evaluation score we define. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods on both still-image and video-sequence datasets.
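
A hedged sketch of the spatial-compactness cue: feature values whose supporting pixels cluster tightly in space receive high saliency, whereas widely scattered values do not. The quantization level and the variance-to-saliency mapping below are illustrative choices, not the authors' exact formulation.

```python
# Sketch: saliency from the spatial compactness of quantized feature values.
import numpy as np

def compactness_saliency(feature, bins=16):
    """feature: 2D array (H, W) of a single feature channel scaled to [0, 1]."""
    h, w = feature.shape
    ys, xs = np.mgrid[0:h, 0:w]
    q = np.clip((feature * bins).astype(int), 0, bins - 1)  # quantize values
    sal = np.zeros((h, w), dtype=np.float64)
    for k in range(bins):
        mask = q == k
        if not mask.any():
            continue
        spread = xs[mask].var() + ys[mask].var()   # spatial spread of this value
        sal[mask] = 1.0 / (1.0 + spread / (h * w)) # compact -> high saliency
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)
```

Maps computed this way on different feature channels could then be ranked by a quality score and only the best ones fused, in the spirit of the combination framework described above.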


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3377
Author(s):  
Jifang Pei ◽  
Yulin Huang ◽  
Weibo Huo ◽  
Yuxuan Miao ◽  
Yin Zhang ◽  
...  

Finding targets of interest in synthetic aperture radar (SAR) imagery is an attractive but challenging problem in SAR applications. Traditional target detection is performed independently of the SAR imaging process, which introduces needless computation. Hence, a new SAR processing approach for simultaneous target detection and image formation is proposed in this paper. The approach is based on time-domain SAR image formation and human visual saliency detection. First, a series of sub-aperture SAR images with resolutions from low to high is generated by the time-domain SAR imaging method. Those multiresolution SAR images are then processed by visual saliency detection, and the corresponding intermediate saliency maps are obtained. The saliency maps are accumulated until the result reaches a sufficient confidence level. After some screening operations, the target regions in the imaged scene are located, and only those regions are focused with full-aperture integration. The final product is SAR imagery with high-resolution detected target regions against a low-resolution clutter background. Experimental results show the superiority of the proposed approach for simultaneous target detection and image formation.
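
The accumulation step can be summarized in a few lines. The sketch below assumes all sub-aperture saliency maps share one pixel grid and uses a peak-to-mean ratio as a hypothetical confidence measure; both the detector callable and the threshold are stand-ins, not the paper's definitions.

```python
# Sketch: accumulate sub-aperture saliency maps until confidence is sufficient.
import numpy as np

def accumulate_saliency(subaperture_images, detect_saliency, conf_thresh=5.0):
    """subaperture_images: iterable of 2D arrays, low -> high resolution.
    detect_saliency: any visual-saliency detector returning a 2D map."""
    acc = None
    for img in subaperture_images:
        sal = detect_saliency(img)
        acc = sal if acc is None else acc + sal    # accumulate evidence
        if acc.max() / (acc.mean() + 1e-12) >= conf_thresh:
            break                                  # sufficient confidence reached
    return acc
```

Only the regions that survive screening on the accumulated map would then be focused with the full aperture, saving the cost of forming the entire high-resolution image.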


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1365
Author(s):  
Bogdan Muşat ◽  
Răzvan Andonie

Convolutional neural networks utilize a hierarchy of neural network layers, and the statistical concentration of information in successive layers offers insight into the feature-abstraction process. We analyze the saliency maps of these layers from the perspective of semiotics, the study of signs and sign-using behavior. In computational semiotics, this aggregation operation (known as superization) is accompanied by a decrease in spatial entropy: signs are aggregated into supersigns. Using spatial entropy, we compute the information content of the saliency maps and study the superization processes that take place between successive layers of the network. In our experiments, we visualize the superization process and show how the resulting knowledge can be used to explain the neural decision model. In addition, we attempt to optimize the architecture of the neural model using a semiotic greedy technique. To the best of our knowledge, this is the first application of computational semiotics to the analysis and interpretation of deep neural networks.
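
As a proxy for the quantity being tracked, one can treat a normalized saliency map as a probability distribution and compute its Shannon entropy; a drop in this value between successive layers would be the signature of superization. The paper's exact spatial-entropy definition may differ from this simple version.

```python
# Sketch: Shannon entropy of a saliency map viewed as a distribution.
import numpy as np

def saliency_entropy(saliency_map):
    p = saliency_map.astype(np.float64).ravel()
    p = p / p.sum()                                # normalize to a distribution
    p = p[p > 0]                                   # convention: 0 * log 0 = 0
    return -(p * np.log2(p)).sum()                 # entropy in bits
```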

