A Biologically Motivated, Proto-Object-Based Audiovisual Saliency Model

AI ◽  
2020 ◽  
Vol 1 (4) ◽  
pp. 487-509
Author(s):  
Sudarshan Ramenahalli

The natural environment and our interaction with it are essentially multisensory: we may deploy visual, tactile, and/or auditory senses to perceive, learn, and interact with our environment. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object-based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a 360° field of view, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of the proto-object-based audiovisual saliency map in detecting and localizing salient objects/events agrees with human judgment. In addition, the proto-object-based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures a greater number of valid salient events than unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression, and related applications.
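
A minimal sketch of the linear fusion described above, assuming peak-normalized conspicuity maps already registered on the same panoramic grid (the function name and the weights `w_v` and `w_a` are illustrative, not taken from the paper):

```python
import numpy as np

def audiovisual_saliency(vis_conspicuity, aud_conspicuity, w_v=0.5, w_a=0.5):
    """Fuse visual and auditory conspicuity maps into one AVSM.

    Both maps are assumed to be non-negative arrays of equal shape,
    spatially aligned on the same panoramic grid.
    """
    def normalize(m):
        return m / m.max() if m.max() > 0 else m

    return w_v * normalize(vis_conspicuity) + w_a * normalize(aud_conspicuity)

# Example: a 2D panoramic grid (elevation x azimuth)
vis = np.random.rand(90, 360)
aud = np.random.rand(90, 360)
avsm = audiovisual_saliency(vis, aud)
peak = np.unravel_index(np.argmax(avsm), avsm.shape)  # most salient location
```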

2020 ◽  
Author(s):  
Felipe C. Belém ◽  
Alexandre X. Falcão ◽  
Silvio Jamil F. Guimarães

Superpixel segmentation methods aim to partition an image into homogeneous connected regions of pixels (i.e., superpixels) such that the union of the superpixels composing an object precisely defines that object of interest. However, the homogeneity criterion is often based solely on color, which, under certain conditions, can be insufficient for inferring the extent of the objects (e.g., in low-gradient regions). In this dissertation, we address this issue by incorporating prior object information, represented as monochromatic object saliency maps, into a state-of-the-art method, the Iterative Spanning Forest (ISF) framework, resulting in a novel framework named Object-based ISF (OISF). For a given saliency map, OISF-based methods can increase the superpixel resolution within the objects of interest while permitting higher adherence to the map's borders when color is insufficient for delineation. We compared our work with state-of-the-art methods on three datasets, considering two classic superpixel segmentation metrics. Experimental results show that our approach achieves effective object delineation with a significantly lower number of superpixels than the baselines, especially in terms of preventing superpixel leaking.
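
The dissertation defines OISF's exact path-cost function; purely as an illustration of how a saliency map can be folded into a color-based arc weight so that superpixel boundaries snap to object borders where color is ambiguous, consider this sketch (the function and the mixing parameter `alpha` are hypothetical):

```python
import numpy as np

def object_arc_weight(color_p, color_q, sal_p, sal_q, alpha=0.5):
    """Illustrative arc weight mixing color homogeneity with saliency
    agreement: the weight is low when pixels p and q have similar color
    AND similar object-saliency values, so region growth adheres to the
    saliency map's borders where color alone is insufficient.
    alpha trades off the two terms (hypothetical parameter)."""
    color_dist = np.linalg.norm(np.asarray(color_p, float) - np.asarray(color_q, float))
    sal_dist = abs(float(sal_p) - float(sal_q))
    return alpha * color_dist + (1.0 - alpha) * sal_dist
```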


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1280
Author(s):  
Hyeonseok Lee ◽  
Sungchan Kim

Explaining the predictions of deep neural networks makes the networks more understandable and trustworthy, enabling their use in mission-critical tasks. Recent progress in the learning capability of networks is primarily due to their enormous numbers of model parameters, which makes their operations hard to interpret, in contrast to classical white-box models. A popular approach to interpretation is to generate saliency maps that identify the input features important to the model's prediction. Existing explanation methods typically use only the output of the model's last convolution layer to generate a saliency map, ignoring the information contained in intermediate layers; the resulting explanations are therefore coarse and of limited accuracy. Although accuracy can be improved by developing a saliency map iteratively, this is too time-consuming to be practical. To address these problems, we propose a novel approach that explains the model's prediction by developing an attentive surrogate network using knowledge distillation. The surrogate network aims to generate a fine-grained saliency map corresponding to the model prediction using meaningful regional information presented across all network layers. Experiments demonstrate that the saliency maps are the result of spatially attentive features learned from the distillation, making them useful for fine-grained classification tasks. Moreover, the proposed method runs at 24.3 frames per second, orders of magnitude faster than existing methods.
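
The paper's surrogate architecture is not reproduced here; as a minimal sketch of the knowledge-distillation objective such a surrogate could be trained with, the standard temperature-softened KL loss looks like this (the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard knowledge-distillation loss: KL divergence between the
    temperature-softened teacher and student distributions, scaled by
    T^2 to keep gradient magnitudes comparable across temperatures."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=1)
    log_student = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```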


2014 ◽  
Vol 1044-1045 ◽  
pp. 1049-1052 ◽  
Author(s):  
Chin Chen Chang ◽  
I Ta Lee ◽  
Tsung Ta Ke ◽  
Wen Kai Tai

Common methods for reducing image size include scaling and cropping. However, both approaches introduce quality problems in the reduced images. In this paper, we propose an image-reduction algorithm that separates the main objects from the background. First, we extract two feature maps from an input image: an enhanced visual saliency map and an improved gradient map. We then integrate these two feature maps into an importance map. Finally, we generate the target image using the importance map. The proposed approach obtains the desired results for a wide range of images.
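
A minimal sketch of fusing the two feature maps into an importance map, assuming a simple weighted sum after rescaling (the weight `w` and the fusion rule are illustrative; the paper's integration scheme may differ):

```python
import numpy as np

def importance_map(saliency, gradient, w=0.5):
    """Combine a visual saliency map and a gradient (edge) map into a
    single importance map; both inputs are rescaled to [0, 1] first."""
    def rescale(m):
        m = m.astype(np.float64)
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    return w * rescale(saliency) + (1.0 - w) * rescale(gradient)
```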


Author(s):  
Adhi Prahara ◽  
Murinto Murinto ◽  
Dewi Pramudi Ismi

The philosophy of human visual attention is scientifically explained in cognitive psychology and neuroscience and then computationally modeled in computer science and engineering. Visual attention models have been applied in computer vision systems for object detection, object recognition, image segmentation, image and video compression, action recognition, visual tracking, and so on. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers the biological perspective of visual attention, from the visual pathway and theories of visual attention to the computational models of bottom-up visual attention that generate saliency maps. The study compares several models at each stage and observes whether each stage is inspired by the biological architecture, concepts, or behavior of human visual attention. It finds that the use of low-level features, the center-surround mechanism, sparse representation, and higher-level guidance with intrinsic cues dominates bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.
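
The center-surround mechanism mentioned above is commonly implemented as a difference of Gaussians, as in Itti-and-Koch-style models; a minimal sketch (the sigma values are illustrative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(feature_map, sigma_center=2.0, sigma_surround=8.0):
    """Classic center-surround response: a fine-scale (center) Gaussian
    minus a coarse-scale (surround) Gaussian, rectified so that only
    positive local contrast contributes to the saliency signal."""
    center = gaussian_filter(feature_map, sigma_center)
    surround = gaussian_filter(feature_map, sigma_surround)
    return np.maximum(center - surround, 0.0)
```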


2020 ◽  
Vol 2 (4) ◽  
pp. 558-578
Author(s):  
Théo Combey ◽  
António Loison ◽  
Maxime Faucher ◽  
Hatem Hajri

Neural network classifiers (NNCs) are known to be vulnerable to malicious adversarial perturbations of their inputs, including perturbations that modify only a small fraction of the input features, known as sparse or L0 attacks. Effective and fast L0 attacks, such as the widely used Jacobian-based Saliency Map Attack (JSMA), are practical both for fooling NNCs and for improving their robustness. In this paper, we show that penalising the saliency maps of JSMA by the output probabilities and the input features of the NNC leads to more powerful attack algorithms that better take each input's characteristics into account. This leads us to introduce improved versions of JSMA, named Weighted JSMA (WJSMA) and Taylor JSMA (TJSMA), and to demonstrate, through a variety of white-box and black-box experiments on three datasets (MNIST, CIFAR-10, and GTSRB), that they are both significantly faster and more efficient than the original targeted and non-targeted versions of JSMA. Experiments also demonstrate, in some cases, very competitive results for our attacks in comparison with the Carlini-Wagner (CW) L0 attack, while remaining, like JSMA, significantly faster (WJSMA and TJSMA are more than 50 times faster than CW L0 on CIFAR-10). Our new attacks therefore provide good trade-offs between JSMA and CW for real-time L0 adversarial testing on datasets such as those cited above.
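
For reference, the targeted saliency that the original JSMA computes from the Jacobian of the classifier outputs looks roughly as follows; the paper's WJSMA/TJSMA penalisation terms are not reproduced here, and the array layout is an assumption:

```python
import numpy as np

def jsma_saliency(jacobian, target):
    """Targeted JSMA saliency per input feature.

    jacobian: array of shape (n_classes, n_features), entries dF_j/dx_i.
    Features that increase the target output while decreasing the sum
    of the other outputs receive positive saliency; all others are
    zeroed out, as in the original JSMA formulation.
    """
    grad_target = jacobian[target]                    # dF_t/dx_i
    grad_others = jacobian.sum(axis=0) - grad_target  # sum_{j!=t} dF_j/dx_i
    saliency = grad_target * np.abs(grad_others)
    mask = (grad_target > 0) & (grad_others < 0)
    return np.where(mask, saliency, 0.0)
```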


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Yildiz Aydin ◽  
Bekir Dizdaroğlu

Degradations frequently occur in archive films, which embody the historical and cultural heritage of a nation. This study addresses the detection of blotches, a degradation commonly encountered in archive films. A block-based blotch detection method is proposed based on a visual saliency map. The visual saliency map reveals prominent areas in an input frame and thus enables more accurate blotch detection. A simple and effective visual saliency method is adopted in order to reduce the computational complexity of the detection phase. After the visual saliency maps of the given frames are obtained, blotch regions are estimated by considering spatiotemporal patches around the salient pixels, which are first subjected to a prethresholding process; no motion estimation is required. Experimental results show that the proposed block-based blotch detection method significantly reduces false-alarm rates compared with the HOG-feature (Yous and Serir, 2017), LBP-feature (Yous and Serir, 2017), and region-matching (Yous and Serir, 2016) methods presented in recent years.
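
A minimal sketch of the prethresholding and patch-based temporal check described above, under our own assumptions (the threshold, patch size, and scoring rule are all illustrative; the paper's exact procedure may differ):

```python
import numpy as np

def candidate_blotch_pixels(saliency, thresh=0.6):
    """Prethreshold: keep only pixels whose saliency exceeds a fraction
    of the frame's peak saliency (fraction is a hypothetical choice)."""
    return saliency >= thresh * saliency.max()

def blotch_score(prev_frame, cur_frame, next_frame, y, x, half=4):
    """Illustrative temporal-discontinuity score around a salient pixel:
    a blotch appears in a single frame only, so the current patch should
    differ from BOTH temporal neighbours. No motion estimation is used,
    mirroring the block-based idea."""
    s = np.s_[max(y - half, 0):y + half + 1, max(x - half, 0):x + half + 1]
    p, c, n = (f.astype(np.float64)[s] for f in (prev_frame, cur_frame, next_frame))
    d_prev = np.abs(c - p).mean()
    d_next = np.abs(c - n).mean()
    return min(d_prev, d_next)  # high only if the patch differs from both
```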


1997 ◽  
Vol 7 (1) ◽  
pp. 221-233 ◽  
Author(s):  
R. Talluri ◽  
K. Oehler ◽  
T. Barmon ◽  
J.D. Courtney ◽  
A. Das ◽  
...  

2018 ◽  
Vol 10 (12) ◽  
pp. 1863 ◽  
Author(s):  
Zhenhui Sun ◽  
Qingyan Meng ◽  
Weifeng Zhai

Built-up area extraction from satellite images is important for urban planning and land use analysis; however, it remains a challenging task with optical satellite images, where existing methods can be limited by complex backgrounds. In this paper, an improved boosting learning saliency method for built-up area extraction from Sentinel-2 images is proposed. First, the optimal band combination for extracting such areas from Sentinel-2 data is determined. A coarse saliency map is then generated based on multiple cues and the geodesic weighted Bayesian (GWB) model; this map provides training samples for a strong model, from which a refined saliency map is subsequently obtained. Furthermore, cuboid cellular automata (CCA) are used to integrate multiscale saliency maps, improving the refined saliency map. The coarse and refined saliency maps are then synthesized into a final saliency map. Finally, the fractional-order Darwinian particle swarm optimization (FODPSO) algorithm is employed to extract the built-up areas from the final saliency result. Cities in five different types of ecosystems in China (desert, coastal, riverside, valley, and plain) are used to evaluate the proposed method. Analyses of the results and comparisons with other methods suggest that the proposed method is robust, with good accuracy.
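
As a rough illustration of the synthesis step only, a weighted fusion of the coarse and refined maps might look as follows (the weight, the rescaling, and the omission of the CCA-based multiscale integration are all our assumptions):

```python
import numpy as np

def synthesize_saliency(coarse, refined, w=0.5):
    """Illustrative final-synthesis step: a weighted sum of the coarse
    (GWB-based) and refined (strong-model) saliency maps, rescaled to
    [0, 1] so cues from both stages survive into the final map."""
    fused = w * coarse + (1.0 - w) * refined
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused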

