A Biologically Motivated, Proto-Object-Based Audiovisual Saliency Model

The natural environment and our interaction with it are essentially multisensory: we may deploy visual, tactile and/or auditory senses to perceive, learn about and interact with our environment. Our objective in this study is to develop a scene analysis algorithm using multisensory information, specifically vision and audio. We develop a proto-object-based audiovisual saliency map (AVSM) for the analysis of dynamic natural scenes. A specialized audiovisual camera with a 360° field of view, capable of locating sound direction, is used to collect spatiotemporally aligned audiovisual data. We demonstrate that the performance of the proto-object-based AVSM in detecting and localizing salient objects/events is in agreement with human judgment. In addition, the proto-object-based AVSM, which we compute as a linear combination of visual and auditory feature conspicuity maps, captures a higher number of valid salient events than unisensory saliency maps. Such an algorithm can be useful in surveillance, robotic navigation, video compression and related applications.
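
The linear combination of conspicuity maps mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the min-max normalization step, and the equal default weights are hypothetical and are not taken from the paper, which does not state its weights or normalization in the abstract.

```python
from typing import List, Tuple

Map = List[List[float]]  # a 2-D map stored as rows of floats


def normalize(m: Map) -> Map:
    """Min-max rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def combine_conspicuity(visual: Map, auditory: Map,
                        w_visual: float = 0.5,
                        w_auditory: float = 0.5) -> Map:
    """Weighted sum of two normalized, equally sized conspicuity maps.

    The weights are illustrative defaults, not values from the paper.
    """
    v, a = normalize(visual), normalize(auditory)
    return [[w_visual * x + w_auditory * y for x, y in zip(rv, ra)]
            for rv, ra in zip(v, a)]


def most_salient(m: Map) -> Tuple[int, int]:
    """Return the (row, column) index of the largest value in the map."""
    cells = [(r, c) for r, row in enumerate(m) for c, _ in enumerate(row)]
    return max(cells, key=lambda rc: m[rc[0]][rc[1]])
```

With this sketch, a location that is conspicuous in both modalities ends up with a higher combined value than one that stands out in only one of them, which is the intuition behind comparing the audiovisual map against unisensory maps.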

Superpixel Generation by the Iterative Spanning Forest Using Object Information

10.5753/sibgrapi.est.2020.12979 ◽

2020 ◽

Author(s):

Felipe C. Belém ◽

Alexandre X. Falcão ◽

Silvio Jamil F. Guimarães

Keyword(s):

State Of The Art ◽

Saliency Map ◽

Lower Number ◽

Experimental Results ◽

Superpixel Segmentation ◽

Spanning Forest ◽

Saliency Maps ◽

Object Based ◽

Segmentation Methods ◽

Object Delineation

Superpixel segmentation methods aim to partition an image into homogeneous connected regions of pixels (i.e., superpixels) such that the objects of interest are precisely defined by the union of their constituent superpixels. However, the homogeneity criterion is often based solely on color, which, under certain conditions, may be insufficient for inferring the extent of the objects (e.g., in low-gradient regions). In this dissertation, we address this issue by incorporating prior object information, represented as monochromatic object saliency maps, into a state-of-the-art method, the Iterative Spanning Forest (ISF) framework, resulting in a novel framework named Object-based ISF (OISF). For a given saliency map, OISF-based methods can increase the superpixel resolution within the objects of interest while permitting higher adherence to the map’s borders when color is insufficient for delineation. We compared our work with state-of-the-art methods on three datasets, using two classic superpixel segmentation metrics. Experimental results show that our approach achieves effective object delineation with a significantly lower number of superpixels than the baselines, especially in terms of preventing superpixel leaking.

Explaining Neural Networks Using Attentive Knowledge Distillation

Sensors ◽

10.3390/s21041280 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1280

Author(s):

Hyeonseok Lee ◽

Sungchan Kim

Keyword(s):

Neural Networks ◽

Model Prediction ◽

Saliency Map ◽

Model Parameters ◽

Learning Capability ◽

Fine Grained ◽

Network Layers ◽

Saliency Maps ◽

Novel Approach ◽

Knowledge Distillation

Explaining the predictions of deep neural networks makes the networks more understandable and trusted, leading to their use in various mission-critical tasks. Recent progress in the learning capability of networks has primarily been due to the enormous number of model parameters, which usually makes their operations hard to interpret, in contrast to classical white-box models. To this end, generating saliency maps is a popular approach to identifying the important input features used for the model prediction. Existing explanation methods typically use only the output of the last convolution layer of the model to generate a saliency map, and thus lack the information contained in intermediate layers. The corresponding explanations are therefore coarse and of limited accuracy. Although the accuracy can be improved by iteratively developing a saliency map, this is too time-consuming to be practical. To address these problems, we propose a novel approach that explains the model prediction by developing an attentive surrogate network using knowledge distillation. The surrogate network aims to generate a fine-grained saliency map corresponding to the model prediction, using meaningful regional information present across all network layers. Experiments demonstrated that the saliency maps are the result of spatially attentive features learned from the distillation. Thus, they are useful for fine-grained classification tasks. Moreover, the proposed method runs at a rate of 24.3 frames per second, which is orders of magnitude faster than existing methods.

An Object-Based Image Reducing Approach

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1044-1045.1049 ◽

2014 ◽

Vol 1044-1045 ◽

pp. 1049-1052 ◽

Cited By ~ 1

Author(s):

Chin Chen Chang ◽

I Ta Lee ◽

Tsung Ta Ke ◽

Wen Kai Tai

Keyword(s):

Visual Saliency ◽

Saliency Map ◽

Input Image ◽

Image Size ◽

Target Image ◽

Feature Maps ◽

Object Based ◽

Wide Range

Common methods for reducing image size include scaling and cropping. However, both approaches can degrade the quality of the reduced image. In this paper, we propose an image-reducing algorithm that separates the main objects from the background. First, we extract two feature maps from an input image, namely an enhanced visual saliency map and an improved gradient map. Next, we integrate these two feature maps into an importance map. Finally, we generate the target image using the importance map. The proposed approach obtains the desired results for a wide range of images.
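
To make the role of an importance map concrete, the following is a deliberately crude sketch of importance-guided width reduction: it drops whole columns with the lowest summed importance. This is not the algorithm proposed in the paper, whose details the abstract does not give; the function name `reduce_width` and the column-dropping strategy are assumptions chosen only to show how an importance map can steer which pixels survive.

```python
from typing import List

Grid = List[List[float]]  # rows of pixel values or importance scores


def reduce_width(image: Grid, importance: Grid, target_width: int) -> Grid:
    """Keep the `target_width` columns with the highest total importance.

    Hypothetical illustration: columns are scored by summing the importance
    map down each column, and the surviving columns keep their original
    left-to-right order.
    """
    width = len(image[0])
    if not 0 < target_width <= width:
        raise ValueError("target_width must be between 1 and the image width")
    # Total importance of each column.
    col_score = [sum(row[c] for row in importance) for c in range(width)]
    # Pick the best-scoring columns, then restore spatial order.
    ranked = sorted(range(width), key=lambda c: col_score[c], reverse=True)
    keep = sorted(ranked[:target_width])
    return [[row[c] for c in keep] for row in image]
```

Unlike uniform scaling, this keeps high-importance columns at full resolution; unlike plain cropping, the retained columns need not be contiguous. A practical method would need to avoid the visible discontinuities that removing whole columns can introduce.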

Bottom-up visual attention model for still image: a preliminary study

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v6i1.469 ◽

2020 ◽

Vol 6 (1) ◽

pp. 82

Author(s):

Adhi Prahara ◽

Murinto Murinto ◽

Dewi Pramudi Ismi

Keyword(s):

Visual Attention ◽

Object Detection ◽

Video Compression ◽

Saliency Map ◽

Bottom Up ◽

Attention Model ◽

Intrinsic Cues ◽

Preliminary Study ◽

Segmentation Image ◽

Human Visual Attention

The philosophy of human visual attention is explained scientifically in the fields of cognitive psychology and neuroscience and then modeled computationally in the fields of computer science and engineering. Visual attention models have been applied in computer vision systems for tasks such as object detection, object recognition, image segmentation, image and video compression, action recognition, and visual tracking. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers the biological perspective of visual attention, including the visual pathway and the theory of visual attention, and then the computational models of bottom-up visual attention that generate a saliency map. The study compares some models at each stage and observes whether the stage is inspired by the biological architecture, concepts, or behavior of human visual attention. The study finds that low-level features, center-surround mechanisms, sparse representation, and higher-level guidance with intrinsic cues dominate bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.

Object-based video compression scheme with optimal bit allocation among shape, motion and texture

Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429) ◽

10.1109/icip.2003.1247362 ◽

2004 ◽

Cited By ~ 4

Author(s):

Haohong Wang ◽

G.M. Schuster ◽

A.K. Katsaggelos

Keyword(s):

Video Compression ◽

Bit Allocation ◽

Compression Scheme ◽

Object Based

Probabilistic Jacobian-Based Saliency Maps Attacks

Machine Learning and Knowledge Extraction ◽

10.3390/make2040030 ◽

2020 ◽

Vol 2 (4) ◽

pp. 558-578

Author(s):

Théo Combey ◽

António Loison ◽

Maxime Faucher ◽

Hatem Hajri

Keyword(s):

Neural Network ◽

Real Time ◽

Saliency Map ◽

Black Box ◽

Saliency Maps ◽

Trade Offs ◽

Good Trade ◽

Neural Network Classifiers

Neural network classifiers (NNCs) are known to be vulnerable to malicious adversarial perturbations of inputs, including those that modify only a small fraction of the input features, known as sparse or L0 attacks. Effective and fast L0 attacks, such as the widely used Jacobian-based Saliency Map Attack (JSMA), are practical both for fooling NNCs and for improving their robustness. In this paper, we show that penalising the saliency maps of JSMA by the output probabilities and the input features of the NNC leads to more powerful attack algorithms that better take into account each input’s characteristics. This leads us to introduce improved versions of JSMA, named Weighted JSMA (WJSMA) and Taylor JSMA (TJSMA), and to demonstrate through a variety of white-box and black-box experiments on three different datasets (MNIST, CIFAR-10 and GTSRB) that they are both significantly faster and more efficient than the original targeted and non-targeted versions of JSMA. Experiments also demonstrate, in some cases, very competitive results of our attacks in comparison with the Carlini-Wagner (CW) L0 attack, while remaining, like JSMA, significantly faster (WJSMA and TJSMA are more than 50 times faster than CW L0 on CIFAR-10). Therefore, our new attacks provide good trade-offs between JSMA and CW for L0 real-time adversarial testing on datasets such as the ones previously cited.
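
For orientation, the per-feature saliency score of the original targeted JSMA, in its feature-increasing form, can be sketched as follows. This is a simplified illustration and not the weighted or Taylor variants introduced in the paper: the Jacobian is passed in as a plain nested list rather than computed from a network, the function name `jsma_saliency` is hypothetical, and the actual attack searches over pairs of features rather than scoring single features.

```python
from typing import List


def jsma_saliency(jacobian: List[List[float]], target: int) -> List[float]:
    """Score each input feature for a targeted, feature-increasing attack.

    jacobian[j][i] is assumed to hold dF_j/dx_i, the derivative of the
    class-j output with respect to input feature i.
    """
    n_classes = len(jacobian)
    n_features = len(jacobian[0])
    scores = []
    for i in range(n_features):
        # Effect of feature i on the target class.
        alpha = jacobian[target][i]
        # Combined effect of feature i on all other classes.
        beta = sum(jacobian[j][i] for j in range(n_classes) if j != target)
        # A feature is useful only if raising it helps the target class
        # and does not, on balance, help the other classes.
        if alpha < 0 or beta > 0:
            scores.append(0.0)
        else:
            scores.append(alpha * abs(beta))
    return scores
```

The feature with the highest score is the one the attack would perturb next. The abstract describes WJSMA and TJSMA as penalising this map by the output probabilities and the input features; those weightings are not reproduced here.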

Blotch Detection in Archive Films Based on Visual Saliency Map

Complexity ◽

10.1155/2020/5965387 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Yildiz Aydin ◽

Bekir Dizdaroğlu

Keyword(s):

Computational Complexity ◽

Cultural Heritage ◽

False Alarm ◽

Detection Method ◽

Visual Saliency ◽

Saliency Map ◽

Saliency Maps ◽

Block Based ◽

The Given ◽

Blotch Detection

Degradations frequently occur in archive films, which symbolize the historical and cultural heritage of a nation. This study addresses the problem of detecting blotches, which are commonly encountered in archive films. A block-based blotch detection method based on a visual saliency map is proposed. The visual saliency map reveals prominent areas in an input frame and thus enables more accurate blotch detection. A simple and effective visual saliency map method is used in order to reduce the computational complexity of the detection phase. After the visual saliency maps of the given frames are obtained, blotch regions are estimated by considering spatiotemporal patches, without the need for motion estimation, around the salient pixels, which are subjected to a prethresholding process. Experimental results show that the proposed block-based blotch detection method provides a significant advantage in reducing false alarm rates over the HOG feature (Yous and Serir, 2017), LBP feature (Yous and Serir, 2017), and regions-matching (Yous and Serir, 2016) methods presented in recent years.

Object-based stereo video compression using fractals and shape-adaptive DCT

AEU - International Journal of Electronics and Communications ◽

10.1016/j.aeue.2014.02.011 ◽

2014 ◽

Vol 68 (7) ◽

pp. 687-697 ◽

Cited By ~ 14

Author(s):

Kamel Belloulata ◽

Amina Belalia ◽

Shiping Zhu

Keyword(s):

Video Compression ◽

Stereo Video ◽

Object Based

A robust, scalable, object-based video compression technique for very low bit-rate coding

IEEE Transactions on Circuits and Systems for Video Technology ◽

10.1109/76.554433 ◽

1997 ◽

Vol 7 (1) ◽

pp. 221-233 ◽

Cited By ~ 31

Author(s):

R. Talluri ◽

K. Oehler ◽

T. Barmon ◽

J.D. Courtney ◽

A. Das ◽

...

Keyword(s):

Video Compression ◽

Bit Rate ◽

Compression Technique ◽

Low Bit Rate ◽

Object Based ◽

Rate Coding

An Improved Boosting Learning Saliency Method for Built-Up Areas Extraction in Sentinel-2 Images

Remote Sensing ◽

10.3390/rs10121863 ◽

2018 ◽

Vol 10 (12) ◽

pp. 1863 ◽

Cited By ~ 2

Author(s):

Zhenhui Sun ◽

Qingyan Meng ◽

Weifeng Zhai

Keyword(s):

Satellite Images ◽

Good Accuracy ◽

Particle Swarm Optimization Algorithm ◽

Saliency Map ◽

Band Combination ◽

Saliency Maps ◽

Training Samples ◽

Different Types ◽

Optical Satellite Images ◽

Sentinel 2

Built-up area extraction from satellite images is an important aspect of urban planning and land use; however, it remains a challenging task when using optical satellite images. Existing methods may be limited by the complex background. In this paper, an improved boosting learning saliency method for built-up area extraction from Sentinel-2 images is proposed. First, the optimal band combination for extracting such areas from Sentinel-2 data is determined; then, a coarse saliency map is generated, based on multiple cues and the geodesic weighted Bayesian (GWB) model, which provides training samples for a strong model; a refined saliency map is subsequently obtained using the strong model. Furthermore, cuboid cellular automata (CCA) are used to integrate multiscale saliency maps to improve the refined saliency map. Then, the coarse and refined saliency maps are synthesized to create a final saliency map. Finally, the fractional-order Darwinian particle swarm optimization algorithm (FODPSO) is employed to extract the built-up areas from the final saliency result. Cities in five different types of ecosystems in China (desert, coastal, riverside, valley, and plain) are used to evaluate the proposed method. Analyses of the results and comparisons with other methods suggest that the proposed method is robust, with good accuracy.