Saliency Detection by Multilevel Deep Pyramid Model

Traditional salient object detection models are divided into several classes based on low-level features and contrast between pixels. In this paper, we propose a model based on a multilevel deep pyramid (MLDP) that fuses multiple features at different levels. First, the MLDP feeds the original image into a VGG16 model to extract high-level features and form an initial saliency map. Next, it extracts further high-level features to form a saliency map based on a deep pyramid. Then, it extracts low-level features to obtain a saliency map fused with superpixels. After that, it applies background noise filtering to the superpixel-fused saliency map to suppress interference from background noise and form a foreground-based saliency map. Lastly, it combines the superpixel-fused saliency map with the foreground-based saliency map to produce the final saliency map. Because it fuses multiple features rather than relying on low-level features alone, the MLDP achieves good results when extracting salient targets. As shown in our experiment section, the MLDP outperforms seven other state-of-the-art models on three public saliency datasets. The MLDP is therefore effective and widely applicable to the extraction of salient targets.


Event Detection in Sports Video Based on Generative-Discriminative Models

Computer Vision for Multimedia Applications ◽

10.4018/978-1-60960-024-2.ch009 ◽

2011 ◽

pp. 143-165

Author(s):

Guoliang Fan ◽

Yi Ding

Keyword(s):

Event Detection ◽

Semantic Analysis ◽

Building Blocks ◽

Semantic Space ◽

Sports Video ◽

Low Level ◽

Discriminative Models ◽

Video Mining ◽

High Level ◽

Different Levels

Semantic event detection is an active research topic in the field of video mining. The major challenge is the semantic gap between low-level features and high-level semantics. In this chapter, we present a new sports video mining framework in which a hybrid generative-discriminative approach is used for event detection. Specifically, we propose a three-layer semantic space by which event detection is converted into two inter-related statistical inference procedures that involve semantic analysis at different levels. The first infers mid-level semantic structures from low-level visual features via generative models; these structures serve as building blocks for high-level semantic analysis. The second uses discriminative models to detect, from the mid-level semantic structures, the high-level semantics that are of direct interest to users. In this framework we can explicitly represent and detect semantics at different levels. Using generative and discriminative approaches in two different stages proves effective and appropriate for event detection in sports video. Experimental results on a set of American football video data show that the proposed framework offers promising results compared with traditional approaches.


A Visual Saliency Detection Approach by Fusing Low-Level Priors With High-Level Priors

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2019070102 ◽

2019 ◽

Vol 9 (3) ◽

pp. 23-37

Author(s):

Monika Singh ◽

Anand Singh Jalal ◽

Ruchira Manke ◽

Aamir Khan

Keyword(s):

Minimum Distance ◽

Saliency Detection ◽

Visual Saliency ◽

Research Area ◽

Experimental Results ◽

Low Level ◽

Detection Approach ◽

Art Methods ◽

High Level ◽

Visual Saliency Detection

Saliency detection has always been a challenging and interesting research area. Existing methodologies focus on either the foreground or the background regions of an image by computing low-level features. However, considering only low-level features has not produced satisfactory results. In this paper, low-level features, which are extracted using superpixels, are combined with high-level priors. The background features are taken as the low-level prior because the background areas and the boundary of an image are similar, interconnected, and separated by a minimum distance. High-level priors such as location, color, and semantic priors are incorporated with the low-level prior to highlight the salient area in the image. The experimental results show that the proposed approach outperforms the state-of-the-art methods.


Detection of Fog Involving Heavy Pollutants by Using the New Geostationary Satellite Himawari-8

10.5194/egusphere-egu2020-3929 ◽

2020 ◽

Author(s):

Hongbin Wang ◽

Zhiwei Zhang ◽

Duanyang Liu

Keyword(s):

Satellite Observation ◽

Geostationary Satellite ◽

Japan Meteorological Agency ◽

Composite Image ◽

Brightness Temperatures ◽

The Difference ◽

Grid Points ◽

High Level ◽

Different Levels ◽

Better Than

Himawari-8 is the new geostationary satellite of the Japan Meteorological Agency (JMA) and carries the Advanced Himawari Imager (AHI), which is greatly improved over past imagers in terms of its number of bands and its temporal/spatial resolution. In this work, two methods were developed in China for detecting different levels of fog involving heavy pollutants using Himawari-8. The two methods are the difference between the 11.2 μm and 3.9 μm brightness temperatures (BTD3.9-11.2) and the 3.9 μm pseudo-emissivity (ems3.9). The 3.9 μm pseudo-emissivity is the ratio of the observed 3.9 μm radiance to the 3.9 μm blackbody radiance calculated using the 11.2 μm brightness temperature. We identified the optimal thresholds of BTD3.9-11.2 and ems3.9 at the 2400 stations and the grid points for different levels of fog involving heavy pollutants. Results on land and sea from the two methods were compared with surface observations from 2400 weather stations in China and with CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation) VFM (Vertical Feature Mask) products. The results show that both the BTD3.9-11.2 method and the ems3.9 method can accurately identify the different levels of fog involving heavy pollutants, and that the accuracy of the ems3.9 method is slightly better than that of the BTD3.9-11.2 method. The accuracy of the two methods increases significantly, and the false alarm rate decreases significantly, as visibility decreases. When the visibility is less than 50 m, the HR, FAR and KSS of the BTD3.9-11.2 method (the ems3.9 method) were 0.89 (0.90), 0.15 (0.15) and 0.74 (0.75), respectively. When mid- or high-level clouds were removed using the surface temperature from ground observations, the HR and KSS of the two methods for the different levels of fog increased significantly, and the FAR decreased significantly. When the visibility is less than 1000 m, the HR of the BTD3.9-11.2 method (the ems3.9 method) increased to 0.81 (0.85) from 0.71 (0.74), the FAR decreased to 0.12 (0.13) from 0.27 (0.28), and the KSS increased to 0.69 (0.72) from 0.44 (0.46). The KSS of the two methods increased by 0.23 and 0.26, respectively. Analysis of three cases shows that the fog area can be clearly identified using BTD3.9-11.2, ems3.9 and the RGB composite image. The sea fog detection results from Himawari-8 data are consistent with those from CALIPSO VFM products.
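
The pseudo-emissivity definition and the verification scores quoted in this abstract can be illustrated with a short Python sketch. It is not the authors' code. The function names are hypothetical, the radiance is assumed to be a spectral radiance per unit wavelength, and FAR is assumed to be the false alarm rate (false alarms divided by all non-fog cases), an assumption chosen because it makes KSS = HR - FAR reproduce the reported values.

```python
import math

H = 6.62607015e-34  # Planck constant, J s
C = 2.99792458e8    # speed of light, m/s
K = 1.380649e-23    # Boltzmann constant, J/K

def planck_radiance(wavelength_m, temperature_k):
    """Blackbody spectral radiance B(lambda, T) in W m^-2 sr^-1 m^-1."""
    numerator = 2.0 * H * C**2 / wavelength_m**5
    denominator = math.exp(H * C / (wavelength_m * K * temperature_k)) - 1.0
    return numerator / denominator

def pseudo_emissivity(radiance_39, bt_112_k):
    """Observed 3.9 um radiance divided by the 3.9 um blackbody radiance
    evaluated at the 11.2 um brightness temperature (hypothetical helper)."""
    return radiance_39 / planck_radiance(3.9e-6, bt_112_k)

def skill_scores(hits, misses, false_alarms, correct_negatives):
    """Return (HR, FAR, KSS); FAR is assumed to mean false alarm rate."""
    hr = hits / (hits + misses)
    far = false_alarms / (false_alarms + correct_negatives)
    return hr, far, hr - far

# (HR, FAR, KSS) triples quoted in the abstract
REPORTED = [
    (0.89, 0.15, 0.74), (0.90, 0.15, 0.75),  # visibility < 50 m
    (0.81, 0.12, 0.69), (0.85, 0.13, 0.72),  # < 1000 m, clouds removed
    (0.71, 0.27, 0.44), (0.74, 0.28, 0.46),  # < 1000 m, before removal
]
```

Under these assumptions, a scene that radiates as a blackbody at the same temperature in both bands has a pseudo-emissivity of 1, and every reported KSS equals the corresponding HR minus FAR.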


Backbone Cannot Be Trained at Once: Rolling Back to Pre-Trained Network for Person Re-Identification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018859 ◽

2019 ◽

Vol 33 ◽

pp. 8859-8867 ◽

Cited By ~ 4

Author(s):

Youngmin Ro ◽

Jongwon Choi ◽

Dae Ung Jo ◽

Byeongho Heo ◽

Jongin Lim ◽

...

Keyword(s):

Network Architecture ◽

State Of The Art ◽

Fine Tuning ◽

Neural Network Architecture ◽

Large Dataset ◽

Low Level ◽

Tuning Method ◽

Improved Performance ◽

High Level ◽

Tuning Strategy

In the person re-identification (ReID) task, because trainable data are scarce, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the vanishing gradient problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the vanishing gradient problem in low-level layers and robustly trains them to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture.
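
The roll-back step described in this abstract can be pictured with a toy Python sketch. It is an illustration only, not the authors' implementation: weights are plain lists in a dictionary, the block names are hypothetical, and the abstract does not say which blocks are kept or how often the roll-back is repeated.

```python
def roll_back(fine_tuned, pretrained, keep):
    """Keep fine-tuned weights for the blocks named in `keep` and reset
    every other block to a copy of its pre-trained weights."""
    return {
        name: list(fine_tuned[name] if name in keep else pretrained[name])
        for name in pretrained
    }

# Hypothetical three-block network; block1 is the lowest-level block.
pretrained = {"block1": [0.1, 0.2], "block2": [0.3, 0.4], "block3": [0.5, 0.6]}
fine_tuned = {"block1": [0.11, 0.19], "block2": [0.35, 0.2], "block3": [0.9, 0.1]}

# Start the next fine-tuning period from these weights: the low-level block
# keeps what it has learned, the higher blocks return to their initial state.
restart = roll_back(fine_tuned, pretrained, keep={"block1"})
```

Fine-tuning would then resume from `restart`, so the low-level block continues from its adapted weights while the higher blocks are trained again from the pre-trained starting point.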


Spatial variations in the socioeconomic development of rural municipalities in the Podkarpackie Voivodeship

Acta Scientiarum Polonorum Administratio Locorum ◽

10.31648/aspal.6562 ◽

2021 ◽

Vol 20 (2) ◽

Author(s):

Katarzyna Pawlewicz ◽

Justyna Flasińska

Keyword(s):

Socioeconomic Development ◽

Environmental Issues ◽

Development Pattern ◽

Class III ◽

Low Level ◽

Multidimensional Modeling ◽

Object Based ◽

Rural Municipalities ◽

High Level ◽

Different Levels

The main goal of all territorial administration units, including municipalities, is to promote socioeconomic development. The implemented actions address a broad range of economic, social, spatial and environmental issues. Therefore, socioeconomic development is a complex and multi-dimensional concept that is difficult to evaluate in an unambiguous and objective manner. Statistical methods in object-based multidimensional modeling support such evaluations by considering numerous attributes/variables, which increases the efficiency of the analytical process. In this article, Hellwig’s development pattern method was applied to classify rural municipalities in Podkarpackie Voivodeship based on their socioeconomic development. Twenty-seven indicators were designed for the needs of the analysis with the use of Statistics Poland data for 2018. Based on the results, the municipalities were grouped into four classes with different levels of socioeconomic development. Class III was the largest group, and it was composed of 39 municipalities with a medium-low level of socioeconomic development. Class II was composed of a similar number of municipalities (38) with a medium-high level of socioeconomic development. The smallest groups were Class I, containing 18 municipalities with a high level of socioeconomic development, and Class IV, containing 14 municipalities with a low level of development.
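
A minimal Python sketch of Hellwig's development pattern method, in its common textbook form, shows how such a four-class grouping can be produced. It is not the authors' code and rests on assumptions the abstract does not state: indicators are z-standardized with the population standard deviation, destimulants are sign-flipped, and class boundaries are set by the mean and standard deviation of the synthetic measure. The function names and the toy data are hypothetical.

```python
import numpy as np

def hellwig_measure(X, stimulant):
    """Synthetic development measure s_i = 1 - d_i / d_0 for each object (row)."""
    X = np.asarray(X, dtype=float)
    stimulant = np.asarray(stimulant, dtype=bool)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each indicator
    Z = np.where(stimulant, Z, -Z)                  # flip destimulants
    pattern = Z.max(axis=0)                         # ideal "development pattern"
    d = np.sqrt(((Z - pattern) ** 2).sum(axis=1))   # distance to the pattern
    d0 = d.mean() + 2.0 * d.std()
    return 1.0 - d / d0

def classify(s):
    """Assign classes 1 (high) to 4 (low) using mean and std of the measure."""
    m, sd = s.mean(), s.std()
    classes = np.full(s.shape, 4)
    classes[s >= m - sd] = 3
    classes[s >= m] = 2
    classes[s >= m + sd] = 1
    return classes

# Toy data: 5 municipalities, 2 indicators (first a stimulant, second a destimulant)
X = [[10, 1], [8, 2], [6, 3], [4, 4], [2, 5]]
s = hellwig_measure(X, stimulant=[True, False])
classes = classify(s)
```

In the toy data the first municipality is best on both indicators, so it coincides with the pattern and receives the highest measure.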


Stochastic Multiple Chaotic Local Search-Incorporated Gradient-Based Optimizer

Discrete Dynamics in Nature and Society ◽

10.1155/2021/3353926 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Hang Yu ◽

Yu Zhang ◽

Pengxing Cai ◽

Junyan Yi ◽

Sheng Li ◽

...

Keyword(s):

Local Search ◽

Optimization Problem ◽

Search Strategy ◽

State Of The Art ◽

Population Diversity ◽

Local Optima ◽

Gradient Based ◽

High Level ◽

Search Rule ◽

Better Than

In this study, a hybrid metaheuristic algorithm, the chaotic gradient-based optimizer (CGBO), is proposed. The gradient-based optimizer (GBO) is a novel metaheuristic inspired by Newton’s method that uses two search strategies to ensure excellent performance. One is the gradient search rule (GSR), and the other is the local escaping operation (LEO). GSR uses the gradient method to enhance exploitation ability and convergence rate, and LEO employs random operators to escape local optima. It has been verified that gradient-based metaheuristic algorithms have obvious shortcomings in exploration. Meanwhile, chaotic local search (CLS) is an efficient search strategy with randomness and ergodicity, which is usually used to improve global optimization algorithms. Accordingly, we incorporate CLS into GBO to strengthen the exploration ability of the original GBO and maintain a high level of population diversity. In this study, CGBO is tested with over 30 CEC2017 benchmark functions and a parameter optimization problem of the dendritic neuron model (DNM). Experimental results indicate that CGBO performs better than other state-of-the-art algorithms in terms of effectiveness and robustness.
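
The idea of a chaotic local search can be sketched in a few lines of Python. This is a generic illustration with a single logistic map and greedy acceptance, not the CGBO algorithm itself, which the abstract describes as using multiple chaotic maps chosen stochastically. The function names, the search radius, and the step count are assumptions.

```python
import random

def logistic_map(z):
    """One step of the logistic map, chaotic on (0, 1) for this parameter."""
    return 4.0 * z * (1.0 - z)

def chaotic_local_search(f, x, lower, upper, radius=0.1, steps=50, seed=0):
    """Perturb the current best point with a chaotic sequence and keep a
    candidate only if it lowers the objective f (minimization)."""
    rng = random.Random(seed)
    z = [rng.uniform(0.01, 0.99) for _ in x]  # one chaotic variable per dimension
    best, best_val = list(x), f(x)
    for _ in range(steps):
        z = [logistic_map(zi) for zi in z]
        candidate = [
            min(max(b + radius * (u - l) * (zi - 0.5), l), u)  # clip to bounds
            for b, zi, l, u in zip(best, z, lower, upper)
        ]
        value = f(candidate)
        if value < best_val:
            best, best_val = candidate, value
    return best, best_val

def sphere(x):
    return sum(xi * xi for xi in x)

start = [3.0, -2.0]
best, best_val = chaotic_local_search(sphere, start, [-5.0, -5.0], [5.0, 5.0])
```

Because acceptance is greedy, the returned value is never worse than the starting point. In a hybrid optimizer such a step would typically be applied to the current best solution after each iteration of the main search.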


Hi-Fi: Hierarchical Feature Integration for Skeleton Detection

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/166 ◽

2018 ◽

Cited By ~ 13

Author(s):

Kai Zhao ◽

Wei Shen ◽

Shanghua Gao ◽

Dandan Li ◽

Ming-Ming Cheng

Keyword(s):

Performance Improvement ◽

State Of The Art ◽

Natural Images ◽

Feature Integration ◽

Detection Problem ◽

Multi Scale ◽

Integration Mechanism ◽

Object Parts ◽

High Level ◽

Different Levels

In natural images, the scales (thickness) of object skeletons may vary dramatically among objects and object parts. Thus, robust skeleton detection requires a powerful multi-scale feature integration ability. To address the object skeleton detection problem, we present a new convolutional neural network (CNN) architecture that introduces a novel hierarchical feature integration mechanism, named Hi-Fi. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers, as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) possesses a strong ability to capture both rich object context and high-resolution details. Experimental results show that our method significantly outperforms the state-of-the-art methods in terms of effectively fusing features from very different scales, as evidenced by a considerable performance improvement on several benchmarks.


Dependency Exploitation: A Unified CNN-RNN Approach for Visual Emotion Recognition

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/503 ◽

2017 ◽

Cited By ~ 21

Author(s):

Xinge Zhu ◽

Liang Li ◽

Weigang Zhang ◽

Tianrong Rao ◽

Min Xu ◽

...

Keyword(s):

Neural Network ◽

Emotion Recognition ◽

Feature Fusion ◽

Feature Representation ◽

Low Level ◽

Learning Framework ◽

Independent Entity ◽

Internet Images ◽

High Level ◽

Different Levels

Visual emotion recognition aims to associate images with appropriate emotions. Different visual stimuli, from low-level to high-level, can affect human emotion, such as color, texture, part, object, etc. However, most existing methods treat different levels of features as independent entities and lack an effective method for feature fusion. In this paper, we propose a unified CNN-RNN model that predicts emotion from features fused across different levels by exploiting the dependency among them. Our proposed architecture leverages a convolutional neural network (CNN) with multiple layers to extract different levels of features within a multi-task learning framework, in which two related loss functions are introduced to learn the feature representation. Considering the dependencies within the low-level and high-level features, a new bidirectional recurrent neural network (RNN) is proposed to integrate the learned features from different layers in the CNN model. Extensive experiments on both Internet images and art photo datasets demonstrate that our method outperforms the state-of-the-art methods with at least 7% performance improvement.


Electrophysiological signatures of hierarchical learning

10.1101/2021.03.09.434666 ◽

2021 ◽

Author(s):

Meng Liu ◽

Wenshan Dong ◽

Shaozheng Qin ◽

Tom Verguts ◽

Qi Chen

Keyword(s):

Cognitive Process ◽

Human Perception ◽

Learning Task ◽

Prediction Errors ◽

Neural Basis ◽

Hierarchical Learning ◽

Low Level ◽

High Level ◽

The Brain ◽

Better Than

Human perception and learning are thought to rely on a hierarchical generative model that is continuously updated via precision-weighted prediction errors (pwPEs). However, the neural basis of this cognitive process, and how it unfolds during decision making, remain poorly understood. To investigate this question, we combined a hierarchical Bayesian model (i.e., Hierarchical Gaussian Filter, HGF) with electrophysiological (EEG) recording while participants performed a probabilistic reversal learning task in alternating stable and volatile environments. Behaviorally, the HGF fitted significantly better than two non-hierarchical control models. Neurally, low-level and high-level pwPEs were independently encoded by the P300 component. Low-level pwPEs were reflected in the theta (4-8 Hz) frequency band, but high-level pwPEs were not. Furthermore, the expressions of high-level pwPEs were stronger for participants with better HGF fit. These results indicate that the brain employs hierarchical learning, and encodes both low- and high-level learning signals separately and adaptively.
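
The notion of a precision-weighted prediction error can be made concrete with a single-level Gaussian belief update in Python. This is a deliberately simplified sketch, not the Hierarchical Gaussian Filter used in the study: it has one level, fixed precisions, and no volatility, and the function name is hypothetical.

```python
def precision_weighted_update(mu, pi_prior, observation, pi_obs):
    """Update a Gaussian belief (mean mu, precision pi_prior) with one
    observation of precision pi_obs. Returns the new mean, the new
    precision, and the precision-weighted prediction error."""
    prediction_error = observation - mu
    pi_post = pi_prior + pi_obs        # precisions add for Gaussian beliefs
    weight = pi_obs / pi_post          # share of the error that is trusted
    pw_pe = weight * prediction_error
    return mu + pw_pe, pi_post, pw_pe

# Equal prior and observation precision: the belief moves halfway.
mu_a, pi_a, pwpe_a = precision_weighted_update(0.0, 1.0, 1.0, 1.0)

# A confident prior (high precision) discounts the same prediction error.
mu_b, pi_b, pwpe_b = precision_weighted_update(0.0, 9.0, 1.0, 1.0)
```

The same prediction error therefore moves the belief less when the prior is precise, which is the sense in which the error is precision-weighted.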


Context-Aware Image Inpainting with Learned Semantic Priors

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/183 ◽

2021 ◽

Author(s):

Wendong Zhang ◽

Junwei Zhu ◽

Ying Tai ◽

Yunbo Wang ◽

Wenqing Chu ◽

...

Keyword(s):

State Of The Art ◽

Contextual Information ◽

Image Inpainting ◽

Context Aware ◽

Global Context ◽

Low Level ◽

Complex Scenes ◽

Knowledge Distillation ◽

Image Generator ◽

High Level

Recent advances in image inpainting have shown impressive results in generating plausible visual details on rather simple backgrounds. However, for complex scenes, it is still challenging to restore reasonable contents because the contextual information within the missing regions tends to be ambiguous. To tackle this problem, we introduce pretext tasks that are semantically meaningful for estimating the missing contents. In particular, we perform knowledge distillation on pretext models and adapt the features to image inpainting. The learned semantic priors ought to be partially invariant between the high-level pretext task and low-level image inpainting; they not only help to understand the global context but also provide structural guidance for the restoration of local textures. Based on the semantic priors, we further propose a context-aware image inpainting model, which adaptively integrates global semantics and local features in a unified image generator. The semantic learner and the image generator are trained in an end-to-end manner. We name the model SPL to highlight its ability to learn and leverage semantic priors. It achieves the state of the art on the Places2, CelebA, and Paris StreetView datasets.
