An Integrated Saliency Model with Guidance of Eye Movement in Natural Scene Classification

2014 ◽  
Vol 678 ◽  
pp. 147-150
Author(s):  
Yu Liang Du ◽  
Ling Feng Yuan ◽  
Wei Bing Wan

Natural scene classification is a fundamental problem in image understanding. Humans can recognize a scene instantly after only a glance, mainly because our visual attention is easily attracted by the salient objects in the scene, and these objects are usually representative of the natural scene. It is unclear how humans achieve rapid scene categorization, but this kind of high-level cognitive behavior is reflected in eye movement. To model this ability, we propose a model guided by eye movement. It combines the bag-of-words (BOW) and spatial pyramid matching (SPM) methods and trains and tests the model with a support vector machine (SVM). Eye movement experiments were employed to validate our model. We found that subjects could recognize scenes correctly even when given only a few salient patches for less than one second. These results suggest that the salient patches identified by eye tracking play an important role in human scene categorization.
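
As a rough illustration of the pipeline this abstract describes, the sketch below quantizes local descriptors against a k-means codebook, pools them into BOW histograms over a two-level spatial pyramid, and trains a linear SVM. It is a minimal sketch, not the authors' model: the random arrays stand in for SIFT-like descriptors extracted from eye-tracked salient patches, and all parameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def spm_histogram(descriptors, positions, codebook, levels=(1, 2)):
    """Concatenate BOW histograms over the cells of a spatial pyramid.

    descriptors: (N, D) local feature vectors from one image
    positions:   (N, 2) x, y coordinates normalized to [0, 1)
    """
    words = codebook.predict(descriptors)      # visual-word index per descriptor
    k = codebook.n_clusters
    parts = []
    for g in levels:                           # g x g grid at each pyramid level
        cell = (positions * g).astype(int).clip(0, g - 1)
        for cx in range(g):
            for cy in range(g):
                in_cell = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                hist = np.bincount(words[in_cell], minlength=k).astype(float)
                parts.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
train_desc = rng.normal(size=(2000, 64))       # stand-in for patch descriptors
codebook = KMeans(n_clusters=100, n_init=4, random_state=0).fit(train_desc)

def image_feature():
    d = rng.normal(size=(200, 64))             # descriptors of one image
    p = rng.random(size=(200, 2))              # their normalized positions
    return spm_histogram(d, p, codebook)

X = np.stack([image_feature() for _ in range(40)])
y = rng.integers(0, 4, size=40)                # four dummy scene classes
clf = LinearSVC(C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```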

2021 ◽  
Vol 72 (6) ◽  
pp. 374-380
Author(s):  
Bhavinkumar Gajjar ◽  
Hiren Mewada ◽  
Ashwin Patani

Support vector machine (SVM) techniques and deep learning have been prevalent in object classification for many years. However, deep learning is computation-intensive and can require a long training time, while SVM is significantly faster than a convolutional neural network (CNN) but has been limited to mid-size datasets because it requires proper tuning. Recently, the parameterization of multiple kernels has shown greater flexibility in characterizing the dataset. Therefore, this paper proposes a sparse-coded multi-scale approach that reduces the training complexity and tuning of SVM by using a non-linear fusion of kernels for large-class natural scene classification. The optimum features are obtained by parameterizing the dictionary, the Scale Invariant Feature Transform (SIFT) parameters, and the fusion of multiple kernels. Experiments were conducted on a large dataset to examine the capability of the multi-kernel space to find distinct features for better classification. The proposed approach proves more promising than linear multi-kernel SVM approaches, achieving a maximum accuracy of 91.12%.
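
The non-linear fusion of kernels can be illustrated with scikit-learn's precomputed-kernel interface. The sketch below is an assumption-laden stand-in for the paper's formulation: it fuses an RBF kernel and a polynomial kernel by an element-wise product (a product of valid kernels is itself a valid kernel) and feeds the resulting Gram matrices to an SVM; the feature arrays merely stand in for sparse-coded SIFT features.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

def fused_kernel(A, B, gamma=0.5, degree=2):
    # Element-wise product of two base kernels; gamma and degree are
    # the tunable kernel parameters in this illustration.
    return rbf_kernel(A, B, gamma=gamma) * polynomial_kernel(A, B, degree=degree)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(60, 32))   # stand-in for sparse-coded SIFT features
y_train = rng.integers(0, 3, size=60)
X_test = rng.normal(size=(20, 32))

clf = SVC(kernel="precomputed").fit(fused_kernel(X_train, X_train), y_train)
print(clf.predict(fused_kernel(X_test, X_train)))
```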


2019 ◽  
Vol 73 (1) ◽  
pp. 37-55 ◽  
Author(s):  
B. Anbarasu ◽  
G. Anitha

In this paper, a new scene-recognition visual descriptor called the Enhanced Scale Invariant Feature Transform-based Sparse coding Spatial Pyramid Matching (Enhanced SIFT-ScSPM) descriptor is proposed by combining a Bag of Words (BOW)-based visual descriptor (SIFT-ScSPM) with Gist-based descriptors (Enhanced Gist and Enhanced multichannel Gist (Enhanced mGist)). Indoor scene classification is carried out by multi-class linear and non-linear Support Vector Machine (SVM) classifiers. The feature extraction methodology and a critical review of several visual descriptors used for indoor scene recognition are discussed from an experimental perspective. An empirical study is conducted on the Massachusetts Institute of Technology (MIT) 67 indoor scene classification data set to assess the classification accuracy of state-of-the-art visual descriptors alongside the proposed Enhanced mGist, Speeded Up Robust Features-Spatial Pyramid Matching (SURF-SPM) and Enhanced SIFT-ScSPM visual descriptors. Experimental results show that the proposed Enhanced SIFT-ScSPM visual descriptor achieves a higher classification rate, precision, recall, and area under the Receiver Operating Characteristic (ROC) curve than the state-of-the-art descriptors and the proposed Enhanced mGist and SURF-SPM visual descriptors.
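
The descriptor-combination step can be pictured as late fusion by concatenation. The sketch below is illustrative only; the array names and dimensions are assumptions standing in for SIFT-ScSPM and Enhanced mGist vectors, each L2-normalized before concatenation and multi-class SVM classification.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 80
scspm = rng.random((n, 1024))     # stand-in for SIFT-ScSPM codes
gist = rng.random((n, 512))       # stand-in for Enhanced mGist descriptors
X = np.hstack([normalize(scspm), normalize(gist)])
y = rng.integers(0, 5, size=n)    # five dummy indoor scene classes

linear = SVC(kernel="linear").fit(X, y)
nonlinear = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(linear.score(X, y), nonlinear.score(X, y))
```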


2021 ◽  
Vol 13 (24) ◽  
pp. 5076
Author(s):  
Di Wang ◽  
Jinhui Lan

Remote sensing scene classification converts remote sensing images into classification information to support high-level applications, so it is a fundamental problem in the field of remote sensing. In recent years, many convolutional neural network (CNN)-based methods have achieved impressive results in remote sensing scene classification, but they face two problems in extracting remote sensing scene features: (1) fixed-shape convolutional kernels cannot effectively extract features from remote sensing scenes with complex shapes and diverse distributions; (2) the features extracted by a CNN contain a large amount of redundant and invalid information. To solve these problems, this paper constructs a deformable convolutional neural network that adapts the convolutional sampling positions to the shapes of objects in the remote sensing scene. Meanwhile, spatial and channel attention mechanisms are used to focus on the effective features while suppressing the invalid ones. The experimental results indicate that the proposed method is competitive with state-of-the-art methods on three remote sensing scene classification datasets (UCM, NWPU, and AID).
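
A minimal PyTorch sketch of the two ideas, assuming standard building blocks rather than the authors' architecture: a deformable convolution whose sampling offsets are predicted from the input, followed by a squeeze-and-excitation-style channel attention gate and a convolutional spatial attention gate.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAttnBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Two offsets (x, y) per spatial position for each 3x3 kernel tap.
        self.offset = nn.Conv2d(c_in, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(c_in, c_out, kernel_size=3, padding=1)
        # Channel attention: global pooling followed by a bottleneck gate.
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // 4, 1), nn.ReLU(),
            nn.Conv2d(c_out // 4, c_out, 1), nn.Sigmoid(),
        )
        # Spatial attention: a gate computed from pooled feature maps.
        self.sa = nn.Sequential(nn.Conv2d(2, 1, kernel_size=7, padding=3),
                                nn.Sigmoid())

    def forward(self, x):
        y = self.deform(x, self.offset(x))   # sampling adapts to object shape
        y = y * self.ca(y)                   # emphasize informative channels
        pooled = torch.cat([y.mean(1, keepdim=True), y.amax(1, keepdim=True)],
                           dim=1)
        return y * self.sa(pooled)           # emphasize informative locations

x = torch.randn(1, 32, 56, 56)
print(DeformAttnBlock(32, 64)(x).shape)      # torch.Size([1, 64, 56, 56])
```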


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of the arecanut planting area and in the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as by the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized with the RF features were 74.82%, 83.67%, and 88.30%, with kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracies of the SVM, BPNN, and RF models following feature optimization improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification models, and the kappa coefficients also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
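
The feature-space optimization workflow can be sketched with scikit-learn, using random-forest importances to select a feature subset before fitting the classifiers. This is an assumed reconstruction of the general procedure, not the paper's code; the synthetic data stand in for the spectral and textural variables, and overall accuracy plus the kappa coefficient are reported as in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for per-sample spectral + textural feature variables.
X, y = make_classification(n_samples=500, n_features=40, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RF-based feature selection: keep features above mean importance (default).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0)).fit(X_tr, y_tr)
X_tr_s, X_te_s = selector.transform(X_tr), selector.transform(X_te)

for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    pred = clf.fit(X_tr_s, y_tr).predict(X_te_s)
    print(name, accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))
```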


2021 ◽  
pp. 1-16
Author(s):  
Wenbo Huang ◽  
Changyuan Wang ◽  
Hongbo Jia

Traditional intention inference methods rely solely on EEG, eye movement, or tactile feedback, and their recognition rates are low. To improve the accuracy of a pilot's intention recognition, a human-computer interaction intention inference method is proposed in this paper that fuses EEG, eye movement, and tactile feedback. Firstly, EEG signals are collected near the frontal lobe of the human brain to extract features, using eight channels: AF7, F7, FT7, T7, AF8, F8, FT8, and T8. Secondly, the signal data are preprocessed by baseline removal, normalization, and least-squares noise reduction. Thirdly, a support vector machine (SVM) is applied to carry out multiple binary classifications of the eye movement direction. Finally, 8-direction recognition of the eye movement direction is realized through data fusion. Experimental results show that the classification accuracy of the proposed method reaches 75.77%, 76.7%, 83.38%, 83.64%, 60.49%, 60.93%, 66.03%, and 64.49% for the eight directions, respectively. Compared with traditional methods, the proposed algorithm achieves higher classification accuracy with a simpler realization process. The feasibility and effectiveness of using EEG signals to identify eye movement directions for intention recognition are further verified.
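
A minimal sketch of the multiple-binary-classification step, under assumed feature shapes and direction labels: feature vectors derived from the eight EEG channels are classified into eight eye-movement directions with a one-vs-rest SVM, with standardization standing in for the paper's preprocessing chain.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

CHANNELS = ["AF7", "F7", "FT7", "T7", "AF8", "F8", "FT8", "T8"]
DIRECTIONS = ["up", "down", "left", "right",
              "up-left", "up-right", "down-left", "down-right"]  # assumed labels

rng = np.random.default_rng(3)
X = rng.normal(size=(160, len(CHANNELS) * 16))  # stand-in features per trial
y = rng.integers(0, len(DIRECTIONS), size=160)

# One binary SVM per direction; scaling stands in for baseline removal,
# normalization, and least-squares denoising.
clf = OneVsRestClassifier(make_pipeline(StandardScaler(), SVC(kernel="rbf")))
clf.fit(X, y)
print(DIRECTIONS[int(clf.predict(X[:1])[0])])
```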


Author(s):  
Sankirti Sandeep Shiravale ◽  
R. Jayadevan ◽  
Sanjeev S. Sannakki

Text present in camera-captured scene images is semantically rich and can be used for image understanding. Automatic detection, extraction, and recognition of text are crucial in image understanding applications. Text detection from natural scene images is a tedious task due to complex backgrounds, uneven lighting conditions, and multi-coloured, multi-sized fonts. Two techniques, namely 'edge detection' and 'colour-based clustering', are combined in this paper to detect text in scene images. Region properties are used to eliminate falsely generated annotations. A dataset of 1250 images is created and used for experimentation. Experimental results show that the combined approach performs better than the individual approaches.
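
The combined detection idea can be sketched with OpenCV: Canny edges and k-means colour clustering each propose candidate regions, which are then filtered by region properties. All thresholds and the area/aspect-ratio filter below are illustrative assumptions, not the paper's parameters.

```python
import cv2
import numpy as np

def candidate_boxes(image_bgr, k=3):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Edge-based candidates: dilated Canny edges.
    edges = cv2.dilate(cv2.Canny(gray, 100, 200), np.ones((3, 3), np.uint8))

    # Colour-based candidates: k-means on pixel colours, one mask per cluster.
    pixels = np.float32(image_bgr.reshape(-1, 3))
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 3,
                              cv2.KMEANS_RANDOM_CENTERS)
    masks = [np.uint8(labels.reshape(gray.shape) == i) * 255 for i in range(k)]

    boxes = []
    for binary in [edges] + masks:
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            # Region-property filter to drop falsely generated annotations.
            if w * h > 100 and 0.1 < w / h < 10:
                boxes.append((x, y, w, h))
    return boxes

img = np.random.randint(0, 255, (120, 160, 3), np.uint8)  # stand-in scene image
print(len(candidate_boxes(img)))
```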


2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Yizhe Wang ◽  
Cunqian Feng ◽  
Yongshun Zhang ◽  
Sisan He

Precession is a common micromotion form of space targets, introducing additional micro-Doppler (m-D) modulation into the radar echo. Effective classification of space targets is of great significance for further micromotion parameter extraction and identification. Feature extraction is a key step in the classification process and largely influences the final classification performance. This paper presents two methods for classifying different types of space precession targets from their high-resolution range profiles (HRRPs). We first establish the precession model of space targets and analyze the scattering characteristics, and then compute electromagnetic data for the cone target, cone-cylinder target, and cone-cylinder-flare target. Experimental results demonstrate that the support vector machine (SVM) using histograms of oriented gradients (HOG) features achieves a good result, whereas the deep convolutional neural network (DCNN) obtains a higher classification accuracy. The DCNN combines the feature extractor and the classifier, automatically mining the high-level signatures of HRRPs through a training process. In addition, the efficiencies of the two classification processes are compared using the same dataset.
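
The HOG + SVM branch of the comparison can be sketched as follows; the DCNN branch is omitted. The random images are stand-ins for HRRP-derived representations of the three target types, and the HOG settings are assumptions rather than the paper's configuration.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

CLASSES = ["cone", "cone-cylinder", "cone-cylinder-flare"]
rng = np.random.default_rng(4)

def hog_feature(img):
    # Histograms of oriented gradients over 8x8-pixel cells.
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

images = rng.random((90, 64, 64))   # stand-in for HRRP-derived images
y = rng.integers(0, len(CLASSES), size=90)
X = np.stack([hog_feature(im) for im in images])

clf = SVC(kernel="rbf").fit(X, y)
print(CLASSES[int(clf.predict(X[:1])[0])])
```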

