An Integrated Saliency Model with Guidance of Eye Movement in Natural Scene Classification

2014 ◽  
Vol 678 ◽  
pp. 147-150
Author(s):  
Yu Liang Du ◽  
Ling Feng Yuan ◽  
Wei Bing Wan

Natural scene classification is a fundamental problem in image understanding. Humans can recognize a scene instantly after only a glance, mainly because our visual attention is easily attracted by the salient objects in the scene, and these objects are usually representative of the natural scene. It is unclear how humans achieve rapid scene categorization, but this kind of high-level cognitive behavior is reflected in eye movement. To model this ability, we propose a model guided by eye movement. It combines the bag-of-words (BOW) and spatial pyramid matching (SPM) methods and trains and tests the model with a support vector machine (SVM). Eye movement experiments were employed to validate our model. We found that subjects could recognize scenes correctly even when given only a few salient patches for less than one second. These results suggest that the salient patches identified by eye tracking play an important role in human scene categorization.
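
As a rough illustration of the pipeline this abstract describes, the sketch below quantizes local descriptors against a k-means codebook, pools them into BOW histograms over a two-level spatial pyramid, and trains a linear SVM. It is a minimal sketch, not the authors' model: the random arrays stand in for SIFT-like descriptors extracted from eye-tracked salient patches, and all parameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def spm_histogram(descriptors, positions, codebook, levels=(1, 2)):
    """Concatenate BOW histograms over the cells of a spatial pyramid.

    descriptors: (N, D) local feature vectors from one image
    positions:   (N, 2) x, y coordinates normalized to [0, 1)
    """
    words = codebook.predict(descriptors)      # visual-word index per descriptor
    k = codebook.n_clusters
    parts = []
    for g in levels:                           # g x g grid at each pyramid level
        cell = (positions * g).astype(int).clip(0, g - 1)
        for cx in range(g):
            for cy in range(g):
                in_cell = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                hist = np.bincount(words[in_cell], minlength=k).astype(float)
                parts.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
train_desc = rng.normal(size=(2000, 64))       # stand-in for patch descriptors
codebook = KMeans(n_clusters=100, n_init=4, random_state=0).fit(train_desc)

def image_feature():
    d = rng.normal(size=(200, 64))             # descriptors of one image
    p = rng.random(size=(200, 2))              # their normalized positions
    return spm_histogram(d, p, codebook)

X = np.stack([image_feature() for _ in range(40)])
y = rng.integers(0, 4, size=40)                # four dummy scene classes
clf = LinearSVC(C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```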

2021 ◽  
Vol 72 (6) ◽  
pp. 374-380
Author(s):  
Bhavinkumar Gajjar ◽  
Hiren Mewada ◽  
Ashwin Patani

Support vector machine (SVM) techniques and deep learning have been prevalent in object classification for many years. However, deep learning is computation-intensive and can require a long training time, while SVM is significantly faster than a convolutional neural network (CNN) but has been limited to mid-size datasets because it requires proper tuning. Recently, the parameterization of multiple kernels has shown greater flexibility in characterizing the dataset. Therefore, this paper proposes a sparse-coded multi-scale approach that reduces the training complexity and tuning of SVM by using a non-linear fusion of kernels for large-class natural scene classification. The optimum features are obtained by parameterizing the dictionary, the Scale Invariant Feature Transform (SIFT) parameters, and the fusion of multiple kernels. Experiments were conducted on a large dataset to examine the capability of the multi-kernel space to find distinct features for better classification. The proposed approach proves more promising than linear multi-kernel SVM approaches, achieving a maximum accuracy of 91.12%.
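
The non-linear fusion of kernels can be illustrated with scikit-learn's precomputed-kernel interface. The sketch below is an assumption-laden stand-in for the paper's formulation: it fuses an RBF kernel and a polynomial kernel by an element-wise product (a product of valid kernels is itself a valid kernel) and feeds the resulting Gram matrices to an SVM; the feature arrays merely stand in for sparse-coded SIFT features.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

def fused_kernel(A, B, gamma=0.5, degree=2):
    # Element-wise product of two base kernels; gamma and degree are
    # the tunable kernel parameters in this illustration.
    return rbf_kernel(A, B, gamma=gamma) * polynomial_kernel(A, B, degree=degree)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(60, 32))   # stand-in for sparse-coded SIFT features
y_train = rng.integers(0, 3, size=60)
X_test = rng.normal(size=(20, 32))

clf = SVC(kernel="precomputed").fit(fused_kernel(X_train, X_train), y_train)
print(clf.predict(fused_kernel(X_test, X_train)))
```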


2019 ◽  
Vol 73 (1) ◽  
pp. 37-55 ◽  
Author(s):  
B. Anbarasu ◽  
G. Anitha

In this paper, a new scene-recognition visual descriptor called the Enhanced Scale Invariant Feature Transform-based Sparse coding Spatial Pyramid Matching (Enhanced SIFT-ScSPM) descriptor is proposed by combining a Bag of Words (BOW)-based visual descriptor (SIFT-ScSPM) with Gist-based descriptors (Enhanced Gist and Enhanced multichannel Gist (Enhanced mGist)). Indoor scene classification is carried out by multi-class linear and non-linear Support Vector Machine (SVM) classifiers. The feature extraction methodology and a critical review of several visual descriptors used for indoor scene recognition are discussed from an experimental perspective. An empirical study is conducted on the Massachusetts Institute of Technology (MIT) 67 indoor scene classification data set to assess the classification accuracy of state-of-the-art visual descriptors alongside the proposed Enhanced mGist, Speeded Up Robust Features-Spatial Pyramid Matching (SURF-SPM) and Enhanced SIFT-ScSPM visual descriptors. Experimental results show that the proposed Enhanced SIFT-ScSPM visual descriptor achieves a higher classification rate, precision, recall, and area under the Receiver Operating Characteristic (ROC) curve than the state-of-the-art descriptors and the proposed Enhanced mGist and SURF-SPM visual descriptors.
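
The descriptor-combination step can be pictured as late fusion by concatenation. The sketch below is illustrative only; the array names and dimensions are assumptions standing in for SIFT-ScSPM and Enhanced mGist vectors, each L2-normalized before concatenation and multi-class SVM classification.

```python
import numpy as np
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 80
scspm = rng.random((n, 1024))     # stand-in for SIFT-ScSPM codes
gist = rng.random((n, 512))       # stand-in for Enhanced mGist descriptors
X = np.hstack([normalize(scspm), normalize(gist)])
y = rng.integers(0, 5, size=n)    # five dummy indoor scene classes

linear = SVC(kernel="linear").fit(X, y)
nonlinear = SVC(kernel="rbf", gamma="scale").fit(X, y)
print(linear.score(X, y), nonlinear.score(X, y))
```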


2021 ◽  
Vol 13 (24) ◽  
pp. 5076
Author(s):  
Di Wang ◽  
Jinhui Lan

Remote sensing scene classification converts remote sensing images into classification information to support high-level applications, so it is a fundamental problem in the field of remote sensing. In recent years, many convolutional neural network (CNN)-based methods have achieved impressive results in remote sensing scene classification, but they face two problems in extracting remote sensing scene features: (1) fixed-shape convolutional kernels cannot effectively extract features from remote sensing scenes with complex shapes and diverse distributions; (2) the features extracted by a CNN contain a large amount of redundant and invalid information. To solve these problems, this paper constructs a deformable convolutional neural network that adapts the convolutional sampling positions to the shapes of objects in the remote sensing scene. Meanwhile, spatial and channel attention mechanisms are used to focus on the effective features while suppressing the invalid ones. The experimental results indicate that the proposed method is competitive with state-of-the-art methods on three remote sensing scene classification datasets (UCM, NWPU, and AID).
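
A minimal PyTorch sketch of the two ideas, assuming standard building blocks rather than the authors' architecture: a deformable convolution whose sampling offsets are predicted from the input, followed by a squeeze-and-excitation-style channel attention gate and a convolutional spatial attention gate.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformAttnBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Two offsets (x, y) per spatial position for each 3x3 kernel tap.
        self.offset = nn.Conv2d(c_in, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(c_in, c_out, kernel_size=3, padding=1)
        # Channel attention: global pooling followed by a bottleneck gate.
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // 4, 1), nn.ReLU(),
            nn.Conv2d(c_out // 4, c_out, 1), nn.Sigmoid(),
        )
        # Spatial attention: a gate computed from pooled feature maps.
        self.sa = nn.Sequential(nn.Conv2d(2, 1, kernel_size=7, padding=3),
                                nn.Sigmoid())

    def forward(self, x):
        y = self.deform(x, self.offset(x))   # sampling adapts to object shape
        y = y * self.ca(y)                   # emphasize informative channels
        pooled = torch.cat([y.mean(1, keepdim=True), y.amax(1, keepdim=True)],
                           dim=1)
        return y * self.sa(pooled)           # emphasize informative locations

x = torch.randn(1, 32, 56, 56)
print(DeformAttnBlock(32, 64)(x).shape)      # torch.Size([1, 64, 56, 56])
```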


Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

The remote sensing extraction of large areas of arecanut (Areca catechu L.) planting plays an important role in investigating the distribution of the arecanut planting area and in the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor the agricultural and forestry vegetation in Hainan. However, the monitoring accuracy is affected by the cloudy and rainy climate of this region, as well as by the high level of land fragmentation. In this paper, we used PlanetScope imagery at a 3 m spatial resolution over the Hainan arecanut planting area to investigate the high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, followed by the implementation of the random forest algorithm to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models optimized with the RF features were 74.82%, 83.67%, and 88.30%, with kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and kappa coefficient. The overall accuracies of the SVM, BPNN, and RF models following feature optimization improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized classification models, and the kappa coefficients also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut. Furthermore, RF is shown to effectively optimize the initial feature space, composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can act as a theoretical and technical reference for the agricultural and forestry industries.
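
The feature-space optimization workflow can be sketched with scikit-learn, using random-forest importances to select a feature subset before fitting the classifiers. This is an assumed reconstruction of the general procedure, not the paper's code; the synthetic data stand in for the spectral and textural variables, and overall accuracy plus the kappa coefficient are reported as in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for per-sample spectral + textural feature variables.
X, y = make_classification(n_samples=500, n_features=40, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RF-based feature selection: keep features above mean importance (default).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0)).fit(X_tr, y_tr)
X_tr_s, X_te_s = selector.transform(X_tr), selector.transform(X_te)

for name, clf in [("SVM", SVC()), ("RF", RandomForestClassifier(random_state=0))]:
    pred = clf.fit(X_tr_s, y_tr).predict(X_te_s)
    print(name, accuracy_score(y_te, pred), cohen_kappa_score(y_te, pred))
```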


2021 ◽  
pp. 1-16
Author(s):  
Wenbo Huang ◽  
Changyuan Wang ◽  
Hongbo Jia

Traditional intention inference methods rely solely on EEG, eye movement, or tactile feedback, and their recognition rates are low. To improve the accuracy of a pilot's intention recognition, a human-computer interaction intention inference method is proposed in this paper that fuses EEG, eye movement, and tactile feedback. Firstly, EEG signals are collected near the frontal lobe of the human brain to extract features, using eight channels: AF7, F7, FT7, T7, AF8, F8, FT8, and T8. Secondly, the signal data are preprocessed by baseline removal, normalization, and least-squares noise reduction. Thirdly, a support vector machine (SVM) is applied to carry out multiple binary classifications of the eye movement direction. Finally, 8-direction recognition of the eye movement direction is realized through data fusion. Experimental results show that the classification accuracy of the proposed method reaches 75.77%, 76.7%, 83.38%, 83.64%, 60.49%, 60.93%, 66.03%, and 64.49% for the eight directions, respectively. Compared with traditional methods, the proposed algorithm achieves higher classification accuracy with a simpler realization process. The feasibility and effectiveness of using EEG signals to identify eye movement directions for intention recognition are further verified.
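
A minimal sketch of the multiple-binary-classification step, under assumed feature shapes and direction labels: feature vectors derived from the eight EEG channels are classified into eight eye-movement directions with a one-vs-rest SVM, with standardization standing in for the paper's preprocessing chain.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

CHANNELS = ["AF7", "F7", "FT7", "T7", "AF8", "F8", "FT8", "T8"]
DIRECTIONS = ["up", "down", "left", "right",
              "up-left", "up-right", "down-left", "down-right"]  # assumed labels

rng = np.random.default_rng(3)
X = rng.normal(size=(160, len(CHANNELS) * 16))  # stand-in features per trial
y = rng.integers(0, len(DIRECTIONS), size=160)

# One binary SVM per direction; scaling stands in for baseline removal,
# normalization, and least-squares denoising.
clf = OneVsRestClassifier(make_pipeline(StandardScaler(), SVC(kernel="rbf")))
clf.fit(X, y)
print(DIRECTIONS[int(clf.predict(X[:1])[0])])
```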


Author(s):  
Sankirti Sandeep Shiravale ◽  
R. Jayadevan ◽  
Sanjeev S. Sannakki

Text present in camera-captured scene images is semantically rich and can be used for image understanding. Automatic detection, extraction, and recognition of text are crucial in image understanding applications. Text detection from natural scene images is a tedious task due to complex backgrounds, uneven lighting conditions, and multi-coloured, multi-sized fonts. Two techniques, namely 'edge detection' and 'colour-based clustering', are combined in this paper to detect text in scene images. Region properties are used to eliminate falsely generated annotations. A dataset of 1250 images is created and used for experimentation. Experimental results show that the combined approach performs better than the individual approaches.
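
The combined detection idea can be sketched with OpenCV: Canny edges and k-means colour clustering each propose candidate regions, which are then filtered by region properties. All thresholds and the area/aspect-ratio filter below are illustrative assumptions, not the paper's parameters.

```python
import cv2
import numpy as np

def candidate_boxes(image_bgr, k=3):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # Edge-based candidates: dilated Canny edges.
    edges = cv2.dilate(cv2.Canny(gray, 100, 200), np.ones((3, 3), np.uint8))

    # Colour-based candidates: k-means on pixel colours, one mask per cluster.
    pixels = np.float32(image_bgr.reshape(-1, 3))
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    _, labels, _ = cv2.kmeans(pixels, k, None, criteria, 3,
                              cv2.KMEANS_RANDOM_CENTERS)
    masks = [np.uint8(labels.reshape(gray.shape) == i) * 255 for i in range(k)]

    boxes = []
    for binary in [edges] + masks:
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            x, y, w, h = cv2.boundingRect(c)
            # Region-property filter to drop falsely generated annotations.
            if w * h > 100 and 0.1 < w / h < 10:
                boxes.append((x, y, w, h))
    return boxes

img = np.random.randint(0, 255, (120, 160, 3), np.uint8)  # stand-in scene image
print(len(candidate_boxes(img)))
```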


2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Yizhe Wang ◽  
Cunqian Feng ◽  
Yongshun Zhang ◽  
Sisan He

Precession is a common micromotion form of space targets, introducing additional micro-Doppler (m-D) modulation into the radar echo. Effective classification of space targets is of great significance for further micromotion parameter extraction and identification. Feature extraction is a key step in the classification process and largely influences the final classification performance. This paper presents two methods for classifying different types of space precession targets from their high-resolution range profiles (HRRPs). We first establish the precession model of space targets and analyze the scattering characteristics, and then compute electromagnetic data for the cone target, cone-cylinder target, and cone-cylinder-flare target. Experimental results demonstrate that the support vector machine (SVM) using histograms of oriented gradients (HOG) features achieves a good result, whereas the deep convolutional neural network (DCNN) obtains a higher classification accuracy. The DCNN combines the feature extractor and the classifier, automatically mining the high-level signatures of HRRPs through a training process. In addition, the efficiencies of the two classification processes are compared using the same dataset.
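
The HOG + SVM branch of the comparison can be sketched as follows; the DCNN branch is omitted. The random images are stand-ins for HRRP-derived representations of the three target types, and the HOG settings are assumptions rather than the paper's configuration.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

CLASSES = ["cone", "cone-cylinder", "cone-cylinder-flare"]
rng = np.random.default_rng(4)

def hog_feature(img):
    # Histograms of oriented gradients over 8x8-pixel cells.
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

images = rng.random((90, 64, 64))   # stand-in for HRRP-derived images
y = rng.integers(0, len(CLASSES), size=90)
X = np.stack([hog_feature(im) for im in images])

clf = SVC(kernel="rbf").fit(X, y)
print(CLASSES[int(clf.predict(X[:1])[0])])
```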

