An Attention-Driven Multi-label Image Classification with Semantic Embedding and Graph Convolutional Networks

Author(s):  
Dengdi Sun ◽  
Leilei Ma ◽  
Zhuanlian Ding ◽  
Bin Luo
2020 ◽  
Vol 12 (4) ◽  
pp. 655
Author(s):  
Chu He ◽  
Mingxia Tu ◽  
Dehui Xiong ◽  
Mingsheng Liao

Synthetic Aperture Radar (SAR) provides rich ground information for remote sensing surveys and can be used at all times and in all weather conditions. Polarimetric SAR (PolSAR) further reveals differences in surface scattering and improves radar's application capability. Most existing classification methods for PolSAR imagery rely on manual features; such fixed-pattern methods have poor data adaptability and low feature utilization when the features are fed directly to a classifier. Therefore, combining the characteristics of PolSAR data with the automatic feature-learning ability of deep networks forms a new breakthrough direction. In essence, feature learning in a deep network approximates a function from data to labels through multi-layer accumulation, but a finite number of layers limits the network's mapping ability. According to the manifold hypothesis, high-dimensional data lies on potential low-dimensional manifolds, and different types of data lie on different manifolds. Manifold learning can model the core variables of the target and separate the manifolds of different data as much as possible, so as to classify the data better. Therefore, taking the manifold hypothesis as a starting point, this paper proposes a PolSAR image classification method that integrates nonlinear manifold learning with fully convolutional networks. First, high-dimensional polarimetric features are extracted from the scattering matrix and coherence matrix of the original PolSAR data, and their compact representation is mined by manifold learning. Meanwhile, drawing on transfer learning, a pre-trained Fully Convolutional Network (FCN) model is used to learn deep spatial features of the PolSAR imagery. To exploit their complementary advantages, a weighted strategy embeds the manifold representation into the deep spatial features, which are then input to a support vector machine (SVM) classifier for final classification. A series of experiments on three PolSAR datasets verifies the effectiveness and superiority of the proposed classification algorithm.
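
A minimal Python sketch of the fusion pipeline described in this abstract, not the authors' implementation: a manifold embedding of hand-crafted polarimetric features is combined, via a weighted concatenation, with deep spatial features (here random stand-ins for FCN activations) and fed to an SVM. The arrays, the fusion weight alpha, and the choice of locally linear embedding are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative inputs: per-pixel hand-crafted polarimetric features and deep
# FCN features, both flattened to (n_pixels, n_features); labels for training.
polar_feats = np.random.rand(5000, 22)   # features from scattering/coherence matrices
deep_feats = np.random.rand(5000, 64)    # spatial features from a pre-trained FCN
labels = np.random.randint(0, 5, 5000)

# Step 1: compact manifold representation of the high-dimensional polarimetric features.
manifold = LocallyLinearEmbedding(n_components=10, n_neighbors=15)
manifold_feats = manifold.fit_transform(StandardScaler().fit_transform(polar_feats))

# Step 2: weighted embedding of the manifold representation into the deep features.
alpha = 0.3  # fusion weight; a tunable hyperparameter in this sketch
fused = np.hstack([alpha * manifold_feats,
                   (1.0 - alpha) * StandardScaler().fit_transform(deep_feats)])

# Step 3: SVM classification on the fused representation.
clf = SVC(kernel="rbf", C=10.0)
clf.fit(fused, labels)
pred = clf.predict(fused)
```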


2019 ◽  
Vol 16 (2) ◽  
pp. 241-245 ◽  
Author(s):  
Anyong Qin ◽  
Zhaowei Shang ◽  
Jinyu Tian ◽  
Yulong Wang ◽  
Taiping Zhang ◽  
...  

2019 ◽  
Vol 10 (1) ◽  
pp. 101 ◽  
Author(s):  
Yadong Yang ◽  
Chengji Xu ◽  
Feng Dong ◽  
Xiaofeng Wang

Computer vision systems are insensitive to the scale of objects in natural scenes, so it is important to study multi-scale feature representations. Res2Net implements hierarchical multi-scale convolution in residual blocks, but its random grouping method affects the robustness and intuitive interpretability of the network. We propose a new multi-scale convolution model based on multiple attention mechanisms. It introduces attention into the structure of a Res2-block to better guide feature expression. First, we adopt channel attention to score the channels and sort them in descending order of feature importance (Channels-Sort). The sorted residual blocks are grouped and hierarchically convolved within the block to form a single-attention multi-scale block (AMS-block). Then, we apply channel attention to the residual sub-blocks to constitute a dual-attention multi-scale block (DAMS-block). Introducing spatial attention before sorting the channels yields a multi-attention multi-scale block (MAMS-block). A MAMS convolutional neural network (CNN) is a series of MAMS-blocks. It enables significant information to be expressed at more levels and can also be easily grafted into different convolutional structures. Limited by hardware conditions, we only verify the proposed ideas with convolutional networks of the same magnitude. The experimental results show that the convolution model with attention mechanisms and multi-scale features is superior in image classification.
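
A minimal PyTorch sketch of the Channels-Sort idea from this abstract, assuming a squeeze-and-excitation-style scoring head: channel attention scores the channels, the channels are reordered by descending score, and the result is split into groups for hierarchical, Res2Net-style convolution. The class name, group count, and scoring head are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class ChannelsSortAMSBlock(nn.Module):
    """Sketch of an AMS-style block: channel attention -> sort channels by
    importance (Channels-Sort) -> grouped hierarchical convolution."""
    def __init__(self, channels, groups=4, reduction=16):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        width = channels // groups
        # Squeeze-and-excitation style channel attention used for scoring.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        # One 3x3 conv per group except the first, as in Res2Net.
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, 3, padding=1) for _ in range(groups - 1)])

    def forward(self, x):
        b, c, _, _ = x.shape
        scores = self.fc(x.mean(dim=(2, 3)))            # (b, c) channel importance
        order = scores.argsort(dim=1, descending=True)  # descending importance
        idx = order.view(b, c, 1, 1).expand_as(x)
        x_sorted = torch.gather(x * scores.view(b, c, 1, 1), 1, idx)  # Channels-Sort
        splits = torch.chunk(x_sorted, self.groups, dim=1)
        out, prev = [splits[0]], splits[0]
        for conv, s in zip(self.convs, splits[1:]):     # hierarchical intra-block conv
            prev = conv(s + prev)
            out.append(prev)
        return torch.cat(out, dim=1) + x                # residual connection
```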


2021 ◽  
Vol 11 (17) ◽  
pp. 8141
Author(s):  
Vladimir Kulyukin ◽  
Nikhil Ganta ◽  
Anastasiia Tkachenko

Omnidirectional honeybee traffic is the number of bees moving in arbitrary directions in close proximity to the landing pad of a beehive over a period of time. Automated video analysis of such traffic is critical for continuous colony health assessment. In our previous research, we proposed a two-tier algorithm to measure omnidirectional bee traffic in videos. Our algorithm combines motion detection with image classification: in tier 1, motion detection functions as a class-agnostic object locator that generates regions with possible objects; in tier 2, each region from tier 1 is classified by a class-specific classifier. In this article, we present an empirical and theoretical comparison of random reinforced forests and shallow convolutional networks as tier 2 classifiers. A random reinforced forest is a random forest trained on a dataset with reinforcement learning. We present several methods of training random reinforced forests and compare their performance with shallow convolutional networks on seven image datasets. We develop a theoretical framework to assess the complexity of image classification by an image classifier. We formulate and prove three theorems on finding optimal random reinforced forests. Our conclusion is that, despite their limitations, random reinforced forests are a reasonable alternative to convolutional networks when memory footprints and classification and energy efficiencies are important factors. We outline several ways in which the performance of random reinforced forests may be improved.
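
A minimal Python sketch of the two-tier structure described in this abstract, assuming OpenCV background subtraction for tier-1 motion detection and a caller-supplied tier-2 classifier (where a trained random reinforced forest or shallow CNN would slot in). The function names, thresholds, and video path are illustrative assumptions, not the authors' code.

```python
import cv2

def count_omnidirectional_traffic(video_path, classify_region, min_area=50):
    """Tier 1: background-subtraction motion detection proposes regions.
    Tier 2: classify_region(crop) -> bool decides whether a region is a bee.
    Returns the number of bee detections summed over all frames."""
    cap = cv2.VideoCapture(video_path)
    bg = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=25)
    total = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = bg.apply(frame)                               # motion mask
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:                                   # tier-1 proposals
            if cv2.contourArea(c) < min_area:
                continue
            x, y, w, h = cv2.boundingRect(c)
            crop = frame[y:y + h, x:x + w]
            if classify_region(crop):                        # tier-2 classifier
                total += 1
    cap.release()
    return total

# Illustrative tier-2 stand-in: always rejects; replace with a random
# reinforced forest or shallow CNN prediction on the cropped region.
count = count_omnidirectional_traffic("hive.mp4", lambda crop: False)
```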


2020 ◽  
Vol 12 (3) ◽  
pp. 396
Author(s):  
Hongwei Dong ◽  
Lamei Zhang ◽  
Bin Zou

Convolutional neural networks (CNNs) have become the state-of-the-art in optical image processing. Recently, CNNs have been used in polarimetric synthetic aperture radar (PolSAR) image classification and have obtained promising results. Unlike optical images, the unique phase information of PolSAR data expresses the structure information of objects. This special data representation makes 3D convolution, which explicitly models the relationship between polarimetric channels, perform better in the task of PolSAR image classification. However, deep 3D-CNNs incur a huge number of model parameters and expensive computational costs, which not only slows interpretation during testing but also greatly increases the risk of over-fitting. To alleviate this problem, a lightweight 3D-CNN framework that compresses 3D-CNNs from two aspects is proposed in this paper. Lightweight convolution operations, i.e., pseudo-3D and 3D-depthwise separable convolutions, are considered as low-latency replacements for vanilla 3D convolution. Further, fully connected layers are replaced by global average pooling to reduce the number of model parameters and save memory. For the specific classification task, the proposed methods reduce up to 69.83% of the model parameters in the convolution layers of the 3D-CNN as well as almost all the model parameters in the fully connected layers, which ensures fast PolSAR interpretation. Experiments on three PolSAR benchmark datasets, i.e., AIRSAR Flevoland, ESAR Oberpfaffenhofen, and EMISAR Foulum, show that the proposed lightweight architectures not only maintain but also slightly improve accuracy under various criteria.
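
A minimal PyTorch sketch, not the authors' implementation, of the two compression ideas named in this abstract: a 3D depthwise separable convolution (per-channel 3D convolution followed by a 1x1x1 pointwise convolution) and global average pooling in place of fully connected layers. Layer widths, patch shape, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv3d(nn.Module):
    """3D depthwise separable convolution: per-channel 3D conv (groups=in_ch)
    followed by a 1x1x1 pointwise conv, cutting parameters vs. vanilla 3D conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class LightweightPolSARNet(nn.Module):
    """Toy classifier: lightweight 3D convolutions, then global average
    pooling instead of fully connected layers."""
    def __init__(self, in_ch=1, num_classes=15):
        super().__init__()
        self.features = nn.Sequential(
            DepthwiseSeparableConv3d(in_ch, 16), nn.ReLU(inplace=True),
            DepthwiseSeparableConv3d(16, 32), nn.ReLU(inplace=True))
        self.classifier = nn.Conv3d(32, num_classes, kernel_size=1)

    def forward(self, x):                       # x: (batch, in_ch, depth, h, w)
        x = self.classifier(self.features(x))
        return x.mean(dim=(2, 3, 4))            # global average pooling -> logits

# Example: a patch with polarimetric channels stacked along the depth axis.
logits = LightweightPolSARNet()(torch.randn(2, 1, 9, 15, 15))
```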

