Object–Part Registration–Fusion Net for Fine-Grained Image Classification

Symmetry ◽  
2021 ◽  
Vol 13 (10) ◽  
pp. 1838
Author(s):  
Chih-Wei Lin ◽  
Mengxiang Lin ◽  
Jinfu Liu

Classifying fine-grained categories (e.g., bird species, car, and aircraft types) is a crucial problem in image understanding and is difficult due to intra-class and inter-class variance. Most existing fine-grained approaches individually utilize various parts and local information of objects to improve classification accuracy but neglect the mechanism of feature fusion between the object (global) and the object’s parts (local) to reinforce fine-grained features. In this paper, we present a novel framework, namely object–part registration–fusion Net (OR-Net), which considers the mechanism of registration and fusion between an object’s (global) and its parts’ (local) features for fine-grained classification. Our model learns fine-grained features from the global and local regions of the object and fuses these features with the registration mechanism to reinforce each region’s characteristics in the feature maps. Precisely, OR-Net consists of: (1) a multi-stream feature extraction net, which generates features from the global and various local regions of objects; (2) a registration–fusion feature module, which calculates the dimension and location relationships between global (object) regions and local (part) regions to generate the registration information, and fuses the local features into the global features with this registration information to generate the fine-grained feature. Experiments executed on symmetric GPU devices with symmetric mini-batches verify that OR-Net surpasses state-of-the-art approaches on the CUB-200-2011 (Birds), Stanford-Cars, and Stanford-Aircraft datasets.
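The registration–fusion idea above can be sketched in a few lines: given a global feature map and a part's feature map plus the part's location in global coordinates (the registration information), the local map is resized to that region and added into the global map to reinforce it. This is a minimal illustrative sketch, not the paper's implementation; the function name, nearest-neighbour resize, and additive fusion are all assumptions.

```python
import numpy as np

def register_and_fuse(global_feat, local_feat, box):
    """Fuse a part's local feature map into the global feature map.

    global_feat: (C, H, W) feature map of the whole object.
    local_feat:  (C, h, w) feature map of one part.
    box: (y0, x0, y1, x1), the part's location in global coordinates,
         standing in for the paper's registration information.
    """
    y0, x0, y1, x1 = box
    th, tw = y1 - y0, x1 - x0
    # Nearest-neighbour resize of the local map to the box size.
    ys = np.arange(th) * local_feat.shape[1] // th
    xs = np.arange(tw) * local_feat.shape[2] // tw
    resized = local_feat[:, ys][:, :, xs]
    fused = global_feat.copy()
    # Reinforce the part's region by adding its registered features.
    fused[:, y0:y1, x0:x1] += resized
    return fused
```

Additive fusion is only one choice; concatenation or gated fusion would fit the same registration step.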

2019 ◽  
Vol 11 (24) ◽  
pp. 3006 ◽  
Author(s):  
Yafei Lv ◽  
Xiaohan Zhang ◽  
Wei Xiong ◽  
Yaqi Cui ◽  
Mi Cai

Remote sensing image scene classification (RSISC) is an active task in the remote sensing community and has attracted great attention due to its wide applications. Recently, deep convolutional neural network (CNN)-based methods have achieved a remarkable breakthrough in remote sensing image scene classification performance. However, the problem that the feature representation is not discriminative enough still exists, mainly caused by the characteristic of inter-class similarity and intra-class diversity. In this paper, we propose an efficient end-to-end local-global-fusion feature extraction (LGFFE) network for a more discriminative feature representation. Specifically, global and local features are extracted from the channel and spatial dimensions, respectively, based on a high-level feature map from deep CNNs. For the local features, a novel recurrent neural network (RNN)-based attention module is first proposed to capture the spatial layout and context information across different regions. Gated recurrent units (GRUs) are then exploited to generate the importance weight of each region by taking a sequence of features from image patches as input. A reweighted regional feature representation can be obtained by focusing on the key regions. Then, the final feature representation is acquired by fusing the local and global features. The whole process of feature extraction and feature fusion can be trained in an end-to-end manner. Finally, extensive experiments have been conducted on four public and widely used datasets, and experimental results show that our method LGFFE outperforms baseline methods and achieves state-of-the-art results.
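The regional reweighting and local-global fusion described above can be sketched as follows. For brevity the GRU-based attention is replaced here by a simple learned scoring vector; the function names and the concatenation fusion are assumptions, not the paper's exact design.

```python
import numpy as np

def reweight_regions(region_feats, w_score):
    """Attention-style reweighting of regional features.

    region_feats: (N, D) features of N image patches.
    w_score: (D,) scoring vector standing in for the paper's
             GRU-based attention module (simplified sketch).
    Returns the reweighted local representation, shape (D,).
    """
    scores = region_feats @ w_score            # one score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over regions
    return weights @ region_feats              # attention-weighted sum

def local_global_fusion(local_vec, global_vec):
    # Final representation: fuse local and global features
    # (concatenation shown here as one plausible choice).
    return np.concatenate([local_vec, global_vec])
```

In the full model the scoring function would itself be learned end-to-end together with the CNN backbone.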


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5839
Author(s):  
Denghua Fan ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Yongming Li

As a sub-direction of image retrieval, person re-identification (Re-ID) is usually used to solve the security problem of cross-camera tracking and monitoring. A growing number of shopping centers have recently attempted to apply Re-ID technology. One development trend of related algorithms is using an attention mechanism to capture global and local features. We notice that these algorithms have apparent limitations: they only focus on the most salient features without considering certain detailed features. People’s clothes, bags and even shoes are of great help in distinguishing pedestrians, yet global features usually cover these important local features. Therefore, we propose a dual-branch network based on a multi-scale attention mechanism. This network can capture apparent global features and inconspicuous local features of pedestrian images. Specifically, we design a dual branch attention network (DBA-Net) for better performance. These two branches can optimize the extracted features of different depths at the same time. We also design an effective block (called channel, position and spatial-wise attention (CPSA)), which can capture key fine-grained information, such as bags and shoes. Furthermore, based on ID loss, we use complementary triplet loss and adaptive weighted rank list loss (WRLL) on each branch during the training process. DBA-Net not only learns semantic context information along the channel, position, and spatial dimensions but also integrates detailed semantic information by learning the dependency relationships between features. Extensive experiments on three widely used open-source datasets show that DBA-Net clearly yields overall state-of-the-art performance. Particularly on the CUHK03 dataset, the mean average precision (mAP) of DBA-Net reached 83.2%.


2021 ◽  
Vol 11 (5) ◽  
pp. 2174
Author(s):  
Xiaoguang Li ◽  
Feifan Yang ◽  
Jianglu Huang ◽  
Li Zhuo

Images captured in a real scene usually suffer from complex non-uniform degradation, which includes both global and local blurs. It is difficult to handle the complex blur variances with a unified processing model. We propose a global-local blur disentangling network, which can effectively extract global and local blur features via two branches. A phased training scheme is designed to disentangle the global and local blur features; that is, the branches are trained with task-specific datasets, respectively. A branch attention mechanism is introduced to dynamically fuse global and local features. Complex blurry images are used to train the attention module and the reconstruction module. The visualized feature maps of the different branches indicate that our dual-branch network can decouple the global and local blur features efficiently. Experimental results show that the proposed dual-branch blur disentangling network can improve both the subjective and objective deblurring effects for real captured images.
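The dynamic branch-attention fusion described above amounts to softmax-weighting the two branches' features before reconstruction. A minimal sketch, assuming pooled per-branch feature vectors and branch scores that, in the real network, would come from a small attention subnetwork:

```python
import numpy as np

def branch_attention_fuse(f_global, f_local, branch_scores):
    """Dynamically fuse the global-blur and local-blur branch features.

    f_global, f_local: (C,) pooled features from the two branches.
    branch_scores: (2,) unnormalized scores; in the paper these would
                   be produced by the learned attention module.
    """
    a = np.exp(branch_scores - branch_scores.max())
    a /= a.sum()                      # softmax over the two branches
    return a[0] * f_global + a[1] * f_local
```

Because the weights depend on the input, the network can lean on the global branch for uniformly blurred images and on the local branch where the blur is spatially varying.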


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yongyi Li ◽  
Shiqi Wang ◽  
Shuang Dong ◽  
Xueling Lv ◽  
Changzhi Lv ◽  
...  

At present, person reidentification based on attention mechanisms has attracted many scholars’ interest. Although an attention module can improve the representation ability and reidentification accuracy of a Re-ID model to a certain extent, its benefit depends on the coupling between the attention module and the original network. In this paper, a person reidentification model that combines multiple attentions and multiscale residuals is proposed. The model introduces a combined attention fusion module and a multiscale residual fusion module into the backbone network ResNet-50 to enhance the feature flow between residual blocks and better fuse multiscale features. Furthermore, a global branch and a local branch are designed to enhance the channel aggregation and position perception ability of the network via a dual ensemble attention module, while the fine-grained feature expression is obtained using multi-proportion blocking and reorganization. Thus, both the global and local features are enhanced. The experimental results on the Market-1501 and DukeMTMC-reID datasets show that the indexes of the presented model, especially Rank-1 accuracy, reach 96.20% and 89.59%, respectively, which can be considered progress in Re-ID.


Author(s):  
Fenglin Liu ◽  
Xuancheng Ren ◽  
Yuanxin Liu ◽  
Kai Lei ◽  
Xu Sun

Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet current methods still have great difficulty achieving deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. To perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach, which explores and distills the source information in vision and language. It globally provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations, and locally extracts the fine-grained regions and attributes in reference to the aspect vector for word selection. Our fully-attentive model achieves a CIDEr score of 129.3 in offline COCO evaluation with remarkable efficiency in terms of accuracy, speed, and parameter budget.


2013 ◽  
Vol 11 (03) ◽  
pp. 1341004 ◽  
Author(s):  
YUANNING LIU ◽  
YAPING CHANG ◽  
CHAO ZHANG ◽  
QINGKAI WEI ◽  
JINGBO CHEN ◽  
...  

Design of small interfering RNA (siRNA) is one of the most important steps in effectively applying RNA interference (RNAi) technology. Current siRNA design methods often produce inconsistent results and frequently fail to reliably select siRNAs with clear silencing effects. We propose that when designing siRNA, one should consider mRNA global features and local features near the siRNA-binding site. Through a linear regression study, we discovered strong correlations between inhibitory efficacy and both mRNA global features and neighboring local features. This paper shows that, on average, lower GC content, fewer stem secondary structures, and more loop secondary structures of mRNA, both globally and in the local flanking regions of the siRNA binding sites, lead to stronger inhibitory efficacy. Thus, the use of mRNA global features and near-binding-site local features is essential to successful gene silencing and, hence, better siRNA design. We use a random forest model to predict siRNA efficacy using siRNA features, mRNA features, and near-binding-site features. Our prediction method achieved a correlation coefficient of 0.7 in 10-fold cross-validation, in contrast to 0.63 when using siRNA features only. Our study demonstrates that considering mRNA and near-binding-site features helps improve siRNA design accuracy. The findings may also be helpful in understanding binding efficacy between microRNA and mRNA.
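The global-versus-local feature idea above can be made concrete with GC content, the simplest of the features named. A minimal sketch: compute GC content over the whole mRNA (global) and over a window flanking the siRNA-binding site (local). The function names and the flank width are illustrative; the paper's full feature set also includes stem/loop secondary-structure counts, omitted here.

```python
def gc_content(seq):
    """Fraction of G/C bases in an mRNA (or flanking) region."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq)

def sirna_site_features(mrna, site_start, site_len, flank=20):
    """Global and near-binding-site local features for one siRNA site.

    mrna: full mRNA sequence; site_start/site_len locate the
    siRNA-binding site; flank is the local window on each side.
    """
    lo = max(0, site_start - flank)
    hi = site_start + site_len + flank
    local_region = mrna[lo:hi]
    return {
        'global_gc': gc_content(mrna),
        'local_gc': gc_content(local_region),
    }
```

Feature dictionaries of this shape (extended with secondary-structure and siRNA-sequence features) would then be fed to a regressor such as a random forest to predict inhibitory efficacy.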


Author(s):  
Xing Xu ◽  
Yifan Wang ◽  
Yixuan He ◽  
Yang Yang ◽  
Alan Hanjalic ◽  
...  

Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into a common space to calculate the image-sentence similarity. However, the image-sentence similarity obtained by these methods may be coarse as (1) an intermediate common space is introduced to implicitly match the heterogeneous features of images and sentences in a global level, and (2) only the inter-modality relations of images and sentences are captured while the intra-modality relations are ignored. To overcome the limitations, we propose a novel Cross-Modal Hybrid Feature Fusion (CMHF) framework for directly learning the image-sentence similarity by fusing multimodal features with inter- and intra-modality relations incorporated. It can robustly capture the high-level interactions between visual regions in images and words in sentences, where flexible attention mechanisms are utilized to generate effective attention flows within and across the modalities of images and sentences. A structured objective with ranking loss constraint is formed in CMHF to learn the image-sentence similarity based on the fused fine-grained features of different modalities bypassing the usage of intermediate common space. Extensive experiments and comprehensive analysis performed on two widely used datasets—Microsoft COCO and Flickr30K—show the effectiveness of the hybrid feature fusion framework in CMHF, in which the state-of-the-art matching performance is achieved by our proposed CMHF method.
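The ranking-loss constraint mentioned above is commonly realized as a bidirectional hinge loss over a batch similarity matrix. A standard formulation is sketched below on the fused image-sentence similarities; the margin value and the exact variant (sum vs. hardest-negative) are assumptions, and the paper's objective may differ in detail.

```python
import numpy as np

def ranking_loss(sim, margin=0.2):
    """Bidirectional hinge ranking loss on a similarity matrix.

    sim: (B, B) image-sentence similarities for a batch, where
         sim[i, i] is the matched (positive) pair.
    """
    pos = np.diag(sim)                               # matched-pair scores
    # Image -> sentence: penalize sentences scored above the match.
    cost_s = np.maximum(0.0, margin + sim - pos[:, None])
    # Sentence -> image: penalize images scored above the match.
    cost_i = np.maximum(0.0, margin + sim - pos[None, :])
    np.fill_diagonal(cost_s, 0.0)                    # exclude positives
    np.fill_diagonal(cost_i, 0.0)
    return cost_s.sum() + cost_i.sum()
```

The loss is zero once every matched pair outscores all mismatched pairs by at least the margin, which is exactly the ranking behaviour the similarity learning needs.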


2010 ◽  
Vol 20-23 ◽  
pp. 1253-1259
Author(s):  
Chang Jun Zhou ◽  
Xiao Peng Wei ◽  
Qiang Zhang

In this paper, we propose a novel algorithm for facial recognition based on feature fusion in a support vector machine (SVM). First, local features and global features are obtained from pre-processed face images. The global features are obtained by making use of singular value decomposition (SVD), while the local features are obtained by utilizing principal component analysis (PCA) to extract the principal Gabor features. Finally, feature vectors fusing the global and local features are used to train the SVM to realize face expression recognition, and computer simulation on the JAFFE database illustrates the effectiveness of this method.
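The pipeline above can be sketched in miniature: SVD singular values as the global descriptor, concatenated with a local descriptor into the vector fed to the SVM. The number of singular values kept and the concatenation fusion are illustrative assumptions; the local PCA-Gabor extraction is abbreviated to an opaque vector here.

```python
import numpy as np

def global_features_svd(face_img, k=5):
    """Global descriptor: top-k singular values of the face image."""
    s = np.linalg.svd(face_img, compute_uv=False)  # sorted descending
    return s[:k]

def fuse_features(global_feat, local_feat):
    """Fused vector for the SVM: global (SVD) features concatenated
    with local features (e.g., the principal Gabor features from PCA,
    computed elsewhere)."""
    return np.concatenate([global_feat, local_feat])
```

In the full method, the fused vectors for a labeled training set would be passed to an SVM trainer; singular values are attractive as global features because they are stable under small illumination and noise perturbations.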


2014 ◽  
Vol 926-930 ◽  
pp. 3598-3603
Author(s):  
Xiao Xiong ◽  
Guo Fa Hao ◽  
Peng Zhong

Face recognition is an important branch of biometric identification and a major research topic in image processing and pattern recognition. Facial recognition technology can effectively overcome the defects of traditional authentication. At present, research on face recognition under ideal conditions has made considerable progress, but many problems remain in the presence of interfering factors such as changes in illumination, occlusion, expression, and pose. To address this, we study face recognition based on the integration of global and local features. Practice has shown that effectively integrating global and local features to build a face recognition system based on their fusion can improve the recognition rate and benefit face recognition applications.


2021 ◽  
Vol 11 (14) ◽  
pp. 6533
Author(s):  
Yimin Wang ◽  
Zhifeng Xiao ◽  
Lingguo Meng

Vegetable and fruit recognition can be considered as a fine-grained visual categorization (FGVC) task, which is challenging due to the large intraclass variances and small interclass variances. A mainstream direction to address the challenge is to exploit fine-grained local/global features to enhance the feature extraction and representation in the learning pipeline. However, unlike the human visual system, most of the existing FGVC methods only extract features from individual images during training. In contrast, human beings can learn discriminative features by comparing two different images. Inspired by this intuition, a recent FGVC method, named Attentive Pairwise Interaction Network (API-Net), takes as input an image pair for pairwise feature interaction and demonstrates superior performance in several open FGVC data sets. However, the accuracy of API-Net on VegFru, a domain-specific FGVC data set, is lower than expected, potentially due to the lack of spatialwise attention. Following this direction, we propose an FGVC framework named Attention-aware Interactive Features Network (AIF-Net) that refines the API-Net by integrating an attentive feature extractor into the backbone network. Specifically, we employ a region proposal network (RPN) to generate a collection of informative regions and apply a biattention module to learn global and local attentive feature maps, which are fused and fed into an interactive feature learning subnetwork. The novel neural structure is verified through extensive experiments and shows consistent performance improvement in comparison with the SOTA on the VegFru data set, demonstrating its superiority in fine-grained vegetable and fruit recognition. We also discover that a concatenation fusion operation applied in the feature extractor, along with three top-scoring regions suggested by an RPN, can effectively boost the performance.

