scholarly journals Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition

Author(s):  
Yaohui Zhu ◽  
Chenlong Liu ◽  
Shuqiang Jiang

The goal of few-shot image recognition is to distinguish different categories with only one or a few training samples. Previous works of few-shot learning mainly work on general object images. And current solutions usually learn a global image representation from training tasks to adapt novel tasks. However, fine-gained categories are distinguished by subtle and local parts, which could not be captured by global representations effectively. This may hinder existing few-shot learning approaches from dealing with fine-gained categories well. In this work, we propose a multi-attention meta-learning (MattML) method for few-shot fine-grained image recognition (FSFGIR). Instead of using only base learner for general feature learning, the proposed meta-learning method uses attention mechanisms of the base learner and task learner to capture discriminative parts of images. The base learner is equipped with two convolutional block attention modules (CBAM) and a classifier. The two CBAM can learn diverse and informative parts. And the initial weights of classifier are attended by the task learner, which gives the classifier a task-related sensitive initialization. For adaptation, the gradient-based meta-learning approach is employed by updating the parameters of two CBAM and the attended classifier, which facilitates the updated base learner to adaptively focus on discriminative parts. We experimentally analyze the different components of our method, and experimental results on four benchmark datasets demonstrate the effectiveness and superiority of our method.

2021 ◽  
Vol 2050 (1) ◽  
pp. 012006
Author(s):  
Xili Dai ◽  
Chunmei Ma ◽  
Jingwei Sun ◽  
Tao Zhang ◽  
Haigang Gong ◽  
...  

Abstract Training deep neural networks from only a few examples has been an interesting topic that motivated few shot learning. In this paper, we study the fine-grained image classification problem in a challenging few-shot learning setting, and propose the Self-Amplificated Network (SAN), a method based on meta-learning to tackle this problem. The SAN model consists of three parts, which are the Encoder, Amplification and Similarity Modules. The Encoder Module encodes a fine-grained image input into a feature vector. The Amplification Module is used to amplify subtle differences between fine-grained images based on the self attention mechanism which is composed of multi-head attention. The Similarity Module measures how similar the query image and the support set are in order to determine the classification result. In-depth experiments on three benchmark datasets have showcased that our network achieves superior performance over the competing baselines.


2021 ◽  
Vol 10 (1) ◽  
pp. 17
Author(s):  
Ravikumar Balakrishnan ◽  
Mustafa Akdeniz ◽  
Sagar Dhakal ◽  
Arjun Anand ◽  
Ariela Zeira ◽  
...  

Client and Internet of Things devices are increasingly equipped with the ability to sense, process, and communicate data with high efficiency. This is resulting in a major shift in machine learning (ML) computation at the network edge. Distributed learning approaches such as federated learning that move ML training to end devices have emerged, promising lower latency and bandwidth costs and enhanced privacy of end users’ data. However, new challenges that arise from the heterogeneous nature of the devices’ communication rates, compute capabilities, and the limited observability of the training data at each device must be addressed. All these factors can significantly affect the training performance in terms of overall accuracy, model fairness, and convergence time. We present compute-communication and data importance-aware resource management schemes optimizing these metrics and evaluate the training performance on benchmark datasets. We also develop a federated meta-learning solution, based on task similarity, that serves as a sample efficient initialization for federated learning, as well as improves model personalization and generalization across non-IID (independent, identically distributed) data. We present experimental results on benchmark federated learning datasets to highlight the performance gains of the proposed methods in comparison to the well-known federated averaging algorithm and its variants.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Jingyuan Zhang ◽  
Zequn Zhang ◽  
Zhi Guo ◽  
Li Jin ◽  
Kang Liu ◽  
...  

Target-oriented opinion words extraction (TOWE) seeks to identify opinion expressions oriented to a specific target, and it is a crucial step toward fine-grained opinion mining. Recent neural networks have achieved significant success in this task by building target-aware representations. However, there are still two limitations of these methods that hinder the progress of TOWE. Mainstream approaches typically utilize position indicators to mark the given target, which is a naive strategy and lacks task-specific semantic meaning. Meanwhile, the annotated target-opinion pairs contain rich latent structural knowledge from multiple perspectives, but existing methods only exploit the TOWE view. To tackle these issues, we formulate the TOWE task as a question answering (QA) problem and leverage a machine reading comprehension (MRC) model trained with a multiview paradigm to extract targeted opinions. Specifically, we introduce a template-based pseudo-question generation method and utilize deep attention interaction to build target-aware context representations and extract related opinion words. To take advantage of latent structural correlations, we further cast the opinion-target structure into three distinct yet correlated views and leverage meta-learning to aggregate common knowledge among them to enhance the TOWE task. We evaluate the proposed model on four benchmark datasets, and our method achieves new state-of-the-art results. Extensional experiments have shown that the pipeline method with our approach could surpass existing opinion pair extraction models, including joint methods that are usually believed to work better.


Author(s):  
Pinzhuo Tian ◽  
Lei Qi ◽  
Shaokang Dong ◽  
Yinghuan Shi ◽  
Yang Gao

In the few-shot learning scenario, the data-distribution discrepancy between training data and test data in a task usually exists due to the limited data. However, most existing meta-learning approaches seldom consider this intra-task discrepancy in the meta-training phase which might deteriorate the performance. To overcome this limitation, we develop a new consistent meta-regularization method to reduce the intra-task data-distribution discrepancy. Moreover, the proposed meta-regularization method could be readily inserted into existing optimization-based meta-learning models to learn better meta-knowledge. Particularly, we provide the theoretical analysis to prove that using the proposed meta-regularization, the conventional gradient-based meta-learning method can reach the lower regret bound. The extensive experiments also demonstrate the effectiveness of our method, which indeed improves the performances of the state-of-the-art gradient-based meta-learning models in the few-shot classification task.


Author(s):  
Yuqing Ma ◽  
Wei Liu ◽  
Shihao Bai ◽  
Qingyu Zhang ◽  
Aishan Liu ◽  
...  

Few-shot learning aims to learn a model that can be readily adapted to new unseen classes (concepts) by accessing one or few examples. Despite the successful progress, most of the few-shot learning approaches, concentrating on either global or local characteristics of examples, still suffer from weak generalization abilities. Inspired by the inverted pyramid theory, to address this problem, we propose an inverted pyramid network (IPN) that intimates the human's coarse-to-fine cognition paradigm. The proposed IPN consists of two consecutive stages, namely global stage and local stage. At the global stage, a class-sensitive contextual memory network (CCMNet) is introduced to learn discriminative support-query relation embeddings and predict the query-to-class similarity based on the contextual memory. Then at the local stage, a fine-grained calibration is further appended to complement the coarse relation embeddings, targeting more precise query-to-class similarity evaluation. To the best of our knowledge, IPN is the first work that simultaneously integrates both global and local characteristics in few-shot learning, approximately imitating the human cognition mechanism. Our extensive experiments on multiple benchmark datasets demonstrate the superiority of IPN, compared to a number of state-of-the-art approaches.


Symmetry ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 479 ◽  
Author(s):  
Yadong Yang ◽  
Xiaofeng Wang ◽  
Hengzheng Zhang

Compared with ordinary image classification tasks, fine-grained image classification is closer to real-life scenes. Its key point is how to find the local areas with sufficient discrimination and perform effective feature learning. Based on a bilinear convolutional neural network (B-CNN), this paper designs a local importance representation convolutional neural network (LIR-CNN) model, which can be divided into three parts. Firstly, the super-pixel segmentation convolution method is used for the input layer of the model. It allows the model to receive images of different sizes and fully considers the complex geometric deformation of the images. Then, we replaced the standard convolution of B-CNN with the proposed local importance representation convolution. It can score each local area of the image using learning to distinguish their importance. Finally, channelwise convolution is proposed and it plays an important role in balancing lightweight network and classification accuracy. Experimental results on the benchmark datasets (e.g., CUB-200-2011, FGVC-Aircraft, and Stanford Cars) showed that the LIR-CNN model had good performance in fine-grained image classification tasks.


Entropy ◽  
2020 ◽  
Vol 22 (6) ◽  
pp. 625
Author(s):  
Fang Dong ◽  
Li Liu ◽  
Fanzhang Li

Deep learning has achieved many successes in different fields but can sometimes encounter an overfitting problem when there are insufficient amounts of labeled samples. In solving the problem of learning with limited training data, meta-learning is proposed to remember some common knowledge by leveraging a large number of similar few-shot tasks and learning how to adapt a base-learner to a new task for which only a few labeled samples are available. Current meta-learning approaches typically uses Shallow Neural Networks (SNNs) to avoid overfitting, thus wasting much information in adapting to a new task. Moreover, the Euclidean space-based gradient descent in existing meta-learning approaches always lead to an inaccurate update of meta-learners, which poses a challenge to meta-learning models in extracting features from samples and updating network parameters. In this paper, we propose a novel meta-learning model called Multi-Stage Meta-Learning (MSML) to post the bottleneck during the adapting process. The proposed method constrains a network to Stiefel manifold so that a meta-learner could perform a more stable gradient descent in limited steps so that the adapting process can be accelerated. An experiment on the mini-ImageNet demonstrates that the proposed method reached a better accuracy under 5-way 1-shot and 5-way 5-shot conditions.


Author(s):  
Tianshui Chen ◽  
Liang Lin ◽  
Riquan Chen ◽  
Yang Wu ◽  
Xiaonan Luo

Humans can naturally understand an image in depth with the aid of rich knowledge accumulated from daily lives or professions. For example, to achieve fine-grained image recognition (e.g., categorizing hundreds of subordinate categories of birds) usually requires a comprehensive visual concept organization including category labels and part-level attributes. In this work, we investigate how to unify rich professional knowledge with deep neural network architectures and propose a Knowledge-Embedded Representation Learning (KERL) framework for handling the problem of fine-grained image recognition. Specifically, we organize the rich visual concepts in the form of knowledge graph and employ a Gated Graph Neural Network to propagate node message through the graph for generating the knowledge representation. By introducing a novel gated mechanism, our KERL framework incorporates this knowledge representation into the discriminative image feature learning, i.e., implicitly associating the specific attributes with the feature maps. Compared with existing methods of fine-grained image classification, our KERL framework has several appealing properties: i) The embedded high-level knowledge enhances the feature representation, thus facilitating distinguishing the subtle differences among subordinate categories. ii) Our framework can learn feature maps with a meaningful configuration that the highlighted regions finely accord with the nodes (specific attributes) of the knowledge graph. Extensive experiments on the widely used Caltech-UCSD bird dataset demonstrate the superiority of our KERL framework over existing state-of-the-art methods.


2021 ◽  
Vol 13 (16) ◽  
pp. 3065
Author(s):  
Libo Wang ◽  
Rui Li ◽  
Dongzhi Wang ◽  
Chenxi Duan ◽  
Teng Wang ◽  
...  

Semantic segmentation from very fine resolution (VFR) urban scene images plays a significant role in several application scenarios including autonomous driving, land cover classification, urban planning, etc. However, the tremendous details contained in the VFR image, especially the considerable variations in scale and appearance of objects, severely limit the potential of the existing deep learning approaches. Addressing such issues represents a promising research field in the remote sensing community, which paves the way for scene-level landscape pattern analysis and decision making. In this paper, we propose a Bilateral Awareness Network which contains a dependency path and a texture path to fully capture the long-range relationships and fine-grained details in VFR images. Specifically, the dependency path is conducted based on the ResT, a novel Transformer backbone with memory-efficient multi-head self-attention, while the texture path is built on the stacked convolution operation. In addition, using the linear attention mechanism, a feature aggregation module is designed to effectively fuse the dependency features and texture features. Extensive experiments conducted on the three large-scale urban scene image segmentation datasets, i.e., ISPRS Vaihingen dataset, ISPRS Potsdam dataset, and UAVid dataset, demonstrate the effectiveness of our BANet. Specifically, a 64.6% mIoU is achieved on the UAVid dataset.


Sign in / Sign up

Export Citation Format

Share Document