Self-Amplificated Network: Learning fine-grained learner with few samples

Nowadays, deep neural networks (DNNs) have been rapidly deployed to realize a number of functionalities such as sensing, imaging, classification, and recognition. However, the computationally intensive requirements of DNNs make them difficult to deploy on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm achieves efficient end-to-end training that directly transfers a redundant neural network to a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme reduces its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.


Explicit Interaction Model towards Text Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016359 ◽

2019 ◽

Vol 33 ◽

pp. 6359-6366 ◽

Cited By ~ 3

Author(s):

Cunxiao Du ◽

Zhaozheng Chen ◽

Fuli Feng ◽

Lei Zhu ◽

Tian Gan ◽

...

Keyword(s):

Language Processing ◽

Text Classification ◽

Deep Neural Networks ◽

Interaction Mechanism ◽

Interaction Model ◽

Classification Task ◽

Fine Grained ◽

Word Level ◽

Benchmark Datasets ◽

Classification Tasks

Text classification is one of the fundamental tasks in natural language processing. Recently, deep neural networks have achieved promising performance in the text classification task compared to shallow models. Despite the significance of deep models, they ignore fine-grained classification clues (matching signals between words and classes), since their classifications mainly rely on text-level representations. To address this problem, we introduce the interaction mechanism to incorporate word-level matching signals into the text classification task. In particular, we design a novel framework, EXplicit interAction Model (dubbed EXAM), equipped with the interaction mechanism. We validate the proposed approach on several benchmark datasets covering both multi-label and multi-class text classification tasks. Extensive experimental results demonstrate the superiority of the proposed method. As a byproduct, we have released the code and parameter settings to facilitate further research.
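The word-level matching idea in this abstract can be illustrated with a small sketch. It is not the EXAM implementation: the dot-product score, the mean aggregation over words, and the names `interaction_matrix` and `class_logits` are all assumptions made for illustration, and the toy vectors are invented.

```python
def interaction_matrix(word_vecs, class_vecs):
    """Dot-product matching score between every word and every class."""
    return [
        [sum(w * c for w, c in zip(word, cls)) for cls in class_vecs]
        for word in word_vecs
    ]


def class_logits(word_vecs, class_vecs):
    """Aggregate word-level scores into one logit per class.

    Assumption: a plain mean over words stands in for whatever
    aggregation layer a real model would learn.
    """
    scores = interaction_matrix(word_vecs, class_vecs)
    n_words = len(scores)
    return [
        sum(scores[i][j] for i in range(n_words)) / n_words
        for j in range(len(class_vecs))
    ]


# Toy example: three 2-d word vectors and two 2-d class vectors.
words = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
classes = [[1.0, 0.0], [0.0, 1.0]]
scores = interaction_matrix(words, classes)
logits = class_logits(words, classes)
```

Here two of the three words match the first class, so its logit is the larger one. A text-level model would see only one pooled vector and could not expose these per-word scores.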


Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment

10.36227/techrxiv.17031920.v1 ◽

2021 ◽

Author(s):

Mengke Li ◽

Yiu-ming Cheung ◽

Yang Lu

Keyword(s):

Visual Recognition ◽

Deep Neural Networks ◽

Sampling Strategy ◽

Cross Entropy ◽

Superior Performance ◽

Great Success ◽

Effective Number ◽

Entropy Loss ◽

Benchmark Datasets ◽

Varied Amplitude

Long-tailed data is still a big challenge for deep neural networks, even though they have achieved great success on balanced data. We observe that vanilla training on long-tailed data with cross-entropy loss makes the instance-rich head classes severely squeeze the spatial distribution of the tail classes, which leads to difficulty in classifying tail class samples. Furthermore, the original cross-entropy loss can propagate gradients only briefly, because the gradient in softmax form rapidly approaches zero as the logit difference increases. This phenomenon is called softmax saturation. It is unfavorable for training on balanced data, but can be utilized to adjust the validity of the samples in long-tailed data, thereby correcting the distorted embedding space of long-tailed problems. To this end, this paper proposes the Gaussian clouded logit adjustment, which perturbs different class logits with Gaussian noise of varied amplitude. We define the amplitude of perturbation as the cloud size and assign relatively large cloud sizes to tail classes. A large cloud size can reduce the softmax saturation, thereby making tail class samples more active and enlarging the embedding space. To alleviate the bias in the classifier, we accordingly propose the class-based effective number sampling strategy with classifier re-training. Extensive experiments on benchmark datasets validate the superior performance of the proposed method.
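Softmax saturation and class-dependent logit perturbation can be sketched as follows. This is a minimal illustration, not the authors' formulation. The log-ratio rule in `cloud_sizes`, the additive zero-mean noise in `perturb_logits`, and the toy class counts are all hypothetical choices, since the abstract only states that tail classes receive larger cloud sizes.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def target_gradient(logits, target):
    """Cross-entropy gradient w.r.t. the target logit: p_target - 1."""
    return softmax(logits)[target] - 1.0


def cloud_sizes(class_counts, scale=1.0):
    """Hypothetical rule: cloud size grows as a class gets rarer."""
    n_max = max(class_counts)
    return [scale * math.log(n_max / n) for n in class_counts]


def perturb_logits(logits, sizes, rng):
    """Add zero-mean Gaussian noise scaled by each class's cloud size."""
    return [z + s * rng.gauss(0.0, 1.0) for z, s in zip(logits, sizes)]


# Saturation: the gradient magnitude shrinks as the logit margin grows.
small_margin = abs(target_gradient([1.0, 0.0], 0))
large_margin = abs(target_gradient([10.0, 0.0], 0))

# Head class (1000 samples) gets no cloud; tail classes get larger ones.
sizes = cloud_sizes([1000, 100, 10])
rng = random.Random(0)
noisy = perturb_logits([2.0, 1.0, 0.5], sizes, rng)
```

With these toy numbers the gradient at a margin of 10 is several orders of magnitude smaller than at a margin of 1, which is the saturation effect the abstract describes. Under the assumed rule the head class logit is left unchanged while the tail logits are perturbed.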


Multi-attention Meta Learning for Few-shot Fine-grained Image Recognition

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/152 ◽

2020 ◽

Author(s):

Yaohui Zhu ◽

Chenlong Liu ◽

Shuqiang Jiang

Keyword(s):

Image Recognition ◽

Feature Learning ◽

Learning Approaches ◽

Fine Grained ◽

Meta Learning ◽

Benchmark Datasets ◽

Gradient Based ◽

General Object ◽

Base Learner ◽

Discriminative Parts

The goal of few-shot image recognition is to distinguish different categories with only one or a few training samples. Previous work on few-shot learning mainly addresses general object images, and current solutions usually learn a global image representation from training tasks to adapt to novel tasks. However, fine-grained categories are distinguished by subtle and local parts, which cannot be captured effectively by global representations. This may hinder existing few-shot learning approaches from dealing well with fine-grained categories. In this work, we propose a multi-attention meta-learning (MattML) method for few-shot fine-grained image recognition (FSFGIR). Instead of using only a base learner for general feature learning, the proposed meta-learning method uses attention mechanisms of the base learner and task learner to capture discriminative parts of images. The base learner is equipped with two convolutional block attention modules (CBAM) and a classifier. The two CBAM can learn diverse and informative parts. The initial weights of the classifier are attended by the task learner, which gives the classifier a task-related sensitive initialization. For adaptation, the gradient-based meta-learning approach is employed by updating the parameters of the two CBAM and the attended classifier, which helps the updated base learner to adaptively focus on discriminative parts. We experimentally analyze the different components of our method, and experimental results on four benchmark datasets demonstrate the effectiveness and superiority of our method.


Enhancement of Target-Oriented Opinion Words Extraction with Multiview-Trained Machine Reading Comprehension Model

Computational Intelligence and Neuroscience ◽

10.1155/2021/6645871 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jingyuan Zhang ◽

Zequn Zhang ◽

Zhi Guo ◽

Li Jin ◽

Kang Liu ◽

...

Keyword(s):

Reading Comprehension ◽

Question Answering ◽

Opinion Mining ◽

Common Knowledge ◽

Multiple Perspectives ◽

Fine Grained ◽

Proposed Model ◽

Meta Learning ◽

Benchmark Datasets ◽

Machine Reading

Target-oriented opinion words extraction (TOWE) seeks to identify opinion expressions oriented to a specific target, and it is a crucial step toward fine-grained opinion mining. Recent neural networks have achieved significant success in this task by building target-aware representations. However, two limitations of these methods still hinder the progress of TOWE. Mainstream approaches typically utilize position indicators to mark the given target, which is a naive strategy and lacks task-specific semantic meaning. Meanwhile, the annotated target-opinion pairs contain rich latent structural knowledge from multiple perspectives, but existing methods only exploit the TOWE view. To tackle these issues, we formulate the TOWE task as a question answering (QA) problem and leverage a machine reading comprehension (MRC) model trained with a multiview paradigm to extract targeted opinions. Specifically, we introduce a template-based pseudo-question generation method and utilize deep attention interaction to build target-aware context representations and extract related opinion words. To take advantage of latent structural correlations, we further cast the opinion-target structure into three distinct yet correlated views and leverage meta-learning to aggregate common knowledge among them to enhance the TOWE task. We evaluate the proposed model on four benchmark datasets, and our method achieves new state-of-the-art results. Additional experiments show that a pipeline method built on our approach can surpass existing opinion pair extraction models, including joint methods that are usually believed to work better.


Single- and Cross-Modality Near Duplicate Image Pairs Detection via Spatial Transformer Comparing CNN

Sensors ◽

10.3390/s21010255 ◽

2021 ◽

Vol 21 (1) ◽

pp. 255

Author(s):

Yi Zhang ◽

Shizhou Zhang ◽

Ying Li ◽

Yanning Zhang

Keyword(s):

Deep Neural Networks ◽

Image Data ◽

Superior Performance ◽

Image Pair ◽

Benchmark Datasets ◽

Correlation Information ◽

Image Pairs ◽

Duplicate Image Detection ◽

Single Modality ◽

Band Image

Recently, both single-modality and cross-modality near-duplicate image detection tasks have received wide attention in the pattern recognition and computer vision communities. Existing deep neural network-based methods have achieved remarkable performance in this task. However, most of these methods focus mainly on learning from each image of the pair separately, and thus underuse the information shared between the near-duplicate image pairs. In this paper, to make more use of the correlations between image pairs, we propose a spatial transformer comparing convolutional neural network (CNN) model to compare near-duplicate image pairs. Specifically, we first propose a comparing CNN framework, which is equipped with a cross-stream to fully learn the correlation information between image pairs while considering the features of each image. Furthermore, to deal with the local deformations caused by cropping, translation, scaling, and non-rigid transformations, we additionally introduce a spatial transformer comparing CNN model by incorporating a spatial transformer module into the comparing CNN architecture. To demonstrate the effectiveness of the proposed method on both the single-modality and cross-modality (Optical-InfraRed) near-duplicate image pair detection tasks, we conduct extensive experiments on three popular benchmark datasets, namely CaliforniaND (ND means near duplicate), Mir-Flickr Near Duplicate, and TNO Multi-band Image Data Collection. The experimental results show that the proposed method can achieve superior performance compared with many state-of-the-art methods on both tasks.


RealPoint3D: Generating 3D Point Clouds from a Single Image of Complex Scenarios

Remote Sensing ◽

10.3390/rs11222644 ◽

2019 ◽

Vol 11 (22) ◽

pp. 2644 ◽

Cited By ~ 2

Author(s):

Yan Xia ◽

Cheng Wang ◽

Yusheng Xu ◽

Yu Zang ◽

Weiquan Liu ◽

...

Keyword(s):

Point Cloud ◽

Point Clouds ◽

Superior Performance ◽

3D Point Cloud ◽

Single Image ◽

Query Image ◽

Shape Model ◽

Fine Grained ◽

3D Point Clouds ◽

3D Information

Generating 3D point clouds from a single image has attracted wide attention from researchers in the fields of multimedia, remote sensing, and computer vision. With the recent proliferation of deep learning, various deep models have been proposed for 3D point cloud generation. However, they require objects to be captured with absolutely clean backgrounds and fixed viewpoints, which severely limits their application in real environments. To guide 3D point cloud generation, we propose a novel network, RealPoint3D, that integrates prior 3D shape knowledge into the network. By taking additional 3D information, RealPoint3D can handle 3D object generation from a single real image captured from any viewpoint and against a complex background. Specifically, given a query image, we retrieve the nearest shape model from a pre-prepared 3D model database. Then, the image, together with the retrieved shape model, is fed into RealPoint3D to generate a fine-grained 3D point cloud. We evaluated the proposed RealPoint3D on the ShapeNet and ObjectNet3D datasets for 3D point cloud generation. Experimental results and comparisons with state-of-the-art methods demonstrate that our framework achieves superior performance. Furthermore, our proposed framework works well for real images with complex backgrounds (the image contains other objects in addition to the reconstructed object, and the reconstructed object may be occluded or truncated) and various viewing angles.



SAR Target Recognition via Meta-Learning and Amortized Variational Inference

Sensors ◽

10.3390/s20205966 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5966

Author(s):

Ke Wang ◽

Gong Zhang

Keyword(s):

Target Recognition ◽

Probability Distributions ◽

Automatic Target Recognition ◽

Variational Inference ◽

Training Data ◽

Superior Performance ◽

Small Data ◽

Meta Learning ◽

Radar Automatic Target Recognition ◽

Global Parameters

The challenge of small data has emerged in synthetic aperture radar automatic target recognition (SAR-ATR) problems. Most SAR-ATR methods are data-driven and require a lot of training data that are expensive to collect. To address this challenge, we propose a recognition model that incorporates meta-learning and amortized variational inference (AVI). Specifically, the model consists of global parameters and task-specific parameters. The global parameters, trained by meta-learning, construct a common feature extractor shared between all recognition tasks. The task-specific parameters, modeled by probability distributions, can adapt to new tasks with a small amount of training data. To reduce the computation and storage cost, the task-specific parameters are inferred by AVI implemented with set-to-set functions. Extensive experiments were conducted on a real SAR dataset to evaluate the effectiveness of the model. The results of the proposed approach compared with those of the latest SAR-ATR methods show the superior performance of our model, especially on recognition tasks with limited data.


Drug-Drug Interaction Predicting by Neural Network Using Integrated Similarity

Scientific Reports ◽

10.1038/s41598-019-50121-3 ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 10

Author(s):

Narjes Rohani ◽

Changiz Eslahchi

Keyword(s):

Neural Network ◽

Drug Interaction ◽

Side Effect ◽

Network Architecture ◽

Selection Process ◽

Superior Performance ◽

Multiple Drug ◽

Interaction Prediction ◽

Benchmark Datasets ◽

Drug Drug Interaction

Drug-Drug Interaction (DDI) prediction is one of the most critical issues in drug development and health. Proposing appropriate computational methods for predicting unknown DDIs with high precision is challenging. We propose "NDD: Neural network-based method for drug-drug interaction prediction" for predicting unknown DDIs using various information about drugs. Multiple drug similarities based on drug substructure, target, side effect, off-label side effect, pathway, transporter, and indication data are calculated. First, NDD uses a heuristic similarity selection process and then integrates the selected similarities with a nonlinear similarity fusion method to obtain high-level features. Afterward, it uses a neural network for interaction prediction. The similarity selection and similarity integration parts of NDD have been proposed in previous studies of other problems. Our novelty is to combine these parts with a new neural network architecture and to apply these approaches in the context of DDI prediction. We compared NDD with six machine learning classifiers and six state-of-the-art graph-based methods on three benchmark datasets. NDD achieved superior performance in cross-validation, with AUPR ranging from 0.830 to 0.947, AUC from 0.954 to 0.994, and F-measure from 0.772 to 0.902. Moreover, cumulative evidence from case studies on numerous drug pairs further confirms the ability of NDD to predict unknown DDIs. The evaluations corroborate that NDD is an efficient method for predicting unknown DDIs. The data and implementation of NDD are available at https://github.com/nrohani/NDD.
