Gated Convolutional Networks with Hybrid Connectivity for Image Classification

2020 ◽  
Vol 34 (07) ◽  
pp. 12581-12588
Author(s):  
Chuanguang Yang ◽  
Zhulin An ◽  
Hui Zhu ◽  
Xiaolong Hu ◽  
Kun Zhang ◽  
...  

We propose a simple yet effective method to reduce the redundancy of DenseNet: we substantially decrease the number of stacked modules by replacing the original bottleneck with our SMG module, which is augmented by a local residual connection. Furthermore, the SMG module is equipped with an efficient two-stage pipeline suited to DenseNet-like architectures that need to integrate all previous outputs: it first squeezes the incoming informative but redundant features gradually through hierarchical convolutions in an hourglass shape, then excites them with multi-kernel depthwise convolutions, so that the output is compact and holds more informative multi-scale features. We further develop a forget gate and an update gate by introducing popular attention modules, which fuse reused and new features effectively instead of simply adding them. Because of the Hybrid Connectivity (a nested combination of global dense and local residual connections) and the Gated mechanisms, we call our network HCGNet. Experimental results on the CIFAR and ImageNet datasets show that HCGNet is markedly more efficient than DenseNet and can also significantly outperform state-of-the-art networks with less complexity. Moreover, HCGNet shows remarkable interpretability and robustness under network dissection and adversarial defense, respectively. On MS-COCO, HCGNet consistently learns better features than popular backbones.
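
The gated fusion mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the attention modules or the inputs to each gate, so everything here is an assumption made for illustration: the function names are hypothetical, the forget gate is computed from the per-channel mean of the reused features, and the update gate from the per-channel mean of the new features.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(features):
    """Global average pooling: one mean per channel (a channel is a list of floats)."""
    return [sum(channel) / len(channel) for channel in features]


def gated_fusion(reused, new):
    """Fuse reused and new features with per-channel gates instead of plain addition.

    Assumption (not from the abstract): each gate is a sigmoid of a channel mean.
    """
    forget = [sigmoid(m) for m in channel_means(reused)]  # scales reused features
    update = [sigmoid(m) for m in channel_means(new)]     # scales new features
    return [
        [f * r + u * n for r, n in zip(r_channel, n_channel)]
        for f, u, r_channel, n_channel in zip(forget, update, reused, new)
    ]
```

With both gates fixed at 1 this reduces to the simple addition that the abstract contrasts against; the gates let the network weight each source per channel.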

Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3818
Author(s):  
Ye Zhang ◽  
Yi Hou ◽  
Shilin Zhou ◽  
Kewei Ouyang

Recent advances in time series classification (TSC) have exploited deep neural networks (DNNs) to improve performance. One promising approach encodes time series as recurrence plot (RP) images in order to leverage state-of-the-art DNNs for higher accuracy. This approach has achieved impressive results and attracted growing interest from the community. However, it remains an open problem how to handle not only the variability in the scale of distinctive regions and in sequence length but also the tendency confusion problem. In this paper, we tackle these problems using Multi-scale Signed Recurrence Plots (MS-RP), an improvement of RP, and propose a novel method based on MS-RP images and Fully Convolutional Networks (FCN) for TSC. The method first introduces the phase space dimension and time delay embedding of RP to produce multi-scale RP images; then, by using an asymmetrical structure, the constructed RP images can represent very long sequences (>700 points). Next, MS-RP images are obtained by multiplying the RP images by designed sign masks in order to remove the tendency confusion. Finally, an FCN is trained on the MS-RP images to perform classification. Experimental results on 45 benchmark datasets demonstrate that our method improves on the state of the art in terms of classification accuracy and visualization evaluation.
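
As a rough illustration of the first step, the sketch below builds plain recurrence plots from a time-delay embedding and varies the embedding dimension and delay to get several scales. It is a generic textbook construction, not the MS-RP method itself: the sign masks and the asymmetrical structure are not reproduced because the abstract does not define them, and all function names and parameter values are hypothetical.

```python
import math


def delay_embed(series, dim, delay):
    """Time-delay embedding: point i is (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n = len(series) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("series too short for this dim and delay")
    return [[series[i + k * delay] for k in range(dim)] for i in range(n)]


def recurrence_plot(series, dim=2, delay=1, eps=None):
    """Pairwise Euclidean distances between embedded points.

    If eps is given, the matrix is thresholded to 0/1 (1 means the two states recur).
    """
    points = delay_embed(series, dim, delay)
    dist = [[math.dist(p, q) for q in points] for p in points]
    if eps is None:
        return dist
    return [[1 if d <= eps else 0 for d in row] for row in dist]


def multi_scale_plots(series, settings, eps=None):
    """One recurrence plot per (dim, delay) pair in settings."""
    return [recurrence_plot(series, dim, delay, eps) for dim, delay in settings]
```

For example, `multi_scale_plots(x, [(2, 1), (3, 2)])` would return two distance matrices of different sizes for the same series `x`, which could then be resized and stacked as image channels.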


2019 ◽  
Vol 10 (1) ◽  
pp. 101 ◽  
Author(s):  
Yadong Yang ◽  
Chengji Xu ◽  
Feng Dong ◽  
Xiaofeng Wang

Computer vision systems are insensitive to the scale of objects in natural scenes, so it is important to study the multi-scale representation of features. Res2Net implements hierarchical multi-scale convolution in residual blocks, but its random grouping method affects the robustness and intuitive interpretability of the network. We propose a new multi-scale convolution model based on multiple attention. It introduces the attention mechanism into the structure of a Res2-block to better guide feature expression. First, we adopt channel attention to score channels and sort them in descending order of feature importance (Channels-Sort). The sorted residual blocks are grouped and hierarchically convolved within each block to form a single attention and multi-scale block (AMS-block). Then, we apply channel attention to the small residual blocks to constitute a dual attention and multi-scale block (DAMS-block). Finally, we introduce spatial attention before sorting the channels to form a multi-attention multi-scale block (MAMS-block). A MAMS-convolutional neural network (CNN) is a series of multiple MAMS-blocks. It enables significant information to be expressed at more levels, and can also be easily grafted into different convolutional structures. Limited by hardware conditions, we validate the proposed ideas only with convolutional networks of the same magnitude. The experimental results show that the convolution model with an attention mechanism and multi-scale features is superior in image classification.
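
The Channels-Sort step can be sketched as follows. The abstract does not describe how channel attention computes its scores, so this example assumes the score is simply the global average of each channel, standing in for a learned attention module; the function names are hypothetical.

```python
def channel_scores(feature_maps):
    """Score each channel by global average pooling.

    Assumption: this replaces the learned channel attention used in the paper.
    Each channel is a 2-D list (rows of floats).
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def sort_channels(feature_maps):
    """Return the channels in descending score order, plus the permutation used."""
    scores = channel_scores(feature_maps)
    order = sorted(range(len(feature_maps)), key=lambda i: scores[i], reverse=True)
    return [feature_maps[i] for i in order], order


def group_channels(sorted_maps, num_groups):
    """Split the sorted channels into equal-sized consecutive groups."""
    if num_groups <= 0 or len(sorted_maps) % num_groups:
        raise ValueError("channel count must be a positive multiple of num_groups")
    size = len(sorted_maps) // num_groups
    return [sorted_maps[g * size:(g + 1) * size] for g in range(num_groups)]
```

Grouping after sorting means each group holds channels of similar importance, in contrast to the random grouping that the abstract attributes to Res2Net.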


2020 ◽  
Vol 34 (07) ◽  
pp. 11037-11044
Author(s):  
Lianghua Huang ◽  
Xin Zhao ◽  
Kaiqi Huang

A key capability of a long-term tracker is to search for targets in very large areas (typically the entire image) to handle possible target absences or tracking failures. However, there is currently no strong baseline for such global instance search. In this work, we aim to bridge this gap. Specifically, we propose GlobalTrack, a pure global instance search based tracker that makes no assumption on the temporal consistency of the target's positions and scales. GlobalTrack is developed based on two-stage object detectors, and it is able to perform full-image and multi-scale search of arbitrary instances with only a single query as the guide. We further propose a cross-query loss to improve the robustness of our approach against distractors. With no online learning, no penalty on position or scale changes, no scale smoothing and no trajectory refinement, our pure global instance search based tracker achieves comparable, sometimes much better performance on four large-scale tracking benchmarks (i.e., 52.1% AUC on LaSOT, 63.8% success rate on TLP, 60.3% MaxGM on OxUvA and 75.4% normalized precision on TrackingNet), compared to state-of-the-art approaches that typically require complex post-processing. More importantly, our tracker runs without cumulative errors, i.e., temporary tracking failures of any type do not affect its performance on future frames, making it ideal for long-term tracking. We hope this work will be a strong baseline for long-term tracking and will stimulate future work in this area.


Author(s):  
Liang Yao ◽  
Chengsheng Mao ◽  
Yuan Luo

Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document-word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representations for words and documents; it then jointly learns the embeddings for both words and documents, supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods becomes more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.
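
The graph construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the edge-weighting scheme, so raw counts are used here, tokenization is a plain whitespace split, and the function name, node-naming scheme and `window` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_text_graph(docs, window=3):
    """Return weighted edges of one corpus-level graph with word and document nodes.

    Assumption: edge weights are raw counts (the abstract does not give the scheme).
    """
    edges = Counter()
    for d, doc in enumerate(docs):
        tokens = doc.lower().split()
        # Document-word edges: how often each word occurs in the document.
        for word, count in Counter(tokens).items():
            edges[(f"doc:{d}", f"word:{word}")] += count
        # Word-word edges: co-occurrence of distinct words inside a sliding window.
        for start in range(max(1, len(tokens) - window + 1)):
            seen = sorted(set(tokens[start:start + window]))
            for a, b in combinations(seen, 2):
                edges[(f"word:{a}", f"word:{b}")] += 1
    return edges
```

Because documents and words are nodes in the same graph, a graph convolution over it can propagate label information from labeled documents to words and on to unlabeled documents.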


2021 ◽  
Vol 422 ◽  
pp. 34-50
Author(s):  
Houzhang Fang ◽  
Mingjiang Xia ◽  
Hehui Liu ◽  
Yi Chang ◽  
Liming Wang ◽  
...  

Author(s):  
Gang Xue ◽  
Shifeng Liu ◽  
Yicao Ma

Image recognition supports several applications, for instance, facial recognition and image classification, and accurate fruit and vegetable classification is very important in the fresh supply chain, factories, supermarkets, and other fields. In this paper, we develop a hybrid deep learning-based fruit image classification framework, named attention-based densely connected convolutional networks with convolution autoencoder (CAE-ADN), which uses a convolution autoencoder to pre-train on the images and an attention-based DenseNet to extract image features. In the first part of the framework, an unsupervised method with a set of images is applied to pre-train the greedy layer-wise CAE. We use the CAE structure to initialize a set of weights and biases of the ADN. In the second part of the framework, the supervised ADN is trained with the ground truth. The final part of the framework predicts the category of fruits. We use two fruit datasets to test the model. Experimental results show the effectiveness of the framework, which can improve the efficiency of fruit sorting and thereby reduce costs in the fresh supply chain, factories, supermarkets, etc.


Author(s):  
Ximing Li ◽  
Yang Wang

Partial Multi-label Learning (PML) aims to induce a multi-label predictor from datasets with noisy supervision, where each training instance is associated with several candidate labels, only some of which are valid. To address the noise, existing PML methods recover the ground-truth labels by leveraging the ground-truth confidence of each candidate label, i.e., the likelihood of a candidate label being a ground-truth one. However, they neglect the information from non-candidate labels, which potentially contributes to the ground-truth label recovery. In this paper, we propose to recover the ground-truth labels, i.e., to estimate the ground-truth confidences, from the label enrichment, composed of the relevance degrees of candidate labels and the irrelevance degrees of non-candidate labels. Based on this observation, we further develop a novel two-stage PML method, namely Partial Multi-Label Learning with Label Enrichment-Recovery (PML3ER): the first stage estimates the label enrichment with unconstrained label propagation, and the second stage jointly learns the ground-truth confidence and the multi-label predictor given the label enrichment. Experimental results validate that PML3ER outperforms the state-of-the-art PML methods.
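
To give a feel for how label propagation can assign scores to non-candidate labels as well as candidate ones, here is a generic propagation sketch. It is not the PML3ER formulation: the abstract does not define its unconstrained variant, so the update rule, the row normalization, and the `alpha` and `steps` parameters below are assumptions chosen for illustration.

```python
def propagate_labels(similarity, candidates, alpha=0.5, steps=20):
    """Spread candidate-label evidence over a row-normalized similarity graph.

    similarity: n x n non-negative weights between instances.
    candidates: n x q matrix with 1 for candidate labels and 0 otherwise.
    Returns an n x q matrix of real-valued scores.
    """
    n = len(similarity)
    q = len(candidates[0])
    # Normalize each row so that neighbor weights sum to 1 (or stay 0 if isolated).
    norm = []
    for row in similarity:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])
    scores = [[float(v) for v in row] for row in candidates]
    for _ in range(steps):
        # Mix the neighbors' current scores with the instance's own candidate set.
        scores = [
            [
                alpha * sum(norm[i][j] * scores[j][k] for j in range(n))
                + (1 - alpha) * candidates[i][k]
                for k in range(q)
            ]
            for i in range(n)
        ]
    return scores
```

In this sketch a label that is not a candidate for an instance can still receive a positive score when similar instances carry it, which is the kind of information the abstract says earlier PML methods neglect.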


2020 ◽  
Vol 11 (6) ◽  
pp. 65-73
Author(s):  
Tingwei Li ◽  
Ruiwen Zhang ◽  
Qing Li

Graph convolutional networks (GCNs) have proven effective for processing structured data, since they can capture the features of related nodes and improve model performance. Increasing attention is being paid to employing GCNs in skeleton-based action recognition, but existing GCN-based methods face some challenges. First, the consistency of temporal and spatial features is ignored because features are extracted node by node and frame by frame. We design a generic representation of skeleton sequences for action recognition and propose a novel model called Temporal Graph Networks (TGN), which can obtain spatiotemporal features simultaneously. Second, the adjacency matrix of the graph describing the relations between joints mostly depends on the physical connections between joints. We propose a multi-scale graph strategy to appropriately describe the relations between joints in the skeleton graph, which adopts a full-scale graph, a part-scale graph and a core-scale graph to capture the local features of each joint and the contour features of important joints. Extensive experiments are conducted on two large datasets, NTU RGB+D and Kinetics Skeleton. The experimental results show that TGN with our graph strategy outperforms other state-of-the-art methods.


Author(s):  
Tanli Zuo ◽  
Yukun Qiu ◽  
Wei-Shi Zheng

Graph convolutional networks (GCNs) have been widely used to process graph-structured data. However, existing GNN methods do not explicitly extract critical structures, which reflect the intrinsic properties of a graph. In this work, we propose a novel GCN module named Neighbor Combinatorial ATtention (NCAT) to find critical structures in graph-structured data. NCAT attempts to match combinatorial neighbors with learnable patterns and assigns different weights to each combination based on the matching degree between the patterns and combinations. By stacking several NCAT modules, we can extract hierarchical structures that are helpful for downstream tasks. Our experimental results show that NCAT achieves state-of-the-art performance on several benchmark graph classification datasets. In addition, we interpret what kind of features our model has learned by visualizing the extracted critical structures.


Author(s):  
Gong Cheng ◽  
Decheng Gao ◽  
Yang Liu ◽  
Junwei Han

Convolutional neural networks (CNNs) have shown promise for image classification tasks. However, global CNN features still lack the geometric invariance needed to address intra-class variations and so are not optimal for multi-label image classification. This paper proposes a new and effective framework built upon CNNs to learn Multi-scale and Discriminative Part Detectors (MsDPD)-based feature representations for multi-label image classification. Specifically, at each scale level, we (i) first present an entropy-rank based scheme to generate and select a set of discriminative part detectors (DPD), and then (ii) obtain a number of DPD-based convolutional feature maps, with each feature map representing the occurrence probability of a particular part detector, and learn DPD-based features by using a task-driven pooling scheme. The two steps are formulated into a unified framework by developing a new objective function, which jointly trains part detectors incrementally and integrates the learning of feature representations into the classification task. Finally, the multi-scale features are fused to produce the predictions. Experimental results on the PASCAL VOC 2007 and VOC 2012 datasets demonstrate that the proposed method achieves better accuracy than existing state-of-the-art multi-label classification methods.

