AAR-CNNs: Auto Adaptive Regularized Convolutional Neural Networks

In order to address the overfitting problem caused by the small or simple training datasets and the large model’s size in Convolutional Neural Networks (CNNs), a novel Auto Adaptive Regularization (AAR) method is proposed in this paper. The relevant networks can be called AAR-CNNs. AAR is the first method using the “abstraction extent” (predicted by AE net) and a tiny learnable module (SE net) to auto adaptively predict more accurate and individualized regularization information. The AAR module can be directly inserted into every stage of any popular networks and trained end to end to improve the networks’ flexibility. This method can not only regularize the network at both the forward and the backward processes in the training phase, but also regularize the network on a more refined level (channel or pixel level) depending on the abstraction extent’s form. Comparative experiments are performed on low resolution ImageNet, CIFAR and SVHN datasets. Experimental results show that the AAR-CNNs can achieve state-of-the-art performances on these datasets.

Download Full-text

Sanitizing hidden activations for improving adversarial robustness of convolutional neural networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210371 ◽

2021 ◽

pp. 1-11

Author(s):

Tianshi Mu ◽

Kequan Lin ◽

Huabing Zhang ◽

Jian Wang

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Black Box ◽

Experimental Results ◽

Amplification Effect ◽

Wide Range ◽

Adversarial Examples

Deep learning is gaining significant traction in a wide range of areas. Whereas, recent studies have demonstrated that deep learning exhibits the fatal weakness on adversarial examples. Due to the black-box nature and un-transparency problem of deep learning, it is difficult to explain the reason for the existence of adversarial examples and also hard to defend against them. This study focuses on improving the adversarial robustness of convolutional neural networks. We first explore how adversarial examples behave inside the network through visualization. We find that adversarial examples produce perturbations in hidden activations, which forms an amplification effect to fool the network. Motivated by this observation, we propose an approach, termed as sanitizing hidden activations, to help the network correctly recognize adversarial examples by eliminating or reducing the perturbations in hidden activations. To demonstrate the effectiveness of our approach, we conduct experiments on three widely used datasets: MNIST, CIFAR-10 and ImageNet, and also compare with state-of-the-art defense techniques. The experimental results show that our sanitizing approach is more generalized to defend against different kinds of attacks and can effectively improve the adversarial robustness of convolutional neural networks.

Download Full-text

Effective Plug-Ins for Reducing Inference-Latency of Spiking Convolutional Neural Networks During Inference Phase

Frontiers in Computational Neuroscience ◽

10.3389/fncom.2021.697469 ◽

2021 ◽

Vol 15 ◽

Author(s):

Xuan Chen ◽

Xiaopeng Yuan ◽

Gaoming Fu ◽

Yuanyong Luo ◽

Tao Yue ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Network Architecture ◽

State Of The Art ◽

Low Cost ◽

Training Phase ◽

Working Mechanism ◽

Simulation Results ◽

New Perspective ◽

Fully Connected

Convolutional Neural Networks (CNNs) are effective and mature in the field of classification, while Spiking Neural Networks (SNNs) are energy-saving for their sparsity of data flow and event-driven working mechanism. Previous work demonstrated that CNNs can be converted into equivalent Spiking Convolutional Neural Networks (SCNNs) without obvious accuracy loss, including different functional layers such as Convolutional (Conv), Fully Connected (FC), Avg-pooling, Max-pooling, and Batch-Normalization (BN) layers. To reduce inference-latency, existing researches mainly concentrated on the normalization of weights to increase the firing rate of neurons. There are also some approaches during training phase or altering the network architecture. However, little attention has been paid on the end of inference phase. From this new perspective, this paper presents 4 stopping criterions as low-cost plug-ins to reduce the inference-latency of SCNNs. The proposed methods are validated using MATLAB and PyTorch platforms with Spiking-AlexNet for CIFAR-10 dataset and Spiking-LeNet-5 for MNIST dataset. Simulation results reveal that, compared to the state-of-the-art methods, the proposed method can shorten the average inference-latency of Spiking-AlexNet from 892 to 267 time steps (almost 3.34 times faster) with the accuracy decline from 87.95 to 87.72%. With our methods, 4 types of Spiking-LeNet-5 only need 24–70 time steps per image with the accuracy decline not more than 0.1%, while models without our methods require 52–138 time steps, almost 1.92 to 3.21 times slower than us.

Download Full-text

End-to-end, sequence-to-sequence probabilistic visual odometry through deep neural networks

The International Journal of Robotics Research ◽

10.1177/0278364917734298 ◽

2017 ◽

Vol 37 (4-5) ◽

pp. 513-542 ◽

Cited By ~ 41

Author(s):

Sen Wang ◽

Ronald Clark ◽

Hongkai Wen ◽

Niki Trigoni

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Visual Odometry ◽

Feature Representation ◽

Geometric Information ◽

Geometric Problem ◽

End To End

This paper studies visual odometry (VO) from the perspective of deep learning. After tremendous efforts in the robotics and computer vision communities over the past few decades, state-of-the-art VO algorithms have demonstrated incredible performance. However, since the VO problem is typically formulated as a pure geometric problem, one of the key features still missing from current VO systems is the capability to automatically gain knowledge and improve performance through learning. In this paper, we investigate whether deep neural networks can be effective and beneficial to the VO problem. An end-to-end, sequence-to-sequence probabilistic visual odometry (ESP-VO) framework is proposed for the monocular VO based on deep recurrent convolutional neural networks. It is trained and deployed in an end-to-end manner, that is, directly inferring poses and uncertainties from a sequence of raw images (video) without adopting any modules from the conventional VO pipeline. It can not only automatically learn effective feature representation encapsulating geometric information through convolutional neural networks, but also implicitly model sequential dynamics and relation for VO using deep recurrent neural networks. Uncertainty is also derived along with the VO estimation without introducing much extra computation. Extensive experiments on several datasets representing driving, flying and walking scenarios show competitive performance of the proposed ESP-VO to the state-of-the-art methods, demonstrating a promising potential of the deep learning technique for VO and verifying that it can be a viable complement to current VO systems.

Download Full-text

Graph Convolutional Networks for Text Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33017370 ◽

2019 ◽

Vol 33 ◽

pp. 7370-7377 ◽

Cited By ~ 90

Author(s):

Liang Yao ◽

Chengsheng Mao ◽

Yuan Luo

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Language Processing ◽

Text Classification ◽

State Of The Art ◽

Classical Problem ◽

Experimental Results ◽

Training Data ◽

Convolutional Networks ◽

Single Text

Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representation for word and document, it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.

Download Full-text

Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks

Electronics ◽

10.3390/electronics10070851 ◽

2021 ◽

Vol 10 (7) ◽

pp. 851

Author(s):

Michael Taenzer ◽

Stylianos I. Mimilakis ◽

Jakob Abeßer

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Piano Music ◽

Experimental Results ◽

Pitch Estimation ◽

Feature Representations ◽

Series Of Experiments ◽

Music Recordings ◽

Recent Extension

In this work, we propose considering the information from a polyphony for multi-pitch estimation (MPE) in piano music recordings. To that aim, we propose a method for local polyphony estimation (LPE), which is based on convolutional neural networks (CNNs) trained in a supervised fashion to explicitly predict the degree of polyphony. We investigate two feature representations as inputs to our method, in particular, the Constant-Q Transform (CQT) and its recent extension Folded-CQT (F-CQT). To evaluate the performance of our method, we conduct a series of experiments on real and synthetic piano recordings based on the MIDI Aligned Piano Sounds (MAPS) and the Saarland Music Data (SMD) datasets. We compare our approaches with a state-of-the art piano transcription method by informing said method with the LPE knowledge in a postprocessing stage. The experimental results suggest that using explicit LPE information can refine MPE predictions. Furthermore, it is shown that, on average, the CQT representation is preferred over F-CQT for LPE.

Download Full-text

Common Kernels and Convolutions in Binary- and Ternary-Weight Neural Networks

Journal of Circuits System and Computers ◽

10.1142/s0218126621501589 ◽

2020 ◽

pp. 2150158

Author(s):

Byungmin Ahn ◽

Taewhan Kim

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Optimization Algorithm ◽

State Of The Art ◽

The State ◽

Memory Access ◽

Experimental Results ◽

Hardware Platform ◽

Access Latency ◽

The Common

A new algorithm for extracting common kernels and convolutions to maximally eliminate the redundant operations among the convolutions in binary- and ternary-weight convolutional neural networks is presented. Precisely, we propose (1) a new algorithm of common kernel extraction to overcome the local and limited exploration of common kernel candidates by the existing method, and subsequently apply (2) a new concept of common convolution extraction to maximally eliminate the redundancy in the convolution operations. In addition, our algorithm is able to (3) tune in minimizing the number of resulting kernels for convolutions, thereby saving the total memory access latency for kernels. Experimental results on ternary-weight VGG-16 demonstrate that our convolution optimization algorithm is very effective, reducing the total number of operations for all convolutions by [Formula: see text], thereby reducing the total number of execution cycles on hardware platform by 22.4% while using [Formula: see text] fewer kernels over that of the convolution utilizing the common kernels extracted by the state-of-the-art algorithm.

Download Full-text

Generalize Sentence Representation with Self-Inference

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6481 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9394-9401

Author(s):

Kai-Chou Yang ◽

Hung-Yu Kao

Keyword(s):

Neural Network ◽

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Contextual Information ◽

Interaction Space ◽

Black Box ◽

Experimental Results ◽

Inference Process ◽

Model Sets

In this paper, we propose Self Inference Neural Network (SINN), a simple yet efficient sentence encoder which leverages knowledge from recurrent and convolutional neural networks. SINN gathers semantic evidence in an interaction space which is subsequently fused by a shared vector gate to determine the most relevant mixture of contextual information. We evaluate the proposed method on four benchmarks among three NLP tasks. Experimental results demonstrate that our model sets a new state-of-the-art on MultiNLI, Scitail and is competitive on the remaining two datasets over all sentence encoding methods. The encoding and inference process in our model is highly interpretable. Through visualizations of the fusion component, we open the black box of our network and explore the applicability of the base encoding methods case by case.

Download Full-text

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/406 ◽

2017 ◽

Cited By ~ 69

Author(s):

Jin Wang ◽

Zhongyuan Wang ◽

Dawei Zhang ◽

Jun Yan

Keyword(s):

Neural Networks ◽

Knowledge Base ◽

Convolutional Neural Networks ◽

Text Classification ◽

State Of The Art ◽

Experimental Results ◽

Text Representation ◽

Deep Convolutional Neural Networks ◽

Short Text ◽

Fine Grained

Text classification is a fundamental task in NLP applications. Most existing work relied on either explicit or implicit text representation to address this problem. While these techniques work well for sentences, they can not easily be applied to short text because of its shortness and sparsity. In this paper, we propose a framework based on convolutional neural networks that combines explicit and implicit representations of short text for classification. We first conceptualize a short text as a set of relevant concepts using a large taxonomy knowledge base. We then obtain the embedding of short text by coalescing the words and relevant concepts on top of pre-trained word vectors. We further incorporate character level features into our model to capture fine-grained subword information. Experimental results on five commonly used datasets show that our proposed method significantly outperforms state-of-the-art methods.

Download Full-text

Using spatial-temporal ensembles of convolutional neural networks for lumen segmentation in ureteroscopy

International Journal of Computer Assisted Radiology and Surgery ◽

10.1007/s11548-021-02376-3 ◽

2021 ◽

Author(s):

Jorge F. Lazo ◽

Aldo Marzullo ◽

Sara Moccia ◽

Michele Catellani ◽

Benoit Rosa ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Automatic Segmentation ◽

Temporal Information ◽

Invasive Technique ◽

Dice Similarity Coefficient ◽

Specular Reflections ◽

Lumen Segmentation ◽

Previous State

Abstract Purpose Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, the automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. In order to obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods The proposed method is based on an ensemble of 4 parallel CNNs to simultaneously process single and multi-frame information. Of these, two architectures are taken as core-models, namely U-Net based in residual blocks ($$m_1$$ m 1 ) and Mask-RCNN ($$m_2$$ m 2 ), which are fed with single still-frames I(t). The other two models ($$M_1$$ M 1 , $$M_2$$ M 2 ) are modifications of the former ones consisting on the addition of a stage which makes use of 3D convolutions to process temporal information. $$M_1$$ M 1 , $$M_2$$ M 2 are fed with triplets of frames ($$I(t-1)$$ I ( t - 1 ) , I(t), $$I(t+1)$$ I ( t + 1 ) ) to produce the segmentation for I(t). Results The proposed method was evaluated using a custom dataset of 11 videos (2673 frames) which were collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is effective also in the presence of poor visibility, occasional bleeding, or specular reflections.

Download Full-text

Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9377745 ◽

2020 ◽

Author(s):

Alexander Egiazarov ◽

Fabio Massimo Zennaro ◽

Vasileios Mavroeidis

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Semantic Segmentation ◽

End To End

Download Full-text