HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015297 ◽

2019 ◽

Vol 33 ◽

pp. 5297-5304 ◽

Cited By ~ 4

Author(s):

Shipeng Wang ◽

Jian Sun ◽

Zongben Xu

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Black Box ◽

Research Topic ◽

Decay Rates ◽

Adaptive Combination ◽

Network Training ◽

Stochastic Optimization Algorithms

Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, the approach of learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience embodied in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of “learning to optimize” with the traditional Adam optimizer. Given a network for training, its parameter update in each iteration generated by HyperAdam is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is justified to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs and LSTMs.
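The idea of combining several Adam updates with different decay rates can be illustrated with a minimal sketch for a single scalar parameter. This is not the authors' implementation: in HyperAdam the combination weights and decay rates are produced by a recurrent network, whereas here they are fixed, hand-picked constants. The function name `combined_adam_step`, the use of one decay rate per candidate for both moments, and all numeric values are assumptions made for illustration.

```python
import math


def combined_adam_step(param, grad, states, betas, weights, t, lr=0.01, eps=1e-8):
    """Apply one update that mixes several Adam-style candidate directions.

    states  : list of (m, v) moment pairs, one per candidate decay rate
    betas   : candidate decay rates (fixed here; learned in HyperAdam)
    weights : combination weights (fixed here; learned in HyperAdam)
    t       : 1-based iteration counter used for bias correction
    """
    update = 0.0
    new_states = []
    for (m, v), beta, w in zip(states, betas, weights):
        # Exponential moving averages of the gradient and squared gradient.
        m = beta * m + (1.0 - beta) * grad
        v = beta * v + (1.0 - beta) * grad * grad
        # Bias correction, as in standard Adam.
        m_hat = m / (1.0 - beta ** t)
        v_hat = v / (1.0 - beta ** t)
        # Weighted contribution of this candidate direction.
        update += w * m_hat / (math.sqrt(v_hat) + eps)
        new_states.append((m, v))
    return param - lr * update, new_states


# Usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
betas = [0.5, 0.9, 0.99]      # hypothetical decay rates
weights = [0.2, 0.3, 0.5]     # hypothetical weights summing to 1
x = 1.0
states = [(0.0, 0.0)] * len(betas)
for step in range(1, 51):
    x, states = combined_adam_step(x, 2.0 * x, states, betas, weights, step)
```

With weights that sum to one, each step moves the parameter by at most roughly `lr`, so the toy run above approaches the minimum gradually.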

Interpolation Consistency Training for Semi-supervised Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/504 ◽

2019 ◽

Cited By ~ 39

Author(s):

Vikas Verma ◽

Alex Lamb ◽

Juho Kannala ◽

Yoshua Bengio ◽

David Lopez-Paz

Keyword(s):

Neural Network ◽

Neural Networks ◽

Supervised Learning ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Distribution ◽

Network Architectures ◽

Low Density ◽

Decision Boundary ◽

Classification Problems

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
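The consistency term described in this abstract can be sketched as a mean squared difference between the prediction at an interpolated input and the interpolation of the two individual predictions. The sketch below is an assumption-laden illustration, not the paper's code: it uses a single `predict` function for both sides, a fixed mixing coefficient `lam`, and plain Python lists, and the names `interpolate` and `ict_consistency_loss` are hypothetical.

```python
def interpolate(a, b, lam):
    """Element-wise convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def ict_consistency_loss(predict, u1, u2, lam):
    """Mean squared gap between predict(mix(u1, u2)) and mix(predict(u1), predict(u2))."""
    mixed_input = interpolate(u1, u2, lam)
    target = interpolate(predict(u1), predict(u2), lam)
    pred = predict(mixed_input)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# A linear model is already consistent under interpolation, so its loss is zero.
def linear_model(x):
    return [2.0 * x[0] + 3.0 * x[1], x[0] - x[1]]


# A nonlinear model is penalized where it bends between the two points.
def square_model(x):
    return [x[0] ** 2]


loss_linear = ict_consistency_loss(linear_model, [1.0, 2.0], [3.0, 0.0], 0.25)
loss_square = ict_consistency_loss(square_model, [0.0], [2.0], 0.5)
```

In training, a term of this kind would be added to the supervised loss on labeled data; how the two are weighted is not stated in the abstract.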

Analysis of Explainers of Black Box Deep Neural Networks for Computer Vision: A Survey

Machine Learning and Knowledge Extraction ◽

10.3390/make3040048 ◽

2021 ◽

Vol 3 (4) ◽

pp. 966-989

Author(s):

Vanessa Buhrmester ◽

David Münch ◽

Michael Arens

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Neural Networks ◽

State Of The Art ◽

Black Box ◽

Complex Data ◽

Comprehensive Overview ◽

Nonlinear Structure ◽

Black Boxes ◽

Insight Into

Deep Learning is a state-of-the-art technique to make inference on extensive or complex data. As black box models, owing to their multilayer nonlinear structure, Deep Neural Networks are often criticized as being non-transparent, with predictions that are not traceable by humans. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. By basing decision-making algorithms on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to a lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of machine learning black boxes by analyzing the connection between the input and output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks for Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.

Modular Dynamic Neural Network: A Continual Learning Architecture

Applied Sciences ◽

10.3390/app112412078 ◽

2021 ◽

Vol 11 (24) ◽

pp. 12078

Author(s):

Daniel Turner ◽

Pedro J. S. Cardoso ◽

João M. F. Rodrigues

Keyword(s):

Neural Network ◽

Neural Networks ◽

Feature Extraction ◽

Deep Neural Networks ◽

State Of The Art ◽

Simple Task ◽

Dynamic Neural Network ◽

Main Components ◽

Over Time ◽

Continual Learning

Learning to recognize a new object after having learned to recognize other objects may be a simple task for a human, but not for machines. The present go-to approaches for teaching a machine to recognize a set of objects are based on the use of deep neural networks (DNN). So, intuitively, the solution for teaching new objects on the fly to a machine should be a DNN. The problem is that the trained DNN weights used to classify the initial set of objects are extremely fragile, meaning that any change to those weights can severely damage the capacity to perform the initial recognitions; this phenomenon is known as catastrophic forgetting (CF). This paper presents a new DNN continual learning (CL) architecture that can deal with CF, the modular dynamic neural network (MDNN). The presented architecture consists of two main components: (a) the ResNet50-based feature extraction component as the backbone; and (b) the modular dynamic classification component, which consists of multiple sub-networks and progressively builds itself up in a tree-like structure that rearranges itself as it learns over time in such a way that each sub-network can function independently. The main contribution of the paper is a new architecture that is strongly based on its modular dynamic training feature. This modular structure allows for new classes to be added while only altering specific sub-networks in such a way that previously known classes are not forgotten. Tests on the CORe50 dataset showed results above the state of the art for CL architectures.

HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6035 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5784-5791

Author(s):

Sungho Shin ◽

Jinhwan Park ◽

Yoonho Boo ◽

Wonyong Sung

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

Fine Tuning ◽

Quantization Noise ◽

Quantization Scheme ◽

Performance Improvements ◽

Training Scheme ◽

Network Training ◽

Training Technique

Quantization of deep neural networks is essential for efficient implementations. Low-precision networks are typically designed to represent original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized neural networks to reach flat minima in the loss surface with the aid of quantization noise. The proposed training scheme employs high-low-high-low precision in an alternating manner for network training. The learning rate is also abruptly changed at each stage for coarse- or fine-tuning. With the proposed training technique, we show quite good performance improvements for convolutional neural networks when compared to the previous fine-tuning based quantization scheme. We achieve state-of-the-art results for recurrent neural network based language modeling with 2-bit weights and activations.

PIPxResNet: Penalty Induced Prototype-Based eXplainable Residual Neural Network for Heartbeat Classification

10.21203/rs.3.rs-852812/v1 ◽

2021 ◽

Author(s):

Deepankar Nankani ◽

Rashmi Dutta Baruah

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Early Stage ◽

Black Box ◽

Training Dataset ◽

Medical Attention ◽

Generative Adversarial Network ◽

Heartbeat Classification ◽

General Physicians

Early stage heartbeat classification using electrocardiogram signals can prevent cardiovascular diseases that cause millions of deaths annually around the world. In the past, researchers have used deep neural networks to achieve significant performance for heartbeat classification, but their black-box nature and unclear prediction rationale limit real-world deployment. We propose a Penalty Induced Prototype based eXplainable Residual Neural Network (PIPxResNet) that addresses the black-box nature of deep neural networks. PIPxResNet encodes the temporal variations of heartbeats by employing a pretrained residual neural network following the concept of task transfer learning. The algorithm further extracts prototypes that are most representative of the training dataset and that explain model predictions to general physicians, making them clinically relevant. The prototypes of a particular class that closely resemble other class prototypes are penalised and their contribution towards the corresponding class is reduced. In addition, the classification performance is improved by synthesising regular and irregular heartbeats using a deep convolution conditional generative adversarial network. The proposed method can easily be adapted to other domains that require explanations for classification tasks. PIPxResNet performs on par with existing state-of-the-art algorithms without compromising individual class performance when tested on four publicly available annotated datasets. The proposed model is capable of performing automated screening and providing medical attention by simulating a clinical decision support system for general physicians.

A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription

Electronics ◽

10.3390/electronics10070810 ◽

2021 ◽

Vol 10 (7) ◽

pp. 810

Author(s):

Carlos Hernandez-Olivan ◽

Ignacio Zay Pinilla ◽

Carlos Hernandez-Lopez ◽

Jose R. Beltran

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

State Of The Art ◽

High Impact ◽

Critical Problem ◽

Music Transcription ◽

Automatic Music Transcription ◽

Music Information ◽

Method Show

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments can be an issue that has not yet been studied in depth. The goal of this work is to address AMT by analyzing how timbre affects monophonic transcription in a first approach based on the CREPE neural network, and then to improve the results by performing polyphonic music transcription with different timbres in a second approach based on the Deep Salience model, which performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model for non-piano instruments outperforms the state-of-the-art model; for bass instruments, for example, it achieves an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can outperform the results given in this work.

ThriftyNets: Convolutional Neural Networks with Tiny Parameter Budget

IoT ◽

10.3390/iot2020012 ◽

2021 ◽

Vol 2 (2) ◽

pp. 222-235

Author(s):

Guillaume Coiffier ◽

Ghouthi Boukli Hacene ◽

Vincent Gripon

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Convolutional Neural Network ◽

Spatial Resolution ◽

Network Architecture ◽

Deep Neural Networks ◽

State Of The Art ◽

Feature Maps ◽

Neural Network Architecture

Deep Neural Networks are state-of-the-art in a large number of challenges in machine learning. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas the spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network at its maximum, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsamplings and shortcuts ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with less than 40 k parameters in total, 74.3% on CIFAR-100 with less than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.

Towards a high robust neural network via feature matching

International Journal of Multimedia Information Retrieval ◽

10.1007/s13735-021-00219-0 ◽

2021 ◽

Author(s):

Jian Li ◽

Yanming Guo ◽

Songyang Lao ◽

Yulun Wu ◽

Liang Bai ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Networks ◽

Feature Matching ◽

Feature Vector ◽

State Of The Art ◽

Model Performance ◽

Image Features ◽

Classification Systems ◽

Adversarial Attack

Image classification systems have been found vulnerable to adversarial attacks, which are imperceptible to humans but can easily fool deep neural networks. Recent research indicates that regularizing the network by introducing randomness could greatly improve the model’s robustness against adversarial attack, but the randomness module would normally involve complex calculations and numerous additional parameters and seriously affect the model performance on clean data. In this paper, we propose a feature matching module to regularize the network. Specifically, our model learns a feature vector for each category and imposes additional restrictions on image features. Then, the similarity between image features and category features is used as the basis for classification. Our method does not introduce any additional network parameters compared with the undefended model and can be easily integrated into any neural network. Experiments on the CIFAR10 and SVHN datasets highlight that our proposed module can effectively improve both clean data and perturbed data accuracy in comparison with the state-of-the-art defense methods and outperform the L2P method by 6.3% and 24% on clean and perturbed data, respectively, using the ResNet-V2(18) architecture.
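Classifying by the similarity between an image feature and one feature vector per category can be sketched as a nearest-prototype lookup. The abstract does not say which similarity measure is used or how the category vectors are trained, so the cosine similarity, the hand-written category vectors, and the function names below are all assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_by_feature_matching(image_feature, category_features):
    """Return (predicted class index, per-class similarity scores)."""
    scores = [cosine_similarity(image_feature, c) for c in category_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical category vectors; in the paper these are learned.
category_features = [[1.0, 0.0], [0.0, 1.0]]
label_a, _ = classify_by_feature_matching([0.9, 0.1], category_features)
label_b, _ = classify_by_feature_matching([0.2, 0.8], category_features)
```

Because the decision depends only on a comparison against stored vectors, a module of this shape adds no weights beyond the category vectors themselves in this sketch.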

Tri-net for Semi-Supervised Deep Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/278 ◽

2018 ◽

Cited By ~ 11

Author(s):

Dong-Dong Chen ◽

Wei Wang ◽

Wei Gao ◽

Zhi-Hua Zhou

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Error Rate ◽

Deep Neural Network ◽

Deep Neural Networks ◽

State Of The Art ◽

Fine Tuning ◽

Learning Methods ◽

Model Initialization

Deep neural networks have witnessed great successes in various real applications, but they require a large amount of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves an 8.30% error rate on CIFAR-10 by using only 4000 labeled examples.

S-DFP: shifted dynamic fixed point for quantized deep neural network training

Neural Computing and Applications ◽

10.1007/s00521-021-06821-x ◽

2021 ◽

Author(s):

Yasufumi Sakai ◽

Yutaka Tamiya

Keyword(s):

Neural Network ◽

Neural Networks ◽

Fixed Point ◽

Deep Neural Networks ◽

Data Representation ◽

Training Methods ◽

Neural Network Training ◽

Training Time ◽

Network Training ◽

Complex Models

Recent advances in deep neural networks have achieved higher accuracy with more complex models. Nevertheless, they require much longer training time. To reduce the training time, training methods using quantized weights, activations, and gradients have been proposed. Neural network calculation in integer format improves the energy efficiency of hardware for deep learning models. Therefore, training methods for deep neural networks with fixed point format have been proposed. However, the narrow data representation range of the fixed point format degrades neural network accuracy. In this work, we propose a new fixed point format named shifted dynamic fixed point (S-DFP) to prevent accuracy degradation in quantized neural network training. S-DFP can change the data representation range of the dynamic fixed point format by adding a bias to the exponent. We evaluated the effectiveness of S-DFP for quantized neural network training on the ImageNet task using ResNet-34, ResNet-50, ResNet-101 and ResNet-152. For example, the accuracy of quantized ResNet-152 is improved from 76.6% with conventional 8-bit DFP to 77.6% with 8-bit S-DFP.
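The effect of adding a bias to a shared exponent can be illustrated with a small quantization sketch. It assumes a simple dynamic fixed point scheme in which a group of values shares one power-of-two scale chosen from the largest magnitude; the abstract does not specify how S-DFP selects the exponent or the bias, so the functions `dynamic_exponent` and `quantize_shifted_dfp` and the bias value are hypothetical.

```python
import math


def quantize_fixed_point(values, exponent, bits=8):
    """Round each value to a signed `bits`-bit integer times 2**exponent, with saturation."""
    scale = 2.0 ** exponent
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(q_min, min(q_max, round(x / scale))) * scale for x in values]


def dynamic_exponent(values, bits=8):
    """Smallest shared exponent at which the largest magnitude does not saturate.

    Assumes at least one value is non-zero.
    """
    max_abs = max(abs(x) for x in values)
    q_max = 2 ** (bits - 1) - 1
    return math.ceil(math.log2(max_abs / q_max))


def quantize_shifted_dfp(values, bias, bits=8):
    """Dynamic fixed point with an extra bias added to the shared exponent."""
    return quantize_fixed_point(values, dynamic_exponent(values, bits) + bias, bits)


values = [0.01, 0.02, 1.0]
plain = quantize_shifted_dfp(values, bias=0)     # conventional dynamic range
shifted = quantize_shifted_dfp(values, bias=-2)  # finer steps, large values saturate
```

In this toy case a negative bias represents the small values more precisely while the largest value saturates, which shows the range trade-off that a shifted exponent controls.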