HyperAdam: A Learnable Task-Adaptive Adam for Network Training

Author(s):  
Shipeng Wang ◽  
Jian Sun ◽  
Zongben Xu

Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms, such as SGD and Adam. Recently, learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully utilize the experience embedded in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of “learning to optimize” with the traditional Adam optimizer. Given a network for training, the parameter update generated by HyperAdam in each iteration is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are adaptively learned depending on the task. HyperAdam is modeled as a recurrent neural network with AdamCell, WeightCell and StateCell. It is shown to achieve state-of-the-art results for training various networks, such as multilayer perceptrons, CNNs and LSTMs.
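The idea of combining several Adam updates computed with different decay rates can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate decay rates and the combination weights below are hypothetical fixed values, whereas the abstract states that HyperAdam learns both with a recurrent network. The function names are likewise illustrative.

```python
import math

def adam_direction(m, v, grad, beta1, beta2, t, eps=1e-8):
    """One bias-corrected Adam step for a scalar parameter.

    Returns the update direction and the new moment estimates (m, v).
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return m_hat / (math.sqrt(v_hat) + eps), m, v

def combined_update(grad, states, decay_rates, weights, t, lr=0.1):
    """Weighted combination of Adam directions with different decay rates."""
    update = 0.0
    for i, (beta1, beta2) in enumerate(decay_rates):
        m, v = states[i]
        direction, m, v = adam_direction(m, v, grad, beta1, beta2, t)
        states[i] = (m, v)          # each candidate keeps its own moments
        update += weights[i] * direction
    return -lr * update

# Hypothetical candidates and weights (assumptions, fixed for illustration;
# HyperAdam is described as learning these per task).
decay_rates = [(0.9, 0.999), (0.5, 0.9), (0.0, 0.5)]
weights = [0.5, 0.3, 0.2]

# Toy usage: a few steps on f(x) = x^2, whose gradient is 2x.
states = [(0.0, 0.0) for _ in decay_rates]
x = 1.0
for t in range(1, 51):
    x += combined_update(2 * x, states, decay_rates, weights, t)
```

With weights that sum to one, the first step moves the parameter by roughly the learning rate, because every bias-corrected Adam direction has magnitude close to one at t = 1.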

Author(s):  
Vikas Verma ◽  
Alex Lamb ◽  
Juho Kannala ◽  
Yoshua Bengio ◽  
David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
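The consistency term described above can be sketched as follows. This is a simplified illustration under stated assumptions: a single toy model is used on both sides of the comparison, the mean squared error is an assumed choice of distance, and the two toy models and the mixing coefficient are hypothetical. Other components of the full training procedure are not covered by the abstract and are omitted here.

```python
def mix(a, b, lam):
    """Interpolate two equal-length vectors: lam * a + (1 - lam) * b."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]

def consistency_loss(model, u1, u2, lam):
    """Mean squared gap between f(mix(u1, u2)) and mix(f(u1), f(u2))."""
    pred_at_mix = model(mix(u1, u2, lam))
    mix_of_preds = mix(model(u1), model(u2), lam)
    gaps = [(p - q) ** 2 for p, q in zip(pred_at_mix, mix_of_preds)]
    return sum(gaps) / len(gaps)

def linear_model(x):
    # A linear map commutes with interpolation, so its loss is zero.
    return [2.0 * x[0] - x[1], 0.5 * x[1]]

def nonlinear_model(x):
    # A nonlinear map generally does not, so the loss is positive.
    return [x[0] ** 2, x[0] * x[1]]

u1, u2 = [1.0, 0.0], [0.0, 1.0]   # two hypothetical unlabeled points
loss_linear = consistency_loss(linear_model, u1, u2, lam=0.3)
loss_nonlinear = consistency_loss(nonlinear_model, u1, u2, lam=0.3)
```

Minimizing this term pushes the model toward behaving linearly between pairs of unlabeled points.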


2021 ◽  
Vol 3 (4) ◽  
pp. 966-989
Author(s):  
Vanessa Buhrmester ◽  
David Münch ◽  
Michael Arens

Deep Learning is a state-of-the-art technique for making inferences on extensive or complex data. As black-box models with a multilayer nonlinear structure, Deep Neural Networks are often criticized as being non-transparent, with predictions that humans cannot trace. Furthermore, the models learn from artificially generated datasets, which often do not reflect reality. When decision-making algorithms are based on Deep Neural Networks, prejudice and unfairness may be promoted unknowingly due to a lack of transparency. Hence, several so-called explanators, or explainers, have been developed. Explainers try to give insight into the inner structure of machine learning black boxes by analyzing the connection between the input and output. In this survey, we present the mechanisms and properties of explaining systems for Deep Neural Networks for Computer Vision tasks. We give a comprehensive overview of the taxonomy of related studies and compare several survey papers that deal with explainability in general. We work out the drawbacks and gaps and summarize further research ideas.


2021 ◽  
Vol 11 (24) ◽  
pp. 12078
Author(s):  
Daniel Turner ◽  
Pedro J. S. Cardoso ◽  
João M. F. Rodrigues

Learning to recognize a new object after having learned to recognize other objects may be a simple task for a human, but not for machines. The present go-to approaches for teaching a machine to recognize a set of objects are based on deep neural networks (DNN). So, intuitively, the solution for teaching new objects on the fly to a machine should be a DNN. The problem is that the trained DNN weights used to classify the initial set of objects are extremely fragile, meaning that any change to those weights can severely damage the capacity to perform the initial recognitions; this phenomenon is known as catastrophic forgetting (CF). This paper presents a new DNN continual learning (CL) architecture that can deal with CF, the modular dynamic neural network (MDNN). The presented architecture consists of two main components: (a) the ResNet50-based feature extraction component as the backbone; and (b) the modular dynamic classification component, which consists of multiple sub-networks and progressively builds itself up in a tree-like structure that rearranges itself as it learns over time in such a way that each sub-network can function independently. The main contribution of the paper is a new architecture that is strongly based on its modular dynamic training feature. This modular structure allows new classes to be added while only altering specific sub-networks, in such a way that previously known classes are not forgotten. Tests on the CORe50 dataset showed results above the state of the art for CL architectures.


2020 ◽  
Vol 34 (04) ◽  
pp. 5784-5791
Author(s):  
Sungho Shin ◽  
Jinhwan Park ◽  
Yoonho Boo ◽  
Wonyong Sung

Quantization of deep neural networks is essential for efficient implementations. Low-precision networks are typically designed to represent their original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized neural networks to reach flat minima in the loss surface with the aid of quantization noise. The proposed training scheme alternates between high and low precision (high-low-high-low) for network training. The learning rate is also abruptly changed at each stage for coarse- or fine-tuning. With the proposed training technique, we show good performance improvements for convolutional neural networks when compared to the previous fine-tuning based quantization scheme. We achieve state-of-the-art results for recurrent neural network based language modeling with 2-bit weights and activations.


2021 ◽  
Author(s):  
Deepankar Nankani ◽  
Rashmi Dutta Baruah

Early stage heartbeat classification using electrocardiogram signals can prevent cardiovascular diseases that cause millions of deaths annually around the world. In the past, researchers have used deep neural networks to achieve significant performance for heartbeat classification, but their black-box nature and the lack of a prediction rationale limit real-world deployment. We propose a Penalty Induced Prototype based eXplainable Residual Neural Network (PIPxResNet) that addresses the black-box nature of deep neural networks. PIPxResNet encodes the temporal variations of heartbeats by employing a pretrained residual neural network, following the concept of task transfer learning. The algorithm further extracts prototypes that are most representative of the training dataset and that explain model predictions to general physicians, making them clinically relevant. The prototypes of a particular class that closely resemble other class prototypes are penalised and their contribution towards the corresponding class is reduced. In addition, the classification performance is improved by synthesising regular and irregular heartbeats using a deep convolutional conditional generative adversarial network. The proposed method can easily be adapted to other domains that require explanations for classification tasks. PIPxResNet performs on par with existing state-of-the-art algorithms without compromising individual class performance when tested on four publicly available annotated datasets. The proposed model is capable of performing automated screening and providing medical attention by simulating a clinical decision support system for general physicians.


Electronics ◽  
2021 ◽  
Vol 10 (7) ◽  
pp. 810
Author(s):  
Carlos Hernandez-Olivan ◽  
Ignacio Zay Pinilla ◽  
Carlos Hernandez-Lopez ◽  
Jose R. Beltran

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is approached with deep neural networks, the variety of timbres of different instruments is an issue that has not yet been studied in depth. The goal of this work is to address AMT by first analyzing how timbre affects monophonic transcription, using an approach based on the CREPE neural network, and then improving the results by performing polyphonic music transcription with different timbres, using a second approach based on the Deep Salience model, which performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results, and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds, such as Google Magenta Onsets and Frames (OaF). Our polyphonic transcription model for non-piano instruments outperforms the state-of-the-art model; for bass instruments, for example, it reaches an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can further improve on the results given in this work.


IoT ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 222-235
Author(s):  
Guillaume Coiffier ◽  
Ghouthi Boukli Hacene ◽  
Vincent Gripon

Deep Neural Networks are state-of-the-art in a large number of machine learning challenges. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas the spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network to its maximum, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsamplings and shortcuts ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with fewer than 40 k parameters in total, 74.3% on CIFAR-100 with fewer than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.
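The parameter saving from reusing a single convolutional layer can be made concrete with a short count. The kernel size, channel width and depth below are hypothetical values chosen for illustration and are not the configuration reported in the paper; the helper names are also assumptions.

```python
def conv_params(kernel, channels_in, channels_out, bias=True):
    """Number of weights in one 2-D convolutional layer."""
    weights = kernel * kernel * channels_in * channels_out
    return weights + (channels_out if bias else 0)

def stacked_params(depth, kernel, channels):
    """`depth` distinct layers, each with its own weights."""
    return depth * conv_params(kernel, channels, channels)

def shared_params(depth, kernel, channels):
    """One layer applied `depth` times: the count does not grow with depth."""
    return conv_params(kernel, channels, channels)

# Hypothetical configuration: 3x3 kernels, 32 channels, 20 applications.
stacked = stacked_params(depth=20, kernel=3, channels=32)
shared = shared_params(depth=20, kernel=3, channels=32)
```

The shared layer is still evaluated once per application, so the amount of computation is unchanged by weight sharing; only the number of stored parameters drops, which matches the trade-off noted at the end of the abstract.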


Author(s):  
Jian Li ◽  
Yanming Guo ◽  
Songyang Lao ◽  
Yulun Wu ◽  
Liang Bai ◽  
...  

Image classification systems have been found vulnerable to adversarial attacks, which are imperceptible to humans but can easily fool deep neural networks. Recent research indicates that regularizing the network by introducing randomness can greatly improve the model’s robustness against adversarial attacks, but the randomness module normally involves complex calculations and numerous additional parameters and seriously affects the model’s performance on clean data. In this paper, we propose a feature matching module to regularize the network. Specifically, our model learns a feature vector for each category and imposes additional restrictions on image features. Then, the similarity between image features and category features is used as the basis for classification. Our method introduces no additional network parameters compared with the undefended model and can be easily integrated into any neural network. Experiments on the CIFAR10 and SVHN datasets show that our proposed module can effectively improve accuracy on both clean and perturbed data in comparison with state-of-the-art defense methods, and that it outperforms the L2P method by 6.3% and 24% on clean and perturbed data, respectively, using the ResNet-V2(18) architecture.
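Classification by similarity between an image feature and per-category feature vectors can be sketched as below. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the category vectors and the image feature are hypothetical values rather than learned ones.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(image_feature, category_features):
    """Return the label whose category vector is most similar to the feature."""
    return max(
        category_features,
        key=lambda label: cosine(image_feature, category_features[label]),
    )

# Hypothetical category vectors; in the described method these are learned.
category_features = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.0, 1.0, 0.2],
    "ship": [0.1, 0.0, 1.0],
}
predicted = classify([0.9, 0.3, 0.1], category_features)
```

Because the decision is an argmax over similarities, no extra classification layer is needed in this sketch.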


Author(s):  
Dong-Dong Chen ◽  
Wei Wang ◽  
Wei Gao ◽  
Zhi-Hua Zhou

Deep neural networks have witnessed great successes in various real applications, but they require a large amount of labeled data for training. In this paper, we propose tri-net, a deep neural network which is able to use massive unlabeled data to help learning with limited labeled data. We consider model initialization, diversity augmentation and pseudo-label editing simultaneously. In our work, we utilize output smearing to initialize modules, use fine-tuning on labeled data to augment diversity, and eliminate unstable pseudo-labels to alleviate the influence of suspicious pseudo-labeled data. Experiments show that our method achieves the best performance in comparison with state-of-the-art semi-supervised deep learning methods. In particular, it achieves an 8.30% error rate on CIFAR-10 using only 4000 labeled examples.


Author(s):  
Yasufumi Sakai ◽  
Yutaka Tamiya

Recent advances in deep neural networks have achieved higher accuracy with more complex models. Nevertheless, they require much longer training time. To reduce the training time, training methods using quantized weights, activations, and gradients have been proposed. Neural network computation in integer format improves the energy efficiency of hardware for deep learning models. Therefore, training methods for deep neural networks with fixed point format have been proposed. However, the narrow data representation range of the fixed point format degrades neural network accuracy. In this work, we propose a new fixed point format named shifted dynamic fixed point (S-DFP) to prevent accuracy degradation in quantized neural network training. S-DFP can change the data representation range of the dynamic fixed point format by adding a bias to the exponent. We evaluated the effectiveness of S-DFP for quantized neural network training on the ImageNet task using ResNet-34, ResNet-50, ResNet-101 and ResNet-152. For example, the accuracy of quantized ResNet-152 is improved from 76.6% with conventional 8-bit DFP to 77.6% with 8-bit S-DFP.
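The effect of adding a bias to the exponent of a fixed point format can be illustrated with a toy quantizer. This is only a sketch of the general idea: the abstract does not describe how S-DFP chooses the exponent or the bias, so the values below are hypothetical and the function names are assumptions.

```python
def quantize(value, exponent, bits=8):
    """Round `value` to a signed fixed point grid with step 2**exponent.

    The integer code saturates at the limits of a `bits`-wide signed integer.
    """
    step = 2.0 ** exponent
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    code = max(q_min, min(q_max, round(value / step)))
    return code * step

def quantize_shifted(value, exponent, bias, bits=8):
    """Same grid with a bias added to the exponent, moving the range."""
    return quantize(value, exponent + bias, bits)

small = 0.001
plain = quantize(small, exponent=-4)                      # step 1/16
shifted = quantize_shifted(small, exponent=-4, bias=-6)   # step 1/1024
```

With the coarser step the small value underflows to zero, while the shifted exponent represents it as one step of size 2**-10, at the cost of a smaller maximum representable magnitude.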

