Subclass Deep Neural Networks: Re-enabling Neglected Classes in Deep Network Training for Multimedia Classification

Author(s):  
Nikolaos Gkalelis ◽  
Vasileios Mezaris
2021 ◽  
Vol 13 (2) ◽  
pp. 36-40
Author(s):  
A. Smorodin

The article investigates a modification of stochastic gradient descent (SGD) based on the previously developed theory of stabilizing cycles in discrete dynamical systems. The relation between cycle stabilization in discrete dynamical systems and the search for extremum points allows new control methods to be applied to accelerate gradient descent as it approaches local minima. Gradient descent is widely used, alongside other iterative methods, for training deep neural networks. We conducted comparative experiments with two gradient-based optimizers, SGD and Adam, on the practical problem of tooth recognition in 2-D panoramic images. Network training showed that the new method outperforms plain SGD and, for the parameters chosen, approaches the capabilities of Adam, which is a “state of the art” method. This demonstrates the practical utility of applying control theory to the training of deep neural networks and the possibility of extending its applicability to the creation of new algorithms in this important field.
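
A minimal sketch, in PyTorch, of the kind of optimizer comparison described above; the model, data loader and hyperparameters are placeholders, and the control-based modification itself is not reproduced since the abstract does not specify its update rule.

```python
import torch
import torch.nn as nn

def train(model, loader, optimizer, loss_fn, epochs=1):
    # Generic training loop shared by both optimizers under comparison.
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()

# Illustrative baselines: plain SGD versus Adam on the same classification model.
# model = build_tooth_classifier()            # hypothetical constructor
# sgd  = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# adam = torch.optim.Adam(model.parameters(), lr=1e-3)
# train(model, train_loader, sgd, nn.CrossEntropyLoss())
```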


2020 ◽  
Vol 34 (04) ◽  
pp. 5784-5791
Author(s):  
Sungho Shin ◽  
Jinhwan Park ◽  
Yoonho Boo ◽  
Wonyong Sung

Quantization of deep neural networks is essential for efficient implementations. Low-precision networks are typically designed to represent their original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized neural networks that reaches flat minima in the loss surface with the aid of quantization noise. The proposed scheme alternates between high- and low-precision training (high-low-high-low), and the learning rate is changed abruptly at each stage for coarse or fine tuning. With this training technique, we show clear performance improvements for convolutional neural networks compared to the previous fine-tuning based quantization scheme, and we achieve state-of-the-art results for recurrent neural network based language modeling with 2-bit weights and activations.
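
A minimal sketch (assumed details, not the authors' implementation) of the high-low-high-low precision schedule with an abrupt learning-rate change at each stage; the uniform fake-quantizer, bit-width and schedule values are illustrative, and straight-through gradient bookkeeping is omitted.

```python
import torch
import torch.nn as nn

def quantize_weights_(model: nn.Module, bits: int) -> None:
    # Uniform symmetric fake quantization applied in place to all weights.
    qmax = 2 ** (bits - 1) - 1
    with torch.no_grad():
        for p in model.parameters():
            scale = p.abs().max().clamp(min=1e-8) / qmax
            p.copy_(torch.round(p / scale).clamp(-qmax - 1, qmax) * scale)

def train_stage(model, loader, loss_fn, lr, bits=None, steps=200):
    # One stage: optionally start from quantized weights, then train with the
    # stage's learning rate; the quantization noise perturbs the loss surface.
    if bits is not None:
        quantize_weights_(model, bits)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

# Alternating precision with abruptly changed learning rates (illustrative values):
# stages = [(None, 1e-1), (2, 1e-2), (None, 1e-2), (2, 1e-3)]
# for bits, lr in stages:
#     train_stage(model, train_loader, nn.CrossEntropyLoss(), lr, bits)
```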


Author(s):  
Joshua C. Peterson ◽  
Joshua T. Abbott ◽  
Thomas L. Griffiths

Deep neural networks have become increasingly successful at solving classic perception problems (e.g., recognizing objects), often reaching or surpassing human-level accuracy. In this abridged report of Peterson et al. [2016], we examine the relationship between the image representations learned by these networks and those of humans. We find that deep features learned in service of object classification account for a significant amount of the variance in human similarity judgments for a set of animal images. However, these features do not appear to capture some key qualitative aspects of human representations. To close this gap, we present a method for adapting deep features to align with human similarity judgments, resulting in image representations that can potentially be used to extend the scope of psychological experiments and inform human-centric AI.
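
A minimal sketch of one way to adapt deep features to human similarity judgments, consistent with the description above: learn per-dimension weights so that a weighted inner product of image features predicts the human similarity matrix, fit by ridge regression. The exact formulation here is an assumption, not the paper's code.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_feature_weights(features, human_sim, alpha=1.0):
    """features: (n_images, d) deep features; human_sim: (n, n) human judgments."""
    n, d = features.shape
    i, j = np.triu_indices(n, k=1)
    # Each image pair contributes one row of elementwise feature products.
    X = features[i] * features[j]           # (n_pairs, d)
    y = human_sim[i, j]                     # (n_pairs,)
    model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
    return model.coef_                      # per-dimension weights

def predicted_similarity(features, w):
    # Weighted inner-product similarity between all image pairs.
    return (features * w) @ features.T
```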


Author(s):  
Yasufumi Sakai ◽  
Yutaka Tamiya

Recent advances in deep neural networks have achieved higher accuracy with more complex models, which nevertheless require much longer training times. To reduce training time, methods that train with quantized weights, activations, and gradients have been proposed. Computing neural networks in integer formats improves the energy efficiency of deep learning hardware, so training methods for deep neural networks in fixed-point formats have been proposed. However, the narrow data representation range of the fixed-point format degrades neural network accuracy. In this work, we propose a new fixed-point format named shifted dynamic fixed point (S-DFP) to prevent accuracy degradation when training quantized neural networks. S-DFP changes the data representation range of the dynamic fixed point format by adding a bias to the exponent. We evaluated the effectiveness of S-DFP for quantized neural network training on the ImageNet task using ResNet-34, ResNet-50, ResNet-101 and ResNet-152. For example, the accuracy of quantized ResNet-152 improved from 76.6% with conventional 8-bit DFP to 77.6% with 8-bit S-DFP.
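
A minimal sketch (assumed details) of 8-bit dynamic fixed point quantization with the shift described above: a bias added to the tensor's shared exponent moves its representable range; a bias of zero reproduces conventional DFP.

```python
import torch

def dfp_quantize(x: torch.Tensor, bits: int = 8, exponent_bias: int = 0):
    # Shared exponent chosen per tensor from its maximum magnitude,
    # then shifted by the (S-DFP) exponent bias.
    qmax = 2 ** (bits - 1) - 1
    max_val = x.abs().max().clamp(min=1e-12)
    exponent = torch.ceil(torch.log2(max_val / qmax)).int() + exponent_bias
    scale = 2.0 ** exponent.item()
    return torch.round(x / scale).clamp(-qmax - 1, qmax) * scale

# dfp_quantize(t, bits=8, exponent_bias=0)  -> conventional 8-bit DFP
# dfp_quantize(t, bits=8, exponent_bias=-2) -> shifted range (illustrative value)
```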


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stephen Whitelam ◽  
Viktor Selin ◽  
Sang-Won Park ◽  
Isaac Tamblyn

We show analytically that training a neural network by conditioned stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
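
A minimal sketch (illustrative, not the authors' code) of neuroevolution by conditioned stochastic mutation: propose a small Gaussian mutation of all weights and accept it only if the loss does not increase; in the limit of small mutation scale this behaves like noisy gradient descent on the loss. A simple quadratic loss stands in for a network's loss surface.

```python
import numpy as np

def neuroevolution_step(weights, loss_fn, sigma=1e-3, rng=np.random.default_rng()):
    # Conditioned (accept-if-not-worse) Gaussian mutation of the weight vector.
    proposal = weights + sigma * rng.standard_normal(weights.shape)
    if loss_fn(proposal) <= loss_fn(weights):
        return proposal
    return weights

loss = lambda w: float(np.sum(w ** 2))   # placeholder loss surface
w = np.ones(10)
for _ in range(5000):
    w = neuroevolution_step(w, loss)     # drifts toward the minimum at w = 0
```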


Author(s):  
Hengjie Chen ◽  
Zhong Li

By applying fundamental mathematical knowledge, this paper proves that the function [Formula: see text], where [Formula: see text] is an integer no less than [Formula: see text], has the following property: the difference between the function value at the midpoint of any two adjacent equidistant distribution nodes on [Formula: see text] and the mean of the function values at these two nodes is a constant depending only on the number of nodes, if and only if [Formula: see text]. Based on this, we establish an important result about deep neural networks: the function [Formula: see text] can be interpolated by a deep Rectified Linear Unit (ReLU) network of depth [Formula: see text] on the equidistant distribution nodes in the interval [Formula: see text], with approximation error [Formula: see text]. Then, based on this result and the Chebyshev orthogonal polynomials, we construct a deep network and give error estimates for the approximation of polynomials and continuous functions, respectively. In addition, this paper constructs a deep network with local sparse connections, shared weights and activation function [Formula: see text], and discusses its density and complexity.
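
The formula placeholders hide the specific function, so the numerical check below uses f(x) = x² purely as an illustration of the stated midpoint property: on equidistant nodes, f evaluated at the midpoint minus the mean of the two adjacent node values is a constant that depends only on the number of nodes (here -h²/4 for node spacing h).

```python
import numpy as np

def midpoint_deviation(f, n_nodes, a=0.0, b=1.0):
    # Deviation between f at each interval midpoint and the mean of the
    # function values at the two adjacent equidistant nodes.
    nodes = np.linspace(a, b, n_nodes)
    mids = (nodes[:-1] + nodes[1:]) / 2.0
    return f(mids) - (f(nodes[:-1]) + f(nodes[1:])) / 2.0

f = lambda x: x ** 2                       # illustrative choice, not the paper's function
for n in (5, 9, 17):
    h = 1.0 / (n - 1)
    assert np.allclose(midpoint_deviation(f, n), -h ** 2 / 4)  # constant per node count
```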


2020 ◽  
Vol 34 (04) ◽  
pp. 6013-6020
Author(s):  
Kai Tian ◽  
Yi Xu ◽  
Jihong Guan ◽  
Shuigeng Zhou

Despite their powerful representation ability, deep neural networks (DNNs) are prone to over-fitting because of over-parametrization. Existing works have explored various regularization techniques to tackle this problem. Some of them employ soft targets rather than one-hot labels to guide network training (e.g., label smoothing in classification tasks); we call these target-based regularization approaches in this paper. To alleviate the over-fitting problem, we propose a new and general regularization framework that introduces an auxiliary network to dynamically incorporate guided semantic disturbance into the labels. We call it Network as Regularization (NaR for short). During training, the disturbance is constructed as a convex combination of the predictions of the target network and the auxiliary network. The two networks are initialized separately, and the auxiliary network is trained independently of the target network while progressively providing instance-level and class-level semantic information to it. We conduct extensive experiments to validate the effectiveness of the proposed method. Experimental results show that NaR outperforms many state-of-the-art target-based regularization methods, and that other regularization approaches (e.g., mixup) can also benefit from being combined with NaR.
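
A minimal sketch (assumed formulation, not the authors' exact loss) of training with labels softened by a disturbance that is a convex combination of the target network's and the auxiliary network's predictions; the mixing coefficients lam and gamma are illustrative.

```python
import torch
import torch.nn.functional as F

def nar_loss(logits_target, logits_aux, labels, num_classes, lam=0.9, gamma=0.5):
    # Disturbance: convex combination of the two networks' predicted distributions.
    with torch.no_grad():
        disturbance = gamma * F.softmax(logits_target, dim=1) \
                    + (1.0 - gamma) * F.softmax(logits_aux, dim=1)
    one_hot = F.one_hot(labels, num_classes).float()
    soft_targets = lam * one_hot + (1.0 - lam) * disturbance
    # Cross-entropy of the target network against the softened labels.
    return -(soft_targets * F.log_softmax(logits_target, dim=1)).sum(dim=1).mean()
```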


Author(s):  
A.P. Karpenko ◽  
V.A. Ovchinnikov

The study aims to develop an algorithm, and then software, to synthesise noise that can be used to attack deep learning neural networks designed to classify images. We present the results of our analysis of methods for conducting this type of attack. The synthesis of attack noise is stated as a problem of multidimensional constrained optimisation. The main features of the proposed attack noise synthesis algorithm are as follows: we employ the clip function to take the constraints on the noise into account; we use the top-1 and top-5 classification error ratings as criteria of attack noise efficiency; we train our neural networks using backpropagation with the Adam gradient descent algorithm; stochastic gradient descent is employed to solve the optimisation problem indicated above; and neural network training also makes use of the augmentation technique. The software was developed in Python using the PyTorch framework to dynamically differentiate the calculation graph, and runs under Ubuntu 18.04 and CentOS 7. Our IDE was Visual Studio Code. We accelerated the computation via CUDA executed on an NVIDIA Titan XP GPU. The paper presents the results of a broad computational experiment in synthesising non-universal and universal attack noise for eight deep neural networks. We show that the proposed attack algorithm is able to increase the neural network error by a factor of eight.
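
A minimal sketch of the kind of constrained noise synthesis described above: the noise is optimized by stochastic gradient steps to increase the classification loss, while a clip function keeps it within an L-infinity bound. The step count, bound and loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def synthesize_attack_noise(model, images, labels, eps=8 / 255, lr=1e-2, steps=40):
    noise = torch.zeros_like(images, requires_grad=True)
    optimizer = torch.optim.SGD([noise], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Negative cross-entropy: gradient descent on this maximizes the error.
        loss = -F.cross_entropy(model(images + noise), labels)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            noise.clamp_(-eps, eps)   # clip keeps the noise within its constraint
    return noise.detach()
```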


Author(s):  
Shipeng Wang ◽  
Jian Sun ◽  
Zongben Xu

Deep neural networks are traditionally trained using human-designed stochastic optimization algorithms such as SGD and Adam. Recently, learning to optimize network parameters has emerged as a promising research topic. However, these learned black-box optimizers sometimes do not fully exploit the experience embodied in human-designed optimizers and therefore have limited generalization ability. In this paper, a new optimizer, dubbed HyperAdam, is proposed that combines the idea of “learning to optimize” with the traditional Adam optimizer. Given a network to train, the parameter update generated by HyperAdam in each iteration is an adaptive combination of multiple updates generated by Adam with varying decay rates. The combination weights and decay rates in HyperAdam are learned adaptively depending on the task. HyperAdam is modeled as a recurrent neural network with an AdamCell, a WeightCell and a StateCell. It is shown to be state-of-the-art for training various networks, such as multilayer perceptrons, CNNs and LSTMs.
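
A minimal sketch of the core idea: form several Adam-style candidate updates with different decay rates and combine them with adaptive weights. In HyperAdam the weights and decay rates come from a learned recurrent network; here they are fixed placeholders, and bias correction is omitted for brevity.

```python
import torch

def combined_adam_update(grad, state, decay_pairs, weights, lr=1e-3, eps=1e-8):
    """state: list of dicts holding running 'm' and 'v' for each (beta1, beta2) pair."""
    updates = []
    for s, (b1, b2) in zip(state, decay_pairs):
        s["m"] = b1 * s["m"] + (1 - b1) * grad
        s["v"] = b2 * s["v"] + (1 - b2) * grad ** 2
        updates.append(s["m"] / (s["v"].sqrt() + eps))
    # Convex combination of the candidate updates (weights would be learned).
    stacked = torch.stack(updates)                            # (k, *grad.shape)
    w = torch.tensor(weights).view(-1, *([1] * grad.dim()))
    return -lr * (w * stacked).sum(dim=0)

# Usage sketch: three decay-rate pairs with equal combination weights.
# decay_pairs = [(0.9, 0.999), (0.8, 0.99), (0.5, 0.9)]
# state = [{"m": torch.zeros_like(p), "v": torch.zeros_like(p)} for _ in decay_pairs]
# p.data += combined_adam_update(p.grad, state, decay_pairs, [1/3, 1/3, 1/3])
```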


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 4977
Author(s):  
Ji-Won Kang ◽  
Jae-Eun Lee ◽  
Jang-Hwan Choi ◽  
Woosuk Kim ◽  
Jin-Kyum Kim ◽  
...  

This paper proposes a method to embed and extract a watermark in a digital hologram using a deep neural network. The entire watermarking algorithm for digital holograms consists of three sub-networks. For robustness, an attack simulation is inserted inside the deep neural network. By including the attack simulation and holographic reconstruction in the network, the watermarking network can be trained for invisibility and robustness simultaneously. We propose a network training method that uses both the hologram and its reconstruction. After training the proposed network, we analyze its robustness to each attack and, based on these results, perform re-training to further improve robustness. We quantitatively evaluate the robustness against various attacks and show the reliability of the proposed technique.
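
A minimal sketch (placeholder single-layer architectures, not the paper's networks) of the three-part pipeline described above: an embedder, a differentiable attack simulation inserted between embedding and extraction, and an extractor, trained jointly for invisibility and robustness. Holographic reconstruction is omitted, additive noise stands in for the attacks, and the watermark is assumed to be a single-channel map in [0, 1].

```python
import torch
import torch.nn as nn

class Embedder(nn.Module):
    def __init__(self):
        super().__init__()
        # hologram + watermark (2 channels) -> marked hologram (1 channel)
        self.net = nn.Conv2d(2, 1, kernel_size=3, padding=1)
    def forward(self, hologram, watermark):
        return self.net(torch.cat([hologram, watermark], dim=1))

class Extractor(nn.Module):
    def __init__(self):
        super().__init__()
        # attacked hologram -> recovered watermark in [0, 1]
        self.net = nn.Conv2d(1, 1, kernel_size=3, padding=1)
    def forward(self, attacked):
        return torch.sigmoid(self.net(attacked))

def attack_simulation(marked, noise_std=0.05):
    # Differentiable stand-in for the in-network attack simulation.
    return marked + noise_std * torch.randn_like(marked)

def training_loss(embedder, extractor, hologram, watermark, alpha=1.0):
    marked = embedder(hologram, watermark)
    recovered = extractor(attack_simulation(marked))
    invisibility = nn.functional.mse_loss(marked, hologram)                 # stay close to the host
    robustness = nn.functional.binary_cross_entropy(recovered, watermark)   # survive attacks
    return invisibility + alpha * robustness
```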

