S-DFP: shifted dynamic fixed point for quantized deep neural network training

Author(s):  
Yasufumi Sakai ◽  
Yutaka Tamiya

Abstract Recent advances in deep neural networks have achieved higher accuracy with more complex models, but they require much longer training times. To reduce training time, methods that train with quantized weights, activations, and gradients have been proposed. Computing neural networks in integer formats also improves the energy efficiency of deep learning hardware, so training methods that use fixed point formats have been proposed. However, the narrow data representation range of the fixed point format degrades neural network accuracy. In this work, we propose a new fixed point format named shifted dynamic fixed point (S-DFP) to prevent accuracy degradation in quantized neural network training. S-DFP changes the data representation range of the dynamic fixed point format by adding a bias to the exponent. We evaluated the effectiveness of S-DFP for quantized neural network training on the ImageNet task using ResNet-34, ResNet-50, ResNet-101 and ResNet-152. For example, the accuracy of quantized ResNet-152 is improved from 76.6% with conventional 8-bit DFP to 77.6% with 8-bit S-DFP.
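As a rough illustration of the idea, the sketch below quantizes a tensor to a dynamic fixed point format whose shared exponent is derived from the tensor's maximum magnitude, then shifts that exponent by a bias, which is the range adjustment S-DFP introduces. The function name, bit width, and bias value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_dfp(x, num_bits=8, exp_bias=0):
    """Hypothetical sketch of (shifted) dynamic fixed point quantization.

    A shared exponent is chosen from the tensor's maximum magnitude, as in
    dynamic fixed point; a non-zero exp_bias shifts that exponent, moving
    the representable range the way S-DFP does.
    """
    max_val = float(np.max(np.abs(x))) + 1e-12
    # Shared exponent so that max_val fits into (num_bits - 1) magnitude bits.
    exp = int(np.ceil(np.log2(max_val))) - (num_bits - 1)
    exp += exp_bias                                   # the "shift" in S-DFP
    scale = 2.0 ** exp
    q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale), q_min, q_max).astype(np.int32)
    return q, scale                                   # integers plus shared scale

w = 0.01 * np.random.randn(1000).astype(np.float32)  # small-magnitude weights
q, scale = quantize_dfp(w, num_bits=8, exp_bias=0)
w_hat = q.astype(np.float32) * scale
print("max reconstruction error:", np.max(np.abs(w - w_hat)))
```

In this sketch a negative bias trades headroom at the top of the range for finer resolution near zero, which is the kind of shift the paper uses to recover accuracy in 8-bit training.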

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Stephen Whitelam ◽  
Viktor Selin ◽  
Sang-Won Park ◽  
Isaac Tamblyn

Abstract We show analytically that training a neural network by conditioned stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
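A toy sketch of this correspondence, using a quadratic stand-in for a network loss: one routine mutates the weights and keeps the mutation only if the loss does not increase (a simple conditioned mutation), the other takes gradient steps with added Gaussian noise. The acceptance rule, step sizes, and loss are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Toy quadratic loss standing in for a network's training loss.
    return 0.5 * np.sum(w ** 2)

def neuroevolution_step(w, sigma=1e-2):
    """One conditioned-mutation step: perturb the weights and keep the
    mutation only if it does not increase the loss (a greedy acceptance
    rule used here purely for illustration)."""
    trial = w + sigma * rng.standard_normal(w.shape)
    return trial if loss(trial) <= loss(w) else w

def noisy_gd_step(w, lr=1e-3, noise=1e-2):
    """Gradient descent on the loss plus Gaussian white noise, the limit
    that small conditioned mutations approach on average."""
    grad = w  # gradient of the toy quadratic
    return w - lr * grad + noise * rng.standard_normal(w.shape)

w_ne = rng.standard_normal(50)
w_gd = w_ne.copy()
for _ in range(2000):
    w_ne = neuroevolution_step(w_ne)
    w_gd = noisy_gd_step(w_gd)
print(f"loss (neuroevolution): {loss(w_ne):.4f}, loss (noisy GD): {loss(w_gd):.4f}")
```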


2017 ◽  
Vol 109 (1) ◽  
pp. 29-38 ◽  
Author(s):  
Valentin Deyringer ◽  
Alexander Fraser ◽  
Helmut Schmid ◽  
Tsuyoshi Okita

Abstract Neural networks are prevalent in today's NLP research. Despite their success on different tasks, training times are relatively long. We use Hogwild! to counteract this phenomenon and show that it is a suitable method to speed up the training of neural networks of different architectures and complexity. For POS tagging and translation we report considerable training speedups, especially for the latter. We show that Hogwild! can be an important tool for training complex NLP architectures.
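A minimal sketch of the Hogwild! pattern in PyTorch, assuming a toy linear model and random data rather than the POS-tagging or translation models used in the paper: the parameters live in shared memory and several worker processes apply SGD updates to them without any locking.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def train_worker(model, data, targets, epochs=5):
    """Each process runs plain SGD on the shared model with no locking;
    mostly non-overlapping updates are what makes Hogwild! work."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(data), targets)
        loss.backward()
        opt.step()

if __name__ == "__main__":
    model = nn.Linear(10, 1)
    model.share_memory()           # place the parameters in shared memory
    data = torch.randn(256, 10)
    targets = torch.randn(256, 1)
    procs = [mp.Process(target=train_worker, args=(model, data, targets))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```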


2014 ◽  
Vol 602-605 ◽  
pp. 3512-3514
Author(s):  
Xue Ding ◽  
Hong Hong Yang

With ever-changing education information technology, a major problem for universities and colleges is how to classify the thousands of images produced during the art examination marking process. This paper explores the application of artificial intelligence techniques to classify a large number of images accurately, within a limited time, with the help of a computer. The results of applying the method in actual work show that it is feasible. Artificial neural network training methods come in two main styles, distinguished by how much of the training set is processed per weight update: incremental training and batch training. In incremental training [1], the network adjusts its connection weights and thresholds every time it receives an input vector and target vector; it is an online learning method. In batch training [2], the connections are not adjusted immediately; instead, the weights are adjusted in bulk after a given volume of input vectors and target vectors has been presented. Both training methods can be applied to either static or dynamic neural networks, and different training methods lead to different results. When using artificial neural networks to solve specific problems, the learning method, training method, and network functions should be selected according to the type of problem, its specific requirements, and the expected results [3-4].
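The difference between the two styles can be made concrete with a small least-squares example (an assumed toy problem, not the paper's image classifier): incremental training updates the weights after every input/target pair, while batch training accumulates the error over the whole set and updates once per pass.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(200)

def incremental_training(X, y, lr=0.01, epochs=20):
    """Incremental (online) training: adjust the weights after every
    single input/target pair."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = xi @ w - yi
            w -= lr * err * xi
    return w

def batch_training(X, y, lr=0.1, epochs=200):
    """Batch training: accumulate the error over the whole training set
    and adjust the weights once per pass."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

print("incremental:", incremental_training(X, y))
print("batch      :", batch_training(X, y))
```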


2014 ◽  
Vol 10 (S306) ◽  
pp. 279-287 ◽  
Author(s):  
Michael Hobson ◽  
Philip Graff ◽  
Farhan Feroz ◽  
Anthony Lasenby

Abstract Machine-learning methods may be used to perform many tasks required in the analysis of astronomical data, including: data description and interpretation, pattern recognition, prediction, classification, compression, inference and many more. An intuitive and well-established approach to machine learning is the use of artificial neural networks (NNs), which consist of a group of interconnected nodes, each of which processes information that it receives and then passes this product on to other nodes via weighted connections. In particular, I discuss the first public release of the generic neural network training algorithm, called SkyNet, and demonstrate its application to astronomical problems, focusing on its use in the BAMBI package for accelerated Bayesian inference in cosmology, and the identification of gamma-ray bursters. The SkyNet and BAMBI packages, which are fully parallelised using MPI, are available at http://www.mrao.cam.ac.uk/software/.
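The "nodes passing weighted information onward" picture corresponds to the ordinary feed-forward computation sketched below; this is a generic NumPy illustration with arbitrary layer sizes, not SkyNet's implementation.

```python
import numpy as np

def forward(x, weights, biases):
    """Minimal feed-forward pass: each layer's nodes take a weighted sum
    of their inputs plus a bias and pass the result through a nonlinearity,
    i.e. they process information and pass it on via weighted connections."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(a @ W + b)
    return a

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 2))]
biases = [np.zeros(8), np.zeros(2)]
print(forward(rng.standard_normal((3, 4)), weights, biases))
```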


2022 ◽  
pp. 202-226
Author(s):  
Leema N. ◽  
Khanna H. Nehemiah ◽  
Elgin Christo V. R. ◽  
Kannan A.

Artificial neural networks (ANN) are widely used for classification, and the training algorithm commonly used is the backpropagation (BP) algorithm. The major bottleneck in backpropagation neural network training is fixing appropriate values for the network parameters: the initial weights, biases, activation function, number of hidden layers, number of neurons per hidden layer, number of training epochs, learning rate, minimum error, and momentum term for the classification task. The objective of this work is to investigate the performance of 12 different BP algorithms and the impact of variations in network parameter values on neural network training. The algorithms were evaluated with different training and testing samples taken from three benchmark clinical datasets, namely the Pima Indian Diabetes (PID), Hepatitis, and Wisconsin Breast Cancer (WBC) datasets obtained from the University of California Irvine (UCI) machine learning repository.
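As a rough illustration of the parameters involved, the sketch below trains a backpropagation network on scikit-learn's bundled breast cancer data (a stand-in for the WBC dataset) with one particular choice of those settings; the specific values are assumptions chosen for illustration and are exactly the kind of choices the chapter varies across its 12 BP algorithms.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the WBC dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)

clf = MLPClassifier(
    hidden_layer_sizes=(10,),    # number of hidden layers / neurons per layer
    activation="logistic",       # activation function
    solver="sgd",                # plain gradient-descent backpropagation
    learning_rate_init=0.1,      # learning rate
    momentum=0.9,                # momentum term
    max_iter=500,                # number of training epochs
    tol=1e-4,                    # minimum error (stopping tolerance)
    random_state=0,              # fixes the initial weights and biases
)
clf.fit(scaler.transform(X_train), y_train)
print("test accuracy:", clf.score(scaler.transform(X_test), y_test))
```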


2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk
The peculiarities of neural network training for forecasting taxi passenger demand using graphics processing units are considered, which made it possible to speed up the training procedure for different input datasets and hardware configurations of varying power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and to minimize the distance between drivers and passengers when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of neural network training for predicting taxi passenger demand and shows the importance of a large input dataset for the accuracy of the neural network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The neural network for forecasting taxi passenger demand was trained using different hardware configurations, such as one CPU, one GPU, and two GPUs, and the training times for one epoch were compared across these configurations. The impact of different hardware configurations on training time was analyzed in this work. The network was trained using a dataset containing 4.5 million trips within one city. The results of this study show that training with GPU accelerators does not necessarily improve the training time; the training time depends on many factors, such as the input dataset size, the splitting of the entire dataset into smaller subsets, and the hardware and power characteristics.
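A minimal sketch of the kind of multi-GPU configuration compared in the study, written with PyTorch's DataParallel and synthetic data (the model, feature size, and batch size are assumptions; the real input is 4.5 million taxi trips): when more than one GPU is visible, each batch is split across the devices, and otherwise training falls back to a single CPU or GPU.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 1))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # split each batch across the GPUs
model = model.to(device)

# Hypothetical demand-forecasting data: 32 features per sample.
data = TensorDataset(torch.randn(10_000, 32), torch.randn(10_000, 1))
loader = DataLoader(data, batch_size=512, shuffle=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(3):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```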


2012 ◽  
Vol 2012 ◽  
pp. 1-9 ◽  
Author(s):  
Ioannis E. Livieris ◽  
Panagiotis Pintelas

Conjugate gradient methods constitute excellent neural network training methods, characterized by their simplicity, numerical efficiency, and very low memory requirements. In this paper, we propose a conjugate gradient neural network training algorithm which guarantees sufficient descent using any line search, thereby avoiding the usually inefficient restarts. Moreover, it achieves high-order accuracy in approximating the second-order curvature information of the error surface by utilizing the modified secant condition proposed by Li et al. (2007). Under mild conditions, we establish that the proposed method is globally convergent for general functions under the strong Wolfe conditions. Experimental results provide evidence that the proposed method is preferable to, and in general superior to, the classical conjugate gradient methods and has the potential to significantly enhance the computational efficiency and robustness of the training process.
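For orientation, the sketch below implements a classical nonlinear conjugate gradient (Polak-Ribiere+) iteration with a backtracking line search on a toy least-squares surface; the proposed method differs in that it guarantees sufficient descent for any line search and incorporates the modified secant condition of Li et al. (2007), neither of which is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)

def loss(w):                      # toy least-squares "error surface"
    r = A @ w - b
    return 0.5 * r @ r

def grad(w):
    return A.T @ (A @ w - b)

def cg_train(w, iters=100):
    """Classical Polak-Ribiere+ conjugate gradient with a backtracking line
    search. Generic sketch only; it lacks the sufficient-descent guarantee
    and the modified secant condition used by the proposed algorithm."""
    g = grad(w)
    d = -g
    for _ in range(iters):
        if g @ d >= 0:            # safeguard: fall back to steepest descent
            d = -g
        alpha, f0, slope = 1.0, loss(w), g @ d
        while loss(w + alpha * d) > f0 + 1e-4 * alpha * slope:
            alpha *= 0.5          # Armijo backtracking
        w_new = w + alpha * d
        g_new = grad(w_new)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ formula
        d = -g_new + beta * d
        w, g = w_new, g_new
    return w

w0 = np.zeros(20)
w_star = cg_train(w0)
print(f"loss: {loss(w0):.3f} -> {loss(w_star):.6f}")
```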


