Parallel Backpropagation Neural Network Training Techniques using Graphics Processing Unit

Author(s):  
Muhammad Arslan Amin ◽  
Muhammad Kashif ◽  
Muhammad Umer ◽  
Abdur Rehman ◽  
Fiaz Waheed ◽  
...  
2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk ◽  

The peculiarities of training a neural network to forecast taxi passenger demand on graphics processing units are considered, which made it possible to speed up the training procedure for different input datasets, hardware configurations, and levels of computing power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and to minimize the distance between driver and passenger when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of training a neural network to predict taxi passenger demand and shows the importance of a large input dataset for the accuracy of the network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The demand-forecasting network was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training time of one epoch was compared across these configurations, and the impact of each hardware configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results of this study show that training with GPU accelerators does not necessarily improve training time: the training time depends on many factors, such as the size of the input dataset, how the entire dataset is split into smaller subsets, and the hardware and power characteristics of the machine.
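The abstract compares per-epoch training time on one CPU, one GPU, and two GPUs but names no framework or code. Below is a minimal, hypothetical sketch of such a comparison in TensorFlow/Keras; the model architecture, feature count, batch size, and synthetic data are assumptions standing in for the 4.5-million-trip dataset, not the authors' implementation.

```python
# Hypothetical sketch: timing one training epoch of a demand-forecasting
# model on different hardware configurations. Framework, model shape, and
# feature layout are assumptions, not the paper's actual code.
import time
import numpy as np
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    # A small fully connected regressor standing in for the demand model.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # predicted demand for a zone/time slot
    ])

def time_one_epoch(strategy: tf.distribute.Strategy,
                   x: np.ndarray, y: np.ndarray) -> float:
    # Build and compile the model under the given device strategy,
    # then measure the wall-clock time of a single training epoch.
    with strategy.scope():
        model = build_model(x.shape[1])
        model.compile(optimizer="adam", loss="mse")
    start = time.perf_counter()
    model.fit(x, y, batch_size=1024, epochs=1, verbose=0)
    return time.perf_counter() - start

# Synthetic stand-in for the 4.5M-trip dataset (smaller, to stay runnable).
x = np.random.rand(100_000, 16).astype("float32")
y = np.random.rand(100_000, 1).astype("float32")

configs = {
    "one CPU":  tf.distribute.OneDeviceStrategy("/cpu:0"),
    "one GPU":  tf.distribute.OneDeviceStrategy("/gpu:0"),
    "two GPUs": tf.distribute.MirroredStrategy(["/gpu:0", "/gpu:1"]),
}
for name, strategy in configs.items():
    try:
        print(f"{name}: {time_one_epoch(strategy, x, y):.2f} s/epoch")
    except (RuntimeError, ValueError, tf.errors.InvalidArgumentError):
        print(f"{name}: hardware not available on this machine")
```

As the abstract notes, the GPU configurations will not automatically win such a comparison; for small batches or small models, data-transfer overhead between host and device can dominate the per-epoch time.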


2020 ◽  
Author(s):  
Vui Huang Tea

The 3rd Generation Partnership Project (3GPP) standard for 5G telecommunications specifies privacy protection schemes that cryptographically encrypt and conceal permanent subscriber identifiers to prevent them from being exposed and tracked by over-the-air eavesdroppers. However, conventional privacy-preserving protocols and architectures alone are insufficient to protect subscriber privacy, as they are vulnerable to new types of attacks enabled by emerging technologies such as artificial intelligence (AI). A conventional brute-force attack to unmask a concealed 5G identity using a CPU would require ~877 million years. This paper presents an apparatus using machine learning (ML) and a graphics processing unit (GPU) that is able to unmask a concealed 5G identity in ~12 minutes with an untrained neural network, or in ~0.015 milliseconds with a pre-trained neural network. The concealed 5G identities are effectively identified without requiring decryption, severely diminishing the level of privacy preservation. Finally, several ML defence countermeasures are proposed to re-establish privacy protection for 5G identities.
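The paper does not disclose its apparatus, so the following is a purely illustrative sketch of the general idea: a classifier that learns to associate concealed-identifier byte patterns with subscriber labels without decrypting them, which could only succeed if the concealment scheme leaks exploitable structure. The identifier length, subscriber count, and model are hypothetical, and the random data here learns nothing real; the point of the sketch is that once such a network is trained, unmasking reduces to a single forward pass, which is why the pre-trained case runs in fractions of a millisecond on a GPU.

```python
# Purely illustrative sketch of the attack idea described above: a classifier
# that learns to map concealed-identifier byte patterns to subscriber labels
# without decryption. Byte layout, dataset, and model are all hypothetical;
# this is NOT the apparatus from the paper.
import numpy as np
import tensorflow as tf

SUCI_LEN = 32        # assumed length (bytes) of a concealed identifier
N_SUBSCRIBERS = 100  # assumed number of target subscribers

# Hypothetical training data: many captured concealed identifiers per
# subscriber, labeled by subscriber index (random placeholders here).
x_train = np.random.randint(0, 256, size=(10_000, SUCI_LEN)) / 255.0
y_train = np.random.randint(0, N_SUBSCRIBERS, size=(10_000,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SUCI_LEN,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(N_SUBSCRIBERS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=256, verbose=0)

# Inference on a newly captured concealed identifier: with a pre-trained
# network, identification is one forward pass, not a brute-force search.
captured = np.random.randint(0, 256, size=(1, SUCI_LEN)) / 255.0
pred = model.predict(captured, verbose=0)
print("most likely subscriber index:", int(pred.argmax()))
```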


2021 ◽  
Vol 2062 (1) ◽  
pp. 012008
Author(s):  
Sunil Pandey ◽  
Naresh Kumar Nagwani ◽  
Shrish Verma

Abstract The convolutional neural network training algorithm has been implemented for a central processing unit based high-performance multisystem architecture machine. The multisystem, or multicomputer, is a parallel machine model that is essentially an abstraction of distributed-memory parallel machines; in practice, this model corresponds to high-performance computing clusters. The proposed implementation models the convolutional neural network as a computational pipeline: the various functions or tasks of the network pipeline are mapped onto the multiple nodes of a CPU-based high-performance computing cluster for task parallelism. The pipeline implementation provides a first level of performance gain through pipeline parallelism; further gains are obtained by distributing the training across the different nodes of the compute cluster, and the two gains are multiplicative. In this work, the authors have carried out a comparative evaluation of the computational performance and scalability of this pipeline implementation against a distributed neural network program based on conventional multi-model training with a centralized server. The dataset considered is the Northeastern University (NEU) hot-rolled steel strip surface defect imaging dataset. In both cases, the convolutional neural networks were trained to classify the different defects on hot-rolled steel strips from the input image. One hundred images per defect class were used for training in order to keep training times manageable. The hyperparameters of both networks were kept identical, and the programs were run on the same computational cluster to enable a fair comparison. Both implementations were observed to reach nearly 80% training accuracy in 200 epochs; in effect, therefore, the comparison is on the time taken to complete the training epochs.
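The abstract describes mapping CNN pipeline stages onto cluster nodes but does not show the decomposition. Below is a minimal sketch of that pipeline-parallel pattern using mpi4py, with each MPI rank standing in for one pipeline stage on one node; the stage function, batch shapes, and batch count are assumptions, not the authors' implementation.

```python
# Hedged sketch of pipeline parallelism on a distributed-memory cluster:
# each MPI rank runs one stage of a (placeholder) CNN pipeline and forwards
# activations to the next rank. Stage contents and shapes are assumptions.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def stage(rank_id: int, batch: np.ndarray) -> np.ndarray:
    # Placeholder per-stage work (e.g., a convolution block, pooling, or
    # fully connected layers) standing in for real CNN computation.
    return np.tanh(batch)

N_BATCHES = 8
for b in range(N_BATCHES):
    if rank == 0:
        # First stage: produce/load an input batch (random placeholder).
        batch = np.random.rand(100, 64).astype("float32")
    else:
        # Receive activations from the upstream pipeline stage.
        batch = comm.recv(source=rank - 1, tag=b)
    out = stage(rank, batch)
    if rank < size - 1:
        # Forward activations to the downstream pipeline stage.
        comm.send(out, dest=rank + 1, tag=b)
    else:
        # Last stage: loss/backprop would happen here in a real trainer.
        print(f"batch {b} completed the {size}-stage pipeline")
```

Run with, e.g., `mpiexec -n 4 python pipeline_sketch.py` (filename hypothetical). While the last rank processes batch b, earlier ranks are already working on later batches; that overlap is the pipeline-parallel gain the abstract refers to, on top of which the training workload itself is distributed across nodes.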

