Parallel Backpropagation Neural Network Training Techniques using Graphics Processing Unit

Author(s):  
Muhammad Arslan Amin ◽  
Muhammad Kashif ◽  
Muhammad Umer ◽  
Abdur Rehman ◽  
Fiaz Waheed ◽  
...  
2020 ◽  
Vol 2 (1) ◽  
pp. 29-36
Author(s):  
M. I. Zghoba ◽  
Yu. I. Hrytsiuk ◽  

The peculiarities of training a neural network to forecast taxi passenger demand on graphics processing units are considered, which made it possible to speed up the training procedure for different input datasets, hardware configurations, and levels of computing power. Taxi services are becoming accessible to an ever wider range of people. The most important task for any transportation company and taxi driver is to minimize the waiting time for new orders and to minimize the distance between driver and passenger when an order is received. Understanding and assessing geographical passenger demand, which depends on many factors, is crucial to achieving this goal. This paper describes an example of training a neural network to predict taxi passenger demand and shows the importance of a large input dataset for the accuracy of the network. Since training a neural network is a lengthy process, parallel training was used to speed it up. The demand-forecasting network was trained on different hardware configurations: one CPU, one GPU, and two GPUs. The training time of one epoch was compared across these configurations, and the impact of each hardware configuration on training time was analyzed. The network was trained on a dataset containing 4.5 million trips within one city. The results of this study show that training with GPU accelerators does not necessarily improve training time: the training time depends on many factors, such as the size of the input dataset, how the entire dataset is split into smaller subsets, and the hardware and power characteristics of the machine.
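The abstract compares per-epoch training time on one CPU, one GPU, and two GPUs but names no framework or code. Below is a minimal, hypothetical sketch of such a comparison in TensorFlow/Keras; the model architecture, feature count, batch size, and synthetic data are assumptions standing in for the 4.5-million-trip dataset, not the authors' implementation.

```python
# Hypothetical sketch: timing one training epoch of a demand-forecasting
# model on different hardware configurations. Framework, model shape, and
# feature layout are assumptions, not the paper's actual code.
import time
import numpy as np
import tensorflow as tf

def build_model(n_features: int) -> tf.keras.Model:
    # A small fully connected regressor standing in for the demand model.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # predicted demand for a zone/time slot
    ])

def time_one_epoch(strategy: tf.distribute.Strategy,
                   x: np.ndarray, y: np.ndarray) -> float:
    # Build and compile the model under the given device strategy,
    # then measure the wall-clock time of a single training epoch.
    with strategy.scope():
        model = build_model(x.shape[1])
        model.compile(optimizer="adam", loss="mse")
    start = time.perf_counter()
    model.fit(x, y, batch_size=1024, epochs=1, verbose=0)
    return time.perf_counter() - start

# Synthetic stand-in for the 4.5M-trip dataset (smaller, to stay runnable).
x = np.random.rand(100_000, 16).astype("float32")
y = np.random.rand(100_000, 1).astype("float32")

configs = {
    "one CPU":  tf.distribute.OneDeviceStrategy("/cpu:0"),
    "one GPU":  tf.distribute.OneDeviceStrategy("/gpu:0"),
    "two GPUs": tf.distribute.MirroredStrategy(["/gpu:0", "/gpu:1"]),
}
for name, strategy in configs.items():
    try:
        print(f"{name}: {time_one_epoch(strategy, x, y):.2f} s/epoch")
    except (RuntimeError, ValueError, tf.errors.InvalidArgumentError):
        print(f"{name}: hardware not available on this machine")
```

As the abstract notes, the GPU configurations will not automatically win such a comparison; for small batches or small models, data-transfer overhead between host and device can dominate the per-epoch time.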


2020 ◽  
Author(s):  
Vui Huang Tea

The 3rd Generation Partnership Project (3GPP) standard for 5G telecommunications specifies privacy protection schemes that cryptographically encrypt and conceal permanent subscriber identifiers to prevent them from being exposed and tracked by over-the-air eavesdroppers. However, conventional privacy-preserving protocols and architectures alone are insufficient to protect subscriber privacy, as they are vulnerable to new types of attacks enabled by emerging technologies such as artificial intelligence (AI). A conventional brute-force attack to unmask a concealed 5G identity using a CPU would require ~877 million years. This paper presents an apparatus using machine learning (ML) and a graphics processing unit (GPU) that is able to unmask a concealed 5G identity in ~12 minutes with an untrained neural network, or in ~0.015 milliseconds with a pre-trained neural network. The concealed 5G identities are effectively identified without requiring decryption, severely diminishing the level of privacy preservation. Finally, several ML defence countermeasures are proposed to re-establish privacy protection for 5G identities.
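The paper does not disclose its apparatus, so the following is a purely illustrative sketch of the general idea: a classifier that learns to associate concealed-identifier byte patterns with subscriber labels without decrypting them, which could only succeed if the concealment scheme leaks exploitable structure. The identifier length, subscriber count, and model are hypothetical, and the random data here learns nothing real; the point of the sketch is that once such a network is trained, unmasking reduces to a single forward pass, which is why the pre-trained case runs in fractions of a millisecond on a GPU.

```python
# Purely illustrative sketch of the attack idea described above: a classifier
# that learns to map concealed-identifier byte patterns to subscriber labels
# without decryption. Byte layout, dataset, and model are all hypothetical;
# this is NOT the apparatus from the paper.
import numpy as np
import tensorflow as tf

SUCI_LEN = 32        # assumed length (bytes) of a concealed identifier
N_SUBSCRIBERS = 100  # assumed number of target subscribers

# Hypothetical training data: many captured concealed identifiers per
# subscriber, labeled by subscriber index (random placeholders here).
x_train = np.random.randint(0, 256, size=(10_000, SUCI_LEN)) / 255.0
y_train = np.random.randint(0, N_SUBSCRIBERS, size=(10_000,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SUCI_LEN,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(N_SUBSCRIBERS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=256, verbose=0)

# Inference on a newly captured concealed identifier: with a pre-trained
# network, identification is one forward pass, not a brute-force search.
captured = np.random.randint(0, 256, size=(1, SUCI_LEN)) / 255.0
pred = model.predict(captured, verbose=0)
print("most likely subscriber index:", int(pred.argmax()))
```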


2021 ◽  
Vol 2062 (1) ◽  
pp. 012008
Author(s):  
Sunil Pandey ◽  
Naresh Kumar Nagwani ◽  
Shrish Verma

Abstract The convolutional neural network training algorithm has been implemented for a central processing unit based high-performance multisystem architecture machine. The multisystem, or multicomputer, is a parallel machine model that is essentially an abstraction of distributed-memory parallel machines; in practice, this model corresponds to high-performance computing clusters. The proposed implementation models the convolutional neural network as a computational pipeline: the various functions or tasks of the network pipeline are mapped onto the multiple nodes of a CPU-based high-performance computing cluster for task parallelism. The pipeline implementation provides a first level of performance gain through pipeline parallelism; further gains are obtained by distributing the training across the different nodes of the compute cluster, and the two gains are multiplicative. In this work, the authors have carried out a comparative evaluation of the computational performance and scalability of this pipeline implementation against a distributed neural network program based on conventional multi-model training with a centralized server. The dataset considered is the Northeastern University (NEU) hot-rolled steel strip surface defect imaging dataset. In both cases, the convolutional neural networks were trained to classify the different defects on hot-rolled steel strips from the input image. One hundred images per defect class were used for training in order to keep training times manageable. The hyperparameters of both networks were kept identical, and the programs were run on the same computational cluster to enable a fair comparison. Both implementations were observed to reach nearly 80% training accuracy in 200 epochs; in effect, therefore, the comparison is on the time taken to complete the training epochs.
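The abstract describes mapping CNN pipeline stages onto cluster nodes but does not show the decomposition. Below is a minimal sketch of that pipeline-parallel pattern using mpi4py, with each MPI rank standing in for one pipeline stage on one node; the stage function, batch shapes, and batch count are assumptions, not the authors' implementation.

```python
# Hedged sketch of pipeline parallelism on a distributed-memory cluster:
# each MPI rank runs one stage of a (placeholder) CNN pipeline and forwards
# activations to the next rank. Stage contents and shapes are assumptions.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def stage(rank_id: int, batch: np.ndarray) -> np.ndarray:
    # Placeholder per-stage work (e.g., a convolution block, pooling, or
    # fully connected layers) standing in for real CNN computation.
    return np.tanh(batch)

N_BATCHES = 8
for b in range(N_BATCHES):
    if rank == 0:
        # First stage: produce/load an input batch (random placeholder).
        batch = np.random.rand(100, 64).astype("float32")
    else:
        # Receive activations from the upstream pipeline stage.
        batch = comm.recv(source=rank - 1, tag=b)
    out = stage(rank, batch)
    if rank < size - 1:
        # Forward activations to the downstream pipeline stage.
        comm.send(out, dest=rank + 1, tag=b)
    else:
        # Last stage: loss/backprop would happen here in a real trainer.
        print(f"batch {b} completed the {size}-stage pipeline")
```

Run with, e.g., `mpiexec -n 4 python pipeline_sketch.py` (filename hypothetical). While the last rank processes batch b, earlier ranks are already working on later batches; that overlap is the pipeline-parallel gain the abstract refers to, on top of which the training workload itself is distributed across nodes.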

