Accelerating recommendation system training by leveraging popular choices

2021 ◽  
Vol 15 (1) ◽  
pp. 127-140
Author(s):  
Muhammad Adnan ◽  
Yassaman Ebrahimzadeh Maboud ◽  
Divya Mahajan ◽  
Prashant J. Nair

Recommender models are commonly used to suggest relevant items to users in e-commerce and online advertisement applications. These models use massive embedding tables to store numerical representations of items' and users' categorical features (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models requires ever-increasing data and compute resources. The highly parallel neural network portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper dives deep into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, the accesses to the embeddings are highly skewed, with a few embedding entries being accessed up to 10000X more often. This paper leverages this asymmetrical access pattern in a framework, called FAE, that proposes a hot-embedding-aware data layout for training recommender models. This layout uses the scarce GPU memory to store the highly accessed embeddings, thus reducing data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the execution of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3X and 1.52X in comparison to XDL CPU-only and XDL CPU-GPU execution, respectively, while maintaining baseline accuracy.
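
A minimal sketch of the hot/cold embedding partitioning described in the abstract, not the authors' FAE implementation: access counts gathered from the training inputs decide which embedding rows are placed in the limited GPU memory, while the remaining rows stay in host memory. The function names, the `gpu_budget_rows` parameter, and the use of PyTorch embedding tables are illustrative assumptions.

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def build_tables(access_log, num_rows, dim, gpu_budget_rows):
    # Count how often each embedding row is accessed in the training data.
    counts = np.bincount(access_log, minlength=num_rows)
    hot_ids = np.argsort(counts)[::-1][:gpu_budget_rows]    # most popular rows
    remap = {int(r): i for i, r in enumerate(hot_ids)}      # row id -> hot slot
    hot = torch.nn.Embedding(len(hot_ids), dim).to(device)  # kept in GPU memory
    cold = torch.nn.Embedding(num_rows, dim)                # stays in host memory
    return hot, cold, remap

def lookup(row_ids, hot, cold, remap):
    # Popular rows are served from the device copy, the rest from the CPU copy.
    out = []
    for r in row_ids:
        if r in remap:
            out.append(hot(torch.tensor([remap[r]], device=device)).cpu())
        else:
            out.append(cold(torch.tensor([r])))
    return torch.cat(out)
```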

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Saad Naeem ◽  
Noreen Jamil ◽  
Habib Ullah Khan ◽  
Shah Nazir

Neural networks employ massive interconnections of simple computing units, called neurons, to solve problems that are highly nonlinear and could not be hard-coded into a program. These neural networks are computation intensive, and training them requires large amounts of training data, with each training example requiring heavy computation. We look at different ways to reduce this heavy computation requirement and possibly make neural networks work on mobile devices. In this paper, we survey various techniques that can be matched and combined in order to improve the training time of neural networks. Additionally, we review some further recommendations to make the process work for mobile devices as well. Finally, we survey the deep compression technique, which addresses the problem through network pruning, quantization, and encoding of the network weights. Deep compression reduces the time required for training the network by first pruning irrelevant connections (the pruning stage), then quantizing the network weights by choosing centroids for each layer, and finally, in the third stage, applying the Huffman encoding algorithm to deal with the storage of the remaining weights.
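
A minimal sketch of the three deep-compression stages described above, assuming a single weight matrix; the pruning threshold and the number of centroids are illustrative values, not those of the surveyed work.

```python
import numpy as np
from collections import Counter
from heapq import heapify, heappush, heappop
from sklearn.cluster import KMeans

def prune(weights, threshold=0.01):
    # Stage 1: drop connections whose magnitude falls below the threshold.
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize(weights, mask, n_clusters=16):
    # Stage 2: cluster the surviving weights and replace each by a centroid index.
    vals = weights[mask].reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(vals)
    return km.predict(vals), km.cluster_centers_.ravel()

def huffman_codes(codes):
    # Stage 3: build Huffman codes over the centroid indices to shrink storage.
    heap = [[freq, [int(sym), ""]] for sym, freq in Counter(codes).items()]
    heapify(heap)
    while len(heap) > 1:
        lo, hi = heappop(heap), heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heappop(heap)[1:]}

w = np.random.randn(64, 64)
w_pruned, mask = prune(w)
codes, centroids = quantize(w_pruned, mask)
code_table = huffman_codes(codes)
```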


2020 ◽  
Vol 12 (11) ◽  
pp. 1743
Author(s):  
Artur M. Gafurov ◽  
Oleg P. Yermolayev

The transition from manual (visual) interpretation to fully automated gully detection is an important task for the quantitative assessment of modern gully erosion, especially over large mapping areas. Existing approaches to semi-automated gully detection are based either on object-oriented selection from multispectral images or on gully selection using a probabilistic model derived from digital elevation models (DEMs). These approaches cannot be used to assess gully erosion in the part of European Russia most affected by it, due to the lack of a national large-scale DEM and the limited resolution of open-source multispectral satellite images. An approach based on convolutional neural networks has been proposed and developed for automated gully detection on RGB composites of ultra-high-resolution satellite images that are publicly available for the test region in the east of the Russian Plain with intensive basin erosion. The Keras library and the U-Net convolutional neural network architecture were used for training. Preliminary results of applying the trained gully erosion convolutional neural network (GECNN) indicate that the algorithm performs well in detecting active gullies and differentiates gullies well from other linear forms of slope erosion (rills and balkas), but still makes errors on complex gully systems. In addition, GECNN misses a gully in 10% of cases and, in another 10% of cases, identifies a feature that is not a gully. To solve these problems, the neural network needs to be further trained on an enlarged training data set.
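
A compact U-Net-style segmentation model in Keras, illustrating the kind of architecture referenced above; the tile size, filter counts, and training settings are placeholders, not the GECNN configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)
    # Encoder: two downsampling blocks.
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck.
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p2)
    # Decoder with skip connections, the defining U-Net feature.
    u2 = layers.Concatenate()([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    u1 = layers.Concatenate()([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
    # One-channel mask: gully vs. background.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```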


2013 ◽  
Vol 13 (3) ◽  
pp. 535-544 ◽  
Author(s):  
A. Alqudah ◽  
V. Chandrasekar ◽  
M. Le

Abstract. Rainfall observed on the ground depends on the four-dimensional structure of precipitation aloft, which scanning radars can observe. A neural network is a nonparametric method for representing the nonlinear relationship between radar measurements and rainfall rate; the relationship is derived directly from a dataset consisting of radar measurements and rain gauge measurements. The performance of neural-network-based rainfall estimation is subject to many factors, such as the representativeness and sufficiency of the training dataset, the generalization capability of the network to new data, seasonal changes, and regional changes. Improving the performance of the neural network for real-time applications is of great interest. The goal of this paper is to investigate the performance of rainfall estimation based on Radial Basis Function (RBF) neural networks using radar reflectivity as input and rain gauge measurements as the target. Data from the Melbourne, Florida NEXRAD (Next Generation Weather Radar) ground radar (KMLB) over different years, along with rain gauge measurements, are used to conduct various investigations related to this problem. A direct gauge comparison study demonstrates the improvement brought by the neural networks and shows the feasibility of this system. The principal component analysis (PCA) technique is also used to reduce the dimensionality of the training dataset. Reducing the dimensionality of the input training data reduces the training time as well as the network complexity, which also helps avoid overfitting.
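
A hedged sketch of an RBF-network rainfall estimator of the kind investigated above: PCA compresses the reflectivity-derived inputs, K-means chooses the RBF centres, and a linear output layer maps the Gaussian activations to the rain gauge targets. The function names, component counts, and width parameter `gamma` are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def fit_rbf_estimator(X_radar, y_gauge, n_components=5, n_centers=30, gamma=1.0):
    pca = PCA(n_components=n_components).fit(X_radar)        # reduce input dimension
    Z = pca.transform(X_radar)
    centers = KMeans(n_clusters=n_centers, n_init=10).fit(Z).cluster_centers_
    # Gaussian RBF activations for every (sample, centre) pair.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    head = Ridge(alpha=1e-3).fit(np.exp(-gamma * d2), y_gauge)  # linear output layer
    return pca, centers, head

def predict_rainfall(pca, centers, head, X_radar, gamma=1.0):
    Z = pca.transform(X_radar)
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return head.predict(np.exp(-gamma * d2))
```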


2021 ◽  
Vol 10 (1) ◽  
pp. 19
Author(s):  
Yosra Bahri ◽  
Sebastian A. Schober ◽  
Cecilia Carbonelli ◽  
Robert Wille

Chemiresistive gas sensors are a crucial tool for monitoring gases on a large scale. To estimate gas concentrations from the signals provided by such sensors, pattern recognition tools such as neural networks are widely used, after being trained on data measured by sample sensors and reference devices. However, in the production process of low-cost sensor technologies, small variations in their physical properties can occur, which can alter the measuring conditions of the devices, make them less comparable to the sample sensors, and lead to less well-adapted algorithms. In this work, we study the influence of such variations, focusing in particular on changes in the operating and heating temperature of graphene-based gas sensors. To this end, we trained machine learning models on synthetic data provided by a sensor simulation model. By varying the operating temperatures between −15% and +15% of the original values, we observed a steady decline in algorithm performance once the temperature deviation exceeds 10%. Furthermore, we were able to substantiate the effectiveness of training the neural networks with several temperature parameters by conducting a second, comparative experiment. A well-balanced training set was shown to improve the prediction accuracy metrics significantly within the scope of our measurement setup. Overall, our results provide insights into the influence of different operating temperatures on algorithm performance and into how the choice of training data can increase the robustness of the prediction algorithms.
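
An illustrative experiment skeleton for the comparison described above, assuming a hypothetical `simulate(temp_scale)` stand-in for the sensor simulation model: one regressor is trained only on nominal-temperature data, another on a balanced mix of temperature offsets, and both are evaluated across the −15% to +15% range.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def simulate(temp_scale, n=2000, rng=np.random.default_rng(0)):
    # Hypothetical stand-in for the sensor simulation model: the sensor response
    # scales with the heating temperature, while the target concentration does not.
    conc = rng.uniform(0, 1, size=n)                      # "true" gas concentration
    sensitivity = np.linspace(0.5, 2.0, 8)                # per-channel sensitivities
    X = np.outer(conc, sensitivity) * temp_scale + 0.05 * rng.normal(size=(n, 8))
    return X, conc

offsets = np.arange(-0.15, 0.151, 0.05)                   # -15% .. +15% deviation

X_nom, y_nom = simulate(1.0)
nominal_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000).fit(X_nom, y_nom)

mixed = [simulate(1.0 + o) for o in offsets]              # balanced multi-temperature set
X_mix = np.vstack([X for X, _ in mixed])
y_mix = np.hstack([y for _, y in mixed])
mixed_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000).fit(X_mix, y_mix)

for o in offsets:
    X_t, y_t = simulate(1.0 + o)
    err_nom = mean_absolute_error(y_t, nominal_model.predict(X_t))
    err_mix = mean_absolute_error(y_t, mixed_model.predict(X_t))
    print(f"deviation {o:+.0%}: nominal-only MAE={err_nom:.3f}, mixed MAE={err_mix:.3f}")
```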


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Shyam Deshmukh ◽  
Komati Thirupathi Rao ◽  
Mohammad Shabaz

Modern big data applications tend to prefer a cluster computing approach, as they rely on distributed computing frameworks that serve user jobs on demand. Such frameworks process jobs rapidly by subdividing them into tasks that execute in parallel. Because of the complex environment and hardware and software issues, some tasks might run slowly and delay job completion; such slow tasks are known as stragglers. Straggling nodes thus bottleneck the performance of a distributed computing framework, owing to factors such as shared resources, heavy system load, or hardware issues that prolong job execution time. Many state-of-the-art approaches use independent models per node and workload. With more nodes and workloads, the number of models increases, and even with large numbers of nodes, not every node is able to capture the stragglers, as there might not be sufficient training data on straggler patterns, yielding suboptimal straggler prediction. To alleviate these problems, we propose a novel collaborative learning-based approach for straggler prediction built on the alternating direction method of multipliers (ADMM), which is resource-efficient and learns how to mitigate stragglers efficiently without moving data to a centralized location. The proposed framework shares information among the various models, allowing us to use larger training data and to reduce training time by avoiding data transfer. We rigorously evaluate the proposed method on various datasets, achieving highly accurate results.
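
A minimal consensus-ADMM sketch of the collaborative idea described above: each node fits a local linear straggler predictor on its own data and exchanges only model parameters, never the raw training data, while converging to a shared global model. This is a generic ADMM illustration with a ridge-style local solve, not the paper's exact formulation.

```python
import numpy as np

def consensus_admm(local_data, rho=1.0, n_iters=50):
    # local_data: list of (A_i, b_i) pairs, one per node; data never leaves a node.
    d = local_data[0][0].shape[1]
    z = np.zeros(d)                                    # shared global model
    xs = [np.zeros(d) for _ in local_data]             # per-node local models
    us = [np.zeros(d) for _ in local_data]             # scaled dual variables
    for _ in range(n_iters):
        for i, (A, b) in enumerate(local_data):
            # Local solve: argmin 0.5*||A x - b||^2 + (rho/2)*||x - z + u_i||^2
            lhs = A.T @ A + rho * np.eye(d)
            rhs = A.T @ b + rho * (z - us[i])
            xs[i] = np.linalg.solve(lhs, rhs)
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)   # consensus step
        for i in range(len(local_data)):
            us[i] += xs[i] - z                                  # dual update
    return z

# Toy usage: three "nodes", each holding private (task features, latency) data.
rng = np.random.default_rng(0)
nodes = [(rng.normal(size=(100, 4)), rng.normal(size=100)) for _ in range(3)]
w_global = consensus_admm(nodes)
```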


2013 ◽  
Vol 756-759 ◽  
pp. 1361-1365 ◽  
Author(s):  
Yu Qiao Deng ◽  
Ge Song

This paper proposes a new verifiable visual cryptography scheme for general access structures using pi-sigma neural networks (VVCSPSN). The scheme is based on the probabilistic signature scheme (PSS), which is considered a secure and effective verification method. Compared to other higher-order networks, the pi-sigma network (PSN) has a highly regular structure and needs a much smaller number of weights and less training time. Using the PSN's capability for large-scale parallel classification, VVCSPSN greatly reduces the information communication rate, makes the best known upper bound polynomial, and distinguishes the different pieces of information in the secret image.
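
A small sketch of the pi-sigma network structure mentioned above, assuming a single output unit: trainable weights exist only in the summing (sigma) layer, and the product (pi) layer is fixed, which is what keeps the weight count and training time low. The class name and sizes are illustrative.

```python
import numpy as np

class PiSigmaNetwork:
    def __init__(self, n_inputs, order, rng=np.random.default_rng(0)):
        # One weight vector (plus bias) per summing unit; the product layer
        # carries no weights, so the parameter count stays small.
        self.W = rng.normal(scale=0.1, size=(order, n_inputs))
        self.b = np.zeros(order)

    def forward(self, x):
        sums = self.W @ x + self.b                     # sigma layer: K linear sums
        return 1.0 / (1.0 + np.exp(-np.prod(sums)))    # pi layer, then sigmoid

net = PiSigmaNetwork(n_inputs=8, order=3)
output = net.forward(np.random.rand(8))
```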


Author(s):  
Xiaoxiao Sun ◽  
Liyi Chen ◽  
Jufeng Yang

Fine-grained classification focuses on recognizing the subordinate categories within one field, which requires a large number of labeled images, yet labeling these images is expensive. Utilizing web data has been an attractive option to meet the demand for training data for convolutional neural networks (CNNs), especially when well-labeled data is not enough. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has conventionally been addressed by reducing the noise level of the web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss that advocates representation coherence between standard and web data. This is further encapsulated in a simple, scalable, and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods.
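
A hedged PyTorch sketch of the general idea: a shared backbone feeds both a fine-grained classification head and a domain discriminator that is trained adversarially (here via gradient reversal) so that features from standard and web images become indistinguishable. The layer sizes, the gradient-reversal mechanism, and the unweighted loss sum are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -grad          # flip gradients flowing back into the backbone

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(16, 200)      # fine-grained category head
discriminator = nn.Linear(16, 2)     # "standard data" vs. "web data" head

def multitask_loss(images, labels, domains):
    feats = backbone(images)
    cls_loss = F.cross_entropy(classifier(feats), labels)
    # The discriminator tries to tell the domains apart; the reversed gradient
    # pushes the backbone toward domain-coherent (indistinguishable) features.
    adv_loss = F.cross_entropy(discriminator(GradReverse.apply(feats)), domains)
    return cls_loss + adv_loss
```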


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Yang Liu ◽  
Jie Yang ◽  
Yuan Huang ◽  
Lixiong Xu ◽  
Siguang Li ◽  
...  

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce computer cluster in terms of classification accuracy and computational efficiency.
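
A minimal, framework-free sketch of the map/reduce split used in data-parallel training: each map task computes gradients for one partition of the training data, and the reduce step averages them into a single update. A linear model stands in for the neural network here, and a real deployment would run the mappers as MapReduce jobs on a cluster rather than in-process.

```python
import numpy as np

def map_gradients(weights, partition):
    # "Map" task: compute the gradient for one partition of the training data.
    X, y = partition
    return X.T @ (X @ weights - y) / len(y)

def reduce_gradients(partial_grads):
    # "Reduce" task: combine the partial gradients into one update direction.
    return np.mean(partial_grads, axis=0)

def train(partitions, dim, lr=0.1, epochs=20):
    w = np.zeros(dim)
    for _ in range(epochs):
        grads = [map_gradients(w, p) for p in partitions]   # map phase
        w -= lr * reduce_gradients(grads)                   # reduce phase
    return w

# Toy usage: four data partitions of a regression problem.
rng = np.random.default_rng(0)
parts = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = train(parts, dim=3)
```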


2021 ◽  
Vol 3 (2) ◽  
pp. 357-373
Author(s):  
Umberto Michelucci ◽  
Francesca Venturini

The results of neural networks depend strongly on the training data, the weight initialisation, and the hyperparameters chosen. Determining the distribution of a statistical estimator, such as the Mean Squared Error (MSE) or the accuracy, is fundamental to evaluating the performance of a neural network model (NNM). For many machine learning models, such as linear regression, it is possible to analytically obtain information such as the variance or confidence intervals of the results. Neural networks, however, are not analytically tractable due to their complexity, so distributions of statistical estimators cannot be estimated easily. When estimating the global performance of an NNM via the MSE in a regression problem, for example, it is important to know the variance of the MSE. Bootstrap is one of the most important resampling techniques for estimating averages and variances, among other properties, of statistical estimators. In this tutorial, the application of resampling techniques (including bootstrap) to the evaluation of neural networks' performance is explained from both a theoretical and practical point of view. The pseudo-code of the algorithms is provided to facilitate their implementation. Computational aspects, such as the training time, are discussed, since resampling techniques always require simulations to be run many thousands of times and are therefore computationally intensive. A specific version of the bootstrap algorithm is presented that allows the estimation of the distribution of a statistical estimator for an NNM in a computationally effective way. Finally, the algorithms are compared on both synthetically generated and real data to demonstrate their performance.
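
A short sketch of the basic bootstrap for the MSE of an NNM, in the spirit of the tutorial: the training set is resampled with replacement, the network is retrained on each resample, and the test MSE is collected to approximate its distribution. The MLPRegressor and the resample count are placeholders for whatever NNM is being evaluated; the paper's computationally cheaper variant is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def bootstrap_mse(X_train, y_train, X_test, y_test, n_resamples=200, seed=0):
    rng = np.random.default_rng(seed)
    mses = []
    for _ in range(n_resamples):
        # Resample the training set with replacement and retrain the NNM.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500)
        model.fit(X_train[idx], y_train[idx])
        mses.append(mean_squared_error(y_test, model.predict(X_test)))
    # Mean, variance, and a 95% interval of the bootstrapped MSE distribution.
    return np.mean(mses), np.var(mses), np.percentile(mses, [2.5, 97.5])
```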


2021 ◽  
Author(s):  
Daniel Coquelin ◽  
Charlotte Debus ◽  
Markus Götz ◽  
Fabrice von der Lehr ◽  
James Kahn ◽  
...  

Abstract. With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme comprising node-local and global networks, and adjusts the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, compared to current optimized data parallel training methods.
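
A conceptual torch.distributed sketch in the spirit of the hierarchical scheme described above: gradients are averaged synchronously among the GPUs of one compute node after every step, while the slower cross-node average runs only every few steps. The process-group setup, the `global_sync_every` parameter, and the surrounding training loop are assumed; DASO's actual asynchronous and adaptive mechanics are not reproduced.

```python
import torch.distributed as dist

def hierarchical_sync(model, local_group, global_group, step, global_sync_every=8):
    # Called after loss.backward() in an initialized torch.distributed training loop.
    for p in model.parameters():
        if p.grad is None:
            continue
        # Fast intra-node averaging over the GPUs of this compute node.
        dist.all_reduce(p.grad, group=local_group)
        p.grad /= dist.get_world_size(group=local_group)
        # Infrequent (and therefore cheaper) inter-node averaging.
        if step % global_sync_every == 0:
            dist.all_reduce(p.grad, group=global_group)
            p.grad /= dist.get_world_size(group=global_group)
```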

