Accelerating recommendation system training by leveraging popular choices

2021 ◽  
Vol 15 (1) ◽  
pp. 127-140
Author(s):  
Muhammad Adnan ◽  
Yassaman Ebrahimzadeh Maboud ◽  
Divya Mahajan ◽  
Prashant J. Nair

Recommender models are commonly used to suggest relevant items to users in e-commerce and online advertisement applications. These models use massive embedding tables to store numerical representations of items' and users' categorical features (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models requires ever-increasing data and compute resources. The highly parallel neural network portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper dives deep into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, the accesses to the embeddings are highly skewed, with a few embedding entries being accessed up to 10000X more often. This paper leverages this asymmetrical access pattern in a framework, called FAE, that proposes a hot-embedding-aware data layout for training recommender models. This layout uses the scarce GPU memory to store the highly accessed embeddings, thus reducing data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the execution of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3X and 1.52X in comparison to XDL CPU-only and XDL CPU-GPU execution, respectively, while maintaining baseline accuracy.
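
A minimal sketch of the hot/cold embedding partitioning described in the abstract, not the authors' FAE implementation: access counts gathered from the training inputs decide which embedding rows are placed in the limited GPU memory, while the remaining rows stay in host memory. The function names, the `gpu_budget_rows` parameter, and the use of PyTorch embedding tables are illustrative assumptions.

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def build_tables(access_log, num_rows, dim, gpu_budget_rows):
    # Count how often each embedding row is accessed in the training data.
    counts = np.bincount(access_log, minlength=num_rows)
    hot_ids = np.argsort(counts)[::-1][:gpu_budget_rows]    # most popular rows
    remap = {int(r): i for i, r in enumerate(hot_ids)}      # row id -> hot slot
    hot = torch.nn.Embedding(len(hot_ids), dim).to(device)  # kept in GPU memory
    cold = torch.nn.Embedding(num_rows, dim)                # stays in host memory
    return hot, cold, remap

def lookup(row_ids, hot, cold, remap):
    # Popular rows are served from the device copy, the rest from the CPU copy.
    out = []
    for r in row_ids:
        if r in remap:
            out.append(hot(torch.tensor([remap[r]], device=device)).cpu())
        else:
            out.append(cold(torch.tensor([r])))
    return torch.cat(out)
```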

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Saad Naeem ◽  
Noreen Jamil ◽  
Habib Ullah Khan ◽  
Shah Nazir

Neural networks employ massive interconnections of simple computing units, called neurons, to solve problems that are highly nonlinear and could not be hard-coded into a program. These neural networks are computation intensive, and training them requires large amounts of training data, with each training example requiring heavy computation. We look at different ways to reduce this heavy computation requirement and possibly make neural networks work on mobile devices. In this paper, we survey various techniques that can be matched and combined in order to improve the training time of neural networks. Additionally, we review some further recommendations to make the process work for mobile devices as well. Finally, we survey the deep compression technique, which addresses the problem through network pruning, quantization, and encoding of the network weights. Deep compression reduces the time required for training the network by first pruning irrelevant connections (the pruning stage), then quantizing the network weights by choosing centroids for each layer, and finally, in the third stage, applying the Huffman encoding algorithm to deal with the storage of the remaining weights.
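
A minimal sketch of the three deep-compression stages described above, assuming a single weight matrix; the pruning threshold and the number of centroids are illustrative values, not those of the surveyed work.

```python
import numpy as np
from collections import Counter
from heapq import heapify, heappush, heappop
from sklearn.cluster import KMeans

def prune(weights, threshold=0.01):
    # Stage 1: drop connections whose magnitude falls below the threshold.
    mask = np.abs(weights) > threshold
    return weights * mask, mask

def quantize(weights, mask, n_clusters=16):
    # Stage 2: cluster the surviving weights and replace each by a centroid index.
    vals = weights[mask].reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(vals)
    return km.predict(vals), km.cluster_centers_.ravel()

def huffman_codes(codes):
    # Stage 3: build Huffman codes over the centroid indices to shrink storage.
    heap = [[freq, [int(sym), ""]] for sym, freq in Counter(codes).items()]
    heapify(heap)
    while len(heap) > 1:
        lo, hi = heappop(heap), heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heappop(heap)[1:]}

w = np.random.randn(64, 64)
w_pruned, mask = prune(w)
codes, centroids = quantize(w_pruned, mask)
code_table = huffman_codes(codes)
```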


2020 ◽  
Vol 12 (11) ◽  
pp. 1743
Author(s):  
Artur M. Gafurov ◽  
Oleg P. Yermolayev

The transition from manual (visual) interpretation to fully automated gully detection is an important task for the quantitative assessment of modern gully erosion, especially over large mapping areas. Existing approaches to semi-automated gully detection are based either on object-oriented selection from multispectral images or on gully selection using a probabilistic model derived from digital elevation models (DEMs). These approaches cannot be used to assess gully erosion in the part of European Russia most affected by it, due to the lack of a national large-scale DEM and the limited resolution of open-source multispectral satellite images. An approach based on convolutional neural networks has been proposed and developed for automated gully detection on RGB composites of ultra-high-resolution satellite images that are publicly available for the test region in the east of the Russian Plain with intensive basin erosion. The Keras library and the U-Net convolutional neural network architecture were used for training. Preliminary results of applying the trained gully erosion convolutional neural network (GECNN) indicate that the algorithm performs well in detecting active gullies and differentiates gullies well from other linear forms of slope erosion (rills and balkas), but still makes errors on complex gully systems. In addition, GECNN misses a gully in 10% of cases and, in another 10% of cases, identifies a feature that is not a gully. To solve these problems, the neural network needs to be further trained on an enlarged training data set.
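
A compact U-Net-style segmentation model in Keras, illustrating the kind of architecture referenced above; the tile size, filter counts, and training settings are placeholders, not the GECNN configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def tiny_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)
    # Encoder: two downsampling blocks.
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck.
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p2)
    # Decoder with skip connections, the defining U-Net feature.
    u2 = layers.Concatenate()([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    u1 = layers.Concatenate()([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)
    # One-channel mask: gully vs. background.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```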


2013 ◽  
Vol 13 (3) ◽  
pp. 535-544 ◽  
Author(s):  
A. Alqudah ◽  
V. Chandrasekar ◽  
M. Le

Abstract. Rainfall observed on the ground depends on the four-dimensional structure of precipitation aloft, which scanning radars can observe. A neural network is a nonparametric method for representing the nonlinear relationship between radar measurements and rainfall rate; the relationship is derived directly from a dataset consisting of radar measurements and rain gauge measurements. The performance of neural-network-based rainfall estimation is subject to many factors, such as the representativeness and sufficiency of the training dataset, the generalization capability of the network to new data, seasonal changes, and regional changes. Improving the performance of the neural network for real-time applications is of great interest. The goal of this paper is to investigate the performance of rainfall estimation based on Radial Basis Function (RBF) neural networks using radar reflectivity as input and rain gauge measurements as the target. Data from the Melbourne, Florida NEXRAD (Next Generation Weather Radar) ground radar (KMLB) over different years, along with rain gauge measurements, are used to conduct various investigations related to this problem. A direct gauge comparison study demonstrates the improvement brought by the neural networks and shows the feasibility of this system. The principal component analysis (PCA) technique is also used to reduce the dimensionality of the training dataset. Reducing the dimensionality of the input training data reduces the training time as well as the network complexity, which also helps avoid overfitting.
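
A hedged sketch of an RBF-network rainfall estimator of the kind investigated above: PCA compresses the reflectivity-derived inputs, K-means chooses the RBF centres, and a linear output layer maps the Gaussian activations to the rain gauge targets. The function names, component counts, and width parameter `gamma` are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def fit_rbf_estimator(X_radar, y_gauge, n_components=5, n_centers=30, gamma=1.0):
    pca = PCA(n_components=n_components).fit(X_radar)        # reduce input dimension
    Z = pca.transform(X_radar)
    centers = KMeans(n_clusters=n_centers, n_init=10).fit(Z).cluster_centers_
    # Gaussian RBF activations for every (sample, centre) pair.
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    head = Ridge(alpha=1e-3).fit(np.exp(-gamma * d2), y_gauge)  # linear output layer
    return pca, centers, head

def predict_rainfall(pca, centers, head, X_radar, gamma=1.0):
    Z = pca.transform(X_radar)
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return head.predict(np.exp(-gamma * d2))
```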


2021 ◽  
Vol 10 (1) ◽  
pp. 19
Author(s):  
Yosra Bahri ◽  
Sebastian A. Schober ◽  
Cecilia Carbonelli ◽  
Robert Wille

Chemiresistive gas sensors are a crucial tool for monitoring gases on a large scale. To estimate gas concentrations from the signals provided by such sensors, pattern recognition tools such as neural networks are widely used, after being trained on data measured by sample sensors and reference devices. However, in the production process of low-cost sensor technologies, small variations in their physical properties can occur, which can alter the measuring conditions of the devices, make them less comparable to the sample sensors, and lead to less well-adapted algorithms. In this work, we study the influence of such variations, focusing in particular on changes in the operating and heating temperature of graphene-based gas sensors. To this end, we trained machine learning models on synthetic data provided by a sensor simulation model. By varying the operating temperatures between −15% and +15% of the original values, we observed a steady decline in algorithm performance once the temperature deviation exceeds 10%. Furthermore, we were able to substantiate the effectiveness of training the neural networks with several temperature parameters by conducting a second, comparative experiment. A well-balanced training set was shown to improve the prediction accuracy metrics significantly within the scope of our measurement setup. Overall, our results provide insights into the influence of different operating temperatures on algorithm performance and into how the choice of training data can increase the robustness of the prediction algorithms.
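
An illustrative experiment skeleton for the comparison described above, assuming a hypothetical `simulate(temp_scale)` stand-in for the sensor simulation model: one regressor is trained only on nominal-temperature data, another on a balanced mix of temperature offsets, and both are evaluated across the −15% to +15% range.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def simulate(temp_scale, n=2000, rng=np.random.default_rng(0)):
    # Hypothetical stand-in for the sensor simulation model: the sensor response
    # scales with the heating temperature, while the target concentration does not.
    conc = rng.uniform(0, 1, size=n)                      # "true" gas concentration
    sensitivity = np.linspace(0.5, 2.0, 8)                # per-channel sensitivities
    X = np.outer(conc, sensitivity) * temp_scale + 0.05 * rng.normal(size=(n, 8))
    return X, conc

offsets = np.arange(-0.15, 0.151, 0.05)                   # -15% .. +15% deviation

X_nom, y_nom = simulate(1.0)
nominal_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000).fit(X_nom, y_nom)

mixed = [simulate(1.0 + o) for o in offsets]              # balanced multi-temperature set
X_mix = np.vstack([X for X, _ in mixed])
y_mix = np.hstack([y for _, y in mixed])
mixed_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000).fit(X_mix, y_mix)

for o in offsets:
    X_t, y_t = simulate(1.0 + o)
    err_nom = mean_absolute_error(y_t, nominal_model.predict(X_t))
    err_mix = mean_absolute_error(y_t, mixed_model.predict(X_t))
    print(f"deviation {o:+.0%}: nominal-only MAE={err_nom:.3f}, mixed MAE={err_mix:.3f}")
```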


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Shyam Deshmukh ◽  
Komati Thirupathi Rao ◽  
Mohammad Shabaz

Modern big data applications tend to prefer a cluster computing approach, as they rely on distributed computing frameworks that serve user jobs on demand. Such frameworks process jobs rapidly by subdividing them into tasks that execute in parallel. Because of the complex environment and hardware and software issues, some tasks might run slowly and delay job completion; such slow tasks are known as stragglers. Straggling nodes thus bottleneck the performance of a distributed computing framework, owing to factors such as shared resources, heavy system load, or hardware issues that prolong job execution time. Many state-of-the-art approaches use independent models per node and workload. With more nodes and workloads, the number of models increases, and even with large numbers of nodes, not every node is able to capture the stragglers, as there might not be sufficient training data on straggler patterns, yielding suboptimal straggler prediction. To alleviate these problems, we propose a novel collaborative learning-based approach for straggler prediction built on the alternating direction method of multipliers (ADMM), which is resource-efficient and learns how to mitigate stragglers efficiently without moving data to a centralized location. The proposed framework shares information among the various models, allowing us to use larger training data and to reduce training time by avoiding data transfer. We rigorously evaluate the proposed method on various datasets, achieving highly accurate results.
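
A minimal consensus-ADMM sketch of the collaborative idea described above: each node fits a local linear straggler predictor on its own data and exchanges only model parameters, never the raw training data, while converging to a shared global model. This is a generic ADMM illustration with a ridge-style local solve, not the paper's exact formulation.

```python
import numpy as np

def consensus_admm(local_data, rho=1.0, n_iters=50):
    # local_data: list of (A_i, b_i) pairs, one per node; data never leaves a node.
    d = local_data[0][0].shape[1]
    z = np.zeros(d)                                    # shared global model
    xs = [np.zeros(d) for _ in local_data]             # per-node local models
    us = [np.zeros(d) for _ in local_data]             # scaled dual variables
    for _ in range(n_iters):
        for i, (A, b) in enumerate(local_data):
            # Local solve: argmin 0.5*||A x - b||^2 + (rho/2)*||x - z + u_i||^2
            lhs = A.T @ A + rho * np.eye(d)
            rhs = A.T @ b + rho * (z - us[i])
            xs[i] = np.linalg.solve(lhs, rhs)
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)   # consensus step
        for i in range(len(local_data)):
            us[i] += xs[i] - z                                  # dual update
    return z

# Toy usage: three "nodes", each holding private (task features, latency) data.
rng = np.random.default_rng(0)
nodes = [(rng.normal(size=(100, 4)), rng.normal(size=100)) for _ in range(3)]
w_global = consensus_admm(nodes)
```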


2013 ◽  
Vol 756-759 ◽  
pp. 1361-1365 ◽  
Author(s):  
Yu Qiao Deng ◽  
Ge Song

This paper proposes a new verifiable visual cryptography scheme for general access structures using pi-sigma neural networks (VVCSPSN). The scheme is based on the probabilistic signature scheme (PSS), which is considered a secure and effective verification method. Compared to other higher-order networks, the pi-sigma network (PSN) has a highly regular structure and needs a much smaller number of weights and less training time. Using the PSN's capability for large-scale parallel classification, VVCSPSN greatly reduces the information communication rate, makes the best known upper bound polynomial, and distinguishes the different pieces of information in the secret image.
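
A small sketch of the pi-sigma network structure mentioned above, assuming a single output unit: trainable weights exist only in the summing (sigma) layer, and the product (pi) layer is fixed, which is what keeps the weight count and training time low. The class name and sizes are illustrative.

```python
import numpy as np

class PiSigmaNetwork:
    def __init__(self, n_inputs, order, rng=np.random.default_rng(0)):
        # One weight vector (plus bias) per summing unit; the product layer
        # carries no weights, so the parameter count stays small.
        self.W = rng.normal(scale=0.1, size=(order, n_inputs))
        self.b = np.zeros(order)

    def forward(self, x):
        sums = self.W @ x + self.b                     # sigma layer: K linear sums
        return 1.0 / (1.0 + np.exp(-np.prod(sums)))    # pi layer, then sigmoid

net = PiSigmaNetwork(n_inputs=8, order=3)
output = net.forward(np.random.rand(8))
```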


Author(s):  
Xiaoxiao Sun ◽  
Liyi Chen ◽  
Jufeng Yang

Fine-grained classification focuses on recognizing the subordinate categories within one field, which requires a large number of labeled images, yet labeling these images is expensive. Utilizing web data has been an attractive option to meet the demand for training data for convolutional neural networks (CNNs), especially when well-labeled data is not enough. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has conventionally been addressed by reducing the noise level of the web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss that advocates representation coherence between standard and web data. This is further encapsulated in a simple, scalable, and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods.
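
A hedged PyTorch sketch of the general idea: a shared backbone feeds both a fine-grained classification head and a domain discriminator that is trained adversarially (here via gradient reversal) so that features from standard and web images become indistinguishable. The layer sizes, the gradient-reversal mechanism, and the unweighted loss sum are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -grad          # flip gradients flowing back into the backbone

backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
classifier = nn.Linear(16, 200)      # fine-grained category head
discriminator = nn.Linear(16, 2)     # "standard data" vs. "web data" head

def multitask_loss(images, labels, domains):
    feats = backbone(images)
    cls_loss = F.cross_entropy(classifier(feats), labels)
    # The discriminator tries to tell the domains apart; the reversed gradient
    # pushes the backbone toward domain-coherent (indistinguishable) features.
    adv_loss = F.cross_entropy(discriminator(GradReverse.apply(feats)), domains)
    return cls_loss + adv_loss
```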


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Yang Liu ◽  
Jie Yang ◽  
Yuan Huang ◽  
Lixiong Xu ◽  
Siguang Li ◽  
...  

Artificial neural networks (ANNs) have been widely used in pattern recognition and classification applications. However, ANNs are notably slow in computation, especially when the size of the data is large. Nowadays, big data has gained momentum in both industry and academia. To fulfill the potential of ANNs for big data applications, the computation process must be sped up. For this purpose, this paper parallelizes neural networks based on MapReduce, which has become a major computing model for data-intensive applications. Three data-intensive scenarios are considered in the parallelization process, in terms of the volume of classification data, the size of the training data, and the number of neurons in the neural network. The performance of the parallelized neural networks is evaluated on an experimental MapReduce computer cluster in terms of classification accuracy and computational efficiency.
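
A minimal, framework-free sketch of the map/reduce split used in data-parallel training: each map task computes gradients for one partition of the training data, and the reduce step averages them into a single update. A linear model stands in for the neural network here, and a real deployment would run the mappers as MapReduce jobs on a cluster rather than in-process.

```python
import numpy as np

def map_gradients(weights, partition):
    # "Map" task: compute the gradient for one partition of the training data.
    X, y = partition
    return X.T @ (X @ weights - y) / len(y)

def reduce_gradients(partial_grads):
    # "Reduce" task: combine the partial gradients into one update direction.
    return np.mean(partial_grads, axis=0)

def train(partitions, dim, lr=0.1, epochs=20):
    w = np.zeros(dim)
    for _ in range(epochs):
        grads = [map_gradients(w, p) for p in partitions]   # map phase
        w -= lr * reduce_gradients(grads)                   # reduce phase
    return w

# Toy usage: four data partitions of a regression problem.
rng = np.random.default_rng(0)
parts = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
w = train(parts, dim=3)
```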


2021 ◽  
Vol 3 (2) ◽  
pp. 357-373
Author(s):  
Umberto Michelucci ◽  
Francesca Venturini

The results of neural networks depend strongly on the training data, the weight initialisation, and the hyperparameters chosen. Determining the distribution of a statistical estimator, such as the Mean Squared Error (MSE) or the accuracy, is fundamental to evaluating the performance of a neural network model (NNM). For many machine learning models, such as linear regression, it is possible to analytically obtain information such as the variance or confidence intervals of the results. Neural networks, however, are not analytically tractable due to their complexity, so distributions of statistical estimators cannot be estimated easily. When estimating the global performance of an NNM via the MSE in a regression problem, for example, it is important to know the variance of the MSE. Bootstrap is one of the most important resampling techniques for estimating averages and variances, among other properties, of statistical estimators. In this tutorial, the application of resampling techniques (including bootstrap) to the evaluation of neural networks' performance is explained from both a theoretical and practical point of view. The pseudo-code of the algorithms is provided to facilitate their implementation. Computational aspects, such as the training time, are discussed, since resampling techniques always require simulations to be run many thousands of times and are therefore computationally intensive. A specific version of the bootstrap algorithm is presented that allows the estimation of the distribution of a statistical estimator for an NNM in a computationally effective way. Finally, the algorithms are compared on both synthetically generated and real data to demonstrate their performance.
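
A short sketch of the basic bootstrap for the MSE of an NNM, in the spirit of the tutorial: the training set is resampled with replacement, the network is retrained on each resample, and the test MSE is collected to approximate its distribution. The MLPRegressor and the resample count are placeholders for whatever NNM is being evaluated; the paper's computationally cheaper variant is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

def bootstrap_mse(X_train, y_train, X_test, y_test, n_resamples=200, seed=0):
    rng = np.random.default_rng(seed)
    mses = []
    for _ in range(n_resamples):
        # Resample the training set with replacement and retrain the NNM.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500)
        model.fit(X_train[idx], y_train[idx])
        mses.append(mean_squared_error(y_test, model.predict(X_test)))
    # Mean, variance, and a 95% interval of the bootstrapped MSE distribution.
    return np.mean(mses), np.var(mses), np.percentile(mses, [2.5, 97.5])
```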


2021 ◽  
Author(s):  
Daniel Coquelin ◽  
Charlotte Debus ◽  
Markus Götz ◽  
Fabrice von der Lehr ◽  
James Kahn ◽  
...  

Abstract. With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme comprising node-local and global networks, and adjusts the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, compared to current optimized data parallel training methods.
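
A conceptual torch.distributed sketch in the spirit of the hierarchical scheme described above: gradients are averaged synchronously among the GPUs of one compute node after every step, while the slower cross-node average runs only every few steps. The process-group setup, the `global_sync_every` parameter, and the surrounding training loop are assumed; DASO's actual asynchronous and adaptive mechanics are not reproduced.

```python
import torch.distributed as dist

def hierarchical_sync(model, local_group, global_group, step, global_sync_every=8):
    # Called after loss.backward() in an initialized torch.distributed training loop.
    for p in model.parameters():
        if p.grad is None:
            continue
        # Fast intra-node averaging over the GPUs of this compute node.
        dist.all_reduce(p.grad, group=local_group)
        p.grad /= dist.get_world_size(group=local_group)
        # Infrequent (and therefore cheaper) inter-node averaging.
        if step % global_sync_every == 0:
            dist.all_reduce(p.grad, group=global_group)
            p.grad /= dist.get_world_size(group=global_group)
```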

