Model-Aware Parallelization Strategy for Deep Neural Networks' Distributed Training

2019 Seventh International Conference on Advanced Cloud and Big Data (CBD) ◽

10.1109/cbd.2019.00021 ◽

2019 ◽

Author(s):

Zhaoyi Yang ◽

Fang Dong

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training ◽

Parallelization Strategy

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015693 ◽

2019 ◽

Vol 33 ◽

pp. 5693-5700 ◽

Author(s):

Hao Yu ◽

Sen Yang ◽

Shenghuo Zhu

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Model Averaging ◽

Communication Overhead ◽

Single Server ◽

Training Time ◽

Distributed Training ◽

Speed Up ◽

Experimental Works ◽

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up training. It uses multiple workers to sample local stochastic gradients in parallel, aggregates all gradients in a single server to obtain the average, and updates each worker’s local model using an SGD update with the averaged gradient. Ideally, parallel mini-batch SGD can achieve a linear speed-up of the training time (with respect to the number of workers) compared with SGD over a single worker. In practice, however, this linear scalability is significantly limited by the growing demand for gradient communication as more workers are involved. Model averaging, which periodically averages individual models trained over parallel workers, is another common practice in distributed training of deep neural networks, dating back to (Zinkevich et al. 2010; McDonald, Hall, and Mann 2010). Compared with parallel mini-batch SGD, model averaging significantly reduces communication overhead. Extensive experimental work has verified that model averaging can still achieve a good speed-up of the training time as long as the averaging interval is carefully controlled. However, it remains a mystery in theory why such a simple heuristic works so well. This paper provides a thorough and rigorous theoretical study of why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
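The two schemes contrasted in this abstract can be illustrated with a small single-process simulation. The sketch below is not the paper's implementation: the function name `local_sgd`, the least-squares objective, and all parameter defaults are assumptions chosen for illustration. Each simulated worker takes local SGD steps, and the models are averaged every `avg_interval` steps. Setting `avg_interval=1` reproduces parallel mini-batch SGD, because averaging models that started from the same point after one step is the same as applying the averaged gradient.

```python
import numpy as np

def local_sgd(A, b, num_workers=4, steps=200, avg_interval=10,
              lr=0.01, batch=8, seed=0):
    """Simulate periodic model averaging on a least-squares problem.

    Hypothetical helper for illustration only. Returns the averaged
    model and the number of averaging (communication) rounds used.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # One model copy per worker, all starting from the same point.
    models = np.zeros((num_workers, d))
    comm_rounds = 0
    for t in range(1, steps + 1):
        for w in range(num_workers):
            # Each worker samples its own mini-batch and takes a local step.
            idx = rng.integers(0, n, size=batch)
            residual = A[idx] @ models[w] - b[idx]
            grad = A[idx].T @ residual / batch
            models[w] -= lr * grad
        if t % avg_interval == 0:
            # Communication round: replace every model with the average.
            models[:] = models.mean(axis=0)
            comm_rounds += 1
    return models.mean(axis=0), comm_rounds
```

Calling `local_sgd(A, b, steps=400, avg_interval=10)` uses one tenth of the averaging rounds that `avg_interval=1` needs for the same number of local steps, which is the communication saving the abstract refers to. How large the interval can be before accuracy degrades depends on the problem and is the question the paper studies.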

A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) ◽

10.1109/ipdpsw52791.2021.00110 ◽

2021 ◽

Author(s):

Sergio Barrachina ◽

Adrian Castello ◽

Mar Catalan ◽

Manuel F. Dolz ◽

Jose I. Mestre

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training

PSO-PS: Parameter Synchronization with Particle Swarm Optimization for Distributed Training of Deep Neural Networks

2020 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn48605.2020.9207698 ◽

2020 ◽

Author(s):

Qing Ye ◽

Yuxuan Han ◽

Yanan Sun ◽

Jiancheng Lv

Keyword(s):

Neural Networks ◽

Particle Swarm Optimization ◽

Deep Neural Networks ◽

Particle Swarm ◽

Swarm Optimization ◽

Distributed Training

A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks

2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) ◽

10.1109/micro.2018.00023 ◽

2018 ◽

Author(s):

Youjie Li ◽

Jongse Park ◽

Mohammad Alian ◽

Yifan Yuan ◽

Zheng Qu ◽

...

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training

Developing a Loss Prediction-based Asynchronous Stochastic Gradient Descent Algorithm for Distributed Training of Deep Neural Networks

49th International Conference on Parallel Processing - ICPP ◽

10.1145/3404397.3404432 ◽

2020 ◽

Author(s):

Junyu Li ◽

Ligang He ◽

Shenyuan Ren ◽

Rui Mao

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Deep Neural Networks ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Descent Algorithm ◽

Distributed Training ◽

Gradient Descent Algorithm ◽

Loss Prediction

Benchmarking network fabrics for data distributed training of deep neural networks

2020 IEEE High Performance Extreme Computing Conference (HPEC) ◽

10.1109/hpec43674.2020.9286232 ◽

2020 ◽

Author(s):

Siddharth Samsi ◽

Andrew Prout ◽

Michael Jones ◽

Andrew Kirby ◽

Bill Arcand ◽

...

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training

Distributed training of deep neural networks with spark: The MareNostrum experience

Pattern Recognition Letters ◽

10.1016/j.patrec.2019.01.020 ◽

2019 ◽

Vol 125 ◽

pp. 174-178

Author(s):

Leonel Cruz ◽

Ruben Tous ◽

Beatriz Otero

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training

A Hitchhiker’s Guide On Distributed Training Of Deep Neural Networks

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2019.10.004 ◽

2020 ◽

Vol 137 ◽

pp. 65-76 ◽

Author(s):

Karanbir Singh Chahal ◽

Manraj Singh Grover ◽

Kuntal Dey ◽

Rajiv Ratn Shah

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training

Parallel and Distributed Training of Deep Neural Networks: A brief overview

2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES) ◽

10.1109/ines49302.2020.9147123 ◽

2020 ◽

Author(s):

Attila Farkas ◽

Gabor Kertesz ◽

Robert Lovas

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Distributed Training

Poster Abstract: Model Average-based Distributed Training for Sparse Deep Neural Networks

IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) ◽

10.1109/infocomwkshps50562.2020.9162748 ◽

2020 ◽

Author(s):

Yuetong Yang ◽

Zhiquan Lai ◽

Lei Cai ◽

Dongsheng Li

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Abstract Model ◽

Poster Abstract ◽

Distributed Training ◽
