Accelerating distributed deep neural network training with pipelined MPI allreduce

2021

Author(s):
Adrián Castelló
Enrique S. Quintana-Ortí
José Duato

Abstract

TensorFlow (TF) is usually combined with the Horovod (HVD) workload distribution package to obtain a parallel tool for training deep neural networks on clusters of computers. HVD, in turn, utilizes a blocking Allreduce primitive to share information among processes, combined with a communication thread to overlap communication with computation. In this work, we perform a thorough experimental analysis to expose (1) the importance of selecting the best algorithm in MPI libraries to realize the Allreduce operation; and (2) the performance acceleration that can be attained when replacing a blocking Allreduce with its non-blocking counterpart (while maintaining the blocking behaviour via the appropriate synchronization mechanism). Furthermore, (3) we explore the benefits of applying pipelining to the communication exchange, demonstrating that these improvements carry over to distributed training via TF+HVD. Finally, (4) we show that pipelining can also boost performance for applications that make heavy use of other collectives, such as Broadcast and Reduce-Scatter.
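As a concrete illustration of points (2) and (3), the sketch below shows how a single blocking Allreduce over a gradient buffer could be replaced by a pipelined sequence of non-blocking MPI_Iallreduce calls over fixed-size segments, with MPI_Waitall restoring the blocking semantics the caller expects. This is a minimal, hypothetical example rather than the paper's implementation: the segment size, the buffer names, and the pipelined_allreduce helper are illustrative assumptions.

```c
/*
 * Minimal sketch (not the authors' implementation): a blocking Allreduce
 * replaced by a pipelined sequence of non-blocking MPI_Iallreduce calls
 * over fixed-size segments. MPI_Waitall re-establishes the blocking
 * behaviour expected by the caller. SEGMENT_ELEMS is an illustrative
 * tuning parameter.
 */
#include <mpi.h>
#include <stdlib.h>

#define SEGMENT_ELEMS (1 << 20)   /* illustrative segment size (in floats) */

static void pipelined_allreduce(const float *sendbuf, float *recvbuf,
                                int count, MPI_Comm comm)
{
    int nseg = (count + SEGMENT_ELEMS - 1) / SEGMENT_ELEMS;
    MPI_Request *reqs = malloc(nseg * sizeof(MPI_Request));

    for (int s = 0; s < nseg; s++) {
        int offset = s * SEGMENT_ELEMS;
        int len = (offset + SEGMENT_ELEMS <= count) ? SEGMENT_ELEMS
                                                    : count - offset;
        /* Each segment starts its own non-blocking Allreduce, so the
         * reduction of earlier segments can progress while later
         * segments are being posted. */
        MPI_Iallreduce(sendbuf + offset, recvbuf + offset, len,
                       MPI_FLOAT, MPI_SUM, comm, &reqs[s]);
    }

    /* Restore blocking semantics: return only once every segment has
     * completed, as a single blocking Allreduce would. */
    MPI_Waitall(nseg, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int count = 4 * SEGMENT_ELEMS;          /* e.g., one gradient tensor */
    float *grad = malloc(count * sizeof(float));
    float *out  = malloc(count * sizeof(float));
    for (int i = 0; i < count; i++) grad[i] = 1.0f;

    pipelined_allreduce(grad, out, count, MPI_COMM_WORLD);

    free(grad);
    free(out);
    MPI_Finalize();
    return 0;
}
```

In such a scheme, the segment size trades start-up overhead (many small collectives) against the degree of overlap between segments, so it would typically be tuned to the network and to the Allreduce algorithm selected inside the MPI library.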

