2020, Vol. 34 (05), pp. 7169-7178
Author(s):  
Jemin George ◽  
Prudhvi Gurram

We develop a Distributed Event-Triggered Stochastic GRAdient Descent (DETSGRAD) algorithm for solving non-convex optimization problems typically encountered in distributed deep learning. We propose a novel communication-triggering mechanism that allows the networked agents to update their model parameters aperiodically, and we provide sufficient conditions on the algorithm step-sizes that guarantee asymptotic mean-square convergence. The algorithm is applied to a distributed supervised-learning problem in which a set of networked agents collaboratively train their individual neural networks to perform image classification while aperiodically sharing the model parameters with their one-hop neighbors. Results indicate that all agents report similar performance, comparable to that of a centrally trained neural network, while the event-triggered communication significantly reduces inter-agent communication. Results also show that the proposed algorithm allows the individual agents to classify the images even though the training data corresponding to all the classes are not locally available to each agent.
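To make the triggering idea concrete, the following is a minimal Python/NumPy sketch of one agent's update in an event-triggered distributed SGD scheme of this kind. It is not the paper's exact DETSGRAD update: the function name detsgrad_step, the consensus/gradient step-sizes alpha and beta, the gradient oracle grad_fn, and the fixed threshold delta are all illustrative assumptions (the paper derives conditions on its step-sizes and trigger to guarantee convergence).

```python
import numpy as np

def detsgrad_step(i, x, x_hat, neighbors, grad_fn, alpha, beta, delta):
    """One local update of agent i in an event-triggered distributed
    SGD scheme (illustrative sketch, not the exact DETSGRAD update).

    i         : this agent's id
    x         : agent i's current parameter vector
    x_hat     : dict of last-broadcast parameter vectors, keyed by agent id
    neighbors : ids of agent i's one-hop neighbors
    grad_fn   : stochastic gradient oracle for agent i's local loss
    alpha     : consensus step-size (assumed)
    beta      : gradient step-size (assumed)
    delta     : event-triggering threshold (hypothetical; fixed here)
    """
    # Consensus mixes against the last *broadcast* copies, so no message
    # is exchanged unless an agent's trigger fires.
    consensus = sum(x_hat[j] - x_hat[i] for j in neighbors)
    x_new = x + alpha * consensus - beta * grad_fn(x)

    # Event trigger: broadcast only if the local state has drifted far
    # enough from what the neighbors last received.
    if np.linalg.norm(x_new - x_hat[i]) > delta:
        x_hat[i] = x_new.copy()  # this copy is what neighbors see next
    return x_new
```

The key design point is that agents always mix against possibly stale broadcast copies, so communication occurs only when a trigger fires; the step-size conditions in the paper are what ensure this staleness does not break mean-square convergence.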


Author(s):  
Fan Zhou ◽  
Guojing Cong

We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm, which we call K-AVG, for solving large-scale machine learning problems. We establish convergence results for K-AVG with non-convex objectives, and our analysis applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent, which is equivalent to K-AVG with $K=1$. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger step-sizes and facilitates faster convergence. On a cluster of $128$ GPUs, K-AVG is faster than ASGD implementations and achieves better accuracy and faster convergence when training on the CIFAR-10 dataset.
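As a concrete illustration of the scheme described above, the sketch below runs K local SGD steps on each learner and then averages the results; setting $K=1$ reduces it to ordinary synchronous parallel SGD. The function name k_avg, the step-size eta, and the gradient oracle grad_fn are assumed names for illustration, not the authors' implementation.

```python
import numpy as np

def k_avg(x0, grad_fn, num_learners, K, eta, rounds, rng):
    """Sketch of synchronous K-step averaging SGD (K-AVG).

    Each learner starts a round from the shared iterate, takes K local
    SGD steps, and all learners' parameters are then averaged.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(rounds):
        local_params = []
        for p in range(num_learners):
            xp = x.copy()
            for _ in range(K):
                # grad_fn returns a stochastic gradient, e.g. from
                # learner p's local mini-batch (assumed interface)
                xp -= eta * grad_fn(xp, p, rng)
            local_params.append(xp)
        # Synchronous averaging: one communication round per K steps
        x = np.mean(local_params, axis=0)
    return x
```

Averaging only every K steps cuts synchronization cost per gradient step by a factor of K, which is the source of the scaling advantage claimed above; the cost is drift between learners during the K local steps, which the convergence analysis must control.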

