D-(DP)2SGD: Decentralized Parallel SGD with Differential Privacy in Dynamic Networks

Decentralized machine learning plays an essential role in improving training efficiency and has been applied in many real-world scenarios, such as edge computing and IoT. In practice, however, networks are dynamic, and information can leak during communication. To address this problem, we propose D-(DP)2SGD, a decentralized parallel stochastic gradient descent algorithm with differential privacy for dynamic networks. With rigorous analysis, we show that D-(DP)2SGD converges at a rate of $O(1/\sqrt{Kn})$ while satisfying ε-DP, which is almost the same convergence rate as previous works that do not consider privacy. To the best of our knowledge, our algorithm is the first decentralized parallel SGD algorithm that can be implemented in dynamic networks while taking privacy preservation into account.
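
The general idea in the abstract above (local stochastic gradients, privacy noise, and averaging over a changing neighbour graph) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the toy quadratic objective, the rotating ring topology in `ring_weights`, the clipping bound, and the noise scale `sigma` are all assumptions made for the example, and `sigma` is not calibrated to any particular ε.

```python
import random


def clip(value, bound):
    """Clip a scalar gradient to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def ring_weights(n, shift):
    """Hypothetical time-varying topology: node i averages itself with node
    (i + shift) % n. The matrix is doubly stochastic for any shift."""
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights[i][i] += 0.5
        weights[i][(i + shift) % n] += 0.5
    return weights


def dp_decentralized_sgd(targets, rounds=200, lr=0.1, clip_norm=1.0,
                         sigma=0.05, seed=0):
    """Node i minimizes 0.5 * (x - targets[i])**2, so the network-wide
    optimum is mean(targets). Requires at least two nodes."""
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n
    for k in range(rounds):
        # The neighbour pattern changes every round (dynamic network).
        weights = ring_weights(n, shift=1 + k % (n - 1))
        noisy_grads = []
        for i in range(n):
            # Stochastic gradient: true gradient plus sampling noise.
            grad = x[i] - targets[i] + rng.gauss(0.0, 0.1)
            # Bound the sensitivity, then add Gaussian privacy noise.
            grad = clip(grad, clip_norm)
            noisy_grads.append(grad + rng.gauss(0.0, sigma * clip_norm))
        # Average with the current neighbours, then take a descent step.
        x = [
            sum(weights[i][j] * x[j] for j in range(n)) - lr * noisy_grads[i]
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    print(dp_decentralized_sgd([1.0, 2.0, 3.0, 4.0]))
```

Because each mixing matrix is doubly stochastic, the averaging step leaves the network-wide mean of the parameters unchanged, so only the noisy gradient steps move it.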

A(DP)^2SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽ 10.1109/tpami.2021.3107796 ◽ 2021 ◽ pp. 1-1 ◽ Cited By ~ 1

Author(s): Jie Xu ◽ Wei Zhang ◽ Fei Wang

Keyword(s): Gradient Descent ◽ Differential Privacy ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Parallel Stochastic Gradient Descent

On the Convergence Properties of a K-step Averaging Stochastic Gradient Descent Algorithm for Nonconvex Optimization

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2018/447 ◽ 2018 ◽ Cited By ~ 8

Author(s): Fan Zhou ◽ Guojing Cong

Keyword(s): Gradient Descent ◽ Large Scale ◽ Stochastic Gradient ◽ Learning Problems ◽ Stochastic Gradient Descent ◽ Convergence Properties ◽ Descent Algorithm ◽ Convergence Results ◽ Gradient Descent Algorithm ◽ Parallel Stochastic Gradient Descent

We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm, which we call K-AVG, for solving large-scale machine learning problems. We establish the convergence results of K-AVG for nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent, which is equivalent to K-AVG with $K=1$. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of 128 GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for training with the CIFAR-10 dataset.
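
The K-step averaging scheme described in this abstract can be sketched in a few lines: every learner starts from the shared parameter, runs K local SGD steps, and the results are averaged before the next round. The function name `k_avg`, the toy convex objective in `noisy_quadratic_grad`, and all hyperparameter values are assumptions for illustration only; the paper itself analyzes nonconvex objectives.

```python
import random


def noisy_quadratic_grad(w, rng):
    """Stochastic gradient of the toy objective 0.5 * (w - 3)**2."""
    return (w - 3.0) + rng.gauss(0.0, 0.5)


def k_avg(grad, x0, learners=4, k_steps=8, rounds=50, lr=0.05, seed=0):
    """Synchronous K-step averaging: k_steps=1 reduces to ordinary
    synchronous parallel SGD with one averaging per step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(rounds):
        local_results = []
        for _ in range(learners):
            w = x  # every learner starts from the shared parameter
            for _ in range(k_steps):
                w -= lr * grad(w, rng)  # local SGD step, no communication
            local_results.append(w)
        # One synchronization per round: average the learners' parameters.
        x = sum(local_results) / learners
    return x


if __name__ == "__main__":
    print(k_avg(noisy_quadratic_grad, x0=0.0))
```

In this sketch the learners run sequentially; in a real deployment they would run in parallel and the averaging would be a collective communication step, so a larger K means fewer synchronizations for the same number of gradient evaluations.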

Revisiting the dynamic interactions between economic growth and environmental pollution in Italy: evidence from a gradient descent algorithm

Environmental Science and Pollution Research ◽ 10.1007/s11356-021-14264-z ◽ 2021

Author(s): Marco Mele ◽ Cosimo Magazzino ◽ Nicolas Schneider ◽ Floriana Nicolai

Keyword(s): Economic Growth ◽ Gradient Descent ◽ Statistical Data ◽ Stochastic Gradient Descent ◽ Policy Recommendations ◽ Dynamic Interactions ◽ Gradient Descent Algorithm ◽ The Relationship ◽ Main Strand ◽ Yearly Data

Although the literature on the relationship between economic growth and CO2 emissions is extensive, the use of machine learning (ML) tools remains seminal. In this paper, we assess this nexus for Italy using innovative algorithms, with yearly data for the 1960–2017 period. We develop three distinct models: the batch gradient descent (BGD), the stochastic gradient descent (SGD), and the multilayer perceptron (MLP). Despite the phase of low Italian economic growth, results reveal that CO2 emissions increased in the predicting model. Compared to the observed statistical data, the algorithm shows a correlation between low growth and higher CO2 increase, which contradicts the main strand of literature. Based on this outcome, adequate policy recommendations are provided.

High Performance Parallel Stochastic Gradient Descent in Shared Memory

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) ◽ 10.1109/ipdps.2016.107 ◽ 2016 ◽ Cited By ~ 8

Author(s): Scott Sallinen ◽ Nadathur Satish ◽ Mikhail Smelyanskiy ◽ Samantika S. Sury ◽ Christopher Re

Keyword(s): Shared Memory ◽ Gradient Descent ◽ High Performance ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Parallel Stochastic Gradient Descent

Soft-Sign Stochastic Gradient Descent Algorithm for Wireless Federated Learning

10.1109/spawc51858.2021.9593212 ◽ 2021

Author(s): Seunghoon Lee ◽ Chanho Park ◽ Songnam Hong ◽ Yonina C. Eldar ◽ Namyoon Lee

Keyword(s): Gradient Descent ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Descent Algorithm ◽ Gradient Descent Algorithm

Optical Recognition of Handwritten Logic Formulas Using Neural Networks

Electronics ◽ 10.3390/electronics10222761 ◽ 2021 ◽ Vol 10 (22) ◽ pp. 2761

Author(s): Vaios Ampelakiotis ◽ Isidoros Perikos ◽ Ioannis Hatzilygeroudis ◽ George Tsihrintzis

Keyword(s): Neural Networks ◽ Character Recognition ◽ Gradient Descent ◽ Feedforward Neural Networks ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Training Algorithms ◽ Gradient Descent Algorithm ◽ Two Stages ◽ And Training

In this paper, we present a handwritten character recognition (HCR) system that aims to recognize handwritten first-order logic formulas and create editable text files of the recognized formulas. Dense feedforward neural networks (NNs) are utilized, and their performance is examined under various training conditions and methods. More specifically, after testing three training algorithms (backpropagation, resilient propagation and stochastic gradient descent), we created and trained an NN with the stochastic gradient descent algorithm, optimized by the Adam update rule, which proved to be the best, using a training set of 16,750 handwritten image samples of 28 × 28 each and a test set of 7947 samples. The final accuracy achieved is 90.13%. The general methodology followed consists of two stages: the image processing and the NN design and training. Finally, an application has been created that implements the methodology and automatically recognizes handwritten logic formulas. An interesting feature of the application is that it allows for creating new, user-oriented training sets and parameter settings, and thus new NN models.
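
For readers unfamiliar with the Adam update rule mentioned in this abstract, a minimal sketch of the standard update is given below. It is not the authors' training code: the function name `adam_minimize`, the toy two-parameter quadratic, and the use of exact rather than mini-batch gradients are assumptions made to keep the example self-contained. The hyperparameter defaults are the commonly used ones.

```python
import math


def adam_minimize(grad, x0, steps=2000, lr=0.01, beta1=0.9, beta2=0.999,
                  eps=1e-8):
    """Minimize a function given its gradient, using the Adam update rule."""
    x = list(x0)
    m = [0.0] * len(x)  # running mean of gradients (first moment)
    v = [0.0] * len(x)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            # Bias correction compensates for the zero initialization.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def toy_grad(x):
    """Gradient of (x[0] - 1)**2 + (x[1] + 2)**2, minimized at (1, -2)."""
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


if __name__ == "__main__":
    print(adam_minimize(toy_grad, [0.0, 0.0]))
```

In a neural network setting, `grad` would return a mini-batch gradient of the loss with respect to the weights, which is what makes the procedure stochastic.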

MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent

2019 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata47090.2019.9006054 ◽ 2019

Author(s): Karl Backstrom ◽ Marina Papatriantafilou ◽ Philippas Tsigas

Keyword(s): Gradient Descent ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Parallel Stochastic Gradient Descent ◽ Asynchronous Parallel

A Stochastic Gradient Descent Algorithm for Structural Risk Minimisation

Lecture Notes in Computer Science - Algorithmic Learning Theory ◽ 10.1007/978-3-540-39624-6_17 ◽ 2003 ◽ pp. 205-220 ◽ Cited By ~ 1

Author(s): Joel Ratsaby

Keyword(s): Gradient Descent ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Risk Minimisation ◽ Descent Algorithm ◽ Gradient Descent Algorithm ◽ Structural Risk

MapReduce and Optimized Deep Network for Rainfall Prediction in Agriculture

The Computer Journal ◽ 10.1093/comjnl/bxz164 ◽ 2020 ◽ Vol 63 (6) ◽ pp. 900-912

Author(s): Oswalt Manoj S ◽ Ananth J P

Keyword(s): Deep Learning ◽ Gradient Descent ◽ Prediction Models ◽ Short Term Memory ◽ Stochastic Gradient ◽ Stochastic Gradient Descent ◽ Mean Square ◽ Rainfall Prediction ◽ Gradient Descent Algorithm ◽ Major Factors

Rainfall prediction is an active area of research, as it enables farmers to make effective decisions regarding both cultivation and irrigation. The existing prediction models are scary, as the prediction of rainfall depends on three major factors, namely the humidity, the rainfall and the rainfall recorded in previous years, which results in huge time consumption and huge computational effort in the analysis. Thus, this paper introduces a rainfall prediction model based on a deep learning network, the convolutional long short-term memory (convLSTM) system, which promises a prediction based on spatial-temporal patterns. The weights of the convLSTM are tuned using the proposed Salp-stochastic gradient descent algorithm (S-SGD), which integrates the Salp swarm algorithm (SSA) into the stochastic gradient descent (SGD) algorithm in order to facilitate globally optimal tuning of the weights and to ensure better prediction accuracy. In addition, the proposed deep learning framework is built on the MapReduce framework, which enables effective handling of big data. The analysis using the rainfall prediction database reveals that the proposed model achieved a minimal mean square error (MSE) and percentage root mean square difference (PRD) of 0.001 and 0.0021, respectively.

Locally adaptive activation functions with slope recovery for deep and physics-informed neural networks

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽ 10.1098/rspa.2020.0334 ◽ 2020 ◽ Vol 476 (2239) ◽ pp. 20200334 ◽ Cited By ~ 2

Author(s): Ameya D. Jagtap ◽ Kenji Kawaguchi ◽ George Em Karniadakis

Keyword(s): Neural Networks ◽ Adaptive Learning ◽ Gradient Descent ◽ Activation Function ◽ Stochastic Gradient Descent ◽ Activation Functions ◽ Gradient Descent Algorithm ◽ Locally Adaptive ◽ The Matrix ◽ Base Method

We propose two approaches to locally adaptive activation functions, namely layer-wise and neuron-wise locally adaptive activation functions, which improve the performance of deep and physics-informed neural networks. The local adaptation of the activation function is achieved by introducing a scalable parameter in each layer (layer-wise) or for every neuron (neuron-wise) separately, and then optimizing it using a variant of the stochastic gradient descent algorithm. To further increase the training speed, an activation slope-based slope recovery term is added to the loss function, which further accelerates convergence, thereby reducing the training cost. On the theoretical side, we prove that in the proposed method, the gradient descent algorithms are not attracted to sub-optimal critical points or local minima under practical conditions on the initialization and learning rate, and that the gradient dynamics of the proposed method are not achievable by base methods with any (adaptive) learning rates. We further show that the adaptive activation methods accelerate convergence by implicitly multiplying the gradient of the base method by conditioning matrices, without any explicit computation of the conditioning matrix or the matrix–vector product. The different adaptive activation functions are shown to induce different implicit conditioning matrices. Furthermore, the proposed methods with the slope recovery are shown to accelerate the training process.
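
The core idea of a trainable slope inside the activation can be sketched with a single parameter. The example below is an illustration under stated assumptions, not the authors' method: it uses the form tanh(n·a·z) with a fixed scale factor `n` and a trainable slope `a`, fits `a` by plain gradient descent on a toy regression target, and omits the slope recovery term entirely. The names `adaptive_tanh` and `fit_slope` and all numeric settings are hypothetical.

```python
import math


def adaptive_tanh(z, a, n=1.0):
    """Activation with a trainable slope a and a fixed scale factor n."""
    return math.tanh(n * a * z)


def fit_slope(target_slope=3.0, n=5.0, a0=0.2, lr=0.08, steps=5000):
    """Learn a so that tanh(n * a * z) matches tanh(target_slope * z)
    on a fixed grid, by gradient descent on the mean squared error."""
    zs = [i / 10.0 for i in range(-20, 21)]
    ys = [math.tanh(target_slope * z) for z in zs]
    a = a0
    for _ in range(steps):
        grad = 0.0
        for z, y in zip(zs, ys):
            out = adaptive_tanh(z, a, n)
            # d/da tanh(n*a*z) = n * z * (1 - tanh(n*a*z)**2)
            grad += 2.0 * (out - y) * n * z * (1.0 - out * out)
        a -= lr * grad / len(zs)
    return a


if __name__ == "__main__":
    a = fit_slope()
    print("learned effective slope n * a =", 5.0 * a)
```

Here the effective slope `n * a` starts at 1 and is driven toward the target slope. In a layer-wise variant one such parameter would be shared by all neurons in a layer, and in a neuron-wise variant each neuron would have its own.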
