Averaging Is Probably Not the Optimum Way of Aggregating Parameters in Federated Learning

Federated learning is a decentralized topology of deep learning, that trains a shared model through data distributed among each client (like mobile phones, wearable devices), in order to ensure data privacy by avoiding raw data exposed in data center (server). After each client computes a new model parameter by stochastic gradient descent (SGD) based on their own local data, these locally-computed parameters will be aggregated to generate an updated global model. Many current state-of-the-art studies aggregate different client-computed parameters by averaging them, but none theoretically explains why averaging parameters is a good approach. In this paper, we treat each client computed parameter as a random vector because of the stochastic properties of SGD, and estimate mutual information between two client computed parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show an increasing trend of mutual information with training iteration. However, when we further compute the distance between client computed parameters, we find that parameters are getting more correlated while not getting closer. This phenomenon suggests that averaging parameters may not be the optimum way of aggregating trained parameters.

Download Full-text

Averaging is Probably not the Optimum Way of Aggregating Parameters in Federated Learning

10.20944/preprints202001.0207.v1 ◽

2020 ◽

Author(s):

Peng Xiao ◽

Samuel Cheng ◽

Vladimir Stankovic ◽

Dejan Vukobratovic

Keyword(s):

Mutual Information ◽

Mobile Phones ◽

Data Center ◽

Data Privacy ◽

Wearable Devices ◽

Local Data ◽

Learning Tasks ◽

Increasing Trend ◽

Almost All ◽

Stochastic Properties

Federated learning is a decentralized topology of deep learning, that trains a shared model through data distributed among each client (like mobile phones, wearable devices), in order to ensure data privacy by avoiding raw data exposed in data center (server). After each client computes a new model parameter by stochastic gradient decrease (SGD) based on their own local data, all locally-computed parameters will be aggregated in the server to generate an updated global model. Almost all current studies directly average different client computed parameters by default, but no one gives an explanation why averaging parameters is a good approach. In this paper, we treat each client computed parameter as a random vector because of the stochastic properties of SGD, and estimate mutual information between two client computed parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show an increasing trend of mutual information with training iteration. However, when we further compute the distance between client computed parameters, we find that parameters are getting more correlated while not getting closer. This phenomenon suggests that averaging parameters may not be the optimum way of aggregating trained parameters.

Download Full-text

Regularized Instance Embedding for Deep Multi-Instance Learning

Applied Sciences ◽

10.3390/app10010064 ◽

2019 ◽

Vol 10 (1) ◽

pp. 64

Author(s):

Yi Lin ◽

Honggang Zhang

Keyword(s):

Neural Network ◽

Big Data ◽

Supervised Learning ◽

Regularization Method ◽

Gradient Descent ◽

State Of The Art ◽

Stochastic Gradient Descent ◽

Learning Framework ◽

Weakly Supervised ◽

The Cost

In the era of Big Data, multi-instance learning, as a weakly supervised learning framework, has various applications since it is helpful to reduce the cost of the data-labeling process. Due to this weakly supervised setting, learning effective instance representation/embedding is challenging. To address this issue, we propose an instance-embedding regularizer that can boost the performance of both instance- and bag-embedding learning in a unified fashion. Specifically, the crux of the instance-embedding regularizer is to maximize correlation between instance-embedding and underlying instance-label similarities. The embedding-learning framework was implemented using a neural network and optimized in an end-to-end manner using stochastic gradient descent. In experiments, various applications were studied, and the results show that the proposed instance-embedding-regularization method is highly effective, having state-of-the-art performance.

Download Full-text

On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/556 ◽

2019 ◽

Author(s):

Yi Xu ◽

Zhuoning Yuan ◽

Sen Yang ◽

Rong Jin ◽

Tianbao Yang

Keyword(s):

Convex Optimization ◽

Gradient Descent ◽

Optimization Problems ◽

Upper Bounds ◽

Convex Minimization ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

First Order ◽

Convex Optimization Problems ◽

Learning Tasks

Extrapolation is a well-known technique for solving convex optimization and variational inequalities and recently attracts some attention for non-convex optimization. Several recent works have empirically shown its success in some machine learning tasks. However, it has not been analyzed for non-convex minimization and there still remains a gap between the theory and the practice. In this paper, we analyze gradient descent and stochastic gradient descent with extrapolation for finding an approximate first-order stationary point in smooth non-convex optimization problems. Our convergence upper bounds show that the algorithms with extrapolation can be accelerated than without extrapolation.

Download Full-text

Stochastic gradient descent for hybrid quantum-classical optimization

Quantum ◽

10.22331/q-2020-08-31-314 ◽

2020 ◽

Vol 4 ◽

pp. 314 ◽

Cited By ~ 2

Author(s):

Ryan Sweke ◽

Frederik Wilde ◽

Johannes Jakob Meyer ◽

Maria Schuld ◽

Paul K. Fährmann ◽

...

Keyword(s):

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Expectation Values ◽

Data Set ◽

Doubly Stochastic ◽

Learning Tasks ◽

Value Estimation ◽

Near Term ◽

Classical Optimization

Within the context of hybrid quantum-classical optimization, gradient descent based optimizers typically require the evaluation of expectation values with respect to the outcome of parameterized quantum circuits. In this work, we explore the consequences of the prior observation that estimation of these quantities on quantum hardware results in a form of stochastic gradient descent optimization. We formalize this notion, which allows us to show that in many relevant cases, including VQE, QAOA and certain quantum classifiers, estimating expectation values with k measurement outcomes results in optimization algorithms whose convergence properties can be rigorously well understood, for any value of k. In fact, even using single measurement outcomes for the estimation of expectation values is sufficient. Moreover, in many settings the required gradients can be expressed as linear combinations of expectation values -- originating, e.g., from a sum over local terms of a Hamiltonian, a parameter shift rule, or a sum over data-set instances -- and we show that in these cases k-shot expectation value estimation can be combined with sampling over terms of the linear combination, to obtain ``doubly stochastic'' gradient descent optimizers. For all algorithms we prove convergence guarantees, providing a framework for the derivation of rigorous optimization results in the context of near-term quantum devices. Additionally, we explore numerically these methods on benchmark VQE, QAOA and quantum-enhanced machine learning tasks and show that treating the stochastic settings as hyper-parameters allows for state-of-the-art results with significantly fewer circuit executions and measurements.

Download Full-text

FNNC: Achieving Fairness through Neural Networks

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/315 ◽

2020 ◽

Author(s):

Manisha Padala ◽

Sujit Gujar

Keyword(s):

Gradient Descent ◽

Optimization Problem ◽

State Of The Art ◽

High Accuracy ◽

Stochastic Gradient Descent ◽

Classification Models ◽

Constrained Optimization Problem ◽

Lagrangian Multipliers ◽

Fairness Constraints ◽

Generalization Errors

In classification models, fairness can be ensured by solving a constrained optimization problem. We focus on fairness constraints like Disparate Impact, Demographic Parity, and Equalized Odds, which are non-decomposable and non-convex. Researchers define convex surrogates of the constraints and then apply convex optimization frameworks to obtain fair classifiers. Surrogates serve as an upper bound to the actual constraints, and convexifying fairness constraints is challenging. We propose a neural network-based framework, \emph{FNNC}, to achieve fairness while maintaining high accuracy in classification. The above fairness constraints are included in the loss using Lagrangian multipliers. We prove bounds on generalization errors for the constrained losses which asymptotically go to zero. The network is optimized using two-step mini-batch stochastic gradient descent. Our experiments show that FNNC performs as good as the state of the art, if not better. The experimental evidence supplements our theoretical guarantees. In summary, we have an automated solution to achieve fairness in classification, which is easily extendable to many fairness constraints.

Download Full-text

Deep convolutional neural network-based system for fish classification

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i2.pp2026-2039 ◽

2022 ◽

Vol 12 (2) ◽

pp. 2026

Author(s):

Ahmad AL Smadi ◽

Atif Mehmood ◽

Ahed Abugabah ◽

Eiad Almekhlaﬁ ◽

Ahmad Mohammad Al-smadi

Keyword(s):

Performance Measures ◽

Gradient Descent ◽

State Of The Art ◽

Marketing Strategies ◽

Optimization Techniques ◽

Experimental Results ◽

Stochastic Gradient Descent ◽

Proposed Model ◽

Comprehensive Comparison ◽

Adaptive Momentum

<p>In computer vision, image classification is one of the potential image processing tasks. Nowadays, fish classification is a wide considered issue within the areas of machine learning and image segmentation. Moreover, it has been extended to a variety of domains, such as marketing strategies. This paper presents an effective fish classification method based on convolutional neural networks (CNNs). The experiments were conducted on the new dataset of Bangladesh’s indigenous fish species with three kinds of splitting: 80-20%, 75-25%, and 70-30%. We provide a comprehensive comparison of several popular optimizers of CNN. In total, we perform a comparative analysis of 5 different state-of-the-art gradient descent-based optimizers, namely adaptive delta (AdaDelta), stochastic gradient descent (SGD), adaptive momentum (Adam), adaptive max pooling (Adamax), Root mean square propagation (Rmsprop), for CNN. Overall, the obtained experimental results show that Rmsprop, Adam, Adamax performed well compared to the other optimization techniques used, while AdaDelta and SGD performed the worst. Furthermore, the experimental results demonstrated that Adam optimizer attained the best results in performance measures for 70-30% and 80-20% splitting experiments, while the Rmsprop optimizer attained the best results in terms of performance measures of 70-25% splitting experiments. Finally, the proposed model is then compared with state-of-the-art deep CNNs models. Therefore, the proposed model attained the best accuracy of 98.46% in enhancing the CNN ability in classification, among others.</p>

Download Full-text

How to train a discriminative front end with stochastic gradient descent and maximum mutual information

IEEE Workshop on Automatic Speech Recognition and Understanding, 2005. ◽

10.1109/asru.2005.1566501 ◽

2005 ◽

Cited By ~ 4

Author(s):

J. Droppo ◽

M. Mahajan ◽

A. Gunawardana ◽

A. Acero

Keyword(s):

Mutual Information ◽

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Front End ◽

Maximum Mutual Information

Download Full-text

Co-Registration of LISS-4 Multispectral Band Data Using Mutual Information-Based Stochastic Gradient Descent Optimization

Current Science ◽

10.18520/cs/v113/i05/877-888 ◽

2017 ◽

Vol 113 (05) ◽

pp. 877

Author(s):

S. Manthira Moorthi ◽

D. Dhar ◽

R. Sivakumar

Keyword(s):

Mutual Information ◽

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent

Download Full-text

Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

Entropy ◽

10.3390/e22050560 ◽

2020 ◽

Vol 22 (5) ◽

pp. 560

Author(s):

Shrihari Vasudevan

Keyword(s):

Neural Networks ◽

Mutual Information ◽

Gradient Descent ◽

Deep Neural Networks ◽

Learning Rate ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Novel Approach ◽

The Neural Network ◽

Gradient Based

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. MI between the output of the neural network and true outcomes is used to adaptively set the LR for the network, in every epoch of the training cycle. This idea is extended to layer-wise setting of LR, as MI naturally provides a layer-wise performance metric. A LR range test determining the operating LR range is also proposed. Experiments compared this approach with popular alternatives such as gradient-based adaptive LR algorithms like Adam, RMSprop, and LARS. Competitive to better accuracy outcomes obtained in competitive to better time, demonstrate the feasibility of the metric and approach.

Download Full-text

Fingerspelling Identification for Chinese Sign Language via AlexNet-Based Transfer Learning and Adam Optimizer

Scientific Programming ◽

10.1155/2020/3291426 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Xianwei Jiang ◽

Bo Hu ◽

Suresh Chandra Satapathy ◽

Shui-Hua Wang ◽

Yu-Dong Zhang

Keyword(s):

Language Learning ◽

Sign Language ◽

Transfer Learning ◽

Gradient Descent ◽

Data Augmentation ◽

State Of The Art ◽

Stochastic Gradient Descent ◽

Mean Square ◽

Average Accuracy ◽

Using Data

As an important component of universal sign language and the basis of other sign language learning, finger sign language is of great significance. This paper proposed a novel fingerspelling identification method for Chinese Sign Language via AlexNet-based transfer learning and Adam optimizer, which tested four different configurations of transfer learning. Besides, in the experiment, Adam algorithm was compared with stochastic gradient descent with momentum (SGDM) and root mean square propagation (RMSProp) algorithms, and comparison of using data augmentation (DA) against not using DA was executed to pursue higher performance. Finally, the best accuracy of 91.48% and average accuracy of 89.48 ± 1.16% were yielded by configuration M1 (replacing the last FCL8) with Adam algorithm and using 181x DA, which indicates that our method can identify Chinese finger sign language effectively and stably. Meanwhile, the proposed method is superior to other five state-of-the-art approaches.

Download Full-text