RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Author(s):  
Liping Li ◽  
Wei Xu ◽  
Tianyi Chen ◽  
Georgios B. Giannakis ◽  
Qing Ling

In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. During the learning process, the Byzantine workers may send arbitrary incorrect messages to the master due to data corruption, communication failures, or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated into the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resulting subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) across the workers, and hence fits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution, with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method in the attack-free setting. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and its reduced complexity compared to state-of-the-art alternatives.
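
As a rough, hedged illustration of the idea (not the authors' implementation), the sketch below simulates an ℓ1-regularized aggregation on a toy quadratic objective; the per-worker loss, the penalty weight lam, the step size lr, and the attack model are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an l1-regularized ("RSA-style") aggregation scheme on a toy
# quadratic loss per worker; names and hyperparameters are illustrative only.
rng = np.random.default_rng(0)
dim, n_workers, n_byzantine = 5, 10, 2
targets = rng.normal(size=(n_workers, dim))      # heterogeneous local optima (non-i.i.d. data)

x_master = np.zeros(dim)                         # model kept by the master
x_workers = np.zeros((n_workers, dim))           # local models kept by workers
lam, lr = 0.1, 0.05                              # penalty weight and step size

for step in range(500):
    for i in range(n_workers):
        if i < n_byzantine:
            # Byzantine workers send arbitrary values instead of honest updates.
            x_workers[i] = rng.normal(scale=10.0, size=dim)
        else:
            grad = x_workers[i] - targets[i]                    # stochastic gradient surrogate
            penalty = lam * np.sign(x_workers[i] - x_master)    # subgradient of the l1 proximity term
            x_workers[i] -= lr * (grad + penalty)
    # The master aggregates only the signs of the differences, which bounds the
    # influence any single (possibly Byzantine) worker can exert on the update.
    x_master -= lr * lam * np.sign(x_master - x_workers).sum(axis=0)

print("master model after training:", np.round(x_master, 2))
```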

1997 ◽  
Vol 9 (7) ◽  
pp. 1457-1482 ◽  
Author(s):  
Howard Hua Yang ◽  
Shun-ichi Amari

There are two major approaches to blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The mutual information (MI) is the contrast function for blind separation; the entropy is not. To justify the ME approach, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy: mean subtraction is shown to be necessary when applying ME, and at the solution points determined by the MI, the ME update does not change the demixing matrix in directions that increase cross-talk. Second, the natural gradient is introduced in place of the ordinary gradient to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate the probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method for estimating the unknown cumulants. Computer simulations show that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
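
The natural-gradient update itself is compact enough to sketch. The toy example below assumes a fixed tanh score function in place of the Gram-Charlier, cumulant-based estimate described above, and Laplace-distributed sources; it is a sketch of the update form, not the authors' algorithm.

```python
import numpy as np

# Minimal sketch of a natural-gradient demixing update for blind separation,
# with a fixed tanh surrogate score function (an assumption of this sketch).
rng = np.random.default_rng(1)
n_sources, n_samples = 3, 20000
S = rng.laplace(size=(n_sources, n_samples))     # independent super-Gaussian sources
A = rng.normal(size=(n_sources, n_sources))      # unknown mixing matrix
X = A @ S                                        # observed mixtures
X -= X.mean(axis=1, keepdims=True)               # mean subtraction, as the analysis requires
scale = X.std(axis=1, keepdims=True)
X /= scale                                       # unit-variance scaling for a stable toy run

W = np.eye(n_sources)                            # demixing matrix to be learned
lr = 0.01
for it in range(6000):                           # stochastic mini-batch updates
    x = X[:, rng.integers(0, n_samples, size=100)]
    y = W @ x
    phi = np.tanh(y)                             # surrogate score function
    # Natural gradient: dW = lr * (I - E[phi(y) y^T]) W, which needs no matrix inversion.
    W += lr * (np.eye(n_sources) - (phi @ y.T) / 100) @ W

# The composite transform approaches a scaled permutation matrix when separation succeeds.
print(np.round(W @ (A / scale), 2))
```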


2018 ◽  
Author(s):  
Kazunori D Yamada

In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, the gradient descent method is the most naive. With plain gradient descent, the learning rate must normally be adjusted manually to achieve quick convergence. Many optimizers have been developed to control the learning rate and increase convergence speed; generally, they adjust the learning rate automatically in response to the learning status. These optimizers have been gradually improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on several common datasets, our optimizer showed similar or faster convergence compared to existing methods. YamAdam is thus a viable alternative optimizer for deep learning.
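
For orientation, the sketch below shows the underlying Adam update that YamAdam builds on; the AdaDelta-style unit correction described above is not reproduced, and the hyperparameters and toy objective are assumptions of this illustration.

```python
import numpy as np

# Minimal sketch of the Adam update (first- and second-moment estimation with
# bias correction); YamAdam's unit correction is not included here.
def adam_step(param, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad           # first moment (mean of gradients)
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2       # second moment (uncentered variance)
    m_hat = state["m"] / (1 - b1 ** state["t"])               # bias-corrected estimates
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize f(w) = ||w - 3||^2.
w = np.zeros(4)
state = {"t": 0, "m": np.zeros(4), "v": np.zeros(4)}
for _ in range(2000):
    grad = 2 * (w - 3.0)
    w = adam_step(w, grad, state)
print(np.round(w, 2))    # converges toward [3, 3, 3, 3]
```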


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jinhuan Duan ◽  
Xianxian Li ◽  
Shiqi Gao ◽  
Zili Zhong ◽  
Jinyan Wang

With the vigorous development of artificial intelligence technology, a wide range of engineering applications have been implemented. The gradient descent method plays an important role in solving various optimization problems owing to its simple structure, good stability, and easy implementation. However, in multi-node machine learning systems, gradients usually need to be shared, which can cause privacy leakage because attackers can infer training data from the gradient information. In this paper, to prevent gradient leakage while preserving model accuracy, we propose the super stochastic gradient descent approach, which updates parameters by concealing the modulus length of each gradient vector and converting it into a unit vector. Furthermore, we analyze the security of the super stochastic gradient descent approach and demonstrate that our algorithm can defend against attacks on the gradient. Experimental results show that our approach is clearly superior to prevalent gradient descent approaches in terms of accuracy, robustness, and adaptability to large-scale batches. Interestingly, our algorithm can also resist model poisoning attacks to a certain extent.
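
A minimal sketch of the core idea, normalizing each gradient to a unit vector so that only its direction is shared, might look as follows; the toy objective, step size, and function names are assumptions, not the paper's code.

```python
import numpy as np

# Minimal sketch: conceal the modulus length of the gradient by normalizing it
# to a unit vector before the parameter update.
def normalized_sgd_step(params, grad, lr=0.05, eps=1e-12):
    unit_grad = grad / (np.linalg.norm(grad) + eps)   # only the direction is revealed
    return params - lr * unit_grad

# Toy usage: minimize ||w - 1||^2 while sharing only unit-length gradients.
w = np.zeros(10)
for _ in range(500):
    grad = 2 * (w - 1.0)
    w = normalized_sgd_step(w, grad)
print(np.round(w, 2))    # close to a vector of ones
```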


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2510
Author(s):  
Nam D. Vo ◽  
Minsung Hong ◽  
Jason J. Jung

Previous recommendation systems applied the matrix factorization collaborative filtering (MFCF) technique only to single domains. Due to data sparsity, this approach struggles to overcome the cold-start problem. Thus, in this study, we focus on discovering latent features across domains to understand the relationships between them (called domain coherence). This approach uses latent knowledge of the source domain to improve the quality of recommendations in the target domain. In this paper, we consider applying MFCF to multiple domains: by adopting the implicit stochastic gradient descent algorithm to optimize the objective function for prediction, multiple matrices from different domains are consolidated inside the cross-domain recommendation system (CDRS). Additionally, we design a conceptual framework for CDRS that applies to different industrial scenarios for recommenders across domains. Moreover, an experiment is devised to validate the proposed method. Using a real-world dataset gathered from Amazon Food and MovieLens, experimental results show that the proposed method improves computation time by 15.2% and MSE by 19.7% over other methods on a utility matrix. Notably, the loss function converges to a much lower value in the experiment. Furthermore, a critical analysis of the results shows that there is a dynamic balance between prediction accuracy and computational complexity.
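
As a hedged, single-domain illustration of the MFCF building block (the cross-domain consolidation and the implicit SGD variant are not reproduced here), a plain stochastic gradient descent update for matrix factorization on a sparse utility matrix looks roughly like this; the data, dimensions, and hyperparameters are synthetic assumptions.

```python
import random
import numpy as np

# Minimal sketch of matrix factorization trained with SGD on one domain's
# sparse utility matrix; synthetic data stands in for real ratings.
rng = np.random.default_rng(2)
n_users, n_items, k = 50, 40, 5
true_P = rng.normal(size=(n_users, k))
true_Q = rng.normal(size=(n_items, k))
ratings = [(u, i, float(true_P[u] @ true_Q[i]))
           for u in range(n_users) for i in range(n_items) if rng.random() < 0.2]

P = rng.normal(scale=0.3, size=(n_users, k))      # user latent factors
Q = rng.normal(scale=0.3, size=(n_items, k))      # item latent factors
lr, reg = 0.02, 0.02
for epoch in range(100):
    random.shuffle(ratings)
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                     # prediction error on one observed rating
        p_u = P[u].copy()                         # cache the user factor before updating it
        P[u] += lr * (err * Q[i] - reg * P[u])    # SGD step on the user factor
        Q[i] += lr * (err * p_u - reg * Q[i])     # SGD step on the item factor

mse = np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])
print("training MSE on observed entries:", round(mse, 4))
```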


2018 ◽  
Vol 232 ◽  
pp. 03007 ◽  
Author(s):  
Yijun Wang ◽  
Pengyu Zhou ◽  
Wenya Zhong

Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad, or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). Keskar et al. (2017) therefore proposed a hybrid strategy that starts training with Adam and switches to SGD at the right time. In learning tasks with a large output space, Adam has been observed to fail to converge to an optimal solution (or to an extreme point in the non-convex setting) [1]. Therefore, this paper proposes a new variant of the Adam algorithm (AMSGrad), which not only resolves the convergence problem but also improves empirical performance.
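
The AMSGrad modification is essentially a one-line change to Adam: the second-moment estimate used in the denominator is kept non-decreasing. The sketch below illustrates this on a toy quadratic; the hyperparameters and the omission of bias correction are assumptions of the illustration.

```python
import numpy as np

# Minimal sketch of the AMSGrad update: identical to Adam except that the
# running maximum of the second moment is used in the denominator.
def amsgrad_step(param, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state["m"] + (1 - b1) * grad            # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2        # second moment
    state["v_max"] = np.maximum(state["v_max"], state["v"])    # the key difference from Adam
    return param - lr * state["m"] / (np.sqrt(state["v_max"]) + eps)

# Toy usage: minimize ||w - 5||^2.
w = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "v_max": np.zeros(3)}
for _ in range(10000):
    grad = 2 * (w - 5.0)
    w = amsgrad_step(w, grad, state)
print(np.round(w, 2))    # converges toward [5, 5, 5]
```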


1998 ◽  
Vol 35 (02) ◽  
pp. 395-406 ◽  
Author(s):  
Jürgen Dippon

A stochastic gradient descent method is combined with a consistent auxiliary estimate to achieve global convergence of the recursion. Using step lengths that converge to zero more slowly than 1/n and averaging the trajectories yields the optimal convergence rate of 1/√n and the optimal variance of the asymptotic distribution. Possible applications can be found in maximum likelihood estimation, regression analysis, training of artificial neural networks, and stochastic optimization.
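
A minimal numerical sketch of the averaging scheme, using an assumed quadratic objective, Gaussian gradient noise, and step lengths proportional to n^(-0.6) (i.e., decaying more slowly than 1/n), is given below.

```python
import numpy as np

# Minimal sketch of averaged stochastic gradient descent (Polyak-Ruppert style):
# slowly decaying step lengths plus trajectory averaging.
rng = np.random.default_rng(3)
theta_star = np.array([2.0, -1.0])               # minimizer of the assumed quadratic objective
x = np.zeros(2)                                  # SGD iterate
x_bar = np.zeros(2)                              # running average of the trajectory
for n in range(1, 100001):
    noisy_grad = (x - theta_star) + rng.normal(size=2)   # unbiased gradient estimate
    x -= noisy_grad / n ** 0.6                   # step length ~ n^(-0.6), slower than 1/n
    x_bar += (x - x_bar) / n                     # incremental average of the iterates
print("last iterate:    ", np.round(x, 3))
print("averaged iterate:", np.round(x_bar, 3))   # closer to theta_star on average
```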


2018 ◽  
Vol 10 (03) ◽  
pp. 1850004
Author(s):  
Grant Sheen

Wireless recording and real-time classification of brain waves are essential steps towards future wearable devices to assist Alzheimer’s patients in conveying their thoughts. This work is concerned with efficient computation of a dimension-reduced neural network (NN) model on Alzheimer’s patient data recorded by a wireless headset. Because wireless recording uses far fewer sensors than the electrodes of a traditional wired cap, and because an Alzheimer’s patient has a shorter attention span than a healthy person, the data are much more restrictive than is typical in neural robotics and mind-controlled games. To overcome this challenge, an alternating minimization (AM) method is developed for network training. AM minimizes a nonsmooth and nonconvex objective function one variable at a time while fixing the rest. The sub-problem for each variable is piecewise convex with a finite number of minima. The overall iterative AM method is descending and free of the step size (learning parameter) required by the standard gradient descent method. The proposed model, trained by the AM method, significantly outperforms the standard NN model trained by the stochastic gradient descent method in classifying four daily thoughts, reaching accuracies around 90% for an Alzheimer’s patient. Curved decision boundaries of the proposed model with multiple hidden neurons are found analytically, establishing the nonlinear nature of the classification.
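
As a rough illustration only (the paper's exact piecewise-convex sub-problem solver is not reproduced), the sketch below performs alternating minimization over the scalar weights of a tiny one-hidden-layer network via a one-dimensional grid search, with synthetic data standing in for the EEG recordings; no step size is needed, which is the point of the method.

```python
import numpy as np

# Minimal sketch of alternating minimization: each scalar weight is updated in
# turn by a 1-D search with all other weights fixed, so no learning rate is used.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))                    # synthetic stand-in for EEG features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic binary labels

W1 = rng.normal(size=(4, 3))                     # input-to-hidden weights
W2 = rng.normal(size=(3,))                       # hidden-to-output weights

def loss(W1, W2):
    hidden = np.maximum(X @ W1, 0.0)             # ReLU hidden layer
    return np.mean((hidden @ W2 - y) ** 2)

grid = np.linspace(-2.0, 2.0, 41)                # grid search stands in for the exact 1-D solver
for sweep in range(5):                           # a few full sweeps over all weights
    for W in (W1, W2):
        for idx in np.ndindex(W.shape):
            best_val, best_w = np.inf, W[idx]
            for w in grid:                       # minimize over a single weight, others fixed
                W[idx] = w
                val = loss(W1, W2)
                if val < best_val:
                    best_val, best_w = val, w
            W[idx] = best_w                      # keep the best value found
print("final training loss:", round(loss(W1, W2), 4))
```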

