RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets

Author(s):  
Liping Li ◽  
Wei Xu ◽  
Tianyi Chen ◽  
Georgios B. Giannakis ◽  
Qing Ling

In this paper, we propose a class of robust stochastic subgradient methods for distributed learning from heterogeneous datasets in the presence of an unknown number of Byzantine workers. During the learning process, the Byzantine workers may send arbitrary incorrect messages to the master due to data corruption, communication failures, or malicious attacks, and consequently bias the learned model. The key to the proposed methods is a regularization term incorporated into the objective function so as to robustify the learning task and mitigate the negative effects of Byzantine attacks. The resulting subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation methods, justifying our acronym RSA used henceforth. In contrast to most existing algorithms, RSA does not rely on the assumption that the data are independent and identically distributed (i.i.d.) across the workers, and hence fits a wider class of applications. Theoretically, we show that: i) RSA converges to a near-optimal solution, with the learning error dependent on the number of Byzantine workers; ii) the convergence rate of RSA under Byzantine attacks is the same as that of the stochastic gradient descent method in the attack-free setting. Numerically, experiments on a real dataset corroborate the competitive performance of RSA and its reduced complexity compared to state-of-the-art alternatives.
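
As a rough, hedged illustration of the idea (not the authors' implementation), the sketch below simulates an ℓ1-regularized aggregation on a toy quadratic objective; the per-worker loss, the penalty weight lam, the step size lr, and the attack model are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch of an l1-regularized ("RSA-style") aggregation scheme on a toy
# quadratic loss per worker; names and hyperparameters are illustrative only.
rng = np.random.default_rng(0)
dim, n_workers, n_byzantine = 5, 10, 2
targets = rng.normal(size=(n_workers, dim))      # heterogeneous local optima (non-i.i.d. data)

x_master = np.zeros(dim)                         # model kept by the master
x_workers = np.zeros((n_workers, dim))           # local models kept by workers
lam, lr = 0.1, 0.05                              # penalty weight and step size

for step in range(500):
    for i in range(n_workers):
        if i < n_byzantine:
            # Byzantine workers send arbitrary values instead of honest updates.
            x_workers[i] = rng.normal(scale=10.0, size=dim)
        else:
            grad = x_workers[i] - targets[i]                    # stochastic gradient surrogate
            penalty = lam * np.sign(x_workers[i] - x_master)    # subgradient of the l1 proximity term
            x_workers[i] -= lr * (grad + penalty)
    # The master aggregates only the signs of the differences, which bounds the
    # influence any single (possibly Byzantine) worker can exert on the update.
    x_master -= lr * lam * np.sign(x_master - x_workers).sum(axis=0)

print("master model after training:", np.round(x_master, 2))
```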

1997 ◽  
Vol 9 (7) ◽  
pp. 1457-1482 ◽  
Author(s):  
Howard Hua Yang ◽  
Shun-ichi Amari

There are two major approaches to blind separation: maximum entropy (ME) and minimum mutual information (MMI). Both can be implemented by the stochastic gradient descent method for obtaining the demixing matrix. The mutual information (MI) is the contrast function for blind separation; the entropy is not. To justify the ME approach, the relation between ME and MMI is first elucidated by calculating the first derivative of the entropy: mean subtraction is shown to be necessary when applying ME, and at the solution points determined by the MI, the ME update does not change the demixing matrix in directions that increase cross-talk. Second, the natural gradient is introduced in place of the ordinary gradient to obtain efficient algorithms, because the parameter space is a Riemannian space consisting of matrices. The mutual information is calculated by applying the Gram-Charlier expansion to approximate the probability density functions of the outputs. Finally, we propose an efficient learning algorithm that incorporates an adaptive method for estimating the unknown cumulants. Computer simulations show that the convergence of the stochastic descent algorithms is improved by using the natural gradient and the adaptively estimated cumulants.
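
The natural-gradient update itself is compact enough to sketch. The toy example below assumes a fixed tanh score function in place of the Gram-Charlier, cumulant-based estimate described above, and Laplace-distributed sources; it is a sketch of the update form, not the authors' algorithm.

```python
import numpy as np

# Minimal sketch of a natural-gradient demixing update for blind separation,
# with a fixed tanh surrogate score function (an assumption of this sketch).
rng = np.random.default_rng(1)
n_sources, n_samples = 3, 20000
S = rng.laplace(size=(n_sources, n_samples))     # independent super-Gaussian sources
A = rng.normal(size=(n_sources, n_sources))      # unknown mixing matrix
X = A @ S                                        # observed mixtures
X -= X.mean(axis=1, keepdims=True)               # mean subtraction, as the analysis requires
scale = X.std(axis=1, keepdims=True)
X /= scale                                       # unit-variance scaling for a stable toy run

W = np.eye(n_sources)                            # demixing matrix to be learned
lr = 0.01
for it in range(6000):                           # stochastic mini-batch updates
    x = X[:, rng.integers(0, n_samples, size=100)]
    y = W @ x
    phi = np.tanh(y)                             # surrogate score function
    # Natural gradient: dW = lr * (I - E[phi(y) y^T]) W, which needs no matrix inversion.
    W += lr * (np.eye(n_sources) - (phi @ y.T) / 100) @ W

# The composite transform approaches a scaled permutation matrix when separation succeeds.
print(np.round(W @ (A / scale), 2))
```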


2018 ◽  
Author(s):  
Kazunori D Yamada

In the deep learning era, stochastic gradient descent is the most common method used for optimizing neural network parameters. Among the various mathematical optimization methods, the gradient descent method is the most naive. With plain gradient descent, the learning rate must normally be adjusted manually to achieve quick convergence. Many optimizers have been developed to control the learning rate and increase convergence speed; generally, they adjust the learning rate automatically in response to the learning status. These optimizers have been gradually improved by incorporating the effective aspects of earlier methods. In this study, we developed a new optimizer: YamAdam. Our optimizer is based on Adam, which utilizes the first and second moments of previous gradients. In addition to the moment estimation system, we incorporated an advantageous part of AdaDelta, namely its unit correction system, into YamAdam. According to benchmark tests on several common datasets, our optimizer showed similar or faster convergence compared to existing methods. YamAdam is thus a viable alternative optimizer for deep learning.
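
For orientation, the sketch below shows the underlying Adam update that YamAdam builds on; the AdaDelta-style unit correction described above is not reproduced, and the hyperparameters and toy objective are assumptions of this illustration.

```python
import numpy as np

# Minimal sketch of the Adam update (first- and second-moment estimation with
# bias correction); YamAdam's unit correction is not included here.
def adam_step(param, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad           # first moment (mean of gradients)
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2       # second moment (uncentered variance)
    m_hat = state["m"] / (1 - b1 ** state["t"])               # bias-corrected estimates
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy usage: minimize f(w) = ||w - 3||^2.
w = np.zeros(4)
state = {"t": 0, "m": np.zeros(4), "v": np.zeros(4)}
for _ in range(2000):
    grad = 2 * (w - 3.0)
    w = adam_step(w, grad, state)
print(np.round(w, 2))    # converges toward [3, 3, 3, 3]
```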


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jinhuan Duan ◽  
Xianxian Li ◽  
Shiqi Gao ◽  
Zili Zhong ◽  
Jinyan Wang

With the vigorous development of artificial intelligence technology, a wide range of engineering applications have been implemented. The gradient descent method plays an important role in solving various optimization problems owing to its simple structure, good stability, and easy implementation. However, in multi-node machine learning systems, gradients usually need to be shared, which can cause privacy leakage because attackers can infer training data from the gradient information. In this paper, to prevent gradient leakage while preserving model accuracy, we propose the super stochastic gradient descent approach, which updates parameters by concealing the modulus length of each gradient vector and converting it into a unit vector. Furthermore, we analyze the security of the super stochastic gradient descent approach and demonstrate that our algorithm can defend against attacks on the gradient. Experimental results show that our approach is clearly superior to prevalent gradient descent approaches in terms of accuracy, robustness, and adaptability to large-scale batches. Interestingly, our algorithm can also resist model poisoning attacks to a certain extent.
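
A minimal sketch of the core idea, normalizing each gradient to a unit vector so that only its direction is shared, might look as follows; the toy objective, step size, and function names are assumptions, not the paper's code.

```python
import numpy as np

# Minimal sketch: conceal the modulus length of the gradient by normalizing it
# to a unit vector before the parameter update.
def normalized_sgd_step(params, grad, lr=0.05, eps=1e-12):
    unit_grad = grad / (np.linalg.norm(grad) + eps)   # only the direction is revealed
    return params - lr * unit_grad

# Toy usage: minimize ||w - 1||^2 while sharing only unit-length gradients.
w = np.zeros(10)
for _ in range(500):
    grad = 2 * (w - 1.0)
    w = normalized_sgd_step(w, grad)
print(np.round(w, 2))    # close to a vector of ones
```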


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2510
Author(s):  
Nam D. Vo ◽  
Minsung Hong ◽  
Jason J. Jung

Previous recommendation systems applied the matrix factorization collaborative filtering (MFCF) technique only to single domains. Due to data sparsity, this approach struggles to overcome the cold-start problem. Thus, in this study, we focus on discovering latent features across domains to understand the relationships between them (called domain coherence). This approach uses latent knowledge of the source domain to improve the quality of recommendations in the target domain. In this paper, we consider applying MFCF to multiple domains: by adopting the implicit stochastic gradient descent algorithm to optimize the objective function for prediction, multiple matrices from different domains are consolidated inside the cross-domain recommendation system (CDRS). Additionally, we design a conceptual framework for CDRS that applies to different industrial scenarios for recommenders across domains. Moreover, an experiment is devised to validate the proposed method. Using a real-world dataset gathered from Amazon Food and MovieLens, experimental results show that the proposed method improves computation time by 15.2% and MSE by 19.7% over other methods on a utility matrix. Notably, the loss function converges to a much lower value in the experiment. Furthermore, a critical analysis of the results shows that there is a dynamic balance between prediction accuracy and computational complexity.
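
As a hedged, single-domain illustration of the MFCF building block (the cross-domain consolidation and the implicit SGD variant are not reproduced here), a plain stochastic gradient descent update for matrix factorization on a sparse utility matrix looks roughly like this; the data, dimensions, and hyperparameters are synthetic assumptions.

```python
import random
import numpy as np

# Minimal sketch of matrix factorization trained with SGD on one domain's
# sparse utility matrix; synthetic data stands in for real ratings.
rng = np.random.default_rng(2)
n_users, n_items, k = 50, 40, 5
true_P = rng.normal(size=(n_users, k))
true_Q = rng.normal(size=(n_items, k))
ratings = [(u, i, float(true_P[u] @ true_Q[i]))
           for u in range(n_users) for i in range(n_items) if rng.random() < 0.2]

P = rng.normal(scale=0.3, size=(n_users, k))      # user latent factors
Q = rng.normal(scale=0.3, size=(n_items, k))      # item latent factors
lr, reg = 0.02, 0.02
for epoch in range(100):
    random.shuffle(ratings)
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                     # prediction error on one observed rating
        p_u = P[u].copy()                         # cache the user factor before updating it
        P[u] += lr * (err * Q[i] - reg * P[u])    # SGD step on the user factor
        Q[i] += lr * (err * p_u - reg * Q[i])     # SGD step on the item factor

mse = np.mean([(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings])
print("training MSE on observed entries:", round(mse, 4))
```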


2018 ◽  
Vol 232 ◽  
pp. 03007 ◽  
Author(s):  
Yijun Wang ◽  
Pengyu Zhou ◽  
Wenya Zhong

Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad, or RMSprop have been found to generalize poorly compared to stochastic gradient descent (SGD). Keskar et al. (2017) therefore proposed a hybrid strategy that starts training with Adam and switches to SGD at the right time. In learning tasks with a large output space, Adam has been observed to fail to converge to an optimal solution (or to an extreme point in the non-convex setting) [1]. Therefore, this paper proposes a new variant of the Adam algorithm (AMSGrad), which not only resolves the convergence problem but also improves empirical performance.
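
The AMSGrad modification is essentially a one-line change to Adam: the second-moment estimate used in the denominator is kept non-decreasing. The sketch below illustrates this on a toy quadratic; the hyperparameters and the omission of bias correction are assumptions of the illustration.

```python
import numpy as np

# Minimal sketch of the AMSGrad update: identical to Adam except that the
# running maximum of the second moment is used in the denominator.
def amsgrad_step(param, grad, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    state["m"] = b1 * state["m"] + (1 - b1) * grad            # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2        # second moment
    state["v_max"] = np.maximum(state["v_max"], state["v"])    # the key difference from Adam
    return param - lr * state["m"] / (np.sqrt(state["v_max"]) + eps)

# Toy usage: minimize ||w - 5||^2.
w = np.zeros(3)
state = {"m": np.zeros(3), "v": np.zeros(3), "v_max": np.zeros(3)}
for _ in range(10000):
    grad = 2 * (w - 5.0)
    w = amsgrad_step(w, grad, state)
print(np.round(w, 2))    # converges toward [5, 5, 5]
```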


1998 ◽  
Vol 35 (02) ◽  
pp. 395-406 ◽  
Author(s):  
Jürgen Dippon

A stochastic gradient descent method is combined with a consistent auxiliary estimate to achieve global convergence of the recursion. Using step lengths that converge to zero more slowly than 1/n and averaging the trajectories yields the optimal convergence rate of 1/√n and the optimal variance of the asymptotic distribution. Possible applications can be found in maximum likelihood estimation, regression analysis, training of artificial neural networks, and stochastic optimization.
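
A minimal numerical sketch of the averaging scheme, using an assumed quadratic objective, Gaussian gradient noise, and step lengths proportional to n^(-0.6) (i.e., decaying more slowly than 1/n), is given below.

```python
import numpy as np

# Minimal sketch of averaged stochastic gradient descent (Polyak-Ruppert style):
# slowly decaying step lengths plus trajectory averaging.
rng = np.random.default_rng(3)
theta_star = np.array([2.0, -1.0])               # minimizer of the assumed quadratic objective
x = np.zeros(2)                                  # SGD iterate
x_bar = np.zeros(2)                              # running average of the trajectory
for n in range(1, 100001):
    noisy_grad = (x - theta_star) + rng.normal(size=2)   # unbiased gradient estimate
    x -= noisy_grad / n ** 0.6                   # step length ~ n^(-0.6), slower than 1/n
    x_bar += (x - x_bar) / n                     # incremental average of the iterates
print("last iterate:    ", np.round(x, 3))
print("averaged iterate:", np.round(x_bar, 3))   # closer to theta_star on average
```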


2018 ◽  
Vol 10 (03) ◽  
pp. 1850004
Author(s):  
Grant Sheen

Wireless recording and real-time classification of brain waves are essential steps towards future wearable devices to assist Alzheimer’s patients in conveying their thoughts. This work is concerned with efficient computation of a dimension-reduced neural network (NN) model on Alzheimer’s patient data recorded by a wireless headset. Because wireless recording uses far fewer sensors than the electrodes of a traditional wired cap, and because an Alzheimer’s patient has a shorter attention span than a healthy person, the data are much more restrictive than is typical in neural robotics and mind-controlled games. To overcome this challenge, an alternating minimization (AM) method is developed for network training. AM minimizes a nonsmooth and nonconvex objective function one variable at a time while fixing the rest. The sub-problem for each variable is piecewise convex with a finite number of minima. The overall iterative AM method is descending and free of the step size (learning parameter) required by the standard gradient descent method. The proposed model, trained by the AM method, significantly outperforms the standard NN model trained by the stochastic gradient descent method in classifying four daily thoughts, reaching accuracies around 90% for an Alzheimer’s patient. Curved decision boundaries of the proposed model with multiple hidden neurons are found analytically, establishing the nonlinear nature of the classification.
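
As a rough illustration only (the paper's exact piecewise-convex sub-problem solver is not reproduced), the sketch below performs alternating minimization over the scalar weights of a tiny one-hidden-layer network via a one-dimensional grid search, with synthetic data standing in for the EEG recordings; no step size is needed, which is the point of the method.

```python
import numpy as np

# Minimal sketch of alternating minimization: each scalar weight is updated in
# turn by a 1-D search with all other weights fixed, so no learning rate is used.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 4))                    # synthetic stand-in for EEG features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # synthetic binary labels

W1 = rng.normal(size=(4, 3))                     # input-to-hidden weights
W2 = rng.normal(size=(3,))                       # hidden-to-output weights

def loss(W1, W2):
    hidden = np.maximum(X @ W1, 0.0)             # ReLU hidden layer
    return np.mean((hidden @ W2 - y) ** 2)

grid = np.linspace(-2.0, 2.0, 41)                # grid search stands in for the exact 1-D solver
for sweep in range(5):                           # a few full sweeps over all weights
    for W in (W1, W2):
        for idx in np.ndindex(W.shape):
            best_val, best_w = np.inf, W[idx]
            for w in grid:                       # minimize over a single weight, others fixed
                W[idx] = w
                val = loss(W1, W2)
                if val < best_val:
                    best_val, best_w = val, w
            W[idx] = best_w                      # keep the best value found
print("final training loss:", round(loss(W1, W2), 4))
```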

