Why Does Large Batch Training Result in Poor Generalization? A Comprehensive Explanation and a Better Strategy from the Viewpoint of Stochastic Optimization

2018
Vol 30 (7)
pp. 2005-2023
Author(s):  
Tomoumi Takase ◽  
Satoshi Oyama ◽  
Masahito Kurihara

We present a comprehensive framework of search methods, such as simulated annealing and batch training, for solving nonconvex optimization problems. These methods explore a wider region of the search space by gradually decreasing the randomness added to the standard gradient descent method. The formulation we define on the basis of this framework can be directly applied to neural network training, yielding an effective approach that gradually increases the batch size during training. We also explain why large-batch training degrades generalization performance, a question that previous studies have not clarified.
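The schedule below is only an illustrative sketch of this idea, not the authors' exact formulation: a plain NumPy SGD loop on a toy least-squares problem in which the mini-batch size (and hence the level of gradient noise) is increased on a fixed schedule as training proceeds.

```python
# Minimal sketch (assumed schedule, not the paper's formulation) of growing
# the batch size during training on a toy least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1024)

w = np.zeros(10)
lr = 0.1

for epoch in range(20):
    # Illustrative schedule: double the batch size every 5 epochs,
    # shrinking the gradient noise as training proceeds.
    batch_size = min(1024, 32 * 2 ** (epoch // 5))
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad

print("estimation error:", np.linalg.norm(w - true_w))
```

Small early batches play the role of the added randomness; the growing batch size plays the role of the annealing-style decrease of that randomness over time.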

2021
Vol 2021
pp. 1-11
Author(s):  
Jinhuan Duan ◽  
Xianxian Li ◽  
Shiqi Gao ◽  
Zili Zhong ◽  
Jinyan Wang

With the rapid development of artificial intelligence, engineering applications of the technology have been deployed one after another. The gradient descent method plays an important role in solving various optimization problems owing to its simple structure, good stability, and ease of implementation. However, in multinode machine learning systems, gradients usually need to be shared, which can cause privacy leakage because attackers can infer training data from the gradient information. In this paper, to prevent gradient leakage while preserving model accuracy, we propose the super stochastic gradient descent approach, which updates parameters by concealing the modulus length of each gradient vector and converting it into a unit vector. Furthermore, we analyze the security of the super stochastic gradient descent approach and demonstrate that our algorithm can defend against attacks on the gradient. Experimental results show that our approach is clearly superior to prevalent gradient descent approaches in terms of accuracy, robustness, and adaptability to large-scale batches. Interestingly, our algorithm can also resist model poisoning attacks to a certain extent.
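As a rough illustration of the core idea stated above (sharing only the direction of each gradient while concealing its modulus length), the sketch below normalizes each worker's gradient to a unit vector before aggregation. The function names and the simple averaging rule are assumptions for illustration, not the authors' full super stochastic gradient descent protocol.

```python
# Minimal sketch: workers share only the direction of their gradients
# (unit vectors), hiding the magnitude from whoever aggregates them.
import numpy as np

def unit_gradient(grad, eps=1e-12):
    """Scale a gradient to unit length, concealing its modulus."""
    return grad / (np.linalg.norm(grad) + eps)

def aggregate_update(worker_grads, lr=0.01):
    """Illustrative server-side update built only from the shared unit vectors."""
    directions = [unit_gradient(g) for g in worker_grads]
    return -lr * np.mean(directions, axis=0)

# Example: three workers whose gradients differ greatly in magnitude
# contribute equally, since only directions are shared.
grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4]), np.array([30.0, 40.0])]
print(aggregate_update(grads))
```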


2021
Vol 2021
pp. 1-7
Author(s):  
Shengwei Yao ◽  
Yuping Wu ◽  
Jielan Yang ◽  
Jieqiong Xu

We propose a three-term gradient descent method that is well suited to the optimization problems addressed in this article. The search direction of the proposed method is generated in a specific subspace; specifically, a quadratic approximation model is used in generating the search direction. To reduce the amount of computation and make the best use of existing information, the subspace is spanned by the gradients at the current and previous iteration points and the previous search direction. Using this subspace-based optimization technique, global convergence is established under the Wolfe line search. The results of numerical experiments show that the new method is effective and robust.
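The sketch below illustrates a generic three-term direction of this kind, built from the current gradient, the previous gradient, and the previous search direction, combined with SciPy's Wolfe-condition line search. The particular coefficients (a Polak-Ribiere-style beta and theta) are illustrative assumptions, not the coefficients derived from the paper's quadratic approximation model.

```python
# Generic three-term search direction d = -g_new + beta*d - theta*(g_new - g),
# with step lengths chosen by a Wolfe line search. Coefficients are illustrative.
import numpy as np
from scipy.optimize import line_search  # enforces the Wolfe conditions

def f(x):  # toy objective: Rosenbrock function
    return 100 * (x[1] - x[0] ** 2) ** 2 + (1 - x[0]) ** 2

def grad_f(x):
    return np.array([-400 * x[0] * (x[1] - x[0] ** 2) - 2 * (1 - x[0]),
                     200 * (x[1] - x[0] ** 2)])

x = np.array([-1.2, 1.0])
g = grad_f(x)
d = -g  # first iteration: steepest descent
for k in range(200):
    alpha = line_search(f, grad_f, x, d)[0]
    if alpha is None:          # line search failed: fall back to a small gradient step
        d, alpha = -g, 1e-3
    x_new = x + alpha * d
    g_new = grad_f(x_new)
    if np.linalg.norm(g_new) < 1e-6:
        x = x_new
        break
    beta = g_new @ (g_new - g) / (g @ g)         # illustrative coefficient
    theta = g_new @ d / (g @ g)                  # illustrative coefficient
    d = -g_new + beta * d - theta * (g_new - g)  # three-term direction
    x, g = x_new, g_new

print("approximate minimizer:", x)
```

With this choice of beta and theta, the direction satisfies d^T g = -||g||^2, so it is always a descent direction regardless of the line search.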


2021
Vol 7 (1)
Author(s):  
Keren Li ◽  
Shijie Wei ◽  
Pan Gao ◽  
Feihao Zhang ◽  
Zengrong Zhou ◽  
...  

The gradient descent method is central to numerical optimization and is the key ingredient in many machine learning algorithms. It promises to find a local minimum of a function by iteratively moving along the direction of steepest descent. Since the required computational resources can be prohibitive for high-dimensional problems, it is desirable to investigate quantum versions of gradient descent, such as the protocol recently proposed by Rebentrost et al. [1]. Here, we develop this protocol and implement it on a quantum processor with limited resources. A prototypical experiment on a four-qubit nuclear magnetic resonance quantum processor demonstrates the iterative optimization process. Experimentally, the final point converged to the local minimum with a fidelity >94%, quantified via full-state tomography. Moreover, our method can be applied to a multidimensional scaling problem, showing the potential to outperform its classical counterparts. Considering the ongoing efforts in quantum information and data science, our work may provide a faster approach to solving high-dimensional optimization problems and a subroutine for future practical quantum computers.
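For reference, the classical iteration that such a quantum protocol aims to accelerate is simply x_{t+1} = x_t - eta * grad f(x_t). The sketch below runs it on a small quadratic objective as a stand-in for the polynomial cost functions considered in this setting; it does not reproduce any quantum speedup.

```python
# Classical gradient descent baseline: repeated moves along the negative gradient.
# Objective f(x) = 0.5 * x^T A x with positive-definite A, so the minimum is at 0.
import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([1.0, -1.0])
eta = 0.1  # step size, assumed small enough for stability

for _ in range(100):
    x = x - eta * (A @ x)  # x_{t+1} = x_t - eta * grad f(x_t)

print(x)  # converges toward the minimizer at the origin
```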

