Submodular Batch Selection for Training Deep Neural Networks

Author(s): K J Joseph, Vamshi Teja R, Krishnakant Singh, Vineeth N Balasubramanian

Mini-batch gradient descent based methods are the de facto algorithms for training neural network architectures today. We introduce a mini-batch selection strategy based on submodular function maximization. Our novel submodular formulation captures the informativeness of each sample and the diversity of the whole subset. We design an efficient greedy algorithm that gives high-quality solutions to this NP-hard combinatorial optimization problem. Our extensive experiments on standard datasets show that deep models trained using the proposed batch selection strategy generalize better than those trained with Stochastic Gradient Descent, as well as a popular baseline sampling strategy, across different learning rates, batch sizes, and distance metrics.
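As a rough illustration of the greedy selection idea (not the authors' exact submodular objective: using the per-sample loss as the informativeness score, a max-min distance term for diversity, and a trade-off weight `lam` are all assumptions here), a batch could be grown one sample at a time by picking the largest marginal gain:

```python
import numpy as np

def greedy_submodular_batch(features, losses, batch_size, lam=1.0):
    """Greedily build a mini-batch that trades off informativeness (per-sample
    loss) against diversity (distance to the already-selected samples)."""
    n = features.shape[0]
    selected = []
    available = np.ones(n, dtype=bool)
    min_dist = np.full(n, np.inf)          # distance to the closest selected sample
    for _ in range(batch_size):
        diversity = np.where(np.isinf(min_dist), 0.0, min_dist)
        gain = losses + lam * diversity    # assumed marginal-gain form
        gain[~available] = -np.inf         # never re-pick a sample
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        # Distances to the selection only shrink, so one vector of minima suffices.
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[best], axis=1))
    return selected
```

Because the diversity term can only shrink as samples are added, each greedy step needs just the current minimum distances, which keeps the selection cheap.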

2021, Vol. 2021, pp. 1-10
Author(s): Stephanie S. W. Su, Sie Long Kek

In this paper, the adaptive moment estimation (Adam) approach, a widely used variant of stochastic gradient descent (SGD), is improved by adding the standard error to the updating rule. The aim is to speed up the convergence of the Adam algorithm. This improvement is termed the Adam with standard error (AdamSE) algorithm. In addition, a mean-variance portfolio optimization model is formulated from historical data on the rates of return of the S&P 500 stock, a 10-year Treasury bond, and the money market. The application of the SGD, Adam, adaptive moment estimation with maximum (AdaMax), Nesterov-accelerated adaptive moment estimation (Nadam), AMSGrad, and AdamSE algorithms to solving this mean-variance portfolio optimization problem is then investigated. During the calculation procedure, the iterative solution converges to the optimal portfolio solution, and the AdamSE algorithm requires the fewest iterations. The results show that the convergence rate of the Adam algorithm is significantly enhanced by the AdamSE modification. In conclusion, the efficiency of the improved Adam algorithm using the standard error has been demonstrated, and the applicability of the SGD, Adam, AdaMax, Nadam, AMSGrad, and AdamSE algorithms to the mean-variance portfolio optimization problem has been validated.
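The abstract does not spell out the exact updating rule, so the following is only a guessed sketch of an Adam-style step with a standard-error term; the specific form `sqrt(v_hat) / sqrt(batch_size)` and the way it enters the update are assumptions, not the authors' published formula.

```python
import numpy as np

def adam_se_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, batch_size=64):
    """One Adam-style step; the standard-error correction below is hypothetical."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                  # bias-corrected second moment
    # Hypothetical: treat sqrt(v_hat)/sqrt(batch_size) as the standard error of the
    # mini-batch gradient estimate and fold it into the step.
    se = np.sqrt(v_hat) / np.sqrt(batch_size)
    w = w - lr * (m_hat + se) / (np.sqrt(v_hat) + eps)
    return w, m, v
```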


Author(s): JUAN HUANG, HONG CHEN, LUOQING LI

We propose a stochastic gradient descent algorithm for least squares regression with coefficient regularization. An explicit expression for the solution is derived via the sampling operator and the empirical integral operator. Learning rates are given for suitable choices of the step sizes and regularization parameters.
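A minimal sketch of this kind of scheme, assuming a squared loss on a kernel expansion f(x) = Σ_i c_i K(x_i, x) with an ℓ2 penalty on the coefficients and a decaying step size; the kernel, step-size schedule, and penalty form here are illustrative choices, not the paper's exact ones.

```python
import numpy as np

def sgd_coefficient_regularized(X, y, kernel, reg=1e-2, step0=0.5, epochs=5):
    """SGD for least squares regression with an l2 penalty on the expansion
    coefficients c in f(x) = sum_i c_i * kernel(x_i, x)."""
    n = X.shape[0]
    c = np.zeros(n)
    K = np.array([[kernel(a, b) for b in X] for a in X])   # Gram matrix
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            t += 1
            step = step0 / np.sqrt(t)            # assumed decaying step size
            residual = K[i] @ c - y[i]           # f(x_i) - y_i
            grad = residual * K[i] + reg * c     # gradient of squared loss + penalty
            c -= step * grad
    return c

# Example usage with an assumed Gaussian kernel:
# c = sgd_coefficient_regularized(X, y, lambda a, b: np.exp(-np.sum((a - b) ** 2)))
```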


2020, Vol. 34 (04), pp. 5053-5060
Author(s): Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, ...

There have been several recent works claiming record times for ImageNet training. These are achieved by using large batch sizes during training to leverage parallel resources and produce faster wall-clock times per training epoch. However, such solutions often require massive hyper-parameter tuning, an important cost that is frequently ignored. In this work, we perform an extensive analysis of large batch size training for two popular methods: Stochastic Gradient Descent (SGD) and the Kronecker-Factored Approximate Curvature (K-FAC) method. We evaluate the performance of these methods in terms of both wall-clock time and aggregate computational cost, and study their hyper-parameter sensitivity by performing more than 512 experiments per batch size for each method. We perform experiments on multiple models on two datasets, CIFAR-10 and SVHN. The results show that beyond a critical batch size both K-FAC and SGD significantly deviate from ideal strong scaling behaviour, and that, despite common belief, K-FAC does not exhibit improved large-batch scalability compared to SGD.
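A minimal sketch of the kind of strong-scaling measurement described, where `train_one_epoch` is a hypothetical helper that runs one epoch of SGD or K-FAC training at a given batch size; under ideal strong scaling, doubling the batch size (with proportionally more parallel resources) would halve the per-epoch wall-clock time.

```python
import time

def measure_epoch_times(train_one_epoch, batch_sizes):
    """Record per-epoch wall-clock time for each batch size.

    `train_one_epoch(batch_size)` is a placeholder for one epoch of training
    with the optimizer under study (e.g. SGD or K-FAC)."""
    times = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        train_one_epoch(bs)
        times[bs] = time.perf_counter() - start   # deviation from halving = scaling loss
    return times
```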


Author(s): Manisha Padala, Sujit Gujar

In classification models, fairness can be ensured by solving a constrained optimization problem. We focus on fairness constraints such as Disparate Impact, Demographic Parity, and Equalized Odds, which are non-decomposable and non-convex. Researchers typically define convex surrogates of these constraints and then apply convex optimization frameworks to obtain fair classifiers. However, the surrogates only upper-bound the actual constraints, and convexifying fairness constraints is challenging. We propose a neural network-based framework, FNNC, to achieve fairness while maintaining high accuracy in classification. The above fairness constraints are included in the loss using Lagrange multipliers. We prove bounds on the generalization errors for the constrained losses, which asymptotically go to zero. The network is optimized using two-step mini-batch stochastic gradient descent. Our experiments show that FNNC performs as well as the state of the art, if not better. The experimental evidence supplements our theoretical guarantees. In summary, we provide an automated solution for achieving fairness in classification, which is easily extendable to many fairness constraints.
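A minimal sketch of the two-step Lagrangian update described above, using PyTorch, with a demographic-parity gap as a stand-in for the paper's constraint surrogates; the model, the binary group attribute, the tolerance `eps`, and the multiplier step size `lam_lr` are assumptions.

```python
import torch

def fnnc_style_step(model, x, y, group, lam, opt, eps=0.05, lam_lr=0.01):
    """One two-step mini-batch update: descend on the Lagrangian in the network
    weights, then ascend in the multiplier. The demographic-parity surrogate
    (gap in mean positive scores between groups) is an illustrative choice and
    assumes both groups appear in the batch."""
    scores = model(x).squeeze(-1)
    cls_loss = torch.nn.functional.binary_cross_entropy_with_logits(scores, y.float())
    probs = torch.sigmoid(scores)
    dp_gap = (probs[group == 1].mean() - probs[group == 0].mean()).abs()
    constraint = dp_gap - eps                      # want constraint <= 0
    lagrangian = cls_loss + lam * constraint
    opt.zero_grad()
    lagrangian.backward()
    opt.step()                                     # step 1: minimize over weights
    with torch.no_grad():                          # step 2: maximize over lambda
        lam = torch.clamp(lam + lam_lr * constraint.detach(), min=0.0)
    return lam
```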


Author(s): Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin

Deep clustering learns deep feature representations that favor the clustering task using neural networks. Some pioneering works propose to simultaneously learn embedded features and perform clustering by explicitly defining a clustering-oriented loss. Though promising performance has been demonstrated in various applications, we observe that a vital ingredient has been overlooked by these works: the defined clustering loss may corrupt the feature space, which leads to non-representative, meaningless features and in turn hurts clustering performance. To address this issue, we propose the Improved Deep Embedded Clustering (IDEC) algorithm, which takes care of data structure preservation. Specifically, we manipulate the feature space to scatter data points using a clustering loss as guidance. To constrain the manipulation and maintain the local structure of the data-generating distribution, an under-complete autoencoder is applied. By integrating the clustering loss and the autoencoder's reconstruction loss, IDEC can jointly optimize cluster label assignment and learn features that are suitable for clustering with local structure preservation. The resulting optimization problem can be effectively solved by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.
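A minimal sketch of the joint objective described above, in PyTorch: reconstruction loss plus a weighted KL-based clustering loss on DEC-style Student's-t soft assignments. The encoder/decoder, the cluster centers, the target distribution `target_p`, and the weight `gamma` are placeholders rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def idec_loss(x, encoder, decoder, centers, target_p, gamma=0.1, alpha=1.0):
    """Joint IDEC-style objective: autoencoder reconstruction loss plus a
    KL-based clustering loss on Student's-t soft assignments.
    `target_p` is the (periodically recomputed) auxiliary target distribution."""
    z = encoder(x)                                # embedded features
    x_rec = decoder(z)
    rec_loss = F.mse_loss(x_rec, x)               # preserves local data structure
    # Soft assignment q_ij of sample i to cluster j (Student's t kernel).
    dist_sq = torch.cdist(z, centers) ** 2
    q = (1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0)
    q = q / q.sum(dim=1, keepdim=True)
    clu_loss = F.kl_div(q.log(), target_p, reduction='batchmean')  # KL(P || Q)
    return rec_loss + gamma * clu_loss            # gamma balances the two terms
```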


Author(s): Yikai Zhang, Hui Qu, Chao Chen, Dimitris Metaxas

Deep learning architectures are usually proposed with millions of parameters, resulting in memory issues when training deep neural networks with stochastic gradient descent type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of the stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches and noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. Such a regularizer stabilizes the gradient, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to vanilla SGD even with small batch sizes. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
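A minimal sketch of a proximal-point style training loop along these lines, in PyTorch; the anchor-refresh schedule, the strength `prox_lambda`, and the plain SGD inner solver are assumptions, not the authors' exact algorithm.

```python
import torch

def proximal_sgd_epoch(model, loader, loss_fn, lr=0.01, prox_lambda=0.1, inner_steps=5):
    """Repeatedly (approximately) minimize loss(w) + (prox_lambda/2) * ||w - w_anchor||^2
    with a few small-batch SGD steps, then move the anchor. The proximal term makes
    each subproblem strongly convex near the anchor and damps gradient noise."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    anchor = [p.detach().clone() for p in model.parameters()]
    for step, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y)
        prox = sum(((p - a) ** 2).sum() for p, a in zip(model.parameters(), anchor))
        (loss + 0.5 * prox_lambda * prox).backward()
        opt.step()
        opt.zero_grad()
        if (step + 1) % inner_steps == 0:          # refresh the proximal anchor
            anchor = [p.detach().clone() for p in model.parameters()]
```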

