Capacity Control of ReLU Neural Networks by Basis-Path Norm

Author(s): Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, ...

Recently, the path norm was proposed as a new capacity measure for neural networks with the Rectified Linear Unit (ReLU) activation function, as it takes the rescaling-invariance property of ReLU into account. It has been shown that a generalization error bound in terms of the path norm explains the empirical generalization behavior of ReLU neural networks better than bounds based on other capacity measures. Moreover, optimization algorithms that add the path norm as a regularization term to the loss function, such as Path-SGD, have been shown to achieve better generalization performance. However, the path norm aggregates the values of all paths, so a capacity measure based on it can be improperly influenced by the dependency among different paths. It is also known that every path of a ReLU network can be represented, via multiplication and division, by a small group of linearly independent basis paths, which indicates that the generalization behavior of the network depends on only a few basis paths. Motivated by this, we propose a new norm, the Basis-path Norm, defined on a group of linearly independent paths, to measure the capacity of neural networks more accurately. We establish a generalization error bound based on this basis-path norm and show through extensive experiments that it explains the generalization behavior of ReLU networks more accurately than previous capacity measures. In addition, we develop optimization algorithms that minimize the empirical risk regularized by the basis-path norm. Our experiments on benchmark datasets demonstrate that the proposed regularization method achieves clearly better performance on the test set than previous regularization approaches.
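For context, the standard path norm that this abstract builds on has a simple closed form for fully-connected ReLU networks: the square root of the sum, over all input-to-output paths, of the product of squared weights along the path. The sketch below is a minimal NumPy illustration of that quantity and of the rescaling invariance mentioned above; it does not reproduce the paper's basis-path decomposition, and all weights are illustrative.

```python
import numpy as np

def l2_path_norm(weights):
    """l2 path norm of a fully-connected ReLU net: sqrt of the sum, over all
    input-to-output paths, of the product of squared weights along the path.
    Computed layer-by-layer by propagating squared weight matrices.
    `weights` is a list of 2-D arrays, first layer first."""
    acc = np.square(weights[0])      # per-path squared products so far
    for W in weights[1:]:
        acc = np.square(W) @ acc     # sums over each layer's hidden index
    return float(np.sqrt(acc.sum()))

# Tiny 2-3-2 ReLU network (hypothetical weights, for illustration only).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
print(l2_path_norm([W1, W2]))

# Rescaling invariance under ReLU: scale one hidden unit's incoming weights
# by c and its outgoing weights by 1/c; the network function and the path
# norm are both unchanged.
c = 10.0
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= c
W2s[:, 0] /= c
print(l2_path_norm([W1s, W2s]))      # same value as above
```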

Processes, 2022, Vol. 10 (1), pp. 140
Author(s): Yanxia Yang, Pu Wang, Xuejin Gao

A radial basis function neural network (RBFNN), with its strong function-approximation ability, has proven to be an effective tool for nonlinear process modeling. However, in many instances the sample set is limited and the model evaluation error is fixed, which makes it very difficult to construct an optimal network structure that ensures the generalization ability of the established nonlinear process model. To solve this problem, a novel RBFNN with high generalization performance (RBFNN-GP) is proposed in this paper. The proposed RBFNN-GP makes three contributions. First, a local generalization error bound, introducing the sample mean and variance, is developed to obtain a tighter error bound and thus narrow the range of the error. Second, a self-organizing structure method, based on the generalization error bound and network sensitivity, is established to determine a suitable number of neurons and improve the generalization ability. Third, the convergence of the proposed RBFNN-GP is proved theoretically, both when the structure is fixed and when it is adjusted. Finally, the performance of the proposed RBFNN-GP is compared with several popular algorithms on two numerical simulations and a practical application. The comparison results verify the effectiveness of RBFNN-GP.
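As background, here is a minimal NumPy sketch of the baseline RBFNN that RBFNN-GP extends: Gaussian hidden units with fixed centers and width, and linear output weights fit by least squares. The paper's self-organizing structure adjustment and local generalization error bound are not reproduced; the grid of centers, the width, and the sine target are illustrative assumptions.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian RBF activations: phi_j(x) = exp(-||x - c_j||^2 / (2*width^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# Fit a generic RBFNN to y = sin(x): fix centers on a grid, then solve the
# linear output weights in closed form by least squares.
X = np.linspace(0, 2 * np.pi, 200)[:, None]
y = np.sin(X).ravel()
centers = np.linspace(0, 2 * np.pi, 10)[:, None]
Phi = rbf_design(X, centers, width=0.5)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train RMSE:", np.sqrt(np.mean((Phi @ w - y) ** 2)))
```

Choosing how many centers to place is exactly the structure-selection problem the paper's generalization-error-based self-organizing method targets.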


2013, Vol. 2013, pp. 1-5
Author(s): Zeng Fanzi, Ma Xiaolong

This paper presents a multiclass classifier based on the analytical center of the feasible space (MACM). The classifier is formulated as a quadratically constrained linear optimization problem and does not require repeatedly constructing classifiers to separate a single class from all the others. An upper bound on its generalization error is proved theoretically. Experiments on benchmark datasets validate the generalization performance of MACM.
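To illustrate the analytic-center idea in the simplest setting (binary classification only; the paper's MACM is multiclass and uses its own quadratically constrained formulation), the sketch below finds the analytic center of the version space {w : y_i (w · x_i) > 0, ||w||² < 1} by maximizing the sum of log-slacks. The toy data, start point, and optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Two separable Gaussian blobs in R^2 with labels -1 / +1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

def neg_log_barrier(w):
    """Negative log-barrier of the version space; its minimizer is the
    analytic center of {w : y_i (w . x_i) > 0, ||w||^2 < 1}."""
    margins = y * (X @ w)
    slack = 1.0 - w @ w
    if margins.min() <= 0.0 or slack <= 0.0:
        return 1e12                      # finite penalty keeps iterates feasible
    return -np.log(margins).sum() - np.log(slack)

w0 = (y @ X) / (10.0 * np.linalg.norm(y @ X))   # small feasible start
res = minimize(neg_log_barrier, w0, method="Nelder-Mead")
print("train accuracy:", np.mean(np.sign(X @ res.x) == y))
```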


2020, Vol. 34 (04), pp. 3791-3800
Author(s): Daizong Ding, Mi Zhang, Xudong Pan, Min Yang, Xiangnan He

Node embedding is a crucial task in graph analysis. Recently, several methods have been proposed that embed a node as a distribution rather than a vector in order to capture more information. Although these methods achieve noticeable improvements, their extra complexity brings new challenges: the learned representations can be sensitive to external noise on the graph and vulnerable to adversarial behaviors. In this paper, we first derive an upper bound on the generalization error of Wasserstein embedding via PAC-Bayesian theory. Based on this bound, we propose an algorithm called Adversarial PAC-Bayesian Learning (APBL) that minimizes the generalization error bound. Furthermore, we provide a model called the Regularized Adversarial Wasserstein Embedding Network (RAWEN) as an implementation of APBL. Besides a comprehensive analysis of the robustness of RAWEN, our work is the first to explore a wider range of embedded distributions. In our evaluation, extensive experiments demonstrate the effectiveness and robustness of the proposed embedding model compared with state-of-the-art methods.
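Methods that embed nodes as Gaussian distributions typically rely on the closed-form 2-Wasserstein distance between Gaussians. The sketch below implements that distance for diagonal covariances; it is background geometry such embeddings build on, not the RAWEN model or the APBL algorithm themselves, and the two example embeddings are hypothetical.

```python
import numpy as np

def w2_diag_gaussians(mu1, var1, mu2, var2):
    """Closed-form 2-Wasserstein distance between Gaussians with diagonal
    covariances: W2^2 = ||mu1 - mu2||^2 + ||sqrt(var1) - sqrt(var2)||^2."""
    return np.sqrt(((mu1 - mu2) ** 2).sum()
                   + ((np.sqrt(var1) - np.sqrt(var2)) ** 2).sum())

# Two hypothetical node embeddings, each a diagonal Gaussian in R^4.
mu_a, var_a = np.zeros(4), np.ones(4)
mu_b, var_b = np.full(4, 0.5), np.full(4, 2.0)
print(w2_diag_gaussians(mu_a, var_a, mu_b, var_b))
```

Unlike a plain dot product between point vectors, this distance also compares the uncertainty (variance) the embedding assigns to each node, which is what makes distributional embeddings more expressive and also more delicate to regularize.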


2021
Author(s): Shuo Yang, Songhua Wu, Tongliang Liu, Min Xu

A major gap between few-shot and many-shot learning is the data distribution empirically observed by the model during training. In few-shot learning, the learned model can easily become over-fitted to the biased distribution formed by only a few training examples, whereas in many-shot learning the ground-truth data distribution is uncovered more accurately, allowing a well-generalized model to be learned. In this paper, we propose to calibrate the distribution of these few-sample classes to be less biased, alleviating the over-fitting problem. The calibration is achieved by transferring statistics from classes with sufficient examples to the few-sample classes. After calibration, an adequate number of examples can be sampled from the calibrated distribution to expand the inputs to the classifier. Extensive experiments on three datasets, miniImageNet, tieredImageNet, and CUB, show that a simple linear classifier trained on features sampled from our calibrated distribution can outperform the state-of-the-art accuracy by a large margin. We also establish a generalization error bound for the proposed distribution-calibration-based few-shot learning, which decomposes into the distribution assumption error, the distribution approximation error, and the estimation error. This bound theoretically justifies the effectiveness of the proposed method.
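A minimal NumPy sketch of the calibration step as described: borrow Gaussian statistics from the base classes whose means are closest to the few-shot sample mean, then draw synthetic features from the calibrated distribution. The helper name, the `k` and `alpha` hyperparameters, and the toy data are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def calibrate_and_sample(support, base_means, base_covs, k=2, alpha=0.2,
                         n=100, rng=None):
    """Transfer mean/covariance statistics from the k nearest base classes
    to a few-shot class, then sample n synthetic feature vectors."""
    rng = rng or np.random.default_rng()
    mu_s = support.mean(axis=0)
    # Base classes whose means are closest to the few-shot sample mean.
    idx = np.argsort(((base_means - mu_s) ** 2).sum(axis=1))[:k]
    # Calibrated mean: average of transferred means and the sample mean.
    mu_c = np.vstack([base_means[idx], mu_s[None]]).mean(axis=0)
    # Calibrated covariance: averaged base covariances plus a spread term.
    cov_c = base_covs[idx].mean(axis=0) + alpha * np.eye(support.shape[1])
    return rng.multivariate_normal(mu_c, cov_c, size=n)

# Toy usage: 5 base classes in R^8, one novel class with a single shot.
rng = np.random.default_rng(0)
base_means = rng.normal(size=(5, 8))
base_covs = np.stack([np.eye(8)] * 5)
support = rng.normal(size=(1, 8))            # the lone labeled example
augmented = calibrate_and_sample(support, base_means, base_covs, rng=rng)
print(augmented.shape)                       # (100, 8): inputs for a classifier
```

A simple linear classifier is then trained on the original support features together with the sampled ones, which is the expansion step the abstract refers to.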

