Competitive Cross-Entropy Loss: A Study on Training Single-Layer Neural Networks for Solving Nonlinearly Separable Classification Problems

2018 ◽  
Vol 50 (2) ◽  
pp. 1115-1122
Author(s):  
Kamaledin Ghiasi-Shirazi


2020 ◽
Vol 34 (04) ◽  
pp. 5085-5092 ◽  
Author(s):  
Wan-Duo Kurt Ma ◽  
J. P. Lewis ◽  
W. Bastiaan Kleijn

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.
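The per-layer HSIC-bottleneck objective replaces backpropagation's global error signal with terms of the form HSIC(hidden, labels) − β·HSIC(hidden, inputs). As a hedged sketch only (not the authors' code; kernel choice and bandwidth are illustrative assumptions), the standard biased empirical HSIC estimator, tr(KHLH)/(n−1)², can be written as:

```python
import numpy as np

def _gram(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix of the rows of X.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC estimator: tr(K H L H) / (n - 1)^2,
    # where H is the centering matrix. Nonnegative; larger means
    # stronger statistical dependence between X and Y.
    n = len(X)
    K, L = _gram(X, sigma), _gram(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 1))
# A strongly dependent pair scores much higher than an independent one.
print(hsic(x, x ** 2) > hsic(x, noise))
```

Note that unlike cross-entropy, this criterion never compares the representation to the labels pointwise, which is why the paper's outputs need not resemble the labels until a final reformatting layer is appended.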


Author(s):  
Arnošt Veselý

This chapter deals with applications of artificial neural networks to classification and regression problems. Based on theoretical analysis, it demonstrates that in classification problems one should use the cross-entropy error function rather than the usual sum-of-squares error function. Minimizing the cross-entropy error function by gradient descent leads to the well-known backpropagation scheme of gradient calculation when the output layer of the network uses neurons with logistic or softmax output functions. The author believes that understanding the underlying theory presented in this chapter will help researchers in medical informatics choose more suitable network architectures for medical applications and carry out network training more effectively.
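The identity underlying that claim — softmax outputs combined with cross-entropy give the simple backpropagation delta softmax(z) − one_hot(y) at the output layer — can be checked numerically. A minimal sketch (the logits and target here are arbitrary illustrative values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    # Negative log-probability of the true class y given logits z.
    return -np.log(softmax(z)[y])

z = np.array([1.0, -0.5, 2.0])
y = 1

# Analytic gradient w.r.t. the logits: p - one_hot(y).
p = softmax(z)
g = p.copy()
g[y] -= 1.0

# Central-difference numerical check.
eps = 1e-6
num = np.array([(cross_entropy(z + eps * np.eye(3)[i], y) -
                 cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
                for i in range(3)])
print(np.allclose(g, num, atol=1e-5))  # True
```

This cancellation of the softmax Jacobian against the cross-entropy derivative is exactly why the pairing yields such a clean error signal, whereas sum-of-squares with sigmoid outputs multiplies the error by a derivative that can saturate.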


Author(s):  
DAVID A. ELIZONDO ◽  
ROBERT MORRIS ◽  
TIM WATSON ◽  
BENJAMIN N. PASSOW

The recursive deterministic perceptron (RDP) is a generalization of the single-layer perceptron neural network. This neural network can separate, in a deterministic manner, any classification problem (linearly separable or not). It relies on the principle that in any nonlinearly separable (NLS) two-class classification problem, a linearly separable (LS) subset of one or more points belonging to one of the two classes can always be found. Small network topologies can be obtained when the LS subsets are of maximum cardinality. This is referred to as the problem of maximum separability and has been proven to be NP-complete. Evolutionary computing techniques are applied to handle this problem more efficiently, in terms of complexity, than the standard approaches. These techniques enhance RDP training in terms of speed of convergence and level of generalization. They provide a way to tackle large classification problems that are otherwise not feasible with the algorithmic versions of the RDP training methods.
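The core RDP step — finding an LS subset of one class — can be sketched with a perceptron-based separability test on XOR. This is illustrative only: the brute-force enumeration below is exactly the exponential search that the paper's evolutionary techniques are designed to avoid, and the epoch cap is a practical heuristic, not a proof of non-separability.

```python
import numpy as np
from itertools import combinations

def linearly_separable(pos, neg, epochs=200):
    # Perceptron test: True if a separating hyperplane is found
    # for pos (label +1) vs. neg (label -1) within the epoch cap.
    X = np.vstack([pos, neg]).astype(float)
    X = np.hstack([X, np.ones((len(X), 1))])   # bias column
    y = np.array([1] * len(pos) + [-1] * len(neg))
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                errors += 1
        if errors == 0:
            return True
    return False

# XOR: the whole problem is NLS, but an LS subset of class A exists.
A = [(0, 0), (1, 1)]
B = [(0, 1), (1, 0)]
best = max((s for r in range(len(A), 0, -1)
            for s in combinations(A, r)
            if linearly_separable(np.array(s), np.array(B))),
           key=len)
print(best)  # ((0, 0),) -- a single point of class A is LS from all of B
```

The RDP would now "freeze" a hyperplane separating this subset, augment the input space with its output, and recurse on the remaining points.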


2019 ◽  
Vol 117 (1) ◽  
pp. 161-170 ◽  
Author(s):  
Carlo Baldassi ◽  
Fabrizio Pittorino ◽  
Riccardo Zecchina

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessians, and their generalization performance on real data.
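A loose illustrative sketch of the perturbation idea behind "wide flat minima" (not the authors' local-entropy or message-passing machinery): estimate the flatness of a weight configuration by the average loss increase under random Gaussian weight noise. The toy task below is convex logistic regression on random patterns, chosen only to keep the probe short; in that convex setting the probe is nonnegative by Jensen's inequality, while in the nonconvex networks of the paper it distinguishes wide flat regions (small increase) from narrow minima (large increase).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny random-pattern task: n binary patterns, random +/-1 labels.
n, d = 40, 10
X = rng.choice([-1.0, 1.0], size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

def loss(w):
    # Mean logistic loss log(1 + exp(-y * <w, x>)).
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

# Crude gradient descent to a (near-)minimizer.
w = np.zeros(d)
for _ in range(1000):
    s = -y * (X @ w)
    p = 1.0 / (1.0 + np.exp(-s))          # sigmoid(s)
    grad = (X.T @ (-y * p)) / n
    w -= 0.2 * grad

def flatness(w, sigma=0.1, trials=200):
    # Perturbation probe: average loss increase under Gaussian
    # weight noise. Wide flat minima yield small values.
    base = loss(w)
    bumps = [loss(w + sigma * rng.normal(size=d)) - base
             for _ in range(trials)]
    return float(np.mean(bumps))

print(flatness(w) >= 0.0)  # holds here because the toy loss is convex
```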


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1500
Author(s):  
Xiangde Zhang ◽  
Yuan Zhou ◽  
Jianping Wang ◽  
Xiaojun Lu

Session-based recommendations aim to predict a user’s next click based on the user’s current and historical sessions, and can be applied to shopping websites and apps. Existing session-based recommendation methods cannot accurately capture the complex transitions between items. In addition, some approaches compress sessions into a fixed representation vector without taking into account the user’s interest preferences at the current moment, thus limiting the accuracy of recommendations. Considering the diversity of items and users’ interests, a personalized interest attention graph neural network (PIA-GNN) is proposed for session-based recommendation. This approach utilizes a personalized graph convolutional network (PGNN) to capture complex transitions between items, invoking an interest-aware mechanism to activate users’ interest in different items adaptively. In addition, a self-attention layer is used to capture long-term dependencies between items when modeling users’ long-term preferences. In this paper, the cross-entropy loss is used as the objective function to train the model. We conduct extensive experiments on two real-world datasets, and the results show that PIA-GNN outperforms existing personalized session-aware recommendation methods.
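As a hedged sketch of the objective only (the embeddings below are random placeholders standing in for PIA-GNN's learned representations), the next-click loss is a softmax cross-entropy over the scores of all candidate items:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 5, 4
item_emb = rng.normal(size=(n_items, d))   # placeholder item embeddings
session_vec = rng.normal(size=d)           # placeholder session representation

scores = item_emb @ session_vec            # one score per candidate item
p = np.exp(scores - scores.max())
p /= p.sum()                               # softmax over all candidates

clicked = 2                                # index of the actually clicked item
loss = -np.log(p[clicked])
print(loss > 0.0)                          # positive unless p[clicked] == 1
```

Minimizing this loss pushes probability mass toward the item the user actually clicked, exactly as in the classification settings of the other papers above; the modeling effort in PIA-GNN goes into producing the session representation, not the loss itself.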


2017 ◽  
Vol 29 (3) ◽  
pp. 861-866 ◽  
Author(s):  
Nolan Conaway ◽  
Kenneth J. Kurtz

Since the work of Minsky and Papert (1969), it has been understood that single-layer neural networks cannot solve nonlinearly separable classifications (i.e., XOR). We describe and test a novel divergent autoassociative architecture capable of solving nonlinearly separable classifications with a single layer of weights. The proposed network consists of class-specific linear autoassociators. The power of the model comes from treating classification problems as within-class feature prediction rather than directly optimizing a discriminant function. We show unprecedented learning capabilities for a simple, single-layer network (i.e., solving XOR) and demonstrate that the famous limitation in acquiring nonlinearly separable problems is not just about the need for a hidden layer; it is about the choice between directly predicting classes or learning to classify indirectly by predicting features.
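The mechanism can be demonstrated directly. A minimal least-squares sketch (not the authors' gradient-trained model, but the same principle): fit one linear autoassociator with a bias term per class, then classify a point by which class's autoassociator reconstructs it with the smallest error.

```python
import numpy as np

# XOR: class 0 = {(0,0), (1,1)}, class 1 = {(0,1), (1,0)}.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

def fit_autoassociator(Xc):
    # Least-squares linear map (with bias) reconstructing the
    # class's own points: [x; 1] @ W ~= x.
    Xa = np.hstack([Xc, np.ones((len(Xc), 1))])
    W, *_ = np.linalg.lstsq(Xa, Xc, rcond=None)
    return W

Ws = [fit_autoassociator(X[y == c]) for c in (0, 1)]

def classify(x):
    # Assign the class whose autoassociator reconstructs x best.
    xa = np.append(x, 1.0)
    errs = [np.sum((xa @ W - x) ** 2) for W in Ws]
    return int(np.argmin(errs))

preds = [classify(x) for x in X]
print(preds)  # [0, 0, 1, 1] -- XOR solved with a single layer of weights
```

No discriminant function is ever learned: each class model only predicts its own features, and the nonlinear decision boundary emerges from comparing reconstruction errors.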


2021 ◽  
pp. 78-89
Author(s):  
Panle Li ◽  
Xiaohui He ◽  
Dingjun Song ◽  
Zihao Ding ◽  
Mengjia Qiao ◽  
...  
