Competitive Cross-Entropy Loss: A Study on Training Single-Layer Neural Networks for Solving Nonlinearly Separable Classification Problems

2018 ◽  
Vol 50 (2) ◽  
pp. 1115-1122
Author(s):  
Kamaledin Ghiasi-Shirazi


2020 ◽
Vol 34 (04) ◽  
pp. 5085-5092 ◽  
Author(s):  
Wan-Duo Kurt Ma ◽  
J. P. Lewis ◽  
W. Bastiaan Kleijn

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.
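The per-layer HSIC-bottleneck objective replaces backpropagation's global error signal with terms of the form HSIC(hidden, labels) − β·HSIC(hidden, inputs). As a hedged sketch only (not the authors' code; kernel choice and bandwidth are illustrative assumptions), the standard biased empirical HSIC estimator, tr(KHLH)/(n−1)², can be written as:

```python
import numpy as np

def _gram(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix of the rows of X.
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma=1.0):
    # Biased empirical HSIC estimator: tr(K H L H) / (n - 1)^2,
    # where H is the centering matrix. Nonnegative; larger means
    # stronger statistical dependence between X and Y.
    n = len(X)
    K, L = _gram(X, sigma), _gram(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 1))
# A strongly dependent pair scores much higher than an independent one.
print(hsic(x, x ** 2) > hsic(x, noise))
```

Note that unlike cross-entropy, this criterion never compares the representation to the labels pointwise, which is why the paper's outputs need not resemble the labels until a final reformatting layer is appended.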


Author(s):  
Arnošt Veselý

This chapter deals with applications of artificial neural networks to classification and regression problems. Based on theoretical analysis, it demonstrates that in classification problems one should use the cross-entropy error function rather than the usual sum-of-squares error function. Minimizing the cross-entropy error function by gradient descent leads to the well-known backpropagation scheme of gradient calculation when the output layer of the network uses neurons with logistic or softmax output functions. The author believes that understanding the underlying theory presented in this chapter will help researchers in medical informatics choose more suitable network architectures for medical applications and carry out network training more effectively.
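The identity underlying that claim — softmax outputs combined with cross-entropy give the simple backpropagation delta softmax(z) − one_hot(y) at the output layer — can be checked numerically. A minimal sketch (the logits and target here are arbitrary illustrative values):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def cross_entropy(z, y):
    # Negative log-probability of the true class y given logits z.
    return -np.log(softmax(z)[y])

z = np.array([1.0, -0.5, 2.0])
y = 1

# Analytic gradient w.r.t. the logits: p - one_hot(y).
p = softmax(z)
g = p.copy()
g[y] -= 1.0

# Central-difference numerical check.
eps = 1e-6
num = np.array([(cross_entropy(z + eps * np.eye(3)[i], y) -
                 cross_entropy(z - eps * np.eye(3)[i], y)) / (2 * eps)
                for i in range(3)])
print(np.allclose(g, num, atol=1e-5))  # True
```

This cancellation of the softmax Jacobian against the cross-entropy derivative is exactly why the pairing yields such a clean error signal, whereas sum-of-squares with sigmoid outputs multiplies the error by a derivative that can saturate.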


Author(s):  
DAVID A. ELIZONDO ◽  
ROBERT MORRIS ◽  
TIM WATSON ◽  
BENJAMIN N. PASSOW

The recursive deterministic perceptron (RDP) is a generalization of the single-layer perceptron neural network. This neural network can separate, in a deterministic manner, any classification problem (linearly separable or not). It relies on the principle that in any nonlinearly separable (NLS) two-class classification problem, a linearly separable (LS) subset of one or more points belonging to one of the two classes can always be found. Small network topologies can be obtained when the LS subsets are of maximum cardinality. This is referred to as the problem of maximum separability and has been proven to be NP-complete. Evolutionary computing techniques are applied to handle this problem more efficiently, in terms of complexity, than the standard approaches. These techniques enhance RDP training in terms of speed of convergence and level of generalization. They provide a way to tackle large classification problems that are otherwise not feasible with the algorithmic versions of the RDP training methods.
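The core RDP step — finding an LS subset of one class — can be sketched with a perceptron-based separability test on XOR. This is illustrative only: the brute-force enumeration below is exactly the exponential search that the paper's evolutionary techniques are designed to avoid, and the epoch cap is a practical heuristic, not a proof of non-separability.

```python
import numpy as np
from itertools import combinations

def linearly_separable(pos, neg, epochs=200):
    # Perceptron test: True if a separating hyperplane is found
    # for pos (label +1) vs. neg (label -1) within the epoch cap.
    X = np.vstack([pos, neg]).astype(float)
    X = np.hstack([X, np.ones((len(X), 1))])   # bias column
    y = np.array([1] * len(pos) + [-1] * len(neg))
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w += yi * xi
                errors += 1
        if errors == 0:
            return True
    return False

# XOR: the whole problem is NLS, but an LS subset of class A exists.
A = [(0, 0), (1, 1)]
B = [(0, 1), (1, 0)]
best = max((s for r in range(len(A), 0, -1)
            for s in combinations(A, r)
            if linearly_separable(np.array(s), np.array(B))),
           key=len)
print(best)  # ((0, 0),) -- a single point of class A is LS from all of B
```

The RDP would now "freeze" a hyperplane separating this subset, augment the input space with its output, and recurse on the remaining points.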


2019 ◽  
Vol 117 (1) ◽  
pp. 161-170 ◽  
Author(s):  
Carlo Baldassi ◽  
Fabrizio Pittorino ◽  
Riccardo Zecchina

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessians, and their generalization performance on real data.
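A loose illustrative sketch of the perturbation idea behind "wide flat minima" (not the authors' local-entropy or message-passing machinery): estimate the flatness of a weight configuration by the average loss increase under random Gaussian weight noise. The toy task below is convex logistic regression on random patterns, chosen only to keep the probe short; in that convex setting the probe is nonnegative by Jensen's inequality, while in the nonconvex networks of the paper it distinguishes wide flat regions (small increase) from narrow minima (large increase).

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny random-pattern task: n binary patterns, random +/-1 labels.
n, d = 40, 10
X = rng.choice([-1.0, 1.0], size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

def loss(w):
    # Mean logistic loss log(1 + exp(-y * <w, x>)).
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

# Crude gradient descent to a (near-)minimizer.
w = np.zeros(d)
for _ in range(1000):
    s = -y * (X @ w)
    p = 1.0 / (1.0 + np.exp(-s))          # sigmoid(s)
    grad = (X.T @ (-y * p)) / n
    w -= 0.2 * grad

def flatness(w, sigma=0.1, trials=200):
    # Perturbation probe: average loss increase under Gaussian
    # weight noise. Wide flat minima yield small values.
    base = loss(w)
    bumps = [loss(w + sigma * rng.normal(size=d)) - base
             for _ in range(trials)]
    return float(np.mean(bumps))

print(flatness(w) >= 0.0)  # holds here because the toy loss is convex
```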


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1500
Author(s):  
Xiangde Zhang ◽  
Yuan Zhou ◽  
Jianping Wang ◽  
Xiaojun Lu

Session-based recommendations aim to predict a user’s next click based on the user’s current and historical sessions, and can be applied to shopping websites and apps. Existing session-based recommendation methods cannot accurately capture the complex transitions between items. In addition, some approaches compress sessions into a fixed representation vector without taking into account the user’s interest preferences at the current moment, thus limiting the accuracy of recommendations. Considering the diversity of items and users’ interests, a personalized interest attention graph neural network (PIA-GNN) is proposed for session-based recommendation. This approach utilizes a personalized graph convolutional network (PGNN) to capture complex transitions between items, invoking an interest-aware mechanism to activate users’ interest in different items adaptively. In addition, a self-attention layer is used to capture long-term dependencies between items when modeling users’ long-term preferences. In this paper, the cross-entropy loss is used as the objective function to train the model. We conduct extensive experiments on two real-world datasets, and the results show that PIA-GNN outperforms existing personalized session-aware recommendation methods.
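As a hedged sketch of the objective only (the embeddings below are random placeholders standing in for PIA-GNN's learned representations), the next-click loss is a softmax cross-entropy over the scores of all candidate items:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 5, 4
item_emb = rng.normal(size=(n_items, d))   # placeholder item embeddings
session_vec = rng.normal(size=d)           # placeholder session representation

scores = item_emb @ session_vec            # one score per candidate item
p = np.exp(scores - scores.max())
p /= p.sum()                               # softmax over all candidates

clicked = 2                                # index of the actually clicked item
loss = -np.log(p[clicked])
print(loss > 0.0)                          # positive unless p[clicked] == 1
```

Minimizing this loss pushes probability mass toward the item the user actually clicked, exactly as in the classification settings of the other papers above; the modeling effort in PIA-GNN goes into producing the session representation, not the loss itself.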


2017 ◽  
Vol 29 (3) ◽  
pp. 861-866 ◽  
Author(s):  
Nolan Conaway ◽  
Kenneth J. Kurtz

Since the work of Minsky and Papert (1969), it has been understood that single-layer neural networks cannot solve nonlinearly separable classifications (i.e., XOR). We describe and test a novel divergent autoassociative architecture capable of solving nonlinearly separable classifications with a single layer of weights. The proposed network consists of class-specific linear autoassociators. The power of the model comes from treating classification problems as within-class feature prediction rather than directly optimizing a discriminant function. We show unprecedented learning capabilities for a simple, single-layer network (i.e., solving XOR) and demonstrate that the famous limitation in acquiring nonlinearly separable problems is not just about the need for a hidden layer; it is about the choice between directly predicting classes or learning to classify indirectly by predicting features.
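The mechanism can be demonstrated directly. A minimal least-squares sketch (not the authors' gradient-trained model, but the same principle): fit one linear autoassociator with a bias term per class, then classify a point by which class's autoassociator reconstructs it with the smallest error.

```python
import numpy as np

# XOR: class 0 = {(0,0), (1,1)}, class 1 = {(0,1), (1,0)}.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

def fit_autoassociator(Xc):
    # Least-squares linear map (with bias) reconstructing the
    # class's own points: [x; 1] @ W ~= x.
    Xa = np.hstack([Xc, np.ones((len(Xc), 1))])
    W, *_ = np.linalg.lstsq(Xa, Xc, rcond=None)
    return W

Ws = [fit_autoassociator(X[y == c]) for c in (0, 1)]

def classify(x):
    # Assign the class whose autoassociator reconstructs x best.
    xa = np.append(x, 1.0)
    errs = [np.sum((xa @ W - x) ** 2) for W in Ws]
    return int(np.argmin(errs))

preds = [classify(x) for x in X]
print(preds)  # [0, 0, 1, 1] -- XOR solved with a single layer of weights
```

No discriminant function is ever learned: each class model only predicts its own features, and the nonlinear decision boundary emerges from comparing reconstruction errors.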


2021 ◽  
pp. 78-89
Author(s):  
Panle Li ◽  
Xiaohui He ◽  
Dingjun Song ◽  
Zihao Ding ◽  
Mengjia Qiao ◽  
...  
