Improved Categorical Cross-Entropy Loss for Training Deep Neural Networks with Noisy Labels

2021 ◽  
pp. 78-89
Author(s):  
Panle Li ◽  
Xiaohui He ◽  
Dingjun Song ◽  
Zihao Ding ◽  
Mengjia Qiao ◽  
...  
2020 ◽  
Vol 34 (04) ◽  
pp. 5085-5092 ◽  
Author(s):  
Wan-Duo Kurt Ma ◽  
J. P. Lewis ◽  
W. Bastiaan Kleijn

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.
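The abstract leans on the empirical Hilbert-Schmidt independence criterion, so a minimal PyTorch sketch of the standard biased HSIC estimator may help make the objective concrete. The kernel choice (RBF), the bandwidths `sigma_x`/`sigma_y`, and the use of one-hot labels as `Y` are illustrative assumptions. Roughly, the bottleneck objective would combine such estimates per layer, penalizing dependence between each hidden representation and the input while rewarding dependence with the labels.

```python
import torch

def rbf_kernel(X, sigma):
    # Pairwise squared distances, then a Gaussian (RBF) kernel matrix.
    d2 = torch.cdist(X, X) ** 2
    return torch.exp(-d2 / (2.0 * sigma ** 2))

def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC estimate between two batches of vectors
    (e.g. hidden activations X and one-hot labels Y)."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma_x)
    L = rbf_kernel(Y, sigma_y)
    H = torch.eye(n) - torch.ones(n, n) / n  # centering matrix
    return torch.trace(K @ H @ L @ H) / (n - 1) ** 2
```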


Author(s):  
Yun-Peng Liu ◽  
Ning Xu ◽  
Yu Zhang ◽  
Xin Geng

The performance of deep neural networks (DNNs) crucially relies on the quality of labels. In some situations labels are easily corrupted and thus become noisy. Designing algorithms that deal with noisy labels is therefore of great importance for learning robust DNNs. However, it is difficult to distinguish clean labels from noisy ones, and this difficulty is the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. The boundary between clean and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm over state-of-the-art methods.
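The abstract does not spell out how the confidence is computed, so the following is only a hypothetical illustration of the confidence-gating pattern it describes: score each observed label by the probability mass the current model assigns to it, and mask out low-confidence (presumably noisy) samples. The `threshold` and the use of model predictions as the label distribution are assumptions, not the authors' LDCE estimator.

```python
import torch
import torch.nn.functional as F

def confidence_gated_ce(logits, observed_labels, threshold=0.5):
    """Hypothetical confidence-gated cross-entropy: down-weight samples
    whose observed label receives little predicted probability mass."""
    probs = F.softmax(logits, dim=1)
    conf = probs.gather(1, observed_labels.unsqueeze(1)).squeeze(1)  # p(y_obs | x)
    weights = (conf > threshold).float()  # treat low-confidence labels as noisy
    losses = F.cross_entropy(logits, observed_labels, reduction="none")
    return (weights * losses).sum() / weights.sum().clamp(min=1.0)
```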


2019 ◽  
Vol 117 (1) ◽  
pp. 161-170 ◽  
Author(s):  
Carlo Baldassi ◽  
Fabrizio Pittorino ◽  
Riccardo Zecchina

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective, we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD with the cross-entropy loss, we show that slowly reducing the norm of the weights along the learning process also leads to WFM. We corroborate these results with a numerical study of the correlations between the volumes of the minimizers, their Hessians, and their generalization performance on real data.
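As a concrete handle on "wide flat minima", here is a small diagnostic sketch, not the authors' entropy-driven algorithms, that probes local flatness by averaging the loss under random weight perturbations of growing norm; a wide flat minimizer shows a slow rise of this profile with the radius. Perturbations are normalized per parameter tensor, a simplifying assumption.

```python
import torch

@torch.no_grad()
def flatness_profile(model, loss_fn, data, target, radii, n_samples=10):
    """Average loss under random weight perturbations of increasing norm.
    Flat (wide) minima show a slow rise with the perturbation radius."""
    base = [p.clone() for p in model.parameters()]  # remember the minimizer
    profile = []
    for r in radii:
        losses = []
        for _ in range(n_samples):
            for p, b in zip(model.parameters(), base):
                noise = torch.randn_like(b)
                p.copy_(b + r * noise / noise.norm())  # radius-r perturbation
            losses.append(loss_fn(model(data), target).item())
        profile.append(sum(losses) / n_samples)
    for p, b in zip(model.parameters(), base):  # restore original weights
        p.copy_(b)
    return profile
```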


2021 ◽  
Author(s):  
Mengke Li ◽  
Yiu-ming Cheung ◽  
Yang Lu

Long-tailed data remains a major challenge for deep neural networks, even though they have achieved great success on balanced data. We observe that vanilla training on long-tailed data with the cross-entropy loss lets the instance-rich head classes severely squeeze the spatial distribution of the tail classes, which makes tail-class samples difficult to classify. Furthermore, the original cross-entropy loss provides only a short-lived gradient, because the gradient of the softmax rapidly approaches zero as the logit difference increases. This phenomenon is called softmax saturation. It is unfavorable for training on balanced data, but it can be exploited to adjust the validity of samples in long-tailed data and thereby repair the distorted embedding space. To this end, this paper proposes the Gaussian clouded logit adjustment, which perturbs the logits of different classes with Gaussian noise of varying amplitude. We call the amplitude of the perturbation the cloud size and assign relatively large cloud sizes to tail classes. A large cloud size reduces softmax saturation, keeping tail-class samples active during training and enlarging their embedding space. To alleviate the bias in the classifier, we further propose a class-based effective-number sampling strategy with classifier re-training. Extensive experiments on benchmark datasets validate the superior performance of the proposed method.
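A minimal sketch of the perturbation idea as the abstract states it: add zero-mean Gaussian noise to the logits with a per-class amplitude (the "cloud size"), larger for tail classes. The exact perturbation scheme and cloud-size schedule are the paper's; the inverse-frequency schedule in the trailing comment is purely hypothetical.

```python
import torch
import torch.nn.functional as F

def gaussian_clouded_ce(logits, targets, cloud_sizes):
    """Cross-entropy on Gaussian-perturbed logits. `cloud_sizes` holds one
    perturbation amplitude per class; tail classes get larger values so
    their logits stay out of the softmax-saturated regime."""
    noise = torch.randn_like(logits) * cloud_sizes.unsqueeze(0)  # broadcast over batch
    return F.cross_entropy(logits + noise, targets)

# Hypothetical cloud-size schedule, growing with class rarity:
# cloud_sizes = base_size * torch.sqrt(counts.max() / counts.float())
```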


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1500
Author(s):  
Xiangde Zhang ◽  
Yuan Zhou ◽  
Jianping Wang ◽  
Xiaojun Lu

Session-based recommendation aims to predict a user's next click based on the user's current and historical sessions, and can be applied to shopping websites and apps. Existing session-based recommendation methods cannot accurately capture the complex transitions between items. In addition, some approaches compress sessions into a fixed representation vector without taking into account the user's interest preferences at the current moment, thus limiting the accuracy of recommendations. Considering the diversity of items and users' interests, a personalized interest attention graph neural network (PIA-GNN) is proposed for session-based recommendation. This approach utilizes a personalized graph convolutional network (PGNN) to capture complex transitions between items and invokes an interest-aware mechanism to adaptively activate users' interest in different items. In addition, a self-attention layer is used to capture long-term dependencies between items when modeling users' long-term preferences. The cross-entropy loss is used as the objective function to train the model. Extensive experiments on two real-world datasets show that PIA-GNN outperforms existing personalized session-aware recommendation methods.
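The training objective named in the abstract is the standard next-item cross-entropy; a minimal sketch follows, assuming `session_repr` is the session representation produced by PIA-GNN's graph and attention layers (not reproduced here) and `item_embeddings` is the candidate item table. Both names are illustrative.

```python
import torch
import torch.nn.functional as F

def next_item_loss(session_repr, item_embeddings, next_items):
    """Score every candidate item against the session representation,
    then apply cross-entropy against the actually clicked next item."""
    scores = session_repr @ item_embeddings.t()  # (batch, n_items)
    return F.cross_entropy(scores, next_items)
```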

