Deep Learning with Taxonomic Loss for Plant Identification

2019 ◽  
Vol 2019 ◽  
pp. 1-8
Author(s):  
Danzi Wu ◽  
Xue Han ◽  
Guan Wang ◽  
Yu Sun ◽  
Haiyan Zhang ◽  
...  

Plant identification is a fine-grained classification task that aims to identify the family, genus, and species of a plant from its appearance features. Inspired by the hierarchical structure of the taxonomic tree, a taxonomic loss was proposed that encodes the hierarchical relationships among multilevel labels into the deep learning objective function through simple group and sum operations. Experiments training various neural networks on the PlantCLEF 2015 and PlantCLEF 2017 datasets demonstrated that the proposed loss function is easy to implement and outperforms the most commonly adopted cross-entropy loss. Eight neural networks were trained with each of the two loss functions on the PlantCLEF 2015 dataset, and the models trained with taxonomic loss showed significant performance improvements. On the PlantCLEF 2017 dataset with 10,000 species, the SENet-154 model trained with taxonomic loss achieved accuracies of 84.07%, 79.97%, and 73.61% at the family, genus, and species levels, improving on the model trained with cross-entropy loss by 2.23%, 1.34%, and 1.08%, respectively. The taxonomic loss can thus further facilitate fine-grained classification tasks with hierarchical labels.
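
The group-and-sum construction is straightforward to sketch. Below is a minimal, hedged PyTorch illustration of how species-level probabilities can be aggregated into genus- and family-level probabilities and penalized at every level; the one-hot mapping matrices, tensor shapes, and function names are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def taxonomic_loss(species_logits, species_to_genus, genus_to_family,
                   species_y, genus_y, family_y, eps=1e-8):
    # species_logits:   (batch, n_species) raw network outputs
    # species_to_genus: (n_species, n_genus) one-hot membership matrix
    # genus_to_family:  (n_genus, n_family) one-hot membership matrix
    p_species = F.softmax(species_logits, dim=1)
    p_genus = p_species @ species_to_genus   # group and sum species probs per genus
    p_family = p_genus @ genus_to_family     # group and sum genus probs per family

    def nll(p, y):
        # negative log-likelihood of the true label at this taxonomic level
        return -torch.log(p.gather(1, y.unsqueeze(1)) + eps).mean()

    return nll(p_species, species_y) + nll(p_genus, genus_y) + nll(p_family, family_y)
```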

Author(s):  
Siying Wu ◽  
Zheng-Jun Zha ◽  
Zilei Wang ◽  
Houqiang Li ◽  
Feng Wu

Image paragraph generation aims to describe an image with a paragraph in natural language. Compared to image captioning with a single sentence, paragraph generation provides a more expressive and fine-grained description for storytelling. Existing approaches mainly optimize the paragraph generator toward minimizing a word-wise cross-entropy loss, which neglects the linguistic hierarchy of a paragraph and results in "sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarchical Policy-Value (DHPV) network for effective paragraph generation. We design new hierarchical supervisions consisting of hierarchical rewards and values at both the sentence and word levels. The joint exploration of hierarchical rewards and values provides dense supervision cues for learning an effective paragraph generator. We propose a new hierarchical policy-value architecture that exploits compositionality at the token-to-token and sentence-to-sentence levels simultaneously and can preserve semantic and syntactic constituent integrity. Extensive experiments on the Stanford image-paragraph benchmark demonstrate the effectiveness of the proposed DHPV approach, with performance improvements over multiple state-of-the-art methods.
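
To make the notion of "dense" hierarchical supervision concrete, here is a minimal, hedged sketch of one way sentence-level rewards can be broadcast down to the word level and mixed with per-word rewards; the shapes, the mixing weight lam, and the function name are illustrative assumptions rather than the authors' formulation.

```python
import torch

def dense_hierarchical_rewards(word_rewards, sentence_rewards, sent_ids, lam=0.5):
    # word_rewards:     (T,) reward for each generated token
    # sentence_rewards: (S,) reward for each generated sentence
    # sent_ids:         (T,) long tensor mapping each token to its sentence
    per_token_sentence_reward = sentence_rewards[sent_ids]  # broadcast to tokens
    # every token now receives a supervision signal, not only a sequence-final one
    return lam * word_rewards + (1.0 - lam) * per_token_sentence_reward
```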


2020 ◽  
Vol 34 (04) ◽  
pp. 5085-5092 ◽  
Author(s):  
Wan-Duo Kurt Ma ◽  
J. P. Lewis ◽  
W. Bastiaan Kleijn

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.
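
As a rough illustration of the criterion itself, the following hedged PyTorch sketch computes a biased HSIC estimate with Gaussian kernels and the per-layer bottleneck objective; the bandwidth sigma, the weight beta, and the flattened input/label shapes are assumptions for illustration, not the paper's exact configuration.

```python
import torch

def hsic(x, y, sigma=1.0):
    # biased estimate tr(K H L H) / (n - 1)^2 with Gaussian kernels;
    # x: (n, d_x), y: (n, d_y) -- e.g. flattened activations and one-hot labels
    n = x.size(0)
    def gram(z):
        return torch.exp(-torch.cdist(z, z) ** 2 / (2 * sigma ** 2))
    h = torch.eye(n, device=x.device) - 1.0 / n   # centering matrix I - (1/n) 11^T
    k, l = gram(x), gram(y)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

def hsic_bottleneck_loss(z, x, y, beta=100.0):
    # per-layer objective: make hidden activations z forget the input x
    # while staying statistically dependent on the labels y
    return hsic(z, x) - beta * hsic(z, y)
```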


2019 ◽  
Vol 117 (1) ◽  
pp. 161-170 ◽  
Author(s):  
Carlo Baldassi ◽  
Fabrizio Pittorino ◽  
Riccardo Zecchina

Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic nonconvex one- and two-layer neural network models that learn random patterns and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM), which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective, we derive entropy-driven greedy and message-passing algorithms that focus their search on wide flat regions of minimizers. In the case of SGD and the cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessians, and their generalization performance on real data.
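
The paper's entropy-driven algorithms are beyond a short snippet, but the underlying notion of a wide flat minimum can be probed numerically. The sketch below is a crude, hedged proxy, not the authors' method: it measures how much the loss rises under random Gaussian perturbations of the weights, with smaller increases indicating a wider, flatter minimizer; sigma and n_samples are arbitrary illustrative choices.

```python
import torch

@torch.no_grad()
def flatness_proxy(model, loss_fn, data, target, sigma=0.05, n_samples=20):
    # average loss increase under Gaussian weight perturbations of scale sigma;
    # a wide flat minimum should show a small increase, a sharp one a large increase
    base = loss_fn(model(data), target).item()
    params = list(model.parameters())
    increases = []
    for _ in range(n_samples):
        noise = [torch.randn_like(p) * sigma for p in params]
        for p, eps in zip(params, noise):
            p.add_(eps)                       # perturb the weights in place
        increases.append(loss_fn(model(data), target).item() - base)
        for p, eps in zip(params, noise):
            p.sub_(eps)                       # restore the original weights
    return sum(increases) / n_samples
```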


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1500
Author(s):  
Xiangde Zhang ◽  
Yuan Zhou ◽  
Jianping Wang ◽  
Xiaojun Lu

Session-based recommendation aims to predict a user's next click based on the user's current and historical sessions, and can be applied to shopping websites and apps. Existing session-based recommendation methods cannot accurately capture the complex transitions between items. In addition, some approaches compress sessions into a fixed representation vector without taking into account the user's interest preferences at the current moment, thus limiting the accuracy of recommendations. Considering the diversity of items and users' interests, a personalized interest attention graph neural network (PIA-GNN) is proposed for session-based recommendation. This approach utilizes a personalized graph convolutional network (PGNN) to capture the complex transitions between items, invoking an interest-aware mechanism to adaptively activate users' interest in different items. In addition, a self-attention layer is used to capture long-term dependencies between items when modeling users' long-term preferences. In this paper, the cross-entropy loss is used as the objective function to train our model. We conduct extensive experiments on two real-world datasets, and the results show that PIA-GNN outperforms existing personalized session-aware recommendation methods.
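
For concreteness, the training objective reduces to a standard next-item cross-entropy once a session representation is in hand. The sketch below is a hedged illustration of that final step only (the PGNN, interest-aware, and self-attention components are omitted); the names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def next_item_loss(session_repr, item_embeddings, next_item):
    # session_repr:    (batch, d) session representation from the network
    # item_embeddings: (n_items, d) candidate item embeddings
    # next_item:       (batch,) index of the true next click
    scores = session_repr @ item_embeddings.t()   # (batch, n_items)
    return F.cross_entropy(scores, next_item)
```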


Author(s):  
Gabriel Zaid ◽  
Lilian Bossuet ◽  
François Dassance ◽  
Amaury Habrard ◽  
Alexandre Venelli

The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Compared to template attacks, deep learning techniques can deal with protected implementations, such as masking or desynchronization, without substantial preprocessing. However, important issues are still open. One challenging problem is to adapt the methods classically used in the machine learning field (e.g., loss function, performance metrics) to the specific side-channel context in order to obtain optimal results. We propose a new loss function derived from the learning-to-rank approach that helps prevent the approximation and estimation errors induced by the classical cross-entropy loss. We theoretically demonstrate that this new function, called Ranking Loss (RkL), maximizes the success rate by minimizing the ranking error of the secret key in comparison with all other hypotheses. The resulting model converges towards the optimal distinguisher when considering the mutual information between the secret and the leakage. Consequently, the approximation error is prevented. Furthermore, the estimation error induced by the cross-entropy is reduced by up to 23%. When the ranking loss is used, convergence towards the best solution is up to 23% faster than with a model using the cross-entropy loss function. We validate our theoretical propositions on public datasets.
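
Although the paper defines its own Ranking Loss, the learning-to-rank idea can be sketched with a generic pairwise objective: push the score of the correct key hypothesis above every competing hypothesis. The snippet below is a hedged, generic illustration, not the authors' exact RkL; alpha and the masking scheme are assumptions.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(scores, correct_key, alpha=1.0):
    # scores:      (batch, n_hypotheses) model scores for each key hypothesis
    # correct_key: (batch,) index of the true key hypothesis
    s_true = scores.gather(1, correct_key.unsqueeze(1))   # (batch, 1)
    # log(1 + exp(-alpha * margin)) for every (true, competitor) pair
    pair_loss = F.softplus(-alpha * (s_true - scores))    # (batch, n_hypotheses)
    # exclude the correct hypothesis from comparison with itself
    mask = torch.ones_like(pair_loss).scatter_(1, correct_key.unsqueeze(1), 0.0)
    return (pair_loss * mask).sum(dim=1).mean()
```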


2020 ◽  
Vol 93 ◽  
pp. 103820 ◽  
Author(s):  
Xianxian Zeng ◽  
Yun Zhang ◽  
Xiaodong Wang ◽  
Kairui Chen ◽  
Dong Li ◽  
...  

Author(s):  
Biing Hwang Juang

There is a recent surge in research activity around "deep neural networks" (DNN). While the notion of neural networks has enjoyed cycles of enthusiasm, which may continue to ebb and flow, concrete advances now abound. Significant performance improvements have been shown in a number of pattern recognition tasks. As a technical topic, DNN is important in classes, and tutorial articles and related learning resources are available. Streams of questions, nonetheless, never subside from students or researchers, and there appears to be a frustrating tendency among learners to treat DNN simply as a black box. This is an awkward and alarming situation in education. This paper is thus intended to help the reader properly understand DNN, not just its mechanism (what and how) but its motivation and justification (why). It is written from a developmental perspective with a comprehensive view, from the very basic but oft-forgotten principle of statistical pattern recognition and decision theory, through the problem stages that may be encountered during system design, to key ideas that led to the new advance. This paper can serve as a learning guide with historical reviews and important references, helpful in reaching an insightful understanding of the subject.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 423
Author(s):  
Gabriel Díaz ◽  
Billy Peralta ◽  
Luis Caro ◽  
Orietta Nicolis

Automatic recognition of visual objects using a deep learning approach has been successfully applied to multiple areas. However, deep learning techniques require a large amount of labeled data, which is usually expensive to obtain. An alternative is to use semi-supervised models, such as co-training, where multiple complementary views are combined using a small amount of labeled data. A simple way to associate views to visual objects is through the application of a degree of rotation or a type of filter. In this work, we propose a co-training model for visual object recognition using deep neural networks, adding layers of self-supervised neural networks as intermediate inputs to the views, where the views are diversified through cross-entropy regularization of their outputs. Since the model merges the concepts of co-training and self-supervised learning by considering the differentiation of outputs, we call it Differential Self-Supervised Co-Training (DSSCo-Training). This paper presents experiments applying the DSSCo-Training model to well-known image datasets such as MNIST, CIFAR-100, and SVHN. The results indicate that the proposed model is competitive with state-of-the-art models and shows an average relative improvement of 5% in accuracy across several datasets, despite its greater simplicity with respect to more recent approaches.
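
One way to read the "cross-entropy regularization of the outputs" is as a diversity term between the two views' predictive distributions. The sketch below is a hedged guess at such an objective, with lam an assumed weight and the sign convention chosen so that increasing the cross-entropy between views pushes their predictions apart; it is an illustration, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def dssco_training_loss(logits_a, logits_b, labels, lam=0.1):
    # supervised cross-entropy for each view on the small labeled set
    supervised = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    # cross-entropy H(p_a, p_b) between the two views' output distributions
    p_a = F.softmax(logits_a, dim=1)
    log_p_b = F.log_softmax(logits_b, dim=1)
    h_ab = -(p_a * log_p_b).sum(dim=1).mean()
    # subtracting h_ab rewards disagreement, keeping the co-trained views diverse
    return supervised - lam * h_ab
```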

