Stochastic Loss Function

2020 ◽  
Vol 34 (04) ◽  
pp. 4884-4891
Author(s):  
Qingliang Liu ◽  
Jinmei Lai

Training deep neural networks is inherently tied to predefined, fixed loss functions during optimization. To improve learning efficiency, we develop the Stochastic Loss Function (SLF), which dynamically and automatically generates appropriate gradients to train deep networks in the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem over network weights and loss parameters. To guarantee the requisite efficiency, gradients with respect to the generic differentiable loss are leveraged both for selecting the loss function and for optimizing the network weights. Extensive experiments on a variety of popular datasets demonstrate that SLF obtains appropriate gradients at different stages of training and can significantly improve the performance of various deep models on real-world tasks including classification, clustering, regression, neural machine translation, and object detection.
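As an illustration of the joint-optimization idea, the following minimal PyTorch sketch (not the authors' SLF implementation) treats the loss itself as a module with learnable mixing weights over two candidate losses, so a single backward pass updates both the network weights and the loss parameters; the actual SLF adds further machinery to keep the loss selection from collapsing onto the trivially smallest candidate.

```python
# Minimal sketch (not the authors' SLF code): the loss is itself a module with
# learnable mixing weights over two candidate losses, so one backward pass
# updates both the network weights and the loss parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParametricLoss(nn.Module):
    """Softmax-weighted combination of candidate losses with learnable logits."""
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(2))   # one logit per candidate loss

    def forward(self, pred, target):
        ce = F.cross_entropy(pred, target)
        mse = F.mse_loss(F.softmax(pred, dim=1),
                         F.one_hot(target, pred.size(1)).float())
        w = F.softmax(self.logits, dim=0)
        return w[0] * ce + w[1] * mse

model = nn.Linear(20, 5)                             # toy network
criterion = ParametricLoss()
opt = torch.optim.SGD(list(model.parameters()) + list(criterion.parameters()), lr=0.1)

x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
loss = criterion(model(x), y)
opt.zero_grad()
loss.backward()        # gradients reach the network weights and the loss parameters
opt.step()
```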

2021 ◽  
Author(s):  
Fabrizio Kuruc ◽  
Harald Binder ◽  
Moritz Hess

Deep neural networks are now frequently employed to predict survival conditional on omics-type biomarkers, e.g. by employing the partial likelihood of the Cox proportional hazards model as the loss function. Because the number of observations in clinical studies is generally limited, combining different datasets has been proposed to improve learning of the network parameters. However, if baseline hazards differ between the studies, the assumptions of the Cox proportional hazards model are violated. Based on high-dimensional transcriptome profiles from different tumor entities, we demonstrate how using a stratified partial likelihood as the loss function accounts for the differing baseline hazards in a deep learning framework. Additionally, we compare the partial likelihood with the ranking loss, which is frequently employed as a loss function in machine learning approaches due to its apparent simplicity. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we show that stratified loss functions lead to overall better discriminatory power and lower prediction error compared to their non-stratified counterparts. We investigate which genes are identified as having the greatest marginal impact on the prediction of survival when different loss functions are used. We find that while similar genes are identified, known prognostic genes in particular receive higher importance from stratified loss functions. Taken together, pooling data from different sources for improved parameter learning of deep neural networks benefits greatly from employing stratified loss functions that account for potentially varying baseline hazards. For easy application, we provide PyTorch code for the stratified loss functions and an explanatory Jupyter notebook in a GitHub repository.
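The authors point to their own PyTorch code in a GitHub repository; the fragment below is only an illustrative re-derivation of a stratified negative Cox partial log-likelihood, in which risk sets are formed within each study/stratum so that differing baseline hazards never enter the loss (ties in survival time are ignored for simplicity).

```python
# Illustrative sketch (not the repository code): stratified negative Cox partial
# log-likelihood in PyTorch; risk sets are built within each study/stratum, so
# differing baseline hazards do not enter the loss. Ties in survival time are ignored.
import torch

def stratified_cox_loss(log_risk, time, event, stratum):
    """log_risk: (n,) network outputs; time: (n,) follow-up times;
    event: (n,) 1 if the event was observed; stratum: (n,) integer study labels."""
    loss, n_events = 0.0, 0.0
    for s in torch.unique(stratum):
        idx = stratum == s
        lr, t, e = log_risk[idx], time[idx], event[idx].float()
        order = torch.argsort(t, descending=True)        # latest times first
        lr, e = lr[order], e[order]
        log_cum = torch.logcumsumexp(lr, dim=0)          # log of each risk-set sum
        loss = loss - ((lr - log_cum) * e).sum()
        n_events = n_events + e.sum()
    return loss / torch.clamp(n_events, min=1.0)
```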


Author(s):  
Maria Sivak ◽  
Vladimir Timofeev

The paper considers the problem of building robust neural networks using different robust loss functions. Applying such neural networks is reasonable when working with noisy data, and it can serve as an alternative to data preprocessing and to making the neural network architecture more complex. To work adequately, the error back-propagation algorithm requires the loss function to be continuously or twice differentiable. According to this requirement, five robust loss functions were chosen (Andrews, Welsch, Huber, Ramsey, and Fair). Using the above-mentioned functions in the error back-propagation algorithm instead of the quadratic one yields an entirely new class of neural networks. To investigate the properties of the resulting networks, a number of computational experiments were carried out. Different values of the outlier fraction and various numbers of epochs were considered. The first stage consisted of tuning the obtained neural networks, i.e. choosing the values of the internal loss-function parameters that resulted in the highest accuracy of a neural network. To determine the ranges of the parameter values, a preliminary study was carried out. The results of the first stage allowed us to give recommendations on choosing the best parameter values for each of the loss functions under study. The second stage dealt with comparing the investigated robust networks with each other and with the classical one. The analysis of the results shows that using the robust technique leads to a significant increase in both the accuracy of the neural network and the speed of learning.
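For concreteness, two of the robust losses named above can be written as drop-in replacements for the quadratic loss; the sketch below uses common default tuning constants (c = 1.345 for Huber, c = 2.985 for Welsch, the usual 95%-efficiency values), not the parameter values recommended in the paper.

```python
# Two of the robust losses named above, written as drop-in replacements for the
# quadratic loss in PyTorch. The tuning constants are common defaults,
# not the values recommended in the paper.
import torch

def huber_loss(residual, c=1.345):
    """Quadratic near zero, linear in the tails; c sets the transition point."""
    abs_r = residual.abs()
    return torch.where(abs_r <= c, 0.5 * residual ** 2, c * (abs_r - 0.5 * c)).mean()

def welsch_loss(residual, c=2.985):
    """Bounded loss: the influence of large residuals decays to zero."""
    return (0.5 * c ** 2 * (1.0 - torch.exp(-(residual / c) ** 2))).mean()

# usage in back-propagation: loss = welsch_loss(model(x) - y); loss.backward()
```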


Author(s):  
Biqiao Zhang ◽  
Yuqing Kong ◽  
Georg Essl ◽  
Emily Mower Provost

In this paper, we propose a Deep Metric Learning (DML) approach that supports soft labels. DML seeks to learn representations that encode the similarity between examples through deep neural networks. DML generally presupposes that data can be divided into discrete classes using hard labels. However, some tasks, such as our exemplary domain of speech emotion recognition (SER), work with inherently subjective data, data for which it may not be possible to identify a single hard label. We propose a family of loss functions, f-Similarity Preservation Loss (f-SPL), based on the dual form of f-divergence for DML with soft labels. We show that the minimizer of f-SPL preserves the pairwise label similarities in the learned feature embeddings. We demonstrate the efficacy of the proposed loss function on the task of cross-corpus SER with soft labels. Our approach, which combines f-SPL and classification loss, significantly outperforms a baseline SER system with the same structure but trained with only classification loss in most experiments. We show that the presented techniques are more robust to over-training and can learn an embedding space in which the similarity between examples is meaningful.
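The exact f-SPL is derived from the dual form of an f-divergence; the sketch below is only a simplified stand-in that conveys the same goal of matching pairwise embedding similarities to soft-label similarities, which would then be combined with a classification loss as in the paper's setup.

```python
# Simplified stand-in only (the actual f-SPL is derived from the dual form of an
# f-divergence): push pairwise embedding similarities toward soft-label similarities.
import torch
import torch.nn.functional as F

def similarity_preservation_loss(embeddings, soft_labels):
    """embeddings: (n, d) network features; soft_labels: (n, k) label distributions."""
    z = F.normalize(embeddings, dim=1)
    emb_sim = z @ z.t()                        # cosine similarities between examples
    lab_sim = soft_labels @ soft_labels.t()    # similarities between label distributions
    return F.mse_loss(emb_sim, lab_sim)

# combined with a classification term, analogous to the paper's setup:
# total = ce_loss + lam * similarity_preservation_loss(z, p)
```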


2020 ◽  
Vol 34 (04) ◽  
pp. 5085-5092 ◽  
Author(s):  
Wan-Duo Kurt Ma ◽  
J. P. Lewis ◽  
W. Bastiaan Kleijn

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.
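The core quantity here is the empirical HSIC; a minimal sketch with Gaussian kernels follows, where the kernel width and the per-layer trade-off weight are assumed hyperparameters rather than values taken from the paper.

```python
# Minimal sketch of the biased empirical HSIC estimator with Gaussian kernels,
# the quantity the HSIC bottleneck trades off per hidden layer. The kernel width
# sigma and the weight beta are assumed hyperparameters, not values from the paper.
import torch

def gaussian_kernel(x, sigma=5.0):
    d = torch.cdist(x, x) ** 2                        # pairwise squared distances
    return torch.exp(-d / (2 * sigma ** 2))

def hsic(x, y, sigma=5.0):
    """x, y: (m, ·) batches, e.g. a layer's activations and one-hot labels."""
    m = x.size(0)
    k, l = gaussian_kernel(x, sigma), gaussian_kernel(y, sigma)
    h = torch.eye(m) - torch.full((m, m), 1.0 / m)    # centering matrix
    return torch.trace(k @ h @ l @ h) / (m - 1) ** 2

# per-layer training signal: minimize hsic(z, x_flat) - beta * hsic(z, y_onehot)
```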


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Florian Stelzer ◽  
André Röhm ◽  
Raul Vicente ◽  
Ingo Fischer ◽  
Serhiy Yanchuk

Deep neural networks are among the most widely applied machine learning tools showing outstanding performance in a broad range of tasks. We present a method for folding a deep neural network of arbitrary size into a single neuron with multiple time-delayed feedback loops. This single-neuron deep neural network comprises only a single nonlinearity and appropriately adjusted modulations of the feedback signals. The network states emerge in time as a temporal unfolding of the neuron’s dynamics. By adjusting the feedback modulation within the loops, we adapt the network’s connection weights. These connection weights are determined via a back-propagation algorithm, where both the delay-induced and local network connections must be taken into account. Our approach can fully represent standard Deep Neural Networks (DNN), encompasses sparse DNNs, and extends the DNN concept toward dynamical systems implementations. The new method, which we call Folded-in-time DNN (Fit-DNN), exhibits promising performance in a set of benchmark tasks.
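A toy, discrete-time illustration of the folding idea (not the continuous delay system analyzed in the paper): a single scalar neuron, updated one step at a time and fed by time-dependent delayed connections, reproduces an ordinary two-layer fully connected network, with each layer emerging as a block of consecutive time steps.

```python
# Toy, discrete-time illustration (not the continuous delay system of the paper):
# a single scalar neuron, updated one step at a time and fed by time-dependent
# delayed connections, reproduces an ordinary two-layer fully connected network.
import numpy as np

rng = np.random.default_rng(0)
N = 4                                          # nodes per layer
W1, W2 = rng.normal(size=(N, N)), rng.normal(size=(N, N))

def unfold_layer(history, W):
    """Append N new neuron states; the state written at step j reads node i of the
    previous layer through a delayed connection of length N + j - i."""
    for j in range(N):
        a = sum(W[j, i] * history[-(N + j - i)] for i in range(N))
        history.append(np.tanh(a))
    return history

history = list(rng.normal(size=N))             # input layer, written into the time series
history = unfold_layer(history, W1)            # the hidden layer emerges as the next N states
history = unfold_layer(history, W2)            # the output layer as the N states after that

x = np.array(history[:N])
assert np.allclose(history[-N:], np.tanh(W2 @ np.tanh(W1 @ x)))
```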


2021 ◽  
Author(s):  
Chih-Kuan Yeh ◽  
Been Kim ◽  
Pradeep Ravikumar

Understanding complex machine learning models such as deep neural networks through explanations is crucial in various applications. Many explanations stem from the model's perspective and may not effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights for individual pixels in an image can only express which parts of that particular image are important to the model, whereas humans may prefer an explanation that explains the prediction through concept-based thinking. In this work, we review the emerging area of concept-based explanations. We start by introducing concept explanations, including the class of Concept Activation Vectors (CAVs), which characterize concepts using vectors in appropriate spaces of neural activations, and discuss the properties of useful concepts and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts and approaches to address some of their caveats. Finally, we discuss case studies that showcase the utility of such concept-based explanations in synthetic settings and real-world applications.
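As a concrete reference point, a CAV is typically obtained by training a linear classifier that separates a layer's activations on concept examples from activations on random counterexamples; the hedged sketch below follows that recipe, with the activation arrays assumed to be precomputed.

```python
# Hedged sketch of a Concept Activation Vector (CAV): a linear direction in a layer's
# activation space separating concept examples from random counterexamples.
# concept_acts and random_acts are assumed to be precomputed (n, d) activation arrays.
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)               # unit-norm concept direction

# conceptual sensitivity of class k for one input: the directional derivative of the
# class-k logit along the CAV, i.e. grad_wrt_activation(logit_k) @ cav
```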


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tal Linzen ◽  
Marco Baroni

Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to and, consequently, whether they can shed new light on long-standing debates concerning the innate structure necessary for language acquisition. In this article, we survey representative studies of the syntactic abilities of deep networks and discuss the broader implications that this work has for theoretical linguistics.


Author(s):  
Wen Xu ◽  
Jing He ◽  
Yanfeng Shu

Transfer learning is an emerging technique in machine learning by which a new task can be solved with knowledge obtained from an old task, addressing the lack of labeled data. In particular, deep domain adaptation (a branch of transfer learning) has received the most attention in recently published articles. The intuition behind this is that deep neural networks usually have a large capacity to learn representations from one dataset, and part of that information can be reused for a new task. In this research, we first present the complete scenarios of transfer learning according to the domains and tasks. Second, we conduct a comprehensive survey of deep domain adaptation and categorize the recent advances into three types based on the implementation approach: fine-tuning networks, adversarial domain adaptation, and sample-reconstruction approaches. Third, we discuss the details of these methods and introduce some typical real-world applications. Finally, we conclude our work and explore some potential issues to be further addressed.
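As a minimal illustration of the first category, fine-tuning reuses a pretrained backbone and trains only a new task head; the torchvision model and layer names in the sketch below are standard choices, not taken from the survey.

```python
# Minimal sketch of the "fine-tuning networks" category: reuse a pretrained backbone,
# replace the task head, and update only the new layers. The torchvision model and
# layer names are standard choices, not taken from the survey.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")    # backbone trained on the source data
for p in model.parameters():
    p.requires_grad = False                         # freeze the transferred representation
model.fc = nn.Linear(model.fc.in_features, 10)      # new head for the target task (10 classes assumed)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)
# train only model.fc on the labeled target data; deeper blocks can be unfrozen later
```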

