Inference-Masked Loss for Deep Structured Output Learning

Author(s):  
Quan Guo ◽  
Hossein Rajaby Faghihi ◽  
Yue Zhang ◽  
Andrzej Uszok ◽  
Parisa Kordjamshidi

Structured learning algorithms usually involve an inference phase that selects the best global assignment of the output variables based on the local scores of all possible assignments. We extend deep neural networks with structured learning to combine the power of learned representations with domain knowledge expressed as output constraints during training. Introducing a non-differentiable inference module into gradient-based training is a critical challenge. In contrast to conventional loss functions, which penalize every local error independently, we propose an inference-masked loss that takes the effect of inference into account and does not penalize local errors that inference can correct. We empirically show that the inference-masked loss, combined with the negative log-likelihood loss, improves performance on different tasks, namely entity-relation recognition on the CoNLL04 and ACE2005 corpora and spatial role labeling on the CLEF 2017 mSpRL dataset. We show that the proposed approach achieves better generalizability, particularly in the low-data regime.
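
The idea behind the loss lends itself to a short illustration. The following is a minimal PyTorch sketch, not the authors' implementation: `inference_fn` stands in for the paper's non-differentiable constrained inference module (e.g., a solver over the output constraints), and the mixing weight `alpha` for combining the masked and conventional losses is a hypothetical parameter not specified in the abstract.

```python
import torch
import torch.nn.functional as F

def inference_masked_loss(logits, gold, inference_fn, alpha=0.5):
    """Sketch: combine an inference-masked NLL with the conventional NLL.

    logits: (batch, n_labels) local scores from the network
    gold: (batch,) gold label indices
    inference_fn: maps logits to a globally consistent assignment
                  under the output constraints (assumed given).
    """
    # Per-example negative log-likelihood from the local scores.
    nll = F.cross_entropy(logits, gold, reduction="none")
    with torch.no_grad():
        # Globally consistent assignment produced by inference.
        inferred = inference_fn(logits)
        # Keep the penalty only where inference is still wrong;
        # local errors that inference corrects are masked out.
        mask = (inferred != gold).float()
    # Interpolate the masked loss with the conventional NLL.
    return alpha * (mask * nll).mean() + (1.0 - alpha) * nll.mean()
```

Because the mask is computed under `torch.no_grad()`, the non-differentiable inference step never enters the backward pass; it only gates which local errors receive gradient.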

2021 ◽  
Author(s):  
Fabrizio Kuruc ◽  
Harald Binder ◽  
Moritz Hess

Deep neural networks are now frequently employed to predict survival conditional on omics-type biomarkers, e.g. by using the partial likelihood of the Cox proportional hazards model as the loss function. Because the number of observations in clinical studies is generally limited, combining different datasets has been proposed to improve learning of the network parameters. However, if the baseline hazards differ between the studies, the assumptions of the Cox proportional hazards model are violated. Based on high-dimensional transcriptome profiles from different tumor entities, we demonstrate how using a stratified partial likelihood as the loss function accounts for the differing baseline hazards in a deep learning framework. Additionally, we compare the partial likelihood with the ranking loss, which is frequently employed as a loss function in machine learning approaches due to its seeming simplicity. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we show that stratified loss functions lead to overall better discriminatory power and lower prediction error than their non-stratified counterparts. We investigate which genes are identified as having the greatest marginal impact on survival prediction under the different loss functions. We find that while similar genes are identified, known prognostic genes in particular receive higher importance from the stratified loss functions. Taken together, pooling data from different sources for improved parameter learning of deep neural networks benefits greatly from stratified loss functions that account for potentially varying baseline hazards. For easy application, we provide PyTorch code for the stratified loss functions and an explanatory Jupyter notebook in a GitHub repository.
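
The abstract points to PyTorch code in a GitHub repository; independently of that code, the sketch below illustrates how a stratified negative Cox partial log-likelihood can be computed, forming the risk set within each stratum so that studies with different baseline hazards never share a normalizing sum. The function name and argument layout are assumptions, and ties are handled in the Breslow style.

```python
import torch

def stratified_cox_loss(risk, time, event, strata):
    """Sketch of a stratified negative Cox partial log-likelihood.

    risk:   (n,) predicted log-hazard ratios from the network
    time:   (n,) observed times
    event:  (n,) 1 if the event was observed, 0 if censored
    strata: (n,) integer study/stratum labels
    """
    total = risk.new_zeros(())
    n_events = risk.new_zeros(())
    for s in torch.unique(strata):
        idx = strata == s
        r, t, e = risk[idx], time[idx], event[idx]
        # Sort by descending time so the risk set at each event
        # is exactly the prefix of earlier-sorted samples.
        order = torch.argsort(t, descending=True)
        r, e = r[order], e[order]
        # log sum over the risk set of exp(r_j), per position.
        log_risk = torch.logcumsumexp(r, dim=0)
        # Accumulate the partial log-likelihood over observed events only.
        total = total - ((r - log_risk) * e).sum()
        n_events = n_events + e.sum()
    return total / n_events.clamp(min=1.0)
```

Stratification here changes only which samples enter each normalizing sum; the network's risk scores are still shared across studies, which is what allows pooled parameter learning.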


2020 ◽  
Vol 34 (04) ◽  
pp. 3601-3608
Author(s):  
Minhao Cheng ◽  
Jinfeng Yi ◽  
Pin-Yu Chen ◽  
Huan Zhang ◽  
Cho-Jui Hsieh

Crafting adversarial examples has become an important technique for evaluating the robustness of deep neural networks (DNNs). However, most existing work focuses on attacking image classification, where the input space is continuous and the output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges posed by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions for the non-overlapping attack and the targeted keyword attack. We apply our algorithm to machine translation and text summarization tasks and verify its effectiveness: by changing fewer than three words, we can make a seq2seq model produce the desired outputs with high success rates. We also use an external sentiment classifier to verify that our generated adversarial examples preserve semantic meaning. On the other hand, we observe that, compared with well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
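
A hypothetical sketch may make the continuous-relaxation step concrete: the perturbation is optimized in embedding space with a group-lasso penalty (one group per word position, encouraging few whole-word changes) and then projected back onto valid vocabulary embeddings. `attack_loss` stands in for the paper's non-overlapping or targeted-keyword objectives, the gradient-regularization term is omitted for brevity, and all names are illustrative rather than the authors' API.

```python
import torch

def group_lasso_penalty(delta, eps=1e-12):
    # One group per word position: the L2 norm of that word's
    # perturbation vector; eps keeps the gradient finite at zero.
    return (delta.pow(2).sum(dim=1) + eps).sqrt().sum()

def project_to_vocab(emb, vocab_emb):
    # Replace each perturbed embedding with its nearest vocabulary
    # embedding, recovering a discrete token sequence.
    d = torch.cdist(emb, vocab_emb)          # (seq_len, vocab_size)
    return vocab_emb[d.argmin(dim=1)]

def attack(model, x_emb, vocab_emb, attack_loss, steps=100, lr=0.1, lam=1.0):
    """Sketch: projected gradient attack on a seq2seq model's embeddings.

    x_emb:     (seq_len, emb_dim) input word embeddings
    vocab_emb: (vocab_size, emb_dim) full embedding table
    """
    delta = torch.zeros_like(x_emb, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        # Adversarial objective plus group-lasso sparsity over words.
        loss = attack_loss(model(x_emb + delta)) + lam * group_lasso_penalty(delta)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return project_to_vocab(x_emb + delta.detach(), vocab_emb)
```

The group lasso is what limits the attack to changing only a handful of words: positions whose perturbation norm is driven to zero project back onto their original tokens.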

