The Interplay Between Loss Functions and Structural Constraints in Dependency Parsing

Northern European Journal of Language Technology ◽

10.3384/nejlt.2000-1533.19643 ◽

2019 ◽

Vol 6 ◽

pp. 43-66

Author(s):

Robin Kurtz ◽

Marco Kuhlmann

Keyword(s):

Loss Function ◽

Optimization Problem ◽

Combinatorial Optimization Problem ◽

Loss Functions ◽

Structural Constraints ◽

Syntactic Parsing ◽

Dependency Parsing ◽

Semantic Parsing ◽

Decoding Algorithms ◽

Dependency Trees

Dependency parsing can be cast as a combinatorial optimization problem with the objective to find the highest-scoring graph, where edge scores are learnt from data. Several of the decoding algorithms that have been applied to this task employ structural restrictions on candidate solutions, such as the restriction to projective dependency trees in syntactic parsing, or the restriction to noncrossing graphs in semantic parsing. In this paper we study the interplay between structural restrictions and a common loss function in neural dependency parsing, the structural hingeloss. We show how structural constraints can make networks trained under this loss function diverge and propose a modified loss function that solves this problem. Our experimental evaluation shows that the modified loss function can yield improved parsing accuracy, compared to the unmodified baseline.

Download Full-text

Going to the Roots of Dependency Parsing

Computational Linguistics ◽

10.1162/coli_a_00132 ◽

2013 ◽

Vol 39 (1) ◽

pp. 5-13 ◽

Cited By ~ 9

Author(s):

Miguel Ballesteros ◽

Joakim Nivre

Keyword(s):

Data Driven ◽

Syntactic Parsing ◽

Dependency Parsing ◽

Empirical Results ◽

Root Node ◽

Dependency Trees

Dependency trees used in syntactic parsing often include a root node representing a dummy word prefixed or suffixed to the sentence, a device that is generally considered a mere technical convenience and is tacitly assumed to have no impact on empirical results. We demonstrate that this assumption is false and that the accuracy of data-driven dependency parsers can in fact be sensitive to the existence and placement of the dummy root node. In particular, we show that a greedy, left-to-right, arc-eager transition-based parser consistently performs worse when the dummy root node is placed at the beginning of the sentence (following the current convention in data-driven dependency parsing) than when it is placed at the end or omitted completely. Control experiments with an arc-standard transition-based parser and an arc-factored graphbased parser reveal no consistent preferences but nevertheless exhibit considerable variation in results depending on root placement. We conclude that the treatment of dummy root nodes in data-driven dependency parsing is an underestimated source of variation in experiments andmay also be a parameter worth tuning for some parsers.

Download Full-text

Stochastic Loss Function

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5925 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4884-4891

Author(s):

Qingliang Liu ◽

Jinmei Lai

Keyword(s):

Neural Networks ◽

Loss Function ◽

Real World ◽

Optimization Problem ◽

Deep Neural Networks ◽

Back Propagation ◽

Loss Functions ◽

Joint Optimization ◽

Neural Machine Translation ◽

Deep Networks

Training deep neural networks is inherently subject to the predefined and fixed loss functions during optimizing. To improve learning efficiency, we develop Stochastic Loss Function (SLF) to dynamically and automatically generating appropriate gradients to train deep networks in the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem of network weights and loss parameters. In order to guarantee the requisite efficiency, gradients with the respect to the generic differentiable loss are leveraged for selecting loss function and optimizing network weights. Extensive experiments on a variety of popular datasets strongly demonstrate that SLF is capable of obtaining appropriate gradients at different stages during training, and can significantly improve the performance of various deep models on real world tasks including classification, clustering, regression, neural machine translation, and objection detection.

Download Full-text

A Dependency Perspective on RST Discourse Parsing and Evaluation

Computational Linguistics ◽

10.1162/coli_a_00314 ◽

2018 ◽

Vol 44 (2) ◽

pp. 197-235 ◽

Cited By ~ 6

Author(s):

Mathieu Morey ◽

Philippe Muller ◽

Nicholas Asher

Keyword(s):

Evaluation Framework ◽

Structure Theory ◽

Evaluation Procedure ◽

Syntactic Parsing ◽

Dependency Parsing ◽

Unified Framework ◽

Discourse Parsing ◽

Dependency Trees ◽

Viable Approach ◽

Dependency Parser

Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT); as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.

Download Full-text

Valence electron spectroscopy of inhomogeneous media

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100130389 ◽

1992 ◽

Vol 50 (2) ◽

pp. 1150-1151

Author(s):

A. Howie ◽

D.W. McComb

Keyword(s):

Specific Gravity ◽

Loss Function ◽

Electron Spectroscopy ◽

Valence Electron ◽

Peak Height ◽

Loss Functions ◽

Loss Peak ◽

High Energies ◽

Valence Electron Density ◽

Electron Microscopist

The bulk loss function Im(-l/ε (ω)), a well established tool for the interpretation of valence loss spectra, is being progressively adapted to the wide variety of inhomogeneous samples of interest to the electron microscopist. Proportionality between n, the local valence electron density, and ε-1 (Sellmeyer's equation) has sometimes been assumed but may not be valid even in homogeneous samples. Figs. 1 and 2 show the experimentally measured bulk loss functions for three pure silicates of different specific gravity ρ - quartz (ρ = 2.66), coesite (ρ = 2.93) and a zeolite (ρ = 1.79). Clearly, despite the substantial differences in density, the shift of the prominent loss peak is very small and far less than that predicted by scaling e for quartz with Sellmeyer's equation or even the somewhat smaller shift given by the Clausius-Mossotti (CM) relation which assumes proportionality between n (or ρ in this case) and (ε - 1)/(ε + 2). Both theories overestimate the rise in the peak height for coesite and underestimate the increase at high energies.

Download Full-text

Solving a Combinatorial Optimization Problem with Feedforward Neural Networks

1993 American Control Conference ◽

10.23919/acc.1993.4793106 ◽

1993 ◽

Author(s):

Xianzhong Cui ◽

Kang G. Shin

Keyword(s):

Neural Networks ◽

Combinatorial Optimization ◽

Optimization Problem ◽

Feedforward Neural Networks ◽

Combinatorial Optimization Problem

Download Full-text

Inverse version of the kth maximization combinatorial optimization problem

Can Tho University Journal of Science ◽

10.22144/ctu.jen.2018.027 ◽

2018 ◽

Vol 54(5) ◽

pp. 72

Author(s):

Quoc, H.D. ◽

Kien, N.T. ◽

Thuy, T.T.C. ◽

Hai, L.H. ◽

Thanh, V.N.

Keyword(s):

Combinatorial Optimization ◽

Optimization Problem ◽

Combinatorial Optimization Problem

Download Full-text

Study on Radar Echo-Filling in an Occlusion Area by a Deep Learning Algorithm

Remote Sensing ◽

10.3390/rs13091779 ◽

2021 ◽

Vol 13 (9) ◽

pp. 1779

Author(s):

Xiaoyan Yin ◽

Zhiqun Hu ◽

Jiafeng Zheng ◽

Boyong Li ◽

Yuanyuan Zuo

Keyword(s):

Deep Learning ◽

Loss Function ◽

Learning Algorithm ◽

Weather Radar ◽

Loss Functions ◽

Training Dataset ◽

Echo Intensity ◽

Common Mean ◽

Deep Learning Algorithm ◽

Radar Beam

Radar beam blockage is an important error source that affects the quality of weather radar data. An echo-filling network (EFnet) is proposed based on a deep learning algorithm to correct the echo intensity under the occlusion area in the Nanjing S-band new-generation weather radar (CINRAD/SA). The training dataset is constructed by the labels, which are the echo intensity at the 0.5° elevation in the unblocked area, and by the input features, which are the intensity in the cube including multiple elevations and gates corresponding to the location of bottom labels. Two loss functions are applied to compile the network: one is the common mean square error (MSE), and the other is a self-defined loss function that increases the weight of strong echoes. Considering that the radar beam broadens with distance and height, the 0.5° elevation scan is divided into six range bands every 25 km to train different models. The models are evaluated by three indicators: explained variance (EVar), mean absolute error (MAE), and correlation coefficient (CC). Two cases are demonstrated to compare the effect of the echo-filling model by different loss functions. The results suggest that EFnet can effectively correct the echo reflectivity and improve the data quality in the occlusion area, and there are better results for strong echoes when the self-defined loss function is used.

Download Full-text

A Densely Connected Network Based on U-Net for Medical Image Segmentation

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3446618 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-14

Author(s):

Zhenzhen Yang ◽

Pengfei Xu ◽

Yongpeng Yang ◽

Bing-Kun Bao

Keyword(s):

Feature Extraction ◽

Image Segmentation ◽

Loss Function ◽

Network Architecture ◽

Medical Image ◽

Medical Image Segmentation ◽

Cross Entropy ◽

Loss Functions ◽

Feature Maps ◽

Different Levels

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net network architecture seems to be insufficient when the size of segmentation targets changes and the imbalance happens between target and background in different forms of segmentation. To improve the U-Net network architecture, we develop a new architecture named densely connected U-Net (DenseUNet) network in this article. The proposed DenseUNet network adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block fusing feature maps of different levels to increase the accuracy of feature extraction. In addition, in view of the advantages of the cross entropy and the dice loss functions, a new loss function for the DenseUNet network is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet network and compared it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net networks on three different datasets. The experimental results show that the DenseUNet network has significantly performances compared with the MultiResUNet and the classic U-Net networks.

Download Full-text

Quit When You Can: Efficient Evaluation of Ensembles by Optimized Ordering

ACM Journal on Emerging Technologies in Computing Systems ◽

10.1145/3451209 ◽

2021 ◽

Vol 17 (4) ◽

pp. 1-20

Author(s):

Serena Wang ◽

Maya Gupta ◽

Seungil You

Keyword(s):

Optimization Problem ◽

Optimal Solution ◽

Combinatorial Optimization Problem ◽

Joint Optimization ◽

Early Stopping ◽

Time Approximation ◽

Approximation Bound ◽

Evaluation Time ◽

Speed Up ◽

Real World Problems

Given a classifier ensemble and a dataset, many examples may be confidently and accurately classified after only a subset of the base models in the ensemble is evaluated. Dynamically deciding to classify early can reduce both mean latency and CPU without harming the accuracy of the original ensemble. To achieve such gains, we propose jointly optimizing the evaluation order of the base models and early-stopping thresholds. Our proposed objective is a combinatorial optimization problem, but we provide a greedy algorithm that achieves a 4-approximation of the optimal solution under certain assumptions, which is also the best achievable polynomial-time approximation bound. Experiments on benchmark and real-world problems show that the proposed Quit When You Can (QWYC) algorithm can speed up average evaluation time by 1.8–2.7 times on even jointly trained ensembles, which are more difficult to speed up than independently or sequentially trained ensembles. QWYC’s joint optimization of ordering and thresholds also performed better in experiments than previous fixed orderings, including gradient boosted trees’ ordering.

Download Full-text

Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3424247 ◽

2021 ◽

Vol 20 (2) ◽

pp. 1-12

Author(s):

Shumin Shi ◽

Dan Luo ◽

Xing Wu ◽

Congjun Long ◽

Heyan Huang

Keyword(s):

Language Processing ◽

Manual Annotation ◽

Syntactic Parsing ◽

Dependency Parsing ◽

Low Resource ◽

Resource Setting ◽

Dependency Tree ◽

Low Resource Setting ◽

Novel Method ◽

Multi Level

Dependency parsing is an important task for Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is a kind of extremely low-resource language for NLP, there is no available Tibetan dependency treebank, which is currently obtained by manual annotation. Furthermore, there are few related kinds of research on the construction of treebank. We propose a novel method of multi-level chunk-based syntactic parsing to complete constituent-to-dependency treebank conversion for Tibetan under scarce conditions. Our method mines more dependencies of Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent laws of the language itself. We train the dependency parsing models on the dependency treebank obtained by the preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, which exceeds the optimal results of existing conversion methods. The experimental results show that our method has the potential to use a low-resource setting, which means we not only solve the problem of scarce Tibetan dependency treebank but also avoid needless manual annotation. The method embodies the regularity of strong knowledge-guided linguistic analysis methods, which is of great significance to promote the research of Tibetan information processing.

Download Full-text