Going to the Roots of Dependency Parsing

Dependency trees used in syntactic parsing often include a root node representing a dummy word prefixed or suffixed to the sentence, a device that is generally considered a mere technical convenience and is tacitly assumed to have no impact on empirical results. We demonstrate that this assumption is false and that the accuracy of data-driven dependency parsers can in fact be sensitive to the existence and placement of the dummy root node. In particular, we show that a greedy, left-to-right, arc-eager transition-based parser consistently performs worse when the dummy root node is placed at the beginning of the sentence (following the current convention in data-driven dependency parsing) than when it is placed at the end or omitted completely. Control experiments with an arc-standard transition-based parser and an arc-factored graph-based parser reveal no consistent preferences but nevertheless exhibit considerable variation in results depending on root placement. We conclude that the treatment of dummy root nodes in data-driven dependency parsing is an underestimated source of variation in experiments and may also be a parameter worth tuning for some parsers.
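
The sketch below re-indexes one tree for each of the three configurations: root placed first, root placed last, or root omitted. It assumes CoNLL-style input, meaning 1-based token indices with head 0 for attachment to the dummy root. The helper name `place_root` is hypothetical, as is its output convention: the root's own head is -1, and root-attached words get `None` when there is no root.

```python
from typing import List, Optional


def place_root(heads: List[int], position: str) -> List[Optional[int]]:
    """Re-index CoNLL-style heads for a given dummy-root placement (hypothetical helper).

    heads[i] is the 1-based head of token i+1, with 0 meaning "attached to the root".
    The result is indexed by 0-based position in the (possibly extended) sequence.
    """
    n = len(heads)
    if position == "start":
        # Root at index 0, tokens keep indices 1..n; the root itself has no head (-1).
        return [-1] + list(heads)
    if position == "end":
        # Tokens move to indices 0..n-1 and the root moves to index n.
        return [h - 1 if h > 0 else n for h in heads] + [-1]
    if position == "none":
        # No root token at all: root-attached words simply have no head.
        return [h - 1 if h > 0 else None for h in heads]
    raise ValueError("position must be 'start', 'end' or 'none'")
```

For "John saw Mary" with heads `[2, 0, 2]`, `place_root(..., "start")` gives `[-1, 2, 0, 2]` and `place_root(..., "end")` gives `[1, 3, 1, -1]`. The tree is identical in every case. The only difference is where the root sits in the input a left-to-right transition system reads: either the root is available from the first step, or it appears only after the last word.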

Dependency Parsing of Turkish

Computational Linguistics, Vol 34 (3), pp. 357-389, 2008
DOI: 10.1162/coli.2008.07-017-r1-06-83
Cited By ~ 38

Author(s): Gülşen Eryiğit, Joakim Nivre, Kemal Oflazer

Keyword(s): Morphological Structure, Data Driven, Syntactic Parsing, Dependency Parsing, Beam Search, Lexical Information, Order Language, Word Forms, Syntactic Relations, The Impact

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Lesser-studied languages that are typologically different from the languages for which the methods were originally developed pose particularly interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
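
The switch from word forms to inflectional groups (IGs) can be sketched under two assumptions. First, derivational boundaries inside a word's morphological analysis are marked `^DB`, which should be checked against the actual treebank format. Second, each non-final IG depends on the next IG of the same word, so only the final IG attaches outside the word. Both helper names and the analysis string in the example are illustrative.

```python
from typing import List, Tuple


def split_igs(analysis: str, boundary: str = "^DB") -> List[str]:
    """Split one word's morphological analysis into inflectional groups (IGs).

    Assumes derivational boundaries are written as `boundary`; each segment
    between two boundaries is one IG.
    """
    return [ig.lstrip("+") for ig in analysis.split(boundary)]


def word_internal_arcs(num_igs: int) -> List[Tuple[int, int]]:
    """(dependent, head) arcs linking the IGs of one word, 0-based within the word.

    Assumption for illustration: every non-final IG depends on the next IG,
    so only the word's final IG attaches to a head outside the word.
    """
    return [(i, i + 1) for i in range(num_igs - 1)]
```

With the illustrative analysis `"sağlam+Adj^DB+Verb+Become^DB+Verb+Caus+Pos"`, the word becomes three parsing units, linked by two word-internal arcs, instead of a single token.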

A Dependency Perspective on RST Discourse Parsing and Evaluation

Computational Linguistics, Vol 44 (2), pp. 197-235, 2018
DOI: 10.1162/coli_a_00314
Cited By ~ 6

Author(s): Mathieu Morey, Philippe Muller, Nicholas Asher

Keyword(s): Evaluation Framework, Structure Theory, Evaluation Procedure, Syntactic Parsing, Dependency Parsing, Unified Framework, Discourse Parsing, Dependency Trees, Viable Approach, Dependency Parser

Computational text-level discourse analysis mostly happens within Rhetorical Structure Theory (RST), whose structures have classically been presented as constituency trees, and relies on data from the RST Discourse Treebank (RST-DT); as a result, the RST discourse parsing community has largely borrowed from the syntactic constituency parsing community. The standard evaluation procedure for RST discourse parsers is thus a simplified variant of PARSEVAL, and most RST discourse parsers use techniques that originated in syntactic constituency parsing. In this article, we isolate a number of conceptual and computational problems with the constituency hypothesis. We then examine the consequences, for the implementation and evaluation of RST discourse parsers, of adopting a dependency perspective on RST structures, a view advocated so far only by a few approaches to discourse parsing. While doing that, we show the importance of the notion of headedness of RST structures. We analyze RST discourse parsing as dependency parsing by adapting to RST a recent proposal in syntactic parsing that relies on head-ordered dependency trees, a representation isomorphic to headed constituency trees. We show how to convert the original trees from the RST corpus, RST-DT, and their binarized versions used by all existing RST parsers to head-ordered dependency trees. We also propose a way to convert existing simple dependency parser output to constituent trees. This allows us to evaluate and to compare approaches from both constituent-based and dependency-based perspectives in a unified framework, using constituency and dependency metrics. We thus propose an evaluation framework to compare extant approaches easily and uniformly, something the RST parsing community has lacked up to now. We can also compare parsers’ predictions to each other across frameworks. 
This allows us to characterize families of parsing strategies across the different frameworks, in particular with respect to the notion of headedness. Our experiments provide evidence for the conceptual similarities between dependency parsers and shift-reduce constituency parsers, and confirm that dependency parsing constitutes a viable approach to RST discourse parsing.
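
The basic step of reading a dependency tree off a headed constituency tree can be sketched for binary RST trees. The sketch rests on assumptions of our own, not necessarily the article's: leaves are EDU indices, the nucleus heads each node, and the left child heads multinuclear ("NN") nodes. Relation labels are omitted. So is the attachment order that makes the article's trees head-ordered, so this sketch loses exactly the information needed to go back to the constituency tree.

```python
from typing import Dict, Tuple, Union

# A binary RST tree: a leaf is an EDU index; an internal node is
# (left, right, nuclearity) with nuclearity in {"NS", "SN", "NN"}.
Tree = Union[int, Tuple["Tree", "Tree", str]]


def to_dependencies(tree: Tree, arcs: Dict[int, int]) -> int:
    """Return the head EDU of `tree`, recording dependent -> head arcs in `arcs`."""
    if isinstance(tree, int):
        return tree  # an EDU heads itself
    left, right, nuclearity = tree
    left_head = to_dependencies(left, arcs)
    right_head = to_dependencies(right, arcs)
    if nuclearity == "SN":
        # Satellite on the left: its head attaches to the nucleus's head.
        arcs[left_head] = right_head
        return right_head
    # "NS", and by assumption multinuclear "NN": the left child is the head.
    arcs[right_head] = left_head
    return left_head
```

For the tree `((1, 2, "NS"), 3, "SN")`, EDU 3 is the root, EDU 1 attaches to EDU 3, and EDU 2 attaches to EDU 1.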

The Interplay Between Loss Functions and Structural Constraints in Dependency Parsing

Northern European Journal of Language Technology, Vol 6, pp. 43-66, 2019
DOI: 10.3384/nejlt.2000-1533.19643

Author(s): Robin Kurtz, Marco Kuhlmann

Keyword(s): Loss Function, Optimization Problem, Combinatorial Optimization Problem, Loss Functions, Structural Constraints, Syntactic Parsing, Dependency Parsing, Semantic Parsing, Decoding Algorithms, Dependency Trees

Dependency parsing can be cast as a combinatorial optimization problem with the objective to find the highest-scoring graph, where edge scores are learnt from data. Several of the decoding algorithms that have been applied to this task employ structural restrictions on candidate solutions, such as the restriction to projective dependency trees in syntactic parsing, or the restriction to noncrossing graphs in semantic parsing. In this paper we study the interplay between structural restrictions and a common loss function in neural dependency parsing, the structural hinge loss. We show how structural constraints can make networks trained under this loss function diverge and propose a modified loss function that solves this problem. Our experimental evaluation shows that the modified loss function can yield improved parsing accuracy, compared to the unmodified baseline.
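
Here is a minimal sketch of the cost-augmented structural hinge loss with Hamming cost, using the simplest possible decoder. Each token independently picks its best head, so no tree, projectivity or noncrossing constraint is enforced. The NumPy formulation and function name are assumptions for illustration, not the paper's implementation, and the sketch does not show the paper's modified loss. The gold tree always lies inside this decoder's search space, so the loss here cannot be negative. If a constrained decoder replaces the `max` step, that guarantee no longer holds whenever the gold structure violates the constraint.

```python
import numpy as np


def structured_hinge_loss(scores, gold_heads) -> float:
    """Structural hinge loss with Hamming cost and an unconstrained head-selection decoder.

    scores[h, d]: score of the arc h -> d; index 0 is the artificial root.
    gold_heads[d]: gold head of token d for d >= 1 (gold_heads[0] is ignored).
    """
    scores = np.asarray(scores, dtype=float)
    gold = np.asarray(gold_heads)
    deps = np.arange(1, scores.shape[0])
    # Hamming cost: every arc that is not in the gold tree costs 1.
    cost = np.ones_like(scores)
    cost[gold[deps], deps] = 0.0
    augmented = scores + cost
    np.fill_diagonal(augmented, -np.inf)  # a token cannot head itself
    # Cost-augmented decoding: best head per dependent, no structural constraint.
    predicted = augmented[:, deps].max(axis=0).sum()
    gold_score = scores[gold[deps], deps].sum()
    return float(predicted - gold_score)
```

With all-zero scores and gold heads `[0, 0, 1]`, the loss equals the number of tokens (2.0), because any wrong head wins on cost alone. Raising the two gold arc scores to 5.0 brings it down to 0.0.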

Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing

ACM Transactions on Asian and Low-Resource Language Information Processing, Vol 20 (2), pp. 1-12, 2021
DOI: 10.1145/3424247

Author(s): Shumin Shi, Dan Luo, Xing Wu, Congjun Long, Heyan Huang

Keyword(s): Language Processing, Manual Annotation, Syntactic Parsing, Dependency Parsing, Low Resource, Resource Setting, Dependency Tree, Low Resource Setting, Novel Method, Multi Level

Dependency parsing is an important task in Natural Language Processing (NLP). However, training a mature parser requires a large treebank, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and such treebanks are currently built by manual annotation. Moreover, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to perform constituent-to-dependency treebank conversion for Tibetan under these scarce conditions. Our method mines more dependencies in Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the treebank obtained by this preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, exceeding the best results of existing conversion methods. The experimental results show that our method is promising in low-resource settings: it not only addresses the scarcity of Tibetan dependency treebanks but also avoids unnecessary manual annotation. The method reflects the strengths of strongly knowledge-guided linguistic analysis and is of great significance for promoting research on Tibetan information processing.
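
The UAS and LAS figures above are the standard attachment scores. UAS counts tokens with the correct head, and LAS counts tokens with both the correct head and the correct label. The minimal sketch below assumes each token is given as a (head, label) pair and that every token is scored. Some evaluations exclude punctuation, which changes the numbers. The labels in the example are made up.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent, scoring every token."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    # UAS: correct head; LAS: correct head and correct label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * uas_hits / len(gold), 100.0 * las_hits / len(gold)
```

Suppose a prediction gets every head right but one label wrong in a three-token sentence. It then scores 100% UAS and about 66.7% LAS. By construction, LAS can never exceed UAS.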

Data-driven, PCFG-based and Pseudo-PCFG-based Models for Chinese Dependency Parsing

Transactions of the Association for Computational Linguistics, Vol 1, pp. 301-314, 2013
DOI: 10.1162/tacl_a_00229
Cited By ~ 2

Author(s): Weiwei Sun, Xiaojun Wan

Keyword(s): Comparative Study, State Of The Art, Data Driven, Dependency Parsing, Transition Graph, Final Model, System Combination, Pos Tagging, Heterogeneous Models, The Impact

We present a comparative study of transition-, graph- and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs in improving Chinese dependency parsing accuracy, especially by combining heterogeneous models. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies to acquire pseudo CFGs only from dependency annotations. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and contribute equally to parser ensembles. Moreover, pseudo grammars increase the diversity of base models and therefore, together with all other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, a significant improvement over the state of the art.
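
Grammar rules can be obtained from dependency annotations alone in a simple way. Every word with dependents projects a flat phrase, and the resulting productions are counted; relative frequencies of these counts could then serve as PCFG estimates. This is an illustrative scheme under our own assumptions, not one of the strategies proposed in the article. Phrase labels take the form `<TAG>P`, and the lexical head is marked with `*`. The tags in the example are made up.

```python
from collections import Counter
from typing import Dict, List


def pseudo_cfg_rules(tags: List[str], heads: List[int]) -> Counter:
    """Read flat pseudo-CFG rules off one dependency tree (illustrative scheme).

    tags[i] is the POS tag of token i+1; heads[i] is its 1-based head (0 = root).
    Each word with dependents projects a phrase labelled <TAG>P whose children,
    in surface order, are the head word itself (marked *) and its dependents.
    """
    n = len(tags)
    dependents: Dict[int, List[int]] = {i: [] for i in range(n + 1)}
    for tok, head in enumerate(heads, start=1):
        dependents[head].append(tok)

    def symbol(tok: int, head: int) -> str:
        if tok == head:
            return tags[tok - 1] + "*"  # the lexical head of the phrase
        if dependents[tok]:
            return tags[tok - 1] + "P"  # a dependent that projects its own phrase
        return tags[tok - 1]  # a bare dependent word

    rules: Counter = Counter()
    for head in range(1, n + 1):
        if dependents[head]:
            rhs = " ".join(symbol(t, head) for t in sorted(dependents[head] + [head]))
            rules[f"{tags[head - 1]}P -> {rhs}"] += 1
    return rules
```

For tags `["DT", "NN", "VV"]` with heads `[2, 3, 0]`, the sketch yields the two rules `NNP -> DT NN*` and `VVP -> NNP VV*`.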

On the complexity of non-projective data-driven dependency parsing

Proceedings of the 10th International Conference on Parsing Technologies - IWPT '07, 2007
DOI: 10.3115/1621410.1621426
Cited By ~ 12

Author(s): Ryan McDonald, Giorgio Satta

Keyword(s): Data Driven, Dependency Parsing, Projective Data

Analyzing and Integrating Dependency Parsers

Computational Linguistics, Vol 37 (1), pp. 197-230, 2011
DOI: 10.1162/coli_a_00039
Cited By ~ 26

Author(s): Ryan McDonald, Joakim Nivre

Keyword(s): Word Order, Integrated System, Data Driven, Dependency Parsing, Learning Framework, The Past, Complex Phenomena, Free Word

There has been a rapid increase in the volume of research on data-driven dependency parsers in the past five years. This increase has been driven by the availability of treebanks in a wide variety of languages—due in large part to the CoNLL shared tasks—as well as the straightforward mechanisms by which dependency theories of syntax can encode complex phenomena in free word order languages. In this article, our aim is to take a step back and analyze the progress that has been made through an analysis of the two predominant paradigms for data-driven dependency parsing, which are often called graph-based and transition-based dependency parsing. Our analysis covers both theoretical and empirical aspects and sheds light on the kinds of errors each type of parser makes and how they relate to theoretical expectations. Using these observations, we present an integrated system based on a stacking learning framework and show that such a system can learn to overcome the shortcomings of each non-integrated system.
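
In a stacking setup, a level-1 parser receives the output of a level-0 (guide) parser as extra features. The sketch below shows what such features might look like for one candidate arc in a graph-based level-1 parser. The three features and their names are hypothetical, not the feature set used in the article. A common stacking practice is to produce the guide predictions for the training data by cross-validation. This keeps the level-1 model from learning to trust unrealistically accurate guide output.

```python
from typing import Dict, List


def guide_features(head: int, dep: int, guide_heads: List[int]) -> Dict[str, float]:
    """Stacking features for the candidate arc head -> dep (hypothetical feature set).

    guide_heads[d] is the head the level-0 parser predicted for token d;
    index 0 is the root, whose entry is ignored.
    """
    return {
        # 1.0 if the guide parser predicted exactly this arc.
        "guide_arc": float(guide_heads[dep] == head),
        # 1.0 if the guide parser predicted the reverse arc dep -> head.
        "guide_reverse_arc": float(head > 0 and guide_heads[head] == dep),
        # Number of dependents the guide parser attached to this head.
        "guide_head_fanout": float(sum(h == head for h in guide_heads[1:])),
    }
```

Take guide heads `[-1, 2, 0, 2]` (token 2 is the root, with tokens 1 and 3 as its dependents). The candidate arc 2 -> 1 then gets `guide_arc = 1.0` and `guide_head_fanout = 2.0`. The level-1 learner decides how far to trust these signals relative to its own features.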
