Experiments on POS tagging and data driven dependency parsing for Telugu language

Author(s): Mayana Humera Khanam, Palli Suryachandra, Kv Madhumurthy
2013, Vol. 1, pp. 301-314
Author(s): Weiwei Sun, Xiaojun Wan

We present a comparative study of transition-, graph-, and PCFG-based models aimed at illuminating more precisely the likely contribution of CFGs to improving Chinese dependency parsing accuracy, especially when heterogeneous models are combined. Inspired by the impact of a constituency grammar on dependency parsing, we propose several strategies for acquiring pseudo CFGs from dependency annotations alone. Compared to linguistic grammars learned from rich phrase-structure treebanks, well-designed pseudo grammars achieve similar parsing accuracy and contribute equally to a parser ensemble. Moreover, pseudo grammars increase the diversity of the base models and therefore, together with all the other models, further improve system combination. Based on automatic POS tagging, our final model achieves a UAS of 87.23%, a significant improvement over the state of the art.
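To illustrate the general idea (not the authors' specific acquisition strategies), here is a minimal sketch of one way a pseudo constituency tree can be read off a dependency tree: every head projects a phrase over its own preterminal and the phrases of its dependents. The function name, phrase labels, and example sentence are illustrative, and projective trees are assumed.

```python
# Minimal sketch: induce a bracketed pseudo constituency tree from a
# (projective) dependency tree by letting each head project a phrase
# labelled with its POS tag plus "P". Illustrative only.
from collections import defaultdict

def pseudo_tree(tokens, tags, heads):
    """tokens, tags: length-n lists; heads[i] is the 1-based head of token i+1, 0 = root."""
    deps = defaultdict(list)
    for i, h in enumerate(heads, start=1):
        deps[h].append(i)

    def project(i):
        # The head's own preterminal plus the phrases of its dependents,
        # ordered by the surface position of each phrase's head word.
        parts = [(i, f"({tags[i-1]} {tokens[i-1]})")]
        parts += [(d, project(d)) for d in deps[i]]
        if len(parts) == 1:
            return parts[0][1]
        body = " ".join(p for _, p in sorted(parts))
        return f"({tags[i-1]}P {body})"

    root = deps[0][0]  # assume a single root-attached word
    return project(root)

# Example: "She ate fish" with heads [2, 0, 2] yields
#   (VBDP (PRP She) (VBD ate) (NN fish))
print(pseudo_tree(["She", "ate", "fish"], ["PRP", "VBD", "NN"], [2, 0, 2]))
```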


2015
Author(s): Yuan Zhang, Chengtao Li, Regina Barzilay, Kareem Darwish

2011, Vol. 37 (1), pp. 197-230
Author(s): Ryan McDonald, Joakim Nivre

There has been a rapid increase in the volume of research on data-driven dependency parsers in the past five years. This increase has been driven by the availability of treebanks in a wide variety of languages—due in large part to the CoNLL shared tasks—as well as the straightforward mechanisms by which dependency theories of syntax can encode complex phenomena in free word order languages. In this article, our aim is to take a step back and analyze the progress that has been made through an analysis of the two predominant paradigms for data-driven dependency parsing, which are often called graph-based and transition-based dependency parsing. Our analysis covers both theoretical and empirical aspects and sheds light on the kinds of errors each type of parser makes and how they relate to theoretical expectations. Using these observations, we present an integrated system based on a stacking learning framework and show that such a system can learn to overcome the shortcomings of each non-integrated system.
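The stacking idea can be sketched roughly as follows: a level-0 "guide" parser is run over the data, and its predicted heads and labels are added to each token as extra features for the level-1 parser, with jackknifing so that the level-1 parser never trains on guide predictions produced with access to the same gold trees. The parser objects, method names, and feature names below are placeholders, not the article's actual system.

```python
# Minimal sketch of parser stacking with a guide parser. Assumes two
# parser objects exposing train(sentences) and predict(sentence);
# these are stand-ins, not a real library API.

def add_guide_features(sentence, guide_heads, guide_labels):
    """sentence: list of per-token feature dicts; guide_* : level-0 predictions."""
    for i, tok in enumerate(sentence):
        tok["guide_head_offset"] = guide_heads[i] - (i + 1)  # signed distance to predicted head
        tok["guide_deprel"] = guide_labels[i]                 # predicted dependency label
    return sentence

def stack_train(guide_parser, main_parser, folds):
    """folds: list of (held_out, train_folds) cross-validation splits,
    so guide predictions on each sentence come from a model that never saw it."""
    for held_out, train_folds in folds:
        guide_parser.train(train_folds)
        for sent in held_out:
            heads, labels = guide_parser.predict(sent)
            add_guide_features(sent, heads, labels)
    # Level-1 parser trains on all sentences, now augmented with guide features.
    main_parser.train([s for held_out, _ in folds for s in held_out])
```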


2013, Vol. 39 (1), pp. 5-13
Author(s): Miguel Ballesteros, Joakim Nivre

Dependency trees used in syntactic parsing often include a root node representing a dummy word prefixed or suffixed to the sentence, a device that is generally considered a mere technical convenience and is tacitly assumed to have no impact on empirical results. We demonstrate that this assumption is false and that the accuracy of data-driven dependency parsers can in fact be sensitive to the existence and placement of the dummy root node. In particular, we show that a greedy, left-to-right, arc-eager transition-based parser consistently performs worse when the dummy root node is placed at the beginning of the sentence (following the current convention in data-driven dependency parsing) than when it is placed at the end or omitted completely. Control experiments with an arc-standard transition-based parser and an arc-factored graph-based parser reveal no consistent preferences but nevertheless exhibit considerable variation in results depending on root placement. We conclude that the treatment of dummy root nodes in data-driven dependency parsing is an underestimated source of variation in experiments and may also be a parameter worth tuning for some parsers.
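The experimental variable studied here reduces to where, if anywhere, an artificial root token is attached to the input before parsing. A minimal sketch, with illustrative token fields and parameter names:

```python
# Minimal sketch of the three root-placement conditions compared in the
# article: root prefixed, root suffixed, or no explicit root token.
# Token representation and names are illustrative.

ROOT = {"form": "<ROOT>", "pos": "ROOT"}

def attach_dummy_root(tokens, placement="start"):
    """tokens: list of token dicts; placement: 'start', 'end', or 'none'."""
    if placement == "start":   # the common convention in data-driven parsing
        return [ROOT] + tokens
    if placement == "end":     # suffixed root, reported to help the arc-eager parser
        return tokens + [ROOT]
    if placement == "none":    # omit the dummy root entirely
        return list(tokens)
    raise ValueError(f"unknown placement: {placement}")
```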


2008, Vol. 34 (3), pp. 357-389
Author(s): Gülşen Eryiğit, Joakim Nivre, Kemal Oflazer

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative, free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical units called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We test our claim on two different parsing methods, one based on a probabilistic model with beam search and the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of the parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.
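A minimal sketch of the inflectional-group (IG) idea, assuming Oflazer-style morphological analyses in which derivational boundaries are written "^DB" (as in the Turkish Treebank convention): each analysis is split at those boundaries, and every resulting IG becomes one parsing unit. The helper names and example string are illustrative.

```python
# Minimal sketch: expand word forms into inflectional-group (IG) parsing units
# by splitting morphological analyses at derivational boundaries ("^DB").

def split_into_igs(analysis):
    """'kitap+Noun+A3sg+Pnon+Nom^DB+Verb+Zero+Pres+A3sg' -> one IG per derivation."""
    return analysis.split("^DB")

def sentence_to_units(word_analyses):
    """Turn a word-based sentence into IG-based parsing units,
    remembering which word each IG came from."""
    units = []
    for w_idx, analysis in enumerate(word_analyses):
        for ig_idx, ig in enumerate(split_into_igs(analysis)):
            units.append({"word_index": w_idx, "ig_index": ig_idx, "ig": ig})
    return units

# Example: a word with two IGs (noun root plus a zero-derived verbal reading)
print(sentence_to_units(["kitap+Noun+A3sg+Pnon+Nom^DB+Verb+Zero+Pres+A3sg"]))
```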


2017
Author(s): Atreyee Mukherjee, Sandra Kübler, Matthias Scheutz
