A Cloud-Hosted MapReduce Architecture for Syntactic Parsing

Author(s):  
Yonas Woldemariam ◽  
Stefan Pletschacher ◽  
Christian Clausner ◽  
Julian Bass
Author(s):  
Shumin Shi ◽  
Dan Luo ◽  
Xing Wu ◽  
Congjun Long ◽  
Heyan Huang

Dependency parsing is an important task in Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, which is still extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is available, and such treebanks are currently obtained only by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to perform constituent-to-dependency treebank conversion for Tibetan under low-resource conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the dependency treebank obtained by the preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, which exceeds the best results of existing conversion methods. The experimental results show that our method has potential for use in a low-resource setting, which means that we not only address the scarcity of Tibetan dependency treebanks but also avoid needless manual annotation. The method embodies the regularity of strong knowledge-guided linguistic analysis methods, which is of great significance for promoting research on Tibetan information processing.
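The abstract reports its results as LAS and UAS (labeled and unlabeled attachment scores). As a minimal sketch of how these two metrics are conventionally computed, and not the authors' evaluation code, the snippet below scores one sentence. The function name `attachment_scores`, the `Arc` type, and the toy arcs are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
# Index 0 stands for the artificial root. This layout is an assumption.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted arcs must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens whose predicted head is correct.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS additionally requires the dependency label to match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, full_hits / n


# Toy four-token sentence (invented data, not from the paper).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "nmod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

Because a labeled match requires a correct head, LAS can never exceed UAS, which is consistent with the figures quoted above.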


2005 ◽  
Vol 31 (1) ◽  
pp. 71-106 ◽  
Author(s):  
Martha Palmer ◽  
Daniel Gildea ◽  
Paul Kingsbury

The Proposition Bank project takes a practical approach to semantic representation, adding a layer of predicate-argument information, or semantic role labels, to the syntactic structures of the Penn Treebank. The resulting resource can be thought of as shallow, in that it does not represent coreference, quantification, and many other higher-order phenomena, but also broad, in that it covers every instance of every verb in the corpus and allows representative statistics to be calculated. We discuss the criteria used to define the sets of semantic roles used in the annotation process and to analyze the frequency of syntactic/semantic alternations in the corpus. We describe an automatic system for semantic role tagging trained on the corpus and discuss the effect on its performance of various types of information, including a comparison of full syntactic parsing with a flat representation and the contribution of the empty “trace” categories of the treebank.


2017 ◽  
Vol 52 (3) ◽  
pp. 2081-2097 ◽  
Author(s):  
Carlos Gómez-Rodríguez ◽  
Iago Alonso-Alonso ◽  
David Vilares

2012 ◽  
Vol 9 (3) ◽  
pp. 1231-1247 ◽  
Author(s):  
Mihaela Colhon

In this paper we present a method for constructing an English-Romanian treebank, together with the evaluation results obtained. The treebank is built upon a parallel English-Romanian corpus that is word-aligned and annotated at the morphological and syntactic levels. The syntactic trees of the Romanian texts are generated from the syntactic phrases of the parallel English texts, which are obtained automatically by syntactic parsing. The method reuses and adjusts existing tools and algorithms for cross-lingual transfer of syntactic constituents and for syntactic tree alignment.
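The general idea of transferring constituents through a word alignment can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the paper: each source constituent is a labeled token span, and it is projected onto the smallest target span covering all aligned target tokens. The names `Span` and `project_spans`, and the toy sentence pair, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (label, first token index, last token index), both ends inclusive.
Span = Tuple[str, int, int]


def project_spans(spans: List[Span],
                  alignment: List[Tuple[int, int]]) -> List[Span]:
    """Project source spans onto the target side of a word alignment."""
    # Map each source token index to the target indices it aligns to.
    targets: Dict[int, Set[int]] = {}
    for src, tgt in alignment:
        targets.setdefault(src, set()).add(tgt)

    projected: List[Span] = []
    for label, start, end in spans:
        covered: Set[int] = set()
        for i in range(start, end + 1):
            covered |= targets.get(i, set())
        # Spans with no aligned tokens are skipped rather than guessed.
        if covered:
            projected.append((label, min(covered), max(covered)))
    return projected


# Toy pair: "the red car stops" / "mașina roșie oprește" (invented example).
english_spans = [("NP", 0, 2), ("VP", 3, 3), ("S", 0, 3)]
alignment = [(0, 0), (1, 1), (2, 0), (3, 2)]  # (English index, Romanian index)
projected = project_spans(english_spans, alignment)
print(projected)  # [('NP', 0, 1), ('VP', 2, 2), ('S', 0, 2)]
```

A min-max projection of this kind can produce overlapping or crossing target spans when word order differs, so a real pipeline would need an additional repair or filtering step.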


Author(s):  
Alexander Gelbukh ◽  
José A. Martínez F. ◽  
Andres Verastegui ◽  
Alberto Ochoa

In this chapter, an exhaustive parser is presented. The parser was developed for use in a natural language interface to databases (NLIDB) project. This chapter includes a brief description of state-of-the-art NLIDBs, including a description of the methods used and the performance of some interfaces. Some of the general problems in natural language interfaces to databases are also explained. The exhaustive parser was developed with the aim of improving the overall performance of the interface; therefore, the interface is also briefly described. This chapter also presents the drawbacks discovered during the experimental tests of the parser, which show that it is unsuitable for improving the NLIDB performance.


English Today ◽  
2015 ◽  
Vol 31 (3) ◽  
pp. 46-58 ◽  
Author(s):  
Sergio Torres-Martínez

The central issue of the present article is the analysis of phrasal verbs (hereafter termed multiword verbs [MWVs]) from the perspective of construction grammars (Goldberg, 1995; Suttle and Goldberg, 2011). As is well known, English MWVs present special challenges to L2 learners due, among other things, to the shapelessness of their conceptual components and the ensuing impossibility of arriving at equivalent word-meaning correspondences (mappings) in the learners’ mother language (see Gillette et al., 1999). This brings us to the first theoretical claim of this paper – namely, that MWVs (also termed phrasal verbs, verb-particle collocations, verb-particle combinations etc.) are lexical chunks that can be retrieved by speakers either as wholes, without special recourse to syntactic parsing, or as verb-particle semantic associations (Cappelle et al., 2010). This idea is combined with the notion that MWVs inherit their syntax-semantics from prototypical Argument Structure Constructions (Goldberg, 2013a) within Verb Argument Construction (VAC) frames. VACs are thus associated with prototype verbs like ‘go’, ‘come’, ‘get’, ‘put’, etc., to project their meaning upon less-frequent verbs occupying a V-slot frame (a verbal position). It follows that MWVs function as hyponyms that express specific semantic nuances not available in prototype verbs. For example, in the sentence ‘Arya scooped up a rock and hurled it at Joffrey's head’ (George R. R. Martin, A Game of Thrones [1996]), the verb scoop up suggests a caused motion usually conveyed by the verb LIFT, i.e. the prototype of the simple transitive Verb Argument Construction. From this vantage, it is suggested that a way to activate the weak verb-object interface is through its assignation to specific prototypes bootstrapping (providing an initial basis for) both the conceptualisation of the MWVs and their potential mapping to specific words (which I term inherited surface forms).

