Labeling Chinese Predicates with Semantic Roles

2008 ◽  
Vol 34 (2) ◽  
pp. 225-255 ◽  
Author(s):  
Nianwen Xue

In this article we report work on Chinese semantic role labeling, taking advantage of two recently completed corpora: the Chinese PropBank, a semantically annotated corpus of Chinese verbs, and the Chinese NomBank, a companion corpus that annotates the predicate-argument structure of nominalized predicates. Because the semantic role labels are assigned to the constituents in a parse tree, we first report experiments in which semantic role labels are automatically assigned to hand-crafted parses in the Chinese Treebank. This gives us a measure of the extent to which semantic role labels can be bootstrapped from the syntactic annotation provided in the treebank. We then report experiments using automatic parses with decreasing levels of human annotation in the input to the syntactic parser: parses that use gold-standard segmentation and POS tagging, parses that use only gold-standard segmentation, and fully automatic parses. These experiments gauge how successful semantic role labeling for Chinese can be in more realistic situations. Our results show that when hand-crafted parses are used, semantic role labeling accuracy for Chinese is comparable to what has been reported for state-of-the-art English semantic role labeling systems trained and tested on the English PropBank, even though the Chinese PropBank is significantly smaller. When an automatic parser is used, however, the accuracy of our system is significantly lower than the English state of the art. This indicates that improved Chinese parsing is critical to high-performance semantic role labeling for Chinese.
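
Because the labeling in this setup operates over parse-tree constituents, a compact sketch may help fix ideas. The following is not the author's system; the tiny tree, feature set, and names are all illustrative. It shows the standard recipe in which each constituent becomes a candidate and features such as phrase type and the tree path to the predicate feed a role classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(eq=False)
class Node:
    label: str                       # phrase type (e.g. "NP") or POS tag on leaves
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None       # surface word, set only on leaves
    parent: Optional["Node"] = None

def link_parents(root: Node) -> None:
    for child in root.children:
        child.parent = root
        link_parents(child)

def ancestors(node: Node) -> List[Node]:
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain

def tree_path(constituent: Node, predicate_leaf: Node) -> str:
    """Classic SRL 'path' feature: labels going up from the candidate
    constituent to the lowest common ancestor, then down to the predicate."""
    up, down = ancestors(constituent), ancestors(predicate_leaf)
    lca = next(n for n in up if n in down)
    up_labels = [n.label for n in up[: up.index(lca) + 1]]
    down_labels = [n.label for n in reversed(down[: down.index(lca)])]
    return "^".join(up_labels) + "!" + "!".join(down_labels)

def srl_features(constituent: Node, predicate_leaf: Node) -> Dict[str, str]:
    # These features would normally feed a classifier that outputs a role
    # label such as ARG0/ARG1 or NONE for the candidate constituent.
    return {
        "phrase_type": constituent.label,
        "predicate": predicate_leaf.word or "",
        "path": tree_path(constituent, predicate_leaf),
    }

# Tiny example: (IP (NP (NN 警察)) (VP (VV 逮捕) (NP (NN 小偷))))
subj = Node("NP", [Node("NN", word="警察")])
pred = Node("VV", word="逮捕")
obj = Node("NP", [Node("NN", word="小偷")])
root = Node("IP", [subj, Node("VP", [pred, obj])])
link_parents(root)
print(srl_features(subj, pred))   # path feature: NP^IP!VP!VV
```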

Author(s):  
Kashif Munir ◽  
Hai Zhao ◽  
Zuchao Li

The task of semantic role labeling (SRL) is dedicated to finding the predicate-argument structure of a sentence. Previous work on SRL is mostly supervised and does not account for the cost of labeling each example, which can be very expensive and time-consuming. In this article, we present the first neural unsupervised model for SRL. We decompose the task into two argument-related subtasks, identification and clustering, and propose a pipeline that consists of two corresponding neural modules. First, we train a neural model on two syntax-aware, statistically developed rules. The model derives a relevance signal for each token in a sentence, feeds it into a BiLSTM, and then applies an adversarial layer that adds noise and classifies simultaneously, enabling the model to learn the semantic structure of a sentence. We then propose a second neural model for argument role clustering, which clusters the learned argument embeddings biased toward their dependency relations. Experiments on the CoNLL-2009 English dataset demonstrate that our model outperforms the previous state of the art among non-neural models for argument identification and classification.
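
A rough sketch of the second stage only (argument role clustering), under simplifying assumptions of my own rather than the authors' architecture: the bias toward dependency relations is modelled naively here by appending a scaled one-hot relation vector to each argument embedding before running k-means.

```python
from typing import List
import numpy as np

def biased_embeddings(arg_vecs: np.ndarray, dep_rels: List[str],
                      alpha: float = 0.5) -> np.ndarray:
    # Append a scaled one-hot dependency-relation vector to each embedding.
    rels = sorted(set(dep_rels))
    one_hot = np.zeros((len(dep_rels), len(rels)))
    for i, r in enumerate(dep_rels):
        one_hot[i, rels.index(r)] = alpha
    return np.concatenate([arg_vecs, one_hot], axis=1)

def kmeans(X: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    # Plain k-means; each cluster plays the part of one argument role.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        assign = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

# Toy example: four argument embeddings, two dependency relations, two clusters.
vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
rels = ["SBJ", "SBJ", "OBJ", "OBJ"]
print(kmeans(biased_embeddings(vecs, rels), k=2))
```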


2014 ◽  
Vol 51 ◽  
pp. 133-164 ◽  
Author(s):  
Kristian Woodsend ◽  
Mirella Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses rewrite rules automatically extracted from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold-standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.
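
A toy sketch of the underlying idea (illustrative only; the rule below is invented, not one extracted from comparable corpora or bitexts): apply a rewrite rule to a gold-annotated sentence and project the role spans onto the rewritten version, so one annotated sentence yields several training instances.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]   # (start_token, end_token_exclusive, role_label)

def apply_rule(tokens: List[str], spans: List[Span],
               old: List[str], new: List[str]) -> Tuple[List[str], List[Span]]:
    """Replace the first occurrence of `old` with `new`; shift labeled spans
    lying entirely before/after the rewrite, drop spans that overlap it.
    (A full system would also re-anchor the predicate and relabel the
    rewritten span itself.)"""
    for i in range(len(tokens) - len(old) + 1):
        if tokens[i:i + len(old)] == old:
            delta = len(new) - len(old)
            out_tokens = tokens[:i] + new + tokens[i + len(old):]
            out_spans = []
            for s, e, lab in spans:
                if e <= i:                       # entirely before the rewrite
                    out_spans.append((s, e, lab))
                elif s >= i + len(old):          # entirely after: shift offsets
                    out_spans.append((s + delta, e + delta, lab))
            return out_tokens, out_spans
    return tokens, spans                          # rule did not fire

tokens = "the committee approved the new proposal yesterday".split()
spans = [(0, 2, "A0"), (3, 6, "A1"), (6, 7, "AM-TMP")]
print(apply_rule(tokens, spans, old=["approved"], new=["gave", "approval", "to"]))
```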


2021 ◽  
pp. 1-48
Author(s):  
Zuchao Li ◽  
Hai Zhao ◽  
Shexia He ◽  
Jiaxun Cai

Abstract Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies based on traditional models have shown that syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information was challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones, suggesting that syntax becomes much less important for neural semantic role labeling, especially when paired with deep neural networks and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation of the relevance of syntactic information to SRL, covering both span-based and dependency-based formalisms and both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, together with two categories of methods for exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all languages available, and the results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models together with a thorough empirical survey using existing models.
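
To make "syntax pruning-based" concrete, here is a minimal first-order version of the classic dependency-tree candidate pruning that such frameworks build on (a sketch of the general technique, not this paper's exact algorithm): argument candidates are the dependents of the predicate and of each of its ancestors on the path to the root.

```python
from typing import List

def prune_candidates(heads: List[int], predicate: int) -> List[int]:
    """heads[i] is the 0-based index of token i's head, or -1 for the root.
    Returns the pruned set of argument-candidate token indices."""
    children = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            children[head].append(child)

    candidates, node = [], predicate
    while node != -1:                       # walk from the predicate up to the root
        candidates.extend(c for c in children[node] if c != predicate)
        node = heads[node]
    return sorted(set(candidates))

# "The cat chased the mouse quickly": heads (0-based), root = "chased" (index 2)
heads = [1, 2, -1, 4, 2, 2]
print(prune_candidates(heads, predicate=2))   # dependents of the predicate: [1, 4, 5]
```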


2015 ◽  
Vol 3 ◽  
pp. 449-460 ◽  
Author(s):  
Michael Roth ◽  
Mirella Lapata

Frame-semantic representations have been useful in several applications, ranging from text-to-scene generation to question answering and social network analysis. Predicting such representations from raw text is, however, a challenging task, and corresponding models are typically trained on only a small set of sentence-level annotations. In this paper, we present a semantic role labeling system that takes into account sentence and discourse context. We introduce several new features, which we motivate based on linguistic insights, and experimentally demonstrate that they lead to significant improvements over the current state of the art in FrameNet-based semantic role labeling.
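
The abstract does not spell out the features themselves, so the snippet below is only a generic illustration of what a discourse-context feature for frame SRL can look like: whether the head word of a candidate argument was already mentioned in the preceding sentences of the document.

```python
# Purely illustrative; this feature is NOT taken from the paper. It only shows
# how sentence-external (discourse) context can be turned into a feature.

from typing import List

def prev_mention_feature(doc_sentences: List[List[str]],
                         sent_idx: int, head_word: str) -> bool:
    # True if the candidate's head word occurs anywhere earlier in the document.
    context = {w.lower() for s in doc_sentences[:sent_idx] for w in s}
    return head_word.lower() in context

doc = [["Mary", "bought", "a", "violin", "."],
       ["She", "plays", "the", "violin", "every", "day", "."]]
print(prev_mention_feature(doc, 1, "violin"))   # True: mentioned in sentence 0
```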


2008 ◽  
Vol 34 (2) ◽  
pp. 161-191 ◽  
Author(s):  
Kristina Toutanova ◽  
Aria Haghighi ◽  
Christopher D. Manning

We present a model for semantic role labeling that effectively captures the linguistic intuition that a semantic argument frame is a joint structure, with strong dependencies among the arguments. We show how to incorporate these strong dependencies in a statistical joint model with a rich set of features over multiple argument phrases. The proposed model substantially outperforms a similar state-of-the-art local model that does not include dependencies among different arguments. We evaluate the gains from incorporating this joint information on the PropBank corpus, both when using correct syntactic parse trees as input and when using automatically derived parse trees. The gains amount to 24.1% error reduction on all arguments and 36.8% on core arguments for gold-standard parse trees on PropBank. For automatic parse trees, the error reductions are 8.3% and 10.3% on all and core arguments, respectively. We also present results on the CoNLL 2005 shared task data set. Additionally, we explore the use of multiple syntactic analyses to cope with parser noise and uncertainty.
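
The reported gains are relative error reductions. As a sanity check on the arithmetic, the snippet below shows how such a figure is computed from two accuracy (or F1) scores; the example numbers are made up for illustration and are not the paper's results.

```python
def error_reduction(baseline_score: float, new_score: float) -> float:
    """Relative error reduction: the fraction of the baseline's error
    (1 - score) that the new model removes."""
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    return (baseline_error - new_error) / baseline_error

# Hypothetical numbers: a local model at 90.0 F1 and a joint model at 92.4 F1
# correspond to a 24% relative error reduction.
print(round(error_reduction(0.900, 0.924), 3))   # 0.24
```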


2009 ◽  
Vol 03 (01) ◽  
pp. 131-149
Author(s):  
Yulan Yan ◽  
Yutaka Matsuo ◽  
Mitsuru Ishizuka

Recently, Semantic Role Labeling (SRL) systems have been used to recover the semantic predicate-argument structure of naturally occurring texts. To address the challenge of extracting a universal set of semantic or thematic relations covering various types of semantic relationships between entities, we develop a shallow semantic parser based on the Concept Description Language for Natural Language (CDL.nl), which defines a set of semantic relations for describing the concept structure of text; the parser adds a new layer of semantic annotation to natural language sentences as an extension of SRL. The parsing task is a relation extraction process with two steps: relation detection and relation classification. First, based on dependency analysis, a rule-based algorithm detects all entity pairs for which a relationship exists; second, we use a kernel-based method to assign CDL.nl relations to the detected entity pairs by leveraging diverse features. A preliminary evaluation on a manually built dataset shows that CDL.nl relations can be extracted with good performance.
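
A very rough sketch of the first step only (relation detection over a dependency analysis), under assumptions of my own rather than CDL.nl's actual rules: propose an entity pair as related when the dependency path between the two entity heads is short enough.

```python
# Illustrative only: a naive detection rule over a dependency tree. The real
# CDL.nl rule set is richer; this just shows the shape of the detection step.

from itertools import combinations
from typing import List, Tuple

def dep_path_length(heads: List[int], a: int, b: int) -> int:
    def to_root(n: int) -> List[int]:
        chain = []
        while n != -1:
            chain.append(n)
            n = heads[n]
        return chain
    up_a, up_b = to_root(a), to_root(b)
    common = next(n for n in up_a if n in up_b)   # lowest common ancestor
    return up_a.index(common) + up_b.index(common)

def detect_pairs(heads: List[int], entity_heads: List[int],
                 max_len: int = 3) -> List[Tuple[int, int]]:
    return [(a, b) for a, b in combinations(entity_heads, 2)
            if dep_path_length(heads, a, b) <= max_len]

# "Einstein(0) worked(1) at(2) Princeton(3)": "worked" is the root.
heads = [1, -1, 1, 2]
print(detect_pairs(heads, entity_heads=[0, 3]))   # [(0, 3)]
```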


2009 ◽  
Vol 15 (1) ◽  
pp. 143-172 ◽  
Author(s):  
Nianwen Xue ◽  
Martha Palmer

Abstract We report work on adding semantic role labels to the Chinese Treebank, a corpus already annotated with phrase structures. The work involves locating all verbs and their nominalizations in the corpus, and semi-automatically adding semantic role labels to their arguments, which are constituents in a parse tree. Although the same procedure is followed, different issues arise in the annotation of verbs and nominalized predicates. For verbs, identifying their arguments is generally straightforward given their syntactic structure in the Chinese Treebank as they tend to occupy well-defined syntactic positions. Our discussion focuses on the syntactic variations in the realization of the arguments as well as our approach to annotating dislocated and discontinuous arguments. In comparison, identifying the arguments for nominalized predicates is more challenging and we discuss criteria and procedures for distinguishing arguments from non-arguments. In particular we focus on the role of support verbs as well as the relevance of event/result distinctions in the annotation of the predicate-argument structure of nominalized predicates. We also present our approach to taking advantage of the syntactic structure in the Chinese Treebank to bootstrap the predicate-argument structure annotation of verbs. Finally, we discuss the creation of a lexical database of frame files and its role in guiding predicate-argument annotation. Procedures for ensuring annotation consistency and inter-annotator agreement evaluation results are also presented.
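
The bootstrapping idea can be illustrated with a toy heuristic (a caricature of the approach, not the annotation procedure itself): because verb arguments tend to occupy well-defined syntactic positions in the Chinese Treebank, an initial pass can propose role labels from those positions, which annotators then correct.

```python
# A caricature of position-based bootstrapping (not the actual procedure):
# pre-label the subject NP of a verb as ARG0 and the object NP as ARG1,
# leaving everything else for the human annotator to fill in or correct.

from typing import Dict, List, Tuple

def bootstrap_labels(constituents: List[Tuple[str, str]]) -> Dict[str, str]:
    """constituents: (grammatical_position, text) pairs for one verb instance."""
    position_to_role = {"subject": "ARG0", "object": "ARG1"}
    return {text: position_to_role.get(pos, "UNLABELED")
            for pos, text in constituents}

print(bootstrap_labels([("subject", "警察"), ("object", "小偷"), ("adjunct", "昨天")]))
# {'警察': 'ARG0', '小偷': 'ARG1', '昨天': 'UNLABELED'}
```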


2019 ◽  
Vol 55 (2) ◽  
pp. 305-337 ◽  
Author(s):  
Alina Wróblewska ◽  
Piotr Rybak

Abstract The predicate-argument structure transparently encoded in dependency-based syntactic representations supports machine translation, question answering, information extraction, etc. The quality of dependency parsing is therefore a crucial issue in natural language processing. In the current paper we discuss the fundamental ideas of the dependency theory and provide an overview of selected dependency-based resources for Polish. Furthermore, we present some state-of-the-art dependency parsing systems whose models can be estimated on correctly annotated data. In the experimental part, we provide an in-depth evaluation of these systems on Polish data. Our results show that graph-based parsers, even those without any neural component, are better suited for Polish than transition-based parsing systems.
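
The standard metrics behind such a parser evaluation are unlabeled and labeled attachment scores (UAS/LAS). A minimal sketch of how they are computed from gold and predicted trees (toy data, not the paper's results):

```python
from typing import List, Tuple

Arc = Tuple[int, str]   # (head index, dependency label) for one token

def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """UAS counts tokens whose head is correct; LAS additionally requires
    the dependency label to match."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(2, "subj"), (0, "root"), (2, "obj")]
pred = [(2, "subj"), (0, "root"), (2, "nmod")]
print(attachment_scores(gold, pred))   # (1.0, 0.666...)
```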


2013 ◽  
Vol 39 (3) ◽  
pp. 631-663 ◽  
Author(s):  
Beñat Zapirain ◽  
Eneko Agirre ◽  
Lluís Màrquez ◽  
Mihai Surdeanu

This paper focuses on a well-known open issue in Semantic Role Classification (SRC) research: the limited influence and sparseness of lexical features. We mitigate this problem using models that integrate automatically learned selectional preferences (SP). We explore a range of models based on WordNet and distributional-similarity SPs. Furthermore, we demonstrate that the SRC task is better modeled by SP models centered on both verbs and prepositions, rather than verbs alone. Our experiments with SP-based models in isolation indicate that they outperform a lexical baseline by 20 F1 points in domain and almost 40 F1 points out of domain. Furthermore, we show that a state-of-the-art SRC system extended with features based on selectional preferences performs significantly better, both in domain (17% error reduction) and out of domain (13% error reduction). Finally, we show that in an end-to-end semantic role labeling system we obtain small but statistically significant improvements, even though our modified SRC model affects only approximately 4% of the argument candidates. Our post hoc error analysis indicates that the SP-based features help mostly in situations where syntactic information is either incorrect or insufficient to disambiguate the correct role.
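
A distributional selectional-preference feature of the kind described can be sketched as follows (a minimal formulation of my own, not the paper's models): score a candidate argument head by its cosine similarity to the centroid of heads observed for the same predicate (or preposition) and role in training.

```python
import numpy as np
from typing import Dict, List

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def sp_score(candidate: str, slot_heads: List[str],
             vectors: Dict[str, np.ndarray]) -> float:
    # Similarity of the candidate head to the centroid of heads seen in the slot.
    centroid = np.mean([vectors[w] for w in slot_heads], axis=0)
    return cosine(vectors[candidate], centroid)

# Toy vectors; "pizza" should fit the (eat, ARG1) slot better than "theorem".
vecs = {"pasta": np.array([1.0, 0.1]), "bread": np.array([0.9, 0.2]),
        "pizza": np.array([1.0, 0.15]), "theorem": np.array([0.05, 1.0])}
slot = ["pasta", "bread"]                      # ARG1 heads seen with "eat"
print(sp_score("pizza", slot, vecs), sp_score("theorem", slot, vecs))
```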

