Semi-Supervised Semantic Role Labeling via Structural Alignment

2012, Vol. 38 (1), pp. 135-171
Author(s): Hagen Fürstenau, Mirella Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance semantic role labeling systems. Unfortunately, such corpora are expensive to produce, limited in size, and may not be representative. Our work aims to reduce the annotation effort involved in creating resources for semantic role labeling via semi-supervised learning. The key idea of our approach is to find novel instances for classifier training based on their similarity to manually labeled seed instances. The underlying assumption is that sentences that are similar in their lexical material and syntactic structure are likely to share a frame semantic analysis. We formalize the detection of similar sentences and the projection of role annotations as a graph alignment problem, which we solve exactly using integer linear programming. Experimental results on semantic role labeling show that the automatic annotations produced by our method improve performance over using hand-labeled instances alone.
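
As a rough illustration of the graph-alignment formulation (not the authors' implementation; the function names, the similarity dictionary, and the use of the PuLP solver are assumptions), the core ILP can be sketched as a one-to-one node alignment that maximizes total similarity and then projects role labels along the chosen links:

    # Sketch of the alignment ILP: binary variables x[i, j] link a node of the
    # labeled seed graph to a node of the unlabeled graph; the objective sums
    # lexical-syntactic similarity over the chosen links. The paper's actual
    # objective also scores edge (structural) compatibility, omitted here.
    from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

    def align_and_project(seed_nodes, new_nodes, similarity, seed_roles):
        prob = LpProblem("graph_alignment", LpMaximize)
        x = {(i, j): LpVariable(f"x_{i}_{j}", cat=LpBinary)
             for i in seed_nodes for j in new_nodes}
        prob += lpSum(similarity[i, j] * x[i, j] for (i, j) in x)
        for i in seed_nodes:                   # each seed node aligns at most once
            prob += lpSum(x[i, j] for j in new_nodes) <= 1
        for j in new_nodes:                    # each new node aligns at most once
            prob += lpSum(x[i, j] for i in seed_nodes) <= 1
        prob.solve()
        # Project role labels from the seed graph along the alignment.
        return {j: seed_roles[i] for (i, j) in x
                if x[i, j].value() == 1 and i in seed_roles}

Solving the ILP exactly, rather than a relaxation, mirrors the exact-inference claim in the abstract; a default solver such as CBC handles graphs of sentence size comfortably.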

2014, Vol. 51, pp. 133-164
Author(s): Kristian Woodsend, Mirella Lapata

Large-scale annotated corpora are a prerequisite to developing high-performance NLP systems. Such corpora are expensive to produce, limited in size, and often demand linguistic expertise. In this paper we use text rewriting as a means of increasing the amount of labeled data available for model training. Our method uses rewrite rules automatically extracted from comparable corpora and bitexts to generate multiple versions of sentences annotated with gold-standard labels. We apply this idea to semantic role labeling and show that a model trained on rewritten data outperforms the state of the art on the CoNLL-2009 benchmark dataset.
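
A hedged sketch of how gold role labels can survive a rewrite (the token-level rule format and the function below are illustrative, not the paper's actual rules, which are extracted from comparable corpora and bitexts):

    # Apply a token-level rewrite rule to a role-labeled sentence and copy the
    # gold labels onto the rewritten tokens, yielding an extra training instance.
    def apply_rewrite(tokens, labels, source, target):
        out_tokens, out_labels, i = [], [], 0
        while i < len(tokens):
            if tokens[i:i + len(source)] == source:
                out_tokens.extend(target)                     # rewritten span
                out_labels.extend([labels[i]] * len(target))  # inherits its label
                i += len(source)
            else:
                out_tokens.append(tokens[i])
                out_labels.append(labels[i])
                i += 1
        return out_tokens, out_labels

    # e.g. apply_rewrite(["in", "order", "to", "win"], ["AM-PNC"] * 4,
    #                    ["in", "order", "to"], ["to"])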


2021, pp. 1-48
Author(s): Zuchao Li, Hai Zhao, Shexia He, Jiaxun Cai

Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Studies of traditional models have shown that syntactic information can make remarkable contributions to SRL performance; however, its necessity has been challenged by several recent neural SRL studies that achieve impressive performance without syntactic backbones, suggesting that syntax matters much less for neural semantic role labeling, especially when paired with deep neural networks and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation of the relevance of syntactic information to SRL, covering both dependency- and span-based formalisms and both monolingual and multilingual settings. This paper quantifies the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, each combined with two categories of exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, 2009, and 2012 benchmarks for all available languages, and the results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax for neural SRL models together with a thorough empirical survey using existing models.
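
For concreteness, the pruning-based category can be illustrated with the classic constituent-pruning heuristic (in the spirit of Xue and Palmer, 2004) that such systems typically adapt; the Node class and function below are illustrative, not code from the paper:

    # Collect argument candidates by walking up from the predicate node and
    # keeping the siblings of every node on that path; everything else is pruned
    # before the scorer sees it. (The full heuristic also descends into
    # prepositional-phrase siblings, omitted here.)
    class Node:
        def __init__(self, label, children=None):
            self.label, self.children, self.parent = label, children or [], None
            for c in self.children:
                c.parent = self

    def prune_candidates(predicate_node):
        candidates, node = [], predicate_node
        while node.parent is not None:
            for sibling in node.parent.children:
                if sibling is not node:
                    candidates.append(sibling)
            node = node.parent
        return candidates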


Author(s): Qingrong Xia, Zhenghua Li, Min Zhang, Meishan Zhang, Guohong Fu, ...

Semantic role labeling (SRL), also known as shallow semantic parsing, is an important yet challenging task in NLP. Motivated by the close correlation between syntactic and semantic structures, traditional discrete-feature-based SRL approaches make heavy use of syntactic features. In contrast, deep-neural-network-based approaches usually encode the input sentence as a word sequence without considering the syntactic structure. In this work, we investigate several previous approaches for encoding syntactic trees and conduct a thorough study of whether extra syntax-aware representations are beneficial for neural SRL models. Experiments on the benchmark CoNLL-2005 dataset show that syntax-aware SRL approaches can effectively improve performance over a strong baseline with external word representations from ELMo. With the extra syntax-aware representations, our approaches achieve a new state of the art of 85.6 F1 (single model) and 86.6 F1 (ensemble) on the test data, outperforming the corresponding strong baselines with ELMo by 0.8 and 1.0 F1, respectively. Detailed error analyses are conducted to gain more insight into the investigated approaches.
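
The simplest syntax-aware variant can be pictured as concatenating a dependency-relation embedding to each word's contextual (e.g., ELMo) representation before the SRL encoder; the PyTorch module below is a minimal sketch with made-up dimensions, not the authors' model, which compares several richer syntax encoders:

    import torch
    import torch.nn as nn

    class SyntaxAwareEncoder(nn.Module):
        def __init__(self, word_dim=1024, n_relations=50, rel_dim=64, hidden=300):
            super().__init__()
            self.rel_embed = nn.Embedding(n_relations, rel_dim)
            self.encoder = nn.LSTM(word_dim + rel_dim, hidden,
                                   batch_first=True, bidirectional=True)

        def forward(self, word_reprs, rel_ids):
            # word_reprs: (batch, seq, word_dim) contextual word embeddings
            # rel_ids:    (batch, seq) id of each word's dependency relation
            syntax = self.rel_embed(rel_ids)
            out, _ = self.encoder(torch.cat([word_reprs, syntax], dim=-1))
            return out  # fed to the role classifier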


2012, Vol. 38 (3), pp. 631-671
Author(s): Ming Tan, Wenli Zhou, Lei Zheng, Shaojun Wang

This paper presents an attempt at building a large-scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm, so as to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best-list approximate EM algorithm and a follow-up EM algorithm to improve word-prediction power on corpora with up to a billion tokens, stored on a supercomputer. The large-scale distributed composite language model gives a drastic perplexity reduction over n-gram models and achieves significantly better translation quality, measured by the BLEU score and the "readability" of translations, when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
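
The re-ranking application at the end of the abstract can be sketched as combining the three component scores over an N-best list (the log-linear weights and scorer names below are placeholders; the paper integrates the components far more tightly under the directed Markov random field and trains them with the approximate EM algorithms described above):

    # Pick the best translation hypothesis under a weighted combination of the
    # n-gram, syntactic, and semantic component log-probabilities.
    def rerank(nbest, ngram_lp, syntax_lp, semantic_lp, weights=(0.4, 0.3, 0.3)):
        def score(sentence):
            return (weights[0] * ngram_lp(sentence)
                    + weights[1] * syntax_lp(sentence)
                    + weights[2] * semantic_lp(sentence))
        return max(nbest, key=score)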


2008, Vol. 34 (2), pp. 225-255
Author(s): Nianwen Xue

In this article we report work on Chinese semantic role labeling, taking advantage of two recently completed corpora: the Chinese PropBank, a semantically annotated corpus of Chinese verbs, and the Chinese NomBank, a companion corpus that annotates the predicate-argument structure of nominalized predicates. Because the semantic role labels are assigned to the constituents in a parse tree, we first report experiments in which semantic role labels are automatically assigned to hand-crafted parses in the Chinese Treebank. This gives us a measure of the extent to which semantic role labels can be bootstrapped from the syntactic annotation provided in the treebank. We then report experiments using automatic parses with decreasing levels of human annotation in the input to the syntactic parser: parses that use gold-standard segmentation and POS tagging, parses that use only gold-standard segmentation, and fully automatic parses. These experiments gauge how successful semantic role labeling for Chinese can be in more realistic situations. Our results show that when hand-crafted parses are used, semantic role labeling accuracy for Chinese is comparable to what has been reported for the state-of-the-art English semantic role labeling systems trained and tested on the English PropBank, even though the Chinese PropBank is significantly smaller in size. When an automatic parser is used, however, the accuracy of our system is significantly lower than the English state of the art. This indicates that an improvement in Chinese parsing is critical to high-performance semantic role labeling for Chinese.
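
Since the role labels are assigned to parse-tree constituents, the core labeling step can be pictured as turning every constituent into a classification instance; the attribute and feature names below are placeholders, not the paper's feature set:

    # Classify each constituent of the parse tree against the predicate; the
    # classifier returns a role label or "NONE" for non-arguments.
    def label_constituents(constituents, predicate, classifier):
        labeled = {}
        for c in constituents:
            features = {
                "phrase_type": c.label,
                "head_word": c.head,
                "before_predicate": c.end <= predicate.start,
            }
            role = classifier(features)
            if role != "NONE":
                labeled[c] = role
        return labeled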


2008, Vol. 34 (2), pp. 193-224
Author(s): Alessandro Moschitti, Daniele Pighin, Roberto Basili

The availability of large-scale data sets of manually annotated predicate-argument structures has recently favored the use of machine learning approaches to the design of automated semantic role labeling (SRL) systems. The main research in this area relates to the design choices for feature representation and for effective decompositions of the task into different learning models. Regarding the former choice, structural properties of full syntactic parses are largely employed, as they represent ways to encode different principles suggested by the linking theory between syntax and semantics. The latter choice relates to several learning schemes over global views of the parses. For example, re-ranking stages operating over alternative predicate-argument sequences of the same sentence have been shown to be very effective. In this article, we propose several kernel functions to model parse-tree properties in kernel-based machines, for example, perceptrons or support vector machines. In particular, we define different kinds of tree kernels as general approaches to feature engineering in SRL. Moreover, we extensively experiment with such kernels to investigate their contribution to individual stages of an SRL architecture, both in isolation and in combination with other traditional manually coded features. The results for the boundary recognition, classification, and re-ranking stages provide systematic evidence of the significant impact of tree kernels on overall accuracy, especially when the amount of training data is small. As a conclusive result, tree kernels allow for a general and easily portable feature engineering method which is applicable to a large family of natural language processing tasks.
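
As a concrete reference point for the tree-kernel idea, the classic subset-tree recursion of Collins and Duffy, which the kernels above build on, counts the tree fragments shared by two parses; the sketch below assumes nodes expose a production string and their non-terminal children (illustrative names, not the authors' toolkit):

    # delta(n1, n2) counts the common fragments rooted at the two nodes, with a
    # decay factor lam; the kernel sums delta over all node pairs of the trees.
    def delta(n1, n2, lam=0.4):
        if n1.production != n2.production:
            return 0.0
        if not n1.children:                      # pre-terminal nodes
            return lam
        value = lam
        for c1, c2 in zip(n1.children, n2.children):
            value *= 1.0 + delta(c1, c2, lam)
        return value

    def tree_kernel(nodes1, nodes2, lam=0.4):
        return sum(delta(n1, n2, lam) for n1 in nodes1 for n2 in nodes2)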

