Overview of the NLPCC 2019 Shared Task: Cross-Domain Dependency Parsing

In recent years, research on dependency parsing has focused on improving accuracy on domain-specific (in-domain) test datasets and has made remarkable progress. However, innumerable real-world scenarios are not covered by these datasets; such data is referred to as out-of-domain. As a result, parsers that perform well on in-domain data usually suffer significant performance degradation on out-of-domain data. Adapting existing high-performing in-domain parsers to a new domain therefore requires cross-domain transfer learning methods. This paper examines two scenarios for cross-domain transfer learning: semi-supervised and unsupervised. Specifically, we adopt the pre-trained language model BERT for training on the source-domain (in-domain) data at the subword level and introduce self-training methods derived from tri-training for these two scenarios. The evaluation results on the NLPCC-2019 shared task and the universal dependency parsing task indicate the effectiveness of the adopted approaches for cross-domain transfer learning and show the potential of self-learning for cross-lingual transfer learning.
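
The abstract does not spell out how its self-training methods differ from tri-training, so the following is only a minimal sketch of the classic tri-training selection rule, not the authors' method. It assumes three parsers have each predicted a head index per token for a set of unlabeled target-domain sentences; the function name `tri_training_selection` and the whole-sentence agreement criterion are illustrative assumptions.

```python
from typing import Dict, List, Sequence

# One predicted parse: the head index of each token (0 denotes the artificial root).
Heads = Sequence[int]


def tri_training_selection(predictions: List[List[Heads]]) -> Dict[int, List[int]]:
    """Pick pseudo-labeled sentences for each of three learners.

    predictions[k][s] is parser k's head sequence for unlabeled sentence s.
    Classic tri-training rule (assumed here): sentence s is added to the
    training pool of learner i when the other two learners agree on it.
    Agreement is taken to mean an identical head sequence (an assumption).
    """
    if len(predictions) != 3:
        raise ValueError("tri-training uses exactly three learners")
    n_sentences = len(predictions[0])
    if any(len(p) != n_sentences for p in predictions):
        raise ValueError("all learners must parse the same sentences")

    selected: Dict[int, List[int]] = {0: [], 1: [], 2: []}
    for s in range(n_sentences):
        for learner in range(3):
            a, b = [k for k in range(3) if k != learner]
            if list(predictions[a][s]) == list(predictions[b][s]):
                selected[learner].append(s)
    return selected
```

Each learner would then be retrained on its labeled source-domain data plus the sentences selected for it, and the loop repeated until the selections stop changing.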

CoNLL-X shared task on multilingual dependency parsing

DOI: 10.3115/1596276.1596305

Year: 2006

Cited By ~ 147

Author(s): Sabine Buchholz, Erwin Marsi

Keyword(s): Dependency Parsing, Shared Task

Domain Adaptation for Syntactic and Semantic Dependency Parsing Using Deep Belief Networks

Transactions of the Association for Computational Linguistics, Vol 3, pp. 271-282

DOI: 10.1162/tacl_a_00138

Year: 2015

Cited By ~ 6

Author(s): Haitong Yang, Tao Zhuang, Chengqing Zong

Keyword(s): Test Data, Domain Adaptation, Feature Space, Feature Representation, Dependency Parsing, Shared Task, Target Domain, Current Systems, Semantic Dependency, Very High

In current systems for syntactic and semantic dependency parsing, people usually define a very high-dimensional feature space to achieve good performance. But these systems often suffer severe performance drops on out-of-domain test data due to the diversity of features of different domains. This paper focuses on how to relieve this domain adaptation problem with the help of unlabeled target domain data. We propose a deep learning method to adapt both syntactic and semantic parsers. With additional unlabeled target domain data, our method can learn a latent feature representation (LFR) that is beneficial to both domains. Experiments on English data in the CoNLL 2009 shared task show that our method largely reduced the performance drop on out-of-domain test data. Moreover, we get a Macro F1 score that is 2.32 points higher than the best system in the CoNLL 2009 shared task in out-of-domain tests.

Deep Contextualized Self-training for Low Resource Dependency Parsing

Transactions of the Association for Computational Linguistics, Vol 7, pp. 695-713

DOI: 10.1162/tacl_a_00294

Year: 2019

Cited By ~ 1

Author(s): Guy Rotman, Roi Reichart

Keyword(s): State Of The Art, Resource Dependency, Training Methods, Dependency Parsing, Training Algorithm, Low Resource, Supervised Training, Cross Domain, Gating Mechanism, Multiple Languages

Neural dependency parsing has proven very effective, achieving state-of-the-art results on numerous domains and languages. Unfortunately, it requires large amounts of labeled data, which is costly and laborious to create. In this paper we propose a self-training algorithm that alleviates this annotation bottleneck by training a parser on its own output. Our Deep Contextualized Self-training (DCST) algorithm utilizes representation models trained on sequence labeling tasks that are derived from the parser’s output when applied to unlabeled data, and integrates these models with the base parser through a gating mechanism. We conduct experiments across multiple languages, both in low resource in-domain and in cross-domain setups, and demonstrate that DCST substantially outperforms traditional self-training as well as recent semi-supervised training methods.
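
The abstract names a gating mechanism but does not give its form. As a rough illustration only, the sketch below shows one common element-wise sigmoid gate that mixes a base-parser representation with an auxiliary representation; the function `gated_combination`, its parameters, and the exact formula are assumptions and may differ from what DCST uses.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_combination(
    base: Sequence[float],
    aux: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Mix two equal-length vectors with a learned element-wise gate.

    Assumed form (hypothetical, not taken from the paper):
        g   = sigmoid(W [base; aux] + b)
        out = g * base + (1 - g) * aux
    weights has one row per output dimension, each of length 2 * len(base).
    """
    if len(base) != len(aux):
        raise ValueError("base and aux must have the same dimension")
    concat = list(base) + list(aux)
    out: List[float] = []
    for i, (b_i, a_i) in enumerate(zip(base, aux)):
        pre_activation = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(pre_activation)
        out.append(g * b_i + (1.0 - g) * a_i)
    return out
```

With all-zero weights and bias the gate is 0.5 everywhere and the output is the plain average of the two vectors; training the gate lets the model decide, per dimension, how much to trust each representation.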

APGN: Adversarial and Parameter Generation Networks for Multi-Source Cross-Domain Dependency Parsing

DOI: 10.18653/v1/2021.findings-emnlp.149

Year: 2021

Author(s): Ying Li, Meishan Zhang, Zhenghua Li, Min Zhang, Zhefeng Wang, ...

Keyword(s): Dependency Parsing, Cross Domain

A non-DNN Feature Engineering Approach to Dependency Parsing – FBAML at CoNLL 2017 Shared Task

DOI: 10.18653/v1/k17-3015

Year: 2017

Author(s): Xian Qian, Yang Liu

Keyword(s): Feature Engineering, Dependency Parsing, Shared Task, Engineering Approach

Dependency parsing of biomedical text with BERT

BMC Bioinformatics, Vol 21 (S23)

DOI: 10.1186/s12859-020-03905-8

Year: 2020

Author(s): Jenna Kanerva, Filip Ginter, Sampo Pyysalo

Keyword(s): Transfer Learning, Language Processing, State Of The Art, Text Processing, Syntactic Analysis, Biomedical Text, Dependency Parsing, Shared Task, Fine Tune, Selection Of

Background: Syntactic analysis, or parsing, is a key task in natural language processing and a required component for many text mining approaches. In recent years, Universal Dependencies (UD) has emerged as the leading formalism for dependency parsing. While a number of recent tasks centering on UD have substantially advanced the state of the art in multilingual parsing, there has been only little study of parsing texts from specialized domains such as biomedicine.

Methods: We explore the application of state-of-the-art neural dependency parsing methods to biomedical text using the recently introduced CRAFT-SA shared task dataset. The CRAFT-SA task broadly follows the UD representation and recent UD task conventions, allowing us to fine-tune the UD-compatible Turku Neural Parser and UDify neural parsers to the task. We further evaluate the effect of transfer learning using a broad selection of BERT models, including several models pre-trained specifically for biomedical text processing.

Results: We find that recently introduced neural parsing technology is capable of generating highly accurate analyses of biomedical text, substantially improving on the best performance reported in the original CRAFT-SA shared task. We also find that initialization using a deep transfer learning model pre-trained on in-domain texts is key to maximizing the performance of the parsing methods.