Cross-Lingual Dependency Parsing with Unlabeled Auxiliary Languages

2019 · Author(s): Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, Nanyun Peng

2014 · Author(s): Željko Agić, Jörg Tiedemann, Danijela Merkler, Simon Krek, Kaja Dobrovoljc, et al.

Author(s): Shu Jiang, Zuchao Li, Hai Zhao, Bao-Liang Lu, Rui Wang

In recent years, research on dependency parsing has focused on improving accuracy on in-domain test sets and has made remarkable progress. However, the real world contains innumerable scenarios that such datasets do not cover, i.e., out-of-domain data. As a result, parsers that perform well on in-domain data usually suffer significant performance degradation on out-of-domain data. Adapting a high-performing in-domain parser to a new domain therefore requires cross-domain transfer learning. This paper examines two scenarios for cross-domain transfer learning: semi-supervised and unsupervised. Specifically, we adopt the pre-trained language model BERT for training on the source-domain (in-domain) data at the subword level and introduce self-training methods derived from tri-training for the two scenarios. Evaluation results on the NLPCC-2019 shared task and the universal dependency parsing task indicate the effectiveness of the adopted approaches for cross-domain transfer learning and show the potential of self-training for cross-lingual transfer learning.
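The tri-training variant of self-training mentioned in the abstract can be illustrated with a minimal sketch. The train_fn/parse_fn callables and the exact-agreement criterion below are assumptions for illustration only, not the authors' implementation:

    def tri_train(train_fn, parse_fn, source_treebank, target_sentences, rounds=3):
        # Three parsers bootstrap one another: a raw target-domain sentence is
        # pseudo-labeled for parser i whenever the other two parsers agree on
        # its tree; parser i is then retrained on gold + pseudo-labeled data.
        # source_treebank: list of (sentence, gold_tree) pairs
        # target_sentences: list of raw (unlabeled) target-domain sentences
        models = [train_fn(source_treebank) for _ in range(3)]
        for _ in range(rounds):
            new_models = []
            for i in range(3):
                j, k = (i + 1) % 3, (i + 2) % 3
                pseudo = []
                for sent in target_sentences:
                    tree_j = parse_fn(models[j], sent)
                    tree_k = parse_fn(models[k], sent)
                    if tree_j == tree_k:              # the two "teachers" agree
                        pseudo.append((sent, tree_j))
                new_models.append(train_fn(source_treebank + pseudo))
            models = new_models
        return models

In practice the three parsers are diversified (e.g., by different initializations or data views) so that their agreement carries information.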


2019 · Vol. 7, pp. 643–659 · Author(s): Amichay Doitch, Ram Yazdi, Tamir Hazan, Roi Reichart

The best solution of structured prediction models in NLP is often inaccurate because of the limited expressive power of the model or non-exact parameter estimation. One way to mitigate this problem is to sample candidate solutions from the model's solution space, reasoning that effective exploration of this space should yield high-quality solutions. Unfortunately, sampling is often computationally hard, and many works hence back off to sub-optimal strategies, such as extracting the best-scoring solutions of the model, which are not as diverse as sampled solutions. In this paper we propose a perturbation-based approach in which sampling from a probabilistic model is computationally efficient. We present a learning algorithm for the variance of the perturbations and empirically demonstrate its importance. Moreover, while finding the argmax in our model is intractable, we propose an efficient and effective approximation. We apply our framework to cross-lingual dependency parsing across 72 corpora from 42 languages and to lightly supervised dependency parsing across 13 corpora from 12 languages, and demonstrate strong results in terms of both the quality of the entire solution list and of the final solution.
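The perturb-and-sample idea can be sketched for first-order dependency parsing: add Gumbel noise to the arc scores and decode the perturbed scores. In this minimal sketch the decoder greedily picks each word's best head (real parsers use Chu-Liu/Edmonds MST decoding to enforce the tree constraint), and noise_scale stands in for the perturbation variance that the paper learns; all names are illustrative:

    import numpy as np

    def sample_parses(arc_scores, n_samples=10, noise_scale=1.0, seed=0):
        # arc_scores[h, d]: score of the arc head h -> dependent d,
        # with index 0 reserved for the artificial ROOT token.
        rng = np.random.default_rng(seed)
        samples = []
        for _ in range(n_samples):
            perturbed = arc_scores + rng.gumbel(scale=noise_scale,
                                                size=arc_scores.shape)
            np.fill_diagonal(perturbed, -np.inf)  # a word cannot head itself
            # Greedy decoding: each real word (columns 1..n-1) takes its
            # highest-scoring head under the perturbed scores.
            heads = perturbed[:, 1:].argmax(axis=0)
            samples.append(heads)
        return samples

A larger noise_scale yields a more diverse solution list, while smaller values concentrate the samples around the model's best parse; this is the trade-off the learned variance controls.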


2019 · Author(s): Meishan Zhang, Yue Zhang, Guohong Fu

2019 · Author(s): Wasi Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, et al.

2018 · Author(s): Niko Partanen, Kyungtae Lim, Michael Rießler, Thierry Poibeau

2015 · Author(s): Jiang Guo, Wanxiang Che, David Yarowsky, Haifeng Wang, Ting Liu

2016 · Vol. 55, pp. 209–248 · Author(s): Jörg Tiedemann, Željko Agić

How do we parse languages for which no treebanks are available? This contribution addresses the cross-lingual viewpoint on statistical dependency parsing, in which we attempt to exploit resource-rich source-language treebanks to build and adapt models for under-resourced target languages. We outline the benefits and indicate the drawbacks of the current major approaches. We emphasize synthetic treebanking: the automatic creation of target-language treebanks by means of annotation projection and machine translation. We present competitive results in cross-lingual dependency parsing using a combination of various techniques that contribute to the overall success of the method. We further include a detailed discussion of the impact of part-of-speech label accuracy on parsing results, which provides guidance for practical applications of cross-lingual methods to truly under-resourced languages.
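The annotation-projection half of synthetic treebanking can be sketched under strong simplifying assumptions, here 1-to-1 word alignments (the function and conventions are illustrative; practical systems also handle many-to-many and missing alignments and repair non-tree output):

    def project_tree(src_heads, alignment, tgt_len):
        # src_heads[d]: head index of source word d, -1 for the root.
        # alignment: dict mapping source word indices to target word indices.
        # Returns target-side heads; unprojected words keep head None.
        tgt_heads = [None] * tgt_len
        for d, h in enumerate(src_heads):
            if d not in alignment:
                continue                          # unaligned dependent: skip
            if h == -1:
                tgt_heads[alignment[d]] = -1      # source root -> target root
            elif h in alignment:
                tgt_heads[alignment[d]] = alignment[h]  # project arc h -> d
        return tgt_heads

    # e.g., project_tree([-1, 0, 0], {0: 1, 1: 0, 2: 2}, 3) returns [1, -1, 1]

The projected trees are then filtered or repaired and used as training data for a target-language parser, optionally combined with trees obtained by machine-translating the source treebank.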

