Language engineering for syntactic knowledge transfer

2012, Vol 9 (3), pp. 1231-1247
Author(s): Mihaela Colhon

In this paper we present a method for constructing an English-Romanian treebank, together with its evaluation results. The treebank is built on a parallel English-Romanian corpus that is word-aligned and annotated at the morphological and syntactic levels. The syntactic trees of the Romanian texts are generated from the syntactic phrases of the parallel English texts, which are obtained automatically by syntactic parsing. The method reuses and adapts existing tools and algorithms for the cross-lingual transfer of syntactic constituents and for syntactic tree alignment.
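To illustrate what transferring a constituent across a word alignment involves, here is a minimal sketch assuming 0-based token indices, alignments given as (English index, Romanian index) pairs, and constituents given as (label, start, end) with inclusive spans. The projection rule (smallest covering target span, rejected if it would absorb words aligned outside the constituent) is a generic heuristic borrowed from phrase extraction, not necessarily the procedure used in the paper; `project_span` and `project_constituents` are hypothetical names.

```python
from typing import List, Optional, Set, Tuple

# Word alignment as (english_index, romanian_index) pairs, 0-based (assumption).
Alignment = Set[Tuple[int, int]]
# A constituent: (label, start, end) over an inclusive token span (assumption).
Constituent = Tuple[str, int, int]


def project_span(start: int, end: int, alignment: Alignment) -> Optional[Tuple[int, int]]:
    """Return the smallest target span covering every target word aligned to a
    source word in [start, end]; None if nothing is aligned or if a target word
    inside that span is aligned to a source word outside [start, end]."""
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    t_start, t_end = min(targets), max(targets)
    for s, t in alignment:
        if t_start <= t <= t_end and not (start <= s <= end):
            return None  # projection would mix in material from outside the constituent
    return t_start, t_end


def project_constituents(constituents: List[Constituent], alignment: Alignment) -> List[Constituent]:
    """Copy each English constituent label onto its projected Romanian span,
    dropping constituents that cannot be projected consistently."""
    projected = []
    for label, start, end in constituents:
        span = project_span(start, end, alignment)
        if span is not None:
            projected.append((label, span[0], span[1]))
    return projected


if __name__ == "__main__":
    # English "the red house" -> Romanian "casa roșie"; "the" and "house"
    # both align to "casa", which carries the enclitic definite article.
    english_constituents = [("NP", 0, 2), ("ADJP", 1, 1)]
    alignment = {(0, 0), (2, 0), (1, 1)}
    print(project_constituents(english_constituents, alignment))
```

The toy pair shows why projected spans can shrink: the three-word English NP maps onto two Romanian words because the article is fused into the noun.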

Author(s): Željko Agić, Anders Johannsen, Barbara Plank, Héctor Martínez Alonso, Natalie Schluter, ...

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed is freely available parallel text, plus taggers and parsers for resource-rich languages. An empirical evaluation on 30 test languages shows that our method consistently achieves top-level accuracy, close to established upper bounds, and outperforms several competitive baselines.
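The core idea of annotation projection can be sketched for part-of-speech tags: tags predicted on resource-rich source sentences are copied through word alignments onto a target sentence, and several source languages are combined by voting. This is a simplified illustration, assuming alignments as (source index, target index) pairs and unweighted majority voting with alphabetical tie-breaking; the paper's actual projection, weighting, and dependency-tree handling may differ, and `project_pos` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# One source language: its predicted tags plus its alignment to the target
# sentence, as (source_index, target_index) pairs (assumption).
Source = Tuple[List[str], Set[Tuple[int, int]]]


def project_pos(sources: List[Source], target_len: int) -> List[Optional[str]]:
    """Project POS tags from several source languages onto one target sentence
    by majority vote; unaligned target tokens stay None."""
    votes = [Counter() for _ in range(target_len)]
    for tags, alignment in sources:
        for s, t in alignment:
            votes[t][tags[s]] += 1  # each aligned source token casts one vote
    result: List[Optional[str]] = []
    for counter in votes:
        if not counter:
            result.append(None)  # no evidence: left for a tagger trained on projections
        else:
            # Highest count wins; ties broken alphabetically for determinism.
            tag, _ = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
            result.append(tag)
    return result


if __name__ == "__main__":
    sources = [
        (["DET", "NOUN", "VERB"], {(0, 0), (1, 1), (2, 2)}),
        (["NOUN", "VERB"], {(0, 1), (1, 2)}),
        (["NOUN", "NOUN"], {(0, 1), (1, 2)}),
    ]
    print(project_pos(sources, target_len=4))
```

Voting lets a single noisy source (the third one mistags the verb) be outvoted, which is one reason multi-source projection tends to be more robust than projecting from a single language.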


2016
Author(s): Di Lu, Xiaoman Pan, Nima Pourdamghani, Shih-Fu Chang, Heng Ji, ...

2020, Vol 34 (05), pp. 9547-9554
Author(s): Mozhi Zhang, Yoshinari Fujinuma, Jordan Boyd-Graber

Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (caco) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.
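Why a shared character representation lets supervision in one language reach words of a related language can be shown with a deliberately simplified sketch. In the paper the character-based embedder is a neural model trained jointly with the classifier; here it is replaced by a fixed, untrained bag of character trigrams, and the classifier by a nearest-centroid rule. The function names and the toy Spanish (training) and Italian (test) data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def char_ngrams(word: str, n: int = 3) -> Counter:
    """Character n-grams with boundary markers; the same n-gram space is
    shared by source and target languages."""
    padded = f"<{word.lower()}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def embed_document(words: List[str]) -> Counter:
    """Stand-in for the learned embedder: sum the n-gram vectors of all words."""
    vec: Counter = Counter()
    for w in words:
        vec.update(char_ngrams(w))
    return vec


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs: List[List[str]], labels: List[str]) -> Dict[str, Counter]:
    """Sum document vectors per label using labeled source-language documents only."""
    centroids: Dict[str, Counter] = {}
    for words, label in zip(docs, labels):
        centroids.setdefault(label, Counter()).update(embed_document(words))
    return centroids


def classify(words: List[str], centroids: Dict[str, Counter]) -> str:
    """Assign the label whose centroid is most similar to the document."""
    vec = embed_document(words)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    # Labeled Spanish documents (source language).
    centroids = train_centroids(
        [["economía", "mercado", "inflación"], ["fútbol", "partido", "equipo"]],
        ["economy", "sports"],
    )
    # Unlabeled Italian documents (target language) share many trigrams.
    print(classify(["economia", "mercato", "inflazione"], centroids))
    print(classify(["calcio", "partita", "squadra"], centroids))
```

Cognates such as "mercado"/"mercato" share most of their trigrams, so target documents land near the correct source centroid without any target-language labels. A word-level model would see these as unrelated vocabulary items, which illustrates the data-efficiency argument in the abstract.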


2015, Vol 30 (5), pp. 1299-1323
Author(s): Geert Heyman, Ivan Vulić, Marie-Francine Moens