MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

We present an extension to the JHU Bible corpus, collecting and normalizing more than thirty Bible translations in thirty Indigenous languages of North America. These exhibit a wide variety of interesting syntactic and morphological phenomena that are understudied in the computational community. Neural translation experiments demonstrate significant gains obtained through cross-lingual, many-to-many translation, with improvements of up to 8.4 BLEU over monolingual models for extremely low-resource languages.

Download Full-text

Deep Domain Adaptation for Low-Resource Cross-Lingual Text Classification Tasks

Communications in Computer and Information Science - Computational Linguistics ◽

10.1007/978-981-15-6168-9_14 ◽

2020 ◽

pp. 155-168

Author(s):

Guan-Yuan Chen ◽

Von-Wun Soo

Keyword(s):

Text Classification ◽

Domain Adaptation ◽

Low Resource ◽

Classification Tasks ◽

Cross Lingual

Download Full-text

Multi-level cross-lingual attentive neural architecture for low resource name tagging

Tsinghua Science & Technology ◽

10.23919/tst.2017.8195346 ◽

2017 ◽

Vol 22 (6) ◽

pp. 633-645 ◽

Cited By ~ 2

Author(s):

Xiaocheng Feng ◽

Lifu Huang ◽

Bing Qin ◽

Ying Lin ◽

Heng Ji ◽

...

Keyword(s):

Low Resource ◽

Neural Architecture ◽

Multi Level ◽

Cross Lingual

Download Full-text

Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/tasl.2013.2281575 ◽

2014 ◽

Vol 22 (1) ◽

pp. 17-27 ◽

Cited By ~ 10

Author(s):

Liang Lu ◽

Arnab Ghoshal ◽

Steve Renals

Keyword(s):

Speech Recognition ◽

Mixture Models ◽

Gaussian Mixture Models ◽

Gaussian Mixture ◽

Low Resource ◽

Cross Lingual

Download Full-text

Deep Persian sentiment analysis: Cross-lingual training for low-resource languages

Journal of Information Science ◽

10.1177/0165551520962781 ◽

2020 ◽

pp. 016555152096278

Author(s):

Rouzbeh Ghasemi ◽

Seyed Arad Ashrafi Asli ◽

Saeedeh Momtazi

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Training Data ◽

Target Language ◽

Low Resource ◽

Proposed Model ◽

Significant Difference ◽

Cross Lingual

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages compared with rich languages. To address this problem in the sentiment analysis task in the Persian language, we propose a cross-lingual deep learning framework to benefit from available training data of English. We deployed cross-lingual embedding to model sentiment analysis as a transfer learning model which transfers a model from a rich-resource language to low-resource ones. Our model is flexible to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on English Amazon dataset and Persian Digikala dataset using two different embedding models and four different classification networks show the superiority of the proposed model compared with the state-of-the-art monolingual techniques. Based on our experiment, the performance of Persian sentiment analysis improves 22% in static embedding and 9% in dynamic embedding. Our proposed model is general and language-independent; that is, it can be used for any low-resource language, once a cross-lingual embedding is available for the source–target language pair. Moreover, by benefitting from word-aligned cross-lingual embedding, the only required data for a reliable cross-lingual embedding is a bilingual dictionary that is available between almost all languages and the English language, as a potential source language.

Download Full-text

Multilingual Projection for Parsing Truly Low-Resource Languages

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00100 ◽

2016 ◽

Vol 4 ◽

pp. 301-312 ◽

Cited By ~ 12

Author(s):

Željko Agić ◽

Anders Johannsen ◽

Barbara Plank ◽

Héctor Martínez Alonso ◽

Natalie Schluter ◽

...

Keyword(s):

Empirical Evaluation ◽

Upper Bounds ◽

Low Resource ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Novel Approach ◽

Cross Lingual ◽

Test Languages ◽

Speech Tagging ◽

Parallel Texts

We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our method consistently provides top-level accuracies, close to established upper bounds, and outperforms several competitive baselines.

Download Full-text