Cross-Lingual Sentiment Classification from English to Arabic using Machine Translation

Sentiment classification typically relies on a large amount of labeled data. In practice, the availability of labels is highly imbalanced among different languages. To tackle this problem, cross-lingual sentiment classification approaches aim to transfer knowledge learned from one language that has abundant labeled examples (i.e., the source language, usually English) to another language with fewer labels (i.e., the target language). The source and the target languages are usually bridged through off-the-shelf machine translation tools. Through such a channel, cross-language sentiment patterns can be successfully learned from English and transferred into the target languages. This approach, however, often fails to capture sentiment knowledge specific to the target language. In this paper, we employ emojis, which are widely available in many languages, as a new channel to learn both the cross-language and the language-specific sentiment patterns. We propose a novel representation learning method that uses emoji prediction as an instrument to learn respective sentiment-aware representations for each language. The learned representations are then integrated to facilitate cross-lingual sentiment classification.
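The machine-translation bridge described in this abstract can be illustrated with a minimal sketch. Everything in it is a toy assumption made for illustration: the English lexicon, the word-for-word `TOY_TRANSLATIONS` table standing in for an off-the-shelf translation tool, the Spanish example words, and the function names are hypothetical and are not the paper's method.

```python
# Toy English sentiment lexicon (an assumption, not taken from the paper).
EN_LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}

# Hypothetical word-for-word table standing in for a real MT system.
TOY_TRANSLATIONS = {
    "buena": "good",
    "mala": "bad",
    "aburrida": "boring",
    "película": "movie",
}


def translate_to_english(text):
    """Translate token by token; unknown tokens are passed through unchanged."""
    tokens = text.lower().split()
    return " ".join(TOY_TRANSLATIONS.get(tok, tok) for tok in tokens)


def classify_english(text):
    """Sum lexicon scores of the tokens and map the total to a label."""
    score = sum(EN_LEXICON.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_target(text):
    """Bridge: translate target-language text to English, then classify it."""
    return classify_english(translate_to_english(text))
```

Tokens without a translation entry contribute nothing to the score, which loosely mirrors the limitation the abstract points out: sentiment cues specific to the target language are lost when everything passes through the translation channel.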

Deep Learning in Cross-Lingual English-Vietnamese Sentiment Classification

2018 10th International Conference on Knowledge and Systems Engineering (KSE) ◽

10.1109/kse.2018.8573366 ◽

2018 ◽

Author(s):

Alexander Sedunov ◽

Hady Salloum ◽

Alexander Sutin ◽

Nikolay Sedunov

Keyword(s):

Deep Learning ◽

Sentiment Classification ◽

Cross Lingual

On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation

10.18653/v1/2020.acl-main.151 ◽

2020 ◽

Author(s):

Wei Zhao ◽

Goran Glavaš ◽

Maxime Peyrard ◽

Yang Gao ◽

Robert West ◽

...

Keyword(s):

Machine Translation ◽

Machine Translation Evaluation ◽

Cross Lingual ◽

Free Machine

The data-driven Bulgarian WordNet: BTBWN

Cognitive Studies | Études cognitives ◽

10.11649/cs.1713 ◽

2018 ◽

Author(s):

Petya Osenova ◽

Kiril Simov

Keyword(s):

Information Retrieval ◽

Machine Translation ◽

Semantic Information ◽

Data Driven ◽

Lexical Resources ◽

Multilingual Information Retrieval ◽

Cross Lingual ◽

Princeton Wordnet ◽

Word Senses

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both syntactic and lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTreeBank, but the missing senses of a lemma have also been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered in a cross-lingual aspect, with respect to ensuring maximum connectivity and potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.

Co-training for cross-lingual sentiment classification

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - ACL-IJCNLP '09 ◽

10.3115/1687878.1687913 ◽

2009 ◽

Cited By ~ 127

Author(s):

Xiaojun Wan

Keyword(s):

Sentiment Classification ◽

Cross Lingual

Expanding the JHU Bible Corpus for Machine Translation of the Indigenous Languages of North America

10.33011/computel.v1i.949 ◽

2021 ◽

Vol 1 (2) ◽

Author(s):

Garrett Nicolai ◽

Edith Coates ◽

Ming Zhang ◽

Miika Silfverberg

Keyword(s):

North America ◽

Machine Translation ◽

Indigenous Languages ◽

Low Resource ◽

Bible Translations ◽

Cross Lingual

We present an extension to the JHU Bible corpus, collecting and normalizing more than thirty Bible translations in thirty Indigenous languages of North America. These exhibit a wide variety of interesting syntactic and morphological phenomena that are understudied in the computational community. Neural translation experiments demonstrate significant gains obtained through cross-lingual, many-to-many translation, with improvements of up to 8.4 BLEU over monolingual models for extremely low-resource languages.

Zero-Shot Learning for Cross-Lingual News Sentiment Classification

Applied Sciences ◽

10.3390/app10175993 ◽

2020 ◽

Vol 10 (17) ◽

pp. 5993

Author(s):

Andraž Pelicon ◽

Marko Pranjić ◽

Dragana Miljković ◽

Blaž Škrlj ◽

Senja Pollak

Keyword(s):

Classification System ◽

State Of The Art ◽

Sentiment Classification ◽

Training Data ◽

Test Set ◽

Novel Technique ◽

Analysis Task ◽

Negative News ◽

Cross Lingual ◽

News Sentiment

In this paper, we address the task of zero-shot cross-lingual news sentiment classification. Given an annotated dataset of positive, neutral, and negative news in Slovene, the aim is to develop a news classification system that assigns the sentiment category not only to Slovene news, but also to news in another language without requiring any training data in that language. Our system is based on the multilingual BERT model; we test different approaches for handling long documents and propose a novel technique for sentiment enrichment of the BERT model as an intermediate training step. With the proposed approach, we achieve state-of-the-art performance on the sentiment analysis task on Slovenian news. We evaluate the zero-shot cross-lingual capabilities of our system on a novel news sentiment test set in Croatian. The results show that the cross-lingual approach also largely outperforms the majority classifier, as well as all settings without sentiment enrichment in pre-training.
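The majority classifier mentioned as a reference point in this abstract is simple to state in code. The sketch below is a generic illustration, not the authors' evaluation script; the function name and the label lists are made up for the example.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test item.

    Returns the majority label and the accuracy it reaches on the test labels.
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return majority_label, correct / len(test_labels)


# Made-up labels, used only to show the call.
train = ["neutral", "neutral", "negative", "positive", "neutral"]
test = ["neutral", "negative", "neutral", "positive"]
label, accuracy = majority_baseline_accuracy(train, test)
```

Because this baseline ignores the text entirely, any zero-shot system that truly transfers sentiment knowledge across languages should score above it.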

Unsupervised Neural Machine Translation with SMT as Posterior Regularization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301241 ◽

2019 ◽

Vol 33 ◽

pp. 241-248 ◽

Cited By ~ 3

Author(s):

Shuo Ren ◽

Zhirui Zhang ◽

Shujie Liu ◽

Ming Zhou ◽

Shuai Ma

Keyword(s):

Machine Translation ◽

Language Models ◽

Translation Process ◽

Weak Supervision ◽

Neural Machine Translation ◽

Back Translation ◽

Negative Effect ◽

Model Training ◽

Cross Lingual ◽

Pseudo Data

Without a real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo-parallel data generated with the back-translation method for model training. However, due to weak supervision, the pseudo data inevitably contain noise and errors that are accumulated and reinforced in the subsequent training process, leading to poor translation performance. To address this issue, we introduce phrase-based Statistical Machine Translation (SMT) models, which are robust to noisy data, as posterior regularization to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. Then SMT and NMT models are optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect caused by errors in the iterative back-translation process can be alleviated promptly by SMT filtering noise from its phrase tables; meanwhile, (2) NMT can compensate for the deficiency of fluency inherent in SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms the strong baseline and achieves new state-of-the-art unsupervised machine translation performance.
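The abstract does not say how the word-level translation tables are inferred from cross-lingual embeddings. One common way to do this, assumed here purely for illustration, is nearest-neighbour search by cosine similarity in a shared embedding space. The two-dimensional vectors, the vocabulary, and the function names below are invented for the sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def induce_translation_table(src_emb, tgt_emb):
    """Map each source word to its most similar target word."""
    table = {}
    for src_word, src_vec in src_emb.items():
        table[src_word] = max(
            tgt_emb, key=lambda tgt_word: cosine(src_vec, tgt_emb[tgt_word])
        )
    return table


# Invented 2-d embeddings that are assumed to live in one shared space.
EN_EMB = {"cat": [0.9, 0.1], "house": [0.1, 0.9]}
FR_EMB = {"chat": [0.8, 0.2], "maison": [0.2, 0.8]}

table = induce_translation_table(EN_EMB, FR_EMB)
```

A table built this way is noisy for real vocabularies, which fits the abstract's motivation for letting SMT and NMT correct each other during iterative back-translation.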

Transfer Learning for Cross-Lingual Sentiment Classification with Weakly Shared Deep Neural Networks

Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16 ◽

10.1145/2911451.2911490 ◽

2016 ◽

Cited By ~ 8

Author(s):

Guangyou Zhou ◽

Zhao Zeng ◽

Jimmy Xiangji Huang ◽

Tingting He

Keyword(s):

Neural Networks ◽

Transfer Learning ◽

Deep Neural Networks ◽

Sentiment Classification ◽

Cross Lingual

Multilingual Dependency Parsing: Using Machine Translated Texts Instead of Parallel Corpora

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2014-0017 ◽

2014 ◽

Vol 102 (1) ◽

pp. 93-104

Author(s):

Loganathan Ramasamy ◽

David Mareček ◽

Zdeněk Žabokrtský

Keyword(s):

Machine Translation ◽

The Other ◽

Target Language ◽

Grammar Induction ◽

Language Resources ◽

Parallel Corpora ◽

Similar Performance ◽

Part Of Speech ◽

Target Languages ◽

Cross Lingual

This paper revisits the projection-based approach to the dependency grammar induction task. Traditional cross-lingual dependency induction tasks, one way or another, depend on the existence of bitexts or target-language tools such as part-of-speech (POS) taggers to obtain reasonable parsing accuracy. In this paper, we transfer dependency parsers using only approximate resources, i.e., machine-translated bitexts instead of manually created bitexts. We do this by obtaining the source side of the text from a machine translation (MT) system and then applying transfer approaches to induce parsers for the target languages. We further reduce the need for labeled target-language resources by using an unsupervised target tagger. We show that our approach consistently outperforms unsupervised parsers by a large margin (8.2% absolute) and results in similar performance when compared with delexicalized transfer parsers.
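The core idea of projection-based transfer, copying dependency heads from a parsed source sentence to its aligned target sentence, can be sketched as follows. This is a simplified illustration under a one-to-one word alignment assumption; the function name, the data layout, and the example sentence are hypothetical and do not reproduce the paper's system.

```python
def project_heads(src_heads, alignment):
    """Project source dependency heads onto target tokens.

    src_heads[i] is the index of the head of source token i, or -1 for the root.
    alignment maps a source token index to a target token index (one-to-one).
    Returns a dict from target token index to target head index (-1 for root).
    """
    tgt_heads = {}
    for src_idx, src_head in enumerate(src_heads):
        if src_idx not in alignment:
            continue  # unaligned source token: nothing to project
        tgt_idx = alignment[src_idx]
        if src_head == -1:
            tgt_heads[tgt_idx] = -1
        elif src_head in alignment:
            tgt_heads[tgt_idx] = alignment[src_head]
        # If the head itself is unaligned, the target token gets no head.
    return tgt_heads


# Source "She reads books": "reads" is the root and heads the other two tokens.
src_heads = [1, -1, 1]
# Toy verb-final target order: She -> 0, reads -> 2, books -> 1.
alignment = {0: 0, 1: 2, 2: 1}
projected = project_heads(src_heads, alignment)
```

Real alignments are many-to-many and partly wrong, so practical systems need additional heuristics or filtering on top of this basic step.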
