Transition-based Adversarial Network for Cross-lingual Aspect Extraction

Author(s):  
Wenya Wang ◽  
Sinno Jialin Pan

In fine-grained opinion mining, the task of aspect extraction involves the identification of explicit product features in customer reviews. This task has been widely studied in major languages, e.g., English, but has seldom been addressed in lower-resource languages due to the lack of annotated corpora. To address this, we develop a novel deep model to transfer knowledge from a source language with labeled training data to a target language without any annotations. Unlike cross-lingual sentiment classification, aspect extraction across languages requires more fine-grained adaptation. To this end, we utilize a transition-based mechanism that reads one word at a time and forms a series of configurations representing the status of the whole sentence. We represent each configuration as a continuous feature vector and align these representations from different languages into a shared space through an adversarial network. In addition, syntactic structures are integrated into the deep model to achieve more syntactically-sensitive adaptations. The proposed method is end-to-end and achieves state-of-the-art performance on English, French and Spanish restaurant review datasets.
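
As a rough illustration of the adversarial alignment step this abstract describes (a schematic sketch, not the authors' implementation), the snippet below uses a gradient-reversal layer so that a language discriminator pushes configuration vectors from both languages into a shared space. All class names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward,
    so the feature extractor learns to fool the language discriminator."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class LanguageDiscriminator(nn.Module):
    """Predicts source vs. target language from a configuration vector."""
    def __init__(self, dim=128, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, config_vec, lamb=1.0):
        # Reversed gradients align source- and target-language configurations.
        return self.net(GradReverse.apply(config_vec, lamb))
```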

Author(s):  
Peilian Zhao ◽  
Cunli Mao ◽  
Zhengtao Yu

Aspect-Based Sentiment Analysis (ABSA), a fine-grained opinion-mining task that aims to extract the sentiment toward a specific target in text, is important in many real-world applications, especially in the legal field. In this paper, we therefore study two problems of End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field: the limited labeled training data available and the neglect of in-domain knowledge representation. We propose a new method under a deep learning framework, named Semi-ETEKGs, which applies an E2E framework with knowledge graph (KG) embeddings in the legal field after data augmentation (DA). Specifically, we pre-train BERT embeddings and in-domain KG embeddings on unlabeled data and on labeled data with case elements after DA, and then feed both embeddings into the E2E framework to classify the polarity of each target entity. Finally, we build a case-related dataset based on a popular ABSA benchmark to evaluate Semi-ETEKGs, and experiments on this dataset of microblog comments show that our proposed model significantly outperforms the compared methods.
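
A loose sketch of the embedding-fusion idea described above: pre-computed BERT embeddings are concatenated with in-domain KG embeddings before an end-to-end tagger. All names, dimensions, and the label scheme are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class FusedE2ETagger(nn.Module):
    """Concatenates contextual (BERT) and in-domain KG embeddings per token,
    then tags each token with a joint aspect/polarity label (E2E-ABSA style)."""
    def __init__(self, bert_dim=768, kg_dim=100, hidden=256, num_labels=7):
        super().__init__()
        self.encoder = nn.LSTM(bert_dim + kg_dim, hidden,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_labels)  # e.g. B/I-{POS,NEG,NEU}, O

    def forward(self, bert_emb, kg_emb):
        # bert_emb: (batch, seq, bert_dim); kg_emb: (batch, seq, kg_dim)
        fused = torch.cat([bert_emb, kg_emb], dim=-1)
        out, _ = self.encoder(fused)
        return self.classifier(out)  # per-token label logits
```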


2020 ◽  
pp. 016555152096278
Author(s):  
Rouzbeh Ghasemi ◽  
Seyed Arad Ashrafi Asli ◽  
Saeedeh Momtazi

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is a challenging issue in many low-resource languages. This problem results in a significant difference between the accuracy of available natural language processing tools for low-resource languages compared with resource-rich languages. To address this problem in the sentiment analysis task for the Persian language, we propose a cross-lingual deep learning framework that benefits from available English training data. We deploy cross-lingual embeddings to cast sentiment analysis as a transfer learning problem, transferring a model from a resource-rich language to low-resource ones. Our model is flexible enough to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on the English Amazon dataset and the Persian Digikala dataset, using two different embedding models and four different classification networks, show the superiority of the proposed model compared with state-of-the-art monolingual techniques. Based on our experiments, the performance of Persian sentiment analysis improves by 22% with static embeddings and by 9% with dynamic embeddings. Our proposed model is general and language-independent; that is, it can be used for any low-resource language, once a cross-lingual embedding is available for the source-target language pair. Moreover, by benefiting from word-aligned cross-lingual embeddings, the only data required for a reliable cross-lingual embedding is a bilingual dictionary, which is available between English, a natural source language, and almost every other language.
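
A word-aligned cross-lingual embedding of the kind the abstract relies on can be obtained, for example, with an orthogonal Procrustes alignment over a bilingual dictionary. The snippet below sketches that standard technique, with random vectors standing in for real English and Persian embeddings.

```python
import numpy as np

def procrustes_align(src_vecs, tgt_vecs):
    """Learn an orthogonal map W so that src_vecs @ W ~ tgt_vecs,
    where rows are paired entries from a bilingual dictionary."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Hypothetical paired dictionary embeddings (English -> Persian).
rng = np.random.default_rng(0)
en = rng.normal(size=(500, 300))
fa = rng.normal(size=(500, 300))
W = procrustes_align(en, fa)
# A sentiment classifier trained on en @ W can then consume Persian
# vectors directly, which is the transfer setting described above.
```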


2009 ◽  
Vol 81 ◽  
pp. 75-85
Author(s):  
S. Winkler

The present paper deals with the acquisition of finiteness in German and Dutch child language. More specifically, it discusses the assumption of fundamental similarities in the development of the finiteness category in German and Dutch L1 as postulated by Dimroth et al. (2003). A comparison of German and Dutch child corpus data shows that Dimroth et al.'s assumption can be maintained as far as the overall development of the finiteness category is concerned. At a more fine-grained level, however, German and Dutch children exhibit different linguistic behaviour. This concerns in particular the means for expressing early finiteness and the status of the auxiliary hebben/haben 'to have'. The observed differences can be explained as the result of target-language-specific properties of the input.



Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 270
Author(s):  
Hanqian Wu ◽  
Zhike Wang ◽  
Feng Qing ◽  
Shoushan Li

Though great progress has been made on the Aspect-Based Sentiment Analysis (ABSA) task, most previous work focuses on English ABSA problems, and there are few efforts on other languages, mainly due to the lack of training data. In this paper, we propose an approach to the Cross-Lingual Aspect Sentiment Classification (CLASC) task which leverages the rich resources in one language (the source language) for aspect sentiment classification in an under-resourced language (the target language). Specifically, we first build a bilingual lexicon for domain-specific training data to translate the aspect categories annotated in the source-language corpus, and then translate sentences from the source language to the target language via Machine Translation (MT) tools. However, most MT systems are general-purpose and unavoidably introduce translation ambiguities that degrade the performance of CLASC. In this context, we propose a novel approach called Reinforced Transformer with Cross-Lingual Distillation (RTCLD), combined with target-sensitive adversarial learning, to minimize the undesirable effects of translation ambiguities in sentence translation. We conduct experiments on different language combinations, treating English as the source language and Chinese, Russian, and Spanish as target languages. The experimental results show that our proposed approach outperforms state-of-the-art methods on the different target languages.
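
A toy sketch of the corpus-construction step described above: aspect categories pass through a (hypothetical) domain-specific bilingual lexicon, while sentences go through a generic MT system, represented here by an unimplemented translate() placeholder.

```python
# Toy English -> Chinese lexicon for aspect categories; real entries would
# come from the domain-specific bilingual lexicon the abstract mentions.
aspect_lexicon = {"FOOD#QUALITY": "食物#质量", "SERVICE#GENERAL": "服务#总体"}

def translate(sentence: str, target_lang: str) -> str:
    raise NotImplementedError("plug in any general-purpose MT tool here")

def project_example(sentence, aspect, polarity, target_lang="zh"):
    return {
        "text": translate(sentence, target_lang),  # MT may add ambiguity
        "aspect": aspect_lexicon[aspect],          # lexicon keeps categories consistent
        "polarity": polarity,                      # the label transfers unchanged
    }
```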


2019 ◽  
Vol 66 ◽  
Author(s):  
Jeremy Barnes ◽  
Roman Klinger

Sentiment analysis benefits from large, hand-annotated resources in order to train and test machine learning models, which are often data hungry. While some languages, e.g., English, have a vast array of these resources, most under-resourced languages do not, especially for fine-grained sentiment tasks, such as aspect-level or targeted sentiment analysis. To improve this situation, we propose a cross-lingual approach to sentiment analysis that is applicable to under-resourced languages and takes target-level information into account. This model incorporates sentiment information into bilingual distributional representations by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance at the sentence level when combined with machine translation. The adaptation to targeted sentiment analysis on multiple domains shows that our model outperforms other projection-based bilingual embedding methods on binary targeted sentiment tasks. Our analysis on ten languages demonstrates that the amount of unlabeled monolingual data has surprisingly little effect on the sentiment results. As expected, the choice of an annotated source language for projection to a target leads to better results for source-target language pairs which are similar. Therefore, our results suggest that more effort should be spent on the creation of resources for languages less similar to those which are already resource-rich. Finally, a domain mismatch leads to decreased performance, suggesting that resources in any language should ideally cover a variety of domains.
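
The joint optimization for semantics and sentiment could look roughly like this sketch, which is a schematic reading of the abstract rather than the authors' model: translation pairs are pulled together in a shared space while a sentiment head keeps that space predictive of polarity.

```python
import torch
import torch.nn as nn

d = 300
proj_src = nn.Linear(d, d, bias=False)   # source-language projection
proj_tgt = nn.Linear(d, d, bias=False)   # target-language projection
sentiment_head = nn.Linear(d, 2)         # positive / negative

def joint_loss(src_pairs, tgt_pairs, labeled_src, labels, alpha=0.5):
    # Semantic term: translation pairs should be near in the shared space.
    sem = ((proj_src(src_pairs) - proj_tgt(tgt_pairs)) ** 2).mean()
    # Sentiment term: the shared space should retain polarity information.
    sent = nn.functional.cross_entropy(
        sentiment_head(proj_src(labeled_src)), labels)
    return alpha * sem + (1 - alpha) * sent
```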


Electronics ◽  
2021 ◽  
Vol 10 (24) ◽  
pp. 3172
Author(s):  
Qingran Zhan ◽  
Xiang Xie ◽  
Chenguang Hu ◽  
Juan Zuluaga-Gomez ◽  
Jing Wang ◽  
...  

Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German and French. Then a DANN-based AF detector is trained on the source languages (English, German and French). For cross-lingual speech recognition, the AF detectors are used to transfer phonological knowledge from the source languages (English, German and French) to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, a monolingual AF system (i.e., AFs extracted directly from the target language) is also investigated. Experiments show that the performance of the AF detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA)-based multi-stream approach reaches the best performance compared to the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs when training data size is restricted, and it can be easily extended to other low-resource languages.
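
One plausible reading of the MHA-based multi-stream fusion (a sketch under our own assumptions, not the paper's code): the acoustic stream attends over the AF stream, and the attended AFs are concatenated back onto the acoustic frames.

```python
import torch
import torch.nn as nn

class MHAStreamFusion(nn.Module):
    """Fuses acoustic frames with cross-lingual articulatory features (AFs)
    via multi-head attention; all dimensions here are illustrative."""
    def __init__(self, acoustic_dim=80, af_dim=32, heads=4):
        super().__init__()
        self.af_proj = nn.Linear(af_dim, acoustic_dim)
        self.mha = nn.MultiheadAttention(acoustic_dim, heads, batch_first=True)

    def forward(self, acoustic, afs):
        # acoustic: (batch, frames, acoustic_dim); afs: (batch, frames, af_dim)
        af_keys = self.af_proj(afs)
        attended, _ = self.mha(query=acoustic, key=af_keys, value=af_keys)
        return torch.cat([acoustic, attended], dim=-1)  # fused stream
```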


2020 ◽  
Vol 34 (05) ◽  
pp. 9547-9554
Author(s):  
Mozhi Zhang ◽  
Yoshinari Fujinuma ◽  
Jordan Boyd-Graber

Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (CACO) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.
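
A minimal sketch of the two-part design the abstract describes, with a character-level embedder feeding a word-level classifier. Sizes, the label count, and the mean-pooling choice are assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class CharWordEmbedder(nn.Module):
    """Builds a word vector from its written form, character by character,
    so related languages share subword knowledge."""
    def __init__(self, n_chars=128, char_dim=32, word_dim=64):
        super().__init__()
        self.chars = nn.Embedding(n_chars, char_dim)
        self.rnn = nn.LSTM(char_dim, word_dim, batch_first=True)

    def forward(self, char_ids):               # (n_words, word_len)
        _, (h, _) = self.rnn(self.chars(char_ids))
        return h[-1]                           # (n_words, word_dim)

class DocClassifier(nn.Module):
    """Word-based classifier on top of character-derived word vectors."""
    def __init__(self, word_dim=64, n_classes=4):
        super().__init__()
        self.embedder = CharWordEmbedder(word_dim=word_dim)
        self.out = nn.Linear(word_dim, n_classes)

    def forward(self, doc_char_ids):           # (n_words, word_len)
        word_vecs = self.embedder(doc_char_ids)
        return self.out(word_vecs.mean(dim=0)) # average words, then classify
```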


Electronics ◽  
2021 ◽  
Vol 10 (12) ◽  
pp. 1412
Author(s):  
Jurgita Kapočiūtė-Dzikienė ◽  
Askars Salimbajevs ◽  
Raivis Skadiņš

Due to recent DNN advancements, many NLP problems can be effectively solved using transformer-based models and supervised data. Unfortunately, such data is not available in some languages. This research is based on two assumptions: (1) training data can be obtained by machine-translating it from another language; (2) there are cross-lingual solutions that work without training data in the target language. Consequently, in this research, we use an English dataset and solve the intent detection problem for five target languages (German, French, Lithuanian, Latvian, and Portuguese). When seeking the most accurate solutions, we investigate BERT-based word and sentence transformers together with eager learning classifiers (CNN, BERT fine-tuning, FFNN) and a lazy learning approach (cosine similarity as a memory-based method). We offer and evaluate several strategies to overcome the data scarcity problem with machine translation, cross-lingual models, and a combination of the two. The experimental investigation revealed the robustness of sentence transformers under various cross-lingual conditions. The accuracy of ~0.842 achieved on the English dataset with completely monolingual models is considered our top line. Cross-lingual approaches nevertheless demonstrate similar accuracy levels, reaching ~0.831, ~0.829, ~0.853, ~0.831, and ~0.813 for German, French, Lithuanian, Latvian, and Portuguese, respectively.
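
The lazy-learning (memory-based) variant can be sketched with the sentence-transformers library as below; the model name and the tiny English "memory" are illustrative choices, not the exact setup from the paper.

```python
from sentence_transformers import SentenceTransformer, util

# Embed labeled English examples once; classify target-language queries by
# the intent of their nearest neighbour under cosine similarity.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

memory_texts = ["book a flight", "play some music"]    # toy "memory"
memory_intents = ["travel", "media"]
memory_emb = model.encode(memory_texts, convert_to_tensor=True)

def detect_intent(query: str) -> str:
    q = model.encode(query, convert_to_tensor=True)
    best = util.cos_sim(q, memory_emb).argmax().item()
    return memory_intents[best]

print(detect_intent("spiele etwas Musik"))  # German query, English memory
```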

