Reinforced NMT for Sentiment and Content Preservation in Low-resource Scenario

Author(s):  
Divya Kumari ◽  
Asif Ekbal ◽  
Rejwanul Haque ◽  
Pushpak Bhattacharyya ◽  
Andy Way

Preserving domain knowledge from source to target is crucial in any translation workflow. Hence, translation service providers that use machine translation (MT) in production could reasonably expect the translation process to transfer both the underlying pragmatics and the semantics of the source-side sentences into the target language. However, recent studies suggest that MT systems often fail to preserve in the target such crucial information (e.g., sentiment, emotion, gender traits) embedded in the source text. In this context, raw automatic translations are often fed directly to other natural language processing (NLP) applications (e.g., a sentiment classifier) in a cross-lingual platform. Hence, the loss of such crucial information during translation could negatively affect the performance of downstream NLP tasks that rely heavily on the output of MT systems. In our current research, we carefully balance both sides (i.e., sentiment and semantics) during translation by controlling a global-attention-based neural MT (NMT) system, so as to generate translations that encode the underlying sentiment of a source sentence while preserving its non-opinionated semantic content. To this end, we use a state-of-the-art reinforcement learning method, namely actor-critic, that includes a novel reward combination module, to fine-tune the NMT system so that it learns to generate translations best suited to a downstream task, viz. sentiment classification, while ensuring that the source-side semantics remain intact. Experimental results for the Hindi–English language pair show that our proposed method significantly improves the performance of the sentiment classifier and also yields an improved NMT system.
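
The abstract does not say how the reward combination module works. As a minimal sketch only, assuming a plain linear interpolation between a sentiment reward and a content-preservation reward, the idea of balancing the two signals might look like this. The function name, the weight `alpha`, and the assumption that both rewards lie in [0, 1] are hypothetical and are not taken from the paper.

```python
def combined_reward(sentiment_reward: float, content_reward: float, alpha: float = 0.5) -> float:
    """Blend two rewards with a linear interpolation (an assumed scheme, not the authors').

    sentiment_reward: assumed score in [0, 1] for how well sentiment is preserved.
    content_reward:   assumed score in [0, 1] for how well semantic content is preserved.
    alpha:            hypothetical weight; 1.0 uses only sentiment, 0.0 only content.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * sentiment_reward + (1.0 - alpha) * content_reward
```

With `alpha` near 1 the fine-tuning signal would favour sentiment preservation; with `alpha` near 0 it would favour content preservation.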

2020 ◽  
pp. 016555152096278
Author(s):  
Rouzbeh Ghasemi ◽  
Seyed Arad Ashrafi Asli ◽  
Saeedeh Momtazi

With the advent of deep neural models in natural language processing tasks, having a large amount of training data plays an essential role in achieving accurate models. Creating valid training data, however, is challenging in many low-resource languages. As a result, the accuracy of available natural language processing tools for low-resource languages differs significantly from that of tools for rich-resource languages. To address this problem in the sentiment analysis task for Persian, we propose a cross-lingual deep learning framework that benefits from the available English training data. We use cross-lingual embeddings to model sentiment analysis as a transfer learning problem in which a model is transferred from a rich-resource language to low-resource ones. Our model is flexible enough to use any cross-lingual word embedding model and any deep architecture for text classification. Our experiments on the English Amazon dataset and the Persian Digikala dataset, using two different embedding models and four different classification networks, show the superiority of the proposed model over state-of-the-art monolingual techniques. In our experiments, the performance of Persian sentiment analysis improves by 22% with static embeddings and by 9% with dynamic embeddings. Our proposed model is general and language-independent; that is, it can be used for any low-resource language once a cross-lingual embedding is available for the source–target language pair. Moreover, with word-aligned cross-lingual embeddings, the only data required for a reliable cross-lingual embedding is a bilingual dictionary, which is available between almost all languages and English, a potential source language.
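
The transfer idea described above can be illustrated with a toy sketch: if words from both languages live in one shared vector space, a classifier fitted only on English examples can score Persian input. Everything here is an assumption for illustration: the 2-D vectors are invented, and a nearest-centroid classifier stands in for the deep classification networks the abstract mentions.

```python
import math

# Hypothetical 2-D vectors in a shared cross-lingual space (values invented for illustration).
SHARED_EMBEDDINGS = {
    "good": (0.9, 0.1), "great": (0.8, 0.2),
    "bad": (0.1, 0.9), "awful": (0.2, 0.8),
    "خوب": (0.85, 0.15),  # Persian for "good"
    "بد": (0.15, 0.85),   # Persian for "bad"
}

def embed(tokens):
    """Average the shared-space vectors of the tokens that have an embedding."""
    vectors = [SHARED_EMBEDDINGS[t] for t in tokens if t in SHARED_EMBEDDINGS]
    if not vectors:
        return (0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

def train_centroids(labelled_examples):
    """Compute one centroid per label from (tokens, label) pairs."""
    grouped = {}
    for tokens, label in labelled_examples:
        grouped.setdefault(label, []).append(embed(tokens))
    return {label: tuple(sum(axis) / len(vecs) for axis in zip(*vecs))
            for label, vecs in grouped.items()}

def predict(tokens, centroids):
    """Return the label whose centroid is closest to the averaged input vector."""
    vector = embed(tokens)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Train on English only, then classify Persian tokens through the shared space.
english_train = [(["good", "great"], "positive"), (["bad", "awful"], "negative")]
centroids = train_centroids(english_train)
```

Calling `predict(["خوب"], centroids)` then classifies a Persian token without any Persian training data, which is the essence of the zero-resource transfer setting.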


2019 ◽  
Vol 31 (2) ◽  
pp. 571-574
Author(s):  
Ardian Fera

A preposition is a word or set of words that indicates location or some other relationship between a noun or pronoun and other parts of the sentence. It refers to the word or phrase which shows the relationship between one thing and another, linking nouns, pronouns and phrases to other words in a sentence. They are abstract words that have no concrete meaning. They merely show the relationships between groups of words. Within a preposition, there are many different variations in meaning that are conveyed. The proper interpretation of prepositions is an important issue for automatic natural language understanding. Although the complexity of preposition usage has been argued for and documented by various scholars in linguistics, psycholinguistics, and computational linguistics, very few studies have been done on the function of prepositions in natural language processing (NLP) applications. The reason is that prepositions are probably the most polysemous category and thus, their linguistic realizations are difficult to predict and their cross-linguistic regularities difficult to identify. Prepositions play a major role in the syntactic structures of the English language and they often make an essential contribution to sentence meaning by signifying temporal and spatial relationships, as well as abstract relations involving cause and purpose, agent and instrument, manner and accompaniment, support and much more. They are sensitive linguistic elements that are culturally acceptable and very well known to all members of the same linguistic community. According to cognitive semantics, the figurative senses of a preposition are extended from its spatial senses through conceptual metaphors. In a pedagogical context, it may be useful to draw learners' attention to those aspects of a preposition's spatial sense that are especially relevant for its metaphorization processes. 
Prepositions have type restrictions on their arguments, they assign thematic roles, and they have semantic content, possibly underspecified. The only difference from other open-class categories such as nouns, verbs, or adjectives is that they have no morphology.


2021 ◽  
Vol 7 ◽  
pp. e559
Author(s):  
Andraž Pelicon ◽  
Ravi Shekhar ◽  
Blaž Škrlj ◽  
Matthew Purver ◽  
Senja Pollak

Platforms that feature user-generated content (social media, online forums, newspaper comment sections etc.) have to detect and filter offensive speech within large, fast-changing datasets. While many automatic methods have been proposed and achieve good accuracies, most of these focus on the English language, and are hard to apply directly to languages in which few labeled datasets exist. Recent work has therefore investigated the use of cross-lingual transfer learning to solve this problem, training a model in a well-resourced language and transferring to a less-resourced target language; but performance has so far been significantly less impressive. In this paper, we investigate the reasons for this performance drop, via a systematic comparison of pre-trained models and intermediate training regimes on five different languages. We show that using a better pre-trained language model results in a large gain in overall performance and in zero-shot transfer, and that intermediate training on other languages is effective when little target-language data is available. We then use multiple analyses of classifier confidence and language model vocabulary to shed light on exactly where these gains come from and gain insight into the sources of the most typical mistakes.


2021 ◽  
pp. 67-78
Author(s):  
Любовь Владимировна Даржинова

In today’s digitalized world, discovering approaches to enhance written language processing is crucial for successful non-native language acquisition. Whereas the psycholinguistic literature suggests that background knowledge generally facilitates written language processing, hardly anything is known about whether religious affiliation, as part of a language learner’s background, affects non-native written language processing. The current paper addresses this gap by conducting a small-scale, web-based self-paced reading study. It explores whether English language learners with a Buddhist background process Buddhist-related and religiously neutral texts similarly to learners of the same proficiency level who have no religious affiliation. The experiment involved 20 Buddhist and non-religious learners of English from Russia’s regions of Kalmykia, Tuva, and Buryatia. The results suggest that a Buddhist background contributes to faster processing and better recall of Buddhist-related texts in the target language. The paper argues for the need to supply written materials related to religion in a target language with notes and glossaries in order to speed up processing and improve recall in non-religious language learners.


An English for Academic Purposes (EAP) course, which focuses on the academic language needs of students, belongs to a subfield of English for Specific Purposes (ESP). It is a specialized type of course that integrates specific subject matter, language content, and material based on learners’ needs. The study aims to evaluate the British Council’s EAP coursebook in terms of content, sequencing, learners’ autonomy, motivation, feedback, and focus on language skills. Furthermore, the study tries to provide a general perception of the usefulness and effectiveness of the coursebook for undergraduate students. The EAP Students’ Manual coursebook is used as the primary source for data collection. The researcher chose the Nation & Macalister (2010) model of language teaching principles to analyze and discuss the data. The study found the coursebook to be a useful, effective, and appropriate source of English language learning in terms of the investigated aspects of the book. The findings indicate that the coursebook provides practice and practical usage in all domains of the academically required English language skills. It helps students build language competency and become more independent learners. In addition, it gives learners an opportunity to think in the target language, use the language more practically, and learn it in a natural environment. The study concludes that the content needs to be supplemented with English-language audio and video presenting relevant documentaries and supporting material in order to make the coursebook and the learning process more useful, effective, interesting, and motivating. Furthermore, the study recommends that, when choosing or designing a coursebook for a given course, it should be evaluated against the various criteria and language-teaching principles suggested by different language researchers.


2018 ◽  
Vol 6 (9) ◽  
pp. 7
Author(s):  
Dr. Shreeja Sharma ◽  
Prof. Shubhra Tripathi

The prime concern of every language teacher, and to some extent every linguist, is to devise pedagogical methods and strategies which facilitate language acquisition. The concern of any teacher or applied linguist is, though not explicitly stated anywhere, to equip the learners with “correct” features of the language being learnt. Emphasis on “correctness” is due to the presumption that erroneous structures or deviations from the linguistic code will lead to incomprehensibility and impede communication. As a result of such convictions, Contrastive Analysis (CA) and Error Analysis (EA) focussed their attention on “correct” grammatical, lexical and syntactical features of the Target Language (TL), in this case English. Both Contrastive Analysts and Error Analysts analysed the language and tried to predict areas of ease or difficulty. This was often achieved with ‘some’ degree of success. However, in the present socio-educational milieu of Indian schools, where English language teaching is a significant stake, insights from CA and EA, particularly the latter, are either not taken into cognizance or found inadequate. CA is taken into consideration, though obliquely, indirectly and cursorily, where English is taught by resorting to bilingualism. EA is usually ignored completely. Even when teachers correct students’ assignments and copies, they point out mistakes/errors and suggest corrections, but take into account neither why these mistakes/errors have occurred nor how to prevent such cases in future. With the ever-growing importance of English as a global language and a second language in India, no stakeholder in education can afford to undermine the significance of ELT. The time is therefore ripe to take a fresh look at Error Analysis (EA) and assess how it can be deployed as a powerful tool in school teaching.


2018 ◽  
Vol 28 (7) ◽  
pp. 2245-2249
Author(s):  
Suzana Ejupi ◽  
Lindita Skenderi

Working with English learners for many years gives a teacher the opportunity to encounter the linguistic obstacles that they face while learning English as a foreign language. Additionally, teaching for 13 years and observing the learning process makes it possible to recognize the students’ needs and, at the same time, to detect the linguistic mistakes that they make while practicing the target language. During my experience as a teacher, in terms of teaching and learning verbs in general and their grammatical categories in particular, I have noticed that Albanian learners find the correct use of verbs in context relatively difficult, and the equivalent use of verbs in Albanian even more confusing. Since verbs are an important part of speech, this study aims to investigate several differences and similarities between the grammatical categories of verbs in English and Albanian. As a result, Albanian learners of English will be able to identify some of the major differences and similarities between the grammatical categories of verbs in English and Albanian, overcome the usual mistakes, gain the necessary knowledge regarding verbs, and use them properly in English and Albanian.


Interpreting ◽  
2017 ◽  
Vol 19 (1) ◽  
pp. 1-20 ◽  
Author(s):  
Ena Hodzik ◽  
John N. Williams

We report a study on prediction in shadowing and simultaneous interpreting (SI), both considered as forms of real-time, ‘online’ spoken language processing. The study comprised two experiments, focusing on: (i) shadowing of German head-final sentences by 20 advanced students of German, all native speakers of English; (ii) SI of the same sentences into English head-initial sentences by 22 advanced students of German, again native English speakers, and also by 11 trainee and practising interpreters. Latency times for input and production of the target verbs were measured. Drawing on studies of prediction in English-language reading production, we examined two cues to prediction in both experiments: contextual constraints (semantic cues in the context) and transitional probability (the statistical likelihood of words occurring together in the language concerned). While context affected prediction during both shadowing and SI, transitional probability appeared to favour prediction during shadowing but not during SI. This suggests that the two cues operate on different levels of language processing in SI.
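
Transitional probability, described above as the statistical likelihood of words occurring together, is commonly estimated from bigram counts as P(next word | current word). The sketch below is a generic illustration on an invented toy corpus; it is not the procedure or the data used in the study, and the function name is hypothetical.

```python
from collections import Counter

def transitional_probability(tokens, first, second):
    """Estimate P(second | first) as count(first, second) / count(first followed by any word)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    heads = Counter(tokens[:-1])  # the last token has no successor, so it is excluded
    if heads[first] == 0:
        return 0.0
    return bigrams[(first, second)] / heads[first]

# Invented toy corpus, for illustration only.
corpus = "the dog barked and the dog slept and the cat slept".split()
tp_the_dog = transitional_probability(corpus, "the", "dog")
```

In this toy corpus "the" is followed by "dog" in two of its three occurrences, so `tp_the_dog` is 2/3.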


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Candice Frances ◽  
Eugenia Navarra-Barindelli ◽  
Clara D. Martin

Language perception studies on bilinguals often show that words that share form and meaning across languages (cognates) are easier to process than words that share only meaning. This facilitatory phenomenon is known as the cognate effect. Most previous studies have shown this effect visually, whereas the auditory modality as well as the interplay between type of similarity and modality remain largely unexplored. In this study, highly proficient late Spanish–English bilinguals carried out a lexical decision task in their second language, both visually and auditorily. Words had high or low phonological and orthographic similarity, fully crossed. We also included orthographically identical words (perfect cognates). Our results suggest that similarity in the same modality (i.e., orthographic similarity in the visual modality and phonological similarity in the auditory modality) leads to improved signal detection, whereas similarity across modalities hinders it. We provide support for the idea that perfect cognates are a special category within cognates. Results suggest a need for a conceptual and practical separation between types of similarity in cognate studies. The theoretical implication is that the representations of items are active in both modalities of the non-target language during language processing, which needs to be incorporated into our current processing models.
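
The abstract does not state how signal detection was quantified. A standard sensitivity measure for yes/no tasks such as lexical decision is d′ = z(hit rate) − z(false-alarm rate), and the sketch below computes it under that assumption. The log-linear correction (adding 0.5 to each count) is one common convention for avoiding infinite z-scores, chosen here for illustration; the function name and counts are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    In a lexical decision task, a hit is a "word" response to a real word and a
    false alarm is a "word" response to a nonword. A log-linear correction is
    applied so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A larger d′ means words and nonwords are better discriminated; a value of 0 means responses do not distinguish them at all.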

