On Application of Natural Language Processing in Machine Translation

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven methods. Then we apply a state of the art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Download Full-text

A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic (Preprint)

10.2196/preprints.25320 ◽

2020 ◽

Cited By ~ 1

Author(s):

Rohan Pandey ◽

Vaibhav Gautam ◽

Ridam Pal ◽

Harsh Bandhey ◽

Lovedeep Singh Dhingra ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

User Feedback ◽

Who Guidelines ◽

The Times ◽

The Right ◽

Local Languages

BACKGROUND The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, effective and continuously learn the new patterns of misinformation. OBJECTIVE We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. RESULTS A total of 5026 people who downloaded the app during the study window, among those 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of integrated AI chatbot “Satya” increased thus proving the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL Not Applicable

Download Full-text

Metrics for evaluating phonetics machine translation in Natural Language Processing through modified Edit Distance algorithm-A naïve approach

2015 International Conference on Computer Communication and Informatics (ICCCI) ◽

10.1109/iccci.2015.7218113 ◽

2015 ◽

Cited By ~ 1

Author(s):

M Hanumanthappa ◽

Rashmi S ◽

Mallamma V Reddy

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Edit Distance

Download Full-text

Biomedical Concept Recognition Using Deep Neural Sequence Models

10.1101/530337 ◽

2019 ◽

Cited By ~ 2

Author(s):

Negacy D. Hailu ◽

Michael Bada ◽

Asmelash Teka Hadgu ◽

Lawrence E. Hunter

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Conditional Random Field ◽

Concept Recognition ◽

Performance Improvements ◽

Art Performance

AbstractBackgroundthe automated identification of mentions of ontological concepts in natural language texts is a central task in biomedical information extraction. Despite more than a decade of effort, performance in this task remains below the level necessary for many applications.Resultsrecently, applications of deep learning in natural language processing have demonstrated striking improvements over previously state-of-the-art performance in many related natural language processing tasks. Here we demonstrate similarly striking performance improvements in recognizing biomedical ontology concepts in full text journal articles using deep learning techniques originally developed for machine translation. For example, our best performing system improves the performance of the previous state-of-the-art in recognizing terms in the Gene Ontology Biological Process hierarchy, from a previous best F1 score of 0.40 to an F1 of 0.70, nearly halving the error rate. Nearly all other ontologies show similar performance improvements.ConclusionsA two-stage concept recognition system, which is a conditional random field model for span detection followed by a deep neural sequence model for normalization, improves the state-of-the-art performance for biomedical concept recognition. Treating the biomedical concept normalization task as a sequence-to-sequence mapping task similar to neural machine translation improves performance.

Download Full-text

A Novel Natural Language Processing (NLP)–Based Machine Translation Model for English to Pakistan Sign Language Translation

Cognitive Computation ◽

10.1007/s12559-020-09731-7 ◽

2020 ◽

Vol 12 (4) ◽

pp. 748-765

Author(s):

Nabeel Sabir Khan ◽

Adnan Abid ◽

Kamran Abid

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sign Language ◽

Machine Translation ◽

Language Processing ◽

Language Translation ◽

Translation Model

Download Full-text

PhraseAttn: Dynamic Slot Capsule Networks for phrase representation in Neural Machine Translation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-212101 ◽

2021 ◽

pp. 1-8

Author(s):

Binh Nguyen ◽

Binh Le ◽

Long H.B. Nguyen ◽

Dien Dinh

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Vital Role ◽

Attention Mechanism ◽

Neural Machine Translation ◽

Translation Model ◽

Word Representation

Word representation plays a vital role in most Natural Language Processing systems, especially for Neural Machine Translation. It tends to capture semantic and similarity between individual words well, but struggle to represent the meaning of phrases or multi-word expressions. In this paper, we investigate a method to generate and use phrase information in a translation model. To generate phrase representations, a Primary Phrase Capsule network is first employed, then iteratively enhancing with a Slot Attention mechanism. Experiments on the IWSLT English to Vietnamese, French, and German datasets show that our proposed method consistently outperforms the baseline Transformer, and attains competitive results over the scaled Transformer with two times lower parameters.

Download Full-text

Machine translation using natural language processing

MATEC Web of Conferences ◽

10.1051/matecconf/201927702004 ◽

2019 ◽

Vol 277 ◽

pp. 02004

Author(s):

Middi Venkata Sai Rishita ◽

Middi Appala Raju ◽

Tanvir Ahmed Harris

Keyword(s):

Neural Network ◽

Neural Networks ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Deep Neural Network ◽

English Text ◽

French Translation ◽

Rule Based

Machine Translation is the translation of text or speech by a computer with no human involvement. It is a popular topic in research with different methods being created, like rule-based, statistical and examplebased machine translation. Neural networks have made a leap forward to machine translation. This paper discusses the building of a deep neural network that functions as a part of end-to-end translation pipeline. The completed pipeline would accept English text as input and return the French Translation. The project has three main parts which are preprocessing, creation of models and Running the model on English Text.

Download Full-text

Abducing term variant translations in aligned texts

Terminology ◽

10.1075/term.10.1.06car ◽

2004 ◽

Vol 10 (1) ◽

pp. 101-130 ◽

Cited By ~ 2

Author(s):

Michael Carl ◽

Ecaterina Rascu ◽

Johann Haller ◽

Philippe Langlais

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Text Indexing ◽

Multiple Resources ◽

Specific Term ◽

Term Variation ◽

Bilingual Text

Term variation is an important issue in various applications of natural language processing (NLP) such as machine translation, information retrieval and text indexing. In this paper, we describe an ‘Abductive Terminological Database’ (ATDB) aiming to detect translations of terms and their variants in bilingual texts. We describe abduction as the process to infer specific term translation templates from multiple resources which have been induced from a bilingual text. We show that precision and recall of the ATDB increase when using more resources and when the resources interfere in a less restricted way. We discuss a way to feed back evaluation values into the induced resources thus allowing for weighted abduction which further enhances the precision of the tool.

Download Full-text

Natural language processing for similar languages, varieties, and dialects: A survey

Natural Language Engineering ◽

10.1017/s1351324920000492 ◽

2020 ◽

Vol 26 (6) ◽

pp. 595-612

Author(s):

Marcos Zampieri ◽

Preslav Nakov ◽

Yves Scherrer

Keyword(s):

Natural Language Processing ◽

Data Collection ◽

Natural Language ◽

Machine Translation ◽

Computational Methods ◽

Language Processing ◽

Language Varieties ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Speech Tagging

AbstractThere has been a lot of recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim to improve the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, with focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.

Download Full-text

Analyzing Architectures for Neural Machine Translation using Low Computational Resources

International Journal on Natural Language Computing ◽

10.5121/ijnlc.2021.10502 ◽

2021 ◽

Vol 10 (5) ◽

pp. 9-16

Author(s):

Aditya Mandke ◽

Onkar Litake ◽

Dipali Kadam

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

State Of The Art ◽

Time Constraints ◽

Neural Machine Translation ◽

Recent Developments ◽

Computationally Expensive ◽

Computational Resources

With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy, but they are very computationally expensive to train. Everyone cannot have such setups consisting of high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers consisting of more encoders and decoders took more time to train but had fewer BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable to use in situations having time constraints.

Download Full-text