Japanese translation teaching corpus based on a bilingual non-parallel data model

2020 ◽  
pp. 1-11
Author(s):  
Zheng Guo ◽  
Zhu Jifeng

In recent years, with the development of Internet and intelligent technology, Japanese translation teaching has gradually explored new teaching modes. Guided by natural language processing and intelligent machine translation, statistical machine translation has gradually become one of the primary auxiliary tools in Japanese translation teaching. To address the small scale, slow speed, and incomplete domain coverage of traditional parallel-corpus machine translation, this paper constructs a Japanese translation teaching corpus based on a bilingual non-parallel data model and uses it to train the Moses machine translation model for Japanese translation teaching, achieving a better auxiliary effect. During construction, parallel sentence pairs are extracted from the non-parallel corpus with a translation retrieval framework based on a word-graph representation, and a translation retrieval model is then built on the bilingual non-parallel data. Experiments training the Moses translation model on the Japanese translation corpus show that the bilingual non-parallel data model constructed in this paper has good translation retrieval performance: compared with the existing algorithm, the BLEU score of the extracted parallel sentence pairs increases by 2.58. In addition, the proposed retrieval method based on the translation-option word graph is time-efficient and offers better performance and efficiency in assisting Japanese translation teaching.
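The abstract gives no implementation details for the word-graph retrieval step; below is a minimal sketch of the general translation-retrieval idea it builds on: gloss a source sentence through a bilingual lexicon, then retrieve the closest target sentence by TF-IDF cosine similarity. The lexicon, corpora, and threshold are invented toy stand-ins, not the paper's model.

```python
# Minimal sketch of translation retrieval for parallel-pair mining.
# The toy lexicon, corpora, and threshold below are illustrative
# assumptions, not the paper's word-graph model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

lexicon = {"watashi": "i", "hon": "book", "yomu": "read",
           "neko": "cat", "miru": "see"}

def gloss(source_tokens):
    """Gloss source tokens into target-language words via the lexicon."""
    return " ".join(lexicon.get(t, "") for t in source_tokens)

source_corpus = [["watashi", "hon", "yomu"], ["neko", "miru"]]
target_corpus = ["i read a book", "she sees a cat", "the weather is nice"]

vectorizer = TfidfVectorizer().fit(target_corpus)
target_matrix = vectorizer.transform(target_corpus)

THRESHOLD = 0.3  # assumed cut-off for accepting a pair
for src in source_corpus:
    query = vectorizer.transform([gloss(src)])
    scores = cosine_similarity(query, target_matrix)[0]
    best = scores.argmax()
    if scores[best] >= THRESHOLD:
        print(" ".join(src), "|||", target_corpus[best], f"({scores[best]:.2f})")
```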

2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze English-Dogri translation using a parallel corpus. Machine translation is the translation from one language into another and is among the largest applications of natural language processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We developed a statistical translation system that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.
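The abstract reports sentence-level accuracy; corpus BLEU, the metric used by the other systems in this collection, is a common complementary score. A minimal sketch with the sacrebleu package follows; the hypothesis and reference strings are toy stand-ins, not actual English-Dogri output.

```python
# Scoring translation output against references with corpus BLEU.
# The hypothesis/reference strings here are toy stand-ins, not the
# actual English-Dogri system output.
import sacrebleu

hypotheses = ["the boy is reading a book", "it is raining today"]
references = [["the boy reads a book", "it rained today"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```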


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resourced languages to improve machine translation (MT) performance remains enigmatic. This research contributes to the domain with a low-resource English-Twi translation system based on filtered synthetic-parallel corpora. It is often hard to establish what a good-quality corpus looks like in low-resource conditions, mainly where the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose to expand the training data by injecting a synthetic-parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we perform unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we extensively use three different sentence-level similarity metrics after round-trip translation. Experimental results on a diverse set of available parallel corpora demonstrate that injecting a pseudo-parallel corpus and extensive filtering with sentence-level similarity metrics significantly improve the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach yields substantial gains in BLEU and TER scores.
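A minimal sketch of the squared-Mahalanobis filtering step follows, assuming simple two-dimensional sentence-pair features and an invented cut-off; a real system would use learned sentence embeddings.

```python
# Sketch of squared-Mahalanobis filtering of sentence-pair features.
# Real systems would use learned sentence embeddings; the 2-D feature
# vectors and the cut-off here are assumptions.
import numpy as np

# toy features per pair, e.g. (length ratio, lexical overlap)
pairs = np.array([[1.0, 0.8], [1.1, 0.7], [0.9, 0.9],
                  [1.0, 0.75], [3.5, 0.1]])  # last row: likely non-parallel

mu = pairs.mean(axis=0)
cov = np.cov(pairs, rowvar=False)
inv_cov = np.linalg.inv(cov)

def sq_mahalanobis(x):
    d = x - mu
    return float(d @ inv_cov @ d)

THRESHOLD = 3.0  # assumed cut-off
kept = [p for p in pairs if sq_mahalanobis(p) < THRESHOLD]
for p in pairs:
    print(p, f"d^2 = {sq_mahalanobis(p):.2f}")
print(f"kept {len(kept)} of {len(pairs)} pairs")
```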


2005 ◽  
Vol 31 (4) ◽  
pp. 477-504 ◽  
Author(s):  
Dragos Stefan Munteanu ◽  
Daniel Marcu

We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.
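A maximum entropy classifier over a pair of sentences is equivalent to logistic regression over pair features; a minimal sketch follows, with invented toy features (length ratio and dictionary-translation coverage), whereas the paper uses richer word-alignment-based features.

```python
# Maximum-entropy (logistic-regression) classifier over sentence-pair
# features. The feature set and training pairs are toy assumptions;
# the paper uses richer word-alignment-based features.
import numpy as np
from sklearn.linear_model import LogisticRegression

# features per pair: [length ratio, fraction of words with a
# dictionary translation in the other sentence]
X = np.array([[1.00, 0.90], [0.95, 0.85], [1.10, 0.80],   # parallel
              [2.50, 0.20], [0.40, 0.10], [1.00, 0.15]])  # non-parallel
y = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression().fit(X, y)

candidate = np.array([[1.05, 0.75]])
prob = clf.predict_proba(candidate)[0, 1]
print(f"P(parallel) = {prob:.2f}")  # accept if above a tuned threshold
```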


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yanbo Zhang

Under the current artificial intelligence boom, machine translation is a research direction of natural language processing with important scientific and practical value. In practical applications, the variability of language, the limited capability of representing semantic information, and the scarcity of parallel corpus resources all constrain machine translation from becoming practical and widespread. In this paper, we conduct deep mining of source-language text data to express complex, high-level, and abstract semantic information with an appropriate text representation model. For machine translation tasks with a large amount of parallel corpus, we use annotated datasets to build a more effective transfer-learning-based end-to-end neural machine translation model with a supervised algorithm. For machine translation tasks in languages with poor parallel corpus resources, transfer learning techniques are used to prevent overfitting of the neural network during training and to improve the generalization ability of end-to-end neural machine translation models under low-resource conditions. Finally, for translation tasks where parallel corpora are extremely scarce but monolingual corpora are plentiful, the research focuses on unsupervised machine translation techniques, which will be a future research trend.
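A minimal sketch of the pre-train-then-fine-tune transfer learning recipe follows; the TinyTranslator module, the frozen-embedding choice, and the learning rates are illustrative assumptions, not the paper's architecture.

```python
# Skeleton of transfer learning for low-resource NMT: pre-train on a
# high-resource pair, then fine-tune on the low-resource pair with a
# smaller learning rate and frozen embeddings to curb overfitting.
# TinyTranslator is a stand-in module, not the paper's architecture.
import torch
import torch.nn as nn

class TinyTranslator(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.body = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.body(self.embed(x))
        return self.out(h)

model = TinyTranslator()

# --- phase 1: pre-training on the high-resource parent task ---
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# ... train on the large parallel corpus here ...

# --- phase 2: fine-tuning on the low-resource child task ---
for p in model.embed.parameters():
    p.requires_grad = False            # freeze embeddings (assumed choice)
opt = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

x = torch.randint(0, 1000, (2, 7))     # toy batch of token ids
y = torch.randint(0, 1000, (2, 7))
loss = nn.functional.cross_entropy(model(x).transpose(1, 2), y)
loss.backward()
opt.step()
print(f"fine-tuning loss: {loss.item():.3f}")
```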


Author(s):  
Binh Nguyen ◽  
Binh Le ◽  
Long H.B. Nguyen ◽  
Dien Dinh

Word representation plays a vital role in most natural language processing systems, especially in neural machine translation. It tends to capture the semantics of and similarity between individual words well but struggles to represent the meaning of phrases or multi-word expressions. In this paper, we investigate a method to generate and use phrase information in a translation model. To generate phrase representations, a Primary Phrase Capsule network is first employed and then iteratively enhanced with a Slot Attention mechanism. Experiments on the IWSLT English-to-Vietnamese, French, and German datasets show that our proposed method consistently outperforms the baseline Transformer and attains results competitive with the scaled Transformer while using half as many parameters.
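A minimal, generic implementation of the Slot Attention update (Locatello et al., 2020) referenced above follows; it is a sketch of the mechanism only and omits the authors' Primary Phrase Capsule network and the MLP residual of the original formulation.

```python
# Minimal Slot Attention update: slots compete for input words via a
# softmax over slots, then each slot is refined with a GRU cell. This
# is a generic sketch, not the authors' phrase-capsule model.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, dim, n_slots=4, iters=3):
        super().__init__()
        self.iters, self.scale = iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, n_slots, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):                       # inputs: (B, N, D)
        B, N, D = inputs.shape
        inputs = self.norm_in(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu.expand(B, -1, -1)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
            updates = attn @ v                       # (B, S, D)
            slots = self.gru(updates.reshape(-1, D),
                             slots.reshape(-1, D)).view(B, -1, D)
        return slots

words = torch.randn(2, 10, 32)         # toy word representations
phrases = SlotAttention(dim=32)(words)
print(phrases.shape)                   # (2, 4, 32): phrase-level slots
```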


Author(s):  
Jing Wu ◽  
Hongxu Hou ◽  
Feilong Bao ◽  
Yupeng Jiang

The Mongolian-Chinese statistical machine translation (SMT) system has its limitations because of the complex Mongolian morphology, the scarcity of parallel corpus resources, and the significant syntactic differences between the languages. To address these problems, we propose a template-based machine translation (TBMT) system and combine it with the SMT system to achieve better translation performance. The proposed TBMT model includes a template extraction model and a template translation model. In the template extraction model, we present a novel method of aligning and abstracting static words from a bilingual parallel corpus to extract templates automatically. In the template translation model, a specially designed method of filtering out low-quality matches enhances translation performance. Moreover, we apply lemmatization and Latinization to address data sparsity and perform fuzzy matching. Experimentally, the coverage of the TBMT system exceeds 50%; the combined SMT system translates all remaining uncovered source sentences. The TBMT system outperforms the phrase-based and hierarchical phrase-based SMT baselines by +3.08 and +1.40 BLEU points, and the combined TBMT-SMT system outperforms the same baselines by +2.49 and +0.81 BLEU points.
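A minimal sketch of the fuzzy template matching and low-quality-match filtering steps follows; the Latinized templates and the similarity threshold are invented stand-ins, not templates extracted by the paper's alignment method.

```python
# Sketch of fuzzy template matching: a lemmatized/Latinized input is
# compared against template skeletons with difflib, and low-quality
# matches are filtered by a similarity threshold. Templates and the
# threshold are invented stand-ins.
import difflib

templates = {                          # hypothetical Latinized templates
    "X ni ikimasu": "go to X",
    "X wo tabemasu": "eat X",
    "X ga suki desu": "like X",
}

def best_template(source, threshold=0.6):
    scored = [(difflib.SequenceMatcher(None, source, s).ratio(), s)
              for s in templates]
    score, match = max(scored)
    return (match, templates[match], score) if score >= threshold else None

print(best_template("gakkou ni ikimasu"))     # matches "X ni ikimasu"
print(best_template("completely unrelated"))  # None: filtered out
```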


2018 ◽  
Vol 9 (28) ◽  
pp. 6091-6098 ◽  
Author(s):  
Philippe Schwaller ◽  
Théophile Gaudin ◽  
Dávid Lányi ◽  
Costas Bekas ◽  
Teodoro Laino

Using a text-based representation of molecules, chemical reactions are predicted with a neural machine translation model borrowed from language processing.
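A minimal sketch of the text-based framing follows: reactant and product SMILES strings are tokenized and treated as source and target "sentences" for an NMT model. The regex is a simplified variant of tokenization patterns commonly used for SMILES, not necessarily the paper's; the example reaction is illustrative only.

```python
# Treating reaction prediction as translation: reactant SMILES are the
# "source sentence", product SMILES the "target". The regex is a
# simplified SMILES tokenization pattern; the reaction is illustrative.
import re

SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops]|\(|\)|\.|=|#|-|\+|%\d{2}|\d)")

def tokenize(smiles):
    return SMILES_TOKEN.findall(smiles)

reactants = "CCO.CC(=O)O"       # ethanol + acetic acid (esterification)
product = "CC(=O)OCC"           # ethyl acetate
print(tokenize(reactants))      # source tokens for the NMT model
print(tokenize(product))        # target tokens
```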


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Wenxia Pan

English machine translation is a research direction of natural language processing that has important scientific and practical value in the current artificial intelligence boom. The variability of language, the limited ability to express semantic information, and the lack of parallel corpus resources all limit the usefulness and popularity of English machine translation in practical applications. The self-attention mechanism has received much attention in English machine translation because its highly parallelizable computation reduces training time and lets the model capture the semantic relevance of all words in the context. Unlike recurrent neural networks, however, the self-attention mechanism ignores positional and structural information between context words. To give the model access to positional information, English machine translation models based on self-attention use sine and cosine position coding to represent the absolute positions of words. This encoding reflects relative distance but does not provide directionality. As a result, a new English machine translation model is proposed, based on a logarithmic position representation method and the self-attention mechanism; it retains the distance and direction information between words as well as the efficiency of the self-attention mechanism. Experiments show that the non-strict phrase extraction method can effectively extract phrase translation pairs from the n-best word alignment results and that the extraction constraint strategy can further improve translation quality. Compared with traditional phrase extraction methods based on a single alignment, non-strict phrase extraction and n-best alignment significantly improve translation quality.
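The sine and cosine position coding discussed above is the standard Transformer formulation; a minimal sketch follows. The dot product of two such encodings depends only on the relative distance between positions and is symmetric, which is exactly the missing directionality the proposed logarithmic representation targets. The logarithmic variant itself is not specified in the abstract, so it is not sketched here.

```python
# Standard sinusoidal absolute position encoding:
#   PE[pos, 2i]   = sin(pos / 10000**(2i/d))
#   PE[pos, 2i+1] = cos(pos / 10000**(2i/d))
import numpy as np

def sinusoidal_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]              # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(max_len=50, d_model=8)
# symmetry check: positions (10, 20) and (20, 10) score identically,
# so the encoding carries distance but no direction
print(np.dot(pe[10], pe[20]), np.dot(pe[20], pe[10]))
```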


2021 ◽  
Author(s):  
Idris Abdulmumin ◽  
Bashir Shehu Galadanci ◽  
Aliyu Garba

An effective method to generate a large number of parallel sentences for training improved neural machine translation (NMT) systems is the use of back-translations of target-side monolingual data. The standard back-translation method has been shown to be unable to efficiently utilize the huge amount of existing monolingual data because translation models cannot differentiate between authentic and synthetic parallel data during training. Tagging, or using gates, has been used to enable translation models to distinguish synthetic from authentic data, improving standard back-translation and also enabling iterative back-translation on language pairs that underperformed with standard back-translation. In this work, we approach back-translation as a domain adaptation problem, eliminating the need for explicit tagging. In this approach, tag-less back-translation, the synthetic and authentic parallel data are treated as out-of-domain and in-domain data, respectively, and through pre-training and fine-tuning, the translation model learns more efficiently from them during training. Experimental results show that the approach outperforms the standard and tagged back-translation approaches on low-resource English-Vietnamese and English-German neural machine translation.
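A minimal sketch of the two-phase tag-less schedule follows: pre-train on synthetic (out-of-domain) pairs, then fine-tune on authentic (in-domain) pairs, with no tag separating the two. The linear stand-in model, random data, and learning rates are assumptions in place of a real NMT system.

```python
# Sketch of tag-less back-translation as domain adaptation: pre-train
# on synthetic (out-of-domain) pairs, then fine-tune on authentic
# (in-domain) pairs, with no tag distinguishing the two. The linear
# "model" is a stand-in for a real NMT system.
import torch
import torch.nn as nn

model = nn.Linear(16, 16)              # stand-in translation model

def run_phase(batch_fn, lr, steps):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        src, tgt = batch_fn()
        loss = nn.functional.mse_loss(model(src), tgt)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

synthetic = lambda: (torch.randn(8, 16), torch.randn(8, 16))  # back-translated
authentic = lambda: (torch.randn(8, 16), torch.randn(8, 16))  # human-made

print("pre-train :", run_phase(synthetic, lr=1e-3, steps=50))
print("fine-tune :", run_phase(authentic, lr=1e-4, steps=20))
```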

