A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora

Víctor M. Sánchez-Cartagena; Juan Antonio Pérez-Ortiz; Felipe Sánchez-Martínez

doi:10.1016/j.csl.2014.10.003

Neural Networks Classifier for Data Selection in Statistical Machine Translation

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0027 ◽

2017 ◽

Vol 108 (1) ◽

pp. 283-294 ◽

Cited By ~ 1

Author(s):

Álvaro Peris ◽

Mara Chinea-Ríos ◽

Francisco Casacuberta

Keyword(s):

Neural Networks ◽

Machine Translation ◽

Domain Adaptation ◽

Statistical Machine Translation ◽

Data Selection ◽

Target Domain ◽

Translation Quality ◽

Bilingual Corpora ◽

Proper Estimation ◽

Adaptation Field

Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of domain adaptation that aims to extract from an out-of-domain corpus those sentences that are most useful for translating a different target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that can deal with both monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results are consistent across different language pairs, demonstrating the robustness of our proposal.
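
The abstract names cross-entropy selection as the baseline but does not give its formula. As a rough illustration only, the sketch below assumes the common cross-entropy difference formulation (in the style of Moore and Lewis): each out-of-domain sentence is scored by its cross-entropy under an in-domain language model minus its cross-entropy under an out-of-domain model, and the lowest-scoring sentences are kept. The unigram models, add-one smoothing, and the names `train_unigram`, `cross_entropy`, and `select` are simplifying assumptions, not the authors' setup.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Return (counts, total) for a whitespace-tokenised corpus."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())

def cross_entropy(sentence, model, vocab_size):
    """Per-token cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    tokens = sentence.split()
    if not tokens:
        return float("inf")
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)

def select(in_domain, out_domain, k):
    """Keep the k out-of-domain sentences with the lowest cross-entropy difference."""
    vocab = {tok for s in in_domain + out_domain for tok in s.split()}
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    lm_in = train_unigram(in_domain)
    lm_out = train_unigram(out_domain)
    ranked = sorted(
        out_domain,
        key=lambda s: cross_entropy(s, lm_in, v) - cross_entropy(s, lm_out, v),
    )
    return ranked[:k]

in_domain = ["the patient received a dose", "the dose was reduced"]
out_domain = [
    "the patient was given a dose",
    "stocks fell sharply today",
    "the market closed lower",
]
selected = select(in_domain, out_domain, 1)
```

A real system would normally use higher-order n-gram or neural language models; the unigram version only shows the ranking principle.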

Automatic Filtering of Bilingual Corpora for Statistical Machine Translation

Natural Language Processing and Information Systems - Lecture Notes in Computer Science ◽

10.1007/11428817_24 ◽

2005 ◽

pp. 263-274 ◽

Cited By ~ 10

Author(s):

Shahram Khadivi ◽

Hermann Ney

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Bilingual Corpora ◽

Automatic Filtering

Machine Translation Utilizing the Frequent-Item Set Concept

Sensors ◽

10.3390/s21041493 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1493

Author(s):

Hanan A. Hosni Mahmoud ◽

Hanan Abdullah Mengash

Keyword(s):

Data Structure ◽

Response Time ◽

Machine Translation ◽

Target Language ◽

Specific Language ◽

New Concepts ◽

Corpus Size ◽

Bilingual Corpora ◽

Ct System ◽

Arabic And English

In this paper, we introduce new concepts in the machine translation paradigm. We treat the corpus as a database of frequent word sets. A translation request triggers association rules joining phrases present in the source language and phrases present in the target language. Note that a sequential scan of the corpus for such phrases would increase the response time unpredictably. We therefore pre-process the bilingual corpus with a data structure called Corpus-Trie (CT), which renders a bilingual parallel corpus in a compact form representing frequent item sets. We also present algorithms that use the CT to respond to translation requests, and explore novel techniques in exhaustive experiments. Experiments were performed on specific language pairs, although the proposed method is not restricted to any specific language. Moreover, the proposed Corpus-Trie can be extended from bilingual corpora to accommodate multi-language corpora. Experiments indicated that the response time of a translation request is logarithmic in the number of distinct phrases in the original bilingual corpus (and thus in the Corpus-Trie size). In practical situations, 5–20% of the log of the number of nodes have to be visited. The experimental results indicate that the BLEU score of the proposed CT system increases with the number of phrases in the CT, for both English-Arabic and English-French translation. The proposed CT system was shown to produce better translations than both Omega-T and Apertium for corpus sizes exceeding 1,600,000 phrases for English-Arabic translation and 300,000 phrases for English-French translation.
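
The abstract does not specify the internal layout of the Corpus-Trie, so the following is only a minimal sketch of the general idea of indexing phrase pairs in a trie instead of scanning the corpus sequentially. It assumes a plain word-level trie keyed on source phrases, with target phrases stored at the nodes; the names `TrieNode`, `PhraseTrie`, and `longest_match` are hypothetical and not taken from the paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # next source word -> TrieNode
        self.translations = []  # target phrases for the phrase ending here

class PhraseTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, source_phrase, target_phrase):
        """Store one source/target phrase pair, sharing common prefixes."""
        node = self.root
        for word in source_phrase.split():
            node = node.children.setdefault(word, TrieNode())
        if target_phrase not in node.translations:
            node.translations.append(target_phrase)

    def longest_match(self, words, start=0):
        """Return (end_index, translations) for the longest stored phrase
        that begins at position `start` in `words`."""
        node = self.root
        best = (start, [])
        for i in range(start, len(words)):
            node = node.children.get(words[i])
            if node is None:
                break
            if node.translations:
                best = (i + 1, node.translations)
        return best

trie = PhraseTrie()
trie.insert("good", "bon")
trie.insert("good morning", "bonjour")
match = trie.longest_match("good morning everyone".split())
```

In this simplified version a lookup walks at most one node per word of the query phrase, so its cost does not grow with the number of sentences in the corpus.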

Building a Bilingual Corpus based on Hybrid Approach for Malayalam-English Machine Translation

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2013.1095 ◽

2013 ◽

pp. 219-224

Author(s):

Rajesh. K. S ◽

Veena A Kumar ◽

CH. Dayakar Reddy

Keyword(s):

Machine Translation ◽

Hybrid Approach ◽

Word Alignment ◽

Translation Research ◽

Parallel Corpora ◽

Parallel Corpus ◽

Word Level ◽

Alignment System ◽

Bilingual Corpora ◽

Active Research

Word alignment in bilingual corpora has been a very active research topic among machine translation research groups. In this paper, we describe an alignment system that aligns English-Malayalam texts at the word level in parallel sentences. Aligning translated segments with source segments is essential for building parallel corpora. Since word alignment research on Malayalam and English is still in its infancy, this is not a trivial task for Malayalam-English text. A parallel corpus is a collection of texts in two languages, one of which is the translation equivalent of the other. Thus, the main purpose of this system is to construct a word-aligned parallel corpus to be used in Malayalam-English machine translation. The proposed approach is a hybrid, combining corpus-based and dictionary lookup approaches. The corpus-based approach is based on the first three IBM models and the Expectation Maximization (EM) algorithm. For the dictionary lookup approach, the proposed system uses a bilingual Malayalam-English dictionary.
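
To make the corpus-based component more concrete, here is a small sketch of EM training for IBM Model 1 only, the simplest of the three IBM models mentioned. It is not the authors' system: it omits the NULL word, the later models, and the dictionary lookup step, and the function name `ibm_model1` and the toy German-English data are assumptions chosen for illustration.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) from (source_tokens, target_tokens) pairs."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: distribute each source word over the target words
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-estimate translation probabilities
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

pairs = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(pairs)
```

After a few iterations the probability mass concentrates on the pairs that co-occur most consistently, for example `t[("das", "the")]` grows at the expense of `t[("Haus", "the")]`.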

Learning Lessons from Bilingual Corpora: Benefits for Machine Translation

International Journal of Corpus Linguistics ◽

10.1075/ijcl.5.2.06str ◽

2000 ◽

Vol 5 (2) ◽

pp. 199-230 ◽

Cited By ~ 1

Author(s):

Oliver Streiter ◽

Leonid L. Iomdin

Keyword(s):

Machine Translation ◽

Subject Domain ◽

Rule Based ◽

Parallel Corpora ◽

Specific Subject ◽

Multiword Expressions ◽

Bilingual Corpora ◽

Subject Domains

The research described in this paper is rooted in efforts to combine the advantages of corpus-based and rule-based MT approaches in order to improve the performance of MT systems, most importantly the quality of translation. The authors review ongoing activities in the field and present a case study showing how translation knowledge can be drawn from parallel corpora and compiled into the lexicon of a rule-based MT system. These data are obtained with the help of three procedures: (1) identification of hitherto unknown one-word translations, (2) statistical rating of the known one-word translations, and (3) extraction of new translations of multiword expressions (MWEs), followed by compilation steps that create new rules for the MT engine. As a result, the lexicon is enriched with translation equivalents attested for different subject domains, which facilitates the tuning of the MT system to a specific subject domain and improves the quality and adequacy of translation.

Learning Curve with Machine Translation Based on Parallel, Bilingual Corpora

Studies in Big Data - Machine Intelligence and Big Data in Industry ◽

10.1007/978-3-319-30315-4_2 ◽

2016 ◽

pp. 11-21

Author(s):

Maciej Kowalski

Keyword(s):

Learning Curve ◽

Machine Translation ◽

Bilingual Corpora

Comprehensive Approach for Bilingual Machine Translation

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2017.1413 ◽

2017 ◽

pp. 126-129

Author(s):

Sharanbasappa Honnashetty ◽

Dr. M. Hanumanthappa

Keyword(s):

Machine Translation ◽

Language Processing ◽

Translation System ◽

Major Focus ◽

Complex Sentences ◽

Novel Approach ◽

Machine Translation System ◽

Bilingual Corpora ◽

Simple Sentences ◽

Processing Group

Machine translation has been a major focus of the Natural Language Processing (NLP) group since 1999; the group's principal aim is to build a machine translation system that automatically learns translation mappings from bilingual corpora. This paper explores a novel approach to phrase-based machine translation from English to Kannada and from Kannada to English. The source text is first analyzed. Simple sentences are then translated using the rules, while complex sentences are split into simple sentences before translation is performed.

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation automatically translates sentences from one language into another. This paper tests the effectiveness of a newer model of machine translation, factored machine translation. We compare an unfactored baseline system with the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The toolkit we use is MOSES, which is free to download and use. We found that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU in all cases. In terms of the number of words translated, however, all of the factored models outperform the unfactored model.

On (Not) Translating Lacan: Barbara Cassin's Sophistico-Analytical Performances

Paragraph ◽

10.3366/para.2020.0323 ◽

2020 ◽

Vol 43 (1) ◽

pp. 98-113

Author(s):

Michael Syrotinski

Keyword(s):

Machine Translation ◽

Reading And Writing ◽

The Relationship ◽

The Way

Barbara Cassin's Jacques the Sophist: Lacan, Logos, and Psychoanalysis, recently translated into English, constitutes an important rereading of Lacan, and a sustained commentary not only on his interpretation of Greek philosophers, notably the Sophists, but more broadly the relationship between psychoanalysis and sophistry. In her study, Cassin draws out the sophistic elements of Lacan's own language, or the way that Lacan ‘philosophistizes’, as she puts it. This article focuses on the relation between Cassin's text and her better-known Dictionary of Untranslatables, and aims to show how and why both ‘untranslatability’ and ‘performativity’ become keys to understanding what this book is not only saying, but also doing. It ends with a series of reflections on machine translation, and how the intersubjective dynamic as theorized by Lacan might open up the possibility of what is here termed a ‘translatorly’ mode of reading and writing.

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are few studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually related to the source language input.
