Training Human Translators as Opposed to Programming Machine Translation Systems: A Performative Model

This paper combines the communicative model of translation with performative linguistics to arrive at a translation model that is meant to proactively shape the attitudes of future translators. Central to this model is the claim that the translator, like any communicator, has a communicative intent that gets expressed in the target text. This is contrasted with machine translation, which is concerned with finding equivalents to translation units without actually having anything to say in the target language. The paper concludes by indicating a way of building a translation evaluation system on the proposed model.

The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics ◽

10.1162/0891201042544884 ◽

2004 ◽

Vol 30 (4) ◽

pp. 417-449 ◽

Cited By ~ 212

Author(s):

Franz Josef Och ◽

Hermann Ney

Keyword(s):

Machine Translation ◽

Word Order ◽

Search Algorithm ◽

Statistical Machine Translation ◽

Target Language ◽

Linear Modeling ◽

Translation Model ◽

Log Linear ◽

Translation Systems ◽

English Canadian

A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.
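
The relation between the log-linear model and the source-channel approach mentioned in this abstract can be illustrated with a minimal sketch. It assumes two hypothetical feature functions, a language-model log-probability ("lm") and a translation-model log-probability ("tm"), and made-up probabilities for two toy candidates; none of these names or numbers come from the paper, whose system uses more feature functions and a dedicated search algorithm.

```python
import math

def log_linear_score(features, weights):
    """Weighted sum of feature values: sum over m of lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Return the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))

# Hypothetical candidates for one source sentence; the probabilities are invented.
candidates = [
    {"text": "the house is small",
     "features": {"lm": math.log(0.020), "tm": math.log(0.30)}},
    {"text": "small is the house",
     "features": {"lm": math.log(0.002), "tm": math.log(0.40)}},
]

# With both weights equal to 1 the score is log Pr(e) + log Pr(f|e),
# which is the source-channel decision rule as a special case.
source_channel = {"lm": 1.0, "tm": 1.0}
print(best_candidate(candidates, source_channel)["text"])

# Other weights change the trade-off between the two features.
reweighted = {"lm": 0.1, "tm": 3.0}
print(best_candidate(candidates, reweighted)["text"])
```

Because the score is only a weighted sum, a further feature can be added as one more dictionary entry with its own weight, which is one way to read the abstract's remark that the log-linear formulation is easier to extend.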

IntroVNMT: An Introspective Model for Variational Neural Machine Translation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6411 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8830-8837

Author(s):

Xin Sheng ◽

Linli Xu ◽

Junliang Guo ◽

Jingchang Liu ◽

Ruoyu Zhao ◽

...

Keyword(s):

Machine Translation ◽

Latent Variables ◽

Image Synthesis ◽

Target Language ◽

Generative Adversarial Network ◽

Neural Machine Translation ◽

Adversarial Network ◽

Proposed Model ◽

Model Training ◽

High Level

We propose a novel introspective model for variational neural machine translation (IntroVNMT) in this paper, inspired by the recent successful application of introspective variational autoencoder (IntroVAE) in high quality image synthesis. Different from the vanilla variational NMT model, IntroVNMT is capable of improving itself introspectively by evaluating the quality of the generated target sentences according to the high-level latent variables of the real and generated target sentences. As a consequence of introspective training, the proposed model is able to discriminate between the generated and real sentences of the target language via the latent variables generated by the encoder of the model. In this way, IntroVNMT is able to generate more realistic target sentences in practice. In the meantime, IntroVNMT inherits the advantages of the variational autoencoders (VAEs), and the model training process is more stable than the generative adversarial network (GAN) based models. Experimental results on different translation tasks demonstrate that the proposed model can achieve significant improvements over the vanilla variational NMT model.

Modeling Past and Future for Neural Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00011 ◽

2018 ◽

Vol 6 ◽

pp. 145-157 ◽

Cited By ~ 2

Author(s):

Zaixiang Zheng ◽

Hao Zhou ◽

Shujian Huang ◽

Lili Mou ◽

Xinyu Dai ◽

...

Keyword(s):

Machine Translation ◽

Alignment Error ◽

Source Information ◽

Neural Machine Translation ◽

Attention Model ◽

Translation Quality ◽

The Past ◽

Proposed Model ◽

Coverage Model ◽

Translation Systems

Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts: translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, which provides Neural Machine Translation (NMT) systems with the knowledge of translated and untranslated contents. Experimental results show that the proposed approach significantly improves the performance in Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in terms of both the translation quality and the alignment error rate.
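
The idea of splitting source information into translated and untranslated parts can be sketched with simple bookkeeping. This is a heavily simplified illustration and not the paper's model: it replaces the two learned recurrent layers with plain addition and subtraction of attention contexts, and the function name, vectors, and attention weights are all hypothetical.

```python
def track_past_future(source_vectors, attention_steps):
    """Accumulate attended source content into Past and remove it from Future.

    source_vectors: list of equal-length lists, one per source word.
    attention_steps: list of attention weight lists, one per decoding step.
    Returns a list of (past, future) pairs, one per decoding step.
    """
    dim = len(source_vectors[0])
    # Future starts as a summary (here: the sum) of the whole source sentence.
    future = [sum(v[d] for v in source_vectors) for d in range(dim)]
    # Past starts empty because nothing has been translated yet.
    past = [0.0] * dim
    history = []
    for weights in attention_steps:
        # Attention context: weighted sum of the source vectors at this step.
        context = [sum(w * v[d] for w, v in zip(weights, source_vectors))
                   for d in range(dim)]
        past = [p + c for p, c in zip(past, context)]
        future = [f - c for f, c in zip(future, context)]
        history.append((past, future))
    return history

# Toy input: two source words, each attended to fully at one decoding step.
source_vectors = [[1.0, 0.0], [0.0, 2.0]]
attention_steps = [[1.0, 0.0], [0.0, 1.0]]
for past, future in track_past_future(source_vectors, attention_steps):
    print(past, future)
```

In this toy version Past plus Future always equals the source summary, so Future shrinks toward zero as decoding covers the sentence. In the paper both quantities are modeled by recurrent layers and fed to the attention model and decoder states, which this sketch does not attempt.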

Machine Translation System Using Deep Learning for English to Urdu

Computational Intelligence and Neuroscience ◽

10.1155/2022/7873012 ◽

2022 ◽

Vol 2022 ◽

pp. 1-11

Author(s):

Syed Abdul Basit Andrabi ◽

Abdul Wahid

Keyword(s):

Deep Learning ◽

Machine Translation ◽

Communication Technology ◽

Target Language ◽

Translation System ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Proposed Model ◽

Learning Technique ◽

Machine Translation System

Machine translation has been an active field of research for the last several decades. Its main aim is to remove the language barrier. Early research in this field started with direct word-to-word replacement of the source language by the target language. Later, with advances in computer and communication technology, there was a paradigm shift to data-driven models such as statistical and neural machine translation approaches. In this paper, we use a neural network-based deep learning technique for English-to-Urdu translation. A parallel corpus of around 30,923 sentences is used. The corpus contains sentences from an English-Urdu parallel corpus, news, and sentences that are frequently used in day-to-day life. It contains 542,810 English tokens and 540,924 Urdu tokens, and the proposed system is trained and tested using a 70:30 split. To evaluate the efficiency of the proposed system, several automatic evaluation metrics are used, and the model output is also compared with the output from Google Translator. The proposed model has an average BLEU score of 45.83.
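
For readers unfamiliar with the BLEU metric reported here, the following is a minimal sentence-level sketch: the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty, with a single reference and whitespace tokenization. The function name and example sentences are hypothetical, and this is not the evaluation setup of the paper; published scores are normally computed at corpus level with an established tool.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu_sketch(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU on a 0-100 scale."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(sentence_bleu_sketch("the cat sat on the mat", reference))
print(round(sentence_bleu_sketch("the cat sat on a mat", reference), 2))
```

A single substituted word already lowers the score considerably because it breaks every higher-order n-gram that spans it, which is why sentence-level BLEU is usually smoothed or aggregated over a whole test set.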

Source-side Reordering to Improve Machine Translation between Languages with Distinct Word Orders

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3448252 ◽

2021 ◽

Vol 20 (4) ◽

pp. 1-18

Author(s):

Karunesh Kumar Arora ◽

Shyam Sunder Agrawal

Keyword(s):

Machine Translation ◽

Language Model ◽

Target Language ◽

Long Distance ◽

Parts Of Speech ◽

Source Sentence ◽

The Subject ◽

Statistical Systems ◽

Translation Systems ◽

The Impact

English and Hindi have significantly different word orders. English follows the subject-verb-object (SVO) order, while Hindi primarily follows the subject-object-verb (SOV) order. This difference poses challenges to modeling this pair of languages for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and reordering models. Reordering in such systems is generally achieved during decoding by transposing words within a defined window. These systems can handle local reorderings, and while some phrase-level reorderings are carried out during the formation of phrases, they are weak in learning long-distance reorderings. To overcome this weakness, researchers have used reordering as a step in pre-processing to render the reordered source sentence closer to the target language in terms of word order. Such approaches focus on using parts-of-speech (POS) tag sequences and reordering the syntax tree by using grammatical rules, or through head finalization. This study shows that mere head finalization is not sufficient for the reordering of sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings with the original and the head-finalized representations. The impact of the reordering on the quality of translation is measured through the BLEU score in phrase-based statistical systems and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs.
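
Source-side reordering as a pre-processing step can be illustrated with a toy rule that moves the verb of a flat English clause to the end, approximating SOV order. The rule, the function name, and the coarse POS tags are assumptions made for illustration; the study works with syntax trees and multiple grammatical constructs and reports that head finalization alone is not sufficient, so this sketch is far simpler than anything evaluated there.

```python
def verb_final(tagged_tokens):
    """Move verbs to the end of a single clause, keeping other tokens in order.

    tagged_tokens: list of (word, pos_tag) pairs with hypothetical coarse tags.
    """
    non_verbs = [word for word, tag in tagged_tokens if tag != "VERB"]
    verbs = [word for word, tag in tagged_tokens if tag == "VERB"]
    return non_verbs + verbs

# English SVO clause, reordered toward the SOV order of the target language.
clause = [("Ram", "NOUN"), ("eats", "VERB"), ("mangoes", "NOUN")]
print(" ".join(verb_final(clause)))
```

The reordered source sentence would then be passed to the translation system in place of the original, so that the decoder needs fewer long-distance reorderings.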

A classification approach for detecting cross-lingual biomedical term translations

Natural Language Engineering ◽

10.1017/s1351324915000431 ◽

2015 ◽

Vol 23 (1) ◽

pp. 31-51 ◽

Cited By ~ 3

Author(s):

H. HAKAMI ◽

D. BOLLEGALA

Keyword(s):

Machine Translation ◽

Feature Space ◽

Target Language ◽

Average Precision ◽

Common Features ◽

Target Languages ◽

Cross Lingual ◽

Translation Accuracy ◽

Translation Systems ◽

Technical Terms

Finding translations for technical terms is an important problem in machine translation. In particular, in highly specialized domains such as biology or medicine, it is difficult to find bilingual experts to annotate sufficient cross-lingual texts in order to train machine translation systems. Moreover, new terms are constantly being generated in the biomedical community, which makes it difficult to keep the translation dictionaries up to date for all language pairs of interest. Given a biomedical term in one language (source language), we propose a method for detecting its translations in a different language (target language). Specifically, we train a binary classifier to determine whether two biomedical terms written in two languages are translations. Training such a classifier is often complicated due to the lack of common features between the source and target languages. We propose several feature space concatenation methods to successfully overcome this problem. Moreover, we study the effectiveness of contextual and character n-gram features for detecting term translations. Experiments conducted using a standard dataset for biomedical term translation show that the proposed method outperforms several competitive baseline methods in terms of mean average precision and top-k translation accuracy.
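
Character n-gram features and one simple form of feature space concatenation can be sketched as follows. The boundary markers, the "src:" and "tgt:" prefixes, and the example term pair are assumptions for illustration; the paper proposes several concatenation methods, and this sketch is not claimed to be any one of them.

```python
def char_ngrams(term, n=3):
    """Character n-grams of a lowercased term, with ^ and $ as boundary markers."""
    padded = f"^{term.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def pair_features(source_term, target_term, n=3):
    """Concatenate both feature spaces by prefixing each n-gram with its side."""
    features = {}
    for side, term in (("src", source_term), ("tgt", target_term)):
        for gram in char_ngrams(term, n):
            key = f"{side}:{gram}"
            features[key] = features.get(key, 0) + 1
    return features

print(char_ngrams("gene"))
print(pair_features("gene", "gène"))
```

The resulting dictionary for a term pair could be vectorized and passed to any binary classifier that predicts whether the two terms are translations of each other. Prefixing keeps source and target n-grams in separate dimensions, so the two sides share no features unless a further mapping between them is introduced.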

Sublemma-Based Neural Machine Translation

Complexity ◽

10.1155/2021/5935958 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Thien Nguyen ◽

Huu Nguyen ◽

Phuoc Tran

Keyword(s):

Machine Translation ◽

Quality Data ◽

Human Judgment ◽

Linguistic Features ◽

Neural Machine Translation ◽

Low Resource ◽

Part Of Speech ◽

Proposed Model ◽

Translation Systems ◽

Ted Talks

Powerful deep learning approaches free us from feature engineering in many artificial intelligence tasks. They are able to extract efficient representations from the input data if the data are large enough. Unfortunately, it is not always possible to collect large, high-quality data. For tasks in low-resource contexts, such as Russian ⟶ Vietnamese machine translation, insights into the data can compensate for their humble size. In this study of modelling Russian ⟶ Vietnamese translation, we leverage the input Russian words by decomposing them into not only features but also subfeatures. First, we break down a Russian word into a set of linguistic features: part-of-speech, morphology, dependency labels, and lemma. Second, the lemma feature is further divided into subfeatures labelled with tags corresponding to their positions in the lemma. Consistent with the source side, Vietnamese target sentences are represented as sequences of subtokens. Sublemma-based neural machine translation proves itself in our experiments on Russian-Vietnamese bilingual data collected from TED talks. Experimental results reveal that the proposed model outperforms the best available Russian ⟶ Vietnamese model by 0.97 BLEU. In addition, the automatic evaluation of the experimental results is verified by human judgment. The proposed sublemma-based model provides an alternative to existing models when we build translation systems from an inflectionally rich language, such as Russian, Czech, or Bulgarian, in low-resource contexts.

Design and Testing of Automatic Machine Translation System Based on Chinese-English Phrase Translation

Mobile Information Systems ◽

10.1155/2021/3539155 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Jing Ning ◽

Haidong Ban

Keyword(s):

Machine Translation ◽

Language Processing ◽

Evaluation System ◽

Large Scale ◽

Automatic Machine ◽

Translation System ◽

Automatic Translation ◽

Translation Methods ◽

Translation Systems ◽

Design And Testing

With the development of linguistics and improvements in computer performance, the quality of machine translation keeps improving, and the technology is widely used. The automatic expression translation method for Chinese-English machine translation takes short sentences as the basic translation unit and makes full use of their order. Compared with word-based statistical machine translation methods, its results are greatly improved. This article studies the design of phrase-based automatic machine translation systems: it introduces machine translation methods and Chinese-English phrase translation, explores the design and testing of automatic machine translation systems based on the combination of Chinese-English phrase translation, and explains the role of automatic machine translation in promoting the development of translation. By combining machine translation experiments with system design methods, the article studies the design and testing of such systems in order to cultivate people's understanding of language, knowledge, and intelligence and then help solve other problems. Language processing issues promote the development of corpus linguistics. The experimental results show that when the Chinese-English phrase translation probability table is changed from 82% to 51%, the BLEU translation evaluation system for the combination of Chinese-English phrases is improved. Automatic machine translation saves translation time and effort, and its short development cycle and easy processing of large-scale corpora are clear advantages.

Design of English Automatic Translation System Based on Machine Intelligent Translation and Secure Internet of Things

Mobile Information Systems ◽

10.1155/2021/8670739 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Haidong Ban ◽

Jing Ning

Keyword(s):

Neural Network ◽

Machine Translation ◽

Rapid Development ◽

Model Performance ◽

Internet Technology ◽

Target Language ◽

Translation System ◽

Improvement Rate ◽

Translation Model ◽

Automatic Translation

With the rapid development of Internet technology and economic globalization, international exchanges in various fields have become increasingly active, and the need for communication between languages has become increasingly clear. As an effective tool, automatic translation can perform equivalent translation between different languages while preserving the original semantics, which is very important in practice. This paper focuses on a Chinese-English machine translation model based on deep neural networks. We use the end-to-end encoder-decoder framework to create a neural machine translation model: the machine automatically learns its function, the data are converted into distributed word vectors, and the neural network directly performs the mapping between the source language and the target language. Experiments show that adding part of the voice information improves the performance of the translation model. As the number of network layers increases from two to four, the improvement ratios of each model are 5.90%, 6.1%, 6.0%, and 7.0%, respectively. Among them, the model with an independent recurrent neural network as the network structure has the largest improvement rate, so the system has high availability.

Research on Uyghur-Chinese Neural Machine Translation Based on the Transformer at Multistrategy Segmentation Granularity

Mobile Information Systems ◽

10.1155/2021/5744248 ◽

2021 ◽

Vol 2021 ◽

pp. 1-7

Author(s):

Zhiwang Xu ◽

Huibin Qin ◽

Yongzhu Hua

Keyword(s):

Neural Networks ◽

Machine Translation ◽

Semantic Features ◽

Training Method ◽

Chinese Translation ◽

Neural Machine Translation ◽

Translation Model ◽

Parallel Corpus ◽

Model Based ◽

Translation Systems

In recent years, machine translation based on neural networks has become the mainstream method in the field of machine translation, but low-resource translation still faces the challenges of insufficient parallel corpora and sparse data. Existing machine translation models are usually trained on datasets segmented at word granularity. However, different segmentation granularities carry different grammatical and semantic features and information, and considering only word granularity restricts the efficient training of neural machine translation systems. To address the data sparseness caused by the lack of Uyghur-Chinese parallel corpora and by complex Uyghur morphology, this paper proposes a multistrategy segmentation-granularity training method covering syllables, marked syllables, words, and syllable-word fusion. To address the disadvantages of traditional recurrent neural networks and convolutional neural networks, it builds a Transformer Uyghur-Chinese neural machine translation model based entirely on the multihead self-attention mechanism. Results on the CCMT2019 Uyghur-Chinese bilingual datasets show that the multiple-granularity training method is significantly better than translation systems trained on the other segmentation granularities, while the Transformer model obtains a higher BLEU value than the Uyghur-Chinese translation model based on Self-Attention-RNN.
