source sentence
Recently Published Documents

TOTAL DOCUMENTS: 30 (five years: 20)
H-INDEX: 4 (five years: 2)

Author(s): Jaehun Shin, Wonkee Lee, Byung-Hyun Go, Baikjin Jung, Youngkil Kim, et al.

Automatic post-editing (APE) is the study of correcting translation errors in the output of an unknown machine translation (MT) system; it has been considered a method of improving translation quality without modifying conventional MT systems. Recently, several Transformer variants that take both the MT output and its corresponding source sentence as inputs have been proposed for APE, and models that introduce an additional attention layer into the encoder to jointly encode the MT output with its source sentence ranked highly in the WMT19 APE shared task. We examine the effectiveness of this joint-encoding strategy in a controlled environment and compare four types of decoder-side multi-source attention strategies introduced in previous APE models. The experimental results indicate that the joint-encoding strategy is effective and that taking the final encoded representation of the source sentence is more appropriate than taking an intermediate representation from within the same encoder stack. Furthermore, among the multi-source attention strategies combined with joint encoding, the strategy that applies attention to the concatenated input representation and the strategy that sums the individual attentions to each input both improve the quality of APE results over using joint encoding alone.
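To make the two winning decoder-side strategies concrete, here is a minimal sketch (not the authors' code) of concatenated-input attention versus summed per-input attention, written with PyTorch; the module names, dimensions, and tensor shapes are illustrative assumptions.

```python
# Two decoder-side multi-source attention strategies over an encoded MT
# output (enc_mt) and an encoded source sentence (enc_src). Illustrative only.
import torch
import torch.nn as nn

D_MODEL, N_HEADS = 512, 8

class ConcatAttention(nn.Module):
    """Attend over the concatenation of the MT-output and source encodings."""
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)

    def forward(self, dec_state, enc_mt, enc_src):
        # concatenate the two encoder memories along the time axis
        memory = torch.cat([enc_mt, enc_src], dim=1)
        out, _ = self.attn(dec_state, memory, memory)
        return out

class SummedAttention(nn.Module):
    """Attend to each input separately and sum the two context vectors."""
    def __init__(self):
        super().__init__()
        self.attn_mt = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)
        self.attn_src = nn.MultiheadAttention(D_MODEL, N_HEADS, batch_first=True)

    def forward(self, dec_state, enc_mt, enc_src):
        ctx_mt, _ = self.attn_mt(dec_state, enc_mt, enc_mt)
        ctx_src, _ = self.attn_src(dec_state, enc_src, enc_src)
        return ctx_mt + ctx_src
```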


Author(s): Carles Tebé, María Teresa Cabré

Computer-aided translation (CAT) systems based on translation memories (TM) are a widely adopted technology that uses database and code-protection features to improve the quality, efficiency, and consistency of the human translation process. These systems basically consist of a textual database in which each source sentence of a translation is stored together with the target sentence (the pair is called a translation memory "unit"). New and changed translation proposals are then stored in the database for future use. This textual database – the kernel of the system – is combined with a terminological database (TDB), which translators use to independently store terminological equivalences or translation units of particular value. In this paper, the authors outline a first draft of a methodology for preparing a bilingual terminology from – and within – TM applications. The resulting bilingual corpus is called the translator's 'terminological memory'.
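To make the "unit" concrete, below is a minimal sketch, under assumed names and types, of the data structures the abstract describes: a textual database of source-target units plus a separate terminological database. Real CAT tools also perform fuzzy matching; an exact-match lookup keeps the sketch short.

```python
# A translation memory as a list of source/target "units" plus a term
# database mapping source terms to their stored equivalents. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class TMUnit:
    source: str   # source-language sentence
    target: str   # its stored translation

@dataclass
class TranslationMemory:
    units: list[TMUnit] = field(default_factory=list)
    term_db: dict[str, str] = field(default_factory=dict)  # term -> equivalent

    def add(self, source: str, target: str) -> None:
        """Store a new or changed translation proposal for future reuse."""
        self.units.append(TMUnit(source, target))

    def lookup(self, sentence: str) -> str | None:
        """Return the stored target for an exact source match, if any."""
        for unit in self.units:
            if unit.source == sentence:
                return unit.target
        return None
```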


Author(s): May Kyi Nyein, Khin Mar Soe

Word reordering remains one of the challenging problems for machine translation when translating between language pairs with different word orders, e.g., English and Myanmar. Without reordering between these languages, a source sentence may be translated directly with a similar word order, and the translation may not be meaningful. Myanmar is a subject-object-verb (SOV) language, so effective reordering is essential for translation. In this paper, we applied a pre-ordering approach using recurrent neural networks to pre-order the words of the source sentence into the target language's word order. This neural pre-ordering model is automatically derived from parallel word-aligned data, with syntactic and lexical features based on dependency parse trees of the source sentences. It can generate arbitrary, possibly non-local permutations of the sentence and can be integrated into English-Myanmar machine translation. We exploited the model to reorder English sentences into Myanmar-like word order as a preprocessing stage for machine translation, obtaining quality improvements comparable to a baseline rule-based pre-ordering approach on the Asian Language Treebank (ALT) corpus.
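As a hedged sketch of how reference reorderings can be derived from parallel word-aligned data, as the abstract describes: sort the source words by the positions of the target words they align to. The alignment format and tie-breaking rule here are assumptions, not the paper's exact procedure.

```python
# Derive a target-like permutation of the source words from word alignments.
def derive_permutation(src_tokens, alignments):
    """alignments: list of (src_idx, tgt_idx) pairs from a word aligner."""
    # take the first target position each source word aligns to;
    # unaligned words keep their original position as a tie-breaker
    first_tgt = {}
    for s, t in alignments:
        first_tgt.setdefault(s, t)
    order = sorted(range(len(src_tokens)),
                   key=lambda i: (first_tgt.get(i, i), i))
    return [src_tokens[i] for i in order]

# e.g. English "he ate rice" with SOV alignments (0,0), (1,2), (2,1)
# yields the Myanmar-like order ['he', 'rice', 'ate'].
print(derive_permutation(["he", "ate", "rice"], [(0, 0), (1, 2), (2, 1)]))
```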


Author(s): Karunesh Kumar Arora, Shyam Sunder Agrawal

English and Hindi have significantly different word orders. English follows the subject-verb-object (SVO) order, while Hindi primarily follows the subject-object-verb (SOV) order. This difference poses challenges to modeling this pair of languages for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and reordering models. Reordering in such systems is generally achieved during decoding by transposing words within a defined window. These systems can handle local reorderings, and while some phrase-level reorderings are carried out during the formation of phrases, they are weak in learning long-distance reorderings. To overcome this weakness, researchers have used reordering as a pre-processing step to render the source sentence closer to the target language in terms of word order. Such approaches focus on using part-of-speech (POS) tag sequences and on reordering the syntax tree by means of grammatical rules or through head finalization. This study shows that mere head finalization is not sufficient for the reordering of sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings with the original and the head-finalized representations. The impact of the reordering on the quality of translation is measured through the BLEU score in phrase-based statistical systems and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs.
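For reference, head finalization, the baseline the study argues is insufficient on its own, emits each head after all of its dependents, pushing English SVO toward Hindi-like SOV. Below is a minimal sketch over a toy dependency tree; the tree encoding is an illustrative assumption.

```python
# Head finalization over a toy dependency tree: children first, head last.
def head_finalize(tree):
    """tree: (word, [child_subtrees]); emit dependents first, head last."""
    word, children = tree
    out = []
    for child in children:
        out.extend(head_finalize(child))
    out.append(word)
    return out

# "John ate an apple": root 'ate' with dependents 'John' and 'apple'
parse = ("ate", [("John", []), ("apple", [("an", [])])])
print(head_finalize(parse))  # ['John', 'an', 'apple', 'ate'] -- SOV-like
```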


Author(s): Jānis Veckrācis

The translation of legal documents – not a new field in translation practice or theoretical discourse – gained a new dimension for translators' work in Latvia when, after the restoration of independence, the country was reintegrated into international processes and organizations. Consequently, developing competence in legal text translation has also become an important task in study programs related to the translation of LSP texts. Against this background, the paper addresses some of the issues of understanding and interpreting legislation in the translation situation, with a particular focus on working with the functions and implications of sentence syntax. This part of the work gives the translator not only successful grammatical solutions in target-language sentences but, above all, a prerequisite for understanding the meaning of the source text. For the purposes of the study, the relevant aspects are briefly outlined in a theoretical context, focusing on the specific features of legal texts and the competence-related requirements for translators; the study also includes an analysis of examples based on both published translations of legislation and the typical problems encountered in student translations. The study leads to several conclusions. Accuracy (also with regard to interpretation), an element of the general concepts of equivalence/adequacy, stands out as a specific aspect and criterion of legal text translation quality; it is necessary to ensure that the meaning of terms is neither broadened nor narrowed and that the applicability or explicit/implicit attitude is not altered – translations of a number of units and elements tend to be almost literal. The practice of translating legal texts generally requires that target texts be rendered as consistently as possible, which to a large extent implies an almost literal relationship with the source text; any changes need explicit justification. A specific aspect of translators' competence is the examination undertaken during the pre-translation phase to determine the applicability of the relevant legal provisions and to select the most appropriate sources of information. An important prerequisite for a quality translation is understanding the essence of the source sentence.


Informatics, 2020, Vol 7 (3), pp. 32
Author(s): Rebecca Webster, Margot Fonteyne, Arda Tezcan, Lieve Macken, Joke Daems

Due to the growing success of neural machine translation (NMT), many have started to question its applicability within the field of literary translation. To assess the possibilities of NMT, we studied the output of Google's neural machine translation system (GNMT) and DeepL when applied to four classic novels translated from English into Dutch. The quality of the NMT systems is discussed on the basis of manual annotations, and various metrics are employed to gain insight into lexical richness, local cohesion, and syntactic and stylistic differences. Firstly, we discovered that a large proportion of the translated sentences contained errors. We also observed a lower level of lexical richness and local cohesion in the NMT output compared to the human translations. In addition, NMT output is more likely to follow the syntactic structure of the source sentence, whereas human translations can differ. Lastly, the human translations deviate from the machine translations in style.
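Lexical richness, one of the dimensions compared above, is often approximated by a type-token ratio; the study's exact metrics may differ, so this is only an illustrative sketch.

```python
# Type-token ratio: unique words over total words. A lower ratio indicates a
# less varied vocabulary, the pattern observed for NMT output above.
def type_token_ratio(tokens: list[str]) -> float:
    return len(set(tokens)) / len(tokens) if tokens else 0.0

human = "the old man and the weathered salt-stained sea".split()
nmt = "the old man and the old sea".split()
print(type_token_ratio(human))  # 0.875: richer vocabulary
print(type_token_ratio(nmt))    # ~0.714: repeated words
```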


Author(s): Guanhua Chen, Yun Chen, Yong Wang, Victor O.K. Li

Leveraging lexical constraints is important in domain-specific machine translation and interactive machine translation. Previous studies mainly focus on extending the beam search algorithm or on augmenting the training corpus by replacing source phrases with their corresponding target translations. These methods either suffer from heavy computation costs during inference or depend on the quality of a bilingual dictionary pre-specified by the user or constructed with statistical machine translation. In response to these problems, we present a conceptually simple and empirically effective data augmentation approach for lexically constrained neural machine translation. Specifically, we construct constraint-aware training data by first randomly sampling phrases from the reference as constraints and then packing them into the source sentence with a separation symbol. Extensive experiments on several language pairs demonstrate that our approach achieves superior translation results over existing systems, improving the translation of constrained sentences without hurting the unconstrained ones.
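A minimal sketch of the described augmentation follows; the <sep> token, phrase lengths, and sampling scheme are assumptions rather than the paper's exact settings.

```python
# Sample reference phrases as constraints and pack them into the source
# sentence behind a separator, yielding constraint-aware training data.
import random

def make_constrained_example(src_tokens, ref_tokens, max_phrases=2, max_len=3):
    augmented = list(src_tokens)
    for _ in range(random.randint(1, max_phrases)):
        n = random.randint(1, min(max_len, len(ref_tokens)))  # phrase length
        start = random.randrange(len(ref_tokens) - n + 1)     # phrase start
        augmented += ["<sep>"] + ref_tokens[start:start + n]
    return augmented

src = "wir brauchen eine neue Strategie".split()
ref = "we need a new strategy".split()
print(make_constrained_example(src, ref))
# e.g. ['wir', 'brauchen', 'eine', 'neue', 'Strategie', '<sep>', 'new', 'strategy']
```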


2020, Vol 34 (05), pp. 7855-7862
Author(s): Yinuo Guo, Tao Ge, Furu Wei

Sentence Split and Rephrase aims to break down a complex sentence into several simple sentences while preserving its meaning. Previous studies tend to address the issue by seq2seq learning from parallel sentence pairs, taking a complex sentence as input and sequentially generating a series of simple sentences. However, conventional seq2seq learning has two limitations for this task: (1) it does not take into account the facts stated in the long sentence, so the generated simple sentences may miss or inaccurately state those facts; (2) the order variance of the simple sentences to be generated may confuse the seq2seq model during training, because the simple sentences derived from the long source sentence could be in any order. To overcome these challenges, we first propose Fact-aware Sentence Encoding, which enables the model to learn facts from the long sentence and thus improves the precision of sentence splitting; we then introduce Permutation Invariant Training to alleviate the effects of order variance in seq2seq learning for this task. Experiments on the WebSplit-v1.0 benchmark dataset show that our approaches substantially improve performance over previous seq2seq learning approaches. Moreover, an extrinsic evaluation on oie-benchmark verifies the effectiveness of our approaches: splitting long sentences with our state-of-the-art model as a preprocessing step improves OpenIE performance.
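The Permutation Invariant Training idea can be sketched as taking the minimum seq2seq loss over all orderings of the reference simple sentences, so the model is not penalized for producing a valid split in a different order. Here `seq_loss` and the joining convention are placeholders; the factorial enumeration is feasible only because a split contains few sentences.

```python
# Permutation Invariant Training: score every ordering of the reference
# simple sentences and train against the best-matching one.
from itertools import permutations

def pit_loss(model, complex_sent, simple_sents, seq_loss):
    """seq_loss(model, src, tgt) -> scalar loss for one candidate ordering."""
    return min(
        seq_loss(model, complex_sent, " ".join(order))
        for order in permutations(simple_sents)
    )
```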


2020, Vol 34 (05), pp. 9386-9393
Author(s): Jian Yang, Shuming Ma, Dongdong Zhang, ShuangZhi Wu, Zhoujun Li, et al.

Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt a translation language model to predict masked words given the concatenation of the source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method called Alternating Language Modeling (ALM). Instead of simple concatenation, it code-switches sentences of different languages, aiming to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with their target translations to create code-switched sentences. We then use these code-switched data to train the ALM model to predict words of different languages. We evaluate our pre-trained ALM on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM can outperform previous pre-training methods on three benchmarks.
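A minimal sketch of building one code-switched training sentence as the abstract describes: random source phrases are replaced by their target translations. The phrase-table format, maximum phrase length, and switch rate are assumptions.

```python
# Replace randomly chosen source phrases with target translations drawn
# from a phrase table to create a code-switched sentence.
import random

def code_switch(src_tokens, phrase_table, p=0.3):
    """phrase_table: dict mapping a source phrase (tuple) to its translation."""
    out, i = [], 0
    while i < len(src_tokens):
        # greedily try the longest phrase starting at i (up to 3 tokens)
        for n in (3, 2, 1):
            phrase = tuple(src_tokens[i:i + n])
            if phrase in phrase_table and random.random() < p:
                out.extend(phrase_table[phrase].split())
                i += n
                break
        else:
            out.append(src_tokens[i])
            i += 1
    return out

table = {("good", "morning"): "bonjour", ("friend",): "ami"}
print(code_switch("good morning my friend".split(), table, p=1.0))
# ['bonjour', 'my', 'ami']
```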


2020, Vol 34 (05), pp. 8311-8318
Author(s): Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, et al.

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow the standard encoder-decoder framework, in which the source sentence representation is produced by an encoder with a self-attention mechanism. Though the Transformer-based encoder may effectively capture general information in its resulting source sentence representation, it does not specifically focus on the backbone information, i.e., the gist of the sentence. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression objective is used to learn the backbone information of a sentence. We propose three ways, namely backbone source-side fusion, target-side fusion, and both-side fusion, to integrate the compressed sentence into NMT. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
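One way to picture source-side fusion is to encode the compressed "backbone" sentence separately and merge it into the full source representation. The gated cross-attention below is an assumption for illustration, not the paper's exact mechanism, which also includes target-side and both-side variants.

```python
# Fuse a separately encoded backbone sentence into the source representation
# via cross-attention and a learned elementwise gate. Illustrative only.
import torch
import torch.nn as nn

class BackboneFusion(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        # let each source position attend over the compressed sentence
        self.cross = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, enc_src, enc_backbone):
        ctx, _ = self.cross(enc_src, enc_backbone, enc_backbone)
        g = torch.sigmoid(self.gate(torch.cat([enc_src, ctx], dim=-1)))
        return g * enc_src + (1 - g) * ctx  # fused source representation
```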

