word alignment
Recently Published Documents


TOTAL DOCUMENTS

300
(FIVE YEARS 54)

H-INDEX

16
(FIVE YEARS 1)

2021 ◽  
Author(s):  
Le Hoai Bao ◽  
Trinh Vu Minh Hung ◽  
Hoang Khuê ◽  
Le Thanh Tung

Author(s):  
Lieve Macken ◽  
Els Lefever

In this paper, we describe the current state of the art in Statistical Machine Translation (SMT) and reflect on how SMT handles meaning. Statistical Machine Translation is a corpus-based approach to MT: it derives the knowledge required to generate new translations from corpora. General-purpose SMT systems do not use any formal semantic representation. Instead, they directly extract translationally equivalent words or word sequences – expressions with the same meaning – from bilingual parallel corpora. All statistical translation models are based on the idea of word alignment, i.e., the automatic linking of corresponding words in parallel texts. The first-generation SMT systems were word-based. From a linguistic point of view, the major problem with word-based systems is that the meaning of a word is often ambiguous and is determined by its context. Current state-of-the-art SMT systems try to capture local contextual dependencies by using phrases instead of words as units of translation. To solve more complex ambiguity problems (where a broader text scope or even domain information is needed), a Word Sense Disambiguation (WSD) module is integrated into the Machine Translation environment.
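The word alignment idea underlying all statistical translation models can be illustrated with IBM Model 1, the simplest word-based model. The following sketch is not from the paper; the toy corpus and iteration count are illustrative. It estimates word translation probabilities t(f|e) with expectation-maximization:

```python
from collections import defaultdict

def ibm_model1(parallel_corpus, iterations=10):
    """Estimate word translation probabilities t(f|e) with EM (IBM Model 1)."""
    # Uniform initialization over the target vocabulary.
    f_vocab = {f for _, fs in parallel_corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # t[(f, e)] = P(f | e)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for es, fs in parallel_corpus:
            for f in fs:
                # Normalize over all source words that could have produced f.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-estimate translation probabilities from expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Toy parallel corpus: "the house" / "das haus", "the book" / "das buch"
corpus = [(["the", "house"], ["das", "haus"]),
          (["the", "book"], ["das", "buch"])]
t = ibm_model1(corpus)
# EM exploits co-occurrence: "das" appears with "the" in both pairs,
# so t[("das", "the")] grows while t[("haus", "the")] shrinks.
```

On this toy corpus the model learns that "das" aligns to "the" and "haus" to "house", which is exactly the automatic linking of corresponding words that the abstract describes.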


2021 ◽  
Vol 7 (5) ◽  
pp. 2000-2011
Author(s):  
Weijie Liu

Objectives: With the rapid development and application of the Internet, cross-border e-commerce transactions are growing explosively, and the demand for translation between languages is increasing accordingly. Methods: Starting from the perspective of intelligent machine translation between English and Chinese, and in view of the limitations of traditional machine translation, this paper uses an enhanced neural-network algorithm to solve the translation problem. The intelligent translation process is divided into two stages: encoding and decoding. Taking language type and word alignment into account, the input and output modules were constructed, the algorithm was optimized, and a recurrent neural network was used to build an RNN-embed intelligent English–Chinese translation model. Results: The model takes character-level English and Chinese input, and the network is then trained, addressing the difficulty of handling higher-level semantics when an enhanced neural network processes textual information in cross-border e-commerce transactions. Conclusion: Experiments show that the RNN-embed translation model based on the enhanced neural-network algorithm improves the quality of long-sentence translation compared with conventional machine translation.
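The two-stage encode/decode process the abstract describes can be sketched as a minimal character-level recurrent network. This is a generic illustration, not the paper's RNN-embed model; the parameters are randomly initialized rather than trained, and the sizes are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
H, V = 16, 64  # hidden size and character-vocabulary size (toy values)

# Randomly initialized parameters; a real model would learn these.
Wxh = rng.normal(0, 0.1, (H, V))   # input-to-hidden
Whh = rng.normal(0, 0.1, (H, H))   # hidden-to-hidden
Why = rng.normal(0, 0.1, (V, H))   # hidden-to-output

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def encode(char_ids):
    """Stage 1: fold the source characters into one context vector."""
    h = np.zeros(H)
    for i in char_ids:
        h = np.tanh(Wxh @ one_hot(i) + Whh @ h)
    return h

def decode(h, steps):
    """Stage 2: greedily emit target character ids from the context vector."""
    out, i = [], 0  # 0 plays the role of a start-of-sequence id here
    for _ in range(steps):
        h = np.tanh(Wxh @ one_hot(i) + Whh @ h)
        i = int(np.argmax(Why @ h))  # pick the most probable next character
        out.append(i)
    return out

context = encode([3, 7, 1])        # encode a toy source string
target = decode(context, steps=4)  # decode four target character ids
```

The decoder conditions each output character on the encoder's final hidden state, which is the basic mechanism that lets such a model carry source-side information into the target sentence.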


Author(s):  
Luigi Procopio ◽  
Edoardo Barba ◽  
Federico Martelli ◽  
Roberto Navigli

Word Sense Disambiguation (WSD), i.e., the task of assigning senses to words in context, has seen a surge of interest with the advent of neural models and a considerable increase in performance, up to 80% F1 in English. However, when considering other languages, the availability of training data is limited, which hampers scaling WSD to many languages. To address this issue, we put forward MultiMirror, a sense projection approach for multilingual WSD based on a novel neural discriminative model for word alignment: given a pair of parallel sentences as input, our model -- trained with a small number of instances -- jointly aligns all source and target tokens with each other, surpassing its competitors across several language combinations. We demonstrate that projecting senses from English by leveraging the alignments produced by our model leads a simple mBERT-powered classifier to achieve a new state of the art on established WSD datasets in French, German, Italian, Spanish and Japanese. We release our software and all our datasets at https://github.com/SapienzaNLP/multimirror.
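Once alignments are available, the sense-projection step itself is simple: copy each source token's sense label to the target tokens it is aligned to. The sketch below is a generic illustration of that step, not MultiMirror's implementation; the sense-key format is made up for the example:

```python
def project_senses(src_senses, alignment):
    """Project sense labels from source tokens to aligned target tokens.

    src_senses: {src_index: sense_label} for sense-annotated source tokens
    alignment:  list of (src_index, tgt_index) word-alignment links
    """
    tgt_senses = {}
    for s, t in alignment:
        if s in src_senses:
            tgt_senses[t] = src_senses[s]
    return tgt_senses

# English "the bank" with "bank" annotated (hypothetical sense key),
# aligned to Italian "la banca".
src = {1: "bank%financial"}
links = [(0, 0), (1, 1)]
project_senses(src, links)  # → {1: "bank%financial"}
```

Because only annotated source tokens transfer a label, the quality of the projected training data depends directly on the quality of the alignments, which is why the paper focuses on the alignment model.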


2021 ◽  
Vol 11 (4) ◽  
pp. 1868
Author(s):  
Sari Dewi Budiwati ◽  
Al Hafiz Akbar Maulana Siagian ◽  
Tirana Noor Fatyanosa ◽  
Masayoshi Aritsugi

Phrase table combination in pivot approaches can be an effective method for dealing with low-resource language pairs. The common practice for generating phrase tables in pivot approaches is to use standard symmetrization, i.e., grow-diag-final-and. Although some researchers have found that non-standard symmetrization can improve bilingual evaluation understudy (BLEU) scores, it has not been commonly employed in pivot approaches. In this study, we propose a strategy that uses the non-standard symmetrization of word alignment in phrase table combination. The appropriate symmetrization is selected based on the highest BLEU scores in each direct translation of source–target, source–pivot, and pivot–target for Kazakh–English (Kk–En) and Japanese–Indonesian (Ja–Id). Our experiments show that our proposed strategy outperforms direct translation in Kk–En with absolute improvements of 0.35 (an 11.3% relative improvement) and 0.22 (a 6.4% relative improvement) BLEU points for 3-gram and 5-gram models, respectively. For 3-gram models in Ja–Id, the proposed strategy shows an absolute gain of up to 0.11 BLEU points (a 0.9% relative improvement) over direct translation. Our proposed strategy using a small phrase table obtains better BLEU scores than a strategy using a large phrase table. The size of the target monolingual corpus and the feature-function weight of the language model (LM) could also reduce perplexity scores.
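Symmetrization combines the two directional alignments (source-to-target and target-to-source) into one. The sketch below shows only the two simplest heuristics, intersection and union; grow-diag-final-and, the standard method the abstract refers to, additionally grows the intersection toward the union along diagonal neighbors and is not reproduced here:

```python
def symmetrize(src2tgt, tgt2src, method="intersection"):
    """Combine two directional word alignments into one symmetric alignment.

    src2tgt: set of (src_index, tgt_index) links from the forward aligner
    tgt2src: set of (tgt_index, src_index) links from the backward aligner
    """
    forward = set(src2tgt)
    backward = {(s, t) for (t, s) in tgt2src}  # flip to (src, tgt) order
    if method == "intersection":
        return forward & backward   # only links both directions agree on: high precision
    if method == "union":
        return forward | backward   # every link either direction proposes: high recall
    raise ValueError(f"unknown method: {method}")

# Toy directional alignments for a 3-token sentence pair.
f = {(0, 0), (1, 2), (2, 1)}
b = {(0, 0), (2, 1), (1, 1)}  # backward links are in (tgt, src) order
symmetrize(f, b, "intersection")  # → {(0, 0), (1, 2)}
```

The choice of heuristic changes which phrase pairs are extractable from the alignment, which is why selecting the symmetrization per language pair, as the paper proposes, can move BLEU scores.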


Author(s):  
Peng Yin ◽  
Zhou Shu ◽  
Yingjun Xia ◽  
Tianmei Shen ◽  
Xiao Guan ◽  
...  
Keyword(s):  
