How Do Speakers Pause and Hesitate in English and Japanese? - A Comparison Using Parallel Corpora of English and Japanese Presentation Speeches -

Author(s):  
Michiko Watanabe ◽  
Yuma Shirahata ◽  
Ralph Rose ◽  
Kikuo Maekawa
Informatica ◽  
2018 ◽  
Vol 29 (4) ◽  
pp. 693-710
Author(s):  
Algirdas Laukaitis ◽  
Darius Plikynas ◽  
Egidijus Ostasius

2016 ◽  
Vol 36 (1) ◽  
pp. 147
Author(s):  
Beatriz Sánchez Cárdenas ◽  
Pamela Faber

http://dx.doi.org/10.5007/2175-7968.2016v36nesp1p147

Research in terminology has traditionally focused on nouns. Considerably less attention has been paid to other grammatical categories such as adverbs. However, these words can also be problematic for the novice translator, who tends to use the translation correspondences in bilingual dictionaries without realizing that formal equivalence is not necessarily the same as textual equivalence. In fact, semantic values, acquired in context, go far beyond dictionary meaning and are related to phenomena such as semantic prosody and preferences of lexical selection that can vary depending on text type and specialized domain. This research explored the reasons why certain adverbial discourse connectors, apparently easy to translate, are a source of translation problems that cannot be easily resolved with a bilingual dictionary. Moreover, this study analyzed the use of parallel corpora in the translation classroom and how it can increase the quality of text production. For this purpose, we compared student translations before and after receiving training on the use of corpus analysis tools.


2020 ◽  
Vol 10 (11) ◽  
pp. 3904
Author(s):  
Van-Hai Vu ◽  
Quang-Phuoc Nguyen ◽  
Joon-Choul Shin ◽  
Cheol-Young Ock

Machine translation (MT) has recently attracted much research on various advanced techniques (i.e., statistical-based and deep learning-based) and achieved great results for popular languages. However, research involving low-resource languages such as Korean often suffers from the lack of openly available bilingual language resources. In this research, we built extensive open parallel corpora for training MT models, named Ulsan parallel corpora (UPC). Currently, UPC contains two parallel corpora consisting of Korean-English and Korean-Vietnamese datasets. The Korean-English dataset has over 969 thousand sentence pairs, and the Korean-Vietnamese parallel corpus consists of over 412 thousand sentence pairs. Furthermore, the high rate of homographs in Korean causes a word ambiguity problem in MT. To address this problem, we developed a powerful word-sense annotation system based on a combination of sub-word conditional probability and knowledge-based methods, named UTagger. We applied UTagger to UPC and used these corpora to train both statistical-based and deep learning-based neural MT systems. The experimental results demonstrated that, using UPC, high-quality MT systems (in terms of Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores) can be built. Both UPC and UTagger are available for free download and usage.


2009 ◽  
Vol 4 (1) ◽  
pp. 1-30 ◽  
Author(s):  
Anna Romagnuolo

Political discourse has been the subject of increasing interest in recent decades with the development of ideological and rhetorical criticism focusing on US presidential speeches, especially after the events of 9/11. Indeed, extensive research literature already exists in the field of American presidential rhetoric. The same cannot be said for studies of political texts available in translation. Currently, translation studies seems to be more concerned with the politics and the politicization of translation than with the translation of political texts, which have been examined more from a synchronic perspective than a diachronic one. Using a diachronic parallel corpus of Italian translations (published in books and newspapers) of a specific genre of US presidential speech, the inaugural address, this study highlights recurring translation strategies as well as problems related to culture-bound and value-laden political terms, style, and phraseology. This research also seeks to contribute to the definition of political language as a language for specific purposes.


Author(s):  
Bojana Mikelenić ◽  
Antoni Oliver

Abstract: This study draws on the theory of the relationship between the direct object (complemento directo, CD) and the prepositional object (complemento de régimen, CR) in Spanish, namely the possibility of their co-occurrence in the same predicate (Alarcos, 1966; Bosque, 1983; Rojo, 1983) and the predicates in which these two complements can alternate, to analyze the differences in how these verbs and their arguments are translated into Croatian. Through searches in a Spanish-Croatian parallel corpus with the tool ReSiPC (Regular Expression Search in Parallel Corpora) (Antoni Oliver and Bojana Mikelenić, 2020), both developed for this research, the study presents examples of different solutions for translating these structures (different verbs, the same verb with a different government pattern, a change of verb depending on the meaning of the complement), as well as certain parallels between the patterns of the two languages. This approach aims to contribute to the development of methodology and tools that will facilitate other work of this kind, while also underlining its descriptive value, given the small number of studies comparing these two languages.
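
The idea of a regular-expression search over a sentence-aligned parallel corpus can be sketched as follows. This is only an illustrative sketch and does not reproduce ReSiPC or its interface: the function name `search_pairs`, the toy sentence pairs, and the patterns are all hypothetical.

```python
import re

# Hypothetical toy Spanish-Croatian aligned pairs; not taken from the study's corpus.
PAIRS = [
    ("Pienso en ti.", "Mislim na tebe."),
    ("Confío en mis amigos.", "Vjerujem svojim prijateljima."),
    ("Leo un libro.", "Čitam knjigu."),
]


def search_pairs(pairs, source_pattern, target_pattern=None):
    """Return the aligned pairs whose source side matches source_pattern
    and, when target_pattern is given, whose target side matches it too."""
    src_re = re.compile(source_pattern, re.IGNORECASE)
    tgt_re = re.compile(target_pattern, re.IGNORECASE) if target_pattern else None
    hits = []
    for src, tgt in pairs:
        if src_re.search(src) and (tgt_re is None or tgt_re.search(tgt)):
            hits.append((src, tgt))
    return hits


# Find Spanish verbs governed by the preposition "en" and show their translations.
for src, tgt in search_pairs(PAIRS, r"\b(pienso|confío)\s+en\b"):
    print(src, "->", tgt)
```

Restricting the target side as well (for example, requiring the Croatian preposition "na") narrows the hits to pairs where both languages use a prepositional pattern, which is the kind of contrast the abstract describes.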


2016 ◽  
Vol 22 (4) ◽  
pp. 517-548 ◽  
Author(s):  
ANN IRVINE ◽  
CHRIS CALLISON-BURCH

Abstract: We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various signals of translation equivalence (like contextual similarity, temporal similarity, orthographic similarity and topic similarity). Our discriminative model produces higher-accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features in a phrase-based SMT system. These monolingually estimated features enhance low-resource SMT systems in addition to allowing end-to-end machine translation without parallel corpora.
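
As a rough illustration of combining signals of translation equivalence, the sketch below scores candidate translations using two of the signals named above, orthographic similarity (here, normalized edit distance) and contextual similarity (here, cosine similarity of context vectors). It is not the authors' model: the weights are hand-set rather than learned by a discriminative model, and the function names, example words, and vectors are hypothetical.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def orthographic_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def score(src, tgt, src_ctx, tgt_ctx, weights=(0.5, 0.5)):
    """Weighted sum of the two signals; the weights are assumed, not learned."""
    w_orth, w_ctx = weights
    return w_orth * orthographic_similarity(src, tgt) + w_ctx * cosine(src_ctx, tgt_ctx)


# Hypothetical candidates for the Spanish word "nacional" with toy context vectors.
candidates = {"national": [1, 0, 1], "kitchen": [0, 1, 0]}
best = max(candidates, key=lambda t: score("nacional", t, [1, 0, 1], candidates[t]))
print(best)
```

In a learned setting, the fixed `weights` tuple would be replaced by parameters estimated from a seed dictionary, and further signals (temporal, topic) would be added as extra terms.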

