language pair
Recently Published Documents



Author(s):  
Iqra Muneer ◽  
Rao Muhammad Adeel Nawab

Cross-Lingual Text Reuse Detection (CLTRD) has recently attracted the attention of the research community because a large amount of digital text is readily available for reuse in multiple languages through online digital repositories. In addition, efficient machine translation systems are freely available to translate text from one language into another, which makes it easy to reuse text across languages and, consequently, difficult to detect such reuse. In the literature, the most prominent and widely used approach for CLTRD is Translation plus Monolingual Analysis (T+MA). To detect CLTR for the English-Urdu language pair, T+MA has so far been used only with lexical approaches, namely N-gram Overlap, Longest Common Subsequence, and Greedy String Tiling. This shows that T+MA has not been thoroughly explored for the English-Urdu language pair. To fill this gap, this study presents an in-depth comparison of 26 approaches that are based on T+MA. These approaches include semantic similarity approaches (semantic tagger-based approaches, WordNet-based approaches), a probabilistic approach (Kullback-Leibler distance approach), monolingual word embedding-based approaches, a Siamese recurrent architecture, and monolingual sentence transformer-based approaches for the English-Urdu language pair. The evaluation was carried out using the CLEU benchmark corpus, for both the binary and the ternary classification tasks. Our extensive experimentation shows that our proposed approach, a combination of the 26 approaches, obtained F1 scores of 0.77 and 0.61 for the binary and ternary classification tasks, respectively, and outperformed the previously reported approaches [41] (F1 = 0.73 for the binary and F1 = 0.55 for the ternary classification task) on the CLEU corpus.
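As an illustration of the lexical measures named in this abstract, the following is a minimal sketch of two of them, word n-gram overlap and Longest Common Subsequence, applied to a pair of texts that are assumed to be in the same language already (that is, after the translation step of T+MA). The function names, the whitespace tokenisation, and the containment-style normalisation are assumptions made for this example; the abstract does not specify the authors' exact formulation.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_containment(suspicious: str, source: str, n: int = 2) -> float:
    """Share of the suspicious text's n-grams that also occur in the source.

    Assumption: both texts are already in one language (translation is done
    elsewhere) and are tokenised by lowercasing and splitting on whitespace.
    """
    susp = ngrams(suspicious.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    if not susp:
        return 0.0
    return len(susp & src) / len(susp)


def lcs_length(x: List[str], y: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(suspicious: str, source: str) -> float:
    """LCS length normalised by the length of the shorter text (an assumed choice)."""
    x, y = suspicious.lower().split(), source.lower().split()
    if not x or not y:
        return 0.0
    return lcs_length(x, y) / min(len(x), len(y))
```

In a T+MA setting, scores like these would typically be computed between the machine-translated text and the candidate source text and then thresholded or passed to a classifier as features.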


Author(s):  
Ghazeefa Fatima ◽  
Rao Muhammad Adeel Nawab ◽  
Muhammad Salman Khan ◽  
Ali Saeed

Semantic word similarity is a quantitative measure of how contextually similar two words are. Evaluation of semantic word similarity models requires a benchmark corpus. However, despite the millions of speakers and the large amount of digital Urdu text on the Internet, there is a lack of benchmark corpora for the Cross-lingual Semantic Word Similarity task for the Urdu language. This article reports our efforts in developing such a corpus. The newly developed corpus is based on the SemEval-2017 Task 2 English dataset, and it contains 1,945 cross-lingual English–Urdu word pairs. For each of these word pairs, semantic similarity scores were assigned by 11 native Urdu speakers. In addition to corpus generation, this article also reports the evaluation results of a baseline approach, namely “Translation Plus Monolingual Analysis”, for automated identification of semantic similarity between English–Urdu word pairs. The results showed that the path length similarity measure performs better for the Google and Bing translated words. The newly created corpus and evaluation results are freely available online for further research and development.
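To make the path length similarity measure mentioned in this abstract concrete, here is a small sketch using one common definition: similarity = 1 / (1 + d), where d is the number of edges on the shortest path between two concepts in an is-a taxonomy. The toy taxonomy, the function names, and the choice of this particular formula are assumptions for illustration only; the abstract does not state which WordNet implementation or exact formula the authors used.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical toy is-a taxonomy (child -> parents); this is not WordNet data.
TOY_HYPERNYMS: Dict[str, List[str]] = {
    "dog": ["canine"],
    "canine": ["animal"],
    "cat": ["feline"],
    "feline": ["animal"],
    "animal": [],
}


def shortest_path_length(a: str, b: str,
                         hypernyms: Dict[str, List[str]]) -> Optional[int]:
    """Breadth-first search over the taxonomy treated as an undirected graph."""
    adjacency: Dict[str, Set[str]] = {}
    for child, parents in hypernyms.items():
        adjacency.setdefault(child, set())
        for parent in parents:
            adjacency[child].add(parent)
            adjacency.setdefault(parent, set()).add(child)
    if a not in adjacency or b not in adjacency:
        return None
    queue = deque([(a, 0)])
    seen = {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(a: str, b: str,
                    hypernyms: Dict[str, List[str]] = TOY_HYPERNYMS
                    ) -> Optional[float]:
    """Return 1 / (1 + shortest path length), or None if no path exists."""
    dist = shortest_path_length(a, b, hypernyms)
    return None if dist is None else 1.0 / (1 + dist)
```

Under Translation Plus Monolingual Analysis, the Urdu word of each pair would first be translated into English, and a measure of this kind would then be computed between the two English words.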


Author(s):  
Mehmet Şahin ◽  
Sabri Gürses

This article investigates perceptions of technology-mediated translations of literary texts by two groups: translation students and professional literary translators. The participants post-edited an excerpt from a classic Dickens novel into Turkish using a machine translation (MT) system of their choice. The analysis of the post-edited texts, participants’ answers to survey questions, and interviews with professional translators suggest that MT is currently a long way from being an essential part of any literary translation practice for the English–Turkish language pair. Translators’ interactions with MT and negative attitudes toward it may change in a positive direction as MT improves and translation practice evolves.


Author(s):  
Gert Vercauteren ◽  
Nina Reviers ◽  
Kim Steyaert

The field of translation is undergoing profound changes. On the one hand, it is being thoroughly reshaped by the advent and constant improvement of new technologies. On the other hand, new forms of translation are emerging in the wake of social and legal developments that require products and content to be accessible to everybody. One of these new forms of translation is audio description (AD), a service that is aimed at making audiovisual content accessible to people with sight loss. New legislation requires that this content be accessible by 2025, which constitutes a tremendous task given the limited number of people who are at present trained as audio describers. A possible solution would be to use machine translation to translate existing audio descriptions into different languages. Since AD is characterized by short sentences and simple, concrete language, it could be a good candidate for machine translation. In the present study, we test this hypothesis for the English-Dutch language pair. Three 30-minute AD excerpts of different Dutch movies that were originally audio described in English were translated into Dutch using DeepL. The translations were analysed using the harmonized DQF-MQM error typology, taking into account the specific multimodal nature of the source text and the intersemiotic dimension of the original audio description process. The analysis showed that the MT output had a relatively high error rate, particularly in the categories of Accuracy – mistranslation and Fluency – grammar. This seems to indicate that extensive post-editing will be needed before the text can be used in a professional context.


Author(s):  
Łukasz Grabowski ◽  
Nicholas Groom

This study uses both parallel and comparable reference corpora in the English-Polish language pair to explore how translators deal with recurrent multi-word items performing specific discoursal functions. We also consider whether the observed tendencies overlap with those found in native texts, and the extent to which the discoursal functions realised by the multi-word items under scrutiny are “preserved” in translation. Capitalizing on findings from earlier research (Granger, 2014; Grabar & Lefer, 2015), we analyzed a pre-selected set of phrases signaling stance-taking and those functioning as textual, discourse-structuring devices originally found in the European Parliament proceedings corpus (Koehn, 2005) and included in the English-Polish parallel corpus Paralela (Pęzik, 2016). Since our goal was to explore whether and to what extent English functionally-defined phrases reflect the same level of formulaicity and regularity in both Polish translations and native Polish texts, the findings provided insights into the translation tendencies of such items, and revealed – using inter-rater agreement metrics – that the discoursal functions of recurrent n-grams may change in translation.


2021 ◽  
pp. 131-140
Author(s):  
Oksana Molchko

Culturally specific images and symbols are carriers of ethnic semantics. They reflect the historical, national and cultural experience of the nation. The translation studies analysis of similes with a flora name, verbalised in the Ukrainian-English language pair, enables tracing the peculiarities of culturally specific images and concepts. The article investigates, analyses and gives a detailed characterisation of the notion of the culturally specific sense as an element of the actual sense of simile, and of the peculiarities of universal and nationally specific attributes that result from the national conceptualisation of a corresponding flora object (leaf, tree) in the consciousness of Ukrainian and English speakers. Translation studies analysis is applied with the aim of revealing the ways of rendering the culturally specific sense in similes with a flora name (leaf, tree). Ways of translating similes that render the utmost load of cultural information are discussed. Key words: simile, flora name, translation, culturally specific sense, ways of translation


2021 ◽  
Vol 111 (6) ◽  
pp. 45-64
Author(s):  
Anne-Kathrin Gärtig-Bressan

The article considers contrastive linguistics as a discipline that interacts closely with its intralinguistic and applied neighbouring disciplines. Within this framework, the online ontology IMAGACT presents an instrument that makes it possible to contrast how languages lexicalize concrete actions (movements, modification of objects, setting relations among objects, etc.) in their verbs. German and Italian, the language pair considered here, differ typologically in their lexicalization strategies, which leads to difficulties in L2 acquisition, translation and lexicography. The article shows how the corpus-based IMAGACT database, which presents a set of 1010 actions in short films and links them to the appropriate verbs in 15 languages, provides help in these fields, and how it can at the same time empirically support contrastive-typological findings.


2021 ◽  
Vol 111 (6) ◽  
pp. 3-10
Author(s):  
Peggy Katelhön ◽  
Marina Brambilla ◽  
Albana Muco

This thematic issue of Linguistik online is dedicated to contrastive linguistics for the language pair Italian-German. The contributions collected here deal with Italian-German language comparison from different points of view. What they all have in common is a corpus-oriented approach. Using authentic attestations from different linguistic sources, the linguistic structures of both languages are analysed and compared with each other. The fine-grained comparison enabled the authors to obtain interesting results not only in the fields of morphology and syntax, but also for pragmatics, and text and discourse linguistics for both languages, which can be profitably used in foreign language didactics, theoretical linguistics and translation studies.


2021 ◽  
Vol 111 (6) ◽  
pp. 105-136
Author(s):  
Gudrun Bukies

The topic of this article is ‘weight’ in the German-Italian language comparison. Which linguistic means are used to refer to weight in German (Gewicht), and what are the Italian equivalents? The material collected is based on monolingual German and Italian dictionaries, reference works and text corpora, as well as on bilingual German-Italian dictionaries and text excerpts. The classification of the so-called weight designations, including derivatives, composites and word combinations, is carried out from an etymological and lexical perspective. In addition to the dictionary entries, German-Italian translation examples show further equivalents of terms and expressions with regard to ‘weight’ in this language pair.

