Semi-Automatic Word-Aligned Tool for Thai-Vietnamese Parallel Corpus Construction

Author(s):  
Dang Ngoc Chuong ◽  
Pusadee Seresangtakul
2021 ◽  
Vol 14 (2) ◽  
pp. 494-508
Author(s):  
Francina Sole-Mauri ◽  
Pilar Sánchez-Gijón ◽  
Antoni Oliver

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and has a distinctive feature: it is composed of documents that are legally equivalent in both languages but are not the result of translation. The corpus is built upon enactments co-drafted by two jurists to ensure the legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and baseline Bilingual Evaluation Understudy (BLEU) scores, which compare a candidate translation of a text against a gold-standard translation, are given for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts that convey the same meaning in this language pair, and it is freely available for non-commercial use.
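For readers unfamiliar with the metric, the following is a minimal Python sketch of a simplified BLEU score: sentence-level, single-reference, whitespace-tokenized and unsmoothed. It is an illustration only and is not the evaluation setup used for Cadlaws, which the abstract does not describe; the function names `ngrams` and `bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            # Unsmoothed BLEU is zero if any n-gram order has no match.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

Calling `bleu(candidate, reference)` with two plain strings returns a value between 0 and 1, where 1 means the candidate matches the reference exactly. Reported BLEU figures are normally computed over a whole test set with a standard tokenizer, so scores from this sketch are not comparable with published results.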


2013 ◽  
Vol 95 ◽  
pp. 535-541
Author(s):  
Fryni Kakoyianni-Doa ◽  
Stefanos Antaris ◽  
Eleni Tziafa

2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

This paper analyzes machine translation over an English-Dogri parallel corpus. Machine translation, the automatic translation of text from one language into another, is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.


2020 ◽  
Vol 54 (3) ◽  
pp. 581-613
Author(s):  
Abbie Hantgan

The purpose of this study is to re-evaluate the interpretation of a particle that has hitherto been analyzed as a marker of either the addressee or the subject of a quoted clause in Ben Tey (Dogon, Mali). As both of these interpretations are typologically rare, if not unique, a broader conceptualization of the particle as a quotative topic marker is proposed here. Data are from a newly compiled cross-linguistic annotated corpus of discourse reports within textual contexts. Along with data presentation and analysis, a methodology is illustrated for multilingual comparative corpus construction for the analysis of discourse reporting strategies.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Ali Hamid Meftah ◽  
Mustafa Qamhan ◽  
Yasser Seddiq ◽  
Yousef A. Alotaibi ◽  
Sid-Ahmed Selouani
