CASMACAT: An Open Source Workbench for Advanced Computer Aided Translation

Abstract We describe an open source workbench that offers advanced computer aided translation (CAT) functionality: post-editing machine translation (MT), interactive translation prediction (ITP), visualization of word alignment, extensive logging with replay mode, integration with eye trackers and e-pen.

Download Full-text

Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski

Prague Bulletin of Mathematical Linguistics ◽

10.2478/v10108-010-0004-8 ◽

2010 ◽

Vol 93 (1) ◽

pp. 37-46 ◽

Cited By ~ 5

Author(s):

Qin Gao ◽

Stephan Vogel

Keyword(s):

Open Source ◽

Machine Translation ◽

Error Tolerance ◽

Word Alignment ◽

Tolerance Mechanism ◽

Phrase Extraction ◽

Word Clustering ◽

Hadoop Clusters

Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski In this paper we present an opensource machine translation toolkit Chaski which is capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline including distributed word alignment, word clustering and phrase extraction. The toolkit also provides an extended error-tolerance mechanism over standard Hadoop error-tolerance framework. The paper will describe the underlying methodology and the design of the system, together with instructions of how to run the system on Hadoop clusters.

Download Full-text

Using Machine Translation to Provide Target-Language Edit Hints in Computer Aided Translation Based on Translation Memories

Journal of Artificial Intelligence Research ◽

10.1613/jair.4630 ◽

2015 ◽

Vol 53 ◽

pp. 169-222 ◽

Cited By ~ 2

Author(s):

Miquel Esplà-Gomis ◽

Felipe Sánchez-Martínez ◽

Mikel L. Forcada

Keyword(s):

Machine Translation ◽

Recommendation System ◽

General Purpose ◽

Target Language ◽

Word Alignment ◽

Translation Memory ◽

Computer Aided ◽

High Degree ◽

Better Than ◽

Target Words

This paper explores the use of general-purpose machine translation (MT) in assisting the users of computer-aided translation (CAT) systems based on translation memory (TM) to identify the target words in the translation proposals that need to be changed (either replaced or removed) or kept unedited, a task we term as "word-keeping recommendation". MT is used as a black box to align source and target sub-segments on the fly in the translation units (TUs) suggested to the user. Source-language (SL) and target-language (TL) segments in the matching TUs are segmented into overlapping sub-segments of variable length and machine-translated into the TL and the SL, respectively. The bilingual sub-segments obtained and the matching between the SL segment in the TU and the segment to be translated are employed to build the features that are then used by a binary classifier to determine the target words to be changed and those to be kept unedited. In this approach, MT results are never presented to the translator. Two approaches are presented in this work: one using a word-keeping recommendation system which can be trained on the TM used with the CAT system, and a more basic approach which does not require any training. Experiments are conducted by simulating the translation of texts in several language pairs with corpora belonging to different domains and using three different MT systems. We compare the performance obtained to that of previous works that have used statistical word alignment for word-keeping recommendation, and show that the MT-based approaches presented in this paper are more accurate in most scenarios. In particular, our results confirm that the MT-based approaches are better than the alignment-based approach when using models trained on out-of-domain TMs. Additional experiments were performed to check how dependent the MT-based recommender is on the language pair and MT system used for training. These experiments confirm a high degree of reusability of the recommendation models across various MT systems, but a low level of reusability across language pairs.

Download Full-text

Digital Mammography: Development of an Advanced Computer-Aided System for Breast Cancer Detection

10.21236/ada425978 ◽

2004 ◽

Author(s):

Heang P. Chan

Keyword(s):

Breast Cancer ◽

Cancer Detection ◽

Digital Mammography ◽

Breast Cancer Detection ◽

Computer Aided ◽

Advanced Computer

Download Full-text

Does GIZA++ Make Search Errors?

Computational Linguistics ◽

10.1162/coli_a_00008 ◽

2010 ◽

Vol 36 (3) ◽

pp. 295-302 ◽

Cited By ~ 2

Author(s):

Sujith Ravi ◽

Kevin Knight

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Word Alignment ◽

Alignment Algorithm

Word alignment is a critical procedure within statistical machine translation (SMT). Brown et al. (1993) have provided the most popular word alignment algorithm to date, one that has been implemented in the GIZA (Al-Onaizan et al., 1999) and GIZA++ (Och and Ney 2003) software and adopted by nearly every SMT project. In this article, we investigate whether this algorithm makes search errors when it computes Viterbi alignments, that is, whether it returns alignments that are sub-optimal according to a trained model.

Download Full-text

Data selection and smoothing in an open-source system for the 2008 NIST machine translation evaluation

10.21437/interspeech.2008-676 ◽

2008 ◽

Author(s):

Holger Schwenk ◽

Yannick Esteve

Keyword(s):

Open Source ◽

Machine Translation ◽

Data Selection ◽

Machine Translation Evaluation

Download Full-text

An Open-Source Web-Based Tool for Resource-Agnostic Interactive Translation Prediction

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2014-0015 ◽

2014 ◽

Vol 102 (1) ◽

pp. 69-80 ◽

Cited By ~ 2

Author(s):

Torregrosa Daniel ◽

Forcada Mikel L. ◽

Pérez-Ortiz Juan Antonio

Keyword(s):

Open Source ◽

Machine Translation ◽

Web Application ◽

Statistical Machine Translation ◽

Black Box ◽

Translation System ◽

Web Tool ◽

Web Based ◽

Strongly Coupled ◽

Machine Translation System

Abstract We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based computer-generated suggestions as they type. Most of the ITP systems in literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach and suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.

Download Full-text

Advanced Computer-Aided Gear Design, Analysis and Manufacturing

New Approaches to Gear Design and Production - Mechanisms and Machine Science ◽

10.1007/978-3-030-34945-5_3 ◽

2020 ◽

pp. 71-113

Author(s):

Claude Gosselin

Keyword(s):

Design Analysis ◽

Gear Design ◽

Computer Aided ◽

Advanced Computer

Download Full-text

OpenNMT: Open-Source Toolkit for Neural Machine Translation

10.18653/v1/p17-4012 ◽

2017 ◽

Cited By ~ 139

Author(s):

Guillaume Klein ◽

Yoon Kim ◽

Yuntian Deng ◽

Jean Senellart ◽

Alexander Rush

Keyword(s):

Open Source ◽

Machine Translation ◽

Neural Machine Translation

Download Full-text

Multi-engine machine translation with an open-source decoder for statistical machine translation

10.3115/1626355.1626381 ◽

2007 ◽

Cited By ~ 2

Author(s):

Yu Chen ◽

Andreas Eisele ◽

Christian Federmann ◽

Eva Hasler ◽

Michael Jellinghaus ◽

...

Keyword(s):

Open Source ◽

Machine Translation ◽

Statistical Machine Translation

Download Full-text

Recent advances in Apertium, a free/open-source rule-based machine translation platform for low-resource languages

Machine Translation ◽

10.1007/s10590-021-09260-6 ◽

2021 ◽

Author(s):

Tanmai Khanna ◽

Jonathan N. Washington ◽

Francis M. Tyers ◽

Sevilay Bayatlı ◽

Daniel G. Swanson ◽

...

Keyword(s):

Open Source ◽

Machine Translation ◽

Lexical Selection ◽

Rule Based ◽

Low Resource ◽

Language Technology ◽

Language Data ◽

Recursive Structures ◽

Platform Translation ◽

Free Open Source

AbstractThis paper presents an overview of Apertium, a free and open-source rule-based machine translation platform. Translation in Apertium happens through a pipeline of modular tools, and the platform continues to be improved as more language pairs are added. Several advances have been implemented since the last publication, including some new optional modules: a module that allows rules to process recursive structures at the structural transfer stage, a module that deals with contiguous and discontiguous multi-word expressions, and a module that resolves anaphora to aid translation. Also highlighted is the hybridisation of Apertium through statistical modules that augment the pipeline, and statistical methods that augment existing modules. This includes morphological disambiguation, weighted structural transfer, and lexical selection modules that learn from limited data. The paper also discusses how a platform like Apertium can be a critical part of access to language technology for so-called low-resource languages, which might be ignored or deemed unapproachable by popular corpus-based translation technologies. Finally, the paper presents some of the released and unreleased language pairs, concluding with a brief look at some supplementary Apertium tools that prove valuable to users as well as language developers. All Apertium-related code, including language data, is free/open-source and available at https://github.com/apertium.

Download Full-text