Evaluating Indirect Strategies for Chinese-Spanish Statistical Machine Translation

Although Chinese and Spanish are two of the most spoken languages in the world, little research has been done on machine translation for this language pair. This paper investigates the state of the art in Chinese-to-Spanish statistical machine translation (SMT), which is currently one of the most popular approaches to machine translation. For this purpose, we report details of the available parallel corpora: the Basic Traveller Expressions Corpus (BTEC), the Holy Bible and the United Nations (UN) corpus. Additionally, we conduct experiments with the largest of these three corpora to explore alternative SMT strategies that use a pivot language. Three alternatives are considered for pivoting: cascading, pseudo-corpus and triangulation. As the pivot language, we use English, Arabic or French. Results show that, for a phrase-based SMT system, English is the best pivot language between Chinese and Spanish. We propose a system output combination using the pivot strategies that is capable of outperforming the direct translation strategy. The main objective of this work is to motivate and involve the research community in working on this important language pair, given the demographic impact of the two languages.
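Of the three pivoting strategies named in the abstract, triangulation is the easiest to show in a few lines. The sketch below assumes the common formulation in which a source-to-target phrase table is built by summing over shared pivot phrases, P(t | s) = sum over p of P(p | s) * P(t | p). The function name `triangulate`, the nested-dictionary table layout and the toy probabilities are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two phrase tables through a pivot language.

    src_to_pivot[s][p] holds P(p | s) and pivot_to_tgt[p][t] holds P(t | p).
    Returns table[s][t] = sum over pivot phrases p of P(p | s) * P(t | p).
    """
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for t, prob_t_given_p in pivot_to_tgt.get(p, {}).items():
                table[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in table.items()}


# Toy, made-up probabilities (hypothetical, not from the paper).
zh_en = {"你好": {"hello": 0.7, "hi": 0.3}}
en_es = {"hello": {"hola": 0.9, "buenas": 0.1}, "hi": {"hola": 1.0}}

zh_es = triangulate(zh_en, en_es)
# "hola" receives 0.7 * 0.9 + 0.3 * 1.0, about 0.93; "buenas" receives 0.7 * 0.1.
print(zh_es)
```

Cascading, by contrast, would run two full translation systems in sequence, and the pseudo-corpus strategy would translate the pivot side of a corpus to synthesize direct training data; neither is shown here.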

Syntactic Pattern Based Word Alignment for Statistical Machine Translation

International Journal of Knowledge and Systems Science ◽

10.4018/ijkss.2014070103 ◽

2014 ◽

Vol 5 (3) ◽

pp. 36-45

Author(s):

Quang-Hung LE ◽

Anh-Cuong LE

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Research Community ◽

Word Alignment ◽

Syntactic Pattern ◽

Syntactic Information ◽

New Type ◽

Grammatical Structures ◽

Language Pair

Word alignment is the task of aligning bilingual words in a corpus of parallel sentences and determining the probabilities of these aligned word pairs. It is the most important factor affecting the quality of any Statistical Machine Translation (SMT) system. The IBM word alignment models are the best known in the SMT research community. These models are purely statistical and therefore perform poorly on language pairs that differ in linguistic aspects (e.g. grammatical structures). This paper aims to improve the IBM models by using syntactic information. The authors first propose a new type of constraint based on bilingual syntactic patterns, and then integrate it into the IBM models. Finally, they show how to estimate the models' parameters using this new type of constraint. The experiments are conducted on the English-Vietnamese language pair for evaluation.
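For orientation, the purely statistical baseline that the abstract refers to can be sketched as plain IBM Model 1 trained with expectation-maximization. This is not the authors' syntactically constrained variant: the syntactic-pattern constraint is not described in enough detail here to reproduce. The function name `ibm_model1`, the omission of the NULL word and the toy sentence pairs are simplifying assumptions.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    pairs is a list of (e_tokens, f_tokens) sentence pairs.
    Returns a dict keyed by (f, e). The NULL word is omitted for brevity.
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        # E-step: distribute each f over the e words in the same sentence.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize the expected counts per e word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy, made-up corpus (hypothetical, not from the paper).
pairs = [
    (["the", "house"], ["das", "Haus"]),
    (["the", "book"], ["das", "Buch"]),
    (["a", "book"], ["ein", "Buch"]),
]
t = ibm_model1(pairs)
print(round(t[("Haus", "house")], 3), round(t[("das", "house")], 3))
```

On this toy corpus the co-occurrence pattern alone is enough to pull "Haus" toward "house". A constraint of the kind the abstract proposes would presumably restrict or reweight the links considered in the E-step, but that is an interpretation rather than something the abstract states.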

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation is the automatic translation of sentences from one language into another. This paper tests the effectiveness of a new model of machine translation, factored machine translation. We compare the unfactored system, which serves as our baseline, with the factored model in terms of BLEU score. We test the model on the German-English language pair using the Europarl corpus. The tool we use is MOSES, which is free to download and use. We found, however, that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU in all cases. In terms of words being translated, however, all of the factored models outperform the unfactored model.

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze translation with an English-Dogri parallel corpus. Machine translation is the translation from one language into another. Machine translation is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.

Analyzing Subword Techniques to Improve English to Sinhala Neural Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500174 ◽

2021 ◽

pp. 2050017

Author(s):

Rashmini Naranpanawa ◽

Ravinga Perera ◽

Thilakshi Fonseka ◽

Uthayasanker Thayasivam

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Rare Word ◽

Neural Machine Translation ◽

Parallel Corpus ◽

Low Resource ◽

Word Level ◽

Morphologically Rich Languages

Neural machine translation (NMT) is a remarkable approach that performs much better than statistical machine translation (SMT) models when parallel data is abundant. However, vanilla NMT operates primarily at the word level with a fixed vocabulary. Therefore, low-resource, morphologically rich languages such as Sinhala are strongly affected by the out-of-vocabulary (OOV) and rare word problems. Recent advancements in subword techniques have opened up opportunities for low-resource communities by enabling open-vocabulary translation. In this paper, we extend our recently published state-of-the-art EN-SI translation system using the transformer and explore standard subword techniques on top of it to identify which subword approach has a greater effect on the English-Sinhala language pair. Our models demonstrate that subword segmentation strategies combined with state-of-the-art NMT can perform remarkably well when translating English sentences into a morphologically rich language, regardless of whether a large parallel corpus is available.
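The abstract does not say which subword techniques were compared. As an illustration of how subword segmentation enables open-vocabulary translation, the sketch below learns byte-pair encoding (BPE) merges, one widely used technique, from a toy word-frequency table. The function name `learn_bpe`, the `</w>` end-of-word marker and the toy counts are assumptions for illustration only.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn BPE merge operations from a {word: frequency} dict.

    Returns the ordered list of merged symbol pairs and the final
    segmentation as {tuple_of_symbols: frequency}.
    """
    # Start from characters, with an explicit end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing the best pair with a single symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Toy, made-up frequencies (hypothetical, not from the paper).
word_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, segmented = learn_bpe(word_freqs, num_merges=3)
print(merges)
print(segmented)
```

After three merges the frequent suffix shared by "newest" and "widest" becomes a single symbol, so an unseen word with the same ending can still be segmented into known units instead of being treated as out of vocabulary.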

Phrase-Based Statistical Machine Translation for a Low-Density Language Pair

Advances in Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-642-13059-5_27 ◽

2010 ◽

pp. 273-277 ◽

Cited By ~ 1

Author(s):

Maxim Roy ◽

Fred Popowich

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Low Density ◽

Language Pair

A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

Computational Linguistics ◽

10.1162/coli_a_00245 ◽

2016 ◽

Vol 42 (2) ◽

pp. 163-205 ◽

Cited By ~ 10

Author(s):

Arianna Bisazza ◽

Marcello Federico

Keyword(s):

Machine Translation ◽

Computational Models ◽

Statistical Machine Translation ◽

Research Area ◽

Linguistic Knowledge ◽

Large Collection ◽

Comprehensive Survey ◽

Single Method ◽

Optimal Approach ◽

Language Pair

Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orient the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them.

Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair

Language Resources and Evaluation ◽

10.1007/s10579-011-9137-0 ◽

2011 ◽

Vol 45 (2) ◽

pp. 181-208 ◽

Cited By ~ 8

Author(s):

Mireia Farrús ◽

Marta R. Costa-jussà ◽

José B. Mariño ◽

Marc Poch ◽

Adolfo Hernández ◽

...

Keyword(s):

Error Analysis ◽

Machine Translation ◽

Statistical Machine Translation ◽

Spanish Language ◽

Language Pair

State-of-the-art English to Persian Statistical Machine Translation system

The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012) ◽

10.1109/aisp.2012.6313739 ◽

2012 ◽

Cited By ~ 6

Author(s):

Amin Mansouri ◽

Heshaam Faili

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Machine Translation System

Dynamically Shaping the Reordering Search Space of Phrase-Based Statistical Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00231 ◽

2013 ◽

Vol 1 ◽

pp. 327-340 ◽

Cited By ~ 4

Author(s):

Arianna Bisazza ◽

Marcello Federico

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Search Space ◽

Input Word ◽

Binary Classifier ◽

Crucial Issue ◽

Trade Off ◽

Translation Quality ◽

Very High

Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape this space and thus capture long-range word movements without hurting translation quality or decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.

CovidMulti-Net: A Parallel-Dilated Multi Scale Feature Fusion Architecture for the Identification of COVID-19 Cases from Chest X-ray Images

10.1101/2021.05.19.21257430 ◽

2021 ◽

Author(s):

Md. Saikat Islam Khan ◽

Anichur Rahman ◽

Md. Razaul Karim ◽

Nasima Islam Bithi ◽

Shahab Band ◽

...

Keyword(s):

Healthcare Professionals ◽

Feature Fusion ◽

State Of The Art ◽

Research Community ◽

Diagnostic Model ◽

X Ray ◽

Multi Scale ◽

The World ◽

Chest X Ray ◽

Learning Concept

COVID-19 is an emerging respiratory infectious disease that has had a significant impact on the health and life of many people around the world. Therefore, early identification of COVID-19 patients is the fastest way to restrain the spread of the pandemic. However, as the number of cases grows at an alarming pace, most developing countries are now facing a shortage of medical resources and testing kits. Besides, using testing kits to detect COVID-19 cases is a time-consuming, expensive, and cumbersome procedure. Faced with these obstacles, most physicians, researchers, and engineers have advocated for the advancement of computer-aided deep learning models to assist healthcare professionals in quickly and inexpensively recognizing COVID-19 cases from chest X-ray (CXR) images. With this motivation, this paper proposes a CovidMulti-Net architecture based on the transfer learning concept to classify COVID-19 cases from normal and other pneumonia cases using three publicly available datasets that include 1341, 1341, and 446 CXR images from healthy samples and 902, 1564, and 1193 CXR images infected with Viral Pneumonia, Bacterial Pneumonia, and COVID-19 diseases. In the proposed framework, features from CXR images are extracted using three well-known pre-trained models: DenseNet-169, ResNet-50, and VGG-19. The extracted features are then fed into a concatenation layer, making a robust hybrid model. The proposed framework achieved classification accuracies of 99.4%, 95.2%, and 94.8% on the 2-Class, 3-Class, and 4-Class datasets, exceeding all the other state-of-the-art models. These results suggest that the CovidMulti-Net framework is able to discriminate individuals with COVID-19 infection from healthy ones and could be used as a diagnostic model in clinics and hospitals. We also made all the materials publicly accessible for the research community at: https://github.com/saikat15010/CovidMulti-Net-Architecture.git.
