Building a Bilingual Corpus based on Hybrid Approach for Malayalam-English Machine Translation
Word alignment in bilingual corpora has been a very active research topic in the Machine Translation research groups. In this research paper, we describe an alignment system that aligns English-Malayalam texts at word level in parallel sentences. The alignment of translated segments with source segments is very essential for building parallel corpora. Since word alignment research on Malayalam and English languages is still in its immaturity, it is not a trivial task for Malayalam-English text. A parallel corpus is a collection of texts in two languages, one of which is the translation equivalent of the other. Thus, the main purpose of this system is to construct word-aligned parallel corpus to be used in Malayalam-English machine translation. The proposed approach is a hybrid approach, a combination of corpus based and dictionary lookup approaches. The corpus based approach is based on the first three IBM models and Expectation Maximization (EM) algorithm. For the dictionary lookup approach, the proposed system uses the bilingual Malayalam-English Dictionary.