Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia
Comparable corpora are an alternative to parallel corpora for languages in which parallel corpora are scarce. Models trained on comparable corpora perform less well than those trained on parallel corpora, but comparable corpora still help compensate for the shortage of parallel data in machine translation. In this article, the authors explore Wikipedia as a potential source and describe the process of aligning documents, which will subsequently be used to extract parallel data. The parallel data thus extracted will help enhance the performance of statistical machine translation.