Combining Lexical Context with Pseudo-alignment for Bilingual Lexicon Extraction from Comparable Corpora

Author(s):  
Bo Li ◽  
Qunyan Zhu ◽  
Tingting He ◽  
Qianjun Chen
2014 ◽  
Vol 7 ◽  
pp. 1335-1343
Author(s):  
Hong-Seok Kwon ◽  
Hyeong-Won Seo ◽  
Minah Cheon ◽  
Jae-Hoon Kim

Author(s):  
E. Gaussier ◽  
J.-M. Renders ◽  
I. Matveeva ◽  
C. Goutte ◽  
H. Déjean

2020 ◽  
Vol 13 (5) ◽  
pp. 379-392
Author(s):  
Rizka Sholikah ◽  
Yasuhiko Morimoto ◽  
Agus Arifin ◽  
Chastine Fatichah ◽  
...  

2018 ◽  
Vol 24 (4) ◽  
pp. 523-549
Author(s):  
BO LI ◽  
ERIC GAUSSIER ◽  
DAN YANG

Abstract: Comparable corpora serve as an important substitute for parallel resources for under-resourced language pairs. Previous work mostly aims to find better strategies for exploiting existing comparable corpora while ignoring variation in corpus quality. The quality of a comparable corpus strongly affects its usability in practice, a fact that several studies have supported. However, researchers have not yet established a widely accepted and fully validated framework for measuring corpus quality. In this paper, we therefore investigate a comprehensive methodology for dealing with the quality of comparable corpora. Specifically, we propose several comparability measures and a quantitative strategy for testing them. Our experiments show that the proposed comparability measure captures gold-standard comparability levels very well and is robust to the choice of bilingual dictionary. Moreover, we show on the task of bilingual lexicon extraction that the proposed measure correlates well with the performance of this real-world application.


2016 ◽  
Vol 22 (4) ◽  
pp. 575-601
Author(s):  
EMMANUEL MORIN ◽  
AMIR HAZEM

Abstract: Most work in bilingual lexicon extraction from comparable corpora rests on the implicit hypothesis that the corpora are balanced in size. However, the historical context-based projection method is relatively insensitive to the size of each part of the comparable corpus. In this context, we study, through a series of experiments, how unbalanced specialized comparable corpora influence the quality of bilingual terminology extraction. Moreover, we introduce into the context-based projection method a strategy for re-estimating word co-occurrence observations. It uses smoothing or prediction techniques to boost word co-occurrence observations, mainly benefiting the smallest part of an unbalanced comparable corpus. Our results show that using unbalanced specialized comparable corpora yields a significant improvement in the quality of the extracted lexicons.

