Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval

Author(s): Lingjun Zhao, Rabih Zbib, Zhuolin Jiang, Damianos Karakos, Zhongqiang Huang

2019
Author(s): Elizabeth Boschee, Joel Barry, Jayadev Billa, Marjorie Freedman, Thamme Gowda, ...

2020, Vol 34 (05), pp. 8066-8073
Author(s): Katharina Kann, Ophélie Lacroix, Anders Søgaard

Part-of-speech (POS) taggers for low-resource languages that rely exclusively on weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination of the two) have been reported to perform almost as well as supervised taggers. However, weakly supervised POS taggers are commonly evaluated only on languages that are very different from truly low-resource languages, and they rely on sources of information, such as high-coverage, almost error-free dictionaries, that are unlikely to be available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers on a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model tags fewer than half of the words correctly. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.


2016, Vol 68 (4), pp. 448-477
Author(s): Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao, Jianxun Liu

Purpose – As the amount of multilingual content on the World Wide Web grows, users often need to access information written in a language they do not speak natively. The purpose of this paper is to present a comprehensive study of user profile representation techniques and to investigate their use in personalized cross-language information retrieval (CLIR) systems through personalized query expansion.

Design/methodology/approach – The user profiles consist of weighted terms computed with frequency-based methods such as tf-idf and BM25, as well as with various latent semantic models trained on monolingual documents and on cross-lingual comparable documents. The paper also proposes an automatic evaluation method for comparing user profile generation techniques and query expansion methods.

Findings – Experimental results suggest that latent semantic-weighted user profile representations are superior to frequency-based methods and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained at the cross-lingual level performed better than those trained at the monolingual level.

Originality/value – Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies at the monolingual level, so the effect of using such monolingual profiles for personalized CLIR remains unclear. This study fills that gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that enables repeatable, controlled experiments.
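The frequency-based profile construction and personalized query expansion described above can be illustrated with a minimal sketch. It assumes whitespace-tokenized documents and uses only a smoothed tf-idf weighting (the same smoothed idf form as scikit-learn's `TfidfVectorizer` with `smooth_idf=True`), not BM25 or the latent semantic models the study also evaluates. The helper names `tfidf_profile` and `expand_query`, the top-k expansion rule, and the toy data are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def tfidf_profile(history, corpus):
    """Hypothetical helper: build a user profile mapping term -> summed tf-idf weight.

    history: token lists for documents the user has previously read.
    corpus:  token lists used to estimate document frequencies (df).
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document

    profile = Counter()
    for doc in history:
        for term, count in Counter(doc).items():
            # Smoothed idf: terms missing from the corpus still get a finite weight.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            # Length-normalized term frequency times idf, accumulated over history.
            profile[term] += (count / len(doc)) * idf
    return profile


def expand_query(query_terms, profile, k=2):
    """Hypothetical helper: append the k highest-weighted profile terms not in the query."""
    candidates = [t for t, _ in profile.most_common() if t not in query_terms]
    return list(query_terms) + candidates[:k]


# Toy data (invented for illustration only).
corpus = [
    ["jazz", "music", "concert"],
    ["jazz", "festival"],
    ["football", "match"],
    ["music", "album"],
]
history = [["jazz", "festival", "festival"], ["jazz", "concert"]]

profile = tfidf_profile(history, corpus)
expanded = expand_query(["jazz"], profile, k=2)
print(expanded)  # ['jazz', 'festival', 'concert']
```

In a CLIR setting, the expanded query would still need to be translated or otherwise matched against the target-language collection. Swapping in BM25 or latent semantic weights would change only how `profile` is computed.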

