Weakly Supervised Attentional Model for Low Resource Ad-hoc Cross-lingual Information Retrieval

Part-of-speech (POS) taggers for low-resource languages that are based exclusively on various forms of weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination thereof) have been reported to perform almost as well as supervised ones. However, weakly supervised POS taggers are commonly evaluated only on languages that are very different from truly low-resource languages, and the taggers use sources of information, such as high-coverage and almost error-free dictionaries, that are unlikely to be available for resource-poor languages. We train and evaluate state-of-the-art weakly supervised POS taggers for a typologically diverse set of 15 truly low-resource languages. On these languages, given a realistic amount of resources, even our best model tags fewer than half of the words correctly. Our results highlight the need for new and different approaches to POS tagging for truly low-resource languages.

Incorporating context within the language modeling approach for ad hoc information retrieval

ACM SIGIR Forum ◽

10.1145/1147197.1147211 ◽

2006 ◽

Vol 40 (1) ◽

pp. 70-70 ◽

Cited By ~ 1

Author(s):

Leif Azzopardi

Keyword(s):

Information Retrieval ◽

Ad Hoc ◽

Language Modeling ◽

Modeling Approach

A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

ACM SIGIR Forum ◽

10.1145/3130348.3130377 ◽

2017 ◽

Vol 51 (2) ◽

pp. 268-276 ◽

Cited By ~ 44

Author(s):

Chengxiang Zhai ◽

John Lafferty

Keyword(s):

Information Retrieval ◽

Ad Hoc ◽

Language Models ◽

Smoothing Methods

Merging Strategy for Cross-Lingual Information Retrieval Systems based on Learning Vector Quantization

Neural Processing Letters ◽

10.1007/s11063-005-2659-y ◽

2005 ◽

Vol 22 (2) ◽

pp. 149-161 ◽

Cited By ~ 1

Author(s):

M. T. Martín-Valdivia ◽

F. Martínez-Santiago ◽

L. A. Ureña-López

Keyword(s):

Information Retrieval ◽

Vector Quantization ◽

Learning Vector Quantization ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Cross Lingual ◽

Merging Strategy

Log-Bilinear Document Language Model for Ad-hoc Information Retrieval

Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM '14 ◽

10.1145/2661829.2661919 ◽

2014 ◽

Author(s):

Xinhui Tu ◽

Jing Luo ◽

Bo Li ◽

Tingting He

Keyword(s):

Information Retrieval ◽

Ad Hoc ◽

Language Model

Cross-lingual and ensemble MLPs strategies for low-resource speech recognition

10.21437/interspeech.2012-11 ◽

2012 ◽

Author(s):

Yanmin Qian ◽

Jia Liu

Keyword(s):

Speech Recognition ◽

Low Resource ◽

Cross Lingual

Cross-lingual transfer learning during supervised training in low resource scenarios

10.21437/interspeech.2015-700 ◽

2015 ◽

Author(s):

Amit Das ◽

Mark Hasegawa-Johnson

Keyword(s):

Transfer Learning ◽

Low Resource ◽

Supervised Training ◽

Cross Lingual

A study of user profile representation for personalized cross-language information retrieval

Aslib Journal of Information Management ◽

10.1108/ajim-06-2015-0091 ◽

2016 ◽

Vol 68 (4) ◽

pp. 448-477 ◽

Cited By ~ 5

Author(s):

Dong Zhou ◽

Séamus Lawless ◽

Xuan Wu ◽

Wenyu Zhao ◽

Jianxun Liu

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

User Profile ◽

User Profiles ◽

Content Type ◽

Cross Language Information Retrieval ◽

Cross Lingual ◽

Cross Language ◽

Representation Techniques ◽

Comprehensive Study

Purpose – With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.

Design/methodology/approach – The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.

Findings – Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level.

Originality/value – Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap by a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology to ensure repeatable and controlled experiments can be conducted.

How to Parse Low-Resource Languages: Cross-Lingual Parsing, Target Language Annotation, or Both?

10.18653/v1/w19-7713 ◽

2019 ◽

Author(s):

Ailsa Meechan-Maddon ◽

Joakim Nivre

Keyword(s):

Target Language ◽

Low Resource ◽

Cross Lingual
