Building CLIA for Resource-Scarce African Languages

Author(s):  
Kula Kekeba Tune ◽  
Vasudeva Varma

Because most major search engines and commercial Information Retrieval (IR) systems are designed primarily for well-resourced European and Asian languages, little attention has been paid to developing Cross-Language Information Access (CLIA) technologies for resource-scarce African languages. This paper presents the authors' experience in building CLIA for indigenous African languages, with a special focus on the development and evaluation of Oromo-English Cross-Language Information Retrieval (CLIR). The authors adopted a knowledge-based query translation approach to design and implement their initial Oromo-English CLIR system (OMEN-CLIR). Apart from designing and building the first OMEN-CLIR from scratch, another major contribution of this study is the assessment of the proposed retrieval system at a well-recognized international evaluation campaign, the Cross-Language Evaluation Forum (CLEF). The overall performance of OMEN-CLIR was found to be very promising, given the limited linguistic resources available for severely under-resourced African languages like Afaan Oromo.

2015 ◽  
Vol 5 (1) ◽  
pp. 48-67


Author(s):  
Vasudeva Varma ◽  
Aditya Mogadala

In this chapter, the authors begin by highlighting the importance of the Cross-Lingual and Multilingual Information Retrieval and Access research areas. They then discuss the distinctions between Cross Language Information Retrieval (CLIR), Multilingual Information Retrieval (MLIR), Cross Language Information Access (CLIA), and Multilingual Information Access (MLIA). Later sections outline the issues and challenges in these areas and discuss various approaches to multilingual information access, including machine learning-based and knowledge-based approaches. Drawing on their experience of building an MLIA system, the authors describe its architecture and its subsystems, ranging from query processing to output generation. Finally, evaluation aspects of MLIA and CLIA systems are discussed.


Author(s):  
Juncal Gutiérrez-Artacho ◽  
María-Dolores Olvera-Lobo

Within the sphere of the Web, the overload of information is more notable than in other contexts. Question answering systems (QAS) are presented as an alternative to traditional Information Retrieval (IR) systems, seeking to offer precise and understandable answers to factual questions instead of showing the user a list of documents related to a given search. Given that QAS are presented as a substantial advance in the improvement of IR, it becomes necessary to determine their effectiveness for the final user. With this aim, seven studies were undertaken to evaluate: a) in the first two, the linguistic resources and tools used in these systems for multilingual retrieval (Research 1, Research 2); and b) the performance and quality of the answers of the main monolingual and multilingual QAS, both general-domain and specialized-domain, on the Web in response to different types of questions and subjects, so that different evaluation measures can be applied (Research 3, Research 4, Research 5, Research 6, Research 7).


Author(s):  
Juncal Gutiérrez-Artacho ◽  
María-Dolores Olvera-Lobo

Within the sphere of the web, the overload of information is more notable than in other contexts. Question answering systems (QAS) are presented as an alternative to traditional information retrieval (IR) systems, seeking to offer precise and understandable answers to factual questions instead of showing the user a list of documents related to a given search. Given that QAS are presented as a substantial advance in the improvement of IR, it becomes necessary to determine their effectiveness for the final user. With this aim, seven studies were undertaken to evaluate: 1) in the first two, the linguistic resources and tools used in these systems for multilingual retrieval (Research 1, Research 2), and 2) the performance and quality of the answers of the main monolingual and multilingual QAS, both general-domain and specialized-domain, on the web in response to different types of questions and subjects, so that different evaluation measures can be applied (Research 3, Research 4, Research 5, Research 6, Research 7).


2003 ◽  
Vol 29 (3) ◽  
pp. 381-419 ◽  
Author(s):  
Wessel Kraaij ◽  
Jian-Yun Nie ◽  
Michel Simard

Although more and more language pairs are covered by machine translation (MT) services, there are still many pairs that lack translation resources. Cross-language information retrieval (CLIR) is an application that needs translation functionality of a relatively low level of sophistication, since current models for information retrieval (IR) are still based on a bag of words. The Web provides a vast resource for the automatic construction of parallel corpora that can be used to train statistical translation models automatically. The resulting translation models can be embedded in several ways in a retrieval model. In this article, we will investigate the problem of automatically mining parallel texts from the Web and different ways of integrating the translation models within the retrieval process. Our experiments on standard test collections for CLIR show that the Web-based translation models can surpass commercial MT systems in CLIR tasks. These results open the perspective of constructing a fully automatic query translation device for CLIR at a very low cost.
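
As a rough illustration of how a translation model can feed a bag-of-words retrieval model, the sketch below expands each source-language query term into weighted target-language terms and scores documents with the weighted query. It is a simplified example, not the models evaluated in the article: the table `TRANSLATION_PROBS`, its probabilities, and the functions `translate_query` and `score` are all hypothetical names and values chosen for illustration.

```python
from collections import Counter

# Hypothetical toy translation table P(target | source).
# The probabilities are invented for illustration, not mined from any corpus.
TRANSLATION_PROBS = {
    "chat": {"cat": 0.9, "chat": 0.1},
    "noir": {"black": 0.8, "dark": 0.2},
}


def translate_query(query_terms, table):
    """Map source-language terms to a weighted bag of target-language terms."""
    weights = Counter()
    for term in query_terms:
        # Terms missing from the table are passed through unchanged (weight 1.0).
        for target, prob in table.get(term, {term: 1.0}).items():
            weights[target] += prob
    return dict(weights)


def score(document_terms, weighted_query):
    """Score a document as the sum of query weight times term frequency."""
    term_freq = Counter(document_terms)
    return sum(weight * term_freq[term] for term, weight in weighted_query.items())


if __name__ == "__main__":
    query = translate_query(["chat", "noir"], TRANSLATION_PROBS)
    documents = {
        "d1": ["black", "cat", "cat"],
        "d2": ["dark", "night"],
    }
    # Rank documents by descending score under the translated query.
    for doc_id in sorted(documents, key=lambda d: score(documents[d], query), reverse=True):
        print(doc_id, score(documents[doc_id], query))
```

Keeping every candidate translation with its probability, rather than committing to a single best translation, is one simple way to exploit the fact that a bag-of-words model needs only term-level translation.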

