multilingual information retrieval
Recently Published Documents



Author(s):  
Joseph P. Telemala ◽  
Hussein Suleman

Polyglots habitually switch languages when searching for information on the Web. Studies in information retrieval (IR) and multilingual information retrieval (MLIR) suggest that the topic of search is part of the reason for such switching. Unlike survey-based studies, this study uses query and click-through logs. It exploits the querying and results-selection behaviour of Swahili MLIR system users to explore how the topic of a search (query) is associated with language preferences, referred to here as topic-language preferences. This article is based on a carefully controlled study of Swahili-speaking Web users in Tanzania who interacted with a guided multilingual search engine. Statistical analysis of the query and click-through logs revealed that language preferences may be associated with the topics of search. The results also suggest that language preferences are not static; they vary over the course of a Web search, from query to results selection. For most topics, users either had no significant language preference or preferred to query in Kiswahili and then shifted to either English or no language preference when selecting (clicking on) results. The findings might give researchers more insight into developing better MLIR systems that support certain types of users in certain scenarios.


Author(s):  
Jasmina Yen Min Khaw et al.

Parallel text corpora are essential resources, especially in translation and multilingual information retrieval. However, the publicly available parallel text corpora are limited to certain types and domains. Moreover, Malay dialects are not standardized in terms of writing, and the existing alignment algorithms used to analyze such writing require a large amount of training data to obtain good results. First, the paper describes our methodology for acquiring a parallel text corpus of Standard Malay (SM) and Malay dialects, particularly Kelantan Malay and Sarawak Malay. Second, we propose a hybrid of distance-based and statistical-based alignment algorithms to align words and phrases of the parallel text. The proposed approach has better precision and recall than the state-of-the-art GIZA++. The alignments obtained were also compared to identify the lexical similarities and differences between SM and the two dialects.
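To make the distance-based alignment and the precision/recall evaluation mentioned in the abstract above more concrete, here is a minimal sketch. It is not the authors' algorithm: it covers only a simplified distance-based component (normalized edit distance with a threshold) and omits the statistical part entirely. The function names, the 0.5 threshold, the toy word lists and the gold alignment are all assumptions made for illustration and are not taken from the paper's corpus.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def align_by_distance(source: List[str], target: List[str],
                      threshold: float = 0.5) -> Set[Link]:
    """Link each source word to its most similar target word,
    keeping the link only if the similarity reaches the threshold
    (the threshold value is an assumption)."""
    links: Set[Link] = set()
    for i, s in enumerate(source):
        scores = [similarity(s, t) for t in target]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            links.add((i, j))
    return links


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Precision and recall of predicted links against a gold alignment."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy word lists and gold alignment, hypothetical and for illustration only.
standard = ["saya", "makan", "ikan"]
dialect = ["kawe", "make", "ike"]
gold = {(0, 0), (1, 1), (2, 2)}

predicted = align_by_distance(standard, dialect)
precision, recall = precision_recall(predicted, gold)
print(sorted(predicted), precision, recall)
```

In this toy case the first word pair shares too few characters to pass the threshold, so a purely distance-based aligner misses it. That kind of gap is one plausible motivation for combining string distance with statistical evidence, as the abstract proposes.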


2020 ◽  
Vol 46 (6) ◽  
pp. 102258
Author(s):  
Zanab Safdar ◽  
Ruqia Safdar Bajwa ◽  
Shafiq Hussain ◽  
Haslinda Binti Abdullah ◽  
Kalsoom Safdar ◽  
...  

Author(s):  
Petya Osenova ◽  
Kiril Simov

The data-driven Bulgarian WordNet: BTBWN
The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a treebank manually annotated with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but the missing senses of a lemma have also been covered by exploring bigger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective and with respect to ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.


2017 ◽  
Vol 35 (3) ◽  
pp. 410-426 ◽  
Author(s):  
Li Si ◽  
Qiuyu Pan ◽  
Xiaozhe Zhuang

Purpose: This paper aims to understand users' information behaviours when they perform multilingual information retrieval. It also offers a reference for the development of multilingual information retrieval systems and relevant service platforms.
Design/methodology/approach: The authors designed an experiment on multilingual information retrieval with WorldWideScience, used Camtasia Studio 7 (a screen capturing and recording tool) to record the subjects' overall operational processes and collected participants' thought processes with think-aloud protocols. A questionnaire survey and interviews were also used to examine the subjects' background information, their feelings about the experiment and their ideas about the experimental platform. Thirty-two valid data points were obtained from 41 subjects.
Findings: The users preferred their own language for retrieval. Most users from the social sciences chose general search or advanced search freely according to the tasks. The majority of the participants selected key words directly from the tasks as search terms. Doctoral candidates were more likely to construct a search query with logic symbols. Translation tools were used to assist retrieval and resolve doubts about translation. When facing obstacles, users most often stayed on the original web page and continued exploring; the next most common response was returning to the homepage.
Originality/value: This paper provides a study of user behaviour by investigating how users behave across the whole process of retrieving multilingual information. The findings offer advice for optimizing the functions of multilingual information retrieval systems and service platforms.

