Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems

Author(s):  
Lam Tung Giang ◽  
Vo Trung Hung ◽  
Huynh Cong Phap ◽  
2020 ◽  
Vol 28 (3) ◽  
pp. 148-168
Author(s):  
Jin Zhang ◽  
Yuehua Zhao ◽  
Xin Cai ◽  
Taowen Le ◽  
Wei Fei ◽  
...  

Relevance judgment plays an extremely significant role in information retrieval. This study investigates the differences between American users and Chinese users in relevance judgment during the information retrieval process. 384 sets of relevance scores with 50 scores in each set were collected from 16 American users and 16 Chinese users as they judged retrieval records from two major search engines based on 24 predefined search tasks from 4 domain categories. Statistical analyses reveal that there are significant differences between American assessors and Chinese assessors in relevance judgments. Significant gender differences also appear within both the American and the Chinese assessor groups. The study also revealed significant interactions among cultures, genders, and subject categories. These findings can enhance the understanding of cultural impact on information retrieval and can assist in the design of effective cross-language information retrieval systems.


In the era of globalization, the internet has become accessible and affordable, and it is now widely used by governments, private organizations, companies, banks, and individuals. It has empowered users to create information on the web in their native languages, which has drastically increased the volume of web-accessible documents in languages other than English. This exponential growth of information has also posed several challenges for information retrieval systems. Most current monolingual information retrieval systems can retrieve documents only in the language of the query, missing information in other languages that may be more relevant to the user. The need for multilingual information retrieval has given rise to research in Cross Language Information Retrieval (CLIR), which crosses language barriers and retrieves more relevant results from documents in different languages. This article reviews the motivation, issues, work, and challenges related to various CLIR approaches. Starting with the most fundamental translation approaches, it reviews the more advanced approaches that researchers in this domain have proposed for enhancing retrieval results in CLIR.


Author(s):  
Lam Tung Giang ◽  
Vo Trung Hung ◽  
Huynh Cong Phap

In information retrieval systems, the proximity of query terms has been employed to enable ranking models to go beyond the "bag of words" assumption: it can promote the scores of documents in which the matched query terms are close to each other. In this article, we study the integration of proximity models into cross-language information retrieval systems. New proximity models are proposed and incorporated into existing cross-language information retrieval systems by combining the proximity score with the original score to re-rank retrieved documents. The experimental results show that the proposed models improve retrieval performance by 4%-7% in terms of Mean Average Precision.
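
The re-ranking idea in the abstract above can be illustrated with a minimal sketch. The abstract does not give the authors' proximity formulas, so everything here is an assumption: the proximity measure (inverse of the smallest distance between two different matched query terms), the linear interpolation weight `alpha`, and the function names are all hypothetical.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, query_terms):
    """Smallest token distance between two different matched query terms.

    Returns None when fewer than two distinct query terms occur in the document.
    """
    positions = {}
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    if len(positions) < 2:
        return None
    return min(
        abs(i - j)
        for a, b in combinations(positions, 2)
        for i in positions[a]
        for j in positions[b]
    )


def proximity_score(doc_tokens, query_terms):
    """Assumed proximity measure: 1 / minimum pair distance, or 0 if undefined."""
    d = min_pair_distance(doc_tokens, query_terms)
    return 0.0 if d is None else 1.0 / d


def rerank(results, query_terms, alpha=0.3):
    """Combine original and proximity scores, then sort by the combined score.

    results: list of (doc_id, original_score, doc_tokens) tuples.
    alpha:   hypothetical interpolation weight for the proximity score.
    """
    rescored = [
        (doc_id, (1 - alpha) * score + alpha * proximity_score(tokens, query_terms))
        for doc_id, score, tokens in results
    ]
    return sorted(rescored, key=lambda r: r[1], reverse=True)
```

With this sketch, a document whose matched terms are adjacent can overtake a document with a slightly higher original score but widely separated matches. In practice the weight would need tuning on held-out queries.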


Author(s):  
Kalyani Lokhande ◽  
Dhanashree Tayade

Nowadays, content of many types and in many languages is available on the World Wide Web, and its usage is increasing rapidly. Cross Language Information Retrieval (CLIR) deals with the retrieval of documents written in a language other than that of the query. Various researchers have worked on CLIR systems for Indian languages using different translation approaches. However, a CLIR system that allows a user to retrieve Marathi documents from an English query has yet to be developed. In the proposed English to Marathi Cross Language Information Retrieval system, translation follows the query translation approach. The proposed system retrieves Marathi documents based on terms that match the query. The performance of the proposed system is improved by query pre-processing and query expansion using WordNet.


Terminology ◽  
2001 ◽  
Vol 7 (1) ◽  
pp. 63-83 ◽  
Author(s):  
Hiroshi Nakagawa

Bilingual machine-readable dictionaries are important and indispensable information resources for cross-language information retrieval and machine translation. Recently, these cross-language activities have begun to focus on specific academic or technological domains. In this paper, we describe a bilingual dictionary acquisition system which extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. The proposed method has two stages. In the first stage, candidate terms are extracted from a Japanese corpus and an English corpus, respectively, and ranked according to their importance as terms. In the second stage, ambiguous translations are resolved by selecting the target language translation which is nearest in rank to the source language term. Finally, we evaluate the proposed method in an experiment.
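
The second-stage selection rule in the abstract above (choose the translation nearest in rank to the source term) can be sketched as follows. This is only an illustration under assumptions: the abstract does not say how ranks are computed or how ties are broken, and the function name, the dictionary-of-ranks representation, and the example terms are hypothetical.

```python
def disambiguate(source_rank, candidates, target_ranks):
    """Pick the candidate translation whose target-corpus rank is closest to source_rank.

    source_rank:  rank of the source-language term in its own corpus (1 = most important).
    candidates:   possible translations from a seed dictionary.
    target_ranks: mapping from target-language term to its rank in the target corpus.
    Candidates that were not extracted from the target corpus are ignored.
    Ties are broken by candidate order (an assumption, not stated in the abstract).
    """
    ranked = [c for c in candidates if c in target_ranks]
    if not ranked:
        return None
    return min(ranked, key=lambda c: abs(target_ranks[c] - source_rank))


# Hypothetical example: a source term ranked 3rd with two candidate translations.
target_ranks = {"syntax": 5, "grammar": 40}
best = disambiguate(3, ["grammar", "syntax"], target_ranks)
```

Here `best` is `"syntax"`, because its rank (5) is closer to the source rank (3) than the rank of `"grammar"` (40).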


Author(s):  
B. N. V. Narasimha Raju ◽  
M. S. V. S. Bhadri Raju ◽  
K. V. V. Satyanarayana

In cross-language information retrieval (CLIR), neural machine translation (NMT) plays a vital role. CLIR retrieves information written in a language different from the user's query language. In CLIR, the main concern is translating the user query from the source language to the target language. NMT is useful for translating data from one language to another, and it achieves good accuracy for language pairs such as English to German. In this paper, NMT is applied to translating English into Indian languages, especially Telugu. Besides NMT, an effort is also made to improve accuracy by applying an effective preprocessing mechanism. The contribution of preprocessing to accuracy is small but measurable. Machine translation (MT) is a data-driven approach in which a parallel corpus acts as the input. NMT requires a massive parallel corpus to perform translation. Building an English-Telugu parallel corpus is costly because parallel resources for this language pair are scarce. Different mechanisms are available for preparing the parallel corpus. The major issue in preparing a parallel corpus is data replication, which is handled during preprocessing. The other issue in machine translation is the out-of-vocabulary (OOV) problem. Earlier approaches used dictionaries to handle OOV words. To overcome this problem, rare words are segmented into sequences of subwords during preprocessing. Measures such as accuracy, perplexity, cross-entropy, and BLEU score show better translation quality for NMT with effective preprocessing.
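
The subword segmentation mentioned in the abstract above can be illustrated with a small sketch. The abstract does not name the segmentation algorithm, so this uses a simple greedy longest-match against a given subword inventory purely as an assumption; the function name and the example inventory are hypothetical, and the paper's actual preprocessing may differ (for example, by learning subword units from the corpus).

```python
def segment(word, subwords):
    """Greedily split a word into the longest known subwords, left to right.

    Characters not covered by any known subword are emitted as single-character
    pieces, so every input word can be represented without an OOV token.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first.
        for j in range(len(word), i, -1):
            if word[i:j] in subwords:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known subword starts here: fall back to a single character.
            pieces.append(word[i])
            i += 1
    return pieces


# Hypothetical subword inventory for illustration only.
inventory = {"un", "break", "able"}
pieces = segment("unbreakable", inventory)
```

With this inventory, `pieces` is `["un", "break", "able"]`, so a rare word is represented by units the translation model has already seen.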

