Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems

Relevance judgment plays an extremely significant role in information retrieval. This study investigates the differences between American users and Chinese users in relevance judgment during the information retrieval process. 384 sets of relevance scores with 50 scores in each set were collected from 16 American users and 16 Chinese users as they judged retrieval records from two major search engines based on 24 predefined search tasks from 4 domain categories. Statistical analyses reveal that there are significant differences between American assessors and Chinese assessors in relevance judgments. Significant gender differences also appear within both the American and the Chinese assessor groups. The study also revealed significant interactions among cultures, genders, and subject categories. These findings can enhance the understanding of cultural impact on information retrieval and can assist in the design of effective cross-language information retrieval systems.

Comparative Evaluation of Cross-language Information Retrieval Systems

Lecture Notes in Computer Science - From Integrated Publication and Information Systems to Information and Knowledge Environments ◽

10.1007/978-3-540-31842-2_16 ◽

2005 ◽

pp. 152-161 ◽

Cited By ~ 1

Author(s):

Carol Peters

Keyword(s):

Information Retrieval ◽

Comparative Evaluation ◽

Retrieval Systems ◽

Cross Language Information Retrieval ◽

Information Retrieval Systems ◽

Cross Language

Cross Language Information Retrieval (CLIR): A Survey of Approaches for Exploring Web Across Languages

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k7833.1110120 ◽

2020 ◽

Vol 10 (1) ◽

pp. 326-332

Keyword(s):

Information Retrieval ◽

The Internet ◽

Native Languages ◽

Retrieval Systems ◽

Private Organizations ◽

Almost Everywhere ◽

Cross Language Information Retrieval ◽

Information Retrieval Systems ◽

Cross Language ◽

Monolingual Information Retrieval

In the era of globalization, the internet has become accessible and affordable, and it is now used almost everywhere by governments, private organizations, companies, and banks, as well as by individuals. It has empowered users to contribute to the creation of information on the web in their native languages, which has drastically increased the volume of web-accessible documents in languages other than English. This exponential growth of information on the internet poses several challenges for information retrieval systems. Most current monolingual information retrieval systems retrieve documents only in the language of the query, missing information in other languages that may be more relevant to the user. The need for multilingual information retrieval systems has given rise to research in Cross Language Information Retrieval (CLIR), which crosses language barriers and retrieves more relevant results from documents in different languages. This article reviews the motivation, issues, work, and challenges related to various CLIR approaches. Starting with the most fundamental translation approaches, it surveys the more advanced approaches that researchers in this domain have proposed for enhancing CLIR retrieval results.

Building Proximity Models for Cross Language Information Retrieval

Journal of Science and Technology Issue on Information and Communications Technology ◽

10.31130/jst.2015.5 ◽

2015 ◽

Vol 1 ◽

pp. 8

Author(s):

Lam Tung Giang ◽

Vo Trung Hung ◽

Huynh Cong Phap

Keyword(s):

Information Retrieval ◽

Information Systems ◽

Bag Of Words ◽

Retrieval Performance ◽

Average Precision ◽

Ranking Models ◽

Retrieval Systems ◽

Cross Language Information Retrieval ◽

Information Retrieval Systems ◽

Cross Language

In information retrieval systems, the proximity of query terms has been employed to let ranking models go beyond the "bag of words" assumption: it promotes the scores of documents in which the matched query terms are close to each other. In this article, we study the integration of proximity models into cross-language information retrieval systems. New proximity models are proposed and incorporated into existing cross-language information retrieval systems by combining the proximity score with the original score to re-rank retrieved documents. The experimental results show that the proposed models improve retrieval performance by 4%-7% in terms of Mean Average Precision.
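
The re-ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the inverse-distance proximity function, the linear interpolation, and the `weight` parameter are all assumptions chosen for illustration, as are the function names.

```python
from itertools import combinations


def min_pair_distance(doc_tokens, query_terms):
    """Smallest token distance between two distinct matched query terms, or None."""
    positions = {
        term: [i for i, tok in enumerate(doc_tokens) if tok == term]
        for term in set(query_terms)
    }
    best = None
    for a, b in combinations(sorted(positions), 2):
        for i in positions[a]:
            for j in positions[b]:
                d = abs(i - j)
                if best is None or d < best:
                    best = d
    return best


def proximity_score(doc_tokens, query_terms):
    """Assumed proximity function: 1 / distance, or 0.0 when no pair co-occurs."""
    d = min_pair_distance(doc_tokens, query_terms)
    return 0.0 if d is None else 1.0 / d


def rerank(results, query_terms, weight=0.3):
    """Re-rank (doc_id, original_score, doc_tokens) triples.

    The final score is an assumed linear mix of the original score
    and the proximity score.
    """
    rescored = [
        (doc_id, (1 - weight) * score + weight * proximity_score(tokens, query_terms))
        for doc_id, score, tokens in results
    ]
    return sorted(rescored, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    results = [
        ("d1", 0.50, "cross language retrieval".split()),
        ("d2", 0.55, "cross the river then study language".split()),
    ]
    for doc_id, score in rerank(results, ["cross", "language"]):
        print(doc_id, round(score, 3))
```

In this toy run the document with adjacent query terms overtakes one that had a slightly higher original score, which is the effect the abstract describes.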

Embedded Fuzzy Bilingual Dictionary model for cross language information retrieval systems

International Journal of Information Technology ◽

10.1007/s41870-018-0181-5 ◽

2018 ◽

Vol 10 (4) ◽

pp. 457-463 ◽

Cited By ~ 2

Author(s):

Olufade F. W. Onifade ◽

Ayodeji O. J. Ibitoye ◽

Pabitra Mitra

Keyword(s):

Information Retrieval ◽

Bilingual Dictionary ◽

Retrieval Systems ◽

Cross Language Information Retrieval ◽

Information Retrieval Systems ◽

Cross Language

English-Marathi Cross Language Information Retrieval System

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.34 ◽

2017 ◽

Vol 7 (8) ◽

pp. 112

Author(s):

Kalyani Lokhande ◽

Dhanashree Tayade

Keyword(s):

Information Retrieval ◽

Retrieval System ◽

Information Retrieval System ◽

Indian Languages ◽

Query Translation ◽

Retrieval Systems ◽

Cross Language Information Retrieval ◽

Different Types ◽

Information Retrieval Systems ◽

Cross Language

Nowadays, different types of content in different languages are available on the World Wide Web, and their usage is increasing rapidly. Cross Language Information Retrieval (CLIR) deals with retrieving documents written in a language other than that of the query. Various researchers have worked on CLIR systems for Indian languages using different translation approaches. A CLIR system that lets a user retrieve Marathi documents from an English query has yet to be developed. The proposed English-to-Marathi CLIR system is based on the query translation approach and retrieves Marathi documents according to the terms that match the query. Its performance is improved by query pre-processing and by query expansion using WordNet.
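
A query-translation pipeline of the kind this abstract outlines (pre-processing, expansion, translation, term matching) might look roughly like the sketch below. Everything in it is a hypothetical stand-in: the stopword list, the synonym table (used here in place of WordNet), the bilingual dictionary, and the transliterated placeholder tokens are illustrative only and are not taken from the paper.

```python
def normalize_query(query, stopwords):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in query.lower().split() if t not in stopwords]


def expand_terms(terms, synonyms):
    """Add synonyms after each term, keeping order and skipping duplicates."""
    expanded = []
    for term in terms:
        for cand in [term] + synonyms.get(term, []):
            if cand not in expanded:
                expanded.append(cand)
    return expanded


def translate_terms(terms, bilingual):
    """Replace each source term by all of its dictionary translations."""
    translated = []
    for term in terms:
        for target in bilingual.get(term, []):
            if target not in translated:
                translated.append(target)
    return translated


def retrieve(target_terms, docs):
    """Return ids of documents with at least one matching term, best match first."""
    counts = {}
    for doc_id, text in docs.items():
        tokens = set(text.split())
        n = sum(1 for t in target_terms if t in tokens)
        if n:
            counts[doc_id] = n
    return sorted(counts, key=lambda d: counts[d], reverse=True)


if __name__ == "__main__":
    # Toy, hypothetical resources; target-side tokens are placeholders.
    stopwords = {"the", "of"}
    synonyms = {"school": ["academy"]}
    bilingual = {"school": ["shala"], "academy": ["vidyalay"], "water": ["pani", "jal"]}
    docs = {"m1": "shala aaj band aahe", "m2": "pani ani jal", "m3": "vidyalay shala"}

    terms = normalize_query("the school", stopwords)
    target = translate_terms(expand_terms(terms, synonyms), bilingual)
    print(retrieve(target, docs))
```

Expanding before translating means a document can match through a synonym's translation even when the literal translation of the query term is absent.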

Comparative Analysis of Information Retrieval Models on Quran dataset in Cross-Language Information Retrieval systems

IEEE Access ◽

10.1109/access.2021.3126168 ◽

2021 ◽

pp. 1-1

Author(s):

Ayman A. Taan ◽

Shafiq Ur Rehman Khan ◽

Ali Raza ◽

Ayaz Muhammad Hanif ◽

Hira Anwar

Keyword(s):

Information Retrieval ◽

Comparative Analysis ◽

Retrieval Models ◽

Retrieval Systems ◽

Cross Language Information Retrieval ◽

Information Retrieval Systems ◽

Cross Language

Evaluation of Cross-Language Information Retrieval Systems

10.1007/3-540-45691-0 ◽

2002 ◽

Cited By ~ 10

Keyword(s):

Information Retrieval ◽

Retrieval Systems ◽

Cross Language Information Retrieval ◽

Information Retrieval Systems ◽

Cross Language

Disambiguation of single noun translations extracted from bilingual comparable corpora

Terminology ◽

10.1075/term.7.1.06nak ◽

2001 ◽

Vol 7 (1) ◽

pp. 63-83 ◽

Cited By ~ 4

Author(s):

Hiroshi Nakagawa

Keyword(s):

Information Retrieval ◽

Language Translation ◽

Target Language ◽

Comparable Corpora ◽

Source Language ◽

Bilingual Dictionary ◽

Second Stage ◽

Cross Language Information Retrieval ◽

Machine Readable ◽

Cross Language

Bilingual machine-readable dictionaries are indispensable resources for cross-language information retrieval and machine translation. Recently, these cross-language activities have begun to focus on specific academic or technological domains. In this paper, we describe a bilingual dictionary acquisition system that extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. The proposed method has two stages. In the first stage, candidate terms are extracted from a Japanese corpus and an English corpus and ranked according to their importance as terms. In the second stage, ambiguous translations are resolved by selecting the target language translation that is nearest in rank to the source language term. Finally, we evaluate the proposed method in an experiment.
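
The second-stage rule, picking the candidate translation whose rank is nearest to that of the source term, can be sketched as follows. The importance scores, the use of raw 1-based ranks without normalizing for corpus size, and the romanized example terms are assumptions made for illustration and are not the paper's data or exact procedure.

```python
def rank_terms(scores):
    """Map each term to its 1-based rank by descending importance score."""
    ordered = sorted(scores, key=lambda t: scores[t], reverse=True)
    return {term: i + 1 for i, term in enumerate(ordered)}


def disambiguate(source_term, candidates, source_ranks, target_ranks):
    """Pick the candidate whose rank is nearest to the source term's rank.

    Returns None when the source term or all candidates are unranked.
    """
    known = [c for c in candidates if c in target_ranks]
    if source_term not in source_ranks or not known:
        return None
    src = source_ranks[source_term]
    return min(known, key=lambda c: abs(target_ranks[c] - src))


if __name__ == "__main__":
    # Hypothetical importance scores for terms from each corpus.
    ja_scores = {"kensaku": 9.0, "gengo": 5.0, "jisho": 1.0}
    en_scores = {"retrieval": 8.0, "language": 6.0, "search": 2.5, "dictionary": 1.0}

    ja_ranks = rank_terms(ja_scores)
    en_ranks = rank_terms(en_scores)
    print(disambiguate("kensaku", ["search", "retrieval"], ja_ranks, en_ranks))
```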

Effective preprocessing based neural machine translation for English to Telugu cross-language information retrieval

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i2.pp306-315 ◽

2021 ◽

Vol 10 (2) ◽

pp. 306

Author(s):

B. N. V. Narasimha Raju ◽

M. S. V. S. Bhadri Raju ◽

K. V. V. Satyanarayana

Keyword(s):

Information Retrieval ◽

Machine Translation ◽

Query Language ◽

Target Language ◽

Main Concern ◽

Neural Machine Translation ◽

Parallel Corpus ◽

User Query ◽

Cross Language Information Retrieval ◽

Cross Language

In cross-language information retrieval (CLIR), neural machine translation (NMT) plays a vital role. CLIR retrieves information written in a language different from the user's query language, so the main concern is translating the user query from the source language to the target language. NMT has shown better accuracy for language pairs such as English to German. In this paper, NMT is applied to translating English into Indian languages, especially Telugu. Besides NMT, an effort is also made to improve accuracy by applying an effective preprocessing mechanism; its contribution to accuracy is small but measurable. Machine translation (MT) is a data-driven approach that takes a parallel corpus as input, and NMT requires a massive parallel corpus to perform translation. Building an English-Telugu parallel corpus is costly because the language pair is resource-poor. Different mechanisms are available for preparing the parallel corpus. The major issue in preparing it is data replication, which is handled during preprocessing. The other issue in machine translation is the out-of-vocabulary (OOV) problem. Earlier approaches used dictionaries to handle OOV words; here, rare words are instead segmented into sequences of subwords during preprocessing. Metrics such as accuracy, perplexity, cross-entropy, and BLEU score show better translation quality for NMT with effective preprocessing.
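
To make the subword step concrete, here is a minimal sketch of segmenting rare words into subwords. The abstract does not say which segmentation algorithm is used, so this substitutes a simple greedy longest-match segmenter over a fixed subword vocabulary; the vocabularies, the `<unk>` fallback, and the function names are assumptions.

```python
def segment(word, subword_vocab, unk="<unk>"):
    """Greedy longest-match split of a word into known subwords.

    Falls back to a single unknown token if some position cannot be matched.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in subword_vocab:
            end -= 1
        if end == start:
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces


def segment_rare_words(sentence, word_vocab, subword_vocab):
    """Keep in-vocabulary words whole and split every other word into subwords."""
    out = []
    for word in sentence.split():
        if word in word_vocab:
            out.append(word)
        else:
            out.extend(segment(word, subword_vocab))
    return out


if __name__ == "__main__":
    word_vocab = {"the", "code"}
    subword_vocab = {"un", "break", "able"}
    print(segment_rare_words("the unbreakable code", word_vocab, subword_vocab))
```

Because every rare word is rewritten as pieces the model has seen during training, the translation model no longer meets a token it has no representation for, except in the fallback case.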
