Disambiguation of single noun translations extracted from bilingual comparable corpora

Bilingual machine-readable dictionaries are indispensable information resources for cross-language information retrieval and machine translation. Recently, these cross-language information activities have begun to focus on specific academic or technological domains. In this paper, we describe a bilingual dictionary acquisition system that extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. The proposed method has two stages. In the first stage, candidate terms are extracted from the Japanese and English corpora, respectively, and ranked according to their importance as terms. In the second stage, each ambiguous translation is resolved by selecting the target-language candidate whose rank is nearest to that of the source-language term. Finally, we evaluate the proposed method experimentally.
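The two-stage method can be sketched roughly as follows. This is not the authors' implementation. The abstract does not say how term importance is scored, so plain corpus frequency stands in for it here. The ambiguous candidate translations are assumed to be given, for example from a seed bilingual dictionary. The function names `rank_terms` and `disambiguate` are hypothetical.

```python
from __future__ import annotations
from collections import Counter

def rank_terms(terms: list[str]) -> dict[str, int]:
    """Stage 1 (assumed scoring): rank candidate terms by corpus frequency.

    Rank 1 is the most important term. Ties are broken alphabetically so the
    ranking is deterministic. A real system would use a proper term-importance measure.
    """
    counts = Counter(terms)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {term: rank for rank, term in enumerate(ordered, start=1)}

def disambiguate(source_term: str, candidates: list[str],
                 source_ranks: dict[str, int],
                 target_ranks: dict[str, int]) -> str | None:
    """Stage 2: choose the candidate whose target-corpus rank is nearest to the
    source term's rank. Return None if no candidate occurs in the target corpus."""
    src_rank = source_ranks[source_term]
    attested = [c for c in candidates if c in target_ranks]
    if not attested:
        return None
    # The smallest rank difference wins; ties fall back to alphabetical order.
    return min(attested, key=lambda c: (abs(target_ranks[c] - src_rank), c))

# Toy data (invented for illustration): 検索 is rank 1 in the Japanese list
# and has two dictionary translations.
ja_ranks = rank_terms(["検索"] * 5 + ["辞書"] * 3 + ["コーパス"] * 2)
en_ranks = rank_terms(["retrieval"] * 6 + ["dictionary"] * 4 + ["corpus"] * 3 + ["search"])
print(disambiguate("検索", ["search", "retrieval"], ja_ranks, en_ranks))  # retrieval
```

The sketch compares raw ranks directly, as the abstract describes. When the two corpora yield term lists of very different lengths, normalising each rank by its list length may be a reasonable variant.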

Japanese-English Cross Language Information Retrieval based on Comparable Corpora and Bilingual Dictionary

Journal of Natural Language Processing, Vol 5 (4), pp. 77-93, 1998
DOI: 10.5715/jnlp.5.4_77

Author(s): Akitoshi Okumura, Kai Ishikawa, Kenji Satoh

Keyword(s): Information Retrieval; Comparable Corpora; Bilingual Dictionary; Cross Language Information Retrieval; Cross Language; Japanese English

Learning bilingual translations from comparable corpora to cross-language information retrieval

2003
DOI: 10.3115/1118935.1118943
Cited by ~12

Author(s): Fatiha Sadat, Masatoshi Yoshikawa, Shunsuke Uemura

Keyword(s): Information Retrieval; Comparable Corpora; Cross Language Information Retrieval; Cross Language

Experiments on Cross-Language Information Retrieval Using Comparable Corpora of Chinese, Japanese, and Korean Languages

Evaluating Information Retrieval and Access Tasks (The Information Retrieval Series), pp. 21-37, 2020
DOI: 10.1007/978-981-15-5554-1_2

Author(s): Kazuaki Kishida, Kuang-hua Chen

Keyword(s): Information Retrieval; Comparable Corpora; Cross Language Information Retrieval; Cross Language

Effective Arabic-English cross-language information retrieval via machine-readable dictionaries and machine translation

Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM '01), 2001
DOI: 10.1145/502585.502635
Cited by ~8

Author(s): Mohammed Aljlayl, Ophir Frieder

Keyword(s): Information Retrieval; Machine Translation; Cross Language Information Retrieval; Machine Readable; Cross Language

Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems

International Journal of Engineering Research and
Vol V4 (04), 2015
DOI: 10.17577/ijertv4is040317

Author(s): Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap

Keyword(s): Information Retrieval; Target Language; Retrieval Systems; Cross Language Information Retrieval; Information Retrieval Systems; Cross Language

Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework

Information Processing & Management, Vol 52 (2), pp. 299-318, 2016
DOI: 10.1016/j.ipm.2015.08.001
Cited by ~12

Author(s): Razieh Rahimi, Azadeh Shakery, Irwin King

Keyword(s): Information Retrieval; Language Modeling; Modeling Framework; Comparable Corpora; Cross Language Information Retrieval; Cross Language

Effects of Comparable Corpora on Cross-Language Information Retrieval

Proceedings of the 7th International Workshop on Natural Language Processing and Cognitive Science, 2010
DOI: 10.5220/0003029200530059

Keyword(s): Information Retrieval; Comparable Corpora; Cross Language Information Retrieval; Cross Language

Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora

Information Retrieval, Vol 16 (3), pp. 331-368, 2012
DOI: 10.1007/s10791-012-9200-5
Cited by ~26

Author(s): Ivan Vulić, Wim De Smet, Marie-Francine Moens

Keyword(s): Information Retrieval; Topic Models; Retrieval Models; Comparable Corpora; Cross Language Information Retrieval; Latent Topic; Cross Language

Using Comparable Corpora to Improve the Effectiveness of Cross-Language Information Retrieval

Advances in Natural Language Processing (Lecture Notes in Computer Science), pp. 320-331, 2010
DOI: 10.1007/978-3-642-14770-8_36
Cited by ~1

Author(s): Fatiha Sadat

Keyword(s): Information Retrieval; Comparable Corpora; Cross Language Information Retrieval; Cross Language

Effective preprocessing based neural machine translation for English to Telugu cross-language information retrieval

IAES International Journal of Artificial Intelligence (IJ-AI), Vol 10 (2), pp. 306-315, 2021
DOI: 10.11591/ijai.v10.i2.pp306-315

Author(s): B. N. V. Narasimha Raju, M. S. V. S. Bhadri Raju, K. V. V. Satyanarayana

Keyword(s): Information Retrieval; Machine Translation; Query Language; Target Language; Main Concern; Neural Machine Translation; Parallel Corpus; User Query; Cross Language Information Retrieval; Cross Language

Neural machine translation (NMT) plays a vital role in cross-language information retrieval (CLIR). CLIR retrieves information written in a language different from the language of the user's query. Its main concern is therefore translating the user query from the source language into the target language. NMT translates text from one language to another and achieves high accuracy for language pairs such as English to German. In this paper, NMT is applied to translation from English into Indian languages, specifically Telugu. Beyond NMT itself, we also try to improve accuracy by applying an effective preprocessing mechanism. The gain from preprocessing is modest but measurable. Machine translation (MT) is data-driven: a parallel corpus serves as its input, and NMT requires a massive parallel corpus to perform translation. Building an English-Telugu parallel corpus is costly because the language pair is resource-poor. Several mechanisms exist for preparing a parallel corpus. The major issue in doing so is data replication (duplicate data), which is handled during preprocessing. The other issue in machine translation is the out-of-vocabulary (OOV) problem, which earlier systems handled with dictionaries. Here, rare words are instead segmented into sequences of subwords during preprocessing. Accuracy, perplexity, cross-entropy, and BLEU scores all show better translation quality for NMT with effective preprocessing.
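The two preprocessing steps named in the abstract, removing replicated data and splitting rare words into subwords, can be illustrated with a small sketch. Two choices here are assumptions, since the abstract names neither technique. "Data replication" is read as duplicate sentence pairs. Byte-pair encoding (BPE), a common subword method, stands in for the segmentation scheme. The functions `dedup_pairs`, `learn_bpe`, and `segment` are hypothetical and simplified, not the paper's pipeline.

```python
from __future__ import annotations
from collections import Counter

def dedup_pairs(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop duplicate (source, target) sentence pairs, keeping first-seen order.
    Whitespace is normalised first, so pairs differing only in spacing count as duplicates."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for src, tgt in pairs:
        key = (" ".join(src.split()), " ".join(tgt.split()))
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

def _merge(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with the concatenated symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merges (simplified): repeatedly merge the most frequent adjacent pair."""
    # Each word starts as its characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break
        # Highest count wins; ties are broken deterministically by the pair itself.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges

def segment(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split a (possibly unseen) word into subwords by applying merges in learned order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge(symbols, pair)
    return list(symbols)

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(segment("lowest", merges))  # ['low', 'e', 's', 't', '</w>']
```

Because the segmenter falls back to single characters, any word can be encoded without an OOV token. Production systems learn thousands of merges over the full training corpus rather than two.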
