UB at CLEF2004: Cross Language Information Retrieval Using Statistical Language Models

Author(s): Miguel E. Ruiz, Munirathnam Srikanth
2021, Vol 11 (5), pp. 7598-7604
Author(s): H. V. T. Chi, D. L. Anh, N. L. Thanh, D. Dinh

Paraphrase identification is a crucial task in natural language understanding, especially in cross-language information retrieval. The Multi-Task Deep Neural Network (MT-DNN) has recently achieved state-of-the-art results in paraphrase identification [1]. In this paper, we propose a method based on MT-DNN [2] to detect similarities between English and Vietnamese sentences. We replaced the shared layers of the original MT-DNN, which use BERT [3], with other pre-trained multilingual models such as M-BERT [3] or XLM-R [4], so that our model could work on cross-language (in our case, English and Vietnamese) information retrieval. We also added several tasks to further improve performance. As a result, accuracy and F1 increased by 2.3% and 2.5%, respectively. The proposed method was also applied to other language pairs, English–German and English–French, yielding improvements of 1.0%/0.7% for English–German and 0.7%/0.5% for English–French.
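The architectural idea in the abstract, a shared encoder reused by every task plus task-specific output layers, with the shared layers swappable (BERT, M-BERT, or XLM-R), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: training is omitted, and `MultiTaskModel`, `toy_overlap_encoder`, and the `"paraphrase"` task name are hypothetical. The toy encoder is a stand-in for a pre-trained multilingual transformer and, unlike one, only works when the two sentences share surface tokens.

```python
from typing import Callable, Dict, List

# A shared encoder maps a sentence pair to a feature vector. In MT-DNN this
# role is played by a pre-trained transformer; here it is any callable.
Encoder = Callable[[str, str], List[float]]
# A task-specific head maps the shared features to a label.
Head = Callable[[List[float]], int]


class MultiTaskModel:
    """Hypothetical sketch: one shared encoder, one output head per task."""

    def __init__(self, shared_encoder: Encoder) -> None:
        self.shared_encoder = shared_encoder
        self.heads: Dict[str, Head] = {}

    def add_task(self, name: str, head: Head) -> None:
        # Each task gets its own head; all tasks reuse the same encoder.
        self.heads[name] = head

    def predict(self, task: str, sent1: str, sent2: str) -> int:
        features = self.shared_encoder(sent1, sent2)  # shared layers
        return self.heads[task](features)             # task-specific layer


def toy_overlap_encoder(sent1: str, sent2: str) -> List[float]:
    # Hypothetical stand-in for a multilingual encoder: Jaccard token overlap.
    a, b = set(sent1.lower().split()), set(sent2.lower().split())
    return [len(a & b) / max(len(a | b), 1)]


model = MultiTaskModel(toy_overlap_encoder)
# Hypothetical paraphrase head: threshold the single overlap feature.
model.add_task("paraphrase", lambda f: int(f[0] >= 0.5))
```

Swapping the shared layers, as the abstract describes for M-BERT or XLM-R, amounts to assigning a different `shared_encoder` while the task heads stay registered; in a real system the heads would then be fine-tuned on the new encoder's features.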