English-Vietnamese Cross-Lingual Paraphrase Identification Using MT-DNN

2021 ◽ Vol 11 (5) ◽ pp. 7598-7604
Author(s): H. V. T. Chi, D. L. Anh, N. L. Thanh, D. Dinh

Paraphrase identification is a crucial task in natural language understanding, especially in cross-language information retrieval. The Multi-Task Deep Neural Network (MT-DNN) has become a state-of-the-art method, achieving outstanding results in paraphrase identification [1]. In this paper, we propose a method based on MT-DNN [2] to detect similarities between English and Vietnamese sentences. We replaced the shared layers of the original MT-DNN, originally BERT [3], with pre-trained multilingual models such as M-BERT [3] or XLM-R [4] so that our model can handle cross-language (in our case, English-Vietnamese) information retrieval. We also added auxiliary tasks as improvements to gain better results. As a result, we obtained increases of 2.3% in accuracy and 2.5% in F1. The proposed method was also applied to other language pairs such as English-German and English-French, yielding improvements of 1.0%/0.7% (accuracy/F1) for English-German and 0.7%/0.5% for English-French.
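The abstract includes no code; as a minimal sketch of the core idea (and not the authors' MT-DNN implementation), the snippet below swaps a monolingual encoder for a multilingual one, XLM-R, so that a sentence-pair classification head can score an English-Vietnamese pair. The model identifier is the standard Hugging Face one; the classification head is randomly initialized and would need fine-tuning on paraphrase data.

```python
# Minimal sketch: a multilingual encoder (XLM-R) with a sentence-pair
# classification head, scoring an English-Vietnamese candidate pair.
# The head is untrained here and would require fine-tuning before use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # paraphrase / not paraphrase
)

english = "The cat sat on the mat."
vietnamese = "Con mèo ngồi trên tấm thảm."

# Encode the cross-lingual pair as one sequence, as in standard
# sentence-pair fine-tuning.
inputs = tokenizer(english, vietnamese, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # [P(not paraphrase), P(paraphrase)]
```

Because XLM-R shares one subword vocabulary and encoder across roughly 100 languages, the same pair-encoding recipe carries over unchanged from the monolingual BERT setting.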

2016 ◽ Vol 68 (4) ◽ pp. 448-477
Author(s): Dong Zhou, Séamus Lawless, Xuan Wu, Wenyu Zhao, Jianxun Liu

Purpose – With an increase in the amount of multilingual content on the World Wide Web, users often strive to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion.
Design/methodology/approach – The user profiles consist of weighted terms computed using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods.
Findings – Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained at a cross-lingual level performed better than models trained at a monolingual level.
Originality/value – Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies at a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that ensures repeatable and controlled experiments can be conducted.
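As an illustration of the frequency-based profile idea (a sketch under our own assumptions, not the paper's code), the snippet below weights terms from a user's history with tf-idf and expands a new query with the profile's top-weighted terms; the paper's profiles additionally use BM25 and latent semantic models.

```python
# Sketch: build a tf-idf-weighted user profile from past documents,
# then expand a new query with the profile's top-weighted terms.
from sklearn.feature_extraction.text import TfidfVectorizer

history = [
    "neural machine translation for low resource languages",
    "cross language information retrieval with query translation",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(history)

# Aggregate tf-idf weights over the user's history into one profile vector.
profile = matrix.sum(axis=0).A1
terms = vectorizer.get_feature_names_out()
top_terms = [t for _, t in sorted(zip(profile, terms), reverse=True)[:3]]

query = "retrieval models"
expanded_query = query + " " + " ".join(top_terms)
print(expanded_query)
```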


2021 ◽ Vol 4
Author(s): Magnus Sahlgren, Fredrik Carlsson

This paper discusses the current critique of neural network-based Natural Language Understanding solutions known as language models. We argue that much of the current debate revolves around an argumentation error that we refer to as the singleton fallacy: the assumption that a concept (in this case, language, meaning, and understanding) refers to a single and uniform phenomenon, which in the current debate is assumed to be unobtainable by (current) language models. By contrast, we argue that positing some form of (mental) “unobtainium” as the definiens for understanding inevitably leads to a dualistic position, and that such a position is precisely the original motivation for developing distributional methods in computational linguistics. As such, we argue that language models present a theoretically (and practically) sound approach that is our current best bet for computers to achieve language understanding. This understanding must, however, be construed as a computational means to an end.


2021 ◽ Vol 6 (1) ◽ pp. 1-4
Author(s): Alexander MacLean, Alexander Wong

The introduction of Bidirectional Encoder Representations from Transformers (BERT) was a major breakthrough for transfer learning in natural language processing, enabling state-of-the-art performance across a large variety of complex language understanding tasks. In the realm of clinical language modeling, the advent of BERT led to the creation of ClinicalBERT, a state-of-the-art deep transformer model pretrained on a wealth of patient clinical notes to facilitate downstream predictive tasks in the clinical domain. While ClinicalBERT has been widely leveraged by the research community as the foundation for building clinical domain-specific predictive models, given its improved performance on the Medical Natural Language Inference (MedNLI) challenge compared to the seminal BERT model, the fine-grained behaviour and intricacies of this popular clinical language model have not been well studied. Without this deeper understanding, it is very challenging to determine where ClinicalBERT does well given its additional exposure to clinical knowledge, where it does not, and where it can be improved in a meaningful manner. Motivated to gain this deeper understanding, this study presents a critical behaviour exploration of the ClinicalBERT deep transformer model using the MedNLI challenge dataset to better understand the following intricacies: 1) decision-making similarities between ClinicalBERT and BERT (leveraging a new metric we introduce called Model Alignment), 2) where ClinicalBERT holds advantages over BERT given its clinical knowledge exposure, and 3) where ClinicalBERT struggles when compared to BERT. The insights gained about the behaviour of ClinicalBERT will help guide new directions for designing and training clinical language models in a way that not only addresses remaining gaps and facilitates further improvements in clinical language understanding performance, but also highlights the limitations and boundaries of use for such models.
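The abstract names the Model Alignment metric without defining it; one plausible reading, sketched below with hypothetical label sequences, is simple decision agreement: the fraction of MedNLI examples on which ClinicalBERT and BERT predict the same label, independent of correctness.

```python
# Sketch of one plausible reading of "Model Alignment": the fraction of
# examples on which two models make the same prediction, whether or not
# either prediction is correct.
def model_alignment(preds_a, preds_b):
    """Fraction of examples where two models agree on the label."""
    assert len(preds_a) == len(preds_b)
    agree = sum(a == b for a, b in zip(preds_a, preds_b))
    return agree / len(preds_a)

# Hypothetical label sequences over five MedNLI examples
# (entailment / neutral / contradiction).
bert_preds = ["entailment", "neutral", "contradiction", "neutral", "entailment"]
clinical_preds = ["entailment", "neutral", "neutral", "neutral", "entailment"]
print(model_alignment(bert_preds, clinical_preds))  # 0.8
```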


Author(s): Lynne C. Howarth, Thea Miller

As the research described herein suggests, designing a cross-language information retrieval (CLIR) prototype that supports natural language queries in any language and presents search results in visual category clusters represents another step towards providing equitable access to information for anyone in the world community with an Internet connection and an information need.


Author(s): Vasudeva Varma, Aditya Mogadala

In this chapter, the authors begin by highlighting the importance of the cross-lingual and multilingual information retrieval and access research areas. They then discuss the distinctions between Cross Language Information Retrieval (CLIR), Multilingual Information Retrieval (MLIR), Cross Language Information Access (CLIA), and Multilingual Information Access (MLIA). In further sections, issues and challenges in these areas are outlined, and various approaches to multilingual information access, including machine learning-based and knowledge-based approaches, are discussed. The authors describe the various subsystems of an MLIA system, ranging from query processing to output generation, sharing their experience of building an MLIA system and discussing its architecture. Finally, evaluation aspects of MLIA and CLIA systems are discussed.
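To make the subsystem flow concrete, here is a schematic skeleton (our illustration, not the authors' architecture) of an MLIA pipeline from query processing to output generation; every helper is a placeholder standing in for a real component.

```python
# Schematic MLIA pipeline skeleton: process the query, translate it per
# target language, retrieve from per-language indexes, merge ranked lists.
def process_query(query: str) -> str:
    return query.strip().lower()           # tokenization, normalization, etc.

def translate_query(query: str, target_lang: str) -> str:
    return query                            # stand-in for MT / dictionary lookup

def retrieve(query: str, lang: str) -> list[tuple[str, float]]:
    return [(f"{lang}-doc", 1.0)]           # stand-in for a per-language index

def mlia_search(query: str, languages: list[str]) -> list[tuple[str, float]]:
    query = process_query(query)
    results = []
    for lang in languages:
        translated = translate_query(query, lang)
        results.extend(retrieve(translated, lang))
    # Merge ranked lists across languages by score (one simple strategy).
    return sorted(results, key=lambda r: r[1], reverse=True)

print(mlia_search("monsoon forecast", ["en", "hi", "te"]))
```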


2019 ◽ Vol 25 (3) ◽ pp. 363-384
Author(s): Hosein Azarbonyad, Azadeh Shakery, Heshaam Faili

Cross-language information retrieval (CLIR), finding information in one language in response to queries expressed in another language, has attracted much attention due to the explosive growth of multilingual information on the World Wide Web. One important issue in CLIR is how to apply monolingual information retrieval (IR) methods in cross-lingual environments. Recently, the learning to rank (LTR) approach has been successfully employed in different IR tasks. In this paper, we use LTR for CLIR. To adapt monolingual LTR techniques to CLIR and pass the barrier of language difference, we map monolingual IR features to CLIR ones using translation information extracted from different translation resources. The performance of CLIR is highly dependent on the size and quality of the available bilingual resources. Effective use of available resources is especially important for low-resource language pairs. In this paper, we further propose an LTR-based method for combining translation resources in CLIR. We have studied the effectiveness of the proposed approach using different translation resources. Our results also show that LTR can successfully combine different translation resources to improve CLIR performance. In the best scenario, the LTR-based combination method improves the performance of the single-resource-based CLIR method by 6% in terms of Mean Average Precision.
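One common way to realize such a feature mapping, shown here as a sketch under our own assumptions (the bilingual lexicon and its probabilities are invented), is to estimate a query term's frequency in a target-language document as the probability-weighted sum of its translations' frequencies, in the spirit of probabilistic structured queries.

```python
# Sketch: map a monolingual term-frequency feature into a cross-lingual one
# by weighting candidate translations with translation probabilities.
translation_probs = {
    "book": [("kitab", 0.7), ("daftar", 0.3)],  # hypothetical bilingual lexicon
}

def translated_tf(term: str, doc_tf: dict[str, int]) -> float:
    """Expected term frequency of `term` in a target-language document."""
    return sum(p * doc_tf.get(t, 0) for t, p in translation_probs.get(term, []))

doc_tf = {"kitab": 3, "daftar": 1}  # term counts in one target-language doc
print(translated_tf("book", doc_tf))  # 0.7*3 + 0.3*1 = 2.4
```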


Information ◽ 2020 ◽ Vol 11 (2) ◽ pp. 74
Author(s): André Ferreira Cruz, Gil Rocha, Henrique Lopes Cardoso

The task of coreference resolution has attracted considerable attention in the literature due to its importance in deep language understanding and its potential as a subtask in a variety of complex natural language processing problems. In this study, we outline the field’s terminology and describe existing metrics, their differences and shortcomings, as well as the available corpora and external resources. We analyze existing state-of-the-art models and approaches, and review recent advances and trends in the field, namely end-to-end systems that jointly model the different subtasks of coreference resolution, and cross-lingual systems that aim to overcome the challenges of less-resourced languages. Finally, we discuss the main challenges and open issues faced by coreference resolution systems.
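As a pointer to what the surveyed metrics look like, the snippet below sketches the recall side of the link-based MUC measure (our illustration, with toy clusters): each gold entity contributes the number of coreference links that the system output preserves.

```python
# Sketch of MUC recall: for each gold entity, count how many of its
# coreference links survive the partition induced by the system clusters.
def muc_recall(gold_entities, system_entities):
    num, den = 0, 0
    for entity in gold_entities:
        # Partition the gold entity's mentions by the system cluster they fall in.
        parts = {frozenset(s) & frozenset(entity) for s in system_entities}
        parts.discard(frozenset())
        matched = set().union(*parts) if parts else set()
        # Each gold mention absent from the system output is its own partition.
        unmatched = set(entity) - matched
        num += len(entity) - (len(parts) + len(unmatched))
        den += len(entity) - 1
    return num / den if den else 0.0

gold = [{"m1", "m2", "m3"}]            # one gold entity with three mentions
system = [{"m1", "m2"}, {"m3", "m4"}]  # system split it into two clusters
print(muc_recall(gold, system))        # (3 - 2) / (3 - 1) = 0.5
```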

