A Fuzzy Logic Based Synonym Resolution Approach for Automated Information Retrieval

2020 ◽  
pp. 818-836
Author(s):  
Mamta Kathuria ◽  
Chander Kumar Nagpal ◽  
Neelam Duhan

Precise semantic similarity measurement between words is vital to many automated applications in areas such as word sense disambiguation, machine translation, information retrieval, and data clustering. The rapid growth of automated resources and their diverse novel applications has further reinforced this requirement. However, accurate measurement of semantic similarity is a daunting task owing to the inherent ambiguities of natural language and the spread of web documents across various domains, localities, and dialects. These issues render manually maintained semantic similarity resources (i.e., dictionaries) inadequate. This article uses the context sets of the words under consideration in multiple corpora to compute semantic similarity, and it provides credible and verifiable semantic similarity results directly usable by automated applications in an intelligent manner via a fuzzy inference mechanism. The approach can also strengthen existing lexical resources by augmenting them with the context sets and a properly defined extent of semantic similarity.
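
A minimal sketch may make the context-set idea concrete. The window size, the triangular membership functions, and the defuzzification step below are illustrative assumptions, not the authors' actual fuzzy rule base:

```python
# Sketch: context-set overlap fed through a toy fuzzy inference step.
# Window size, membership functions, and label centroids are assumptions.
from collections import Counter

def context_set(word, corpus_tokens, window=3):
    """Collect words co-occurring with `word` within a +/-window span."""
    ctx = Counter()
    for i, tok in enumerate(corpus_tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(corpus_tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[corpus_tokens[j]] += 1
    return ctx

def jaccard(a, b):
    """Crisp overlap of two context sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def triangular(x, a, b, c):
    """Standard triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_similarity(overlap):
    """Fuzzify the crisp overlap, then defuzzify as a weighted
    average of label centroids (one simple defuzzification scheme)."""
    memberships = {
        "dissimilar": triangular(overlap, -0.4, 0.0, 0.4),
        "related":    triangular(overlap,  0.1, 0.5, 0.9),
        "synonym":    triangular(overlap,  0.6, 1.0, 1.4),
    }
    centroids = {"dissimilar": 0.0, "related": 0.5, "synonym": 1.0}
    total = sum(memberships.values())
    score = sum(m * centroids[k] for k, m in memberships.items()) / total if total else 0.0
    return score, max(memberships, key=memberships.get)

tokens = "the car sped down the road while the automobile raced down the road".split()
overlap = jaccard(context_set("car", tokens), context_set("automobile", tokens))
print(fuzzy_similarity(overlap))  # e.g. (0.39, 'related') on this toy corpus
```

In a real system the crisp overlap would be computed per corpus and the fuzzy labels would feed downstream applications directly, which is the "directly usable" property the abstract emphasizes.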


2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Adrian-Gabriel Chifu ◽  
Radu-Tudor Ionescu

Success in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering and as a re-ranking technique. We show on the TREC ad-hoc collection that WSD is useful for queries that are difficult due to sense ambiguity. We focus on improving the precision after 5, 10, and 30 retrieved documents (P@5, P@10, P@30) for such lowest-precision queries.
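
The P@k metric is standard and easy to state in code; the re-ranking step is sketched under the assumption of a pre-trained Naïve Bayes sense classifier exposing a document-sense probability (the `sense_prob` interface below is hypothetical, not the authors' code):

```python
# Sketch: P@k evaluation plus sense-based re-ranking.
# `sense_prob(doc, sense)` stands in for a trained Naive Bayes classifier.

def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_doc_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / k

def rerank_by_sense(ranked_docs, query_sense, sense_prob):
    """Stable re-rank: documents more likely to use the query's sense first.
    Filtering would instead drop documents below a probability threshold."""
    return sorted(ranked_docs, key=lambda d: sense_prob(d, query_sense), reverse=True)

# Toy usage with a stand-in probability table.
docs = ["d1", "d2", "d3", "d4", "d5"]
probs = {"d1": 0.2, "d2": 0.9, "d3": 0.4, "d4": 0.8, "d5": 0.1}
reranked = rerank_by_sense(docs, "bank/finance", lambda d, s: probs[d])
print(reranked)                                   # ['d2', 'd4', 'd3', 'd1', 'd5']
print(precision_at_k(reranked, {"d2", "d4"}, 5))  # 0.4
```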


PLoS ONE ◽  
2021 ◽  
Vol 16 (2) ◽  
pp. e0246751
Author(s):  
Ponrudee Netisopakul ◽  
Gerhard Wohlgenannt ◽  
Aleksei Pulich ◽  
Zar Zar Hlaing

Research into semantic similarity has a long history in lexical semantics, and it has applications in many natural language processing (NLP) tasks such as word sense disambiguation and machine translation. The task of calculating semantic similarity is usually presented in the form of datasets that contain word pairs and a human-assigned similarity score. Algorithms are then evaluated by their ability to approximate these gold-standard similarity scores. Many such datasets, with different characteristics, have been created for the English language. Recently, four of them were transformed into Thai-language versions, namely WordSim-353, SimLex-999, SemEval-2017-500, and R&G-65. Given these four datasets, in this work we aim to improve on the previous baseline evaluations for Thai semantic similarity and to address the challenges of unsegmented Asian languages, particularly the high fraction of out-of-vocabulary (OOV) dataset terms. To this end we apply and integrate different strategies to compute similarity, including traditional word-level embeddings, subword-unit embeddings, and ontological or hybrid sources such as WordNet and ConceptNet. With our best model, which combines self-trained fastText subword embeddings with ConceptNet Numberbatch, we raise the state of the art, measured by the harmonic mean of Pearson and Spearman ρ, by a large margin: from 0.356 to 0.688 for TH-WordSim-353, from 0.286 to 0.769 for TH-SemEval-500, from 0.397 to 0.717 for TH-SimLex-999, and from 0.505 to 0.901 for TWS-65.
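
The scoring pipeline can be summarized in a short sketch. Averaging the two cosine similarities is one plausible fusion of the fastText and Numberbatch signals; the paper's exact combination strategy may differ. The harmonic-mean metric follows directly from the abstract:

```python
# Sketch: combined-embedding similarity plus harmonic-mean-of-correlations
# scoring. The two lookup functions are assumed to return word vectors
# (e.g., self-trained fastText and ConceptNet Numberbatch).
import numpy as np
from scipy.stats import pearsonr, spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def combined_similarity(w1, w2, fasttext_vec, numberbatch_vec):
    """Average the similarity under each embedding space. fastText's
    subword vectors can still produce a vector for OOV Thai terms,
    which is one way to handle the OOV problem the abstract mentions."""
    s_ft = cosine(fasttext_vec(w1), fasttext_vec(w2))
    s_nb = cosine(numberbatch_vec(w1), numberbatch_vec(w2))
    return (s_ft + s_nb) / 2.0

def harmonic_mean_correlation(predicted, gold):
    """Score against the gold standard: harmonic mean of Pearson r
    and Spearman rho over all word pairs in a dataset."""
    r, _ = pearsonr(predicted, gold)
    rho, _ = spearmanr(predicted, gold)
    return 2 * r * rho / (r + rho)

# Toy usage with stand-in lookups (random vectors cached per word,
# so this does not demonstrate real OOV handling).
rng = np.random.default_rng(0)
table = {}
lookup = lambda w: table.setdefault(w, rng.standard_normal(300))
print(combined_similarity("แมว", "สุนัข", lookup, lookup))
```

The harmonic mean is a natural choice here because it penalizes disagreement: a model that scores well on Pearson but poorly on Spearman (or vice versa) is pulled toward the lower of the two values.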


2020 ◽  
Vol 10 (3) ◽  
pp. 219
Author(s):  
Abdulfattah Omar ◽  
Mohammed Aldawsari

In recent years, both research and industry have shown increasing interest in developing reliable information retrieval (IR) systems that can effectively address the growing demands of users worldwide. In spite of the relative success of IR systems in addressing the needs of users and even adapting to their environments, many problems remain unresolved. One main problem is lexical ambiguity, which negatively impacts the performance and reliability of IR systems. To date, lexical ambiguity has been one of the most frequently reported problems in Arabic IR systems despite the development of different word sense disambiguation (WSD) techniques, a fact largely attributed to the limitations of such techniques in handling the language's linguistic peculiarities. Hence, this study addresses these limitations by exploring the reasons for lexical ambiguity in Arabic IR applications as one step towards reliable and practical solutions. For this purpose, the performance of six search engines (Google, Bing, Baidu, Yahoo, Yandex, and Ask) is evaluated. Results indicate that lexical ambiguities in Arabic IR applications are mainly due to the unique morphological and orthographic system of the Arabic language, in addition to its diglossia and its multiple colloquial dialects, between which mutual intelligibility is sometimes not achieved. For better disambiguation and IR performance in Arabic, this study proposes that clustering models based on supervised machine learning should be trained to address the morphological diversity of Arabic and its unique orthographic system. Search engines should also be adapted to the geographic location of the user in order to address the issue of vernacular dialects of Arabic, and they should be trained to identify the different dialects automatically. Finally, search engines should consider all varieties of Arabic and be able to interpret queries regardless of the particular variety adopted by the user.
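
As context for the orthographic issues identified here, the sketch below shows a normalization pass commonly used in Arabic IR preprocessing (stripping diacritics and tatweel, unifying alef variants). It is a standard technique, not the clustering model the study proposes:

```python
# Sketch: standard Arabic orthographic normalization for indexing.
# This reduces spelling variation, not dialectal or diglossic variation.
import re

ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")  # fathatan .. sukun
TATWEEL = "\u0640"

def normalize_arabic(text):
    text = ARABIC_DIACRITICS.sub("", text)  # strip short-vowel marks
    text = text.replace(TATWEEL, "")        # strip the elongation character
    text = re.sub("[إأآ]", "ا", text)       # unify hamza-bearing alef variants
    text = text.replace("ى", "ي")           # alef maqsura -> ya
    text = text.replace("ة", "ه")           # ta marbuta -> ha
    return text

# Two surface spellings of the same word collapse to one indexing form.
print(normalize_arabic("مَكْتَبَة") == normalize_arabic("مكتبه"))  # True
```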


Author(s):  
Lluís Màrquez ◽  
Mariona Taulé ◽  
Lluís Padró ◽  
Luis Villarejo ◽  
Maria Antònia Martí
