Lexical Ambiguity in Arabic Information Retrieval: The Case of Six Web-Based Search Engines

In recent years, both research and industry have shown increasing interest in developing reliable information retrieval (IR) systems that can effectively address the growing demands of users worldwide. Despite the relative success of IR systems in meeting users' needs and even adapting to their environments, many problems remain unresolved. One main problem is lexical ambiguity, which degrades the performance and reliability of IR systems. To date, lexical ambiguity has been one of the most frequently reported problems in Arabic IR systems despite the development of different word sense disambiguation (WSD) techniques. This is largely attributed to the limitations of such techniques in handling the linguistic peculiarities of Arabic. Hence, this study addresses these limitations by exploring the reasons for lexical ambiguity in Arabic IR applications as one step towards reliable and practical solutions. For this purpose, the performance of six search engines (Google, Bing, Baidu, Yahoo, Yandex, and Ask) is evaluated. Results indicate that lexical ambiguities in Arabic IR applications are mainly due to the unique morphological and orthographic system of the Arabic language, in addition to its diglossia and its multiple colloquial dialects, among which mutual intelligibility is sometimes not achieved. For better disambiguation and IR performance in Arabic, this study proposes that clustering models based on supervised machine learning theory be trained to address the morphological diversity of Arabic and its unique orthographic system. Search engines should also be adapted to the geographic location of users in order to handle the vernacular dialects of Arabic, and they should be trained to identify the different dialects automatically. Finally, search engines should consider all varieties of Arabic and be able to interpret queries regardless of the particular variety adopted by the user.

Sense-Based Arabic Information Retrieval Using Harmony Search Algorithm

Iraqi Journal for Computers and Informatics ◽

10.25195/ijci.v43i2.59 ◽

2017 ◽

Vol 43 (2) ◽

pp. 13-22

Author(s):

Alia Abdul Hassan ◽

Mustafa Hadi

Keyword(s):

Information Retrieval ◽

Search Algorithm ◽

Word Sense Disambiguation ◽

Harmony Search ◽

Arabic Language ◽

Harmony Search Algorithm ◽

Word Sense ◽

System Efficiency ◽

Traditional System ◽

The One

Information Retrieval (IR) is a field of computer science that deals with storing, searching, and retrieving documents that satisfy the user's need. Modern Standard Arabic is rich in words with multiple meanings (senses), largely because of the lack of diacritical marks. Finding the appropriate meaning is therefore a key requirement in most Arabic IR (AIR) applications. A successful system should not focus only on retrieval quality while neglecting system efficiency. Thus, this paper contributes to improving system effectiveness by finding an appropriate stemming methodology, word sense disambiguation, and query expansion to address the retrieval quality of AIR. It also contributes to improving system efficiency by using a powerful metaheuristic search, the Harmony Search (HS) algorithm, which is inspired by musical improvisation processes. The proposed system outperforms the traditional system by 19.5% while reducing latency by approximately 0.077 seconds per query.
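For readers unfamiliar with Harmony Search, the following is a minimal textbook-style sketch of the metaheuristic on a toy continuous minimisation problem. It is not the retrieval system described in the abstract: the function name `harmony_search`, the parameter names (`hmcr`, `par`, `bandwidth`), their default values, and the sphere objective are all assumptions chosen for illustration.

```python
import random


def harmony_search(objective, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimise `objective` over box `bounds` with a basic Harmony Search.

    hmcr: probability of reusing a value from harmony memory.
    par:  probability of pitch-adjusting a reused value.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialise harmony memory with random candidate solutions.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse this dimension from a stored harmony.
                value = rng.choice(memory)[d]
                if rng.random() < par:
                    # Pitch adjustment: small random shift of the reused value.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection from the whole allowed range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))  # clamp to the bounds

        new_score = objective(new)
        worst = max(range(memory_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            # Replace the worst stored harmony when the new one is better.
            memory[worst], scores[worst] = new, new_score

    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_solution, best_score = harmony_search(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

In a retrieval setting the objective would presumably score candidate sense or term selections rather than points in a box, but the abstract does not specify that formulation, so the sketch stays with a generic objective.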

Evaluating Word Sense Disambiguation Tools for Information Retrieval Task

Lecture Notes in Computer Science - Evaluating Systems for Multilingual and Multimodal Information Access ◽

10.1007/978-3-642-04447-2_13 ◽

2009 ◽

pp. 113-117 ◽

Cited By ~ 3

Author(s):

Fernando Martínez-Santiago ◽

José M. Perea-Ortega ◽

Miguel A. García-Cumbreras

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Word Sense ◽

Retrieval Task ◽

Sense Disambiguation

A Comparative Analysis of Supervised Word Sense Disambiguation in Information Retrieval

Communication and Intelligent Systems - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-1089-9_10 ◽

2021 ◽

pp. 111-120

Author(s):

Chandrakala Arya ◽

Manoj Diwakar ◽

Shobha Arya

Keyword(s):

Information Retrieval ◽

Comparative Analysis ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Word sense disambiguation to improve precision for ambiguous queries

Open Computer Science ◽

10.2478/s13537-012-0032-6 ◽

2012 ◽

Vol 2 (4) ◽

Cited By ~ 2

Author(s):

Adrian-Gabriel Chifu ◽

Radu-Tudor Ionescu

Keyword(s):

Information Retrieval ◽

Ad Hoc ◽

Naive Bayes ◽

Word Sense Disambiguation ◽

Naïve Bayes ◽

Word Sense ◽

Interdisciplinary Approaches ◽

Sense Disambiguation ◽

Ranking Technique

Success in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering and as a re-ranking technique. We show on the TREC ad-hoc collection that WSD is useful for queries that are difficult because of sense ambiguity. Our interest lies in improving the precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively) for such lowest-precision queries.
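The P@k measures mentioned above can be illustrated with a short sketch. It assumes the standard definition of precision at k (relevant documents among the top k, divided by k); the function name `precision_at_k` and the document IDs are hypothetical and are not taken from the paper or from the TREC collection.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not by len(top_k)) penalises rankings shorter than k.
    return hits / k


# Hypothetical ranking returned for one query, best match first.
ranked = ["d3", "d7", "d1", "d9", "d4"]
# Hypothetical set of documents judged relevant for that query.
relevant = {"d3", "d1", "d8"}

p_at_2 = precision_at_k(ranked, relevant, 2)  # 1 relevant in top 2 -> 0.5
p_at_5 = precision_at_k(ranked, relevant, 5)  # 2 relevant in top 5 -> 0.4
```

A re-ranking step such as the one the abstract describes would change the order of `ranked` and so could raise P@5 without retrieving any new documents.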

Cross-Language Information Retrieval Using EuroWordNet and Word Sense Disambiguation

Lecture Notes in Computer Science - Advances in Information Retrieval ◽

10.1007/978-3-540-24752-4_24 ◽

2004 ◽

pp. 327-337 ◽

Cited By ~ 11

Author(s):

Paul Clough ◽

Mark Stevenson

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Word Sense ◽

Cross Language Information Retrieval ◽

Sense Disambiguation ◽

Cross Language

Book Information Retrieval Method Research Based on Word Sense Disambiguation

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/612/4/042027 ◽

2019 ◽

Vol 612 ◽

pp. 042027

Author(s):

Yahong Li

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Word Sense ◽

Retrieval Method ◽

Sense Disambiguation

Bio-molecular Event Trigger Extraction by Word Sense Disambiguation Based on Supervised Machine Learning Using Wordnet-Based Data Decomposition and Feature Selection

Advances in Intelligent Systems and Computing - Proceedings of the Global AI Congress 2019 ◽

10.1007/978-981-15-2188-1_31 ◽

2020 ◽

pp. 391-398

Author(s):

Amit Majumder ◽

Asif Ekbal ◽

Sudip Kumar Naskar

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Word Sense Disambiguation ◽

Supervised Machine Learning ◽

Word Sense ◽

Molecular Event ◽

Data Decomposition ◽

Sense Disambiguation ◽

Event Trigger

A Fuzzy Logic Based Synonym Resolution Approach for Automated Information Retrieval

Natural Language Processing ◽

10.4018/978-1-7998-0951-7.ch040 ◽

2020 ◽

pp. 818-836

Author(s):

Mamta Kathuria ◽

Chander Kumar Nagpal ◽

Neelam Duhan

Keyword(s):

Information Retrieval ◽

Semantic Similarity ◽

Fuzzy Inference ◽

Word Sense Disambiguation ◽

Word Sense ◽

Lexical Resources ◽

Language Spread ◽

Semantic Similarity Measurement ◽

Fuzzy Inference Mechanism ◽

Novel Applications

Precise measurement of semantic similarity between words is vital to many automated applications in areas such as word sense disambiguation, machine translation, information retrieval and data clustering. The rapid growth of automated resources and their diverse novel applications has further reinforced this requirement. However, accurate measurement of semantic similarity is a daunting task because of the inherent ambiguities of natural language and the spread of web documents across various domains, localities and dialects. All these issues render manually maintained semantic similarity resources (i.e. dictionaries) inadequate. This article uses the context sets of the words under consideration in multiple corpora to compute semantic similarity, and it provides credible and verifiable semantic similarity results that automated applications can use directly and intelligently through a fuzzy inference mechanism. The approach can also be used to strengthen existing lexical resources by augmenting them with the context sets and a properly defined extent of semantic similarity.
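As a rough illustration of comparing words through their context sets, the sketch below collects the words that appear within a small window around each target word and compares the resulting sets with Jaccard overlap. Jaccard overlap is a deliberately simple stand-in and is not the fuzzy inference mechanism the article describes; the helper names, the window size, and the toy sentences are all assumptions.

```python
def context_set(word, sentences, window=2):
    """Collect words seen within `window` tokens of `word` in the sentences."""
    contexts = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == word:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                contexts.update(left + right)
    return contexts


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy corpus (hypothetical sentences, for illustration only).
corpus = [
    "the car drives on the road",
    "the automobile drives on the highway",
    "the bank approved the loan",
]

car = context_set("car", corpus)
automobile = context_set("automobile", corpus)
bank = context_set("bank", corpus)

similar_score = jaccard(car, automobile)  # identical context sets
different_score = jaccard(car, bank)      # only "the" is shared
```

A crisp overlap score like this could serve as one input to a fuzzy rule base, though how the article actually maps context sets to fuzzy variables is not stated in the abstract.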
