TDSS: A New Word Sense Representation Framework for Information Retrieval

Developing the Persian Wordnet of Verbs Using Supervised Learning

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3450969 ◽

2021 ◽

Vol 20 (4) ◽

pp. 1-18

Author(s):

Zahra Mousavi ◽

Heshaam Faili

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Language Processing ◽

Supervised Classification ◽

Word Sense ◽

Direct Influence ◽

Training Set ◽

Bilingual Dictionary ◽

Automated Method ◽

Princeton Wordnet

Nowadays, wordnets are extensively used as a major resource in natural language processing and information retrieval tasks. Therefore, the accuracy of wordnets has a direct influence on the performance of the involved applications. This paper presents a fully-automated method for extending a previously developed Persian wordnet to cover more comprehensive and accurate verbal entries. At first, by using a bilingual dictionary, some Persian verbs are linked to Princeton WordNet synsets. A feature set related to the semantic behavior of compound verbs as the majority of Persian verbs is proposed. This feature set is employed in a supervised classification system to select the proper links for inclusion in the wordnet. We also benefit from a pre-existing Persian wordnet, FarsNet, and a similarity-based method to produce a training set. This is the largest automatically developed Persian wordnet with more than 27,000 words, 28,000 PWN synsets and 67,000 word-sense pairs that substantially outperforms the previous Persian wordnet with about 16,000 words, 22,000 PWN synsets and 38,000 word-sense pairs.

Download Full-text

Linear Algebraic Structure of Word Senses, with Applications to Polysemy

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00034 ◽

2018 ◽

Vol 6 ◽

pp. 483-495 ◽

Cited By ~ 13

Author(s):

Sanjeev Arora ◽

Yuanzhi Li ◽

Yingyu Liang ◽

Tengyu Ma ◽

Andrej Risteski

Keyword(s):

Information Retrieval ◽

Random Walk ◽

Algebraic Structure ◽

Sparse Coding ◽

Word Sense ◽

Word Embeddings ◽

Linear Superposition ◽

Empirical Tests ◽

Embedding Methods ◽

Word Senses

Word embeddings are ubiquitous in NLP and information retrieval, but it is unclear what they represent when the word is polysemous. Here it is shown that multiple word senses reside in linear superposition within the word embedding and simple sparse coding can recover vectors that approximately capture the senses. The success of our approach, which applies to several embedding methods, is mathematically explained using a variant of the random walk on discourses model (Arora et al., 2016). A novel aspect of our technique is that each extracted word sense is accompanied by one of about 2000 “discourse atoms” that gives a succinct description of which other words co-occur with that word sense. Discourse atoms can be of independent interest, and make the method potentially more useful. Empirical tests are used to verify and support the theory.

Download Full-text

Evaluating Word Sense Disambiguation Tools for Information Retrieval Task

Lecture Notes in Computer Science - Evaluating Systems for Multilingual and Multimodal Information Access ◽

10.1007/978-3-642-04447-2_13 ◽

2009 ◽

pp. 113-117 ◽

Cited By ~ 3

Author(s):

Fernando Martínez-Santiago ◽

José M. Perea-Ortega ◽

Miguel A. García-Cumbreras

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Word Sense ◽

Retrieval Task ◽

Sense Disambiguation

Download Full-text

A Comparative Analysis of Supervised Word Sense Disambiguation in Information Retrieval

Communication and Intelligent Systems - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-1089-9_10 ◽

2021 ◽

pp. 111-120

Author(s):

Chandrakala Arya ◽

Manoj Diwakar ◽

Shobha Arya

Keyword(s):

Information Retrieval ◽

Comparative Analysis ◽

Word Sense Disambiguation ◽

Word Sense ◽

Sense Disambiguation

Download Full-text

Word Sense Language Model for Information Retrieval

Information Retrieval Technology - Lecture Notes in Computer Science ◽

10.1007/11880592_13 ◽

2006 ◽

pp. 158-171

Author(s):

Liqi Gao ◽

Yu Zhang ◽

Ting Liu ◽

Guiping Liu

Keyword(s):

Information Retrieval ◽

Language Model ◽

Word Sense

Download Full-text

Word Sense Representation based-method for Arabic Text Categorization

2018 9th International Symposium on Signal, Image, Video and Communications (ISIVC) ◽

10.1109/isivc.2018.8709234 ◽

2018 ◽

Cited By ~ 2

Author(s):

Fatima-Zahra El-Alami ◽

Said Ouatik El Alaoui

Keyword(s):

Text Categorization ◽

Arabic Text ◽

Word Sense ◽

Sense Representation

Download Full-text

Word sense disambiguation to improve precision for ambiguous queries

Open Computer Science ◽

10.2478/s13537-012-0032-6 ◽

2012 ◽

Vol 2 (4) ◽

Cited By ~ 2

Author(s):

Adrian-Gabriel Chifu ◽

Radu-Tudor Ionescu

Keyword(s):

Information Retrieval ◽

Ad Hoc ◽

Naive Bayes ◽

Word Sense Disambiguation ◽

Naïve Bayes ◽

Word Sense ◽

Interdisciplinary Approaches ◽

Sense Disambiguation ◽

Ranking Technique

AbstractSuccess in Information Retrieval (IR) depends on many variables. Several interdisciplinary approaches try to improve the quality of the results obtained by an IR system. In this paper we propose a new way of using word sense disambiguation (WSD) in IR. The method we develop is based on Naïve Bayes classification and can be used both as a filtering and as a re-ranking technique. We show on the TREC ad-hoc collection that WSD is useful in the case of queries which are difficult due to sense ambiguity. Our interest regards improving the precision after 5, 10 and 30 retrieved documents (P@5, P@10, P@30), respectively, for such lowest precision queries.

Download Full-text

Cross-Language Information Retrieval Using EuroWordNet and Word Sense Disambiguation

Lecture Notes in Computer Science - Advances in Information Retrieval ◽

10.1007/978-3-540-24752-4_24 ◽

2004 ◽

pp. 327-337 ◽

Cited By ~ 11

Author(s):

Paul Clough ◽

Mark Stevenson

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Word Sense ◽

Cross Language Information Retrieval ◽

Sense Disambiguation ◽

Cross Language

Download Full-text

Lexical Ambiguity in Arabic Information Retrieval: The Case of Six Web-Based Search Engines

International Journal of English Linguistics ◽

10.5539/ijel.v10n3p219 ◽

2020 ◽

Vol 10 (3) ◽

pp. 219

Author(s):

Abdulfattah Omar ◽

Mohammed Aldawsari

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Word Sense Disambiguation ◽

Geographic Location ◽

Morphological Diversity ◽

Lexical Ambiguity ◽

Arabic Language ◽

Supervised Machine Learning ◽

Word Sense ◽

Negative Impacts

In recent years, both research and industry have shown an increasing interest in developing reliable information retrieval (IR) systems that can effectively address the growing demands of users worldwide. In spite of the relative success of IR systems in addressing the needs of users and even adapting to their environments, many problems remain unresolved. One main problem is lexical ambiguity which has negative impacts on the performance and reliability of IR systems. To date, lexical ambiguity has been one of the most frequently reported problems in the Arabic IR systems despite the development of different word sense disambiguation (WSD) techniques. This is largely attributed to the limitations of such techniques in addressing the issue of linguistic peculiarities. Hence, this study addresses these limitations by exploring the reasons for lexical ambiguity in IR applications in Arabic as one step towards reliable and practical solutions. For this purpose, the performances of six search engines Google, Bing, Baidu, Yahoo, Yandex, and Ask are evaluated. Results indicate that lexical ambiguities in Arabic IR applications are mainly due to the unique morphological and orthographic system of the Arabic language, in addition to its diglossia and the multiple colloquial dialects where sometimes mutual intelligibility is not achieved. For better disambiguation and IR performances in Arabic, this study proposes that clustering models based on supervised machine learning theory should be trained to address the morphological diversity of Arabic and its unique orthographic system. Search engines should also be adapted to the geographic location of the users in order to address the issue of vernacular dialects of Arabic. They should also be trained to automatically identify the different dialects. Finally, search engines should consider all varieties of Arabic and be able to interpret the queries regardless of the particular language adopted by the user.

Download Full-text

Book Information Retrieval Method Research Based on Word Sense Disambiguation

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/612/4/042027 ◽

2019 ◽

Vol 612 ◽

pp. 042027

Author(s):

Yahong Li

Keyword(s):

Information Retrieval ◽

Word Sense Disambiguation ◽

Word Sense ◽

Retrieval Method ◽

Sense Disambiguation

Download Full-text