Leaving No Stone Unturned: Flexible Retrieval of Idiomatic Expressions from a Large Text Corpus

Idioms are multi-word expressions whose meaning cannot always be deduced from the literal meaning of constituent words. A key feature of idioms that is central to this paper is their peculiar mixture of fixedness and variability, which poses challenges for their retrieval from large corpora using traditional search approaches. These challenges hinder insights into idiom usage, affecting users who are conducting linguistic research as well as those involved in language education. To facilitate access to idiom examples taken from real-world contexts, we introduce an information retrieval system designed specifically for idioms. Given a search query that represents an idiom, typically in its canonical form, the system expands it automatically to account for the most common types of idiom variation including inflection, open slots, adjectival or adverbial modification and passivisation. As a by-product of query expansion, other types of idiom variation captured include derivation, compounding, negation, distribution across multiple clauses as well as other unforeseen types of variation. The system was implemented on top of Elasticsearch, an open-source, distributed, scalable, real-time search engine. Flexible retrieval of idioms is supported by a combination of linguistic pre-processing of the search queries, their translation into a set of query clauses written in a query language called Query DSL, and analysis, an indexing process that involves tokenisation and normalisation. Our system outperformed the phrase search in terms of recall and outperformed the keyword search in terms of precision. Out of the three, our approach was found to provide the best balance between precision and recall. By providing a fast and easy way of finding idioms in large corpora, our approach can facilitate further developments in fields such as linguistics, language education and natural language processing.
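To make the translation into Query DSL more concrete, the Python sketch below builds an Elasticsearch request body for one idiom as a plain dictionary. It is not the authors' implementation. The field name `text`, the function `build_idiom_query` and the slop values are hypothetical. The sketch also assumes the idiom words are already normalised (lowercased and stemmed) to match the index analyzer, because `span_term` does not analyse its input. A `match_phrase` clause with `slop` tolerates inserted modifiers in canonical order, and an unordered `span_near` clause admits reordered variants such as passives.

```python
import json


def build_idiom_query(content_words, field="text", phrase_slop=3, span_slop=5):
    """Build an Elasticsearch Query DSL body for an idiom (hypothetical sketch).

    content_words: idiom words already normalised to match the index analyzer,
    e.g. ["spill", "bean"] for "spill the beans" (assumption).
    """
    phrase = " ".join(content_words)
    return {
        "query": {
            "bool": {
                "should": [
                    # Canonical word order; slop lets modifiers intervene,
                    # e.g. "spill the proverbial beans".
                    {"match_phrase": {field: {"query": phrase, "slop": phrase_slop}}},
                    # Any word order within a window, covering variants such
                    # as the passive "the beans were spilled".
                    {
                        "span_near": {
                            "clauses": [{"span_term": {field: w}} for w in content_words],
                            "slop": span_slop,
                            "in_order": False,
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_idiom_query(["spill", "bean"]), indent=2))
```

The resulting dictionary can be sent as the body of a `_search` request. Inflection and derivation would then rely on the index analyzer (for example a stemming filter) rather than on the query itself.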

Query Expansion using Semantic Network for Information Retrieval in Telugu Language

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1586.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 874-877

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Query Expansion ◽

Retrieval System ◽

Semantic Network ◽

Word Sense Disambiguation ◽

Ambiguous Word ◽

Information Retrieval System ◽

Word Sense ◽

Sense Disambiguation

Nowadays, digital documents play a major role in all areas, including the web, as information is increasingly digitised. Search engines use queries to retrieve information. The query plays a central role in an information retrieval system, because it determines which relevant and non-relevant documents are retrieved. Query expansion techniques can improve the performance of an information retrieval system. Our proposed query expansion technique is based on Word Sense Disambiguation (WSD), which finds the correct sense of an ambiguous word in the regional Telugu language. In query expansion, if an added query term is ambiguous, the accuracy of the retrieved documents drops sharply. To avoid this, the proposed method uses WSD, a task in Natural Language Processing (NLP) and Artificial Intelligence (AI). WSD improves the accuracy of the information retrieval system.
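The idea of disambiguating a term before adding its synonyms can be sketched with the simplified Lesk algorithm, a standard WSD baseline swapped in here for illustration; the abstract does not say which WSD method the paper uses. The paper targets Telugu, but the toy sense inventory below is English for readability, and the names `simplified_lesk`, `expand_query` and `SENSES` are hypothetical.

```python
def simplified_lesk(word, context, senses):
    """Return the sense of `word` whose gloss shares the most words with `context`."""
    context_words = set(context)
    best_sense, best_overlap = None, -1
    for sense_id, (gloss, _synonyms) in senses[word].items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:  # ties keep the first-listed sense
            best_sense, best_overlap = sense_id, overlap
    return best_sense


def expand_query(query_terms, senses):
    """Append synonyms of the disambiguated sense of each ambiguous term."""
    expanded = list(query_terms)
    for term in query_terms:
        if term not in senses:
            continue  # not in the sense inventory; leave unexpanded
        context = [w for w in query_terms if w != term]
        sense = simplified_lesk(term, context, senses)
        for synonym in senses[term][sense][1]:
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


# Toy sense inventory (hypothetical): sense id -> (gloss, synonyms)
SENSES = {
    "bank": {
        "bank.finance": ("institution that accepts deposits and lends money",
                         ["lender", "financial institution"]),
        "bank.river": ("sloping land beside a river or lake", ["riverbank", "shore"]),
    }
}

print(expand_query(["bank", "river", "fishing"], SENSES))
# ['bank', 'river', 'fishing', 'riverbank', 'shore']
```

Without the disambiguation step, both sets of synonyms would be added, and the finance-related terms would pull in non-relevant documents for a query about river banks.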

An Ontology Based Information Retrieval System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b3781.079220 ◽

2020 ◽

Vol 9 (2) ◽

pp. 638-643

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Retrieval System ◽

Keyword Search ◽

Domain Ontology ◽

Information Retrieval System ◽

Search System ◽

Domain Specific ◽

System A ◽

Schema Graph

Ontologies provide a structured way of describing knowledge. Ontologies are usually repositories of concepts and the relations between them, so using them in information retrieval is a reasonable goal. The main objective of this report is to provide efficient means of moving from keyword-based to concept-based information retrieval, using ontologies for conceptual definitions [1]. In this paper, we present the skeleton of such an IR system, which works on a collection of domain-specific documents and exploits a domain-specific ontology to increase the number of relevant documents retrieved. In this system, a user enters a query from which meaningful concepts are extracted; query expansion is then performed using these concepts and the domain ontology. We propose a system that matches the query terms in the ontology/schema graph and exploits the surrounding knowledge to derive an enhanced query. The enhanced query is passed to the underlying basic keyword search system, Lucene [2]. In this approach, we try to use more ontological knowledge than IS-A and HAS-A relationships and synonyms for information retrieval.
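Expansion over an ontology graph that follows any relation type, not only IS-A and HAS-A, can be sketched as a bounded breadth-first search that weights neighbouring concepts by distance. This is a minimal sketch under stated assumptions, not the paper's algorithm: the decay-based weighting, the relation names and the functions `expand_with_ontology` and `to_lucene_query` are hypothetical. The output uses Lucene's classic query syntax (`term^boost`, `OR`) as one way of handing an enhanced query to a keyword engine.

```python
from collections import deque


def expand_with_ontology(concepts, ontology, max_depth=1, decay=0.5, relations=None):
    """Weight concepts reachable from the query concepts in an ontology graph.

    ontology maps a concept to a list of (relation, neighbour) edges.
    relations: optional set of relation names to follow; None follows all.
    """
    weights = {}
    for concept in concepts:
        weights[concept] = 1.0  # original query concepts keep full weight
        seen = {concept}
        frontier = deque([(concept, 0)])
        while frontier:  # breadth-first search up to max_depth hops
            node, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for relation, neighbour in ontology.get(node, []):
                if neighbour in seen or (relations is not None and relation not in relations):
                    continue
                seen.add(neighbour)
                weight = decay ** (depth + 1)  # farther concepts get lower weight
                weights[neighbour] = max(weights.get(neighbour, 0.0), weight)
                frontier.append((neighbour, depth + 1))
    return weights


def to_lucene_query(weights):
    """Render weighted terms in Lucene classic query syntax (term^boost OR ...)."""
    parts = []
    for term, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        text = f'"{term}"' if " " in term else term  # quote multi-word terms
        parts.append(f"{text}^{weight:g}")
    return " OR ".join(parts)


# Toy domain ontology (hypothetical concepts and relation names)
ONTOLOGY = {
    "engine": [("part_of", "car"), ("has_part", "piston"), ("synonym", "motor")],
    "car": [("is_a", "vehicle")],
}

print(to_lucene_query(expand_with_ontology(["engine"], ONTOLOGY)))
# engine^1 OR car^0.5 OR motor^0.5 OR piston^0.5
```

Raising `max_depth` reaches more distant concepts (here `vehicle`) at a lower boost, while `relations` restricts which kinds of ontological knowledge are used.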

Query expansion with a medical ontology to improve a multimodal information retrieval system

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2009.01.012 ◽

2009 ◽

Vol 39 (4) ◽

pp. 396-403 ◽

Cited By ~ 41

Author(s):

M.C. Díaz-Galiano ◽

M.T. Martín-Valdivia ◽

L.A. Ureña-López

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Retrieval System ◽

Information Retrieval System ◽

Multimodal Information ◽

Multimodal Information Retrieval

An Improved VSM Based Information Retrieval System and Fuzzy Query Expansion

Fuzzy Systems and Knowledge Discovery - Lecture Notes in Computer Science ◽

10.1007/11539506_68 ◽

2005 ◽

pp. 537-546 ◽

Cited By ~ 2

Author(s):

Jiangning Wu ◽

Hiroki Tanioka ◽

Shizhu Wang ◽

Donghua Pan ◽

Kenichi Yamamoto ◽

...

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Retrieval System ◽

Information Retrieval System ◽

Fuzzy Query

Towards a Possibilistic Information Retrieval System Using Semantic Query Expansion

Organizational Efficiency through Intelligent Information Technologies ◽

10.4018/978-1-4666-2047-6.ch014 ◽

2012 ◽

pp. 216-242

Author(s):

Bilel Elayeb ◽

Ibrahim Bounhas ◽

Oussama Ben Khiroun ◽

Fabrice Evrard ◽

Narjès Bellamine-BenSaoud

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Retrieval System ◽

Possibility Theory ◽

Information Retrieval System ◽

Test Collection ◽

Expansion Process ◽

Semantic Query ◽

Linguistic Resources ◽

Relevance Measure

This paper presents a new possibilistic information retrieval system using semantic query expansion. The work focuses on query expansion strategies based on external linguistic resources; specifically, the authors exploit the French dictionary “Le Grand Robert”. First, they model the dictionary as a graph and compute similarities between query terms by exploiting the circuits in the graph. Second, possibility theory is used to exploit a double relevance measure (possibility and necessity) between the articles of the dictionary and the query terms. Third, these two approaches are combined using two different aggregation methods. The authors also build on an existing approach for reweighting query terms in the possibilistic matching model to improve the expansion process. To assess and compare the approaches, the authors performed experiments on the standard ‘LeMonde94’ test collection.
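The first step, measuring term similarity through circuits in a dictionary graph, can be illustrated by enumerating short directed cycles. The abstract does not say how circuits are scored, so the sketch below assumes, purely for illustration, that each headword points to the words in its definition and that similarity is the number of short circuits through one term that also pass through the other. The graph and the names `circuits_through` and `circuit_similarity` are hypothetical; the possibility and necessity measures are not sketched.

```python
def circuits_through(graph, start, max_len):
    """Enumerate simple directed circuits through `start` with at most max_len nodes."""
    circuits = []

    def dfs(node, path):
        for nxt in graph.get(node, ()):
            if nxt == start and len(path) >= 2:
                circuits.append(list(path))  # closed a circuit back to start
            elif nxt not in path and len(path) < max_len:
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return circuits


def circuit_similarity(graph, a, b, max_len=3):
    """Count short circuits through `a` that also pass through `b` (illustrative score)."""
    return sum(1 for c in circuits_through(graph, a, max_len) if b in c)


# Toy dictionary graph (hypothetical): headword -> words used in its definition
GRAPH = {
    "car": ["vehicle", "engine"],
    "vehicle": ["car"],
    "engine": ["motor"],
    "motor": ["car", "engine"],
}

print(circuits_through(GRAPH, "car", 3))
# [['car', 'vehicle'], ['car', 'engine', 'motor']]
```

The `max_len` bound matters: with `max_len=2` the three-node circuit through `motor` is no longer found, so `car` and `motor` are no longer related. Longer bounds find more distant relations at exponential cost.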

Luppar: An Information Retrieval System for Closed Document Collections

10.5753/eniac.2018.4478 ◽

2018 ◽

Author(s):

Fabiano Tavares Da Silva ◽

José Everardo Bessa Maia

Keyword(s):

Information Retrieval ◽

Query Expansion ◽

Retrieval System ◽

Information Retrieval System ◽

Local Context ◽

Semantic Model ◽

Context Analysis ◽

Document Collections

This article presents Luppar, an information retrieval tool for closed document collections that uses a local distributional semantic model associated with each corpus. The system performs automatic query expansion using a combination of the distributional semantic model and local context analysis, and supports relevance feedback. The system was evaluated on databases from different domains and achieved results equal to or better than those published in the literature.
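A "local" distributional model can be as simple as co-occurrence vectors built only from the collection being searched. The sketch below uses count-based window vectors and cosine similarity to pick expansion terms. This is an illustrative assumption, not Luppar's actual model, and the local context analysis and relevance feedback components are not reproduced. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(docs, window=2):
    """Build a count-based distributional model from one closed collection."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand(query, vectors, k=2):
    """Append the k vocabulary words most similar to the query terms."""
    scores = Counter()
    for q in query:
        if q not in vectors:
            continue  # unseen query term contributes nothing
        for word, vec in vectors.items():
            if word not in query:
                scores[word] += cosine(vectors[q], vec)
    return query + [word for word, _ in scores.most_common(k)]


docs = ["the car engine roared", "the automobile engine roared", "the cat slept"]
vectors = cooccurrence_vectors(docs)
print(expand(["car"], vectors, k=2))
# ['car', 'automobile', 'engine']
```

Because the vectors come only from the collection itself, `automobile` is found as a neighbour of `car` purely from shared contexts, which is the motivation for a per-corpus model in a closed collection.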

SCALABLE INFORMATION RETRIEVAL SYSTEM IN SEMANTIC WEB BY QUERY EXPANSION AND ONTOLOGICAL BASED LSA RANKING SIMILARITY MEASUREMENT

International Journal of Advanced Intelligence Paradigms ◽

10.1504/ijaip.2020.10013899 ◽

2020 ◽

Vol 17 (2) ◽

pp. 1

Author(s):

Uma Devi M ◽

Meera Gandhi G

Keyword(s):

Information Retrieval ◽

Semantic Web ◽

Query Expansion ◽

Retrieval System ◽

Information Retrieval System ◽

Similarity Measurement

Lexical Co-Occurrence and Contextual Window-Based Approach with Semantic Similarity for Query Expansion

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2017070104 ◽

2017 ◽

Vol 13 (3) ◽

pp. 57-78 ◽

Cited By ~ 5

Author(s):

Jagendra Singh ◽

Rakesh Kumar

Keyword(s):

Semantic Similarity ◽

Query Expansion ◽

Ad Hoc ◽

Retrieval System ◽

Hybrid Approach ◽

Information Retrieval System ◽

Query Reformulation ◽

Baseline Method ◽

Benchmark Datasets ◽

Pseudo Feedback

Query expansion (QE) is an effective method for improving the retrieval performance of information retrieval systems. In this work, we address the limitations of the pseudo-feedback-based QE approach and propose a hybrid approach that improves feedback-based QE by combining corpus-based and context-based information about query terms with semantic knowledge of query terms. First, this paper explores different corpus-based lexical co-occurrence approaches to select an optimal combination of query terms from a pool of terms obtained using pseudo-feedback-based QE. Next, we explore a word2vec-based semantic similarity approach for ranking the QE terms obtained from the top pseudo-feedback documents. Finally, we combine co-occurrence statistics, contextual window statistics, and semantic similarity to select the best expansion terms for query reformulation. Experiments were performed on the FIRE ad hoc and TREC-3 benchmark datasets, and the results show a significant improvement over the baseline method.
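The combination of co-occurrence statistics and embedding similarity for selecting expansion terms from pseudo-relevant documents can be sketched as a weighted sum of two normalised scores. This is a minimal sketch, not the authors' combination: the linear weighting `alpha`, the document-level co-occurrence count and the function `select_expansion_terms` are assumptions, contextual window statistics are omitted, and the toy 2-d vectors merely stand in for word2vec embeddings.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_expansion_terms(query, feedback_docs, embeddings, n_terms=2, alpha=0.5):
    """Rank candidate terms from pseudo-relevant documents (illustrative hybrid score).

    score = alpha * normalised document co-occurrence with the query
          + (1 - alpha) * mean cosine similarity to the query term embeddings
    """
    query_set = set(query)
    cooc = Counter()
    for doc in feedback_docs:
        tokens = set(doc.lower().split())
        if tokens & query_set:  # document contains at least one query term
            for term in tokens - query_set:
                cooc[term] += 1
    if not cooc:
        return []
    max_count = max(cooc.values())
    scores = {}
    for term, count in cooc.items():
        sims = [cosine(embeddings[term], embeddings[q])
                for q in query if term in embeddings and q in embeddings]
        semantic = sum(sims) / len(sims) if sims else 0.0
        scores[term] = alpha * count / max_count + (1 - alpha) * semantic
    # Highest score first; ties broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:n_terms]


# Toy 2-d vectors standing in for word2vec embeddings (hypothetical values)
EMBEDDINGS = {
    "jaguar": [1.0, 0.0], "car": [1.0, 0.0], "engine": [0.8, 0.6],
    "speed": [0.6, 0.8], "cat": [0.0, 1.0], "jungle": [0.0, 1.0],
}
FEEDBACK = ["jaguar car speed", "jaguar car engine", "jaguar cat jungle"]
print(select_expansion_terms(["jaguar"], FEEDBACK, EMBEDDINGS))
# ['car', 'engine']
```

Co-occurrence alone would tie `engine`, `speed`, `cat` and `jungle`; the semantic component breaks the tie in favour of terms close to the query's dominant sense.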

Scalable information retrieval system in semantic web by query expansion and ontological-based LSA ranking similarity measurement

International Journal of Advanced Intelligence Paradigms ◽

10.1504/ijaip.2020.108759 ◽

2020 ◽

Vol 17 (1/2) ◽

pp. 44

Author(s):

M. Uma Devi ◽

G. Meera Gandhi

Keyword(s):

Information Retrieval ◽

Semantic Web ◽

Query Expansion ◽

Retrieval System ◽

Information Retrieval System ◽

Similarity Measurement

A natural language system for retrieval of captioned images

Natural Language Engineering ◽

10.1017/s1351324901002571 ◽

2001 ◽

Vol 7 (2) ◽

pp. 117-142 ◽

Cited By ~ 1

Author(s):

David Elworthy ◽

Tony Rose ◽

Amanda Clare ◽

Aaron Kotcheff

Keyword(s):

Natural Language ◽

Language Processing ◽

Retrieval System ◽

Contextual Information ◽

Information Retrieval System ◽

Matching Algorithm ◽

Successful Match ◽

Engineering Standards ◽

Dependency Structures ◽

Processing Techniques

ANVIL is an information retrieval system that uses natural language processing techniques to retrieve captioned images. It extracts dependency structures from the image captions and user queries, and then applies a high-accuracy matching algorithm that recursively explores the dependency structures to determine their similarity. A further algorithm extracts additional contextual information after a successful match, with the aim of helping users understand and organise the retrieval results. ANVIL was developed to high engineering standards; in addition to the research aspects of the system, we also discuss some of its design and development issues. English and Japanese versions of the system have been developed.
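Recursive matching of dependency structures can be illustrated with a small tree-alignment function. ANVIL's actual algorithm is not described in the abstract, so this is only an illustrative sketch: the `Node` class, the relation labels and the scoring (matched query nodes divided by query size) are assumptions. As a further simplification, two query children may match the same caption child.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    children: list = field(default_factory=list)  # list of (relation, Node) pairs


def size(node):
    return 1 + sum(size(child) for _, child in node.children)


def match(q, c):
    """Recursively count query nodes matched when query node q is aligned with caption node c."""
    if q.word != c.word:
        return 0
    score = 1
    for rel, q_child in q.children:
        # Best caption child attached by the same dependency relation
        score += max((match(q_child, c_child) for c_rel, c_child in c.children if c_rel == rel),
                     default=0)
    return score


def similarity(query, caption):
    """Best alignment of the query root with any caption node, normalised by query size."""
    best, stack = 0, [caption]
    while stack:
        node = stack.pop()
        best = max(best, match(query, node))
        stack.extend(child for _, child in node.children)
    return best / size(query)


# "dog chasing red ball" and two queries (hypothetical lemmatised parses)
caption = Node("chase", [("nsubj", Node("dog")), ("dobj", Node("ball", [("amod", Node("red"))]))])
same = Node("chase", [("nsubj", Node("dog")), ("dobj", Node("ball"))])
swapped = Node("chase", [("nsubj", Node("ball")), ("dobj", Node("dog"))])
print(similarity(same, caption), similarity(swapped, caption))
# 1.0 0.3333333333333333
```

Both queries contain the same words, so a bag-of-words matcher would score them equally; comparing structure penalises the query in which the roles of `dog` and `ball` are swapped.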
