Word-embedding-based query expansion: Incorporating Deep Averaging Networks in Arabic document retrieval

One of the main issues associated with search engines is the query–document vocabulary mismatch problem, a long-standing problem in Information Retrieval (IR). This problem occurs when a user query does not match the content of stored documents, and it affects most search tasks. Automatic query expansion (AQE) is one of the most common approaches used to address this problem. Various AQE techniques have been proposed; these mainly involve finding synonyms or related words for the query terms. Word embedding (WE) is one such method that is currently receiving significant attention. Most existing AQE techniques expand the individual query terms rather than the entire query, which can lead to query drift if poor expansion terms are selected. In this article, we introduce Deep Averaging Networks (DANs), an architecture that feeds the average of the WE vectors produced by the Word2Vec toolkit for the terms in a query through several linear neural network layers. This average vector is assumed to represent the meaning of the query as a whole and can be used to find expansion terms that are relevant to the complete query. We explore the potential of DANs for AQE in Arabic document retrieval, using them with the classic probabilistic BM25 model as well as with two recent expansion strategies: the Embedding-Based Query Expansion approach (EQE1) and the Prospect-Guided Query Expansion Strategy (V2Q). Although DANs did not improve all outcomes when used with the BM25 model, they outperformed all baselines when incorporated into the EQE1 and V2Q expansion strategies.
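The whole-query idea can be illustrated with a small sketch: average the query-term vectors, pass the average through dense layers, and rank the remaining vocabulary by cosine similarity to the result. This is not the authors' implementation. The toy 3-dimensional vectors, the layer weights, and the helper names (`average_vector`, `dan_forward`, `expansion_terms`) are hypothetical; real vectors would come from a trained Word2Vec model and the layer weights from training. Following the abstract, the default activation is the identity (linear layers); an optional nonlinearity such as `math.tanh` is an assumption you can pass in.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix W, bias vector b)


def average_vector(terms: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Average the embeddings of the query terms found in the vocabulary."""
    vecs = [emb[t] for t in terms if t in emb]
    if not vecs:
        raise ValueError("no query term has an embedding")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dan_forward(avg: Vector, layers: Sequence[Layer],
                activation: Callable[[float], float] = lambda x: x) -> Vector:
    """Feed the averaged vector through dense layers: h <- f(W h + b).

    The default activation is the identity, i.e. purely linear layers."""
    h = avg
    for W, b in layers:
        h = [activation(sum(w * x for w, x in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    return h


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expansion_terms(query: Sequence[str], emb: Dict[str, Vector],
                    layers: Sequence[Layer] = (), k: int = 2,
                    activation: Callable[[float], float] = lambda x: x) -> List[str]:
    """Rank non-query vocabulary words by cosine similarity to the query vector."""
    q = dan_forward(average_vector(query, emb), layers, activation)
    candidates = [w for w in emb if w not in query]
    candidates.sort(key=lambda w: cosine(q, emb[w]), reverse=True)
    return candidates[:k]


# Toy 3-d "embeddings" (hypothetical; real ones would come from Word2Vec).
emb = {
    "bank": [0.9, 0.1, 0.0], "river": [0.1, 0.9, 0.0],
    "money": [1.0, 0.0, 0.1], "shore": [0.2, 1.0, 0.0],
    "loan": [0.95, 0.05, 0.2], "music": [0.0, 0.0, 1.0],
}
identity = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
print(expansion_terms(["bank", "river"], emb))  # no layers: plain average
print(expansion_terms(["bank", "river"], emb, layers=[identity]))
```

With untrained identity layers the ranking equals that of the plain average; the point of training the layers is to move the query vector toward regions of the embedding space that yield better expansion terms.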


Relevance Feedback Based Query Expansion Model Using Borda Count and Semantic Similarity Approach

Computational Intelligence and Neuroscience, Vol 2015, pp. 1-13 (2015). DOI: 10.1155/2015/568197. Cited by ~14.

Author(s): Jagendra Singh, Aditi Sharan

Keyword(s): Semantic Similarity, Relevance Feedback, Query Expansion, Selection Method, Rank Aggregation, Borda Count, Selection Methods, Expansion Term, User Query, The Individual

Pseudo-Relevance Feedback (PRF) is a well-known query expansion method for improving the performance of information retrieval systems. Not all terms in PRF documents are useful for expanding the user query, so selecting proper expansion terms is essential for improving system performance. Individual expansion-term selection methods have been widely investigated for this purpose, and each has its own strengths and weaknesses. To overcome these weaknesses and exploit the strengths of individual methods, we use multiple term selection methods together. In this paper, we first explore the possibility of improving overall performance using individual expansion-term selection methods. Second, we use the Borda count rank aggregation approach to combine multiple expansion-term selection methods. Third, after combining the rankings with Borda count, we apply a semantic similarity approach to select terms that are semantically similar to the query. Our experimental results demonstrate that the proposed approaches achieve a significant improvement over individual term selection methods and related state-of-the-art methods.
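The combine-then-filter pipeline described above can be sketched as follows. This is a generic Borda count (a term at 0-based position i in a list of length n earns n - i points) followed by a similarity threshold, not the paper's exact scoring. The three input rankings, the similarity values, the threshold, and the function names are hypothetical; in practice the rankings would come from individual term selection methods applied to PRF documents, and the similarities from a semantic similarity measure against the query.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def borda_count(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Aggregate ranked term lists with Borda count.

    In a list of length n, the term at position i (0-based) earns n - i
    points; terms missing from a list earn nothing from it. Ties are broken
    alphabetically so the output is deterministic."""
    scores: Dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for i, term in enumerate(ranking):
            scores[term] += n - i
    return sorted(scores, key=lambda t: (-scores[t], t))


def select_expansion_terms(rankings: Sequence[Sequence[str]],
                           similarity_to_query: Dict[str, float],
                           threshold: float, k: int) -> List[str]:
    """Borda-combine the rankings, then keep terms whose (precomputed)
    semantic similarity to the query reaches the threshold."""
    combined = borda_count(rankings)
    kept = [t for t in combined if similarity_to_query.get(t, 0.0) >= threshold]
    return kept[:k]


# Rankings from three hypothetical term selection methods over PRF documents.
rankings = [
    ["oil", "price", "crude", "barrel"],
    ["price", "oil", "market", "crude"],
    ["crude", "oil", "barrel", "stock"],
]
# Hypothetical term-to-query similarities (e.g. cosine of embeddings).
sim = {"oil": 0.8, "crude": 0.7, "price": 0.3, "barrel": 0.6,
       "market": 0.2, "stock": 0.1}
print(borda_count(rankings))
# ['oil', 'crude', 'price', 'barrel', 'market', 'stock']
print(select_expansion_terms(rankings, sim, threshold=0.5, k=3))
# ['oil', 'crude', 'barrel']
```

Here "price" ranks high after aggregation but is dropped by the similarity filter, which is the kind of query-drift guard the third step is meant to provide.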


A personalized query expansion approach for engineering document retrieval

Advanced Engineering Informatics, Vol 28 (4), pp. 344-359 (2014). DOI: 10.1016/j.aei.2014.04.002. Cited by ~20.

Author(s): Gyeong June Hahm, Mun Yong Yi, Jae Hyun Lee, Hyo Won Suh

Keyword(s): Query Expansion, Document Retrieval


Open-vocabulary spoken-document retrieval based on query expansion using related web documents

2008. DOI: 10.21437/interspeech.2008-568.

Author(s): Makoto Terao, Takafumi Koshinaka, Shinichi Ando, Ryosuke Isotani, Akitoshi Okumura

Keyword(s): Query Expansion, Document Retrieval, Spoken Document Retrieval, Web Documents


A hybrid evolutionary algorithm based automatic query expansion for enhancing document retrieval system

Journal of Ambient Intelligence and Humanized Computing (2019). DOI: 10.1007/s12652-019-01247-9. Cited by ~7.

Author(s): Dilip Kumar Sharma, Rajendra Pamula, D. S. Chauhan

Keyword(s): Evolutionary Algorithm, Query Expansion, Retrieval System, Document Retrieval, Hybrid Evolutionary Algorithm


Math-word embedding in math search and semantic extraction

Scientometrics, Vol 125 (3), pp. 3017-3046 (2020). DOI: 10.1007/s11192-020-03502-9. Cited by ~1.

Author(s): André Greiner-Petter, Abdou Youssef, Terry Ruas, Bruce R. Miller, Moritz Schubotz, ...

Keyword(s): Machine Learning, Information Retrieval, Language Processing, Digital Library, Question Answering, Semantic Knowledge, Word Embedding, Mathematical Functions, Search Tasks, Math Search

Word embedding, which represents individual words as semantically meaningful fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role modeling, question answering, and machine translation. As math text consists of natural text as well as math expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate science, it is usually expressed through imprecise and less accurate descriptions, contributing to the relative dearth of machine learning applications for information retrieval in this domain. Generally, mathematical documents communicate their knowledge in an ambiguous, context-dependent, and non-formal language. Given recent advances in word embedding, it is worthwhile to explore their use and effectiveness in math information retrieval tasks, such as math language processing and semantic knowledge extraction. In this paper, we explore math embedding by testing it in several different scenarios, namely, (1) math-term similarity, (2) analogy, (3) numerical concept modeling based on the centroid of the keywords that characterize a concept, (4) math search using query expansion, and (5) semantic extraction, i.e., extracting descriptive phrases for math expressions. Due to the lack of benchmarks, our investigations were performed using the arXiv collection of STEM documents and carefully selected illustrations from the Digital Library of Mathematical Functions (DLMF: NIST digital library of mathematical functions. Release 1.0.20 of 2018-09-1, 2018). Our results show that math embedding holds much promise for similarity, analogy, and search tasks. However, we also observed the need for more robust math embedding approaches.
Moreover, we explore and discuss fundamental issues that we believe thwart the progress in mathematical information retrieval in the direction of machine learning.
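Scenario (2), analogy, is commonly evaluated with the additive vector-offset rule (often called 3CosAdd): to answer "a is to b as c is to ?", pick the word whose vector is closest in cosine similarity to b - a + c, excluding the three input words. The sketch below assumes that rule and uses hand-made toy vectors for a few function names; it is not the paper's trained math embedding, and the vectors and helper names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(a: str, b: str, c: str, emb: Dict[str, Vector]) -> str:
    """Answer 'a is to b as c is to ?' with the additive (3CosAdd) rule:
    the word closest in cosine to emb[b] - emb[a] + emb[c],
    excluding the three input words."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy vectors for a few function names (hypothetical, not trained on arXiv/DLMF).
emb = {
    "sin": [1.0, 0.0, 0.0], "cos": [0.0, 1.0, 0.0],
    "sinh": [1.0, 0.0, 1.0], "cosh": [0.0, 1.0, 1.0],
    "tanh": [1.0, -1.0, 1.0], "exp": [0.5, 0.5, 1.0],
}
print(analogy("sin", "cos", "sinh", emb))  # cosh
```

With real embeddings the offsets are only approximately parallel, which is one reason analogy accuracy is a useful probe of how well an embedding captures mathematical relations.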


A Multi-Agent Personalized Query Refinement Approach for Academic Paper Retrieval in Big Data Environment

Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol 16 (7), pp. 874-880 (2012). DOI: 10.20965/jaciii.2012.p0874. Cited by ~1.

Author(s): Qian Gao, ..., Young Im Cho

Keyword(s): Big Data, Query Expansion, Query Refinement, Retrieval Method, Expansion Strategy, Academic Paper, Different Types, Multi Agent, Intelligent Devices, Data Environment

This paper proposes a multi-agent query refinement approach to achieve effective personalized query expansion for academic paper retrieval in a Big Data environment. First, we use Hadoop as a platform to develop a formalized model that represents different types of large data caches, so that Big Data can be analyzed and processed efficiently. Second, we use a client agent to verify user identities and to monitor whether a device is ready to run a query expansion task. We then use a query expansion agent to determine the domain of the initial query by applying a knowledge-based query expansion strategy. The agent also takes users' interests into account, based on the intelligent devices they use, by applying a user-device-based query expansion strategy and a weighted query expansion strategy to obtain the optimized query expansion set. We compare our method with the conceptual retrieval method and two other lexical query expansion methods, and demonstrate that our method achieves better average recall and average precision.
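The abstract does not specify how the weighted strategy merges candidates, so the following is only a generic illustration of one plausible form: a linear combination of per-strategy term scores followed by top-k selection. The strategy names, scores, weights, the example query, and `weighted_expansion_set` are all hypothetical and should not be read as the paper's method.

```python
from typing import Dict, List


def weighted_expansion_set(strategy_scores: Dict[str, Dict[str, float]],
                           weights: Dict[str, float], k: int) -> List[str]:
    """Linearly combine per-strategy term scores and return the top-k terms.

    strategy_scores maps a strategy name to {term: score}; weights maps the
    same names to non-negative weights. Ties are broken alphabetically."""
    combined: Dict[str, float] = {}
    for name, scores in strategy_scores.items():
        w = weights.get(name, 0.0)
        for term, score in scores.items():
            combined[term] = combined.get(term, 0.0) + w * score
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]


# Hypothetical candidate scores from two strategies for the query "machine learning".
scores = {
    "knowledge_based": {"neural network": 0.9, "deep learning": 0.8, "perceptron": 0.4},
    "user_device_based": {"deep learning": 0.7, "mobile inference": 0.9},
}
weights = {"knowledge_based": 0.6, "user_device_based": 0.4}
print(weighted_expansion_set(scores, weights, k=3))
# ['deep learning', 'neural network', 'mobile inference']
```

A term proposed by both strategies ("deep learning") accumulates weight from each, so agreement between the domain view and the user/device view is rewarded.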


Enhanced word embedding similarity measures using fuzzy rules for query expansion

2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (2017). DOI: 10.1109/fuzz-ieee.2017.8015482. Cited by ~2.

Author(s): Qian Liu, Heyan Huang, Jie Lu, Yang Gao, Guangquan Zhang

Keyword(s): Query Expansion, Similarity Measures, Word Embedding, Fuzzy Rules


Using UMLS-based Re-Weighting Terms as a Query Expansion Strategy

2006 IEEE International Conference on Granular Computing (2006). DOI: 10.1109/grc.2006.1635786. Cited by ~2.

Author(s): Weizhong Zhu, Xuheng Xu, Xiaohua Hu, Il-Yeol Song, R.B. Allen

Keyword(s): Query Expansion, Expansion Strategy


Weight Optimization in Query Expansion Using Term Relatedness to Query-Entropy based (TRQE)

Jurnal Buana Informatika, Vol 6 (3) (2015). DOI: 10.24002/jbi.v6i3.433.

Author(s): Resti Ludviani, Khadijah F. Hayati, Agus Zainal Arifin, Diana Purwitasari

Keyword(s): Query Expansion, Retrieval System, Document Retrieval, Retrieval Performance, Term Weighting, New Approach, Term Selection, Relevance Evaluation, Feedback Module, Pseudo Feedback

Selecting appropriate terms to expand a query is very important in query expansion. Therefore, term selection optimization is added to improve query expansion performance in document retrieval systems. This study proposes a new approach named Term Relatedness to Query-Entropy based (TRQE) that optimizes term weights in query expansion by considering the semantic and statistical aspects of the relevance evaluation of pseudo feedback, in order to improve document retrieval performance. The proposed method has three main modules: relevance feedback, pseudo feedback, and document retrieval. TRQE is implemented in the pseudo feedback module to optimize term weighting in query expansion. The evaluation shows that TRQE achieves its best retrieval result at a precision of 100% and a recall of 22.22%. TRQE-based weighting optimization is shown to improve document retrieval.

Keywords: TRQE, query expansion, term weighting, term relatedness to query, relevance feedback
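To make the reported figures concrete, the sketch below computes set-based precision (relevant retrieved / retrieved) and recall (relevant retrieved / all relevant). The document IDs and counts are hypothetical: the abstract does not give them, but 100% precision with 22.22% recall is consistent with, for example, 2 retrieved documents that are both relevant out of 9 relevant documents in total (2/9 ≈ 22.22%).

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision = hits / |retrieved|, recall = hits / |relevant|."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical counts: 2 documents retrieved, both relevant, out of 9 relevant documents.
relevant = [f"d{i}" for i in range(1, 10)]
p, r = precision_recall(["d1", "d2"], relevant)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=100.00% recall=22.22%
```

The example shows why the two numbers should be read together: perfect precision at low recall means every returned document was relevant, but most relevant documents were not returned.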


Collective Evolutionary Concept Distance Based Query Expansion for Effective Web Document Retrieval

Lecture Notes in Computer Science - Computational Science and Its Applications – ICCSA 2013, pp. 657-672 (2013). DOI: 10.1007/978-3-642-39649-6_47. Cited by ~16.

Author(s): Clement H. C. Leung, Yuanxi Li, Alfredo Milani, Valentina Franzoni

Keyword(s): Query Expansion, Document Retrieval, Web Document
