Spatio-Temporal Based Personalization for Mobile Search

2012
pp. 386-409
Author(s):
Ourdia Bouidghaghen
Lynda Tamine

The explosion of information available on the Internet has made traditional information retrieval systems, characterized by one-size-fits-all approaches, less effective. Indeed, users are overwhelmed by the information such systems deliver in response to their queries, particularly when those queries are ambiguous. To tackle this problem, the state of the art reveals a growing interest in contextual information retrieval (CIR), which relies on various sources of evidence drawn from the user's search background and environment to improve retrieval accuracy. This chapter focuses on the mobile context, highlights the challenges it presents for IR, and gives an overview of CIR approaches applied in this environment. The authors then present an approach to personalize search results for mobile users by exploiting both cognitive and spatio-temporal contexts. An experimental evaluation against Yahoo search shows that the approach improves the quality of top search result lists and enhances search result precision.
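
By way of illustration, the sketch below shows the general shape of this kind of context-aware re-ranking: a linear mix of the engine's original score, a cognitive score (similarity between document terms and the user's profile), and a spatio-temporal score. The weights, field names, and decay constants are illustrative assumptions, not the authors' actual model.

```python
from math import exp, sqrt

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def spatiotemporal_fit(doc, distance_km, hour):
    """Decay with distance to the document's subject; boost when the
    query hour falls in the document's (hypothetical) activity window."""
    spatial = exp(-distance_km / 5.0)
    temporal = 1.0 if hour in doc.get("active_hours", range(24)) else 0.3
    return spatial * temporal

def rerank(results, profile, hour, alpha=0.5, beta=0.3, gamma=0.2):
    """results: dicts with 'engine_score', 'terms', 'distance_km' keys."""
    def score(doc):
        return (alpha * doc["engine_score"]
                + beta * cosine(doc["terms"], profile)
                + gamma * spatiotemporal_fit(doc, doc["distance_km"], hour))
    return sorted(results, key=score, reverse=True)
```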

1988
Vol 11 (1-2)
pp. 33-46
Author(s):
Tove Fjeldvig
Anne Golden

The fact that a lexeme can appear in various forms causes problems in information retrieval. As a solution, we have developed methods for automatic root lemmatization, automatic truncation, and automatic splitting of compound words. All the methods are based on a set of rules containing information about inflected and derived word forms, rather than on a dictionary. The methods have been tested on several text collections and have produced very good results. Through controlled text retrieval experiments, we have studied their effects on search results. The results show that both automatic root lemmatization and automatic truncation considerably improve search quality. The experiments with compound-word splitting did not yield quite the same improvement, but they nevertheless showed that such a method can contribute to a richer and more complete search request.
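
A minimal sketch of the dictionary-free, rule-based idea: strip the longest matching inflectional suffix, guarded by a minimum stem length. The suffix list below is illustrative (Norwegian-style endings), not the authors' actual rule set.

```python
SUFFIX_RULES = ["ingen", "ene", "ing", "er", "et", "en", "e", "s"]
MIN_STEM = 3  # never reduce a word below this many characters

def reduce_to_root(word):
    """Strip the longest matching suffix rule, if any applies."""
    word = word.lower()
    for suffix in sorted(SUFFIX_RULES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[:-len(suffix)]
    return word

print(reduce_to_root("bilene"))  # "the cars" -> "bil"
```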


Author(s):  
Max Chevalier
Christine Julien
Chantal Soulé-Dupuy

Information search is carried out with specific tools called Information Retrieval Systems (IRS), also known as "search engines." To provide more accurate results, most such systems offer personalization features: each system models the user in order to adapt the search results it displays. In a multi-application context (e.g., when several search engines are used for a single query), these personalization techniques are limited, because each user model (also called a profile) is incomplete: it does not exploit the actions and queries issued through other search engines. Sharing user models between several search engines is therefore a challenge on the way to more effective personalization. A semantic architecture for user profile interoperability is proposed to reach this goal. This architecture is all the more valuable because it can be used in many other contexts to share models of various resources between applications, for instance a document model. It also lets every system keep its own representation of each resource while providing a solution for sharing it easily.
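
As an illustration of the interoperability idea (not the proposed semantic architecture itself), the sketch below shows two engines with different internal user models exchanging profiles through a hypothetical neutral pivot format; each keeps its own representation and converts only at the boundary.

```python
from dataclasses import dataclass, field

@dataclass
class PivotProfile:
    """Engine-neutral profile: interests as term -> weight in [0, 1]."""
    user_id: str
    interests: dict = field(default_factory=dict)

class EngineA:
    """Keeps a weighted term dict internally."""
    def __init__(self):
        self.terms = {}
    def to_pivot(self, user_id):
        return PivotProfile(user_id, dict(self.terms))
    def merge_pivot(self, pivot):
        for t, w in pivot.interests.items():
            self.terms[t] = max(self.terms.get(t, 0.0), w)

class EngineB:
    """Keeps raw click counts internally; normalizes on export."""
    def __init__(self):
        self.clicks = {}
    def to_pivot(self, user_id):
        total = sum(self.clicks.values()) or 1
        return PivotProfile(user_id,
                            {t: c / total for t, c in self.clicks.items()})
    def merge_pivot(self, pivot):
        for t, w in pivot.interests.items():
            self.clicks[t] = self.clicks.get(t, 0) + round(100 * w)
```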


2021
Vol 4 (1)
pp. 87-89
Author(s):  
Janardan Bhatta

Searching images in a large database is a major requirement in information retrieval systems. Retrieving image results for a text query is a challenging task. In this paper, we leverage the power of computer vision and natural language processing on distributed machines to lower the latency of search results. Image pixel features are computed with a contrastive loss function for image search, and text features are computed with an attention mechanism for text search. These features are aligned in a shared space, preserving the information in each text and image feature. Previously, this approach had been tested only on multilingual models; we test it on an image-text dataset, which enables searching with any form of text or image at high accuracy.
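
For concreteness, here is a generic CLIP-style symmetric contrastive loss over a batch of paired image and text embeddings, where matching pairs sit on the diagonal of the similarity matrix. This is a standard InfoNCE formulation, not necessarily the paper's exact loss; the temperature value is an assumption.

```python
import numpy as np

def xent(logits, labels):
    """Row-wise softmax cross-entropy."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: pull paired (image, text) embeddings together,
    push apart all other pairings in the batch."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature
    labels = np.arange(len(img_emb))  # matching pairs on the diagonal
    return 0.5 * (xent(logits, labels) + xent(logits.T, labels))

rng = np.random.default_rng(0)
print(contrastive_loss(rng.normal(size=(8, 64)), rng.normal(size=(8, 64))))
```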


Author(s):  
S. Naseehath

Webometric research has fallen into two main categories, namely link analysis and search engine evaluation; search engines are also used to collect data for link analysis. A set of measurements is proposed for evaluating web search engine performance. Some are adapted from the concepts of recall and precision, which are commonly used in evaluating traditional information retrieval systems. Others are newly developed to evaluate search engine stability, which is unique to web information retrieval systems. Overlap between search results, annual growth of result counts on each search engine, and variation of results when searching with synonyms are also used to evaluate the relative efficiency of the engines. In this study, the investigator conducts a webometric study on the topic of medical tourism in Kerala using six search engines: three general search engines, namely Bing, Google, and Lycos, and three metasearch engines, namely Dogpile, ixquick, and WebCrawler.
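
Two of the kinds of measures described might look like the sketch below: precision over an engine's top-k results against relevance judgments, and pairwise overlap between two engines' top-k result sets. These are illustrative formulations; the study's exact formulas are not reproduced here.

```python
def precision_at_k(results, relevant, k=10):
    """Fraction of an engine's top-k result URLs judged relevant."""
    top = results[:k]
    return sum(1 for url in top if url in relevant) / len(top) if top else 0.0

def overlap_at_k(results_a, results_b, k=10):
    """Jaccard overlap between two engines' top-k result sets."""
    a, b = set(results_a[:k]), set(results_b[:k])
    return len(a & b) / len(a | b) if a | b else 0.0
```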


Author(s):  
Bich-Liên Doan
Jean-Paul Sansonnet

This chapter discusses using context in Information Retrieval systems and Intelligent Assistant Agents in order to improve the performance of these systems. The notion of context is introduced, and the state of the art in Contextual Information Retrieval is presented, illustrating the various categories of context that can be taken into account when solving user queries. Within this framework, the authors focus on task-based context, which takes into account the activity the user is engaged in when submitting a query. Finally, they introduce promising research directions that promote the use of Intelligent Assistant Agents capable of symbolic reasoning about users' tasks to support the query process.


2013
Vol 3 (4)
pp. 35-51
Author(s):
Souheyl Mallat
Anis Zouaghi
Emna Hkiri
Mounir Zrigui

In this paper, the authors propose a method for lexical enrichment of Arabic queries in order to improve the performance of information retrieval systems (IRS). The method performs two types of enrichment: linguistic and contextual. The first is based on linguistic analysis (lemmatization and morphological, syntactic, and semantic analysis), whose goal is to generate a descriptive list (list-desc) containing the linguistic features assigned to each significant term of the query. The second consists of integrating contextual information derived from the corpus documents. It is based on statistical analysis using the Salton weighting functions TF-IDF and TF-IEF. The TF-IDF function is applied to the list-desc and the documents of the corpus in order to identify relevant documents. The TF-IEF function is then applied between the list-desc and the sentences of those relevant documents to identify relevant sentences. The terms of these sentences are weighted, and those with the highest weights, judged the richest in informative and contextual importance, are added to the original query. The authors' lexical enrichment method was evaluated on a corpus of documents from a specialized domain, and the results show its benefit in terms of precision and recall.
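
A minimal sketch of the two Salton-style weighting functions, with documents as the counting units for TF-IDF and sentences of the relevant documents as the units for TF-IEF. The smoothing and log base are illustrative choices, not necessarily those of the paper.

```python
import math
from collections import Counter

def tf_idf(term, doc, docs):
    """doc: token list; docs: list of token lists (the corpus)."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in docs if term in d)   # document frequency
    return tf * math.log(len(docs) / (1 + df))

def tf_ief(term, sentence, sentences):
    """Same shape, over the sentences of the relevant documents."""
    tf = Counter(sentence)[term] / len(sentence)
    sf = sum(1 for s in sentences if term in s)  # sentence frequency
    return tf * math.log(len(sentences) / (1 + sf))
```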


Author(s):  
Claudio Gutiérrez-Soto
Gilles Hubert

When using information retrieval systems, information related to searches is typically stored in files commonly known as log files. By contrast, the results of previously submitted queries are ignored most of the time, even though past search results can be profitable for new searches. Some approaches in information retrieval exploit previous searches in a personalized way, for a single user; approaches that exploit past searches collectively are less common. This paper deals with such an approach: it uses the past results of similar queries submitted by other users to build the answers to newly submitted queries. It proposes two Monte Carlo algorithms that build the result for a new query by selecting relevant documents associated with the most similar past query. Experiments were carried out to evaluate the effectiveness of the proposed algorithms on several dataset variants. The algorithms were also compared with a baseline approach based on the cosine measure, from which they reuse past results. Simulated datasets were designed for the experiments, following the Cranfield paradigm, which is well established in the information retrieval domain. The empirical results show the benefit of our approach.
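
The sketch below is one illustrative reading of the collective-reuse idea, not the paper's two exact algorithms: find the most cosine-similar past query in the shared log, then build the new result list by weighted random sampling from its stored results, biased toward documents judged relevant.

```python
import math
import random

def cosine(q1, q2):
    """Cosine between two sparse query vectors (term -> weight dicts)."""
    dot = sum(w * q2.get(t, 0.0) for t, w in q1.items())
    n1 = math.sqrt(sum(w * w for w in q1.values()))
    n2 = math.sqrt(sum(w * w for w in q2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def answer_from_past(new_query, past_log, k=10, relevant_weight=3.0):
    """past_log: list of (query_vector, [(doc_id, was_relevant), ...]).
    Picks the most similar past query, then samples k of its documents
    without replacement, biased toward those judged relevant."""
    _, past_docs = max(past_log, key=lambda e: cosine(new_query, e[0]))
    docs = list(past_docs)
    weights = [relevant_weight if rel else 1.0 for _, rel in docs]
    result = []
    for _ in range(min(k, len(docs))):
        i = random.choices(range(len(docs)), weights=weights)[0]
        result.append(docs.pop(i)[0])
        weights.pop(i)
    return result
```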


2018
Vol 36 (3)
pp. 430-444
Author(s):  
Sholeh Arastoopoor

Purpose: The degree to which a text is considered readable depends on the capability of the reader. This assumption puts information retrieval systems at risk of retrieving relevant but unreadable, or hard-to-read, documents for their users. This paper examines the potential use of concept-based readability measures, alongside classic measures, for re-ranking search results in information retrieval systems, specifically in the Persian language.

Design/methodology/approach: Flesch–Dayani, as a classic readability measure, together with document scope (DS) and document cohesion (DC) as domain-specific measures, was applied to score the documents retrieved from Google (181 documents) and the RICeST database (215 documents) in the field of computer science and information technology (IT). The re-ranked results were compared with potential users' own rankings of the documents by readability.

Findings: The results show that the subcategories of the computer science and IT field differ in readability and understandability. The study also shows that a hybrid score can be built from the DS and DC measures; among the four scores applied for re-ranking, the list re-ranked by this hybrid DSDC score correlates with the participants' re-ranking in both groups.

Practical implications: The findings offer a new option for re-ranking search results by their difficulty for experts and non-experts in different fields.

Originality/value: The findings and the two-mode re-ranking model proposed in this paper, with its primary focus on domain-specific readability in the Persian language, would help web search engines and online databases further refine search results in pursuit of retrieving useful texts for users with differing expertise.
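
A heavily hedged sketch of what such a hybrid re-ranking could look like: the paper's actual DS and DC formulas and its combination rule are not reproduced here; the min-max normalization and equal weighting below are assumptions for illustration only.

```python
def minmax(xs):
    """Normalize a list of raw scores to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.5 for x in xs]

def rerank_by_dsdc(doc_ids, ds_raw, dc_raw, w_ds=0.5, w_dc=0.5):
    """doc_ids with parallel lists of raw DS and DC values; returns ids
    sorted by the combined (hypothetical) DSDC score, best first."""
    ds, dc = minmax(ds_raw), minmax(dc_raw)
    dsdc = [w_ds * a + w_dc * b for a, b in zip(ds, dc)]
    return [doc for _, doc in sorted(zip(dsdc, doc_ids), reverse=True)]
```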


Entropy
2021
Vol 23 (4)
pp. 449
Author(s):  
Amjad F. Alsuhaim
Aqil M. Azmi
Muhammad Hussain

Traditional information retrieval systems return a ranked list of results to a user's query. This list is often long, and the user cannot explore all of the retrieved results. It is also ineffective for a highly ambiguous language such as Arabic: the modern writing style of Arabic omits the diacritical marks without which Arabic words become ambiguous, so for a given query the user has to skim each document to infer whether a word carries the intended meaning, a time-consuming task. Clustering the retrieved documents should collate them into clear and meaningful groups. In this paper, we use an enhanced k-means clustering algorithm, which yields a faster clustering time than regular k-means by using distances calculated in previous iterations to minimize the number of distance calculations. We propose a system that clusters Arabic search results using the enhanced k-means algorithm, labeling each cluster with its most frequent word. This system helps Arabic web users identify each cluster's topic and go directly to the required cluster. Experimentally, the enhanced k-means algorithm reduced execution time by 60% on the stemmed dataset and 47% on the non-stemmed dataset compared to regular k-means, while slightly improving purity.
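
The described enhancement, reusing previous-iteration distances, resembles the well-known Fahim-style heuristic sketched below; this is an illustrative reading, not the paper's exact implementation.

```python
import numpy as np

def enhanced_kmeans(X, k, iters=100, seed=0):
    """k-means with a distance-reuse shortcut: cache each point's distance
    to its assigned centroid; after centroids move, if the point is no
    farther from that centroid than before, keep the assignment and skip
    the other k-1 distance computations."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    dists = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
    assign = dists.argmin(axis=1)
    cached = dists[np.arange(len(X)), assign]
    for _ in range(iters):
        for j in range(k):  # recompute centroids from current members
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        changed = False
        for i in range(len(X)):
            d_own = np.linalg.norm(X[i] - centroids[assign[i]])
            if d_own <= cached[i]:      # still at least as close: skip
                cached[i] = d_own
                continue
            d = np.linalg.norm(centroids - X[i], axis=1)
            j = int(d.argmin())
            changed = changed or (j != assign[i])
            assign[i], cached[i] = j, d[j]
        if not changed:
            break
    return assign, centroids
```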

