Geographical queries reformulation using a parallel association rules generator to build spatial taxonomies

Geographical queries require a dedicated reformulation process in information retrieval systems (IRS) because of their specific characteristics and hierarchical structure, a fact that most web search engines ignore. In this paper, we propose an automatic approach for building a spatial taxonomy that models the notion of adjacency and is used to reformulate the spatial part of a geographical query. The approach exploits the top-ranked documents retrieved when a spatial entity, composed of a spatial relation and a city name, is submitted. A transactional database is then constructed in which each extracted document is a transaction containing the names of the cities that share a country with the city in the submitted query. The frequent pattern growth (FP-growth) algorithm is applied to this database in its parallel version (parallel FP-growth, PFP) to generate association rules, which form the country’s taxonomy in a Big Data context. Experiments have been conducted on Spark, and their results show that query reformulation using the taxonomy built with the proposed approach improves the precision and effectiveness of the IRS.
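The step from a transactional database of city names to association rules can be illustrated with a small sketch. This is not the authors' implementation and it is not FP-growth or PFP: it replaces the frequent-pattern tree with brute-force pair counting, which is only practical for tiny inputs but yields the same support and confidence values for two-item rules. The function name `mine_pair_rules`, the thresholds, and the city names in the sample transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # ignore repeated mentions in one document
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules, one per direction.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


# Hypothetical transactions: city names found in four retrieved documents.
transactions = [
    ["Rabat", "Sale", "Kenitra"],
    ["Rabat", "Sale"],
    ["Rabat", "Casablanca"],
    ["Sale", "Rabat", "Temara"],
]
rules = mine_pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}  support={support:.2f}  confidence={confidence:.2f}")
```

With these sample transactions only the pair (Rabat, Sale) passes the support threshold, so the sketch produces the two rules Rabat -> Sale and Sale -> Rabat. In a taxonomy-building setting, rules of this kind could be read as candidate adjacency links between cities, though how the paper turns rules into taxonomy edges is not specified in the abstract.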

Information Retrieval systems and Web Search Engines: A Survey

10.22161/ijaers/nctet.2017.25 ◽

2017 ◽

Author(s):

Arun Kumar ◽

M. A. Jabbar ◽

Y.V. Bhaskar Reddy

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Web Search ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Web Search Engines

Arabic Query Expansion Using WordNet and Association Rules

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2016070104 ◽

2016 ◽

Vol 12 (3) ◽

pp. 51-64 ◽

Cited By ~ 16

Author(s):

Ahmed Abbache ◽

Farid Meziane ◽

Ghalem Belalem ◽

Fatma Zohra Belkredim

Keyword(s):

Performance Improvement ◽

Association Rules ◽

Query Expansion ◽

Arabic Language ◽

Selection Method ◽

Retrieval Performance ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Significant Performance ◽

Selection Of

Query expansion is the process of adding relevant terms to the original queries to improve the performance of information retrieval systems. However, previous studies showed that automatic query expansion using WordNet does not lead to an improvement in performance. One of the main challenges of query expansion is the selection of appropriate terms. In this paper, the authors review this problem using Arabic WordNet and Association Rules within the context of the Arabic language. The results obtained confirmed that, with an appropriate selection method, the authors are able to exploit Arabic WordNet to improve retrieval performance. Their empirical results on a sub-corpus from the Xinhua collection showed that their automatic selection method achieved a significant performance improvement in terms of MAP and recall, and better precision for the top retrieved documents.

A New Stemming Algorithm for Efficient Information Retrieval Systems and Web Search Engines

Intelligent Systems Reference Library - Multimedia Forensics and Security ◽

10.1007/978-3-319-44270-9_6 ◽

2016 ◽

pp. 117-135 ◽

Cited By ~ 3

Author(s):

Safaa I. Hajeer ◽

Rasha M. Ismail ◽

Nagwa L. Badr ◽

Mohamed Fahmy Tolba

Keyword(s):

Information Retrieval ◽

Search Engines ◽

Web Search ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Efficient Information ◽

Web Search Engines

Analysis of Document Viewing Patterns of Web Search Engine Users

Web Mining ◽

10.4018/978-1-59140-414-9.ch016 ◽

2011 ◽

pp. 339-354 ◽

Cited By ~ 6

Author(s):

Bernard J. Jansen ◽

Amanda Spink

Keyword(s):

Information Seeking ◽

Web Search ◽

Real Data ◽

Temporal Analysis ◽

Log Analysis ◽

Web Page ◽

Retrieval Systems ◽

Web Information ◽

Information Interaction ◽

Information Retrieval Systems

This chapter reviews the concepts of Web results page and Web page viewing patterns by users of Web search engines. It presents the advantages of using traditional transaction log analysis in identifying these patterns, serving as a basis for Web usage mining. The authors also present the results of a temporal analysis of Web page viewing, illustrating that the user-information interaction is extremely short. By using real data collected from real users interacting with real Web information retrieval systems, the authors aim to highlight one aspect of the complex environment of Web information seeking.

Domain-specific readability measures to improve information retrieval in the Persian language

The Electronic Library ◽

10.1108/el-01-2017-0007 ◽

2018 ◽

Vol 36 (3) ◽

pp. 430-444

Author(s):

Sholeh Arastoopoor

Keyword(s):

Information Retrieval ◽

Computer Science ◽

Web Search ◽

Content Type ◽

Domain Specific ◽

Search Results ◽

Persian Language ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Primary Focus

Purpose: The degree to which a text is considered readable depends on the capability of the reader. This assumption puts different information retrieval systems at the risk of retrieving unreadable or hard-to-be-read yet relevant documents for their users. This paper aims to examine the potential use of concept-based readability measures along with classic measures for re-ranking search results in information retrieval systems, specifically in the Persian language.

Design/methodology/approach: Flesch–Dayani as a classic readability measure along with document scope (DS) and document cohesion (DC) as domain-specific measures have been applied for scoring the retrieved documents from Google (181 documents) and the RICeST database (215 documents) in the field of computer science and information technology (IT). The re-ranked result has been compared with the ranking of potential users regarding their readability.

Findings: The results show that there is a difference among subcategories of the computer science and IT field according to their readability and understandability. This study also shows that it is possible to develop a hybrid score based on DS and DC measures and, among all four applied scores in re-ranking the documents, the re-ranked list of documents based on the DSDC score shows correlation with re-ranking of the participants in both groups.

Practical implications: The findings of this study would foster a new option in re-ranking search results based on their difficulty for experts and non-experts in different fields.

Originality/value: The findings and the two-mode re-ranking model proposed in this paper along with its primary focus on domain-specific readability in the Persian language would help Web search engines and online databases in further refining the search results in pursuit of retrieving useful texts for users with differing expertise.

Improving the Retrieval of Arabic Web Search Results Using Enhanced k-Means Clustering Algorithm

Entropy ◽

10.3390/e23040449 ◽

2021 ◽

Vol 23 (4) ◽

pp. 449

Author(s):

Amjad F. Alsuhaim ◽

Aqil M. Azmi ◽

Muhammad Hussain

Keyword(s):

Information Retrieval ◽

Execution Time ◽

Clustering Algorithm ◽

Web Search ◽

Writing Style ◽

Search Query ◽

Search Results ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Ranked List

Traditional information retrieval systems return a ranked list of results to a user’s query. This list is often long, and the user cannot explore all the results retrieved. It is also ineffective for a highly ambiguous language such as Arabic. The modern writing style of Arabic excludes the diacritical marking, without which Arabic words become ambiguous. For a search query, the user has to skim over the document to infer if the word has the same meaning they are after, which is a time-consuming task. It is hoped that clustering the retrieved documents will collate documents into clear and meaningful groups. In this paper, we use an enhanced k-means clustering algorithm, which yields a faster clustering time than the regular k-means. The algorithm uses the distance calculated from previous iterations to minimize the number of distance calculations. We propose a system to cluster Arabic search results using the enhanced k-means algorithm, labeling each cluster with the most frequent word in the cluster. This system will help Arabic web users identify each cluster’s topic and go directly to the required cluster. Experimentally, the enhanced k-means algorithm reduced the execution time by 60% for the stemmed dataset and 47% for the non-stemmed dataset when compared to the regular k-means, while slightly improving the purity.
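The idea of reusing distances from the previous iteration can be sketched as follows. The abstract does not spell out the exact rule, so this sketch assumes one common variant: each point remembers its distance to its assigned centroid, and after the centroids are updated, the point keeps its cluster without a full search whenever its new distance to that centroid has not grown. The function names `enhanced_kmeans` and `update_centroids`, the sample points, and the initial centroids are hypothetical. The shortcut is a heuristic, so the result is not guaranteed to match regular k-means on every dataset.

```python
import math


def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of its members; keep it if empty."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def enhanced_kmeans(points, centroids, max_iter=100):
    """Return (labels, centroids, number of distance evaluations)."""
    centroids = [tuple(c) for c in centroids]
    k = len(centroids)
    evaluations = 0

    # First pass: a full assignment, exactly as in regular k-means.
    labels, prev_dist = [], []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        evaluations += k
        best = min(range(k), key=dists.__getitem__)
        labels.append(best)
        prev_dist.append(dists[best])

    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        changed = False
        for i, p in enumerate(points):
            d_own = math.dist(p, centroids[labels[i]])
            evaluations += 1
            if d_own <= prev_dist[i]:
                # Shortcut: the point is no farther from its centroid than
                # before, so keep its cluster and skip the full search.
                prev_dist[i] = d_own
                continue
            # Fallback: compare against every centroid.
            dists = [math.dist(p, c) for c in centroids]
            evaluations += k
            best = min(range(k), key=dists.__getitem__)
            if best != labels[i]:
                changed = True
            labels[i], prev_dist[i] = best, dists[best]
        if not changed:
            break
    return labels, centroids, evaluations


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, evaluations = enhanced_kmeans(points, [(0, 0), (10, 10)])
print(labels, centroids, evaluations)
```

The saving comes from the shortcut branch: a point that passes the check costs one distance evaluation instead of k, which matters most when k is large and clusters have mostly stabilized.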

Arabic Query Expansion Using WordNet and Association Rules

Information Retrieval and Management ◽

10.4018/978-1-5225-5191-1.ch054 ◽

2018 ◽

pp. 1239-1254 ◽

Cited By ~ 1

Author(s):

Ahmed Abbache ◽

Farid Meziane ◽

Ghalem Belalem ◽

Fatma Zohra Belkredim

Keyword(s):

Information Retrieval ◽

Association Rules ◽

Query Expansion ◽

Arabic Language ◽

Selection Method ◽

Empirical Results ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Significant Performance ◽

Selection Of

An Evaluation of Two Commercial Deep Learning-Based Information Retrieval Systems for COVID-19 Literature

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa271 ◽

2020 ◽

Cited By ~ 1

Author(s):

Sarvesh Soni ◽

Kirk Roberts

Keyword(s):

Search Engines ◽

Web Search ◽

Scientific Information ◽

Future Health ◽

Rigorous Evaluation ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Empirical Performance ◽

Search Market ◽

Specific Search

The COVID-19 pandemic has resulted in a tremendous need for access to the latest scientific information, leading to both corpora for COVID-19 literature and search engines to query such data. While most search engine research is performed in academia with rigorous evaluation, major commercial companies dominate the web search market. Thus, it is expected that commercial pandemic-specific search engines will gain much higher traction than academic alternatives, leading to questions about the empirical performance of these tools. This paper seeks to empirically evaluate two commercial search engines for COVID-19 (Google and Amazon) in comparison to academic prototypes evaluated in the TREC-COVID task. We performed several steps to reduce bias in the manual judgments to ensure a fair comparison of all systems. We find the commercial search engines sizably under-performed those evaluated under TREC-COVID. This has implications for trust in popular health search engines and developing biomedical search engines for future health crises.

State-of-the-Art Review on Relevance of Genetic Algorithm to Internet Web Search

Applied Computational Intelligence and Soft Computing ◽

10.1155/2012/152385 ◽

2012 ◽

Vol 2012 ◽

pp. 1-7 ◽

Cited By ~ 1

Author(s):

Kehinde Agbele ◽

Ademola Adesina ◽

Daniel Ekong ◽

Oluwafemi Ayangbekun

Keyword(s):

Genetic Algorithm ◽

Genetic Algorithms ◽

Information Retrieval ◽

Information Needs ◽

Web Search ◽

State Of The Art ◽

Retrieval Systems ◽

Online Databases ◽

Information Retrieval Systems ◽

Survival Of The Fittest

People use search engines to find the information they want, in the expectation that their information needs will be met. Information retrieval (IR) is a field concerned primarily with searching for and retrieving information in documents, as well as searching search engines, online databases, and the Internet. Genetic algorithms (GAs) are robust and efficient optimization methods, applicable to a wide range of search problems and motivated by Darwin’s principles of natural selection and survival of the fittest. This paper describes the components of information retrieval systems (IRS). It then looks at how GAs can be applied in the field of IR, and specifically at the relevance of genetic algorithms to internet web search. Finally, the proposals surveyed show that GAs are applied to diverse problem areas of internet web search.
