An Evaluation of Two Commercial Deep Learning-Based Information Retrieval Systems for COVID-19 Literature

Author(s):  
Sarvesh Soni ◽  
Kirk Roberts

Abstract The COVID-19 pandemic has resulted in a tremendous need for access to the latest scientific information, leading to both corpora for COVID-19 literature and search engines to query such data. While most search engine research is performed in academia with rigorous evaluation, major commercial companies dominate the web search market. Thus, it is expected that commercial pandemic-specific search engines will gain much higher traction than academic alternatives, leading to questions about the empirical performance of these tools. This paper seeks to empirically evaluate two commercial search engines for COVID-19 (Google and Amazon) in comparison to academic prototypes evaluated in the TREC-COVID task. We performed several steps to reduce bias in the manual judgments to ensure a fair comparison of all systems. We find the commercial search engines sizably under-performed those evaluated under TREC-COVID. This has implications for trust in popular health search engines and developing biomedical search engines for future health crises.
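To make concrete how such a comparison is typically scored, below is a minimal sketch of TREC-style evaluation over manual relevance judgments (precision@k and NDCG@k). The document IDs, judgment values, and ranked list are invented for illustration and are not the actual TREC-COVID data or the metrics the authors necessarily report.

```python
import math

def precision_at_k(ranked_docs, qrels, k=10):
    """Fraction of the top-k results judged relevant (judgment > 0)."""
    top = ranked_docs[:k]
    return sum(1 for d in top if qrels.get(d, 0) > 0) / k

def ndcg_at_k(ranked_docs, qrels, k=10):
    """Normalized discounted cumulative gain over graded judgments."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [qrels.get(d, 0) for d in ranked_docs[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Illustrative judgments: doc id -> graded relevance (2 = highly relevant).
qrels = {"doc1": 2, "doc3": 1, "doc7": 2}
ranking = ["doc1", "doc2", "doc3", "doc4", "doc7"]
print(precision_at_k(ranking, qrels, k=5))  # 0.6
print(ndcg_at_k(ranking, qrels, k=5))       # ~0.87
```

The same judgments can be applied to every system's ranked output, which is the basis for comparing commercial engines against the TREC-COVID prototypes.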

Infolib ◽  
2020 ◽  
Vol 24 (4) ◽  
pp. 16-21
Author(s):  
Irina Krasilnikova

The urgency of the problem stems from the growing number of electronic resources in information and library institutions, the need to search for information from any source, including external ones, the provision of documents from groups of collections (corporate holdings), and the presence of electronic catalogs and search systems. Searching catalogs and other search engines has always preceded the fulfillment of orders in interlibrary service. Borrowing and using documents from different collections (the provision of interlibrary services) is possible only with up-to-date metadata from modern information retrieval systems (IRS). The purpose of the article is to summarize the results of studying several types of search engines, with attention to new scientific publications on the topic under study. An analysis of domestic and foreign materials on options for searching for information is presented, which is essential for users, including remote users receiving interlibrary services.


Author(s):  
Max Chevalier ◽  
Christine Julien ◽  
Chantal Soulé-Dupuy

Searching for information is carried out with specific tools called Information Retrieval Systems (IRS), also known as "search engines." To provide more accurate results to users, most such systems offer personalization features. To do this, each system models a user in order to adapt the search results that will be displayed. In a multi-application context (e.g., when using several search engines for a single query), personalization techniques are limited because the user model (also called a profile) is incomplete: it does not exploit actions and queries coming from other search engines. Sharing user models between several search engines is therefore a challenge for providing more effective personalization. A semantic architecture for user profile interoperability is proposed to reach this goal. This architecture is also important because it can be used in many other contexts to share various resource models, for instance document models, between applications. It also ensures that every system can keep its own representation of each resource while providing a solution for sharing it easily.
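As a rough illustration of the kind of interoperability described, the sketch below maps two hypothetical engine-specific profile exports onto a shared user-model representation. The field names, weighting scheme, and mapping functions are assumptions made for the example; they are not the proposed semantic architecture itself.

```python
from dataclasses import dataclass, field

@dataclass
class SharedProfile:
    """A neutral user-model representation that several engines can consume."""
    user_id: str
    interests: dict = field(default_factory=dict)  # term -> weight

    def merge(self, other: "SharedProfile") -> "SharedProfile":
        merged = dict(self.interests)
        for term, w in other.interests.items():
            merged[term] = merged.get(term, 0.0) + w
        return SharedProfile(self.user_id, merged)

# Hypothetical engine-specific exports.
def from_engine_a(raw):            # engine A stores weighted keywords
    return SharedProfile(raw["uid"], dict(raw["keywords"]))

def from_engine_b(raw):            # engine B stores a flat query history
    weights = {}
    for q in raw["history"]:
        for term in q.split():
            weights[term] = weights.get(term, 0.0) + 1.0
    return SharedProfile(raw["user"], weights)

profile = from_engine_a({"uid": "u1", "keywords": [("python", 2.0)]}).merge(
    from_engine_b({"user": "u1", "history": ["python tutorial", "search engines"]})
)
print(profile.interests)  # combined evidence from both engines
```

The point of the shared schema is that each engine keeps its internal representation and only exposes a mapping to and from the common model.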


2020 ◽  
Vol 38 (3) ◽  
pp. 477-492
Author(s):  
Mahdi Zeynali Tazehkandi ◽  
Mohsen Nowkarizi

Purpose The purpose of this paper is to present a review of the use of the recall metric for evaluating information retrieval systems, especially search engines. Design/methodology/approach This paper investigates different researchers’ views about the recall metric. Findings Five different definitions of recall were identified. For the first group, recall refers to completeness, but it does not specify where all the relevant documents are located. For the second group, recall refers to retrieving all the relevant documents from the collection. However, the term “collection” is ambiguous. For the third group (first approach), collection means the index of the search engine and, for the fourth group (second approach), collection refers to the Web. For the fifth group (third approach), the ranking of the retrieved documents should also be accounted for in calculating recall. Practical implications In the first, second and third approaches, the evaluated components are, respectively, the retrieval algorithm; the retrieval algorithm and crawler; and the retrieval algorithm, crawler and ranker. To determine the effectiveness of search engines for end users, the third approach to measuring recall is preferable. Originality/value The value of this paper is to collect, identify and analyse the literature on recall. In addition, different views of researchers about recall are identified.
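To make the difference between these definitions concrete, the sketch below computes recall against two different denominators (relevant documents in the engine's index versus relevant documents on the wider Web) and a rank-sensitive variant (recall@k). All document sets are invented for illustration.

```python
def recall(retrieved, relevant):
    """Classic recall: relevant items retrieved / all relevant items."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

def recall_at_k(ranked, relevant, k):
    """Rank-sensitive recall: only the top-k results count (third approach)."""
    return recall(ranked[:k], relevant)

ranked_results    = ["d1", "d4", "d2", "d9"]
relevant_in_index = {"d1", "d2", "d5"}               # relevant docs the crawler indexed
relevant_on_web   = {"d1", "d2", "d5", "d8", "d9"}   # relevant docs existing on the Web

print(recall(ranked_results, relevant_in_index))          # 2/3: evaluates the retrieval algorithm
print(recall(ranked_results, relevant_on_web))            # 3/5: also evaluates the crawler
print(recall_at_k(ranked_results, relevant_on_web, k=2))  # 1/5: also evaluates the ranker
```

Changing only the denominator or the cutoff changes which system component the measurement actually reflects, which is the distinction the three approaches draw.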


Author(s):  
S. Naseehath

Webometric research has fallen into two main categories, namely link analysis and search engine evaluation. Search engines are also used to collect data for link analysis. A set of measurements is proposed for evaluating web search engine performance. Some measurements are adapted from the concepts of recall and precision, which are commonly used in evaluating traditional information retrieval systems. Others are newly developed to evaluate search engine stability, which is unique to web information retrieval systems. The overlap of search results, the annual growth of search results on each search engine, and the variation of results when searching with synonyms are also used to evaluate the relative efficiency of search engines. In this study, the investigator attempts to conduct a webometric study on the topic of medical tourism in Kerala using six search engines; these include three general search engines, namely Bing, Google, and Lycos, and three metasearch engines, namely Dogpile, ixquick, and WebCrawler.
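One of the measurements mentioned, the overlap of search results between engines, can be computed as in the sketch below. The result sets are placeholders, not data from the Kerala medical-tourism study.

```python
def overlap(results_a, results_b):
    """Jaccard overlap between two engines' result sets (e.g., by URL)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Placeholder result URLs for two engines on the same query.
engine_one = ["u1", "u2", "u3", "u4"]
engine_two = ["u2", "u3", "u5"]

print(overlap(engine_one, engine_two))  # 2 shared out of 5 unique URLs -> 0.4
```

Repeating this over a set of queries and engine pairs gives the kind of overlap matrix webometric studies report.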


Web Mining ◽  
2011 ◽  
pp. 339-354 ◽  
Author(s):  
Bernard J. Jansen ◽  
Amanda Spink

This chapter reviews the concepts of Web results page and Web page viewing patterns by users of Web search engines. It presents the advantages of using traditional transaction log analysis in identifying these patterns, serving as a basis for Web usage mining. The authors also present the results of a temporal analysis of Web page viewing, illustrating that the user-information interaction is extremely short. By using real data collected from real users interacting with real Web information retrieval systems, the authors aim to highlight one aspect of the complex environment of Web information seeking.
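Below is a minimal sketch of the kind of transaction-log analysis described: estimating how long a user viewed each page from consecutive timestamps. The log format and field layout are assumptions for the example, not the authors' actual data schema.

```python
from datetime import datetime

# Hypothetical transaction log: (timestamp, user, URL viewed), ordered in time.
log = [
    ("2011-03-01 10:00:00", "u1", "/results?q=flu"),
    ("2011-03-01 10:00:08", "u1", "/doc/42"),
    ("2011-03-01 10:00:35", "u1", "/results?q=flu&page=2"),
]

def viewing_durations(entries):
    """Duration of each page view = time until the user's next logged action."""
    durations = []
    for (t1, user, url), (t2, _, _) in zip(entries, entries[1:]):
        delta = datetime.fromisoformat(t2) - datetime.fromisoformat(t1)
        durations.append((user, url, delta.total_seconds()))
    return durations

for user, url, seconds in viewing_durations(log):
    print(f"{user} viewed {url} for {seconds:.0f}s")
```

Aggregating such durations across many sessions is what supports the chapter's observation that most viewing interactions are very short.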


Author(s):  
Fabrizio Sebastiani

The categorization of documents into subject-specific categories is a useful enhancement for large document collections addressed by information retrieval systems, as a user can first browse a category tree in search of the category that best matches her interests and then issue a query for more specific documents “from within the category.” This approach combines the two modalities of information seeking that are most popular in Web-based search engines, i.e., category-based site browsing (as exemplified by, e.g., Yahoo™) and keyword-based document querying (as exemplified by, e.g., AltaVista™). Appropriate query expansion tools need to be provided, though, in order to allow the user to incrementally refine her query through further retrieval passes, thus allowing the system to produce a series of subsequent document rankings that hopefully converge to the user’s expected ranking. In this work we propose that automatically generated, category-specific “associative” thesauri be used for this purpose. We present a method for their generation and discuss how the thesaurus specific to a given category may usefully be endowed with “gateways” to the thesauri specific to its parent and children categories.
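The sketch below illustrates the general idea of a co-occurrence-based ("associative") thesaurus built from the documents filed under one category and used to expand a query. The tiny corpus, the raw co-occurrence counts, and the expansion cutoff are simplifications chosen for the example rather than the generation method proposed in the chapter.

```python
from collections import defaultdict
from itertools import combinations

# Tiny per-category corpus (documents already filed under one category).
category_docs = [
    "neural network training data",
    "training deep neural models",
    "network data retrieval",
]

def build_associative_thesaurus(docs):
    """Count how often terms co-occur in the same document."""
    cooc = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        terms = set(doc.split())
        for a, b in combinations(sorted(terms), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc

def expand_query(query, thesaurus, per_term=2):
    """Add the strongest associates of each query term."""
    expanded = set(query.split())
    for term in query.split():
        associates = sorted(thesaurus[term].items(), key=lambda kv: -kv[1])
        expanded.update(t for t, _ in associates[:per_term])
    return expanded

thesaurus = build_associative_thesaurus(category_docs)
print(expand_query("neural training", thesaurus))
```

Because the thesaurus is built per category, the associates it suggests stay within the vocabulary of that category, which is what makes the "gateways" to parent and child categories useful for broadening or narrowing a refinement.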


Author(s):  
Pankaj Dadure ◽  
Partha Pakray ◽  
Sivaji Bandyopadhyay

Mathematical formulas are widely used to express ideas and fundamental principles of science, technology, engineering, and mathematics. The rapidly growing research in science and engineering leads to the generation of a huge number of scientific documents which contain both textual and mathematical terms. In a scientific document, the sense of a mathematical formula is conveyed through its context and its symbolic structure, which follows strong domain-specific conventions. In contrast to textual information retrieval, mathematical information retrieval systems have developed specialized indexing and matching approaches suited to retrieving formulae and scientific terms. This chapter discusses recent advancements in formula-based search engines, various formula representation styles and indexing techniques, and the benefits of formula-based search engines for future applications such as plagiarism detection and math recommendation systems.
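As a very simplified illustration of formula indexing, the sketch below tokenizes LaTeX formulae into crude symbol and operator tokens and builds an inverted index over them. Real systems use far richer representations (for example, symbol layout or operator trees); the tokenizer and index here are assumptions made only to convey the idea.

```python
import re
from collections import defaultdict

def formula_tokens(latex):
    """Split a LaTeX formula into crude symbol/operator tokens."""
    return re.findall(r"\\[a-zA-Z]+|[A-Za-z]|\d+|[\^_=+\-*/()]", latex)

def build_formula_index(formulas):
    """Inverted index: token -> ids of formulas containing it."""
    index = defaultdict(set)
    for fid, latex in formulas.items():
        for tok in formula_tokens(latex):
            index[tok].add(fid)
    return index

formulas = {
    "f1": r"E = m c^2",
    "f2": r"\frac{a}{b} + c^2",
}
index = build_formula_index(formulas)
# All formulas that contain a squared term:
print(index["^"] & index["2"])  # {'f1', 'f2'}
```

Matching on structural tokens rather than plain text is what allows a formula-based engine to retrieve expressions that share structure even when the surrounding prose differs.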


2018 ◽  
Vol 36 (3) ◽  
pp. 430-444
Author(s):  
Sholeh Arastoopoor

Purpose The degree to which a text is considered readable depends on the capability of the reader. This assumption puts different information retrieval systems at risk of retrieving unreadable or hard-to-read yet relevant documents for their users. This paper aims to examine the potential use of concept-based readability measures along with classic measures for re-ranking search results in information retrieval systems, specifically in the Persian language. Design/methodology/approach Flesch–Dayani as a classic readability measure, along with document scope (DS) and document cohesion (DC) as domain-specific measures, has been applied for scoring the documents retrieved from Google (181 documents) and the RICeST database (215 documents) in the field of computer science and information technology (IT). The re-ranked results have been compared with the rankings of potential users regarding readability. Findings The results show that the subcategories of the computer science and IT field differ in their readability and understandability. The study also shows that it is possible to develop a hybrid score based on the DS and DC measures and that, among all four scores applied in re-ranking the documents, the re-ranked list based on the DSDC score correlates with the rankings produced by the participants in both groups. Practical implications The findings of this study offer a new option for re-ranking search results based on their difficulty for experts and non-experts in different fields. Originality/value The findings and the two-mode re-ranking model proposed in this paper, along with its primary focus on domain-specific readability in the Persian language, would help Web search engines and online databases further refine search results in pursuit of retrieving useful texts for users with differing expertise.
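As a rough illustration of re-ranking by readability, the sketch below scores documents with the classic (English) Flesch reading-ease formula and blends that score with placeholder document-scope (DS) and document-cohesion (DC) values. The Flesch–Dayani coefficients for Persian and the paper's actual DS/DC computations are not reproduced here, so the syllable counter, weights, and helper scores are all assumptions for the example.

```python
import re

def flesch_reading_ease(text):
    """Classic Flesch formula; Flesch-Dayani adapts the coefficients for Persian."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"\w+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n_words = max(1, len(words))
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

def rerank(docs, w_read=0.5, w_ds=0.25, w_dc=0.25):
    """Re-rank retrieved docs by a weighted readability/scope/cohesion score."""
    scored = []
    for d in docs:
        score = (w_read * flesch_reading_ease(d["text"])
                 + w_ds * d["ds"] + w_dc * d["dc"])
        scored.append((score, d["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [  # placeholder DS/DC values on an arbitrary 0-100 scale
    {"id": "d1", "text": "Computers store data. Data is processed.", "ds": 60, "dc": 70},
    {"id": "d2", "text": "Heterogeneous virtualization infrastructures complicate orchestration.",
     "ds": 30, "dc": 40},
]
print(rerank(docs))  # easier, more cohesive documents rank first
```

The hybrid DSDC idea in the paper works in the same spirit: combine complementary difficulty signals into one score and sort the retrieved list by it for a given audience.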

