An Evaluation of Two Commercial Deep Learning-Based Information Retrieval Systems for COVID-19 Literature

Author(s):  
Sarvesh Soni ◽  
Kirk Roberts

Abstract The COVID-19 pandemic has resulted in a tremendous need for access to the latest scientific information, leading to both corpora for COVID-19 literature and search engines to query such data. While most search engine research is performed in academia with rigorous evaluation, major commercial companies dominate the web search market. Thus, it is expected that commercial pandemic-specific search engines will gain much higher traction than academic alternatives, leading to questions about the empirical performance of these tools. This paper seeks to empirically evaluate two commercial search engines for COVID-19 (Google and Amazon) in comparison to academic prototypes evaluated in the TREC-COVID task. We performed several steps to reduce bias in the manual judgments to ensure a fair comparison of all systems. We find the commercial search engines sizably under-performed those evaluated under TREC-COVID. This has implications for trust in popular health search engines and developing biomedical search engines for future health crises.
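To make concrete how such a comparison is typically scored, below is a minimal sketch of TREC-style evaluation over manual relevance judgments (precision@k and NDCG@k). The document IDs, judgment values, and ranked list are invented for illustration and are not the actual TREC-COVID data or the metrics the authors necessarily report.

```python
import math

def precision_at_k(ranked_docs, qrels, k=10):
    """Fraction of the top-k results judged relevant (judgment > 0)."""
    top = ranked_docs[:k]
    return sum(1 for d in top if qrels.get(d, 0) > 0) / k

def ndcg_at_k(ranked_docs, qrels, k=10):
    """Normalized discounted cumulative gain over graded judgments."""
    def dcg(gains):
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    gains = [qrels.get(d, 0) for d in ranked_docs[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Illustrative judgments: doc id -> graded relevance (2 = highly relevant).
qrels = {"doc1": 2, "doc3": 1, "doc7": 2}
ranking = ["doc1", "doc2", "doc3", "doc4", "doc7"]
print(precision_at_k(ranking, qrels, k=5))  # 0.6
print(ndcg_at_k(ranking, qrels, k=5))       # ~0.87
```

The same judgments can be applied to every system's ranked output, which is the basis for comparing commercial engines against the TREC-COVID prototypes.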

Infolib ◽  
2020 ◽  
Vol 24 (4) ◽  
pp. 16-21
Author(s):  
Irina Krasilnikova

The urgency of the problem stems from the growing number of electronic resources in information and library institutions, the need to search for information from any source, including external ones, the provision of documents from groups of collections (corporate holdings), and the presence of electronic catalogs and search systems. Searching catalogs and other search engines has always preceded the fulfillment of orders in interlibrary service. Borrowing and using documents from different collections (the provision of interlibrary services) is possible only with up-to-date metadata from modern information retrieval systems (IRS). The purpose of the article is to summarize the results of studying several types of search engines, with attention to new scientific publications on the topic under study. An analysis of domestic and foreign materials on options for searching for information is presented, which is essential for users, including remote users receiving interlibrary services.


Author(s):  
Max Chevalier ◽  
Christine Julien ◽  
Chantal Soulé-Dupuy

Searching for information is carried out with specific tools called Information Retrieval Systems (IRS), also known as "search engines." To provide more accurate results to users, most such systems offer personalization features. To do this, each system models a user in order to adapt the search results that will be displayed. In a multi-application context (e.g., when using several search engines for a single query), personalization techniques are limited because the user model (also called a profile) is incomplete: it does not exploit actions and queries coming from other search engines. Sharing user models between several search engines is therefore a challenge for providing more effective personalization. A semantic architecture for user profile interoperability is proposed to reach this goal. This architecture is also important because it can be used in many other contexts to share various resource models, for instance document models, between applications. It also ensures that every system can keep its own representation of each resource while providing a solution for sharing it easily.
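As a rough illustration of the kind of interoperability described, the sketch below maps two hypothetical engine-specific profile exports onto a shared user-model representation. The field names, weighting scheme, and mapping functions are assumptions made for the example; they are not the proposed semantic architecture itself.

```python
from dataclasses import dataclass, field

@dataclass
class SharedProfile:
    """A neutral user-model representation that several engines can consume."""
    user_id: str
    interests: dict = field(default_factory=dict)  # term -> weight

    def merge(self, other: "SharedProfile") -> "SharedProfile":
        merged = dict(self.interests)
        for term, w in other.interests.items():
            merged[term] = merged.get(term, 0.0) + w
        return SharedProfile(self.user_id, merged)

# Hypothetical engine-specific exports.
def from_engine_a(raw):            # engine A stores weighted keywords
    return SharedProfile(raw["uid"], dict(raw["keywords"]))

def from_engine_b(raw):            # engine B stores a flat query history
    weights = {}
    for q in raw["history"]:
        for term in q.split():
            weights[term] = weights.get(term, 0.0) + 1.0
    return SharedProfile(raw["user"], weights)

profile = from_engine_a({"uid": "u1", "keywords": [("python", 2.0)]}).merge(
    from_engine_b({"user": "u1", "history": ["python tutorial", "search engines"]})
)
print(profile.interests)  # combined evidence from both engines
```

The point of the shared schema is that each engine keeps its internal representation and only exposes a mapping to and from the common model.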


2020 ◽  
Vol 38 (3) ◽  
pp. 477-492
Author(s):  
Mahdi Zeynali Tazehkandi ◽  
Mohsen Nowkarizi

Purpose The purpose of this paper is to present a review of the use of the recall metric for evaluating information retrieval systems, especially search engines. Design/methodology/approach This paper investigates different researchers’ views about the recall metric. Findings Five different definitions of recall were identified. For the first group, recall refers to completeness, but it does not specify where all the relevant documents are located. For the second group, recall refers to retrieving all the relevant documents from the collection. However, the term “collection” is ambiguous. For the third group (first approach), collection means the index of the search engine and, for the fourth group (second approach), collection refers to the Web. For the fifth group (third approach), the ranking of the retrieved documents should also be accounted for in calculating recall. Practical implications In the first, second and third approaches, the evaluated components are, respectively, the retrieval algorithm; the retrieval algorithm and crawler; and the retrieval algorithm, crawler and ranker. To determine the effectiveness of search engines for end users, the third approach to measuring recall is preferable. Originality/value The value of this paper is to collect, identify and analyse the literature on recall. In addition, different views of researchers about recall are identified.
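To make the difference between these definitions concrete, the sketch below computes recall against two different denominators (relevant documents in the engine's index versus relevant documents on the wider Web) and a rank-sensitive variant (recall@k). All document sets are invented for illustration.

```python
def recall(retrieved, relevant):
    """Classic recall: relevant items retrieved / all relevant items."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

def recall_at_k(ranked, relevant, k):
    """Rank-sensitive recall: only the top-k results count (third approach)."""
    return recall(ranked[:k], relevant)

ranked_results    = ["d1", "d4", "d2", "d9"]
relevant_in_index = {"d1", "d2", "d5"}               # relevant docs the crawler indexed
relevant_on_web   = {"d1", "d2", "d5", "d8", "d9"}   # relevant docs existing on the Web

print(recall(ranked_results, relevant_in_index))          # 2/3: evaluates the retrieval algorithm
print(recall(ranked_results, relevant_on_web))            # 3/5: also evaluates the crawler
print(recall_at_k(ranked_results, relevant_on_web, k=2))  # 1/5: also evaluates the ranker
```

Changing only the denominator or the cutoff changes which system component the measurement actually reflects, which is the distinction the three approaches draw.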


Author(s):  
S. Naseehath

Webometric research has fallen into two main categories, namely link analysis and search engine evaluation. Search engines are also used to collect data for link analysis. A set of measurements is proposed for evaluating web search engine performance. Some measurements are adapted from the concepts of recall and precision, which are commonly used in evaluating traditional information retrieval systems. Others are newly developed to evaluate search engine stability, which is unique to web information retrieval systems. The overlap of search results, the annual growth of search results on each search engine, and the variation of results when searching with synonyms are also used to evaluate the relative efficiency of search engines. In this study, the investigator attempts to conduct a webometric study on the topic of medical tourism in Kerala using six search engines; these include three general search engines, namely Bing, Google, and Lycos, and three metasearch engines, namely Dogpile, ixquick, and WebCrawler.
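One of the measurements mentioned, the overlap of search results between engines, can be computed as in the sketch below. The result sets are placeholders, not data from the Kerala medical-tourism study.

```python
def overlap(results_a, results_b):
    """Jaccard overlap between two engines' result sets (e.g., by URL)."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Placeholder result URLs for two engines on the same query.
engine_one = ["u1", "u2", "u3", "u4"]
engine_two = ["u2", "u3", "u5"]

print(overlap(engine_one, engine_two))  # 2 shared out of 5 unique URLs -> 0.4
```

Repeating this over a set of queries and engine pairs gives the kind of overlap matrix webometric studies report.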


Web Mining ◽  
2011 ◽  
pp. 339-354 ◽  
Author(s):  
Bernard J. Jansen ◽  
Amanda Spink

This chapter reviews the concepts of Web results page and Web page viewing patterns by users of Web search engines. It presents the advantages of using traditional transaction log analysis in identifying these patterns, serving as a basis for Web usage mining. The authors also present the results of a temporal analysis of Web page viewing, illustrating that the user-information interaction is extremely short. By using real data collected from real users interacting with real Web information retrieval systems, the authors aim to highlight one aspect of the complex environment of Web information seeking.
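Below is a minimal sketch of the kind of transaction-log analysis described: estimating how long a user viewed each page from consecutive timestamps. The log format and field layout are assumptions for the example, not the authors' actual data schema.

```python
from datetime import datetime

# Hypothetical transaction log: (timestamp, user, URL viewed), ordered in time.
log = [
    ("2011-03-01 10:00:00", "u1", "/results?q=flu"),
    ("2011-03-01 10:00:08", "u1", "/doc/42"),
    ("2011-03-01 10:00:35", "u1", "/results?q=flu&page=2"),
]

def viewing_durations(entries):
    """Duration of each page view = time until the user's next logged action."""
    durations = []
    for (t1, user, url), (t2, _, _) in zip(entries, entries[1:]):
        delta = datetime.fromisoformat(t2) - datetime.fromisoformat(t1)
        durations.append((user, url, delta.total_seconds()))
    return durations

for user, url, seconds in viewing_durations(log):
    print(f"{user} viewed {url} for {seconds:.0f}s")
```

Aggregating such durations across many sessions is what supports the chapter's observation that most viewing interactions are very short.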


Author(s):  
Fabrizio Sebastiani

The categorization of documents into subject-specific categories is a useful enhancement for large document collections addressed by information retrieval systems, as a user can first browse a category tree in search of the category that best matches her interests and then issue a query for more specific documents “from within the category.” This approach combines the two modalities of information seeking that are most popular in Web-based search engines, i.e., category-based site browsing (as exemplified by, e.g., Yahoo™) and keyword-based document querying (as exemplified by, e.g., AltaVista™). Appropriate query expansion tools need to be provided, though, in order to allow the user to incrementally refine her query through further retrieval passes, thus allowing the system to produce a series of subsequent document rankings that hopefully converge to the user’s expected ranking. In this work we propose that automatically generated, category-specific “associative” thesauri be used for this purpose. We present a method for their generation and discuss how the thesaurus specific to a given category may usefully be endowed with “gateways” to the thesauri specific to its parent and children categories.
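The sketch below illustrates the general idea of a co-occurrence-based ("associative") thesaurus built from the documents filed under one category and used to expand a query. The tiny corpus, the raw co-occurrence counts, and the expansion cutoff are simplifications chosen for the example rather than the generation method proposed in the chapter.

```python
from collections import defaultdict
from itertools import combinations

# Tiny per-category corpus (documents already filed under one category).
category_docs = [
    "neural network training data",
    "training deep neural models",
    "network data retrieval",
]

def build_associative_thesaurus(docs):
    """Count how often terms co-occur in the same document."""
    cooc = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        terms = set(doc.split())
        for a, b in combinations(sorted(terms), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc

def expand_query(query, thesaurus, per_term=2):
    """Add the strongest associates of each query term."""
    expanded = set(query.split())
    for term in query.split():
        associates = sorted(thesaurus[term].items(), key=lambda kv: -kv[1])
        expanded.update(t for t, _ in associates[:per_term])
    return expanded

thesaurus = build_associative_thesaurus(category_docs)
print(expand_query("neural training", thesaurus))
```

Because the thesaurus is built per category, the associates it suggests stay within the vocabulary of that category, which is what makes the "gateways" to parent and child categories useful for broadening or narrowing a refinement.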


Author(s):  
Pankaj Dadure ◽  
Partha Pakray ◽  
Sivaji Bandyopadhyay

Mathematical formulas are widely used to express ideas and fundamental principles of science, technology, engineering, and mathematics. The rapidly growing research in science and engineering leads to the generation of a huge number of scientific documents which contain both textual and mathematical terms. In a scientific document, the sense of a mathematical formula is conveyed through its context and its symbolic structure, which follows strong domain-specific conventions. In contrast to textual information retrieval, mathematical information retrieval systems have developed specialized indexing and matching approaches suited to retrieving formulae and scientific terms. This chapter discusses recent advancements in formula-based search engines, various formula representation styles and indexing techniques, and the benefits of formula-based search engines for future applications such as plagiarism detection and math recommendation systems.
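As a very simplified illustration of formula indexing, the sketch below tokenizes LaTeX formulae into crude symbol and operator tokens and builds an inverted index over them. Real systems use far richer representations (for example, symbol layout or operator trees); the tokenizer and index here are assumptions made only to convey the idea.

```python
import re
from collections import defaultdict

def formula_tokens(latex):
    """Split a LaTeX formula into crude symbol/operator tokens."""
    return re.findall(r"\\[a-zA-Z]+|[A-Za-z]|\d+|[\^_=+\-*/()]", latex)

def build_formula_index(formulas):
    """Inverted index: token -> ids of formulas containing it."""
    index = defaultdict(set)
    for fid, latex in formulas.items():
        for tok in formula_tokens(latex):
            index[tok].add(fid)
    return index

formulas = {
    "f1": r"E = m c^2",
    "f2": r"\frac{a}{b} + c^2",
}
index = build_formula_index(formulas)
# All formulas that contain a squared term:
print(index["^"] & index["2"])  # {'f1', 'f2'}
```

Matching on structural tokens rather than plain text is what allows a formula-based engine to retrieve expressions that share structure even when the surrounding prose differs.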


2018 ◽  
Vol 36 (3) ◽  
pp. 430-444
Author(s):  
Sholeh Arastoopoor

Purpose The degree to which a text is considered readable depends on the capability of the reader. This assumption puts different information retrieval systems at risk of retrieving unreadable or hard-to-read yet relevant documents for their users. This paper aims to examine the potential use of concept-based readability measures along with classic measures for re-ranking search results in information retrieval systems, specifically in the Persian language. Design/methodology/approach Flesch–Dayani as a classic readability measure, along with document scope (DS) and document cohesion (DC) as domain-specific measures, has been applied for scoring the documents retrieved from Google (181 documents) and the RICeST database (215 documents) in the field of computer science and information technology (IT). The re-ranked results have been compared with the rankings of potential users regarding readability. Findings The results show that the subcategories of the computer science and IT field differ in their readability and understandability. The study also shows that it is possible to develop a hybrid score based on the DS and DC measures and that, among all four scores applied in re-ranking the documents, the re-ranked list based on the DSDC score correlates with the rankings produced by the participants in both groups. Practical implications The findings of this study offer a new option for re-ranking search results based on their difficulty for experts and non-experts in different fields. Originality/value The findings and the two-mode re-ranking model proposed in this paper, along with its primary focus on domain-specific readability in the Persian language, would help Web search engines and online databases further refine search results in pursuit of retrieving useful texts for users with differing expertise.
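As a rough illustration of re-ranking by readability, the sketch below scores documents with the classic (English) Flesch reading-ease formula and blends that score with placeholder document-scope (DS) and document-cohesion (DC) values. The Flesch–Dayani coefficients for Persian and the paper's actual DS/DC computations are not reproduced here, so the syllable counter, weights, and helper scores are all assumptions for the example.

```python
import re

def flesch_reading_ease(text):
    """Classic Flesch formula; Flesch-Dayani adapts the coefficients for Persian."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"\w+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n_words = max(1, len(words))
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

def rerank(docs, w_read=0.5, w_ds=0.25, w_dc=0.25):
    """Re-rank retrieved docs by a weighted readability/scope/cohesion score."""
    scored = []
    for d in docs:
        score = (w_read * flesch_reading_ease(d["text"])
                 + w_ds * d["ds"] + w_dc * d["dc"])
        scored.append((score, d["id"]))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

docs = [  # placeholder DS/DC values on an arbitrary 0-100 scale
    {"id": "d1", "text": "Computers store data. Data is processed.", "ds": 60, "dc": 70},
    {"id": "d2", "text": "Heterogeneous virtualization infrastructures complicate orchestration.",
     "ds": 30, "dc": 40},
]
print(rerank(docs))  # easier, more cohesive documents rank first
```

The hybrid DSDC idea in the paper works in the same spirit: combine complementary difficulty signals into one score and sort the retrieved list by it for a given audience.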

