scholarly journals TREC-COVID: Building a Pandemic Retrieval Test Collection

2021 ◽  
Vol 17 (2) ◽  
pp. 8-10
Author(s):  
Ellen Voorhees ◽  
Evangelos Kanoulas

Assessing how good is a search engine has been an active area of development for more than three decades. During the COVID-19 pandemic however the rate of change in what people are interested in, and the availableinformation online has introduced further challenges for search. TREC-COVID introduces a benchmark collectionto evaluate search engines and provide the means to improve them under the special circumstances of a pandemic.

2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.


2020 ◽  
Vol 19 (10) ◽  
pp. 1602-1618 ◽  
Author(s):  
Thibault Robin ◽  
Julien Mariethoz ◽  
Frédérique Lisacek

A key point in achieving accurate intact glycopeptide identification is the definition of the glycan composition file that is used to match experimental with theoretical masses by a glycoproteomics search engine. At present, these files are mainly built from searching the literature and/or querying data sources focused on posttranslational modifications. Most glycoproteomics search engines include a default composition file that is readily used when processing MS data. We introduce here a glycan composition visualizing and comparative tool associated with the GlyConnect database and called GlyConnect Compozitor. It offers a web interface through which the database can be queried to bring out contextual information relative to a set of glycan compositions. The tool takes advantage of compositions being related to one another through shared monosaccharide counts and outputs interactive graphs summarizing information searched in the database. These results provide a guide for selecting or deselecting compositions in a file in order to reflect the context of a study as closely as possible. They also confirm the consistency of a set of compositions based on the content of the GlyConnect database. As part of the tool collection of the Glycomics@ExPASy initiative, Compozitor is hosted at https://glyconnect.expasy.org/compozitor/ where it can be run as a web application. It is also directly accessible from the GlyConnect database.


2001 ◽  
Vol 1 (3) ◽  
pp. 28-31 ◽  
Author(s):  
Valerie Stevenson

Looking back to 1999, there were a number of search engines which performed equally well. I recommended defining the search strategy very carefully, using Boolean logic and field search techniques, and always running the search in more than one search engine. Numerous articles and Web columns comparing the performance of different search engines came to different conclusions on the ‘best’ search engines. Over the last year, however, all the speakers at conferences and seminars I have attended have recommended Google as their preferred tool for locating all kinds of information on the Web. I confess that I have now abandoned most of my carefully worked out search strategies and comparison tests, and use Google for most of my own Web searches.


2010 ◽  
Vol 44-47 ◽  
pp. 4041-4049 ◽  
Author(s):  
Hong Zhao ◽  
Chen Sheng Bai ◽  
Song Zhu

Search engines can bring a lot of benefit to the website. For a site, each page’s search engine ranking is very important. To make web page ranking in search engine ahead, Search engine optimization (SEO) make effect on the ranking. Web page needs to set the keywords as “keywords" to use SEO. The paper focuses on the content of a given word, and extracts the keywords of each page by calculating the word frequency. The algorithm is implemented by C # language. Keywords setting of webpage are of great importance on the information and products


2019 ◽  
Vol 71 (1) ◽  
pp. 54-71 ◽  
Author(s):  
Artur Strzelecki

Purpose The purpose of this paper is to clarify how many removal requests are made, how often, and who makes these requests, as well as which websites are reported to search engines so they can be removed from the search results. Design/methodology/approach Undertakes a deep analysis of more than 3.2bn removed pages from Google’s search results requested by reporting organizations from 2011 to 2018 and over 460m removed pages from Bing’s search results requested by reporting organizations from 2015 to 2017. The paper focuses on pages that belong to the .pl country coded top-level domain (ccTLD). Findings Although the number of requests to remove data from search results has been growing year on year, fewer URLs have been reported in recent years. Some of the requests are, however, unjustified and are rejected by teams representing the search engines. In terms of reporting copyright violations, one company in particular stands out (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google (the top ten companies combined were responsible for 61.3 percent of the total number of reports). Research limitations/implications As not every request can be published, the study is based only what is publicly available. Also, the data assigned to Poland is only based on the ccTLD domain name (.pl); other domain extensions for Polish internet users were not considered. Originality/value This is first global analysis of data from transparency reports published by search engine companies as prior research has been based on specific notices.


Author(s):  
Novario Jaya Perdana

The accuracy of search result using search engine depends on the keywords that are used. Lack of the information provided on the keywords can lead to reduced accuracy of the search result. This means searching information on the internet is a hard work. In this research, a software has been built to create document keywords sequences. The software uses Google Latent Semantic Distance which can extract relevant information from the document. The information is expressed in the form of specific words sequences which could be used as keyword recommendations in search engines. The result shows that the implementation of the method for creating document keyword recommendation achieved high accuracy and could finds the most relevant information in the top search results.


2016 ◽  
Author(s):  
Paolo Corti ◽  
Benjamin G Lewis ◽  
Tom Kralidis ◽  
Jude Mwenda

A Spatial Database Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software component of a SDI is the catalogue service, needed to discover, query and manage the metadata. Catalogue services in a SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, that defines common interfaces to access the metadata information. A search engine is a software system able to perform very fast and reliable search, with features such as full text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting and many others. The Centre of Geographic Analysis (CGA) at Harvard University is trying to integrate within its public domain SDI (named WorldMap), the benefits of both worlds (OGC catalogs and search engines). Harvard Hypermap (HHypermap) is a component that will be part of WorldMap, totally built on an open source stack, implementing an OGC catalog, based on pycsw, to provide access to metadata in a standard way, and a search engine, based on Solr/Lucene, to provide the advanced search features typically found in search engines.


2016 ◽  
Author(s):  
Paolo Corti ◽  
Benjamin G Lewis ◽  
Tom Kralidis ◽  
Jude Mwenda

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software components of a SDI is the catalogue service, needed to discover, query and manage the metadata. Catalogue services in a SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, that defines common interfaces to access the metadata information. A search engine is a software system able to perform very fast and reliable search, with features such as full text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting and many others. The Centre of Geographic Analysis (CGA) at Harvard University is trying to integrate within its public domain SDI (named WorldMap), the benefits of both worlds (OGC catalogues and search engines). Harvard Hypermap (HHypermap) is a component that will be part of WorldMap, totally built on an open source stack, implementing an OGC catalogue, based on pycsw, to provide access to metadata in a standard way, and a search engine, based on Solr/Lucene, to provide the advanced search features typically found in search engines.


2016 ◽  
Author(s):  
Paolo Corti ◽  
Benjamin G Lewis ◽  
Tom Kralidis ◽  
Jude Mwenda

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software components of a SDI is the catalogue service, needed to discover, query and manage the metadata. Catalogue services in a SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, that defines common interfaces to access the metadata information. A search engine is a software system able to perform very fast and reliable search, with features such as full text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting and many others. The Centre of Geographic Analysis (CGA) at Harvard University is trying to integrate within its public domain SDI (named WorldMap), the benefits of both worlds (OGC catalogues and search engines). Harvard Hypermap (HHypermap) is a component that will be part of WorldMap, totally built on an open source stack, implementing an OGC catalogue, based on pycsw, to provide access to metadata in a standard way, and a search engine, based on Solr/Lucene, to provide the advanced search features typically found in search engines.


Author(s):  
Cai Fu ◽  
Zhaokang Ke ◽  
Yunhe Zhang ◽  
Xiwu Chen ◽  
Liqing Cao ◽  
...  

With the popularization of computers and the development of information engineering, the emergence of search engines makes it possible to get the information needed from big data quickly and efficiently. However, in recent years, a multiplicity of new viruses have been propagated by search engines. Many researchers choose to cut off the source of virus propagation, ignoring the virus immunization strategy based on the search engine. In this paper, we analyze the impact of search engines on virus propagation. First, considering the immune effect and cost, two kinds of immune mechanisms based on the search engine that have greater practicability are defined. Second, immune mechanisms based on the search engine are theoretically analyzed by the iteration method and the dynamic method. The results show that this immunization strategy can slow down or eliminate the propagation of a virus to a certain extent. Third, three real social network data sets are used to simulate and analyze the immune mechanism. We find that when the proportion of nodes being infected and the proportion of infected nodes being identified by the search engine satisfy a certain relationship, our immune mechanism can inhibit the spread of viruses, which confirms our theoretical analysis results.


Sign in / Sign up

Export Citation Format

Share Document