TREC-COVID: Building a Pandemic Retrieval Test Collection

We examine how six search engines filter and rank information for queries about the 2020 U.S. presidential primary elections under default (that is, nonpersonalized) conditions. To do so, we use an algorithmic auditing methodology in which virtual agents conduct a large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in search results between search engines, as well as multiple discrepancies among the results returned to different agents using the same search engine. This highlights that whether users see certain information is decided by chance, owing to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources for specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling given that the public places high trust in search results and that, as previous research has demonstrated, search results can shift the opinions of undecided voters.
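The abstract reports discrepancies between the results that different agents receive from the same engine, but does not say how overlap was measured. As a minimal sketch, assuming the Jaccard similarity of URL sets as the overlap measure (a common choice, not confirmed by the source), the comparison between agents could look like this; the agent names and URLs are hypothetical.

```python
from itertools import combinations


def jaccard(results_a, results_b):
    """Jaccard similarity of two result lists, treated as sets of URLs."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result pages are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def mean_pairwise_overlap(results_by_agent):
    """Average Jaccard similarity over every pair of agents querying one engine."""
    pairs = list(combinations(results_by_agent.values(), 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 results collected by three agents for the same query.
results = {
    "agent_1": ["a.example", "b.example", "c.example"],
    "agent_2": ["a.example", "c.example", "d.example"],
    "agent_3": ["a.example", "b.example", "c.example"],
}
print(jaccard(results["agent_1"], results["agent_2"]))  # 0.5
print(mean_pairwise_overlap(results))
```

A value of 1.0 means every agent saw the same set of sources; lower values quantify how much of what a user sees depends on chance. Note that Jaccard ignores rank order, so a rank-aware measure would be needed to capture differences in prioritization.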


Examining and Fine-tuning the Selection of Glycan Compositions with GlyConnect Compozitor

Molecular & Cellular Proteomics ◽

10.1074/mcp.ra120.002041 ◽

2020 ◽

Vol 19 (10) ◽

pp. 1602-1618 ◽

Cited By ~ 1

Author(s):

Thibault Robin ◽

Julien Mariethoz ◽

Frédérique Lisacek

Keyword(s):

Search Engine ◽

Posttranslational Modifications ◽

Search Engines ◽

Web Application ◽

Contextual Information ◽

Fine Tuning ◽

Data Sources ◽

Web Interface ◽

Definition Of ◽

Selection Of

A key step in achieving accurate identification of intact glycopeptides is defining the glycan composition file that a glycoproteomics search engine uses to match experimental masses with theoretical ones. At present, these files are built mainly by searching the literature and/or querying data sources focused on posttranslational modifications. Most glycoproteomics search engines include a default composition file that is readily used when processing MS data. Here we introduce GlyConnect Compozitor, a tool for visualizing and comparing glycan compositions that is associated with the GlyConnect database. It offers a web interface through which the database can be queried to bring out contextual information about a set of glycan compositions. The tool takes advantage of the fact that compositions are related to one another through shared monosaccharide counts, and it outputs interactive graphs summarizing the information retrieved from the database. These results provide a guide for selecting or deselecting compositions in a file so that it reflects the context of a study as closely as possible. They also confirm the consistency of a set of compositions against the content of the GlyConnect database. As part of the tool collection of the Glycomics@ExPASy initiative, Compozitor is hosted at https://glyconnect.expasy.org/compozitor/, where it can be run as a web application. It is also directly accessible from the GlyConnect database.
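The abstract says compositions are related through shared monosaccharide counts and summarized as graphs. One plausible way to model that relation, offered only as an illustrative assumption and not as Compozitor's actual algorithm, is to link two compositions when they differ by exactly one monosaccharide residue. The composition labels and counts below are hypothetical examples.

```python
from itertools import combinations


def differs_by_one_residue(comp_a, comp_b):
    """True if the two compositions differ by exactly one monosaccharide unit."""
    monosaccharides = set(comp_a) | set(comp_b)
    total_difference = sum(
        abs(comp_a.get(m, 0) - comp_b.get(m, 0)) for m in monosaccharides
    )
    return total_difference == 1


def composition_graph(compositions):
    """Adjacency sets linking compositions one residue apart (hypothetical model)."""
    graph = {name: set() for name in compositions}
    for (name_a, comp_a), (name_b, comp_b) in combinations(compositions.items(), 2):
        if differs_by_one_residue(comp_a, comp_b):
            graph[name_a].add(name_b)
            graph[name_b].add(name_a)
    return graph


# Hypothetical compositions written as monosaccharide counts.
compositions = {
    "H5N4": {"Hex": 5, "HexNAc": 4},
    "H5N4F1": {"Hex": 5, "HexNAc": 4, "Fuc": 1},
    "H6N4": {"Hex": 6, "HexNAc": 4},
    "H5N4F1S1": {"Hex": 5, "HexNAc": 4, "Fuc": 1, "NeuAc": 1},
}
graph = composition_graph(compositions)
print(sorted(graph["H5N4"]))  # ['H5N4F1', 'H6N4']
```

In a graph like this, a composition with no neighbours in a file is an isolated entry, which is one heuristic signal that it may be worth re-examining when selecting or deselecting compositions.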


Search Engine Update

Legal Information Management ◽

10.1017/s1472669600000566 ◽

2001 ◽

Vol 1 (3) ◽

pp. 28-31 ◽

Cited By ~ 1

Author(s):

Valerie Stevenson

Keyword(s):

Search Engine ◽

Search Engines ◽

Search Strategy ◽

Search Strategies ◽

Boolean Logic ◽

Web Searches ◽

Search Techniques ◽

Looking Back ◽

The Web

Looking back to 1999, there were a number of search engines that performed equally well. I recommended defining the search strategy very carefully, using Boolean logic and field search techniques, and always running the search in more than one search engine. Numerous articles and Web columns comparing the performance of different search engines reached different conclusions about the ‘best’ search engines. Over the last year, however, all the speakers at conferences and seminars I have attended have recommended Google as their preferred tool for locating all kinds of information on the Web. I confess that I have now abandoned most of my carefully worked out search strategies and comparison tests, and use Google for most of my own Web searches.


Automatic Keyword Extraction Algorithm and Implementation

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.4041 ◽

2010 ◽

Vol 44-47 ◽

pp. 4041-4049 ◽

Cited By ~ 1

Author(s):

Hong Zhao ◽

Chen Sheng Bai ◽

Song Zhu

Keyword(s):

Search Engine ◽

Word Frequency ◽

Search Engines ◽

Keyword Extraction ◽

Web Page ◽

C Language ◽

Search Engine Optimization ◽

Page Ranking ◽

Extraction Algorithm ◽

A Site

Search engines can bring considerable benefit to a website, and for a site, each page’s search engine ranking is very important. Search engine optimization (SEO) can move a web page higher in search engine rankings; to use SEO, a web page needs to set its keywords in the “keywords” field. The paper focuses on page content and extracts the keywords of each page by calculating word frequency. The algorithm is implemented in the C# language. The keyword settings of a web page are of great importance to the information and products
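The abstract describes extracting a page's keywords by word frequency. The paper's implementation is in C#; what follows is only a minimal Python sketch of frequency-based extraction. The stop-word list, the minimum word length of three letters, and the `top_n` cutoff are assumptions, not details from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for",
              "on", "with", "are", "by"}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent non-stop-words in the text (ties keep first-seen order)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]


page_text = ("Search engine ranking depends on keywords. "
             "Keywords help search engine ranking. Keywords matter.")
print(extract_keywords(page_text, top_n=1))  # ['keywords']
print(extract_keywords(page_text, top_n=4))
```

Raw frequency favours common domain words; weighting schemes such as TF-IDF are a frequent refinement, although the abstract only mentions plain word frequency.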


Website removal from search engines due to copyright violation

Aslib Journal of Information Management ◽

10.1108/ajim-05-2018-0108 ◽

2019 ◽

Vol 71 (1) ◽

pp. 54-71 ◽

Cited By ~ 7

Author(s):

Artur Strzelecki

Keyword(s):

Search Engine ◽

Search Engines ◽

Design Methodology ◽

Global Analysis ◽

Domain Name ◽

Content Type ◽

Search Results ◽

Internet Users ◽

Copyright Violation

Purpose: The purpose of this paper is to clarify how many removal requests are made, how often, and by whom, as well as which websites are reported to search engines for removal from the search results.

Design/methodology/approach: The paper undertakes an in-depth analysis of more than 3.2bn pages removed from Google’s search results at the request of reporting organizations from 2011 to 2018, and of over 460m pages removed from Bing’s search results at the request of reporting organizations from 2015 to 2017. It focuses on pages that belong to the .pl country code top-level domain (ccTLD).

Findings: Although the number of requests to remove data from search results has grown year on year, fewer URLs have been reported in recent years. Some requests, however, are unjustified and are rejected by teams representing the search engines. In terms of reporting copyright violations, one company stands out in particular (AudioLock.Net), accounting for 28.1 percent of all reports sent to Google (the top ten companies combined were responsible for 61.3 percent of all reports).

Research limitations/implications: As not every request can be published, the study is based only on what is publicly available. Also, data were assigned to Poland only on the basis of the .pl ccTLD; other domain extensions used by Polish internet users were not considered.

Originality/value: This is the first global analysis of data from transparency reports published by search engine companies; prior research has been based on specific notices.
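The study attributes reports to Poland by the .pl ccTLD and computes each reporting organization's share of reports. A minimal sketch of that filtering and share calculation follows; the `(reporter, url)` record format and the organization names are hypothetical, since the abstract does not describe how the transparency-report data are structured.

```python
from collections import Counter
from urllib.parse import urlparse


def top_level_domain(url):
    """Last label of the URL's hostname, e.g. 'pl' for shop.example.com.pl."""
    host = (urlparse(url).hostname or "").rstrip(".")
    return host.rsplit(".", 1)[-1]


def reporter_shares(records, cctld="pl"):
    """Share of reported URLs under `cctld` attributed to each reporting organization.

    `records` is an iterable of (reporter, url) pairs; this format is an assumption.
    """
    counts = Counter(
        reporter for reporter, url in records if top_level_domain(url) == cctld
    )
    total = sum(counts.values())
    return {reporter: n / total for reporter, n in counts.items()} if total else {}


# Hypothetical removal-request records.
records = [
    ("OrgA", "http://music.example.pl/a"),
    ("OrgA", "https://files.example.pl/b"),
    ("OrgB", "https://shop.example.com.pl/c"),
    ("OrgB", "https://example.com/d"),  # not .pl, so excluded
]
print(reporter_shares(records))
```

Matching only the final hostname label mirrors the limitation the authors note: sites used by Polish internet users under other extensions (such as .com) are not counted.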


Implementation of the Google Latent Semantic Distance Algorithm for Extracting Keyword Sequences from Scientific Journal Articles

Computatio : Journal of Computer Science and Information Systems ◽

10.24912/computatio.v2i2.2569 ◽

2018 ◽

Vol 2 (2) ◽

pp. 186

Author(s):

Novario Jaya Perdana

Keyword(s):

Search Engine ◽

Search Engines ◽

Semantic Distance ◽

Relevant Information ◽

High Accuracy ◽

Hard Work ◽

The Internet ◽

Search Results ◽

Search Result

The accuracy of search engine results depends on the keywords used. Keywords that carry too little information can reduce the accuracy of the results, which makes searching for information on the internet hard work. In this research, software was built to create keyword sequences for documents. The software uses Google Latent Semantic Distance to extract relevant information from a document. This information is expressed as specific word sequences that can be used as keyword recommendations in search engines. The results show that the method for creating document keyword recommendations achieved high accuracy and could find the most relevant information in the top search results.


Implementing an open source spatio-temporal search platform for Spatial Data Infrastructures

10.7287/peerj.preprints.2238v2 ◽

2016 ◽

Author(s):

Paolo Corti ◽

Benjamin G Lewis ◽

Tom Kralidis ◽

Jude Mwenda

Keyword(s):

Open Source ◽

Search Engine ◽

Language Processing ◽

Spatial Data ◽

Search Engines ◽

Spatial Information ◽

Text Search ◽

Advanced Search ◽

Data Infrastructures ◽

Spatio Temporal

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service, which is needed to discover, query and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, which defines common interfaces for accessing metadata. A search engine is a software system able to perform very fast and reliable searches, with features such as full-text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting and many others. The Centre of Geographic Analysis (CGA) at Harvard University is trying to bring the benefits of both worlds (OGC catalogs and search engines) into its public domain SDI, named WorldMap. Harvard Hypermap (HHypermap) is a component that will be part of WorldMap. Built entirely on an open source stack, it implements an OGC catalog based on pycsw, to provide standard access to metadata, and a search engine based on Solr/Lucene, to provide the advanced search features typically found in search engines.
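To make the search-engine features listed above concrete, here is a hedged sketch of a Solr `/select` request that combines full-text search, a spatial rectangle filter, a temporal range filter, faceting and hit highlighting. The parameter names `q`, `fq`, `rows`, `wt`, `hl`, `hl.fl`, `facet` and `facet.field` are standard Solr query parameters; the core name (`hypermap`) and the field names (`location`, `layer_date`, `abstract`, `layer_type`, `source`) are hypothetical and are not HHypermap's actual schema. The `[lat,lon TO lat,lon]` rectangle syntax assumes a Solr spatial field type that supports it.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, text, bbox=None, date_range=None,
                          facet_fields=(), rows=10):
    """Build a Solr /select URL with full-text, spatial, temporal and facet parameters.

    Field names used here are hypothetical. bbox is (min_lat, min_lon, max_lat, max_lon);
    date_range is (start, end) as ISO 8601 strings such as "2000-01-01T00:00:00Z".
    """
    params = [
        ("q", text),            # full-text query; "term~1" would request fuzzy matching
        ("rows", str(rows)),
        ("wt", "json"),
        ("hl", "true"),         # hit highlighting
        ("hl.fl", "abstract"),  # hypothetical field to highlight
    ]
    if bbox is not None:
        min_lat, min_lon, max_lat, max_lon = bbox
        # Rectangle filter: lower-left corner TO upper-right corner, each as "lat,lon".
        params.append(("fq", f"location:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"))
    if date_range is not None:
        start, end = date_range
        params.append(("fq", f"layer_date:[{start} TO {end}]"))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_solr_select_url(
    "http://localhost:8983/solr/hypermap",
    "land use",
    bbox=(42.0, -72.0, 43.0, -71.0),
    date_range=("2000-01-01T00:00:00Z", "2010-12-31T23:59:59Z"),
    facet_fields=("layer_type", "source"),
)
print(url)
```

Using separate `fq` filters for space and time, rather than folding them into `q`, lets Solr cache each filter independently and keeps them out of relevance scoring.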


Implementing an open source spatio-temporal search platform for Spatial Data Infrastructures

10.7287/peerj.preprints.2238v3 ◽

2016 ◽

Author(s):

Paolo Corti ◽

Benjamin G Lewis ◽

Tom Kralidis ◽

Jude Mwenda

Keyword(s):

Open Source ◽

Search Engine ◽

Language Processing ◽

Spatial Data ◽

Search Engines ◽

Spatial Information ◽

Data Infrastructure ◽

Advanced Search ◽

Data Infrastructures ◽

Spatio Temporal

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service, which is needed to discover, query and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, which defines common interfaces for accessing metadata. A search engine is a software system able to perform very fast and reliable searches, with features such as full-text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting and many others. The Centre of Geographic Analysis (CGA) at Harvard University is trying to bring the benefits of both worlds (OGC catalogues and search engines) into its public domain SDI, named WorldMap. Harvard Hypermap (HHypermap) is a component that will be part of WorldMap. Built entirely on an open source stack, it implements an OGC catalogue based on pycsw, to provide standard access to metadata, and a search engine based on Solr/Lucene, to provide the advanced search features typically found in search engines.


Implementing an open source spatio-temporal search platform for Spatial Data Infrastructures

10.7287/peerj.preprints.2238v4 ◽

2016 ◽

Author(s):

Paolo Corti ◽

Benjamin G Lewis ◽

Tom Kralidis ◽

Jude Mwenda

Keyword(s):

Open Source ◽

Search Engine ◽

Language Processing ◽

Spatial Data ◽

Search Engines ◽

Spatial Information ◽

Data Infrastructure ◽

Advanced Search ◽

Data Infrastructures ◽

Spatio Temporal

A Spatial Data Infrastructure (SDI) is a framework of geospatial data, metadata, users and tools intended to provide the most efficient and flexible way to use spatial information. One of the key software components of an SDI is the catalogue service, which is needed to discover, query and manage the metadata. Catalogue services in an SDI are typically based on the Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) standard, which defines common interfaces for accessing metadata. A search engine is a software system able to perform very fast and reliable searches, with features such as full-text search, natural language processing, weighted results, fuzzy tolerance results, faceting, hit highlighting and many others. The Centre of Geographic Analysis (CGA) at Harvard University is trying to bring the benefits of both worlds (OGC catalogues and search engines) into its public domain SDI, named WorldMap. Harvard Hypermap (HHypermap) is a component that will be part of WorldMap. Built entirely on an open source stack, it implements an OGC catalogue based on pycsw, to provide standard access to metadata, and a search engine based on Solr/Lucene, to provide the advanced search features typically found in search engines.


The Impact of Search Engines on Virus Propagation

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126621502303 ◽

2021 ◽

pp. 2150230

Author(s):

Cai Fu ◽

Zhaokang Ke ◽

Yunhe Zhang ◽

Xiwu Chen ◽

Liqing Cao ◽

...

Keyword(s):

Search Engine ◽

Search Engines ◽

Data Sets ◽

Immune Mechanism ◽

Virus Propagation ◽

Immunization Strategy ◽

Immune Mechanisms ◽

Social Network Data ◽

Information Engineering ◽

The Impact

With the spread of computers and the development of information engineering, search engines have made it possible to obtain the information needed from big data quickly and efficiently. In recent years, however, many new viruses have been propagated through search engines. Many researchers choose to cut off the source of virus propagation, overlooking immunization strategies based on the search engine. In this paper, we analyze the impact of search engines on virus propagation. First, considering both immune effect and cost, we define two search-engine-based immune mechanisms of greater practicality. Second, we analyze these mechanisms theoretically using the iteration method and the dynamic method; the results show that this immunization strategy can, to a certain extent, slow down or eliminate the propagation of a virus. Third, we use three real social network data sets to simulate and analyze the immune mechanism. We find that when the proportion of infected nodes and the proportion of infected nodes identified by the search engine satisfy a certain relationship, our immune mechanism can inhibit the spread of viruses, which confirms our theoretical results.
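The abstract does not specify the two immune mechanisms or the simulation model, so the following is only a toy illustration of the general idea, under stated assumptions: a susceptible-infected process on a network where, at each step, a hypothetical search engine identifies each infected node with probability `p_detect`, and identified nodes are cleaned and become immune. The parameters `beta` and `p_detect`, the ring network and the state labels are all assumptions for illustration, not the paper's model or data.

```python
import random


def simulate_with_search_immunization(adj, seeds, beta, p_detect, steps, seed=0):
    """Toy SI model in which a search engine detects and immunizes infected nodes.

    adj: dict mapping node -> iterable of neighbours.
    beta: per-contact infection probability per step (hypothetical parameter).
    p_detect: per-step probability that an infected node is identified; identified
              nodes are cleaned and become immune ("R").
    Returns the final state dict and the number of infected nodes after each step.
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    state = {node: "S" for node in adj}
    for node in seeds:
        state[node] = "I"
    history = []
    for _ in range(steps):
        infected = [node for node in adj if state[node] == "I"]
        newly_infected = set()
        for node in infected:
            for neighbour in adj[node]:
                if state[neighbour] == "S" and rng.random() < beta:
                    newly_infected.add(neighbour)
        for node in newly_infected:
            state[node] = "I"
        # Detection step: every currently infected node may be found and immunized.
        for node in adj:
            if state[node] == "I" and rng.random() < p_detect:
                state[node] = "R"
        history.append(sum(1 for s in state.values() if s == "I"))
    return state, history


def ring(n):
    """Ring network of n nodes, each linked to its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


_, no_detection = simulate_with_search_immunization(ring(20), [0], beta=1.0,
                                                    p_detect=0.0, steps=10)
_, partial = simulate_with_search_immunization(ring(20), [0], beta=0.8,
                                               p_detect=0.3, steps=10)
print(no_detection)
print(partial)
```

Varying `p_detect` against `beta` in a sketch like this gives an intuition for the abstract's finding that suppression depends on the relationship between how many nodes are infected and how many of those the search engine identifies.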
