Improving large-scale search engines with semantic annotations

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.

Download Full-text

A Query-Dependent Duplicate Detection Approach for Large Scale Search Engines

Advanced Web Technologies and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-540-24655-8_6 ◽

2004 ◽

pp. 48-58 ◽

Cited By ~ 2

Author(s):

Shaozhi Ye ◽

Ruihua Song ◽

Ji-Rong Wen ◽

Wei-Ying Ma

Keyword(s):

Search Engines ◽

Large Scale ◽

Duplicate Detection ◽

Detection Approach

Download Full-text

I/O-Conscious Data Preparation for Large-Scale Web Search Engines

VLDB '02: Proceedings of the 28th International Conference on Very Large Databases ◽

10.1016/b978-155860869-6/50041-x ◽

2002 ◽

pp. 382-393 ◽

Cited By ~ 1

Author(s):

Maxim Lifantsev ◽

Tzi-cker Chiueh

Keyword(s):

Search Engines ◽

Large Scale ◽

Web Search ◽

Data Preparation ◽

Web Search Engines

Download Full-text

Deep Web

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch062 ◽

2009 ◽

pp. 581-588 ◽

Cited By ~ 5

Author(s):

Denis Shestakov

Keyword(s):

Search Engines ◽

Large Scale ◽

Web Search ◽

Web Database ◽

Web Search Engine ◽

Search Form ◽

Complete Set ◽

Web Crawlers ◽

Pass Through ◽

The Web

Finding information on the Web using a web search engine is one of the primary activities of today’s web users. For a majority of users results returned by conventional search engines are an essentially complete set of links to all pages on the Web relevant to their queries. However, currentday searchers do not crawl and index a significant portion of the Web and, hence, web users relying on search engines only are unable to discover and access a large amount of information from the nonindexable part of the Web. Specifically, dynamic pages generated based on parameters provided by a user via web search forms are not indexed by search engines and cannot be found in searchers’ results. Such search interfaces provide web users with an online access to myriads of databases on the Web. In order to obtain some information from a web database of interest, a user issues his/her query by specifying query terms in a search form and receives the query results, a set of dynamic pages which embed required information from a database. At the same time, issuing a query via an arbitrary search interface is an extremely complex task for any kind of automatic agents including web crawlers, which, at least up to the present day, do not even attempt to pass through web forms on a large scale.

Download Full-text

Scalability and efficiency challenges in large-scale web search engines

Proceedings of the 23rd International Conference on World Wide Web - WWW '14 Companion ◽

10.1145/2567948.2577271 ◽

2014 ◽

Cited By ~ 4

Author(s):

Ricardo Baeza-Yates ◽

B. Barla Cambazoglu

Keyword(s):

Search Engines ◽

Large Scale ◽

Web Search ◽

Web Search Engines

Download Full-text

Maximizing the sensitivity and reliability of peptide identification in large-scale proteomic experiments by harnessing multiple search engines

PROTEOMICS ◽

10.1002/pmic.200900074 ◽

2010 ◽

Vol 10 (6) ◽

pp. 1172-1189 ◽

Cited By ~ 29

Author(s):

Wen Yu ◽

J. Alex Taylor ◽

Michael T. Davis ◽

Leo E. Bonilla ◽

Kimberly A. Lee ◽

...

Keyword(s):

Search Engines ◽

Large Scale ◽

Peptide Identification

Download Full-text

Resource-Efficient Index Shard Replication in Large Scale Search Engines

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2019.2924423 ◽

2019 ◽

Vol 30 (12) ◽

pp. 2820-2835

Author(s):

Yusen Li ◽

Xueyan Tang ◽

Wentong Cai ◽

Jiancong Tong ◽

Xiaoguang Liu ◽

...

Keyword(s):

Search Engines ◽

Large Scale

Download Full-text

Template detection for large scale search engines

Proceedings of the 2006 ACM symposium on Applied computing - SAC '06 ◽

10.1145/1141277.1141534 ◽

2006 ◽

Cited By ~ 18

Author(s):

Liang Chen ◽

Shaozhi Ye ◽

Xing Li

Keyword(s):

Search Engines ◽

Large Scale

Download Full-text

Contextualized Clustering in Exploratory Web Search

Emerging Technologies of Text Mining ◽

10.4018/978-1-59904-373-9.ch009 ◽

2008 ◽

pp. 184-207 ◽

Cited By ~ 3

Author(s):

Jon Atle Gulla ◽

Hans Olaf Borch ◽

Jon Espen Ingvaldsen

Keyword(s):

Search Engines ◽

Information Needs ◽

Large Scale ◽

Suffix Tree ◽

Web Search ◽

Amount Of Information ◽

Acceptable Quality ◽

Inherent Problem ◽

Web Search Engines ◽

The Web

Due to the large amount of information on the web and the difficulties of relating user’s expressed information needs to document content, large-scale web search engines tend to return thousands of ranked documents. This chapter discusses the use of clustering to help users navigate through the result sets and explore the domain. A newly developed system, HOBSearch, makes use of suffix tree clustering to overcome many of the weaknesses of traditional clustering approaches. Using result snippets rather than full documents, HOBSearch both speeds up clustering substantially and manages to tailor the clustering to the topics indicated in user’s query. An inherent problem with clustering, though, is the choice of cluster labels. Our experiments with HOBSearch show that cluster labels of an acceptable quality can be generated with no upervision or predefined structures and within the constraints given by large-scale web search.

Download Full-text

XML Mining for Semantic Web

Advances in Data Mining and Database Management - XML Data Mining ◽

10.4018/978-1-61350-356-0.ch014 ◽

2011 ◽

pp. 317-342

Author(s):

Rafael Berlanga ◽

Victoria Nebot

Keyword(s):

Data Mining ◽

Semantic Web ◽

Large Scale ◽

Huge Amount ◽

Semantic Annotations ◽

Data Formats ◽

Wide Acceptance ◽

Domain Ontologies ◽

The Web

This chapter describes the convergence of two influential technologies in the last decade, namely data mining (DM) and the Semantic Web (SW). The wide acceptance of new SW formats for describing semantics-aware and semistructured contents have spurred on the massive generation of semantic annotations and large-scale domain ontologies for conceptualizing their concepts. As a result, a huge amount of both knowledge and semantic-annotated data is available in the web. DM methods have been very successful in discovering interesting patterns which are hidden in very large amounts of data. However, DM methods have been largely based on simple and flat data formats which are far from those available in the SW. This chapter reviews and discusses the main DM approaches proposed so far to mine SW data as well as those that have taken into account the SW resources and tools to define semantics-aware methods.

Download Full-text