An investigation of biases in web search engine query suggestions

Purpose – Survey-based studies suggest that search engines are trusted more than social media or even traditional news, although cases of false information or defamation are known. The purpose of this paper is to analyze query suggestion features of three search engines to see if these features introduce some bias into the query and search process that might compromise this trust. The authors test the approach on person-related search suggestions by querying the names of politicians from the German Bundestag before the German federal election of 2017. Design/methodology/approach – This study introduces a framework to systematically examine and automatically analyze the variation in query suggestions for person names offered by major search engines. To test the framework, the authors collected data from the Google, Bing and DuckDuckGo query suggestion APIs over a period of four months for 629 different names of German politicians. The suggestions were clustered and statistically analyzed with regard to different biases, such as gender, party or age, and with regard to the stability of the suggestions over time. Findings – By using the framework, the authors located three semantic clusters within the data set: suggestions related to politics and economics, location information, and personal and other miscellaneous topics. Among other effects, the results of the analysis show a small bias in that male politicians receive slightly fewer suggestions on “personal and misc” topics. The stability analysis of the suggested terms over time shows that some suggestions are prevalent most of the time, while other suggestions fluctuate more often. Originality/value – This study proposes a novel framework to automatically identify biases in web search engine query suggestions for person-related searches. Applying this framework to a set of person-related query suggestions offers first insights into the influence search engines can have on the query process of users who seek out information on politicians.


“Outside the industry, nobody knows what we do” SEO as seen by search engine optimizers and content providers

Journal of Documentation ◽

10.1108/jd-07-2020-0127 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sebastian Schultheiß ◽

Dirk Lewandowski

Keyword(s):

Search Engine ◽

Search Engines ◽

Design Methodology ◽

Web Search ◽

Expert Interviews ◽

Content Type ◽

Search Engine Optimization ◽

Stakeholder Groups ◽

Web Search Engine ◽

The Impact

Purpose – In commercial web search engine results rankings, four stakeholder groups are involved: search engine providers, users, content providers and search engine optimizers. Search engine optimization (SEO) is a multi-billion-dollar industry and responsible for making content visible through search engines. Despite this importance, little is known about its role in the interaction of the stakeholder groups. Design/methodology/approach – We conducted expert interviews with 15 German search engine optimizers and content providers, the latter represented by content managers and online journalists. The interviewees were asked about their perspectives on SEO and how they assess the views of users about SEO. Findings – SEO was considered necessary for content providers to ensure visibility, which is why dependencies between both stakeholder groups have evolved. Despite its importance, SEO was seen as largely unknown to users. Therefore, it is assumed that users cannot realistically assess the impact SEO has and that user opinions about SEO depend heavily on their knowledge of the topic. Originality/value – This study investigated search engine optimization from the perspective of those involved in the optimization business: content providers, online journalists and search engine optimization professionals. The study therefore contributes to a more nuanced view on and a deeper understanding of the SEO domain.


Associating Searching on Search Engines to Subsequent Searching on Sites

International Journal of Information Systems in the Service Sector ◽

10.4018/ijisss.2016040103 ◽

2016 ◽

Vol 8 (2) ◽

pp. 30-43

Author(s):

Adan Ortiz-Cordova ◽

Bernard J. Jansen

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Research Study ◽

Search Queries ◽

Web Search Engine ◽

Search Patterns ◽

Search Information

In this research study, the authors investigate the association between external searching, which is searching on a web search engine, and internal searching, which is searching on a website. They classify 295,571 external-internal searches, where each search is composed of a query submitted to a web search engine followed by one or more subsequent queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, cluster the searching episodes into major searching configurations, and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The research implications of this study are that external sessions and internal sessions must be considered as part of a continuous search episode and that online businesses can leverage external search information to more effectively target potential consumers.


Web Search Engine Architectures and their Performance Analysis

Handbook of Research on Web Information Systems Quality ◽

10.4018/978-1-59904-847-5.ch028 ◽

2011 ◽

pp. 491-509

Author(s):

Xiannong Meng

Keyword(s):

Performance Analysis ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

General Purpose ◽

Performance Measurements ◽

Web Documents ◽

System Architectures ◽

Web Search Engine ◽

And Performance

This chapter surveys various technologies involved in a Web search engine with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and help us understand how Web search engines work and how to use the search engines more effectively and efficiently.


Eysenbach, Tuische and Diepgen’s Evaluation of Web Searching for Identifying Unpublished Studies for Systematic Reviews: An Innovative Study Which is Still Relevant Today

Evidence Based Library and Information Practice ◽

10.18438/b8f049 ◽

2016 ◽

Vol 11 (3) ◽

pp. 108

Author(s):

Simon Briscoe

Keyword(s):

Systematic Review ◽

Clinical Trials ◽

Search Engine ◽

Systematic Reviews ◽

Search Engines ◽

Web Search ◽

Web Searching ◽

Web Searches ◽

Web Search Engine ◽

Unpublished Studies

A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001). Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3), 203-218. http://dx.doi.org/10.1080/14639230110075459 Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews. Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies – specifically, clinical trials – for systematic reviews which did not use a web search engine. Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany. Subjects – n/a Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine, AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies. Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Out of the 14 studies, 2 were considered irrelevant to the corresponding systematic review by the systematic review authors. The relevance of a further three studies could not be clearly ascertained. This left nine studies which were considered relevant to a systematic review. In addition to this main finding, the pilot study to identify suitable search engines found that AltaVista was the only search engine able to handle the complex searches required to search for unpublished studies. Conclusion – Web searches using a search engine have the potential to identify studies for systematic reviews. Web search engines have considerable limitations which impede the identification of studies.


Design and Implementation of a Cache System in Web Search Engines

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.462-463.1106 ◽

2013 ◽

Vol 462-463 ◽

pp. 1106-1109

Author(s):

Hong Yuan Ma

Keyword(s):

Search Engine ◽

Search Engines ◽

Web Search ◽

Communication Architecture ◽

Design And Implementation ◽

Web Search Engine ◽

Inverted List ◽

Web Search Engines ◽

Application Processor ◽

Cache System

Web search engines cache the results that users query frequently, which is an effective way to improve their efficiency. In this paper, we share experience from the design and implementation of a Web search engine cache system. We present three design principles: logical layer processing, an event-based communication architecture and avoidance of frequent data copying. We also introduce the architecture used in practice, including a connection processor, application processor, query results caching processor, inverted list caching processor and list intersection caching processor. Experiments are conducted on our cache system using a real Web search engine query log.
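As a rough illustration of the query results caching idea in this abstract, the sketch below replays a query log through a small in-memory cache and reports the hit ratio. It is not the system described in the paper: the least-recently-used eviction policy, the class name `QueryResultCache`, the `replay` helper and the `backend` callable are all assumptions made for the example.

```python
from collections import OrderedDict


class QueryResultCache:
    """Tiny LRU cache mapping a query string to its result list (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query):
        """Return the cached results for a query, or None on a miss."""
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        return None

    def put(self, query, results):
        """Store results, evicting the least recently used entry when full."""
        self._entries[query] = results
        self._entries.move_to_end(query)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


def replay(log, cache, backend):
    """Replay a query log through the cache and return the hit ratio.

    `backend` is a hypothetical callable standing in for the search engine;
    it must return a non-None result for every query.
    """
    for query in log:
        if cache.get(query) is None:
            cache.put(query, backend(query))
    total = cache.hits + cache.misses
    return cache.hits / total if total else 0.0
```

Replaying a real query log this way gives a first estimate of how much backend work a results cache of a given capacity could save. Caching inverted lists or list intersections, which the abstract also mentions, would need separate structures.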


Study of Search Engine Transaction Logs Shows Little Change in How Users use Search Engines

Evidence Based Library and Information Practice ◽

10.18438/b80014 ◽

2006 ◽

Vol 1 (3) ◽

pp. 67

Author(s):

David Hook

Keyword(s):

Search Engine ◽

Search Engines ◽

World Wide ◽

Web Search ◽

Result Page ◽

Significant Difference ◽

The Us ◽

The World ◽

The U.S ◽

Over Time

A review of: Jansen, Bernard J., and Amanda Spink. “How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs.” Information Processing & Management 42.1 (2006): 248-263. Objective – To examine the interactions between users and search engines, and how they have changed over time. Design – Comparative analysis of search engine transaction logs. Setting – Nine major analyses of search engine transaction logs. Subjects – Nine web search engine studies (4 European, 5 American) over a seven-year period, covering the search engines Excite, Fireball, AltaVista, BWIE and AllTheWeb. Methods – The results from individual studies are compared by year of study for percentages of single query sessions, one-term queries, operator (and, or, not, etc.) usage and single result page viewing. As well, the authors group the search queries into eleven different topical categories and compare how the breakdown has changed over time. Main Results – Based on the percentage of single query sessions, it does not appear that the complexity of interactions has changed significantly for either the U.S.-based or the European-based search engines. As well, there was little change observed in the percentage of one-term queries over the years of study for either the U.S.-based or the European-based search engines. Few users (generally less than 20%) use Boolean or other operators in their queries, and these percentages have remained relatively stable. One area of noticeable change is in the percentage of users viewing only one results page, which has increased over the years of study. Based on the studies of the U.S.-based search engines, the topical categories of ‘People, Place or Things’ and ‘Commerce, Travel, Employment or Economy’ are becoming more popular, while the categories of ‘Sex and Pornography’ and ‘Entertainment or Recreation’ are declining. 
Conclusions – The percentage of users viewing only one results page increased during the years of the study, while the percentages of single query sessions, one-term sessions and operator usage remained stable. The increase in single result page viewing implies that users are tending to view fewer results per web query. There was also a significant difference in the percentage of queries using Boolean operators between the US-based and the European-based search engines. One of the study’s findings was that results from a study of a particular search engine cannot necessarily be applied to all search engines. Finally, web search topics show a trend towards information or commerce searching rather than entertainment.
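The four measures compared in this review (single query sessions, one-term queries, operator usage and single result page viewing) can be computed from a transaction log along the following lines. This is a minimal sketch under stated assumptions, not the procedure used in the reviewed study: the record layout `(session_id, query, result_pages_viewed)`, the function `summarize_log` and the operator pattern are hypothetical.

```python
import re
from collections import Counter

# Assumption: an "operator" is an uppercase AND/OR/NOT, a plus sign or a double quote.
OPERATOR_PATTERN = re.compile(r'\b(?:AND|OR|NOT)\b|[+"]')


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize_log(records):
    """Summarize (session_id, query, result_pages_viewed) records as percentages."""
    records = list(records)
    if not records:
        raise ValueError("the log is empty")

    queries_per_session = Counter(session_id for session_id, _, _ in records)
    n_sessions = len(queries_per_session)
    n_queries = len(records)

    single_sessions = sum(1 for n in queries_per_session.values() if n == 1)
    one_term = sum(1 for _, query, _ in records if len(query.split()) == 1)
    with_operator = sum(1 for _, query, _ in records if OPERATOR_PATTERN.search(query))
    single_page = sum(1 for _, _, pages in records if pages == 1)

    return {
        "single_query_sessions": percent(single_sessions, n_sessions),
        "one_term_queries": percent(one_term, n_queries),
        "operator_queries": percent(with_operator, n_queries),
        "single_result_page": percent(single_page, n_queries),
    }
```

Running the same summary on logs from different years or engines would give the kind of side-by-side percentages the review discusses, although real logs differ in how sessions and page views are recorded.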


Users’ Click and Bookmark Based Personalization Using Modified Agglomerative Clustering for Web Search Engine

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213017300022 ◽

2017 ◽

Vol 26 (06) ◽

pp. 1730002 ◽

Cited By ~ 3

Author(s):

T. Dhiliphan Rajkumar ◽

S. P. Raja ◽

A. Suruliandi

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

Irrelevant Information ◽

Experimental Results ◽

Agglomerative Clustering ◽

Semantic Level ◽

Web Search Engine ◽

The Web

Short and ambiguous queries are a major problem for search engines, leading to the retrieval of information that is irrelevant to the user's input. The growing volume of information on the web also makes it difficult for search engines to provide the results users need. Web search engines suffer from ambiguity because queries are examined at a rational level rather than the semantic level. In this paper, to improve search engine performance with respect to users' interests, personalization based on users' clicks and bookmarks is proposed. Modified agglomerative clustering is used to cluster the results. The experimental results show that the proposed work achieves better precision, recall and F-score.


Privacy Proliferation of Customized Web Search Engine

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.32.15391 ◽

2018 ◽

Vol 7 (2.32) ◽

pp. 150

Author(s):

N Arunachalam ◽

S Radjou ◽

P Aravindan ◽

T Sivagurunathan

Keyword(s):

Search Engine ◽

Search Engines ◽

Information Disclosure ◽

Web Search ◽

User Privacy ◽

Search System ◽

Amount Of Information ◽

Web Search Engine ◽

Web Search Engines

In the last few years, the illegal disclosure of user privacy in web search engines has become more serious. Protecting user privacy from illegal disclosure has recently attracted interest among researchers. Existing web search engines do not consider the privacy of their users and tend to collect all the information they can from the user. A system to ensure user privacy is therefore essential. Hence, the Personalized Web Search (PWS) method was put forward to give control over the amount of information that the user provides to search engines. PWS provides privacy protection in the web search system and minimizes disclosure of the user's private information through a customizable web search.


Semantic Web Search Engines: A Comparative Survey

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195115 ◽

2019 ◽

pp. 107-115

Author(s):

Anita Kumari ◽

Jawahar Thakur

Keyword(s):

Semantic Web ◽

Search Engine ◽

Search Engines ◽

Web Search ◽

Retrieval System ◽

Relevant Information ◽

Semantic Web Technologies ◽

Web Technologies ◽

Web Search Engine ◽

Comparative Survey

Search engines play an important role in the success of the Web, helping users find relevant information on the internet. Many problems in traditional search engines have led to the development of the semantic web. Semantic web technologies play a crucial role in enhancing traditional search, as they work to create machine-readable data and focus on metadata. However, they will not replace traditional search engines. In a semantic web environment, a search engine should be more useful and efficient for finding relevant web information, which is a way to increase the accuracy of an information retrieval system. This is possible because the semantic web uses software agents; these agents collect information, perform relevant transactions and interact with physical devices. This paper surveys the prevalent semantic search engines in terms of their advantages, workings and disadvantages, and presents a comparative study based on techniques, type of results, crawling, and indexing.


Research and Improvement on Content-Based Web Search Engine

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1282 ◽

2012 ◽

Vol 532-533 ◽

pp. 1282-1286

Author(s):

Zhi Chao Lin ◽

Lei Sun ◽

Xiao Liu

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Full Text ◽

Search Engines ◽

Query Expansion ◽

Web Search ◽

Text Search ◽

Web Search Engine ◽

Query Word ◽

The Web

There is a lot of information contained in the World Wide Web. Obtaining the required resources quickly and accurately from the web through content-based search engines has become a research focus. Most current full-text web search tools, such as Lucene, a widely used open source retrieval library in the information retrieval field, are purely keyword based. This may not be sufficient for users retrieving information from the web. In this paper, we employ a method to overcome the limitations of current full-text search engines, as represented by Lucene. We propose a Query Expansion and Information Retrieval approach which can help users acquire more accurate content from the web. The Query Expansion component finds expanded candidate words for the query word through WordNet, which contains synonyms in several different senses. In the Information Retrieval component, the query word and its candidate words are used together as the input of the search module to get the result items. Furthermore, we can put the result items into different classes based on the expansion. Some experiments and their results are described in the later part of this paper.
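The two components described in this abstract can be sketched as follows: a query word is expanded with candidate words, the combined terms are matched against documents, and the hits are grouped by the term that matched. This is a toy sketch only. The hand-written `SYNONYMS` table stands in for WordNet, a whitespace token match stands in for Lucene, and all function names are hypothetical.

```python
# Hypothetical synonym table standing in for a WordNet lookup.
SYNONYMS = {
    "car": ["automobile", "motorcar"],
    "film": ["movie", "picture"],
}


def expand_query(word, synonyms):
    """Return the query word followed by its candidate expansion words."""
    return [word] + [s for s in synonyms.get(word, []) if s != word]


def search(terms, documents):
    """Return {doc_id: matched terms} for documents containing any of the terms."""
    results = {}
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        matched = [term for term in terms if term in tokens]
        if matched:
            results[doc_id] = matched
    return results


def group_by_expansion(results):
    """Group document ids into classes keyed by the term that matched them."""
    groups = {}
    for doc_id, matched in results.items():
        for term in matched:
            groups.setdefault(term, []).append(doc_id)
    return groups
```

A purely keyword-based search for "car" would miss a document that only mentions "automobile"; searching with the expanded term list retrieves it, and the grouping step keeps track of which expansion produced each hit.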
