An investigation of biases in web search engine query suggestions

2019 ◽  
Vol 44 (2) ◽  
pp. 365-381 ◽  
Author(s):  
Malte Bonart ◽  
Anastasiia Samokhina ◽  
Gernot Heisenberg ◽  
Philipp Schaer

Purpose – Survey-based studies suggest that search engines are trusted more than social media or even traditional news, although cases of false information or defamation are known. The purpose of this paper is to analyze query suggestion features of three search engines to see if these features introduce some bias into the query and search process that might compromise this trust. The authors test the approach on person-related search suggestions by querying the names of politicians from the German Bundestag before the German federal election of 2017. Design/methodology/approach – This study introduces a framework to systematically examine and automatically analyze the variation in query suggestions for person names offered by major search engines. To test the framework, the authors collected data from the Google, Bing and DuckDuckGo query suggestion APIs over a period of four months for 629 different names of German politicians. The suggestions were clustered and statistically analyzed with regard to different biases, such as gender, party or age, and with regard to the stability of the suggestions over time. Findings – By using the framework, the authors located three semantic clusters within the data set: suggestions related to politics and economics, location information, and personal and other miscellaneous topics. Among other effects, the results of the analysis show a small bias: male politicians receive slightly fewer suggestions on “personal and misc” topics. The stability analysis of the suggested terms over time shows that some suggestions are prevalent most of the time, while other suggestions fluctuate more often. Originality/value – This study proposes a novel framework to automatically identify biases in web search engine query suggestions for person-related searches. Applying this framework to a set of person-related query suggestions gives first insights into the influence search engines can have on the query process of users who seek out information on politicians.

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sebastian Schultheiß ◽  
Dirk Lewandowski

Purpose – In commercial web search engine results rankings, four stakeholder groups are involved: search engine providers, users, content providers and search engine optimizers. Search engine optimization (SEO) is a multi-billion-dollar industry and responsible for making content visible through search engines. Despite this importance, little is known about its role in the interaction of the stakeholder groups. Design/methodology/approach – We conducted expert interviews with 15 German search engine optimizers and content providers, the latter represented by content managers and online journalists. The interviewees were asked about their perspectives on SEO and how they assess the views of users about SEO. Findings – SEO was considered necessary for content providers to ensure visibility, which is why dependencies between both stakeholder groups have evolved. Despite its importance, SEO was seen as largely unknown to users. Therefore, it is assumed that users cannot realistically assess the impact SEO has and that user opinions about SEO depend heavily on their knowledge of the topic. Originality/value – This study investigated search engine optimization from the perspective of those involved in the optimization business: content providers, online journalists and search engine optimization professionals. The study therefore contributes to a more nuanced view on and a deeper understanding of the SEO domain.


Author(s):  
Adan Ortiz-Cordova ◽  
Bernard J. Jansen

In this research study, the authors investigate the association between external searching, which is searching on a web search engine, and internal searching, which is searching on a website. They classify 295,571 external-to-internal searches, where each search is composed of a query that is submitted to a web search engine and then one or more subsequent queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, then cluster the searching episodes into major searching configurations and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The research implications of this study are that external sessions and internal sessions must be considered as part of a continuous search episode and that online businesses can leverage external search information to more effectively target potential consumers.


Author(s):  
Xiannong Meng

This chapter surveys various technologies involved in a Web search engine with an emphasis on performance analysis issues. The aspects of a general-purpose search engine covered in this survey include system architectures, information retrieval theories as the basis of Web search, indexing and ranking of Web documents, relevance feedback and machine learning, personalization, and performance measurements. The objectives of the chapter are to review the theories and technologies pertaining to Web search, and help us understand how Web search engines work and how to use the search engines more effectively and efficiently.


2016 ◽  
Vol 11 (3) ◽  
pp. 108
Author(s):  
Simon Briscoe

A Review of: Eysenbach, G., Tuische, J. & Diepgen, T.L. (2001). Evaluation of the usefulness of Internet searches to identify unpublished clinical trials for systematic reviews. Medical Informatics and the Internet in Medicine, 26(3), 203-218. http://dx.doi.org/10.1080/14639230110075459 Objective – To consider whether web searching is a useful method for identifying unpublished studies for inclusion in systematic reviews. Design – Retrospective web searches using the AltaVista search engine were conducted to identify unpublished studies – specifically, clinical trials – for systematic reviews which did not use a web search engine. Setting – The Department of Clinical Social Medicine, University of Heidelberg, Germany. Subjects – n/a Methods – Pilot testing of 11 web search engines was carried out to determine which could handle complex search queries. Pre-specified search requirements included the ability to handle Boolean and proximity operators, and truncation searching. A total of seven Cochrane systematic reviews were randomly selected from the Cochrane Library Issue 2, 1998, and their bibliographic database search strategies were adapted for the web search engine, AltaVista. Each adaptation combined search terms for the intervention, problem, and study type in the systematic review. Hints to planned, ongoing, or unpublished studies retrieved by the search engine, which were not cited in the systematic reviews, were followed up by visiting websites and contacting authors for further details when required. The authors of the systematic reviews were then contacted and asked to comment on the potential relevance of the identified studies. Main Results – Hints to 14 unpublished and potentially relevant studies, corresponding to 4 of the 7 randomly selected Cochrane systematic reviews, were identified. Out of the 14 studies, 2 were considered irrelevant to the corresponding systematic review by the systematic review authors. 
The relevance of a further three studies could not be clearly ascertained. This left nine studies which were considered relevant to a systematic review. In addition to this main finding, the pilot study to identify suitable search engines found that AltaVista was the only search engine able to handle the complex searches required to search for unpublished studies. Conclusion – Web searches using a search engine have the potential to identify studies for systematic reviews. Web search engines have considerable limitations which impede the identification of studies.


2013 ◽  
Vol 462-463 ◽  
pp. 1106-1109
Author(s):  
Hong Yuan Ma

A Web search engine caches the results that are frequently queried by users, which is an effective approach to improving the efficiency of Web search engines. In this paper, we share experience from the design and implementation of a Web search engine cache system. We present three design principles: logical layer processing, an event-based communication architecture, and avoiding frequent data copying. We also introduce the architecture used in practice, including a connection processor, application processor, query results caching processor, inverted list caching processor and list intersection caching processor. Experiments are conducted on our cache system using a real Web search engine query log.
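To illustrate the general idea of query result caching described in this abstract, the following is a minimal sketch in Python. It is not the authors' system: the abstract does not state the eviction policy or the interfaces, so the least-recently-used (LRU) policy, the class name `QueryResultCache`, and the helper `replay` are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class QueryResultCache:
    """Hypothetical LRU cache mapping a query string to its result list."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def get(self, query, compute):
        """Return cached results, or call compute(query) and cache them."""
        if query in self._entries:
            self._entries.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._entries[query]
        self.misses += 1
        results = compute(query)
        self._entries[query] = results
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return results


def replay(log, cache, backend):
    """Replay a query log through the cache and return the hit ratio."""
    for query in log:
        cache.get(query, backend)
    return cache.hits / len(log)


if __name__ == "__main__":
    # Stand-in backend (assumption): a real engine would run the full search.
    backend = lambda q: ["result for " + q]
    ratio = replay(["a", "b", "a", "c", "b", "a"], QueryResultCache(2), backend)
    print(f"hit ratio: {ratio:.2f}")
```

Replaying a real query log in this way gives a hit ratio, which is one common way to evaluate a cache; the other caches named in the abstract (inverted lists, list intersections) would need different keys and values.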


2006 ◽  
Vol 1 (3) ◽  
pp. 67
Author(s):  
David Hook

A review of: Jansen, Bernard J., and Amanda Spink. “How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs.” Information Processing & Management 42.1 (2006): 248-263. Objective – To examine the interactions between users and search engines, and how they have changed over time. Design – Comparative analysis of search engine transaction logs. Setting – Nine major analyses of search engine transaction logs. Subjects – Nine web search engine studies (4 European, 5 American) over a seven-year period, covering the search engines Excite, Fireball, AltaVista, BWIE and AllTheWeb. Methods – The results from individual studies are compared by year of study for percentages of single query sessions, one-term queries, operator (and, or, not, etc.) usage and single result page viewing. As well, the authors group the search queries into eleven different topical categories and compare how the breakdown has changed over time. Main Results – Based on the percentage of single query sessions, it does not appear that the complexity of interactions has changed significantly for either the U.S.-based or the European-based search engines. As well, there was little change observed in the percentage of one-term queries over the years of study for either the U.S.-based or the European-based search engines. Few users (generally less than 20%) use Boolean or other operators in their queries, and these percentages have remained relatively stable. One area of noticeable change is in the percentage of users viewing only one results page, which has increased over the years of study. Based on the studies of the U.S.-based search engines, the topical categories of ‘People, Place or Things’ and ‘Commerce, Travel, Employment or Economy’ are becoming more popular, while the categories of ‘Sex and Pornography’ and ‘Entertainment or Recreation’ are declining. 
Conclusions – The percentage of users viewing only one results page increased during the years of the study, while the percentages of single query sessions, one-term sessions and operator usage remained stable. The increase in single result page viewing implies that users are tending to view fewer results per web query. There was also a significant difference in the percentage of queries using Boolean operators between the US-based and the European-based search engines. One of the study’s findings was that results from a study of a particular search engine cannot necessarily be applied to all search engines. Finally, web search topics show a trend towards information or commerce searching rather than entertainment.


2017 ◽  
Vol 26 (06) ◽  
pp. 1730002 ◽  
Author(s):  
T. Dhiliphan Rajkumar ◽  
S. P. Raja ◽  
A. Suruliandi

Short and ambiguous queries are a major problem for search engines, leading to the retrieval of information irrelevant to the user's input. The growing volume of information on the web also makes it difficult for a search engine to provide the results users need. Web search engines suffer from ambiguity because queries are looked at on a rational level rather than the semantic level. In this paper, to improve search engine performance with respect to users' interests, personalization based on users' clicks and bookmarking is proposed. Modified agglomerative clustering is used in this work for clustering the results. The experimental results show that the proposed work achieves better precision, recall and F-score.


2018 ◽  
Vol 7 (2.32) ◽  
pp. 150
Author(s):  
N Arunachalam ◽  
S Radjou ◽  
P Aravindan ◽  
T Sivagurunathan

In the last few years, the illegal disclosure of user privacy in web search engines has become more serious. Protecting user privacy from illegal disclosure has attracted growing interest among researchers in recent times. Existing web search engines do not consider the privacy of the users. Search engines tend to collect all the information from the user. A system to ensure the privacy of the user is essential. Hence, the Personalized Web Search (PWS) method was put forward to give control over the amount of information that the user provides to the search engines. PWS provides privacy protection in web search systems and minimizes the disclosure of privacy-related user information through a customizable web search.


Author(s):  
Anita Kumari ◽  
Jawahar Thakur

Search engines play an important role in the success of the Web. A search engine helps users find relevant information on the internet. Many problems in traditional search engines have led to the development of the semantic web. Semantic web technologies are playing a crucial role in enhancing traditional search, as they work to create machine-readable data and focus on metadata. However, they will not replace traditional search engines. In the semantic web environment, a search engine should be more useful and efficient for finding relevant web information. It is a way to increase the accuracy of information retrieval systems. This is possible because the semantic web uses software agents; these agents collect information, perform relevant transactions and interact with physical devices. This paper surveys the prevalent semantic search engines in terms of their advantages, working and disadvantages, and presents a comparative study based on techniques, type of results, crawling, and indexing.


2012 ◽  
Vol 532-533 ◽  
pp. 1282-1286
Author(s):  
Zhi Chao Lin ◽  
Lei Sun ◽  
Xiao Liu

The World Wide Web contains a vast amount of information. Obtaining the required resources quickly and accurately from the web through content-based search engines has become a research focus. Most current full-text web search tools, such as Lucene, a widely used open-source retrieval library in the information retrieval field, are purely keyword based. This may not be sufficient for users searching the web. In this paper, we employ a method to overcome the limitations of current full-text search engines, as represented by Lucene. We propose a Query Expansion and Information Retrieval approach which can help users acquire more accurate content from the web. The Query Expansion component finds expanded candidate words for the query word through WordNet, which contains synonyms in several different senses. In the Information Retrieval component, the query word and its candidate words are used together as the input of the search module to get the result items. Furthermore, the result items can be put into different classes based on the expansion. Some experiments and their results are described in the latter part of this paper.
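The two-stage approach in this abstract (expand the query with synonyms, then search with the original and expanded words together) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the small `SYNONYMS` table is a hypothetical stand-in for WordNet, and the function names `expand_query` and `search`, along with the simple word-overlap matching, are assumptions rather than Lucene behaviour.

```python
# Hypothetical synonym table standing in for WordNet lookups (assumption).
SYNONYMS = {
    "car": ["automobile", "auto"],
    "fast": ["quick", "rapid"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query terms followed by their candidate synonyms, in order."""
    expanded = []
    for term in query.lower().split():
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:  # keep each word once
                expanded.append(candidate)
    return expanded


def search(documents, terms):
    """Return ids of documents that contain at least one of the terms."""
    wanted = set(terms)
    return [
        doc_id
        for doc_id, text in documents.items()
        if wanted & set(text.lower().split())
    ]


if __name__ == "__main__":
    docs = {1: "a quick automobile", 2: "slow train"}
    print(search(docs, ["fast", "car"]))           # plain keyword search
    print(search(docs, expand_query("fast car")))  # search with expansion
```

In this toy data set, the plain keyword search finds nothing because the document uses different words for the same concepts, while the expanded query retrieves it; this is the limitation of purely keyword-based search that the abstract describes.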

