Data Set for An Empirical Analysis of Search Engines’ Response to Web Search Queries Associated with the Classroom Setting

Author(s):  
Oghenemaro Anuyah ◽  
Ashlee Milton ◽  
Michael Green ◽  
Maria Pera
Author(s):  
Adan Ortiz-Cordova ◽  
Bernard J. Jansen

In this research study, the authors investigate the association between external searching, which is searching on a web search engine, and internal searching, which is searching on a website. They classify 295,571 external–internal searches, where each search is composed of a query submitted to a web search engine followed by one or more subsequent queries submitted to a commercial website by the same user. The authors examine 891,453 queries from all searches, of which 295,571 were external search queries and 595,882 were internal search queries. They algorithmically classify all queries into states, cluster the searching episodes into major searching configurations, and identify the most commonly occurring search patterns for external, internal, and external-to-internal searching episodes. The research implications of this study are that external sessions and internal sessions must be considered as part of a continuous search episode and that online businesses can leverage external search information to more effectively target potential consumers.


2016 ◽  
Vol 6 (2) ◽  
pp. 66-85
Author(s):  
Wael K. Hanna ◽  
Aziza Saad Asem ◽  
M. B. Senousy

Users of search engines are obliged to express their goals in a few words (queries). Sometimes search queries are ambiguous. Moreover, users' intents evolve dynamically. This paper analyzes users' query logs to classify related queries, related intent topic categories, and related intent types, and uses this classification to dynamically predict users' future queries along with their intent topics and intent types. The AOL Search Query Log is taken as the experimental data set. Evaluation metrics are then used to evaluate the prediction results.


Author(s):  
Anselm Spoerri

This paper analyzes which pages and topics are the most popular on Wikipedia and why. For the period of September 2006 to January 2007, the 100 most visited Wikipedia pages in a month are identified and categorized in terms of the major topics of interest. The observed topics are compared with search behavior on the Web. Search queries, which are identical to the titles of the most popular Wikipedia pages, are submitted to major search engines and the positions of popular Wikipedia pages in the top 10 search results are determined. The presented data helps to explain how search engines, and Google in particular, fuel the growth and shape what is popular on Wikipedia.


2019 ◽  
Vol 44 (2) ◽  
pp. 365-381 ◽  
Author(s):  
Malte Bonart ◽  
Anastasiia Samokhina ◽  
Gernot Heisenberg ◽  
Philipp Schaer

Purpose: Survey-based studies suggest that search engines are trusted more than social media or even traditional news, although cases of false information or defamation are known. The purpose of this paper is to analyze query suggestion features of three search engines to see if these features introduce some bias into the query and search process that might compromise this trust. The authors test the approach on person-related search suggestions by querying the names of politicians from the German Bundestag before the German federal election of 2017. Design/methodology/approach: This study introduces a framework to systematically examine and automatically analyze the variety in query suggestions for person names offered by major search engines. To test the framework, the authors collected data from the Google, Bing and DuckDuckGo query suggestion APIs over a period of four months for 629 different names of German politicians. The suggestions were clustered and statistically analyzed with regard to different biases, like gender, party or age, and with regard to the stability of the suggestions over time. Findings: By using the framework, the authors located three semantic clusters within the data set: suggestions related to politics and economics, location information, and personal and other miscellaneous topics. Among other effects, the results of the analysis show a small bias in the form that male politicians receive slightly fewer suggestions on “personal and misc” topics. The stability analysis of the suggested terms over time shows that some suggestions are prevalent most of the time, while other suggestions fluctuate more often. Originality/value: This study proposes a novel framework to automatically identify biases in web search engine query suggestions for person-related searches. Applying this framework to a set of person-related query suggestions gives first insights into the influence search engines can have on the query process of users who seek out information on politicians.


2018 ◽  
Author(s):  
James Grimmelmann

53 New York Law School Law Review 939 (2009). Web search is critical to our ability to use the Internet. Whoever controls search engines has enormous influence on all of us; whoever controls the search engines, perhaps, controls the Internet itself. This short essay (based on talks given in January and April 2008) uses the stories of five famous search queries to illustrate the conflicts over search and the enormous power Google wields in choosing whose voices are heard on the Internet.


2019 ◽  
Vol 72 (1) ◽  
pp. 88-111
Author(s):  
Oghenemaro Anuyah ◽  
Ashlee Milton ◽  
Michael Green ◽  
Maria Soledad Pera

Purpose: The purpose of this paper is to examine strengths and limitations that search engines (SEs) exhibit when responding to web search queries associated with the grade school curriculum. Design/methodology/approach: The authors employed a simulation-based experimental approach to conduct an in-depth empirical examination of SEs and used web search queries that capture information needs in different search scenarios. Findings: Outcomes from this study highlight that child-oriented SEs are more effective than traditional ones when filtering inappropriate resources, but often fail to retrieve educational materials. All SEs examined offered resources at reading levels higher than that of the target audience and often prioritized resources with popular top-level domains (e.g. “.com”). Practical implications: Findings have implications for human intervention, search literacy in schools, and the enhancement of existing SEs. Results shed light on the impact on children’s education that results from introducing misconceptions about SEs when these tools either retrieve no results or offer irrelevant resources in response to web search queries pertinent to the grade school curriculum. Originality/value: The authors examined child-oriented and popular SEs’ retrieval of resources aligning with task objectives and user capabilities: resources that match user reading skills, do not contain hate speech or sexually explicit content, are non-opinionated, and are curriculum-relevant. Findings identified limitations of existing SEs (whether directly or indirectly supporting young users) and demonstrate the need to improve SE filtering and ranking algorithms.


2019 ◽  
Vol 26 (1) ◽  
pp. 3-29 ◽  
Author(s):  
Suzan Verberne ◽  
Emiel Krahmer ◽  
Sander Wubben ◽  
Antal van den Bosch

In this paper, we address query-based summarization of discussion threads. New users can profit from the information shared in the forum if they can find the previously posted information. However, discussion threads on a single topic can easily comprise dozens or hundreds of individual posts. Our aim is to summarize forum threads given real web search queries. We created a data set with search queries from a discussion forum’s search engine log and the discussion threads that were clicked by the user who entered the query. For 120 thread–query combinations, a reference summary was made by five different human raters. We compared two methods for automatic summarization of the threads: a query-independent method based on post features, and Maximum Marginal Relevance (MMR), a method that takes the query into account. We also compared four different word embedding representations as alternatives to standard word vectors in extractive summarization. We find (1) that the agreement between human summarizers does not improve when a query is provided; (2) that the query-independent post features as well as a centroid-based baseline outperform MMR by a large margin; (3) that combining the post features with query similarity gives a small improvement over the use of post features alone; and (4) that for the word embeddings, a match in domain appears to be more important than corpus size and dimensionality. However, the differences between the models were not reflected by differences in quality of the summaries created with the help of these models. We conclude that query-based summarization with web queries is challenging because the queries are short, and a click on a result is not a direct indicator of the relevance of the result.
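To make the MMR method named in this abstract concrete, the following is a minimal sketch of greedy Maximum Marginal Relevance selection over forum posts. It is not the authors' implementation: the bag-of-words cosine similarity, the function names `cosine` and `mmr_select`, and the trade-off parameter `lam` are assumptions chosen for illustration, and the paper's own features and embeddings are not reproduced here.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(query: str, posts: list[str], k: int = 2, lam: float = 0.7) -> list[int]:
    """Greedily pick k post indices, trading query relevance against redundancy.

    lam = 1.0 ranks purely by similarity to the query; lower values penalize
    posts that resemble posts already selected. (Hypothetical helper.)
    """
    q = Counter(query.lower().split())
    vecs = [Counter(p.lower().split()) for p in posts]
    selected: list[int] = []
    candidates = list(range(len(posts)))

    def score(i: int) -> float:
        relevance = cosine(vecs[i], q)
        # Redundancy is the highest similarity to any already-selected post.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


posts = [
    "reset router password",
    "reset router password",
    "router firmware update steps",
]
print(mmr_select("reset router password", posts, k=2, lam=0.3))
```

With a low `lam`, the duplicate second post is skipped in favour of the less relevant but non-redundant third post; with `lam=1.0` the selection reduces to plain relevance ranking.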


2017 ◽  
pp. 030-050
Author(s):  
J.V. Rogushina ◽  

Problems associated with the improvement of information retrieval for open environments are considered, and the need for its semantization is substantiated. The current state and prospects of development of semantic search engines focused on processing Web information resources are analyzed, and the criteria for the classification of such systems are reviewed. In this analysis, significant attention is paid to the use in semantic search of ontologies that contain knowledge about the subject area and the search users. The sources of ontological knowledge and methods of processing them to improve search procedures are considered. Examples of semantic search systems that use structured query languages (e.g., SPARQL), lists of keywords, and natural-language queries are given. Classification criteria for semantic search engines such as architecture, coupling, transparency, user context, modification of requests, ontology structure, etc. are considered. Different ways of supporting semantic and ontology-based modification of user queries that improve the completeness and accuracy of the search are analyzed. Based on an analysis of the properties of existing semantic search engines in terms of these criteria, areas for further improvement of these systems are identified: the development of metasearch systems, semantic modification of user requests, the determination of a user-acceptable transparency level for search procedures, flexibility of domain knowledge management tools, and increased productivity and scalability. In addition, the development of semantic Web search tools requires the use of an external knowledge base that contains knowledge about the domain of the user's information needs, and requires giving users the ability to independently select the knowledge used in the search process. It is necessary to take into account the history of user interaction with the retrieval system and the search context in order to personalize the query results and order them in accordance with the user's information needs. All these aspects were taken into account in the design and implementation of the semantic search engine "MAIPS", which is based on an ontological model of the interaction of users and resources on the Web.


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the U.S. 2020 presidential primary elections under default (that is, nonpersonalized) conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.
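One simple way to quantify the kind of discrepancy described here, where different agents receive different results for the same query, is the average set overlap between their result lists. The sketch below uses the Jaccard index for that purpose. The abstract does not state which measures the authors used, so the metric, the function names `jaccard` and `mean_pairwise_overlap`, and the example URLs are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a: list[str], b: list[str]) -> float:
    """Share of distinct results that two result lists have in common."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def mean_pairwise_overlap(result_lists: dict[str, list[str]]) -> float:
    """Average Jaccard overlap across all pairs of agents (hypothetical helper)."""
    pairs = list(combinations(result_lists.values(), 2))
    if not pairs:
        return 1.0  # a single agent trivially agrees with itself
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Made-up results for two virtual agents issuing the same query to one engine.
agents = {
    "agent_1": ["a.example", "b.example", "c.example"],
    "agent_2": ["a.example", "b.example", "d.example"],
}
print(mean_pairwise_overlap(agents))
```

A value of 1.0 would mean every agent saw the same set of results, and lower values indicate more variation between agents. This measure ignores rank order, so a rank-aware comparison would need a different metric.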

