Towards Improving Web Search: A Large-Scale Exploratory Study of Selected Aspects of User Search Behavior

Author(s):  
Hiroaki Ohshima ◽  
Adam Jatowt ◽  
Satoshi Oyama ◽  
Satoshi Nakamura ◽  
Katsumi Tanaka
2018 ◽  
Vol 7 (1.7) ◽  
pp. 91
Author(s):  
L LeemaPriyadharshini ◽  
S Florence ◽  
K Prema ◽  
C Shyamala Kumari

Search engines return ranked information in response to the query given by the user. Understanding user search behavior is important for satisfying users with the information they need, and recommending further information or sites on the basis of that behavior is an emerging task. The work is based on the queries given by the user, the amount of time the user spends on a particular page, and the number of clicks the user makes on a particular URL. These details are available in a web search log dataset. The web search log records users' searching activities along with other details such as machine ID, browser ID, timestamp, the query given by the user, and the URL accessed. Four tasks are considered important: 1) extracting tasks from the sequence of queries given by the user; 2) suggesting similar queries to the user; 3) ranking URLs based on implicit user behaviors; 4) increasing web page utility based on implicit behaviors. Predicting implicit user behavior is needed both for increasing web page utility and for ranking URLs. Each of these four tasks requires the design and implementation of algorithms and techniques to increase efficiency and effectiveness.
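As a rough illustration of ranking URLs from implicit signals such as clicks and dwell time, the following minimal sketch aggregates a toy search log. The record layout, the `rank_urls` helper, the dwell-time cap and the scoring rule are all assumptions made for this example; the abstract does not specify the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical log records: (query, url, dwell_seconds). The abstract names
# the kinds of fields a search log holds, not this exact layout.
LOG = [
    ("python csv", "a.example/doc", 120.0),
    ("python csv", "b.example/blog", 5.0),
    ("python csv", "a.example/doc", 60.0),
    ("python csv", "c.example/qa", 40.0),
]

def rank_urls(log, query, dwell_cap=300.0):
    """Rank URLs clicked for `query` by click count plus capped dwell time."""
    clicks = defaultdict(int)
    dwell = defaultdict(float)
    for q, url, seconds in log:
        if q != query:
            continue
        clicks[url] += 1
        # Cap each visit so a single idle tab cannot dominate the score.
        dwell[url] += min(seconds, dwell_cap)
    # Assumed scoring rule: one point per click plus one per minute of dwell.
    scores = {u: clicks[u] + dwell[u] / 60.0 for u in clicks}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda u: (-scores[u], u))

ranking = rank_urls(LOG, "python csv")
```

Any real system would need to learn or tune the weighting of clicks against dwell time; the fixed rule here only shows where those signals enter the ranking.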


2017 ◽  
Vol 35 (3) ◽  
pp. 360-367
Author(s):  
Scott Hanrath ◽  
Erik Radio

Purpose The purpose of this paper is to investigate the search behavior of institutional repository (IR) users in regard to subjects as a means of estimating the potential impact of applying a controlled subject vocabulary to an IR. Design/methodology/approach Google Analytics data were used to record cases where users arrived at an IR item page from an external web search and subsequently downloaded content. Search queries were compared against the Faceted Application of Subject Terminology (FAST) schema to determine the topical nature of the queries. Queries were also compared against the item’s metadata values for title and subject using approximate string matching to determine the alignment of the queries with current metadata values. Findings A substantial portion of successful user search queries to an IR appear to be topical in nature. User search queries matched values from FAST at a higher rate than existing subject metadata. Increased attention to subject description in IR records may provide an opportunity to improve the search visibility of the content. Research limitations/implications The study is limited to a particular IR. Data from Google Analytics does not provide comprehensive search query data. Originality/value The study presents a novel method for analyzing user search behavior to assist IR managers in determining whether to invest in applying controlled subject vocabularies to IR content.
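The approximate string matching step can be sketched with Python's standard `difflib` module. This is only an illustration under assumptions: the abstract does not say which matching algorithm or threshold the authors used, and `best_match`, `normalize`, the 0.8 threshold and the sample subject values are hypothetical.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(text.lower().split())

def best_match(query, values, threshold=0.8):
    """Return (value, ratio) for the closest metadata value.

    The value is None when no candidate reaches the assumed threshold.
    """
    q = normalize(query)
    best_value, best_ratio = None, 0.0
    for value in values:
        ratio = SequenceMatcher(None, q, normalize(value)).ratio()
        if ratio > best_ratio:
            best_value, best_ratio = value, ratio
    if best_ratio < threshold:
        return None, best_ratio
    return best_value, best_ratio

# Hypothetical subject metadata values for one item.
SUBJECTS = ["Climate Change", "Soil science"]
match, ratio = best_match("Climate  change", SUBJECTS)
```

Running the same function once against controlled-vocabulary terms and once against an item's existing subject values would give the two match rates that the study compares.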


2017 ◽  
Vol 35 (4) ◽  
pp. 650-666
Author(s):  
Dan Wu ◽  
Renmin Bi

Purpose This paper discusses the differences in search pattern transitions for mobile phone, tablet and desktop devices by mining the transaction log data of a library online public access catalogue (OPAC). It aims to analyze the impact of different devices on user search behavior and provide constructive suggestions for the development of library OPACs on different devices. Design/methodology/approach Based on the transaction logs of a university library OPAC, which are 9 GB in size and contain 16,140,509 records, statistics and clustering were used to analyze the differences in search pattern transitions on different devices in terms of two aspects: search field transition patterns and query reformulation patterns. Findings Search field transition patterns are influenced by the input function and user interfaces of different devices. As the number of reformulations increases, the differences between query reformulation patterns among different devices decrease. Practical implications Mobile-side libraries need to optimize user interfaces, for example by setting web page labels and improving input capabilities. Desk-side libraries can add more suggestive content on the interface. Originality/value Unlike previous studies, which have analyzed web search, this paper focuses on library OPAC search. The search functions of mobile phones, tablets and desktops were found to be asymptotic, which illustrates how devices have a large impact on user search behavior.
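One simple way to look at search field transition patterns is to count consecutive pairs of search fields within each session, per device. The sketch below assumes sessions have already been extracted from the transaction log; the field names, the session data and the `transition_counts` helper are hypothetical, and the clustering step described in the abstract is not shown.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of search fields used.
SESSIONS = {
    "mobile": [["title", "title", "author"], ["title", "author"]],
    "desktop": [["subject", "title", "subject"]],
}

def transition_counts(sessions):
    """Count (from_field, to_field) pairs over consecutive searches."""
    counts = Counter()
    for fields in sessions:
        # zip pairs each search with the one that follows it.
        counts.update(zip(fields, fields[1:]))
    return counts

mobile = transition_counts(SESSIONS["mobile"])
desktop = transition_counts(SESSIONS["desktop"])
```

Normalizing each device's counts to proportions would make the resulting transition tables comparable across devices with very different traffic volumes.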


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms (such as a person's name or a product model number) not seen during training, to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections, such as the document index of a commercial Web search engine, containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model that enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
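The idea behind query term independence can be sketched as follows: if a document's score for a query is the sum of per-term scores, each term-document score can be computed offline and stored in an inverted index, so query time reduces to lookups and additions. In this minimal sketch `term_doc_score` is a simple frequency-based stand-in for a learned model, and the documents and helper names are hypothetical; it is not the thesis's implementation.

```python
from collections import defaultdict

# Hypothetical toy collection.
DOCS = {
    "d1": "neural ranking models for web search",
    "d2": "inverted index data structures",
    "d3": "neural models and inverted index retrieval",
}

def term_doc_score(term, doc_tokens):
    """Stand-in for a learned model scoring one term against one document."""
    return doc_tokens.count(term) / len(doc_tokens)

def build_index(docs):
    """Precompute a score for every (term, document) pair that co-occurs."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.split()
        for term in set(tokens):
            index[term][doc_id] = term_doc_score(term, tokens)
    return index

def search(index, query):
    """Score documents by summing precomputed per-term scores."""
    totals = defaultdict(float)
    for term in query.split():
        for doc_id, score in index.get(term, {}).items():
            totals[doc_id] += score
    # Highest total first; ties broken by document id.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

INDEX = build_index(DOCS)
results = search(INDEX, "neural index")
```

The design trade-off is that the model can no longer score interactions between query terms, in exchange for moving all model evaluation out of the query path.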


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the U.S. 2020 presidential primary elections under default (that is, nonpersonalized) conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters, as demonstrated by previous research.


Author(s):  
Nikitha Rao ◽  
Chetan Bansal ◽  
Thomas Zimmermann ◽  
Ahmed Hassan Awadallah ◽  
Nachiappan Nagappan

Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4804
Author(s):  
Marcin Piekarczyk ◽  
Olaf Bar ◽  
Łukasz Bibrzycki ◽  
Michał Niedźwiecki ◽  
Krzysztof Rzecki ◽  
...  

Gamification is known to enhance users’ participation in education and research projects that follow the citizen science paradigm. The Cosmic Ray Extremely Distributed Observatory (CREDO) experiment is designed for the large-scale study of various radiation forms that continuously reach the Earth from space, collectively known as cosmic rays. The CREDO Detector app relies on a network of involved users and is now working worldwide across phones and other CMOS sensor-equipped devices. To broaden the user base and activate current users, CREDO makes extensive use of gamification solutions such as the periodic Particle Hunters Competition. However, an adverse effect of gamification is that the number of artefacts, i.e., signals unrelated to cosmic ray detection or openly related to cheating, substantially increases. To tag the artefacts appearing in the CREDO database, we propose a method based on machine learning. The approach involves training a Convolutional Neural Network (CNN) to recognise the morphological difference between signals and artefacts. As a result, we obtain a CNN-based trigger that is able to mimic the signal vs. artefact assignments of human annotators as closely as possible. To enhance the method, the input image signal is adaptively thresholded and then transformed using Daubechies wavelets. In this exploratory study, we use wavelet transforms to amplify distinctive image features. As a result, we obtain a very good recognition ratio of almost 99% for both signals and artefacts. The proposed solution allows eliminating the manual supervision of the competition process.

