scholarly journals An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities

Author(s):  
Jiyang Chen ◽  
Osmar R. Zaïane ◽  
Randy Goebel
Author(s):  
Yan Chen ◽  
Yan-Qing Zhang

For most Web searching applications, queries are commonly ambiguous because words or phrases have different linguistic meanings for different Web users. The conventional keyword-based search engines cannot disambiguate queries to provide relevant results matching Web users’ intents. Traditional Word Sense Disambiguation (WSD) methods use statistic models or ontology-based knowledge systems to measure associations among words. The contexts of queries are used for disambiguation in these methods. However, due to the fact that numerous combinations of words may appear in queries and documents, it is difficult to extract concepts’ relations for all possible combinations. Moreover, queries are usually short, so contexts in queries do not always provide enough information to disambiguate queries. Therefore, the traditional WSD methods are not sufficient to provide accurate search results for ambiguous queries. In this chapter, a new model, Granular Semantic Tree (GST), is introduced for more conveniently representing associations among concepts than the traditional WSD methods. Additionally, users’ preferences are used to provide personalized search results that better adapt to users’ unique intents. Fuzzy logic is used to determine the most appropriate concepts related to queries based on contexts and users’ preferences. Finally, Web pages are analyzed by the GST model. The concepts of pages for the queries are evaluated, and the pages are re-ranked according to similarities of concepts between pages and queries.


2013 ◽  
Vol 39 (3) ◽  
pp. 709-754 ◽  
Author(s):  
Antonio Di Marco ◽  
Roberto Navigli

Web search result clustering aims to facilitate information search on the Web. Rather than the results of a query being presented as a flat list, they are grouped on the basis of their similarity and subsequently shown to the user as a list of clusters. Each cluster is intended to represent a different meaning of the input query, thus taking into account the lexical ambiguity (i.e., polysemy) issue. Existing Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets, however. As a result, text snippets with no word in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this article we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction. Key to our approach is to first acquire the various senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the word senses induced. Our experiments, conducted on data sets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.


Sign in / Sign up

Export Citation Format

Share Document