Clustering web search results using semantic information

Author(s):  
Han Wen ◽  
Guo-Shun Huang ◽  
Zhao Li
2011 ◽  
Vol 55-57 ◽  
pp. 1418-1423
Author(s):  
Ying Zhao ◽  
Ya Jun Du ◽  
Qiang Qiang Peng

Clustering web search results is one way to help users find topics of interest by grouping the search results. This paper presents an improved method for clustering search results, focused on Chinese web pages. The main contributions of this paper are as follows. First, we present a method that identifies phrases carrying complete semantic information by comparing the attributes of base clusters in the suffix tree document model and the overlap of their document sets. Second, by analyzing the content and structure of the titles and snippets of Chinese web search results, we design and implement a sentence segmentation method for constructing the suffix tree. Third, to better reflect the degree of association between terms, we propose a novel method that computes the distance between co-occurring terms at sentence granularity. Finally, experiments show that the new clustering method provides an efficient and effective way for users to browse and locate the information they seek.
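
The base-cluster comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumed: documents arrive already tokenized (Chinese word segmentation done upstream), word n-grams are enumerated instead of building a real suffix tree, a phrase is treated as incomplete when a longer phrase containing it covers exactly the same documents, and base clusters are then linked with the classic suffix tree clustering (STC) overlap rule, merging two clusters when their shared documents exceed half of each. The function names and the 0.5 threshold are illustrative choices.

```python
from itertools import combinations


def base_clusters(docs, max_len=3, min_df=2):
    """Map each word n-gram (a stand-in for a suffix-tree node) to the set of doc ids containing it."""
    clusters = {}
    for doc_id, tokens in enumerate(docs):
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                phrase = tuple(tokens[i:i + n])
                clusters.setdefault(phrase, set()).add(doc_id)
    # Keep only phrases shared by at least `min_df` documents.
    return {p: d for p, d in clusters.items() if len(d) >= min_df}


def is_subphrase(short, long):
    """True if `short` occurs as a contiguous run inside the strictly longer phrase `long`."""
    n = len(short)
    return n < len(long) and any(long[i:i + n] == short for i in range(len(long) - n + 1))


def complete_phrases(clusters):
    """Drop a phrase when a longer phrase containing it covers exactly the same documents."""
    return {
        p: d for p, d in clusters.items()
        if not any(is_subphrase(p, q) and clusters[q] == d for q in clusters)
    }


def merge_clusters(clusters, threshold=0.5):
    """Classic STC merge: link two base clusters when their overlap exceeds `threshold` of each."""
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        da, db = clusters[a], clusters[b]
        shared = len(da & db)
        if shared / len(da) > threshold and shared / len(db) > threshold:
            parent[find(a)] = find(b)

    # Connected components of the overlap graph become the final clusters.
    groups = {}
    for p in phrases:
        groups.setdefault(find(p), []).append(p)
    return [sorted(g) for g in groups.values()]


if __name__ == "__main__":
    # Hypothetical, pre-tokenized snippets.
    docs = [
        "apple iphone price".split(),
        "apple iphone review".split(),
        "apple pie recipe".split(),
        "easy pie recipe".split(),
    ]
    clusters = complete_phrases(base_clusters(docs))
    print(merge_clusters(clusters))  # [[('apple',), ('apple', 'iphone')], [('pie', 'recipe')]]
```

Here "iphone", "pie" and "recipe" are discarded as incomplete because "apple iphone" and "pie recipe" cover the same documents, and "apple" joins "apple iphone" because they share two of three and two of two documents.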


2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to queries on the 2020 U.S. presidential primary elections under default (that is, nonpersonalized) conditions. To do so, we use an algorithmic auditing methodology in which virtual agents conduct a large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for the queries “us elections,” “donald trump,” “joe biden,” and “bernie sanders” on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex during the 2020 primaries. Our findings indicate substantial differences in search results between search engines, as well as multiple discrepancies among the results generated for different agents using the same search engine. This highlights that whether users see certain information is decided by chance, owing to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and, as previous research has demonstrated, can shift the opinions of undecided voters.
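
The abstract does not say how result lists were compared across agents. One simple, assumed way to quantify such within-engine discrepancies is the average pairwise Jaccard similarity of each agent's top-k results, sketched below in Python. This is not presented as the study's own metric, and the agent names, URLs, and `k` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty result lists are trivially identical
    return len(a & b) / len(a | b)


def mean_pairwise_overlap(results_by_agent, k=10):
    """Average Jaccard similarity of the top-k results over every pair of agents.

    Values near 1.0 mean the agents saw almost the same results; lower values
    indicate discrepancies between agents querying the same engine.
    """
    tops = [results[:k] for results in results_by_agent.values()]
    pairs = list(combinations(tops, 2))
    if not pairs:
        return 1.0  # fewer than two agents: nothing to compare
    return mean(jaccard(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical top-3 result URLs collected by three agents for the same query.
    agents = {
        "agent-1": ["a.example", "b.example", "c.example"],
        "agent-2": ["a.example", "b.example", "d.example"],
        "agent-3": ["a.example", "c.example", "d.example"],
    }
    print(mean_pairwise_overlap(agents, k=3))  # 0.5
```

Jaccard similarity ignores rank order, so two agents that see the same URLs in a different order score 1.0; capturing ordering differences would need a rank-aware measure instead.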


2014 ◽  
Vol 08 (04) ◽  
pp. 515-544
Author(s):  
Pavlos Fafalios ◽  
Panagiotis Papadakos ◽  
Yannis Tzitzikas

The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach during searching that aims to enrich the responses of non-semantic search systems with semantic information, i.e. Linked Open Data (LOD), and to exploit the outcome to offer advanced exploratory search services that provide an overview of the search space and allow users to explore the related LOD. We use named entities identified in the search results to automatically connect search hits with LOD, and we consider a scenario where this entity-based integration is performed at query time, with no human effort and no a priori indexing, which is beneficial in terms of configurability and freshness. However, the number of identified entities can be high, and the same is true for the semantic information about these entities that can be fetched from the available LOD. To address this, we propose a link analysis-based method for ranking (and thus selecting to show) the most important semantic information related to the search results. We report promising results from a survey in the marine domain, as well as comparative results that illustrate the effectiveness of the proposed (PageRank-based) ranking scheme. Finally, we report experimental results on efficiency showing that the proposed functionality can be offered even at query time.
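
The PageRank-based ranking mentioned in this abstract can be sketched in Python as follows. The graph construction is an assumption made for illustration: each search hit is linked in both directions to the entities identified in it, and plain power-iteration PageRank (damping 0.85, uniform redistribution of dangling mass) then ranks the entities. The paper's actual graph and parameters may differ, and the hit and entity names in the example are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each node to its out-neighbours."""
    nodes = set(graph) | {v for outs in graph.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, outs in graph.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        rank = new
    return rank


def top_entities(hit_entities, k=3):
    """Rank entities by PageRank over an assumed hit<->entity graph and keep the top k."""
    graph = {}
    for hit, entities in hit_entities.items():
        graph.setdefault(hit, []).extend(entities)
        for e in entities:
            graph.setdefault(e, []).append(hit)  # back-link from entity to hit
    rank = pagerank(graph)
    entities = {e for ents in hit_entities.values() for e in ents}
    return sorted(entities, key=lambda e: rank[e], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical search hits and the named entities identified in each.
    hits = {
        "hit1": ["Tuna", "Atlantic Ocean"],
        "hit2": ["Tuna"],
        "hit3": ["Tuna", "Mediterranean Sea"],
    }
    print(top_entities(hits, k=1))  # ['Tuna']
```

An entity mentioned by many hits collects rank from all of them, which is why "Tuna" outranks the entities that appear in a single hit.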

