scholarly journals Enhanced DBSCAN with Hierarchical Tree for Web Rule Mining

2020 ◽  
Vol 21 (2) ◽  
pp. 189-202
Author(s):  
Neelima Gullipalli ◽  
Sireesha Rodda

Like other mining, web mining is also necessary to increase the power of web search engine to identify the intended web page and web document. While processing with large datasets, there arises several issues associated with space availability, similarity relationships between different webpage’s and running time. Hence, this paper intends to develop an enhanced web mining model based on two contributions. At first, the hierarchical tree is framed, which produces different categories of the searching queries (different web pages). Next, to hierarchical tree model, enhanced Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique model is developed by modifying the traditional DBSCAN. This technique results in proper session identification from raw data. Moreover, this technique offers the optimal level of clusters necessitated for hierarchical clustering. After hierarchical clustering, the rule mining is adopted. The traditional rule mining technique is generally based on the frequency; however, this paper intends to enhance the traditional rule mining based on utility factor as the second contribution. Hence the proposed model for web rule mining is termed as Enhanced DBSCAN-based Hierarchical Tree (EDBHT). It benefits in providing the search results depending on high level information (e.g., location), so that the ability of search engine in providing the interesting association rules can be improved. Next, to the implementation, the performance of proposed EDBHT is found to be enhanced when compared over several traditional models.

Author(s):  
Ji-Rong Wen

Web query log is a type of file keeping track of the activities of the users who are utilizing a search engine. Compared to traditional information retrieval setting in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering and personalized search, could be employed to improve the performance of Web search.


Author(s):  
Ji-Rong Wen

Web query log is a type of file keeping track of the activities of the users who are utilizing a search engine. Compared to traditional information retrieval setting in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering and personalized search, could be employed to improve the performance of Web search.


2009 ◽  
Vol 14 (2) ◽  
pp. 115-118 ◽  
Author(s):  
Hao Chen ◽  
Beiji Zou ◽  
Naizheng Bian

2021 ◽  
pp. 089443932110068
Author(s):  
Aleksandra Urman ◽  
Mykola Makhortykh ◽  
Roberto Ulloa

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.


Sign in / Sign up

Export Citation Format

Share Document