Enhanced DBSCAN with Hierarchical Tree for Web Rule Mining

Neelima Gullipalli; Sireesha Rodda

doi:10.12694/scpe.v21i2.1645

Enhanced DBSCAN with Hierarchical Tree for Web Rule Mining

Scalable Computing Practice and Experience ◽

10.12694/scpe.v21i2.1645 ◽

2020 ◽

Vol 21 (2) ◽

pp. 189-202

Author(s):

Neelima Gullipalli ◽

Sireesha Rodda

Keyword(s):

Search Engine ◽

Hierarchical Clustering ◽

Web Mining ◽

Web Search ◽

Spatial Clustering ◽

Optimal Level ◽

Rule Mining ◽

Tree Model ◽

Hierarchical Tree ◽

High Level

Like other forms of mining, web mining is needed to strengthen a web search engine's ability to identify the intended web page or web document. Processing large datasets raises several issues related to space availability, similarity relationships between different web pages, and running time. Hence, this paper develops an enhanced web mining model based on two contributions. First, a hierarchical tree is framed, which produces different categories of the search queries (different web pages). In addition to the hierarchical tree model, an enhanced Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique is developed by modifying the traditional DBSCAN. This technique yields proper session identification from raw data and also provides the optimal level of clusters needed for hierarchical clustering. After hierarchical clustering, rule mining is applied. Traditional rule mining is generally based on frequency; as the second contribution, this paper enhances it with a utility factor. The proposed model for web rule mining is termed Enhanced DBSCAN-based Hierarchical Tree (EDBHT). It provides search results that depend on high-level information (e.g., location), which improves the search engine's ability to supply interesting association rules. After implementation, the performance of the proposed EDBHT is found to be better than that of several traditional models.
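For readers unfamiliar with the baseline that the abstract modifies, the following is a minimal sketch of textbook DBSCAN, not the paper's enhanced variant. It assumes, purely for illustration, that each request is reduced to a one-dimensional timestamp in seconds, so that dense runs of requests form candidate sessions; the function names and the `eps` and `min_pts` values are hypothetical choices, not taken from the paper.

```python
NOISE = -1  # label for points that belong to no cluster


def region_query(points, idx, eps):
    """Return indices of all points within eps of points[idx], itself included."""
    return [j for j, p in enumerate(points) if abs(p - points[idx]) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on 1-D values; returns one cluster label per point."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels


# Hypothetical request timestamps (seconds) from a raw log.
timestamps = [0, 5, 10, 300, 305, 310, 1000]
labels = dbscan(timestamps, eps=10, min_pts=2)
print(labels)
```

With these assumed inputs, the two dense runs of requests fall into separate clusters and the isolated request at 1000 seconds is marked as noise. How EDBHT changes this procedure, and how it selects the optimal number of clusters, is not specified in the abstract.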


Enhancing Web Search through Query Log Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch083 ◽

2011 ◽

pp. 438-442

Author(s):

Ji-Rong Wen

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Web Mining ◽

Web Search ◽

Information Source ◽

Query Log ◽

Additional Information ◽

Query Logs ◽

Query Log Mining ◽

The Web

A Web query log is a file that keeps track of the activities of users of a search engine. Compared to the traditional information retrieval setting, in which documents are the only information source available, query logs are an additional information source in the Web search setting. Based on query logs, a set of Web mining techniques, such as log-based query clustering, log-based query expansion, collaborative filtering, and personalized search, can be employed to improve the performance of Web search.
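As a rough illustration of log-based query clustering, the sketch below groups queries whose users clicked on overlapping sets of URLs. It is one simple way to realize the idea, not the method of the encyclopedia entry: the `(query, clicked_url)` log format, the Jaccard similarity measure, the `threshold` value, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def clicked_urls_by_query(log):
    """Map each query to the set of URLs clicked after it."""
    clicks = defaultdict(set)
    for query, url in log:
        clicks[query].add(url)
    return dict(clicks)


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(log, threshold=0.5):
    """Group queries whose clicked-URL sets overlap by at least threshold."""
    clicks = clicked_urls_by_query(log)
    parent = {q: q for q in clicks}  # union-find forest over queries

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for q1, q2 in combinations(clicks, 2):
        if jaccard(clicks[q1], clicks[q2]) >= threshold:
            parent[find(q1)] = find(q2)  # merge the two groups

    groups = defaultdict(set)
    for q in clicks:
        groups[find(q)].add(q)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical log entries: (query, clicked URL).
log = [
    ("cheap flights", "a.example/flights"),
    ("airline tickets", "a.example/flights"),
    ("airline tickets", "b.example/air"),
    ("cheap flights", "b.example/air"),
    ("python tutorial", "c.example/py"),
]
print(cluster_queries(log))
```

Here "cheap flights" and "airline tickets" share every clicked URL and end up in one group, while "python tutorial" stays on its own. Clusters of this kind could then feed query expansion or suggestion, under the same assumptions.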


Enhancing Web Search through Query Log Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch117 ◽

2011 ◽

pp. 758-763 ◽

Cited By ~ 2

Author(s):

Ji-Rong Wen

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Web Mining ◽

Web Search ◽

Information Source ◽

Query Log ◽

Additional Information ◽

Query Logs ◽

Query Log Mining ◽

The Web


Applications of web mining - from web search engine to P2P filtering

International Conference on Informatics Research for Development of Knowledge Society Infrastructure, 2004. ICKS 2004. ◽

10.1109/icks.2004.1313420 ◽

2004 ◽

Cited By ~ 1

Author(s):

H. Kawano

Keyword(s):

Search Engine ◽

Web Mining ◽

Web Search ◽

Web Search Engine


Mining related queries from Web search engine query logs using an improved association rule mining model

Journal of the American Society for Information Science and Technology ◽

10.1002/asi.20632 ◽

2007 ◽

Vol 58 (12) ◽

pp. 1871-1883 ◽

Cited By ~ 22

Author(s):

Xiaodong Shi ◽

Christopher C. Yang

Keyword(s):

Search Engine ◽

Association Rule ◽

Association Rule Mining ◽

Web Search ◽

Rule Mining ◽

Web Search Engine ◽

Query Logs ◽

Mining Model


Optimization of web search engine and its application to web mining

Wuhan University Journal of Natural Sciences ◽

10.1007/s11859-009-0204-y ◽

2009 ◽

Vol 14 (2) ◽

pp. 115-118 ◽

Cited By ~ 1

Author(s):

Hao Chen ◽

Beiji Zou ◽

Naizheng Bian

Keyword(s):

Search Engine ◽

Web Mining ◽

Web Search ◽

Web Search Engine


Content trust based web search engine

International Workshop on Communication Technology 2013 ◽

10.2495/cecnet130661 ◽

2014 ◽

Author(s):

Hong-Zhen Xu

Keyword(s):

Search Engine ◽

Web Search ◽

Web Search Engine


A Point of Two Mode-Session Logs Based Web User Interest Prediction System From Web Search Engine

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i12.1522 ◽

2017 ◽

Vol 5 (12) ◽

pp. 15-22

Author(s):

R. Velmurugan ◽

S.P. Victor

Keyword(s):

Search Engine ◽

Web Search ◽

Prediction System ◽

User Interest ◽

Web Search Engine ◽

Interest Prediction


The Matter of Chance: Auditing Web Search Results Related to the 2020 U.S. Presidential Primary Elections Across Six Search Engines

Social Science Computer Review ◽

10.1177/08944393211006863 ◽

2021 ◽

pp. 089443932110068

Author(s):

Aleksandra Urman ◽

Mykola Makhortykh ◽

Roberto Ulloa

Keyword(s):

Search Engine ◽

Search Engines ◽

Large Scale ◽

Web Search ◽

Primary Elections ◽

Virtual Agents ◽

Search Results ◽

Presidential Primary ◽

Large Scale Analysis ◽

Algorithmic Information

We examine how six search engines filter and rank information in relation to the queries on the U.S. 2020 presidential primary elections under the default—that is nonpersonalized—conditions. For that, we utilize an algorithmic auditing methodology that uses virtual agents to conduct large-scale analysis of algorithmic information curation in a controlled environment. Specifically, we look at the text search results for “us elections,” “donald trump,” “joe biden,” “bernie sanders” queries on Google, Baidu, Bing, DuckDuckGo, Yahoo, and Yandex, during the 2020 primaries. Our findings indicate substantial differences in the search results between search engines and multiple discrepancies within the results generated for different agents using the same search engine. It highlights that whether users see certain information is decided by chance due to the inherent randomization of search results. We also find that some search engines prioritize different categories of information sources with respect to specific candidates. These observations demonstrate that algorithmic curation of political information can create information inequalities between the search engine users even under nonpersonalized conditions. Such inequalities are particularly troubling considering that search results are highly trusted by the public and can shift the opinions of undecided voters as demonstrated by previous research.
