Web Page Clustering: Latest Research Papers

Techniques for Improving Web Search by Understanding Queries

10.26686/wgtn.16985482 ◽

2021 ◽

Author(s):

Daniel Wayne Crabtree

Keyword(s):

Search Engines ◽

Best Practice ◽

Web Search ◽

Special Focus ◽

Clustering Methods ◽

Web Page ◽

Clustering Method ◽

Evaluation Measures ◽

Search Results ◽

Web Page Clustering

<p>This thesis investigates the refinement of web search results, with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering methods, for performing clustering effectively, and for performing query refinement. The thesis identifies different types of queries, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering satisfies all of the properties of a good evaluation measure. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabulary, but that they still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters; this method significantly improves clustering performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications; the thesis describes methods for two of them. The first method improves the search results for hard queries with underrepresented aspects by automatically expanding the query with semantically orthogonal keywords related to those aspects.
The second method helps users refine hard, ambiguous queries by identifying the different interpretations of a query through clustering a diverse set of refinements. Both methods significantly outperform existing methods.</p>
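
Purity is one of the commonly used external measures for scoring a clustering against known query interpretations. The minimal Python sketch below is not one of the thesis's proposed measures; it uses a hypothetical four-page "jaguar" result set with assumed ground-truth labels to show one well-known shortcoming of a commonly used measure: purity gives a perfect score to the trivial clustering that puts every page in its own cluster.

```python
from collections import Counter


def purity(clusters, labels):
    """Fraction of pages that carry the majority true label of their cluster.

    clusters: list of clusters, each a list of page ids
    labels:   dict mapping page id -> ground-truth interpretation (assumed known)
    """
    total = sum(len(c) for c in clusters)
    # For each non-empty cluster, count the pages belonging to its most common label.
    correct = sum(
        Counter(labels[page] for page in c).most_common(1)[0][1]
        for c in clusters
        if c
    )
    return correct / total


# Hypothetical ground truth for an ambiguous query with two interpretations.
labels = {"d1": "jaguar-car", "d2": "jaguar-car", "d3": "jaguar-cat", "d4": "jaguar-cat"}

good = [["d1", "d2"], ["d3", "d4"]]          # one cluster per interpretation
mixed = [["d1", "d3"], ["d2", "d4"]]         # interpretations shuffled together
singletons = [["d1"], ["d2"], ["d3"], ["d4"]]  # every page alone

print(purity(good, labels))
print(purity(mixed, labels))
print(purity(singletons, labels))
```

Because purity alone cannot distinguish the meaningful clustering from the useless all-singletons one, it is usually reported alongside a complementary measure; the definitions of the thesis's own quality and coverage measures are not given in this abstract and are not reproduced in this sketch.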

Translation of news reports related to COVID-19 of Japanese Linguistics based on page link mining

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189296 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8981-8988

Author(s):

Xiaohua Liu

Keyword(s):

Clustering Algorithm ◽

Web Content ◽

Web Page ◽

News Reports ◽

The Face ◽

Public Emergency ◽

Japanese Linguistics ◽

Web Page Clustering ◽

Content Clustering ◽

Basic Content

Given the current epidemic, news reports face growing demands for accuracy. The speed and accuracy of public emergency news depend on the accuracy of clustering web page links and tags. This paper proposes an improved web page clustering method that combines topic clustering with structure clustering. The algorithm uses the result of web page structure clustering as a weighting factor and, combined with K-means clustering of the web content, selects the basic content that meets the conditions. This content is then translated into Chinese by the improved clustering-based translator and compared with the target content to analyze their similarity. The approach achieves the translation of Japanese news reports related to the novel coronavirus (COVID-19) epidemic based on page link mining.
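
The abstract does not give the exact formula for combining the structure and content clusterings. A minimal Python sketch, assuming each page carries one scalar weight derived from its structure cluster (the `weights` list and its values are hypothetical), is a weighted K-means in which heavier pages pull their cluster centroid more strongly. This is a generic weighted K-means for illustration, not the paper's algorithm.

```python
import math
import random


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """K-means in which each point's influence on its centroid is scaled by a weight.

    points:  equal-length content feature vectors, one per page
    weights: one positive weight per page (hypothetical: from structure clustering)
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct pages chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    dim = len(points[0])
    for _ in range(iters):
        # Assignment step: each page joins its nearest centroid (Euclidean distance).
        assign = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the weighted mean of its member pages.
        for j in range(k):
            members = [i for i, a in enumerate(assign) if a == j]
            total = sum(weights[i] for i in members)
            if total == 0:
                continue  # keep the previous centroid if the cluster is empty
            centroids[j] = [
                sum(weights[i] * points[i][d] for i in members) / total
                for d in range(dim)
            ]
    return assign, centroids


# Hypothetical 2-term content vectors for four pages; page 0 gets double weight.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
weights = [2.0, 1.0, 1.0, 1.0]
assign, centroids = weighted_kmeans(points, weights, k=2)
print(assign, centroids)
```

With uniform weights this reduces to ordinary K-means; raising a page's weight moves its cluster's centroid toward that page without changing how pages are assigned in any other way.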

Meta-heuristic approach to enhance the performance of web crawler for web page clustering and link priority evaluation

Materials Today Proceedings ◽

10.1016/j.matpr.2020.09.342 ◽

2020 ◽

Author(s):

Vandana Shrivastava ◽

Harvir Singh ◽

Arvind K. Sharma

Keyword(s):

Heuristic Approach ◽

Web Page ◽

Web Crawler ◽

Web Page Clustering

Arabic Web page clustering: A review

Journal of King Saud University - Computer and Information Sciences ◽

10.1016/j.jksuci.2017.06.002 ◽

2019 ◽

Vol 31 (1) ◽

pp. 1-14 ◽

Cited By ~ 1

Author(s):

Hanan M. Alghamdi ◽

Ali Selamat

Keyword(s):

Web Page ◽

Web Page Clustering

A Web Page Clustering Method Based on Formal Concept Analysis

Information ◽

10.3390/info9090228 ◽

2018 ◽

Vol 9 (9) ◽

pp. 228 ◽

Cited By ~ 1

Author(s):

Zuping Zhang ◽

Jing Zhao ◽

Xiping Yan

Keyword(s):

Formal Concept Analysis ◽

Concept Lattice ◽

Concept Analysis ◽

Formal Context ◽

Formal Concept ◽

Web Pages ◽

Web Page ◽

Data Links ◽

Web Page Clustering ◽

The Web

Web page clustering is an important technology for organizing network resources. By extracting features from Web pages and clustering the pages by similarity, a large amount of Web information can be organized effectively. After describing the extraction of Web feature words, this paper studies methods for calculating feature-word weights in depth. Taking Web pages as objects and Web feature words as attributes, it constructs a formal context for formal concept analysis. An algorithm for constructing a concept lattice based on cross data links is proposed and successfully applied; the resulting concept lattice hierarchy can then be used to cluster the Web pages. Experimental results indicate that the proposed algorithm outperforms previous methods in both running time and clustering quality.
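
The abstract does not describe the cross-data-links lattice construction itself. As a minimal Python sketch of the underlying idea only, the brute-force enumeration below computes every formal concept (extent, intent) of a small binary context, assuming feature-word weights have already been thresholded to yes/no (the pages and words are hypothetical). It relies on the standard fact that every concept intent is an intersection of object intents, with the empty intersection taken as the full attribute set; it is not the paper's algorithm and scales poorly to large contexts.

```python
def formal_concepts(context):
    """Enumerate all formal concepts of a binary formal context.

    context: dict mapping each object (Web page) to the set of attributes
             (feature words) it has.
    Returns a list of (extent, intent) pairs of frozensets.
    """
    all_attrs = frozenset().union(*context.values())
    # The full attribute set is the intent of the bottom concept.
    intents = {all_attrs}
    for attrs in context.values():
        # Close the intent family under intersection with this page's intent.
        intents |= {frozenset(attrs) & i for i in intents}
    concepts = []
    for intent in intents:
        # The extent is every page that has all attributes in the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from the most general concept (largest extent) to the most specific.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts


# Hypothetical context: three pages described by binary feature words.
context = {
    "page1": {"python", "tutorial"},
    "page2": {"python", "news"},
    "page3": {"java", "tutorial"},
}
concepts = formal_concepts(context)
for extent, intent in concepts:
    print(sorted(extent), sorted(intent))
```

Each extent is a candidate cluster of pages sharing the words in its intent; for example, the concept whose intent is `{"python"}` groups `page1` and `page2`, and the lattice's subconcept ordering gives the hierarchy used for clustering.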
