Research of Information Retrieval Based on Web Page Segmentation

2012 ◽  
Vol 204-208 ◽  
pp. 4928-4931
Author(s):  
Yang Xin Yu

This paper presents a Web information retrieval algorithm based on Web page segmentation. Because Web pages are semi-structured, the key idea is to segment each page into distinct topic areas, or segments, according to its HTML tags and content. The algorithm first builds an HTML tag tree and then merges nodes in the tree according to content similarity and visual similarity. During retrieval and ranking, it uses the segmentation information to order the relevant pages. The experimental results show that the method significantly improves search precision, and the approach is offered as a reference for the design of future search engines.
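
The two steps named in the abstract (build a tag tree, then merge similar nodes) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not define its similarity measures, so the sketch assumes bag-of-words cosine similarity for content and uses "same tag name" as a crude stand-in for visual similarity. The names `BLOCK_TAGS`, `TreeBuilder`, `leaf_blocks`, `segment`, and the `threshold` value are all hypothetical.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Assumption: these tags are treated as candidate segment blocks.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article", "h1", "h2", "h3"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a simple tag tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def full_text(node):
    """Text of a node and all of its descendants."""
    return " ".join([node.text] + [full_text(c) for c in node.children]).strip()


def leaf_blocks(node):
    """Block nodes that contain text and no nested block nodes."""
    nested = [b for child in node.children for b in leaf_blocks(child)]
    if nested:
        return nested
    if node.tag in BLOCK_TAGS and full_text(node):
        return [node]
    return []


def cosine(a, b):
    """Bag-of-words cosine similarity (assumed content-similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def segment(html, threshold=0.3):
    """Merge adjacent blocks with the same tag and similar content."""
    builder = TreeBuilder()
    builder.feed(html)
    segments = []  # list of (tag, text) pairs
    for block in leaf_blocks(builder.root):
        text = " ".join(full_text(block).split())
        if segments:
            tag, prev = segments[-1]
            if tag == block.tag and cosine(prev, text) >= threshold:
                segments[-1] = (tag, prev + " " + text)
                continue
        segments.append((block.tag, text))
    return [text for _, text in segments]


page = (
    "<div><p>cheap flights to paris</p>"
    "<p>book flights to paris today</p>"
    "<p>contact our sales team</p></div>"
)
print(segment(page))
```

In this toy page the first two paragraphs share most of their vocabulary and are merged into one segment, while the third stays separate. A retrieval system could then score a query against each segment rather than against the whole page, which is the general idea the abstract describes for the ranking stage.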

2011 ◽  
Vol 3 (3) ◽  
pp. 41-58 ◽  
Author(s):  
Cédric Pruski ◽  
Nicolas Guelfi ◽  
Chantal Reynaud

Finding relevant information on the Web is difficult for most users. Although Web search applications are improving, they must become more “intelligent” to adapt to the search domains targeted by queries, the evolution of these domains, and users’ characteristics. In this paper, the authors present the TARGET framework for Web Information Retrieval. The proposed approach relies on ontologies of a particular kind, called adaptive ontologies, to represent both the search domain and a user’s profile. Unlike existing ontology-based approaches, adaptive ontologies adapt semi-automatically to the evolution of the modeled domain. The ontologies and their properties are exploited for domain-specific Web search. The authors propose graph-based data structures for semantically enriching Web data and define an automatic query expansion technique that adapts a query to users’ real needs. The enriched query is evaluated on these graph-based data structures, which represent a set of Web pages returned by a conventional search engine, in order to extract the most relevant information according to user needs. The overall TARGET framework is formalized using first-order logic and is fully tool-supported.
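
As a rough illustration of ontology-driven query expansion, the Python sketch below expands a query with related concepts from a small graph and re-ranks pages by how many expanded terms they contain. It does not reproduce the TARGET framework: the abstract gives no details of its data structures or expansion rules, so the dictionary-based `ontology`, the functions `expand_query` and `rank`, and the term-count scoring are all assumptions made for this example.

```python
def expand_query(terms, ontology, depth=1):
    """Add concepts reachable from the query terms within `depth` hops.

    `ontology` is a hypothetical adjacency mapping: concept -> related concepts.
    """
    expanded = list(dict.fromkeys(t.lower() for t in terms))
    frontier = list(expanded)
    for _ in range(depth):
        next_frontier = []
        for term in frontier:
            for related in ontology.get(term, []):
                if related not in expanded:
                    expanded.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return expanded


def rank(pages, terms):
    """Order pages by the number of occurrences of the (single-word) terms."""
    def score(text):
        words = text.lower().split()
        return sum(words.count(t) for t in terms)
    return sorted(pages, key=score, reverse=True)


# Toy domain ontology for a user interested in wildlife (illustrative only).
ontology = {"jaguar": ["cat", "predator"]}
pages = [
    "jaguar car dealership prices",
    "the jaguar is a big cat and apex predator",
]

query = expand_query(["Jaguar"], ontology)
print(query)
print(rank(pages, query))
```

With the unexpanded query both pages match "jaguar" once, so they are indistinguishable. After expansion, the wildlife page also matches "cat" and "predator" and moves to the top, which is the kind of disambiguation a domain or user-profile ontology is meant to provide.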


2013 ◽  
Vol 76 (1) ◽  
pp. 29-32
Author(s):  
Vikas Thada ◽  
Vivek Jaglan
