Tunneling enhanced by web page content block partition for focused crawling

2010 ◽  
Vol 22 (4) ◽  
pp. 538-539
Author(s):  
Tao Peng ◽  
Changli Zhang ◽  
Wanli Zuo
2011 ◽  
Vol 8 (3) ◽  
pp. 779-799 ◽  
Author(s):  
Ying Wang ◽  
Huilai Li ◽  
Wanli Zuo ◽  
Fengling He ◽  
Xin Wang ◽  
...  

Ontology plays an important role in locating domain-specific Deep Web content. This paper therefore presents WFF, a novel framework for efficiently locating domain-specific Deep Web databases based on focused crawling and ontology. WFF constructs a Web Page Classifier (WPC), a Form Structure Classifier (FSC), and a Form Content Classifier (FCC) in a hierarchical fashion. First, WPC discovers potentially interesting pages using an ontology-assisted focused crawler. Then, FSC analyzes the interesting pages and determines, based on structural characteristics, whether they contain searchable forms. Finally, FCC identifies the searchable forms that belong to a given domain at the semantic level and stores the URLs of these domain-specific searchable forms in a database. A detailed experimental evaluation shows that the WFF framework not only simplifies the discovery process but also effectively identifies domain-specific databases.
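The hierarchical arrangement of the three classifiers can be illustrated with a minimal Python sketch. The abstract does not describe how WPC, FSC, or FCC are implemented, so the `Page` type, the function `locate_domain_forms`, and the three `toy_*` predicates below are hypothetical placeholders (simple string heuristics for an assumed "cars" domain), not the classifiers from the paper. Only the cascade order follows the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Page:
    """Hypothetical page record: URL, visible text, raw HTML of each form."""
    url: str
    text: str
    forms: List[str] = field(default_factory=list)


def locate_domain_forms(
    pages: Iterable[Page],
    wpc: Callable[[Page], bool],
    fsc: Callable[[str], bool],
    fcc: Callable[[str], bool],
) -> List[str]:
    """Run the three stages in order and return URLs that pass all of them."""
    hits = []
    for page in pages:
        if not wpc(page):  # stage 1: is the page on topic?
            continue
        searchable = [f for f in page.forms if fsc(f)]  # stage 2: form structure
        if any(fcc(f) for f in searchable):  # stage 3: form content
            hits.append(page.url)
    return hits


# Placeholder predicates (assumptions for illustration only).
def toy_wpc(page: Page) -> bool:
    return "car" in page.text.lower()


def toy_fsc(form: str) -> bool:
    f = form.lower()
    # Treat a form with a text box and no password field as "searchable".
    return 'type="text"' in f and 'type="password"' not in f


def toy_fcc(form: str) -> bool:
    return any(term in form.lower() for term in ("make", "model"))
```

Because each stage only sees what the previous stage accepted, the cheaper page-level check filters out most pages before any form is inspected.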


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Houqing Lu ◽  
Donghui Zhan ◽  
Lei Zhou ◽  
Dengchao He

A focused crawler is topic-specific and aims to selectively collect web pages relevant to a given topic from the Internet. However, the performance of current focused crawlers is easily degraded by the environment of web pages and by multi-topic web pages. During crawling, a highly relevant region may be ignored because of the low overall relevance of its page, and anchor text or link context may misguide the crawler. To solve these problems, this paper proposes a new focused crawler. First, we build a web page classifier based on an improved term weighting approach (ITFIDF) in order to obtain highly relevant web pages. In addition, this paper introduces a link evaluation approach, link priority evaluation (LPE), which combines a web page content block partition algorithm with a joint feature evaluation (JFE) strategy to better judge the relevance between the URLs on a web page and the given topic. The experimental results demonstrate that the classifier using ITFIDF outperforms TFIDF, and that our focused crawler is superior to focused crawlers based on breadth-first, best-first, anchor text only, link-context only, and content block partition in terms of harvest rate and target recall. In conclusion, our methods are significant and effective for focused crawling.
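The observation that a highly relevant region can be hidden by the low overall relevance of its page can be shown with a small Python sketch. The abstract does not define ITFIDF, LPE, or JFE, so this example substitutes plain term-frequency cosine similarity and assumes the page has already been split into blocks. The function names and the sample text are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter
from typing import List


def tf_vector(text: str) -> Counter:
    """Bag-of-words term frequencies (whitespace tokens, lowercased)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def block_relevances(blocks: List[str], topic: str) -> List[float]:
    """Score each content block against the topic separately."""
    topic_vec = tf_vector(topic)
    return [cosine(tf_vector(block), topic_vec) for block in blocks]


def page_relevance(blocks: List[str], topic: str) -> float:
    """Score the page as one undivided document."""
    return cosine(tf_vector(" ".join(blocks)), tf_vector(topic))


# A multi-topic page: only the first block is about the topic.
blocks = [
    "focused crawler topic relevance",
    "sports news football scores today weather",
    "buy shoes discount sale cheap offers",
]
topic = "focused crawler"
best_block = max(block_relevances(blocks, topic))
whole_page = page_relevance(blocks, topic)
```

In this toy page the best block scores higher than the page as a whole, so a crawler that ranks the links inside that block by block-level relevance would not discard them merely because the rest of the page is off topic.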


2017 ◽  
Vol 53 ◽  
pp. 181-204 ◽  
Author(s):  
Ahmed I. Saleh ◽  
Arwa E. Abulwafa ◽  
Mohammed F. Al Rahmawy

2005 ◽  
Author(s):  
Aaron W. Bangor ◽  
James T. Miller

2020 ◽  
Vol 140 (12) ◽  
pp. 1393-1401
Author(s):  
Hiroki Chinen ◽  
Hidehiro Ohki ◽  
Keiji Gyohten ◽  
Toshiya Takami

Author(s):  
Gursimran Singh ◽  
Harpreet Kaur

As website content grows, it becomes difficult to manage the relations between individual web pages and to keep track of the hyperlinks within a website. As a result, some hyperlinks become dead or broken. A broken link is a link on a web page that no longer works. Finding broken links manually by checking each hyperlink individually is time-consuming and tedious. To eliminate this manual work, the Selenium WebDriver tool and Java code can be used to automate the testing of each hyperlink. The objective of this thesis is to automate the detection of broken links using the Selenium WebDriver tool.
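The general idea of automated broken-link detection can be sketched as follows. This is not the Selenium WebDriver and Java approach the abstract describes. It is a simplified stand-in that uses only the Python standard library and works on an HTML string rather than a live browser session. The names `LinkExtractor`, `http_status`, and `find_broken_links`, and the rule that a status of 400 or above (or no response) counts as broken, are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from typing import Callable, List, Tuple
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a HEAD request, or 0 if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def find_broken_links(
    html: str,
    base_url: str,
    status_of: Callable[[str], int] = http_status,
) -> List[Tuple[str, int]]:
    """Return (url, status) for every http(s) link whose status marks it broken."""
    parser = LinkExtractor()
    parser.feed(html)
    broken = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if not url.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, etc.
        code = status_of(url)
        if code == 0 or code >= 400:
            broken.append((url, code))
    return broken
```

The `status_of` parameter lets the link check be replaced, for example with a stub during testing or with a GET request for servers that reject HEAD.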


1997 ◽  
Vol 1 (3) ◽  
pp. 15-25
Author(s):  
Michael F. Hull
