Research on Web Intelligent Information Extraction Method

2014 ◽ Vol 539 ◽ pp. 464-468
Author(s): Zhi Min Wang

The paper introduces page segmentation into the preprocessing of web pages. Page segmentation is first used to locate the region that contains the target information; that region is then processed according to ontology-based extraction rules to obtain the required information. Experiments on two real datasets, together with a comparison against related work, show that the method achieves good extraction results.

2013 ◽ Vol 347-350 ◽ pp. 2479-2482
Author(s): Yao Hui Li ◽ Li Xia Wang ◽ Jian Xiong Wang ◽ Jie Yue ◽ Ming Zhan Zhao

The Web has become the largest information source, but noise content is an inevitable part of any web page. Noise content reduces the accuracy of search engines and increases server load. Information extraction technology has been developed to address this, and it is mostly based on page segmentation. After analyzing existing page segmentation methods, an approach to web page information extraction is proposed in which block nodes are identified by analyzing the attributes of HTML tags. The algorithm is easy to implement, and experiments show that it performs well.
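
As a rough illustration of identifying block nodes from HTML tag attributes, the following Python sketch collects the text of each block-level element and flags blocks whose `id` or `class` suggests noise. The tag set `BLOCK_TAGS`, the keyword list `NOISE_HINTS`, and the function `content_blocks` are assumptions made for this example; the abstract does not state the paper's actual rules.

```python
from html.parser import HTMLParser

# Hypothetical tag and keyword lists; the abstract does not give the paper's rules.
BLOCK_TAGS = {"div", "table", "td", "section", "article", "ul"}
NOISE_HINTS = ("nav", "footer", "sidebar", "banner", "advert")


class BlockFinder(HTMLParser):
    """Collect the text of each block node, labelled by its id/class attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open block nodes, innermost last
        self.blocks = []  # finished blocks, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            attr_map = dict(attrs)
            parts = (attr_map.get("id"), attr_map.get("class"))
            label = " ".join(p for p in parts if p)
            self.stack.append({"tag": tag, "label": label.lower(), "text": []})

    def handle_data(self, data):
        # Text belongs to the innermost open block node only.
        if self.stack and data.strip():
            self.stack[-1]["text"].append(data.strip())

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.stack and self.stack[-1]["tag"] == tag:
            block = self.stack.pop()
            block["text"] = " ".join(block["text"])
            block["noise"] = any(hint in block["label"] for hint in NOISE_HINTS)
            self.blocks.append(block)


def content_blocks(html):
    """Return the text of blocks whose attributes do not look like noise."""
    finder = BlockFinder()
    finder.feed(html)
    finder.close()
    return [b["text"] for b in finder.blocks if b["text"] and not b["noise"]]
```

Calling `content_blocks` on a page with a navigation `div`, an article `div`, and a footer `div` would keep only the article text. A real system would likely also weigh link density, text length, and visual cues, which this sketch leaves out.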


2014 ◽ Vol 668-669 ◽ pp. 1198-1201
Author(s): Hong Mei Zhu ◽ Liang Zhang ◽ Wei Sun

In the Semantic Web, extensive reuse of existing large ontologies is one of the central ideas of ontology engineering. Ontology extraction should return a relevant sub-ontology that covers a given sub-vocabulary. Existing ontology extraction algorithms are relatively inefficient when they try to obtain a suitable ontology module from an ontology at run time. This paper proposes an ontology module extraction method. Related concepts and criteria of ontology module extraction are studied; data structures and identification and evaluation methods for ontology module extraction are discussed; preliminary experimental results and the corresponding analysis are also presented.
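
To make the idea of a sub-ontology that covers a sub-vocabulary concrete, here is a minimal Python sketch that extracts the upward closure of a set of concepts in a toy class hierarchy. The dictionary representation `subclass_of` and the function `extract_module` are hypothetical and chosen for illustration; this is not the method proposed in the paper, whose data structures the abstract does not describe.

```python
def extract_module(subclass_of, signature):
    """Return the part of the hierarchy needed to cover `signature`.

    `subclass_of` maps each concept to its direct superclasses. This is a
    hypothetical toy representation; real ontologies carry far richer axioms.
    """
    module = {}
    pending = list(signature)
    while pending:
        concept = pending.pop()
        if concept in module:
            continue  # already visited
        parents = subclass_of.get(concept, [])
        module[concept] = list(parents)
        pending.extend(parents)  # follow superclasses upward
    return module
```

For a hierarchy containing both animal and vehicle concepts, extracting a module for the signature `{"Dog"}` would return only `Dog` and its ancestors, leaving unrelated concepts out.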


2013 ◽ Vol 427-429 ◽ pp. 2489-2492
Author(s): Tian Yu Zhao ◽ Jian Yi Liu ◽ Ru Zhang

Millions of users around the world contribute rich information to microblogs. However, little work has been done so far on microblog web page extraction. We propose a unified structured information extraction method based on hierarchical clustering that is suitable for the web pages of any microblog website. Experimental results on web pages from several popular microblog service providers indicate the high performance of our method.
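
The abstract does not say which features are clustered, so the following Python sketch only illustrates hierarchical (agglomerative) clustering in general. It assumes, purely for the example, that each page element is described by its tag path and that paths are compared with a Jaccard distance; `jaccard_distance`, `cluster_paths`, and the `threshold` parameter are hypothetical names.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two tag paths, compared as sets of path steps."""
    sa, sb = set(a.split("/")), set(b.split("/"))
    return 1.0 - len(sa & sb) / len(sa | sb)


def cluster_paths(paths, threshold=0.5):
    """Average-linkage agglomerative clustering; stop merging above threshold."""
    clusters = [[p] for p in paths]

    def linkage(c1, c2):
        total = sum(jaccard_distance(a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (i, j), dist = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters
```

With this setup, elements that share most of their path (for example, fields of repeated post entries) would tend to fall into one cluster, while page chrome such as footer links would stay separate.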

