Semi-supervised Graph-based Genre Classification for Web Pages

Effective Genre Classification - Understanding Url And Webpage Attributes For Classification

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1191.0982s1119 ◽

2019 ◽

Vol 8 (2S11) ◽

pp. 2011-2016

Keyword(s):

The Internet ◽

Web Pages ◽

Web Page ◽

Alternative Method ◽

Genre Classification ◽

Internet Application ◽

Reliable Classification ◽

Time Consuming ◽

The Web

With the boom in the number of web pages, it is very hard to find the desired information easily and quickly among the thousands of web pages retrieved by a search engine. There is an increasing need for automatic classification techniques with higher classification accuracy. In some situations it is important to have an efficient and reliable classification of a web page from the information contained in the URL (Uniform Resource Locator) only, without the need to visit the page itself. We want to know whether the URL alone can be used, without having to fetch and inspect the page, for several reasons. Retrieving the web page content and sorting through it to find the genre of the page is very time consuming and requires the user to know the structure of the page to be classified. To avoid this time-consuming process, we propose an alternative method that determines the genre of an entered URL from the URL itself and from the page metadata, i.e., the description and keywords used in the website along with the title of the site. This approach therefore relies not only on the URL but also on content from the web application. The proposed system can be evaluated using several available datasets.
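
The abstract above does not say how the URL and metadata are turned into classifier input. As a minimal sketch only, assuming simple lowercase word tokens and using only the Python standard library, the four sources it names (URL, title, description, keywords) could be collected as follows. The names `MetaExtractor`, `tokenize`, `url_tokens` and `build_features` are hypothetical and are not taken from the paper.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the description/keywords <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def tokenize(text):
    # Assumption: lowercase alphanumeric runs are an adequate token unit.
    return re.findall(r"[a-z0-9]+", text.lower())


def url_tokens(url):
    # Split the host, path and query string into word-like tokens.
    parts = urlparse(url)
    return tokenize(" ".join((parts.netloc, parts.path, parts.query)))


def build_features(url, html):
    # One token list per source, so a classifier can weight them separately.
    parser = MetaExtractor()
    parser.feed(html)
    return {
        "url": url_tokens(url),
        "title": tokenize(parser.title),
        "description": tokenize(parser.meta.get("description", "")),
        "keywords": tokenize(parser.meta.get("keywords", "")),
    }
```

Keeping the four token lists separate, rather than merging them into one bag of words, leaves the choice of weighting or classifier open, since the abstract does not specify either.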

Using Visual Features for Fine-Grained Genre Classification of Web Pages

10.1109/hicss.2008.488 ◽

2008 ◽

Cited By ~ 8

Author(s):

Ryan Levering ◽

Michal Cutler ◽

Lei Yu

Keyword(s):

Visual Features ◽

Web Pages ◽

Fine Grained ◽

Genre Classification

A Multi-label and Adaptive Genre Classification of Web Pages

2012 11th International Conference on Machine Learning and Applications ◽

10.1109/icmla.2012.106 ◽

2012 ◽

Cited By ~ 5

Author(s):

Chaker Jebari ◽

M. Arif Wani

Keyword(s):

Web Pages ◽

Genre Classification

Multi-Label Genre Classification of Web Pages Using an Adaptive Centroid-Based Classifier

Journal of Information & Knowledge Management ◽

10.1142/s0219649216500088 ◽

2016 ◽

Vol 15 (01) ◽

pp. 1650008 ◽

Cited By ~ 2

Author(s):

Chaker Jebari

Keyword(s):

Computational Complexity ◽

Rapid Evolution ◽

Classification Method ◽

Training Dataset ◽

Complex Object ◽

Web Pages ◽

Web Page ◽

Adaptive Classification ◽

Genre Classification

This paper proposes an adaptive centroid-based classifier (ACC) for multi-label classification of web pages. Using a multi-genre training dataset, ACC constructs a centroid for each genre. To deal with the rapid evolution of web genres, ACC implements an adaptive classification method in which web pages are classified one by one. For each web page, ACC calculates its similarity with all genre centroids. Based on this similarity, ACC either adjusts the genre centroid by including the new web page or discards it. A web page is a complex object that contains different sections belonging to different genres. To handle this complexity, ACC implements a multi-label classification in which a web page can be assigned to multiple genres at the same time. To improve the performance of genre classification, we propose to aggregate the classifications produced using character n-grams extracted from the URL, title, headings and anchors. Experiments conducted using a known multi-label dataset show that ACC outperforms many other multi-label classifiers and has the lowest computational complexity.
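
The loop described in this abstract (one centroid per genre, a similarity check per page, then adjust or discard) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the similarity measure, the acceptance rule or the n-gram size, so the cosine similarity, the fixed `threshold` and the trigram default are all assumptions, and the class name `AdaptiveCentroidSketch` is hypothetical. The aggregation over URL, title, headings and anchors is left out; the sketch works on a single text string per page.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    # Bag of overlapping character n-grams, case-folded.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(w * b.get(g, 0) for g, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class AdaptiveCentroidSketch:
    def __init__(self, threshold=0.3, n=3):
        self.threshold = threshold  # assumed acceptance rule
        self.n = n
        # genre -> summed n-gram counts; cosine is scale-invariant, so the
        # sum behaves like the mean centroid for similarity purposes.
        self.centroids = {}

    def _include(self, genre, vec):
        self.centroids.setdefault(genre, Counter()).update(vec)

    def fit(self, pages):
        # pages: iterable of (text, [genre, ...]) with multi-genre labels.
        for text, genres in pages:
            vec = char_ngrams(text, self.n)
            for genre in genres:
                self._include(genre, vec)

    def classify(self, text):
        # Compare the page with every genre centroid.
        vec = char_ngrams(text, self.n)
        sims = {g: cosine(vec, c) for g, c in self.centroids.items()}
        # Multi-label: keep every genre that passes the threshold.
        assigned = sorted(g for g, s in sims.items() if s >= self.threshold)
        # Adaptive step: accepted genres absorb the page, others discard it.
        for genre in assigned:
            self._include(genre, vec)
        return assigned, sims
```

Because each accepted page is folded into its centroid immediately, later pages are compared against an updated centroid, which is the adaptive behaviour the abstract describes. A page that passes the threshold for no genre leaves every centroid unchanged.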
