Knowledge Based Deep Inception Model for Web Page Classification

Author(s):  
Amit Gupta, Rajesh Bhatia

Web page classification is decisive for information retrieval and management tasks and plays an imperative role in natural language processing (NLP) problems in web engineering. Traditional machine learning algorithms extract handcrafted features from web pages, whereas deep learning algorithms learn features automatically as the network goes deeper. Pre-trained models such as BERT attain remarkable performance on text classification and continue to show state-of-the-art results. Knowledge graphs can provide rich, structured factual information for better language modelling and representation. In this study, we propose an ensemble Knowledge Based Deep Inception (KBDI) approach for web page classification that learns bidirectional contextual representations using pre-trained BERT fused with knowledge graph embeddings, and fine-tunes on the target task with a deep Inception network exploiting parallel multi-scale semantics. The proposed ensemble evaluates the efficacy of fusing domain-specific knowledge embeddings with the pre-trained BERT model. Experimental results show that the proposed BERT-fused KBDI model outperforms benchmark baselines and achieves better performance than other conventional approaches on web page classification datasets.
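The abstract does not give implementation details, but the architecture it describes can be sketched. Below is a minimal PyTorch sketch, not the authors' code: BERT token embeddings are fused with knowledge-graph entity embeddings by concatenation (the fusion scheme, layer sizes, and class count are all assumptions), then an inception-style block of parallel 1-D convolutions with different kernel sizes captures the "parallel multi-scale semantics" the abstract mentions.

```python
# A minimal sketch (assumed, not the authors' implementation) of the KBDI idea:
# fuse pre-trained BERT embeddings with knowledge-graph entity embeddings, then
# apply parallel multi-scale convolutions (an inception-style block) to classify.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1-D convolutions with different kernel sizes (multi-scale semantics)."""
    def __init__(self, in_dim, out_dim, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_dim, out_dim, k, padding=k // 2) for k in kernel_sizes
        ])

    def forward(self, x):                          # x: (batch, in_dim, seq_len)
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

class KBDIClassifier(nn.Module):
    def __init__(self, bert_dim=768, kg_dim=100, hidden=128, num_classes=4):
        super().__init__()
        fused = bert_dim + kg_dim                  # simple concatenation fusion
        self.inception = InceptionBlock(fused, hidden)
        self.classifier = nn.Linear(hidden * 3, num_classes)

    def forward(self, bert_emb, kg_emb):
        # bert_emb: (batch, seq_len, bert_dim) contextual token embeddings
        # kg_emb:   (batch, seq_len, kg_dim) entity embeddings aligned to tokens
        x = torch.cat([bert_emb, kg_emb], dim=-1).transpose(1, 2)
        x = self.inception(x)                      # (batch, hidden*3, seq_len)
        x = x.max(dim=2).values                    # global max pooling over time
        return self.classifier(x)

# Random tensors stand in for real BERT outputs and KG-embedding lookups.
model = KBDIClassifier()
logits = model(torch.randn(2, 64, 768), torch.randn(2, 64, 100))
print(logits.shape)  # torch.Size([2, 4])
```

In this sketch the BERT encoder stays frozen and only the fusion and inception layers are trained; whether the paper fine-tunes BERT end-to-end is not stated in the abstract.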

Author(s):  
Tao Peng, Lu Liu

The ever-growing amount of information on the Web makes it difficult to retrieve domain-specific information, owing to the huge number of data sources and to keywords that carry few features. Anchor texts, which contain a few features of a specific topic, play an important role in domain-specific information retrieval, especially in web page classification. However, the features contained in anchor texts alone are not informative enough. This paper presents a novel incremental method for web page classification enhanced by link contexts and clustering. Directly feeding the vector of an anchor text to a classifier may not yield good results because of the limited number of features. Link context is therefore used first to obtain the contextual information surrounding the anchor text. Then, a hierarchical clustering method is introduced to cluster feature vectors and content units, which increases the length of the feature vector belonging to a specific class. Finally, an incremental SVM is proposed to obtain the final classifier and to increase its accuracy and efficiency. Experimental results show that the proposed method outperforms a conventional topical web crawler in harvest rate and target recall.
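The incremental-SVM step can be approximated in a few lines. The sketch below is not the paper's implementation: it uses scikit-learn's SGDClassifier with hinge loss, a standard stand-in for an online linear SVM that supports partial_fit, so the classifier can be updated as new link-context vectors arrive from the crawler. The hashing vectorizer, the toy texts, and the class names are assumptions; the hierarchical-clustering step is omitted.

```python
# A minimal sketch of incremental SVM training on anchor-text/link-context
# features; SGDClassifier(loss="hinge") approximates a linear SVM and can be
# updated batch by batch via partial_fit, matching the incremental setting.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)   # stateless, suits streaming use
clf = SGDClassifier(loss="hinge")                  # hinge loss ~ linear SVM
classes = ["relevant", "irrelevant"]               # hypothetical class labels

# First batch: anchor texts enriched with their surrounding link context.
batch1_texts = ["cheap flights booking deals", "university physics department"]
batch1_labels = ["relevant", "irrelevant"]
clf.partial_fit(vectorizer.transform(batch1_texts), batch1_labels, classes=classes)

# A later batch from the crawler: update the model without retraining from scratch.
batch2_texts = ["discount airfare tickets"]
clf.partial_fit(vectorizer.transform(batch2_texts), ["relevant"])

print(clf.predict(vectorizer.transform(["book low cost flights"])))
```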


A focused crawler collects domain-specific web pages from the Internet. However, the performance of a focused web crawler depends on the multidimensional nature of web pages. This paper presents a comprehensive analysis of recent web page classifiers for focused crawlers and explores the impact of web-based features in combination with these classifiers. It also evaluates the performance of classification techniques such as Support Vector Machine, Naive Bayes, Linear Regression, and Random Forest on web page classification, and examines the impact of individual web features, i.e., anchor text, page content, and links. Finally, the paper yields interesting results about the collective effect of web features and classification techniques in classifying web pages into relevant and irrelevant classes.
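The comparison the abstract describes follows a standard scikit-learn pattern, sketched below for a single web feature (anchor text); page content and link features would reuse the same pipeline with different input text. The toy data is invented for illustration, and LogisticRegression stands in for the "Linear Regression" the abstract names, since it is the usual linear model for classification.

```python
# A minimal sketch (assumed, not from the paper) comparing the named classifiers
# on one web feature via TF-IDF + cross-validated accuracy.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in data: anchor texts labeled relevant (1) / irrelevant (0).
texts = ["buy cheap flights", "flight deals online", "campus library hours",
         "physics lecture notes", "discount airfare", "student housing info"]
labels = [1, 1, 0, 0, 1, 0]

models = {
    "SVM": LinearSVC(),
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100),
}

for name, model in models.items():
    pipe = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipe, texts, labels, cv=3)   # 3-fold accuracy
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```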

