Feature Selection with Rough Sets for Web Page Classification

The world at present revolves around web technology. Every year, the information on the Web grows exponentially, and this information is huge and complex. Web users find it difficult to classify and extract useful information from the Web, because Web information is noisy, redundant, irrelevant, and often misclassified. Many researchers do not have strong knowledge of the web page classification process or of the techniques and methods previously used. The objective of this survey is to give an outline of modern techniques for web page classification. Recent papers in this area are selected and explored. This study will thus help researchers obtain the required knowledge about current trends in web page classification.


Improving Vietnamese Web Page Classification by Combining Hybrid Feature Selection and Label Propagation with Link Information

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Context-Aware Systems and Applications ◽ 10.1007/978-3-642-36642-0_32 ◽ 2013 ◽ pp. 324-334

Author(s): Ngo Van Linh ◽ Nguyen Thi Kim Anh ◽ Cao Manh Dat

Keyword(s): Feature Selection ◽ Label Propagation ◽ Web Page ◽ Web Page Classification ◽ Link Information ◽ Page Classification


An Ant Colony Optimization Based Feature Selection for Web Page Classification

The Scientific World Journal ◽ 10.1155/2014/649260 ◽ 2014 ◽ Vol 2014 ◽ pp. 1-16 ◽ Cited By ~ 22

Author(s): Esra Saraç ◽ Selma Ayşe Özel

Keyword(s): Feature Selection ◽ Ant Colony Optimization ◽ Information Gain ◽ Classification Systems ◽ Ant Colony ◽ Web Pages ◽ Web Page ◽ Web Page Classification ◽ The Web ◽ Page Classification

The increased popularity of the web has added a huge amount of information to it, and as a result of this explosive growth, automated web page classification systems are needed to improve the performance of search engines. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then applied the well-known C4.5, naive Bayes, and k-nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods.
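For readers unfamiliar with the information gain baseline mentioned in this abstract, the following is a minimal sketch of how terms can be ranked by information gain. It is not the authors' ACO algorithm, and it does not use the WebKB or Conference datasets: the toy documents, the labels, and the function names `entropy`, `information_gain`, and `select_top_k` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    gain = entropy(labels)
    for part in (with_term, without_term):
        if part:  # skip empty partitions to avoid dividing by zero
            gain -= len(part) / len(labels) * entropy(part)
    return gain


def select_top_k(docs, labels, k):
    """Return the k terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data: each page is reduced to a set of terms.
docs = [
    {"course", "homework", "syllabus"},
    {"course", "lecture", "exam"},
    {"faculty", "research", "publications"},
    {"faculty", "research", "course"},
]
labels = ["course", "course", "faculty", "faculty"]

top_terms = select_top_k(docs, labels, 2)
print(top_terms)
```

In this toy set, a term such as "course" appears in pages of both classes and therefore scores lower than terms that occur in only one class. A filter of this kind scores each term independently, whereas a search-based method such as the ACO approach described above evaluates subsets of features.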


A genetic algorithm based optimal feature selection for Web page classification

2011 International Symposium on Innovations in Intelligent Systems and Applications ◽ 10.1109/inista.2011.5946076 ◽ 2011 ◽ Cited By ~ 4

Author(s): Selma Ayse Ozel

Keyword(s): Genetic Algorithm ◽ Feature Selection ◽ Web Page ◽ Web Page Classification ◽ Optimal Feature Selection ◽ Selection For ◽ Optimal Feature ◽ Page Classification
