Web Image Classification Using an Optimized Feature Set
Redundant images, which are abundant in World Wide Web pages, need to be removed when pages are transformed or simplified for display on small-screen devices. Classifying the images on a Web page by the uniqueness of their content identifies which ones are removable and allows a simpler representation of the page. For this classification, machine-learning methods can be used to categorize images into two groups: eliminable and non-eliminable. We use two representative learning methods, the Naïve Bayesian classifier and C4.5 decision trees. We propose new features with the expressive power needed to classify Web images, apply image samples to the two classifiers, and analyze the results. In addition, we propose an algorithm that constructs an optimized subset of the whole feature set, containing the most influential features for classification. Using the optimized feature set markedly improves classification accuracy.
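
The idea of building a feature subset around a classifier can be illustrated with a minimal sketch. The abstract does not specify the proposed algorithm, so the code below is not that algorithm: it swaps in plain greedy forward selection, scored by leave-one-out accuracy of a small categorical Naïve Bayes classifier with Laplace smoothing. The feature names (`is_small`, `noise`), the toy samples, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

def train_nb(rows, labels, features):
    """Count class and per-feature value frequencies for categorical Naive Bayes."""
    class_counts = defaultdict(int)
    value_counts = defaultdict(int)   # (feature, value, label) -> count
    values = defaultdict(set)         # feature -> values seen in training
    for row, y in zip(rows, labels):
        class_counts[y] += 1
        for f in features:
            value_counts[(f, row[f], y)] += 1
            values[f].add(row[f])
    return class_counts, value_counts, values

def predict_nb(model, row, features):
    """Return the label with the highest log-posterior (Laplace-smoothed)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y in sorted(class_counts):
        n_y = class_counts[y]
        score = math.log(n_y / total)
        for f in features:
            count = value_counts.get((f, row[f], y), 0)
            score += math.log((count + 1) / (n_y + len(values[f])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label

def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of Naive Bayes restricted to `features`."""
    hits = 0
    for i in range(len(rows)):
        model = train_nb(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:], features)
        hits += predict_nb(model, rows[i], features) == labels[i]
    return hits / len(rows)

def forward_select(rows, labels, candidates):
    """Greedy forward selection: add the best feature while accuracy strictly improves."""
    selected, best = [], loo_accuracy(rows, labels, [])
    remaining = list(candidates)
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best

# Hypothetical toy samples: `is_small` separates the classes, `noise` does not.
ROWS = [{"is_small": i < 4, "noise": "ab"[i % 2]} for i in range(8)]
LABELS = ["eliminable"] * 4 + ["non-eliminable"] * 4

selected, accuracy = forward_select(ROWS, LABELS, ["is_small", "noise"])
print(selected, accuracy)
```

A wrapper approach like this ties the chosen subset to the classifier used for scoring, so running the same search with a decision tree could yield a different subset; leave-one-out scoring is used here only because the toy sample is tiny.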