Effective Feature Set Selection and Centroid Classifier Algorithm for Web Services Discovery

Author(s):  
Venkatachalam K ◽  
Karthikeyan NK

<p>Text preprocessing and document classification plays a vital role in web services discovery. Nearest centroid classifiers were mostly employed in high-dimensional application including genomics. Feature selection is a major problem in all classifiers and in this paper we propose to use an effective feature selection procedure followed by web services discovery through Centroid classifier algorithm. The task here in this problem statement is to effectively assign a document to one or more classes. Besides being simple and robust, the centroid classifier s not effectively used for document classification due to the computational complexity and larger memory requirements. We address these problems through dimensionality reduction and effective feature set selection before training and testing the classifier. Our preliminary experimentation and results shows that the proposed method outperforms other algorithms mentioned in the literature including K-Nearest neighbors, Naive Bayes classifier and Support Vector Machines.</p>

Author(s):  
Nazila Darabi ◽  
Abdalhossein Rezai ◽  
Seyedeh Shahrbanoo Falahieh Hamidpour

Breast cancer is a common cancer in female. Accurate and early detection of breast cancer can play a vital role in treatment. This paper presents and evaluates a thermogram based Computer-Aided Detection (CAD) system for the detection of breast cancer. In this CAD system, the Random Subset Feature Selection (RSFS) algorithm and hybrid of minimum Redundancy Maximum Relevance (mRMR) algorithm and Genetic Algorithm (GA) with RSFS algorithm are utilized for feature selection. In addition, the Support Vector Machine (SVM) and k-Nearest Neighbors (kNN) algorithms are utilized as classifier algorithm. The proposed CAD system is verified using MATLAB 2017 and a dataset that is composed of breast images from 78 patients. The implementation results demonstrate that using RSFS algorithm for feature selection and kNN and SVM algorithms as classifier have accuracy of 85.36% and 75%, and sensitivity of 94.11% and 79.31%, respectively. In addition, using hybrid GA and RSFS algorithm for feature selection and kNN and SVM algorithms as classifier have accuracy of 83.87% and 69.56%, and sensitivity of 96% and 81.81%, respectively, and using hybrid mRMR and RSFS algorithms for feature selection and kNN and SVM algorithms as classifier have accuracy of 77.41% and 73.07%, and sensitivity of 98% and 72.72%, respectively.


Author(s):  
Nida Tariq ◽  
Iqra Ijaz ◽  
Muhammad Kamran Malik ◽  
Zubair Malik ◽  
Faisal Bukhari

Urdu literature has a rich tradition of poetry, with many forms, one of which is Ghazal. Urdu poetry structures are mainly of Arabic origin. It has complex and different sentence structure compared to our daily language which makes it hard to classify. Our research is focused on the identification of poets if given with ghazals as input. Previously, no one has done this type of work. Two main factors which help categorize and classify a given text are the contents and writing style. Urdu poets like Mirza Ghalib, Mir Taqi Mir, Iqbal and many others have a different writing style and the topic of interest. Our model caters these two factors, classify ghazals using different classification models such as SVM (Support Vector Machines), Decision Tree, Random forest, Naïve Bayes and KNN (K-Nearest Neighbors). Furthermore, we have also applied feature selection techniques like chi square model and L1 based feature selection. For experimentation, we have prepared a dataset of about 4000 Ghazals. We have also compared the accuracy of different classifiers and concluded the best results for the collected dataset of Ghazals.


Author(s):  
Hedieh Sajedi ◽  
Mehran Bahador

In this paper, a new approach for segmentation and recognition of Persian handwritten numbers is presented. This method utilizes the framing feature technique in combination with outer profile feature that we named this the adapted framing feature. In our proposed approach, segmentation of the numbers into digits has been carried out automatically. In the classification stage of the proposed method, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN) are used. Experimentations are conducted on the IFHCDB database consisting 17,740 numeral images and HODA database consisting 102,352 numeral images. In isolated digit level on IFHCDB, the recognition rate of 99.27%, is achieved by using SVM with polynomial kernel. Furthermore, in isolated digit level on HODA, the recognition rate of 99.07% is achieved by using SVM with polynomial kernel. The experiments illustrate that applying our proposed method resulted higher accuracy compared to previous researches.


2008 ◽  
Vol 15 (2) ◽  
pp. 203-218
Author(s):  
Luiz E. S. Oliveira ◽  
Paulo R. Cavalin ◽  
Alceu S. Britto Jr ◽  
Alessandro L. Koerich

This paper addresses the issue of detecting defects in Pine wood using features extracted from grayscale images. The feature set proposed here is based on the concept of texture and it is computed from the co-occurrence matrices. The features provide measures of properties such as smoothness, coarseness, and regularity. Comparative experiments using a color image based feature set extracted from percentile histograms are carried to demonstrate the efficiency of the proposed feature set. Two different learning paradigms, neural networks and support vector machines, and a feature selection algorithm based on multi-objective genetic algorithms were considered in our experiments. The experimental results show that after feature selection, the grayscale image based feature set achieves very competitive performance for the problem of wood defect detection relative to the color image based features.


: In this era of Internet, the issue of security of information is at its peak. One of the main threats in this cyber world is phishing attacks which is an email or website fraud method that targets the genuine webpage or an email and hacks it without the consent of the end user. There are various techniques which help to classify whether the website or an email is legitimate or fake. The major contributors in the process of detection of these phishing frauds include the classification algorithms, feature selection techniques or dataset preparation methods and the feature extraction that plays an important role in detection as well as in prevention of these attacks. This Survey Paper studies the effect of all these contributors and the approaches that are applied in the study conducted on the recent papers. Some of the classification algorithms that are implemented includes Decision tree, Random Forest , Support Vector Machines, Logistic Regression , Lazy K Star, Naive Bayes and J48 etc.


Sign in / Sign up

Export Citation Format

Share Document