Classifiers fusion for improved vessel recognition with application in quantification of generalized arteriolar narrowing

This paper estimates a diagnostically relevant measure, the Arteriovenous Ratio, using improved retinal vessel classification based on feature ranking strategies and a decision-combination scheme over multiple classifiers. The features used for retinal vessel characterization are based on statistical measures of the histogram, different filter responses of the images, and local gradient information. Feature selection uses two feature ranking approaches (the Pearson Correlation Coefficient technique and the Relief-F method) to rank the features, then takes the maximum classification accuracy of three supervised classifiers (k-Nearest Neighbor, Support Vector Machine and Naïve Bayes) as the threshold for selecting the feature subset. Retinal vessels are labeled using the selected feature subset and the proposed hybrid classification scheme, i.e., decision fusion of multiple classifiers. The comparative analysis shows an increase in vessel classification accuracy as well as in Arteriovenous Ratio calculation performance. The system is tested on three databases: a local dataset of 44 images and two publicly available databases, INSPIRE-AVR containing 40 images and VICAVR containing 58 images. The local database also contains images with pathological structures. The performance of the proposed system is assessed by comparing the experimental results with the gold standard estimations as well as with the results of previous methodologies. Overall, accuracies of 90.45%, 93.90% and 87.82% are achieved in retinal blood vessel separation, with mean errors of 0.0565, 0.0650 and 0.0849 in Arteriovenous Ratio calculation, for the Local, INSPIRE-AVR and VICAVR datasets, respectively.
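The abstract does not say which rule is used to fuse the classifier decisions. As a minimal sketch, assuming simple majority voting over the three classifiers' labels (the function name, the tie-breaking rule, and the sample labels are illustrative, not taken from the paper), decision fusion could look like this:

```python
from collections import Counter


def fuse_by_majority(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length label lists, one list per classifier.
    Ties are broken in favour of the earliest classifier in the list
    (an assumption; the paper's fusion rule is not given in the abstract).
    """
    fused = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # first vote, in classifier order, whose label reaches the top count
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


# Hypothetical outputs of three classifiers for four vessel segments
knn = ["artery", "vein", "artery", "vein"]
svm = ["artery", "artery", "vein", "vein"]
nb = ["vein", "artery", "artery", "vein"]
labels = fuse_by_majority([knn, svm, nb])
```

Each fused label is the one most classifiers agree on, so a single classifier's error on a segment is outvoted when the other two are correct.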


A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500202 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950020 ◽

Cited By ~ 13

Author(s):

Alok Kumar Shukla ◽

Pradeep Singh ◽

Manu Vardhan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification Accuracy ◽

B Cell Lymphoma ◽

Feature Subset Selection ◽

Classification Model ◽

Significant Feature ◽

Support Vector ◽

Feature Subset ◽

Binary Genetic Algorithm

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is extracting informative features for a classification model and identifying malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems that addresses these limitations of existing methods. In the proposed model, a front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without loss of classification accuracy on the reduced dataset when a learning model is applied to the selected feature subsets. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subset is evaluated using several classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results indicate that the proposed method significantly reduces the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse large B-cell lymphoma (DLBCL). For the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using support vector machine (SVM).
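The wrapper stage of such a filter-wrapper scheme can be sketched as a small binary genetic algorithm over 0/1 feature masks. This is only an illustration under stated assumptions: the operators (binary tournament selection, single-point crossover, bit-flip mutation, elitism), the parameter values, and the toy fitness function are choices made here, whereas the paper uses Naive Bayes accuracy as the fitness and CMIM as the filter.

```python
import random


def binary_ga(n_features, fitness, pop_size=20, generations=30,
              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes fitness(mask).

    Requires n_features >= 2 because of the single-point crossover.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # binary tournament selection from the current population
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Hypothetical filter scores standing in for a CMIM ranking
relevance = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]


def toy_fitness(mask):
    # reward relevant features, penalize subset size (illustrative only)
    return sum(r for r, m in zip(relevance, mask) if m) - 0.3 * sum(mask)


mask = binary_ga(len(relevance), toy_fitness)
```

In a real pipeline, `toy_fitness` would be replaced by the cross-validated accuracy of a classifier trained on the masked columns, which is where most of the computational cost lies.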


A Hybrid Feature Selection Method for Improve the Accuracy of Medical Classification Process

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9624.1111121 ◽

2021 ◽

Vol 11 (1) ◽

pp. 50-55

Author(s):

Maria Mohammad Yousef

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Dimensionality Reduction ◽

Classification Accuracy ◽

Fitness Function ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

High Dimensionality ◽

Support Vector ◽

Feature Subset

Medical dataset classification has become one of the biggest problems in data mining research. Every database has a given number of features, but some of these features can be redundant or even harmful and can disrupt the classification process; this is known as the high dimensionality problem. Dimensionality reduction during data preprocessing is critical for increasing the performance of machine learning algorithms. In addition, feature subset selection, as a form of dimensionality reduction, gives a significant improvement in classification accuracy. In this paper, we propose a new hybrid feature selection approach, based on a Genetic Algorithm (GA) assisted by k-Nearest Neighbor (kNN), to deal with high dimensionality in biomedical data classification. The proposed method first combines GA and kNN for feature selection to find the optimal subset of features, with the classification accuracy of kNN used as the fitness function for the GA. After the best subset of features is selected, a Support Vector Machine (SVM) is used as the classifier. The proposed method is evaluated on five medical datasets from the UCI Machine Learning Repository. The suggested technique performs well on these datasets, achieving higher classification accuracy while using fewer features.
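A fitness function of the kind described, kNN accuracy on a candidate feature subset, might be sketched as follows. The leave-one-out evaluation, the value of k, the Euclidean distance, and the toy data are all assumptions made for illustration; the abstract does not state how the accuracy is estimated.

```python
import math
from collections import Counter


def knn_loo_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only features where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        # distances from sample i to every other sample on the selected columns
        dists = sorted(
            (math.dist([xi[j] for j in cols], [xo[j] for j in cols]), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


# Toy data: feature 0 separates the classes, feature 1 is noise
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0],
     [1.0, 5.1], [1.1, 0.9], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]
acc_informative = knn_loo_accuracy(X, y, [1, 0])
acc_noise = knn_loo_accuracy(X, y, [0, 1])
```

A genetic algorithm would call this function once per candidate mask, so masks that keep the informative feature score higher and are more likely to survive selection.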


Sentiment Analysis Using Hybrid Feature Selection Techniques

UHD Journal of Science and Technology ◽

10.21928/uhdjst.v4n1y2020.pp29-40 ◽

2020 ◽

Vol 4 (1) ◽

pp. 29

Author(s):

Sasan Sarbast Abdulkhaliq ◽

Aso Mohammad Darwesh

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset

Nowadays, people all over the world use social media and social networks to express their feelings about different topics. One of the most popular social media platforms is Twitter, a microblogging website that lets its users publicly share their views and feelings about products, services, events, etc. This makes Twitter one of the most valuable sources for researchers and developers who collect and analyze data to reveal people's sentiment about different topics, such as the products and services of commercial companies and well-known people such as politicians and athletes, by classifying those sentiments as positive or negative. Sentiment classification can be automated using machine learning algorithms and enhanced using appropriate feature selection methods. We collected the most recent tweets about Amazon, Trump, Chelsea FC and CR7 using the Twitter Application Programming Interface and assigned sentiment scores using a lexicon rule-based approach. We then proposed a machine learning model that improves classification accuracy through a hybrid feature selection method, namely the filter-based Chi-square (Chi-2) method plus wrapper-based binary coordinate ascent (Chi-2 + BCA), to select an optimal subset of features from term frequency-inverse document frequency (TF-IDF) features for classification with a support vector machine (SVM), and from bag-of-words features for logistic regression (LR) classifiers, using different n-gram ranges. Compared with Chi-2-selected features alone, and with the classifiers without feature subset selection, the hybrid Chi-2 + BCA method increases classification accuracy in all cases. The maximum accuracy attained with LR is 86.55% using the (1 + 2 + 3-g) range, and with SVM is 85.575% using the unigram range, both on the CR7 dataset.
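The filter stage can be illustrated with the standard chi-square statistic of a 2x2 contingency table (term present or absent versus positive or negative class). This is a sketch under assumptions: the abstract does not say how Chi-2 is computed on the TF-IDF or bag-of-words features, and the function name and toy tweets below are hypothetical.

```python
def chi2_term_score(docs, labels, term):
    """Chi-square statistic for term presence vs. a binary class label.

    docs: list of token sets; labels: list of 0/1 class labels.
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if present and label == 1:
            a += 1  # term present, positive
        elif present:
            b += 1  # term present, negative
        elif label == 1:
            c += 1  # term absent, positive
        else:
            d += 1  # term absent, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical tokenized tweets with lexicon-assigned labels (1 = positive)
docs = [{"great", "phone"}, {"great", "service"},
        {"bad", "phone"}, {"bad", "service"}]
labels = [1, 1, 0, 0]
score_great = chi2_term_score(docs, labels, "great")
score_phone = chi2_term_score(docs, labels, "phone")
```

Terms whose presence is independent of the class score zero and would be dropped first; a wrapper search would then refine the surviving terms.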


A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. The algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm whose position update formula has been modified to achieve better results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to find an optimal feature subset. MKMDIGWO therefore combines the efficiency of the filter model with the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that MKMDIGWO achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.
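The Kendall coefficient used by the filter model can be computed directly from concordant and discordant pairs. The sketch below shows only the plain tau-a variant as an assumption; how MKMDIGWO combines it with Euclidean distance is not described in the abstract and is not reproduced here.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# A feature that rises with the target is perfectly concordant
tau_up = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
tau_down = kendall_tau([1, 2, 3, 4], [40, 30, 20, 10])
```

Because it depends only on rank order, the coefficient is unaffected by monotone rescaling of a feature, which is one reason rank correlations are attractive in filter models.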


A novel grey‐based feature ranking method for feature subset selection

Journal of the Chinese Institute of Engineers ◽

10.1080/02533839.2008.9671405 ◽

2008 ◽

Vol 31 (3) ◽

pp. 509-514

Author(s):

Chi‐Chun Huang ◽

Hsin‐Yun Chang ◽

Cheng‐Hong Yang

Keyword(s):

Subset Selection ◽

Feature Subset Selection ◽

Ranking Method ◽

Feature Ranking ◽

Feature Subset


Feature subset selection for support vector machines by incremental regularized risk minimization

2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) ◽

10.1109/ijcnn.2004.1380930 ◽

2005 ◽

Cited By ~ 5

Author(s):

H. Frohlich ◽

A. Zell

Keyword(s):

Support Vector Machines ◽

Subset Selection ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset ◽

Risk Minimization ◽

Vector Machines ◽

Selection For ◽

Regularized Risk Minimization


Improved Intrusion Detection Algorithm based on TLBO and GA Algorithms

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/2/5 ◽

2021 ◽

Vol 18 (2) ◽

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Optimization Algorithm ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Feature Subset ◽

Teaching Learning Based Optimization ◽

Teaching Learning

Optimization algorithms are widely used for intrusion identification. This is attributable to the increasing number of audit data features and the decreasing performance of human-based smart Intrusion Detection Systems (IDS) in terms of classification accuracy and training time. In this paper, an improved intrusion detection method for binary classification is presented and discussed in detail. The proposed method combines the New Teaching-Learning-Based Optimization algorithm (NTLBO), used for feature selection and weighting, with supervised machine learning techniques, namely Support Vector Machine (SVM), Extreme Learning Machine (ELM) and Logistic Regression (LR), for Feature Subset Selection (FSS). Selecting the smallest number of features without affecting the accuracy of the result is treated in FSS as a multi-objective optimization problem. NTLBO is proposed in this paper as an FSS mechanism; its algorithm-specific, parameter-less concept (which requires no parameter tuning during optimization) is explored. The experiments were performed on prominent intrusion detection datasets (KDDCUP'99 and CICIDS 2017), where significant improvements were observed with the suggested NTLBO algorithm compared to the classical Teaching-Learning-Based Optimization algorithm (TLBO). NTLBO presented better results than TLBO and many existing works. The results showed that NTLBO reached 100% accuracy on the KDDCUP'99 dataset and 97% on the CICIDS dataset.
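To show what "parameter-less" means in practice, here is a minimal sketch of the classical TLBO loop (teacher phase followed by learner phase) on a continuous toy objective. It is not the NTLBO variant, whose modifications the abstract does not describe, and it does not perform feature selection; the function name, bounds handling, and sphere objective are assumptions for illustration.

```python
import random


def tlbo_minimize(objective, dim, bounds, pop_size=10, iterations=50, seed=0):
    """Classical TLBO: only population size and iteration count are set."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return min(hi, max(lo, v))

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iterations):
        # Teacher phase: shift learners using the best solution and the mean
        teacher = min(pop, key=objective)
        mean = [sum(p[j] for p in pop) / pop_size for j in range(dim)]
        for i, learner in enumerate(pop):
            tf = rng.randint(1, 2)  # teaching factor, 1 or 2
            cand = [clip(learner[j] + rng.random() * (teacher[j] - tf * mean[j]))
                    for j in range(dim)]
            if objective(cand) < objective(learner):  # greedy acceptance
                pop[i] = cand
        # Learner phase: each learner interacts with one random peer
        for i in range(pop_size):
            k = rng.choice([j for j in range(pop_size) if j != i])
            a, b = pop[i], pop[k]
            # move away from a worse peer, toward a better one
            sign = 1 if objective(a) < objective(b) else -1
            cand = [clip(a[j] + rng.random() * sign * (a[j] - b[j]))
                    for j in range(dim)]
            if objective(cand) < objective(a):
                pop[i] = cand
    return min(pop, key=objective)


def sphere(x):
    return sum(v * v for v in x)


best = tlbo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

Unlike a genetic algorithm, there is no crossover or mutation rate to tune here. For feature selection, the continuous positions would need to be mapped to 0/1 masks, for example by thresholding, which is a further assumption.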


Dimensionality Reduction with Unsupervised Feature Selection and Applying Non-Euclidean Norms for Classification Accuracy

Exploring Advances in Interdisciplinary Data Mining and Analytics ◽

10.4018/978-1-61350-474-1.ch006 ◽

2011 ◽

pp. 91-109

Author(s):

Amit Saxena ◽

John Wang

Keyword(s):

Classification Accuracy ◽

Nearest Neighbor ◽

Fitness Function ◽

Synthetic Data ◽

Feature Subset Selection ◽

Second Phase ◽

Data Sets ◽

Feature Subset ◽

K Nearest Neighbor ◽

Two Phase

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and then tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach is applied to select a subset of features: GA stochastically selects a reduced number of features with the Sammon Error as the fitness function, yielding different subsets of features. In the second phase, each reduced feature set is used to test the CA of the dataset. The CA of a data set is validated using the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real data sets and one synthetic data set. The investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
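The Minkowski family of distances makes the role of the norm concrete: changing the order p can change which neighbor is nearest. The following is a small sketch with made-up points; the helper names are illustrative and not taken from the chapter.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 Manhattan, p=2 Euclidean)."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def nearest(query, points, p):
    """Index of the point closest to query under the order-p distance."""
    return min(range(len(points)),
               key=lambda i: minkowski(query, points[i], p))


# Two hypothetical training points and a query at the origin
points = [(3.0, 3.0), (0.0, 4.0)]
query = (0.0, 0.0)
nearest_l1 = nearest(query, points, 1)  # Manhattan norm
nearest_l3 = nearest(query, points, 3)  # a non-Euclidean norm
```

Under the L1 norm the point (0, 4) is closer (distance 4 versus 6), while for p = 3 the point (3, 3) is closer, so a 1-NN classifier could assign a different label depending on the norm.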


Optimized Feature Subset Selection and Relevance Feedback for Image Retrieval Based on Multiresolution Enhanced Orthogonal Polynomials Model

International Journal of Applied Evolutionary Computation ◽

10.4018/ijaec.2015040102 ◽

2015 ◽

Vol 6 (2) ◽

pp. 25-40

Author(s):

S. Sathiya Devi

Keyword(s):

Image Retrieval ◽

Orthogonal Polynomials ◽

Relevance Feedback ◽

Nearest Neighbor ◽

Image Features ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset ◽

K Nearest Neighbor ◽

Multi Objective Genetic Algorithm

In this paper, a simple image retrieval method incorporating relevance feedback based on the multiresolution enhanced orthogonal polynomials model is proposed. In the proposed method, low-level image features such as texture, shape and color are extracted from the reordered orthogonal polynomials model coefficients and linearly combined to form a multifeature set. The dimensionality of the multifeature set is then reduced using a multi-objective Genetic Algorithm (GA) and a multiclass binary Support Vector Machine (SVM). The resulting optimized multifeature set is used for image retrieval. To improve retrieval accuracy and bridge the semantic gap, a correlation-based k-Nearest Neighbor (k-NN) method for relevance feedback is also proposed. In this method, a relevance score is computed for each image in the database from the relevant and non-relevant sets chosen by the user. The experiments are carried out with Corel and Caltech database images and the retrieval rates are computed. The proposed method with correlation-based k-NN for relevance feedback gives an average retrieval rate of 94.67%.


On-line Signature Verification Based on GA-SVM

International Journal of Online Engineering (iJOE) ◽

10.3991/ijoe.v11i6.5122 ◽

2015 ◽

Vol 11 (6) ◽

pp. 49 ◽

Cited By ~ 1

Author(s):

Dong Huang ◽

Jian Gao

Keyword(s):

Genetic Algorithm ◽

Feature Subset Selection ◽

Signature Verification ◽

Support Vector ◽

Svm Classifier ◽

Support Vector Data Description ◽

Feature Subset ◽

Dynamic Features ◽

On Line ◽

One Class Classifier

With the development of pen-based mobile devices, on-line signature verification is gradually becoming an important form of biometric verification. This thesis proposes a method for verifying on-line handwritten signatures using both Support Vector Data Description (SVM) and a Genetic Algorithm (GA). A 27-parameter feature set including shape and dynamic features is extracted from the on-line signature data. The genuine signatures of each subject are treated as target data to train the SVM classifier. As a kernel-based one-class classifier, SVM can accurately describe the feature distribution of the genuine signatures and detect forgeries. To improve the performance of the authentication method, the GA is used to optimise the classifier parameters and the feature subset selection. Signature data from the SVC2013 database is used to carry out verification experiments. The proposed method achieves an average Equal Error Rate (EER) of 4.93% on the skilled forgery database.
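The Equal Error Rate reported above is the operating point where the false acceptance rate (FAR) equals the false rejection rate (FRR). As a rough sketch, assuming similarity scores where higher means "more likely genuine" and approximating the EER at the observed threshold where the two rates are closest (the function name and scores are hypothetical), it can be estimated like this:

```python
def equal_error_rate(genuine, forgery):
    """Approximate EER from similarity scores (higher = more likely genuine).

    Sweeps every observed score as an acceptance threshold and returns the
    mean of FAR and FRR at the threshold where the two rates are closest.
    """
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(forgery)):
        frr = sum(s < t for s in genuine) / len(genuine)   # genuine rejected
        far = sum(s >= t for s in forgery) / len(forgery)  # forgeries accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical classifier scores for one subject
genuine_scores = [0.4, 0.6, 0.8, 0.9]
forgery_scores = [0.1, 0.3, 0.5, 0.7]
eer = equal_error_rate(genuine_scores, forgery_scores)
```

With few scores the FAR and FRR curves are coarse step functions, so this estimate is only as fine as the number of signatures allows; larger test sets give a smoother crossing point.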
