Genetic Algorithm Based Feature Selection in a Recognition Scheme Using Adaptive Neuro-Fuzzy Techniques

Author(s):  
Mahua Bhattacharya ◽  
Arpita Das

The problem of feature selection consists of finding a significant subset of the input training and test patterns that captures all the information required to classify a particular pattern. This paper focuses on this problem, which plays a key role in machine learning. Before building a classification model, the goal is to identify and reject the features that degrade the classifier's performance. This is especially important when the available input feature space is very large and an efficient search algorithm is needed to reduce it to a few significant features capable of representing a particular class. The authors describe two approaches for reducing large feature spaces to a small number of effective features, using Genetic Algorithm and fuzzy clustering techniques. Classification of the patterns is then performed using adaptive neuro-fuzzy techniques. The aim of the overall work is to implement a recognition scheme for classifying tumor lesions that appear in the human brain as space-occupying lesions in CT and MR images; a part of that work is presented in this paper. The proposed model indicates a promising direction for adaptation in a changing environment.
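
A minimal sketch of a GA-based feature-subset search of the kind described above, not the authors' exact implementation: a k-NN classifier and a public dataset stand in for the adaptive neuro-fuzzy classifier and the CT/MR features, and cross-validated accuracy serves as the fitness of a binary feature mask.

```python
# Binary-encoded GA for feature selection (illustrative sketch only).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)   # placeholder for the image features
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)  # stand-in for the neuro-fuzzy classifier
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Random initial population of binary feature masks
pop = rng.random((20, n_features)) < 0.5
for generation in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.02       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
```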

Author(s):  
Alok Kumar Shukla ◽  
Pradeep Singh ◽  
Manu Vardhan

The explosion of high-dimensional datasets in scientific repositories has encouraged interdisciplinary research in data mining, pattern recognition and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is to extract informative features for the classification model and to detect malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets carrying similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems and addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects a high-ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search for significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without sacrificing classification accuracy on the reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subsets is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and numbers of instances. The experimental results emphasize that the proposed method achieves a significant reduction of the features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse Large B-Cell Lymphoma (DLBCL). For the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using a support vector machine (SVM).
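
A rough sketch of the two-stage filter-then-wrapper idea. scikit-learn's univariate mutual_info_classif is used here only as a stand-in for the CMIM ranking named in the abstract, and a public dataset replaces the biological data; the wrapper stage would then run a binary GA (for example, a loop like the earlier sketch) over the top-ranked columns, with Naive Bayes cross-validation accuracy as the fitness function.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_wine(return_X_y=True)            # placeholder dataset
k = 8                                        # size of the filtered candidate pool

# Stage 1 (filter): rank features by mutual information with the class label.
ranking = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]
candidates = ranking[:k]

# Stage 2 (wrapper): a binary GA would evaluate candidate masks with NB as fitness.
def nb_fitness(mask):
    cols = candidates[mask.astype(bool)]
    if cols.size == 0:
        return 0.0
    return cross_val_score(GaussianNB(), X[:, cols], y, cv=5).mean()

example_mask = np.array([1, 0, 1, 1, 0, 1, 0, 1])
print("fitness of example subset:", nb_fitness(example_mask))
```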


Author(s):  
J. O. Jooda ◽  
A. O. Oke ◽  
E. O. Omidiora ◽  
O. T. Adedeji

Unimodal biometric systems (UBS) suffer from drawbacks such as noisy data, intra-class variance, inter-class similarities and non-universality, all of which affect the system's classification performance. Intramodal fingerprint fusion can overcome the limitations imposed by a UBS when features are fused at the feature level, and it is a good approach for boosting the performance of a biometric system. However, feature-level fusion leads to a high-dimensional feature space, which can be addressed by Feature Selection (FS). FS is an optimization problem that improves classification performance by selecting only relevant and useful information from the extracted feature sets. Artificial Bee Colony (ABC) is an optimization algorithm that has frequently been used to solve FS problems because of its simple concept, few control parameters, easy implementation and good exploration characteristics. ABC was proposed for optimized feature selection prior to classification in a Fingerprint Intramodal Biometric System (FIBS). Performance evaluation of the ABC-based FIBS showed a sensitivity of 97.69% and a recognition accuracy (RA) of 96.76%. The developed ABC-optimized feature selection reduced the high dimensionality of the feature space prior to the classification task, thereby increasing the sensitivity and recognition accuracy of the FIBS.
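
A highly simplified sketch of an ABC-style search for a feature mask, not the authors' FIBS pipeline: continuous "food sources" are thresholded at 0.5 to obtain feature subsets, a k-NN classifier stands in for the fingerprint matcher, cross-validated accuracy is the nectar value, and the onlooker-bee phase is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X, y = load_breast_cancer(return_X_y=True)   # placeholder for fingerprint features
n_feat, n_bees, limit = X.shape[1], 10, 5

def fitness(source):
    mask = source > 0.5
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=3).mean()

sources = rng.random((n_bees, n_feat))
trials = np.zeros(n_bees)
for _ in range(15):
    for i in range(n_bees):                  # employed-bee phase
        partner, d = rng.integers(n_bees), rng.integers(n_feat)
        cand = sources[i].copy()
        cand[d] += rng.uniform(-1, 1) * (sources[i, d] - sources[partner, d])
        if fitness(cand) > fitness(sources[i]):
            sources[i], trials[i] = cand, 0
        else:
            trials[i] += 1
    for i in np.flatnonzero(trials > limit): # scout-bee phase: abandon exhausted sources
        sources[i], trials[i] = rng.random(n_feat), 0

best = sources[np.argmax([fitness(s) for s in sources])]
print("selected features:", np.flatnonzero(best > 0.5))
```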


2018 ◽  
Vol 7 (2.11) ◽  
pp. 27 ◽  
Author(s):  
Kahkashan Kouser ◽  
Amrita Priyam

One of the open problems of modern data mining is clustering high-dimensional data. For this, the paper proposes a new technique called GA-HDClustering, which works in two steps. First, a GA-based feature selection algorithm is designed to determine the optimal feature subset, consisting of the important features of the entire data set. Next, a K-means algorithm is applied on the optimal feature subset to find the clusters. For comparison, the traditional K-means algorithm is applied on the full-dimensional feature space, and the result of GA-HDClustering is compared with the traditional clustering algorithm using different validity measures, such as the Sum of Squared Error (SSE), Within-Group Average Distance (WGAD), Between-Group Distance (BGD) and the Davies-Bouldin Index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace within the large feature space formed by all dimensions of the data set. Experiments performed on standard data sets revealed that GA-HDClustering is superior to the traditional clustering algorithm.
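
A brief sketch of the comparison described above: K-means on a GA-selected feature subset versus K-means on the full feature space, scored by the sum of squared errors (SSE, exposed as inertia_ in scikit-learn). The subset below is a fixed illustrative mask; in the paper it would come from the GA search, and the dataset here is only a placeholder.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)            # placeholder data set
subset = np.array([0, 2])                    # hypothetical GA-selected columns

full = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
reduced = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X[:, subset])
print("SSE on full feature space:    ", full.inertia_)
print("SSE on selected feature space:", reduced.inertia_)
```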


2012 ◽  
Vol 57 (3) ◽  
pp. 829-835 ◽  
Author(s):  
Z. Głowacz ◽  
J. Kozik

The paper describes a procedure for the automatic selection of symptoms accompanying a break in the coils of a synchronous motor's armature winding. This procedure, called feature selection, chooses from the full set of features describing the problem a subset that best distinguishes between the healthy and damaged states. The amplitudes of the spectral components of the motor current signals are used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Candidate subspaces are chosen with the aid of a genetic algorithm, and their quality is assessed using the Mahalanobis distance measure; the algorithm searches for the subspace for which this distance is greatest. The algorithm is very efficient and, as the research confirmed, leads to good results. The proposed technique has been successfully applied in many other fields of science and technology, including medical diagnostics.
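
A sketch of the fitness idea described above: for a candidate subset of spectral components, score it by the Mahalanobis distance between the healthy-state and damaged-state class means under a pooled covariance. The data here are synthetic stand-ins for the motor-current spectra, and the GA that would maximize this score is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
healthy = rng.normal(0.0, 1.0, size=(50, 20))   # hypothetical spectra, healthy state
damaged = rng.normal(0.3, 1.0, size=(50, 20))   # hypothetical spectra, damaged state

def mahalanobis_separation(cols):
    a, b = healthy[:, cols], damaged[:, cols]
    diff = a.mean(axis=0) - b.mean(axis=0)
    pooled = 0.5 * (np.cov(a, rowvar=False) + np.cov(b, rowvar=False))
    return float(np.sqrt(diff @ np.linalg.inv(pooled) @ diff))

# The GA would maximize this value over candidate subspaces; two examples:
print(mahalanobis_separation([0, 1, 2]))
print(mahalanobis_separation([3, 7, 11, 15]))
```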


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arijit Dey ◽  
Soham Chattopadhyay ◽  
Pawan Kumar Singh ◽  
Ali Ahmadian ◽  
Massimiliano Ferrara ◽  
...  

COVID-19 is a respiratory disease that causes infection in both the lungs and the upper respiratory tract. The World Health Organization (WHO) has declared it a global pandemic because of its rapid spread across the globe. The most common method for COVID-19 diagnosis is real-time reverse transcription polymerase chain reaction (RT-PCR), which takes a significant amount of time to produce a result. Computer-based medical image analysis is beneficial for diagnosing such a disease, as it can give better results in less time. Computed Tomography (CT) scans are used to monitor lung diseases, including COVID-19. In this work, a hybrid model for COVID-19 detection has been developed that has two key stages. In the first stage, we fine-tune the parameters of pre-trained convolutional neural networks (CNNs) to extract features from the COVID-19-affected lungs; two standard CNNs, GoogleNet and ResNet18, are used as the pre-trained networks. We then propose a hybrid meta-heuristic feature selection (FS) algorithm, named the Manta Ray Foraging based Golden Ratio Optimizer (MRFGRO), to select the most significant feature subset. The proposed model is evaluated on three publicly available datasets, namely the COVID-CT, SARS-COV-2 and MOSMED datasets, and attains state-of-the-art classification accuracies of 99.15%, 99.42% and 95.57%, respectively. The obtained results confirm that the proposed approach is quite efficient compared to the local texture descriptors used for COVID-19 detection from chest CT-scan images.
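
A sketch of the feature-extraction stage only: pull a 512-dimensional feature vector per CT slice from an ImageNet-pretrained ResNet18 by replacing its final fully connected layer with an identity mapping. The MRFGRO selection stage and the GoogleNet branch are not reproduced, and the input below is a random tensor standing in for a preprocessed batch of CT images.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")   # downloads ImageNet-pretrained weights
model.fc = nn.Identity()                     # drop the classification head
model.eval()

batch = torch.randn(4, 3, 224, 224)          # stand-in for 4 preprocessed CT slices
with torch.no_grad():
    features = model(batch)                  # shape: (4, 512)
print(features.shape)
```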


Author(s):  
Nor Idayu Mahat ◽  
Maz Jamilah Masnan ◽  
Ali Yeon Md Shakaff ◽  
Ammar Zakaria ◽  
Muhd Khairulzaman Abdul Kadir

This chapter gives an overview of the issue of multicollinearity in electronic nose (e-nose) classification and investigates some analytical solutions to the problem. Multicollinearity may prevent classification analysis from producing good parameter estimates during the construction of the classification rule. The common approach to dealing with multicollinearity is feature extraction. However, the variance-based criterion used to extract the raw features may not be appropriate for the ultimate goal of classification accuracy. Alternatively, a feature selection method is advisable, as it chooses only valuable features. Two distance-based criteria for determining the right features for classification, Wilks' Lambda and a bounded Mahalanobis distance, are applied. Classification with features selected by the bounded Mahalanobis distance performs statistically better than with Wilks' Lambda. The chapter suggests that e-nose classification with feature selection is a good choice for limiting the cost of experiments while maintaining good classification performance.
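
A sketch of the Wilks' Lambda criterion mentioned above: for a candidate set of sensor features, compute det(W)/det(T), the ratio of the within-group scatter to the total scatter; smaller values indicate better class separation. The data are synthetic stand-ins for e-nose sensor responses, and the bounded Mahalanobis variant is not shown.

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(m, 1.0, size=(30, 6)) for m in (0.0, 0.8, 1.6)])
y = np.repeat([0, 1, 2], 30)                 # three hypothetical odor classes

def wilks_lambda(X, y, cols):
    Xs = X[:, cols]
    centered = Xs - Xs.mean(axis=0)
    total = centered.T @ centered            # total scatter matrix T
    within = sum(                            # pooled within-group scatter W
        (Xs[y == c] - Xs[y == c].mean(axis=0)).T @ (Xs[y == c] - Xs[y == c].mean(axis=0))
        for c in np.unique(y)
    )
    return float(np.linalg.det(within) / np.linalg.det(total))

print(wilks_lambda(X, y, [0, 1]))            # candidate subsets; lower is better
print(wilks_lambda(X, y, [2, 3, 4]))
```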


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose: Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.
Design/methodology/approach: A measurement is first developed for measuring and identifying any significant heterogeneity in the feature space of a data set; the main idea of this measurement is derived from meta-analysis. For data sets with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
Findings: The proposed approach has two main advantages over previous methods. The first lies in feature transformation using orthogonal factor analysis, which yields new features free of redundancy and irrelevance. The second rests on partitioning samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.
Research limitations/implications: The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.
Practical implications: Measuring and eliminating any feature space heterogeneity in the data is important for accurate classification. This study provides a systematic approach to measuring and eliminating feature space heterogeneity for better classification performance, which is favorable for applying classification techniques to real-world problems.
Originality/value: A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve classification accuracy.
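
An illustrative sketch of the two ideas in the abstract, not the authors' exact algorithm or measurement: obtain orthogonal factor scores with factor analysis, then partition the samples by clustering those scores and fit a separate classifier per partition; the dataset and classifier are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)   # placeholder data set
scores = FactorAnalysis(n_components=5, random_state=0).fit_transform(X)
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)

# One local model per heterogeneous region of the feature space
models = {}
for g in np.unique(groups):
    idx = groups == g
    if len(np.unique(y[idx])) > 1:           # skip degenerate single-class groups
        models[g] = LogisticRegression(max_iter=5000).fit(scores[idx], y[idx])

print({g: m.score(scores[groups == g], y[groups == g]) for g, m in models.items()})
```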


Information ◽  
2019 ◽  
Vol 10 (6) ◽  
pp. 187
Author(s):  
Rattanawadee Panthong ◽  
Anongnart Srivihok

Liver cancer data typically form large, multidimensional datasets. A dataset with a huge number of features and multiple classes may contain features irrelevant to pattern classification in machine learning; hence, feature selection improves the performance of the classification model and helps achieve maximum classification accuracy. The aims of the present study were to find the best feature subset and to evaluate the classification performance of the predictive model. This paper proposes a hybrid feature selection approach that combines information gain and sequential forward selection based on the class-dependent technique (IGSFS-CD) for a liver cancer classification model. Two different classifiers (decision tree and naïve Bayes) were used to evaluate the feature subsets. The liver cancer datasets were obtained from the Cancer Hospital Thailand database. Three ensemble methods (ensemble classifiers, bagging, and AdaBoost) were applied to improve the classification performance. The IGSFS-CD method provided a good accuracy of 78.36% (sensitivity 0.7841 and specificity 0.9159) on LC_dataset-1, while LC_dataset II delivered the best performance with an accuracy of 84.82% (sensitivity 0.8481 and specificity 0.9437). The IGSFS-CD method achieved better classification performance than the class-independent method. Furthermore, selecting the best feature subset helps reduce the complexity of the predictive model.
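
A generic sketch of an information-gain filter followed by sequential forward selection, evaluated with the two classifiers named in the abstract. It is class-independent and uses a public dataset and scikit-learn's mutual-information estimate in place of information gain, so it only approximates the class-dependent IGSFS-CD procedure applied to the hospital datasets.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # placeholder for the liver cancer data

# Filter stage: keep the 15 columns with the highest estimated information gain.
top = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1][:15]

# Wrapper stage: sequential forward selection down to 5 features per classifier.
for clf in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    sfs = SequentialFeatureSelector(clf, n_features_to_select=5, direction="forward")
    sfs.fit(X[:, top], y)
    chosen = top[sfs.get_support()]
    acc = cross_val_score(clf, X[:, chosen], y, cv=5).mean()
    print(type(clf).__name__, "features:", chosen, "accuracy: %.3f" % acc)
```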


2020 ◽  
Vol 2020 ◽  
pp. 1-14 ◽  
Author(s):  
Yong Liu ◽  
Shenggen Ju ◽  
Junfeng Wang ◽  
Chong Su

A feature selection method selects representative feature subsets from the original feature set through different evaluations of feature relevance, with the aim of reducing the dimensionality of the features while maintaining the predictive accuracy of a classifier. In this study, we propose a feature selection method for text classification based on an independent feature space search. First, a relative document-term frequency difference (RDTFD) method is proposed to divide the features in all text documents into two independent feature sets according to each feature's ability to discriminate between positive and negative samples. This has two important functions: it improves the correlation between features and classes while reducing the correlation among features, and it reduces the search range of the feature space while maintaining appropriate feature redundancy. Second, a feature search strategy is used to find the optimal feature subset in the independent feature spaces, which can improve the performance of text classification. Finally, experiments are conducted on six benchmark corpora; the results show that the RDTFD method based on independent feature space search is more robust than the other feature selection methods.
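
A toy sketch of the feature-space split described above: estimate each term's relative document frequency in the positive and negative classes and assign it to one of two independent feature groups according to which class it favors. The exact RDTFD scoring and the subsequent subset search are not reproduced, and the corpus below is a made-up example.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["great plot and acting", "great fun", "dull plot", "boring and dull acting"]
labels = np.array([1, 1, 0, 0])              # tiny illustrative corpus

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs).toarray()
terms = vec.get_feature_names_out()

pos_df = X[labels == 1].mean(axis=0)         # relative document frequency, positive class
neg_df = X[labels == 0].mean(axis=0)         # relative document frequency, negative class
diff = pos_df - neg_df                       # ties (diff == 0) fall in neither group

positive_set = terms[diff > 0]               # terms that favor the positive class
negative_set = terms[diff < 0]               # terms that favor the negative class
print("positive-leaning:", positive_set)
print("negative-leaning:", negative_set)
```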


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1143
Author(s):  
Zhenwu Wang ◽  
Tielin Wang ◽  
Benting Wan ◽  
Mengjie Han

Multi-label classification (MLC) is a supervised learning problem in which an object is naturally associated with multiple concepts because it can be described from various dimensions. How to exploit the resulting label correlations is the key issue in MLC problems. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels, but it suffers from two obvious drawbacks: (1) the label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all labels are inserted into the chain, although some of them may carry irrelevant information that interferes with the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the correlation between the label and feature spaces and thus solves these two problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, thus eliminating irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure to carry out training and prediction. Experimental results on five metrics demonstrate that, in comparison to eight state-of-the-art MLC algorithms, the proposed method represents a significant improvement in multi-label classification.
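
For context, a minimal example of the standard classifier chain (CC) that PCC-FS builds on, using scikit-learn on synthetic multi-label data. The partial-chain construction and the covariance-based feature selection of PCC-FS are not reproduced; this only illustrates chaining binary classifiers over a label set with a randomly chosen label order.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                       n_classes=5, random_state=0)

# Each label's classifier also sees the predictions of the labels before it in the chain.
chain = ClassifierChain(LogisticRegression(max_iter=2000),
                        order="random", random_state=0)
chain.fit(X, Y)
print("subset accuracy:", (chain.predict(X) == Y).all(axis=1).mean())
```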

