Enhancing CBIR Performance Using Evolutionary Algorithm-Assisted Significant Feature Selection: A Filter Approach

Author(s):  
Asha Gowda Karegowda ◽  
PT Bharathi
2020 ◽  
pp. 3397-3407
Author(s):  
Nur Syafiqah Mohd Nafis ◽  
Suryanti Awang

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant feature from the sparse feature space. Thus, this paper proposed an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high dimensional text classificationhis technique has the ability to measure the feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of the feature selection. Hence, obtaining a promising text classification accuracy. TF-IDF act as a filter approach which measures features importance of the text documents at the first stage. SVM-RFE utilized a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing processes are applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-rank features will be selected for text classification using the SVM classifier. Based on the experiments, it shows that the proposed technique able to achieve 98% accuracy that outperformed other existing techniques. In conclusion, the proposed technique able to select the significant features in the unstructured and high dimensional text document.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arijit Dey ◽  
Soham Chattopadhyay ◽  
Pawan Kumar Singh ◽  
Ali Ahmadian ◽  
Massimiliano Ferrara ◽  
...  

AbstractCOVID-19 is a respiratory disease that causes infection in both lungs and the upper respiratory tract. The World Health Organization (WHO) has declared it a global pandemic because of its rapid spread across the globe. The most common way for COVID-19 diagnosis is real-time reverse transcription-polymerase chain reaction (RT-PCR) which takes a significant amount of time to get the result. Computer based medical image analysis is more beneficial for the diagnosis of such disease as it can give better results in less time. Computed Tomography (CT) scans are used to monitor lung diseases including COVID-19. In this work, a hybrid model for COVID-19 detection has developed which has two key stages. In the first stage, we have fine-tuned the parameters of the pre-trained convolutional neural networks (CNNs) to extract some features from the COVID-19 affected lungs. As pre-trained CNNs, we have used two standard CNNs namely, GoogleNet and ResNet18. Then, we have proposed a hybrid meta-heuristic feature selection (FS) algorithm, named as Manta Ray Foraging based Golden Ratio Optimizer (MRFGRO) to select the most significant feature subset. The proposed model is implemented over three publicly available datasets, namely, COVID-CT dataset, SARS-COV-2 dataset, and MOSMED dataset, and attains state-of-the-art classification accuracies of 99.15%, 99.42% and 95.57% respectively. Obtained results confirm that the proposed approach is quite efficient when compared to the local texture descriptors used for COVID-19 detection from chest CT-scan images.


Author(s):  
Ch. Sanjeev Kumar Dash ◽  
Ajit Kumar Behera ◽  
Sarat Chandra Nayak

This chapter presents a novel approach for classification of dataset by suitably tuning the parameters of radial basis function networks with an additional cost of feature selection. Inputting optimal and relevant set of features to a radial basis function may greatly enhance the network efficiency (in terms of accuracy) at the same time compact its size. In this chapter, the authors use information gain theory (a kind of filter approach) for reducing the features and differential evolution for tuning center and spread of radial basis functions. Different feature selection methods, handling missing values and removal of inconsistency to improve the classification accuracy of the proposed model are emphasized. The proposed approach is validated with a few benchmarking highly skewed and balanced dataset retrieved from University of California, Irvine (UCI) repository. The experimental study is encouraging to pursue further extensive research in highly skewed data.


Author(s):  
Barak Chizi ◽  
Lior Rokach ◽  
Oded Maimon

Dimensionality (i.e., the number of data set attributes or groups of attributes) constitutes a serious obstacle to the efficiency of most data mining algorithms (Maimon and Last, 2000). The main reason for this is that data mining algorithms are computationally intensive. This obstacle is sometimes known as the “curse of dimensionality” (Bellman, 1961). The objective of Feature Selection is to identify features in the data-set as important, and discard any other feature as irrelevant and redundant information. Since Feature Selection reduces the dimensionality of the data, data mining algorithms can be operated faster and more effectively by using Feature Selection. In some cases, as a result of feature selection, the performance of the data mining method can be improved. The reason for that is mainly a more compact, easily interpreted representation of the target concept. The filter approach (Kohavi , 1995; Kohavi and John ,1996) operates independently of the data mining method employed subsequently -- undesirable features are filtered out of the data before learning begins. These algorithms use heuristics based on general characteristics of the data to evaluate the merit of feature subsets. A sub-category of filter methods that will be refer to as rankers, are methods that employ some criterion to score each feature and provide a ranking. From this ordering, several feature subsets can be chosen by manually setting There are three main approaches for feature selection: wrapper, filter and embedded. The wrapper approach (Kohavi, 1995; Kohavi and John,1996), uses an inducer as a black box along with a statistical re-sampling technique such as cross-validation to select the best feature subset according to some predictive measure. The embedded approach (see for instance Guyon and Elisseeff, 2003) is similar to the wrapper approach in the sense that the features are specifically selected for a certain inducer, but it selects the features in the process of learning.


Author(s):  
Alok Kumar Shukla ◽  
Pradeep Singh ◽  
Manu Vardhan

The explosion of the high-dimensional dataset in the scientific repository has been encouraging interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem of the individual Feature Selection (FS) method is extracting informative features for classification model and to seek for the malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS) for a classification problem and also addresses the limitations of existing methods. In the proposed model, the front-end filter ranking method as Conditional Mutual Information Maximization (CMIM) selects the high ranked feature subset while the succeeding method as Binary Genetic Algorithm (BGA) accelerates the search in identifying the significant feature subsets. One of the merits of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without lancing of classification accuracy on reduced dataset when a learning model is applied to the selected subsets of features. The efficacy of the proposed (FWFS) method is examined by Naive Bayes (NB) classifier which works as a fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of a varied dimensionality and number of instances. The experimental results emphasize that the proposed method provides additional support to the significant reduction of the features and outperforms the existing methods. For microarray data-sets, we found the lowest classification accuracy is 61.24% on SRBCT dataset and highest accuracy is 99.32% on Diffuse large B-cell lymphoma (DLBCL). In UCI datasets, the lowest classification accuracy is 40.04% on the Lymphography using k-nearest neighbor (k-NN) and highest classification accuracy is 99.05% on the ionosphere using support vector machine (SVM).


Sign in / Sign up

Export Citation Format

Share Document