A Comparative Study of Two State-of-the-Art Feature Selection Algorithms for Texture-Based Pixel-Labeling Task of Ancient Documents

Recently, texture features have been widely used for historical document image analysis. However, few studies have focused exclusively on feature selection algorithms for historical document image analysis. Indeed, an important need has emerged to use a feature selection algorithm in data mining and machine learning tasks, since it helps to reduce the data dimensionality and to increase the algorithm performance such as a pixel classification algorithm. Therefore, in this paper we propose a comparative study of two conventional feature selection algorithms, genetic algorithm and ReliefF algorithm, using a classical pixel-labeling scheme based on analyzing and selecting texture features. The two assessed feature selection algorithms in this study have been applied on a training set of the HBR dataset in order to deduce the most selected texture features of each analyzed texture-based feature set. The evaluated feature sets in this study consist of numerous state-of-the-art texture features (Tamura, local binary patterns, gray-level run-length matrix, auto-correlation function, gray-level co-occurrence matrix, Gabor filters, Three-level Haar wavelet transform, three-level wavelet transform using 3-tap Daubechies filter and three-level wavelet transform using 4-tap Daubechies filter). In our experiments, a public corpus of historical document images provided in the context of the historical book recognition contest (HBR2013 dataset: PRImA, Salford, UK) has been used. Qualitative and numerical experiments are given in this study in order to provide a set of comprehensive guidelines on the strengths and the weaknesses of each assessed feature selection algorithm according to the used texture feature set.

Download Full-text

An Enhancement of Feature Selection Algorithm for EDM: A Review

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i5.661 ◽

2018 ◽

Vol 8 (5) ◽

pp. 29

Author(s):

Manpreet Kaur ◽

Chamkaur Singh

Keyword(s):

Feature Selection ◽

Educational Data Mining ◽

Problem Formulation ◽

Research Area ◽

Education Quality ◽

Educational Institutions ◽

Selection Algorithm ◽

Positive Role ◽

Data Set ◽

Selection Algorithms

Educational Data Mining (EDM) is an emerging research area help the educational institutions to improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increases the performance of classifiers used in EDM techniques. This paper present an analysis of the performance of feature selection algorithms on student data set. .In this papers the different problems that are defined in problem formulation. All these problems are resolved in future. Furthermore the paper is an attempt of playing a positive role in the improvement of education quality, as well as guides new researchers in making academic intervention.

Download Full-text

A NOVEL FEATURE SELECTION ALGORITHM WITH SUPERVISED MUTUAL INFORMATION FOR CLASSIFICATION

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013500279 ◽

2013 ◽

Vol 22 (04) ◽

pp. 1350027

Author(s):

JAGANATHAN PALANICHAMY ◽

KUPPUCHAMY RAMASAMY

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Selection ◽

Mutual Information ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class A ◽

Selection Algorithms ◽

The Relationship ◽

Class Variable

Feature selection is essential in data mining and pattern recognition, especially for database classification. During past years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevancy and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on maximum relevance and minimum redundancy criterion. The mutual information is used to measure the relevancy of each feature with class variable and calculate the redundancy by utilizing the relationship between candidate features, selected features and class variables. The effectiveness is tested with ten benchmarked datasets available in UCI Machine Learning Repository. The experimental results show better performance when compared with some existing algorithms.

Download Full-text

A novel effective feature selection algorithm based on S-PCA and wavelet transform features in EEG signal classification

2011 IEEE 3rd International Conference on Communication Software and Networks ◽

10.1109/iccsn.2011.6014686 ◽

2011 ◽

Cited By ~ 7

Author(s):

Saadat Nasehi ◽

Hossein Pourghassem

Keyword(s):

Feature Selection ◽

Wavelet Transform ◽

Signal Classification ◽

Eeg Signal ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Eeg Signal Classification

Download Full-text

Performance Evaluation of Feature Selection Algorithms Applied to Online Learning in Concept Drift Environments

10.5753/eniac.2018.4438 ◽

2018 ◽

Author(s):

Matheus B. De Moraes ◽

André L. S. Gradvohl

Keyword(s):

Feature Selection ◽

Information Gain ◽

Concept Drift ◽

Computational Cost ◽

Selection Algorithm ◽

Information Need ◽

High Speeds ◽

Online Feature Selection ◽

Classification Tasks ◽

Selection Algorithms

Data streams are transmitted at high speeds with huge volume and may contain critical information need processing in real-time. Hence, to reduce computational cost and time, the system may apply a feature selection algorithm. However, this is not a trivial task due to the concept drift. In this work, we show that two feature selection algorithms, Information Gain and Online Feature Selection, present lower performance when compared to classification tasks without feature selection. Both algorithms presented more relevant results in one distinct scenario each, showing final accuracies up to 14% higher. The experiments using both real and artificial datasets present a potential for using these methods due to their better adaptability in some concept drift situations.

Download Full-text

Optimum Feature Subset for Optimizing Crop Yield Prediction Using Filter and Wrapper Approaches

Applied Engineering in Agriculture ◽

10.13031/aea.12938 ◽

2019 ◽

Vol 35 (1) ◽

pp. 9-14 ◽

Cited By ~ 3

Author(s):

P. S. Maya Gopal ◽

R Bhargavi

Keyword(s):

Feature Selection ◽

Linear Regression ◽

Multiple Linear Regression ◽

Crop Yield ◽

Computational Time ◽

Feature Subset ◽

Yield Prediction ◽

Selection Algorithm ◽

Cultivable Land ◽

Selection Algorithms

Abstract. In agriculture, crop yield prediction is critical. Crop yield depends on various features which can be categorized as geographical, climatic, and biological. Geographical features consist of cultivable land in hectares, canal length to cover the cultivable land, number of tanks and tube wells available for irrigation. Climatic features consist of rainfall, temperature, and radiation. Biological features consist of seeds, minerals, and nutrients. In total, 15 features were considered for this study to understand features impact on paddy crop yield for all seasons of each year. For selecting vital features, five filter and wrapper approaches were applied. For predicting accuracy of features selection algorithm, Multiple Linear Regression (MLR) model was used. The RMSE, MAE, R, and RRMSE metrics were used to evaluate the performance of feature selection algorithms. Data used for the analysis was drawn from secondary sources of state Agriculture Department, Government of Tamil Nadu, India, for over 30 years. Seventy-five percent of data was used for training and 25% was used for testing. Low computational time was also considered for the selection of best feature subset. Outcome of all feature selection algorithms have given similar results in the RMSE, RRMSE, R, and MAE values. The adjusted R2 value was used to find the optimum feature subset despite all the deviations. The evaluation of the dataset used in this work shows that total area of cultivation, number of tanks and open wells used for irrigation, length of canals used for irrigation, and average maximum temperature during the season of the crop are the best features for better crop yield prediction on the study area. The MLR gives 85% of model accuracy for the selected features with low computational time. Keywords: Feature selection algorithm, Model validation, Multiple linear regression, Performance metrics.

Download Full-text

Revealing False Positive Features in Epileptic EEG Identification

International Journal of Neural Systems ◽

10.1142/s0129065720500173 ◽

2020 ◽

Vol 30 (11) ◽

pp. 2050017 ◽

Cited By ~ 1

Author(s):

Jian Lian ◽

Yunfeng Shi ◽

Yan Zhang ◽

Weikuan Jia ◽

Xiaojun Fan ◽

...

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

State Of The Art ◽

The State ◽

Support Vector ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Eeg Signals ◽

Eeg Classification

Feature selection plays a vital role in the detection and discrimination of epileptic seizures in electroencephalogram (EEG) signals. The state-of-the-art EEG classification techniques commonly entail the extraction of the multiple features that would be fed into classifiers. For some techniques, the feature selection strategies have been used to reduce the dimensionality of the entire feature space. However, most of these approaches focus on the performance of classifiers while neglecting the association between the feature and the EEG activity itself. To enhance the inner relationship between the feature subset and the epileptic EEG task with a promising classification accuracy, we propose a machine learning-based pipeline using a novel feature selection algorithm built upon a knockoff filter. First, a number of temporal, spectral, and spatial features are extracted from the raw EEG signals. Second, the proposed feature selection algorithm is exploited to obtain the optimal subgroup of features. Afterwards, three classifiers including [Formula: see text]-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) are used. The experimental results on the Bonn dataset demonstrate that the proposed approach outperforms the state-of-the-art techniques, with accuracy as high as 99.93% for normal and interictal EEG discrimination and 98.95% for interictal and ictal EEG classification. Meanwhile, it has achieved satisfactory sensitivity (95.67% in average), specificity (98.83% in average), and accuracy (98.89% in average) over the Freiburg dataset.

Download Full-text

Proportional Hybrid Mechanism for Population Based Feature Selection Algorithm

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622014500096 ◽

2017 ◽

Vol 16 (05) ◽

pp. 1309-1338 ◽

Cited By ~ 4

Author(s):

Pin Wang ◽

Yongming Li ◽

Bohan Chen ◽

Xianling Hu ◽

Jin Yan ◽

...

Keyword(s):

Feature Selection ◽

High Efficiency ◽

Search Algorithm ◽

Population Based ◽

Time Cost ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Filter Model ◽

Hybrid Mechanism ◽

Selection Algorithms

Feature selection is an important research field for pattern classification, data mining, etc. Population-based optimization algorithms (POA) have high parallelism and are widely used as search algorithm for feature selection. Population-based feature selection algorithms (PFSA) involve compromise between precision and time cost. In order to optimize the PFSA, the feature selection models need to be improved. Feature selection algorithms broadly fall into two categories: the filter model and the wrapper model. The filter model is fast but less precise; while the wrapper model is more precise but generally computationally more intensive. In this paper, we proposed a new mechanism — proportional hybrid mechanism (PHM) to combine the advantages of filter and wrapper models. The mechanism can be applied in PFSA to improve their performance. Genetic algorithm (GA) has been applied in many kinds of feature selection problems as search algorithm because of its high efficiency and implicit parallelism. Therefore, GAs are used in this paper. In order to validate the mechanism, seven datasets from university of California Irvine (UCI) database and artificial toy datasets are tested. The experiments are carried out for different GAs, classifiers, and evaluation criteria, the results show that with the introduction of PHM, the GA-based feature selection algorithm can be improved in both time cost and classification accuracy. Moreover, the comparison of GA-based, PSO-based and some other feature selection algorithms demonstrate that the PHM can be used in other population-based feature selection algorithms and obtain satisfying results.

Download Full-text

A Feature Selection Algorithm Integrating Maximum Classification Information and Minimum Interaction Feature Dependency Information

Computational Intelligence and Neuroscience ◽

10.1155/2021/3569632 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Li Zhang

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Information Gain ◽

Small Sample ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Class Labels ◽

Minimum Interaction ◽

Classification Information ◽

Selection Algorithms

Feature selection is the key step in the analysis of high-dimensional small sample data. The core of feature selection is to analyse and quantify the correlation between features and class labels and the redundancy between features. However, most of the existing feature selection algorithms only consider the classification contribution of individual features and ignore the influence of interfeature redundancy and correlation. Therefore, this paper proposes a feature selection algorithm for nonlinear dynamic conditional relevance (NDCRFS) through the study and analysis of the existing feature selection algorithm ideas and method. Firstly, redundancy and relevance between features and between features and class labels are discriminated by mutual information, conditional mutual information, and interactive mutual information. Secondly, the selected features and candidate features are dynamically weighted utilizing information gain factors. Finally, to evaluate the performance of this feature selection algorithm, NDCRFS was validated against 6 other feature selection algorithms on three classifiers, using 12 different data sets, for variability and classification metrics between the different algorithms. The experimental results show that the NDCRFS method can improve the quality of the feature subsets and obtain better classification results.

Download Full-text

Feature Selection Algorithm Using Relative Odds for Data Mining Classification

Big Data Analytics for Sustainable Computing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-9750-6.ch005 ◽

2020 ◽

pp. 81-106 ◽

Cited By ~ 3

Author(s):

Donald Douglas Atsa'am

Keyword(s):

Feature Selection ◽

Binary Classification ◽

Initial Step ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Classification Problems ◽

Odds Ratios ◽

Relative Odds ◽

Importance Ranking ◽

Selection Algorithms

A filter feature selection algorithm is developed and its performance tested. In the initial step, the algorithm dichotomizes the dataset then separately computes the association between each predictor and the class variable using relative odds (odds ratios). The value of the odds ratios becomes the importance ranking of the corresponding explanatory variable in determining the output. Logistic regression classification is deployed to test the performance of the new algorithm in comparison with three existing feature selection algorithms: the Fisher index, Pearson's correlation, and the varImp function. A number of experimental datasets are employed, and in most cases, the subsets selected by the new algorithm produced models with higher classification accuracy than the subsets suggested by the existing feature selection algorithms. Therefore, the proposed algorithm is a reliable alternative in filter feature selection for binary classification problems.

Download Full-text

An Efficient Cooperative Smearing Technique for Degraded Historical Document Image Segmentation

International Journal of Image and Graphics ◽

10.1142/s0219467821500121 ◽

2020 ◽

pp. 2150012

Author(s):

Omar Boudraa ◽

Walid Khaled Hidouci ◽

Dominique Michelucci

Keyword(s):

Image Analysis ◽

Image Segmentation ◽

Document Image ◽

Regions Of Interest ◽

Document Images ◽

Historical Document ◽

Document Image Analysis ◽

Degraded Document ◽

Document Image Segmentation

Segmentation is one of the critical steps in historical document image analysis systems that determines the quality of the search, understanding, recognition and interpretation processes. It allows isolating the objects to be considered and separating the regions of interest (paragraphs, lines, words and characters) from other entities (figures, graphs, tables, etc.). This stage follows the thresholding, which aims to improve the quality of the document and to extract its background from its foreground, also for detecting and correcting the skew that leads to redress the document. Here, a hybrid method is proposed in order to locate words and characters in both handwritten and printed documents. Numerical results prove the robustness and the high precision of our approach applied on old degraded document images over four common datasets, in which the pair (Recall, Precision) reaches approximately 97.7% and 97.9%.

Download Full-text