FUZZY UNORDERED RULE USING GREEDY HILL CLIMBING FEATURE SELECTION METHOD: AN APPLICATION TO DIABETES CLASSIFICATION

Hayder Naser Khraibet Al-Behadili; Ku Ruhana Ku-Mahamud

doi:10.32890/jict2021.20.3.5

A Hybrid Swarm and Gravitation-based feature selection algorithm for handwritten Indic script classification problem

Complex & Intelligent Systems ◽

10.1007/s40747-020-00237-1 ◽

2021 ◽

Author(s):

Ritam Guha ◽

Manosij Ghosh ◽

Pawan Kumar Singh ◽

Ram Sarkar ◽

Mita Nasipuri

Keyword(s):

Feature Selection ◽

Character Recognition ◽

Optical Character Recognition ◽

Classification Problem ◽

Classification Model ◽

Support Vector ◽

Intermediate Step ◽

Hybrid Swarm ◽

Feature Vectors ◽

Indic Script

In any multi-script environment, handwritten script classification is an unavoidable prerequisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, researchers have addressed this complex pattern classification problem by proposing various feature vectors, mostly of large dimension, thereby increasing the computational complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them only to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently: Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely, Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of 12 officially recognized Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The code used for implementing HSGFS can be found at the following GitHub link: https://github.com/Ritam-Guha/HSGFS.


Mining of Completion Rate of Higher Education Based on Fuzzy Feature Selection Model and Machine Learning Techniques

10.31219/osf.io/wjbfk ◽

2019 ◽

Author(s):

Tahseen A. Wotaifi

Keyword(s):

Higher Education ◽

Fuzzy Logic ◽

Feature Selection ◽

The United States ◽

Ratio Method ◽

Support Vector ◽

Completion Rates ◽

Academic Degree ◽

Attribute Evaluation ◽

Fuzzy Logic Technique

In the context of the great change in the labor market and the higher education sector, great attention is given to individuals with an academic degree, the so-called graduate class. However, each educational institution has a different approach towards students who wish to complete their university degree. This study aims at (1) identifying the most important factors that directly affect completion, and (2) predicting the completion rates of students for university degrees according to the system of higher education in the United States. Unlike previous studies, this project applies the fuzzy logic technique to three methods for feature selection, namely the Correlation Attribute Evaluation, Relief Attribute Evaluation, and Gain Ratio Method. Since these three methods give different weights to the same attribute, the fuzzy logic technique has been used to obtain a single weight for each attribute. A great challenge faced throughout this study is the curse of dimensionality, because the College Scorecard dataset launched by the US Department of Education contains approximately 8000 educational institutions and 1825 features. Applying the method used in this study to identify important features leads to their reduction to only 79. Accordingly, two models have been used to predict the completion rates of students for their university studies, namely Random Forest and Support Vector Regression, with Mean Absolute Error (MAE) values of 0.068 and 0.097, respectively.


Effect on speech emotion classification of a feature selection approach using a convolutional neural network

PeerJ Computer Science ◽

10.7717/peerj-cs.766 ◽

2021 ◽

Vol 7 ◽

pp. e766

Author(s):

Ammar Amjad ◽

Lal Khan ◽

Hsien-Tsung Chang

Keyword(s):

Neural Network ◽

Feature Selection ◽

Convolutional Neural Network ◽

Feature Selection Method ◽

Classification Problem ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotion Classification ◽

K Nearest Neighbors ◽

Feature Selection Technique

Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For EMO-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.


Thumbs up, thumbs down: non-verbal human-robot interaction through real-time EMG classification via inductive and supervised transductive transfer learning

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-020-01852-z ◽

2020 ◽

Vol 11 (12) ◽

pp. 6021-6031 ◽

Cited By ~ 3

Author(s):

Jhonatan Kobylarz ◽

Jordan J. Bird ◽

Diego R. Faria ◽

Eduardo Parente Ribeiro ◽

Anikó Ekárt

Keyword(s):

Feature Selection ◽

Random Forest ◽

Transfer Learning ◽

Feature Selection Method ◽

Classification Problem ◽

Human Robot Interaction ◽

Support Vector ◽

Unseen Data ◽

Gesture Classification ◽

Transfer Method

In this study, we present a transfer learning method for gesture classification via an inductive and supervised transductive approach with an electromyographic dataset gathered via the Myo armband. A ternary gesture classification problem is presented by states of ‘thumbs up’, ‘thumbs down’, and ‘relax’ in order to communicate in the affirmative or negative in a non-verbal fashion to a machine. Of the nine statistical learning paradigms benchmarked over 10-fold cross validation (with three methods of feature selection), an ensemble of Random Forest and Support Vector Machine through voting achieves the best score of 91.74% with a rule-based feature selection method. When new subjects are considered, this machine learning approach fails to generalise to new data, and thus the processes of Inductive and Supervised Transductive Transfer Learning are introduced with a short calibration exercise (15 s). Failure of generalisation shows that 5 s of data per class is the strongest for classification (versus one through seven seconds) with only an accuracy of 55%, but when a short 5 s per class calibration task is introduced via the suggested transfer method, a Random Forest can then classify unseen data from the calibrated subject at an accuracy of around 97%, outperforming the 83% accuracy boasted by the proprietary Myo system. Finally, a preliminary application is presented through social interaction with a humanoid Pepper robot, where the use of our approach and a most-common-class metaclassifier achieves 100% accuracy for all trials of a ‘20 Questions’ game.
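The "most-common-class metaclassifier" mentioned in the abstract can be pictured as a majority vote over a short window of per-sample predictions. The following is a minimal sketch of that idea only; the function name `most_common_class`, the window contents, and the tie-breaking rule are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def most_common_class(predictions):
    """Return the label that occurs most often in a window of predictions.

    Ties are resolved in favour of the label that appeared first,
    which is the ordering Counter.most_common uses (Python 3.7+).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical window of per-sample outputs from a base classifier.
window = ["thumbs up", "relax", "thumbs up", "thumbs down", "thumbs up"]
answer = most_common_class(window)
```

Voting over a window smooths out isolated misclassifications, which is one reason a per-sample accuracy below 100% can still give a correct answer per trial.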


An Improved Network Traffic Classification Model Based on a Support Vector Machine

Symmetry ◽

10.3390/sym12020301 ◽

2020 ◽

Vol 12 (2) ◽

pp. 301 ◽

Cited By ~ 1

Author(s):

Jie Cao ◽

Da Wang ◽

Zhaoyang Qu ◽

Hongyu Sun ◽

Bin Li ◽

...

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Network Traffic ◽

Feature Selection Method ◽

Selection Method ◽

Classification Model ◽

Support Vector ◽

Traffic Classification ◽

Generalization Ability ◽

Network Traffic Classification

Network traffic classification based on machine learning is an important branch of pattern recognition in computer science. It is a key technology for dynamic intelligent network management and enhanced network controllability. However, traffic classification methods still face severe challenges: the optimal set of features is difficult to determine, and the classification method is highly dependent on an effective combination of features. Meanwhile, it is also important to balance the empirical risk and generalization ability of the classifier. In this paper, an improved network traffic classification model based on a support vector machine is proposed. First, a filter-wrapper hybrid feature selection method is proposed to solve the false deletion of combined features caused by a traditional feature selection method. Second, to balance the empirical risk and generalization ability of the support vector machine (SVM) traffic classification model, an improved parameter optimization algorithm is proposed. The algorithm can dynamically adjust the quadratic search area, reduce the density of quadratic mesh generation, improve the search efficiency of the algorithm, and prevent over-fitting while optimizing the parameters. The experiments show that the improved traffic classification model achieves higher classification accuracy, lower dimension, and shorter elapsed time, and performs significantly better than the traditional SVM and three other typical supervised ML algorithms.


Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability

Remote Sensing ◽

10.3390/rs12010113 ◽

2020 ◽

Vol 12 (1) ◽

pp. 113 ◽

Cited By ~ 12

Author(s):

Andrew Hennessy ◽

Kenneth Clarke ◽

Megan Lewis

Keyword(s):

Feature Selection ◽

Spectral Feature ◽

Feature Selection Method ◽

Selection Method ◽

Classification Model ◽

Support Vector ◽

Stepwise Discriminant Analysis ◽

Frequency Variation ◽

Spectral Feature Selection ◽

Hyperspectral Classification

Hyperspectral sensing, measuring reflectance over visible to shortwave infrared wavelengths, has enabled the classification and mapping of vegetation at a range of taxonomic scales, often down to the species level. Classification with hyperspectral measurements, acquired by narrow band spectroradiometers or imaging sensors, has generally required some form of spectral feature selection to reduce the dimensionality of the data to a level suitable for the construction of a classification model. Despite the large number of hyperspectral plant classification studies, an in-depth review of feature selection methods and resultant waveband selections has not yet been performed. Here, we present a review of the last 22 years of hyperspectral vegetation classification literature that evaluates the overall waveband selection frequency, waveband selection frequency variation by taxonomic, structural, or functional group, and the influence of feature selection choice by comparing such methods as stepwise discriminant analysis (SDA), support vector machines (SVM), and random forests (RF). This review determined that all characteristics of hyperspectral plant studies influence the wavebands selected for classification. This includes the taxonomic, structural, and functional groups of the target samples, the methods and scale at which hyperspectral measurements are recorded, as well as the feature selection method used. Furthermore, these influences do not appear to be consistent. Moreover, the considerable variability in waveband selection caused by the feature selectors effectively masks the analysis of any variability between studies related to plant groupings. Additionally, questions are raised about the suitability of SDA as a feature selection method, as it produces waveband selections at odds with the other feature selectors. Caution is recommended when choosing a feature selector for hyperspectral plant classification: we recommend that multiple methods be performed. The resultant sets of selected spectral features can either be evaluated individually by multiple classification models or combined as an ensemble for evaluation by a single classifier. Additionally, we suggest caution when relying upon waveband recommendations from the literature to guide waveband selections or classifications for new plant discrimination applications, as such recommendations appear to be weakly generalizable between studies.


Text classification model for methamphetamine-related tweets in Southeast Asia using dual data preprocessing techniques

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i4.pp3617-3628 ◽

2021 ◽

Vol 11 (4) ◽

pp. 3617

Author(s):

Narongsak Chayangkoon ◽

Anongnart Srivihok

Keyword(s):

Feature Selection ◽

Southeast Asia ◽

Text Classification ◽

Matthews Correlation Coefficient ◽

Feature Selection Method ◽

Classification Model ◽

Support Vector ◽

Drug Addicts ◽

High Area ◽

Illegal Activities

Methamphetamine addiction is a prominent problem in Southeast Asia. Drug addicts often discuss illegal activities on popular social networking services. These individuals spread messages on social media as a means of both buying and selling drugs online. This paper proposes a model, the “text classification model of methamphetamine tweets in Southeast Asia” (TMTA), to identify whether a tweet from Southeast Asia is related to methamphetamine abuse. The research addresses the weakness of bag of words (BoW) by introducing BoW and Word2Vec feature selection (BWF) techniques. A domain-based feature selection method was performed using the BoW dataset and Word2Vec. The BWF dataset provided a smaller number of features than the BoW and TF–IDF datasets. We experimented with three candidate classifiers: support vector machine (SVM), decision tree (J48), and naive Bayes (NB). We found that the J48 classifier with the BWF dataset provided the best performance for the TMTA in terms of accuracy (0.815), F-measure (0.818), Kappa (0.528), Matthews correlation coefficient (0.529), and area under the ROC curve (0.763). Moreover, TMTA provided the lowest runtime (3.480 seconds) using the J48 with the BWF dataset.
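For readers unfamiliar with two of the scores reported above, the sketch below computes the Matthews correlation coefficient and Cohen's Kappa from the four cells of a binary confusion matrix, using their standard textbook definitions. The function names and the example counts are hypothetical and are not taken from the paper's data.

```python
import math


def mcc_from_counts(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def kappa_from_counts(tp, fp, fn, tn):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1:
        return 0.0
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration: 80 of 100 tweets classified correctly.
mcc = mcc_from_counts(tp=40, fp=10, fn=10, tn=40)
kappa = kappa_from_counts(tp=40, fp=10, fn=10, tn=40)
```

Both scores discount agreement that would occur by chance, which is why they can sit well below the raw accuracy on imbalanced data.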


Mining of Completion Rate of Higher Education Based on Fuzzy Feature Selection Model and Machine Learning Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1067.0982s1019 ◽

2019 ◽

Vol 8 (2S10) ◽

pp. 393-400

Keyword(s):

Higher Education ◽

Fuzzy Logic ◽

Feature Selection ◽

The United States ◽

Ratio Method ◽

Support Vector ◽

Completion Rates ◽

Academic Degree ◽

Attribute Evaluation ◽

Fuzzy Logic Technique

In the context of the great change in the labor market and the higher education sector, great attention is given to individuals with an academic degree, the so-called graduate class. However, each educational institution has a different approach towards students who wish to complete their university degree. This study aims at (1) identifying the most important factors that directly affect completion, and (2) predicting the completion rates of students for university degrees according to the system of higher education in the United States. Unlike previous studies, this project applies the fuzzy logic technique to three methods for feature selection, namely the Correlation Attribute Evaluation, Relief Attribute Evaluation, and Gain Ratio Method. Since these three methods give different weights to the same attribute, the fuzzy logic technique has been used to obtain a single weight for each attribute. A great challenge faced throughout this study is the curse of dimensionality, because the College Scorecard dataset launched by the US Department of Education contains approximately 8000 educational institutions and 1825 features. Applying the method used in this study to identify important features leads to their reduction to only 79. Accordingly, two models have been used to predict the completion rates of students for their university studies, namely Random Forest and Support Vector Regression, with Mean Absolute Error (MAE) values of 0.068 and 0.097, respectively.


Effective Estimation of Hourly Global Solar Radiation Using Machine Learning Algorithms

International Journal of Photoenergy ◽

10.1155/2020/8843620 ◽

2020 ◽

Vol 2020 ◽

pp. 1-26

Author(s):

Abdurrahman Burak Guher ◽

Sakir Tasdemir ◽

Bulent Yaniktepe

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Solar Radiation ◽

Target Location ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Support Vector ◽

Input Variables ◽

Target Locations ◽

Estimation Models

The precise estimation of solar radiation is of great importance in solar energy applications with respect to installation and capacity. In estimate modelling on selected target locations, various computer-based and experimental methods and techniques are employed. In the present study, the Multilayer Feed-Forward Neural Network (MFFNN), K-Nearest Neighbors (K-NN), a Library for Support Vector Machines (LibSVM), and M5 rules algorithms, which are among the Machine Learning (ML) algorithms, were used to estimate the hourly average solar radiation of two geographic locations on the same latitude. The input variables that had the most impact on solar radiation were identified and grouped as a result of 29 different applications that were developed by using 6 different feature selection methods with Waikato Environment for Knowledge Analysis (WEKA) software. Estimation models were developed by using the selected data groups and all input variables for each target location. The results show that the estimations developed with the feature selection method were more successful for target locations, and the radiation potentials were similar. The performance of the estimation models was evaluated by comparing each model with different statistical indicators and with previous studies. According to the RMSE, MAE, R², and SMAPE statistical scales, the results of the most successful estimation models that were developed with MFFNN were 0.0508-0.0536, 0.0341-0.0352, 0.9488-0.9656, and 7.77%-7.79%, respectively.
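The four indicators named above can be computed as in the following sketch. It uses common textbook definitions, which is an assumption: the abstract does not state which variants the authors used, and both R² (here the coefficient of determination) and SMAPE (here the form with denominator |y| + |ŷ|, scaled to percent) have several definitions in the literature. The function name and sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2 and SMAPE (%) for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination; undefined if y_true is constant.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # SMAPE: terms where both values are zero contribute zero error.
    smape_terms = [
        2 * abs(p - t) / (abs(t) + abs(p)) if (abs(t) + abs(p)) > 0 else 0.0
        for t, p in zip(y_true, y_pred)
    ]
    smape = 100.0 * sum(smape_terms) / n

    return {"rmse": rmse, "mae": mae, "r2": r2, "smape": smape}


# Hypothetical normalised radiation values, for illustration only.
scores = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

In this toy case the prediction is the mean of the targets, so R² is zero even though MAE and RMSE look moderate, which shows why the indicators are usually reported together.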


A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality and irrelevant, noisy data. Such data also contain many gene expression features and few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique is used for aggregating the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses Fuzzy Gaussian membership function ordering for aggregating the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods using five benchmark datasets. For evaluation, various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. The experimental results show that the proposed method outperforms the other feature selection methods.
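The abstract does not spell out how the Gaussian membership function is applied to the ranks, so the sketch below shows only one plausible reading and should not be taken as the authors' procedure. It assumes that rank 1 is best, that each rank is mapped to a membership degree by a Gaussian centred on rank 1, and that the degrees from the three filters are averaged. The names `gaussian_membership` and `aggregate_ranks`, the `sigma` value, and the gene labels are all hypothetical.

```python
import math


def gaussian_membership(rank, centre=1.0, sigma=2.0):
    """Gaussian membership degree of a rank; equals 1.0 at the best rank."""
    return math.exp(-((rank - centre) ** 2) / (2 * sigma ** 2))


def aggregate_ranks(rank_lists, sigma=2.0):
    """Order features by their mean membership degree across filters.

    rank_lists maps a filter name to a dict of feature -> rank (1 = best).
    Every filter is assumed to rank the same set of features.
    """
    filters = list(rank_lists.values())
    features = list(filters[0])
    scores = {
        f: sum(gaussian_membership(r[f], sigma=sigma) for r in filters) / len(filters)
        for f in features
    }
    # Highest mean membership first; ties broken alphabetically.
    return sorted(features, key=lambda f: (-scores[f], f))


# Hypothetical ranks from three filter methods over three genes.
ranks = {
    "relief": {"g1": 1, "g2": 2, "g3": 3},
    "mrmr": {"g1": 2, "g2": 1, "g3": 3},
    "fc": {"g1": 1, "g2": 3, "g3": 2},
}
ordering = aggregate_ranks(ranks)
```

An aggregated ordering of this kind would then feed a wrapper stage, which the abstract describes as IBPSO with an SVM evaluator; that stage is not sketched here.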
