scholarly journals FUZZY UNORDERED RULE USING GREEDY HILL CLIMBING FEATURE SELECTION METHOD: AN APPLICATION TO DIABETES CLASSIFICATION

2021 ◽  
Vol 20 (Number 3) ◽  
pp. 391-422
Author(s):  
Hayder Naser Khraibet Al-Behadili ◽  
Ku Ruhana Ku-Mahamud

Diabetes classification is one of the most crucial applications of healthcare diagnosis. Even though various studies have been conducted in this application, the classification problem remains challenging. Fuzzy logic techniques have recently obtained impressive achievements in different application domains especially medical diagnosis. Fuzzy logic technique is not able to deal with data of a large number of input variables in constructing a classification model. In this research, a fuzzy logic technique using greedy hill climbing feature selection methods was proposed for the classification of diabetes. A dataset of 520 patients from the Hospital of Sylhet in Bangladesh was used to train and evaluate the proposed classifier. Six classification criteria were considered to authenticate the results of the proposed classifier. Comparative analysis proved the effectiveness of the proposed classifier against Naive Bayes, support vector machine, K-nearest neighbour, decision tree, and multilayer perceptron neural network classifiers. Results of the proposed classifier demonstrated the potential of fuzzy logic in analyzing diabetes patterns in all classification criteria.

Author(s):  
Ritam Guha ◽  
Manosij Ghosh ◽  
Pawan Kumar Singh ◽  
Ram Sarkar ◽  
Mita Nasipuri

AbstractIn any multi-script environment, handwritten script classification is an unavoidable pre-requisite before the document images are fed to their respective Optical Character Recognition (OCR) engines. Over the years, this complex pattern classification problem has been solved by researchers proposing various feature vectors mostly having large dimensions, thereby increasing the computation complexity of the whole classification model. Feature Selection (FS) can serve as an intermediate step to reduce the size of the feature vectors by restricting them only to the essential and relevant features. In the present work, we have addressed this issue by introducing a new FS algorithm, called Hybrid Swarm and Gravitation-based FS (HSGFS). This algorithm has been applied over three feature vectors introduced in the literature recently—Distance-Hough Transform (DHT), Histogram of Oriented Gradients (HOG), and Modified log-Gabor (MLG) filter Transform. Three state-of-the-art classifiers, namely, Multi-Layer Perceptron (MLP), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM), are used to evaluate the optimal subset of features generated by the proposed FS model. Handwritten datasets at block, text line, and word level, consisting of officially recognized 12 Indic scripts, are prepared for experimentation. An average improvement in the range of 2–5% is achieved in the classification accuracy by utilizing only about 75–80% of the original feature vectors on all three datasets. The proposed method also shows better performance when compared to some popularly used FS models. The codes used for implementing HSGFS can be found in the following Github link: https://github.com/Ritam-Guha/HSGFS.


2019 ◽  
Author(s):  
Tahseen A. Wotaifi

In the context of the great change in the labor market and the higher education sector, great attention is given to individuals with an academic degree or the so-called graduates class. However, each educational institution has a different approach towards students who wish to complete their university degree. This study aims at (1) identifying the most important factors that directly affect the completion, and (2) predicting the completion rates of students for university degrees according to the system of higher education in the United States. Unlike previous studies, this project contributes to the use of the fuzzy logic technique on three methods for feature selection, namely the Correlation Attribute Evaluation, Relief Attribute Evaluation, and Gain Ratio Method. Since these three methods give different weight to the same attribute, the fuzzy logic technique has been used to get one weight for the attribute. A great challenge faced throughout this study is the curse of dimensionality, because the college scorecard dataset launched by the US Department of Education contains approximately (8000) educational institutions and (1825) features. Applying the method used in this study to identify important features lead to their reduction to only (79). Accordingly, two models have been used to predict the completion rates of students for their university studies which are the Random Forest and the Support Vector Regression with a Mean Absolute Error (MAE) value of (0.068) and (0.097) respectively.


2021 ◽  
Vol 7 ◽  
pp. e766
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Speech emotion recognition (SER) is a challenging issue because it is not clear which features are effective for classification. Emotionally related features are always extracted from speech signals for emotional classification. Handcrafted features are mainly used for emotional identification from audio signals. However, these features are not sufficient to correctly identify the emotional state of the speaker. The advantages of a deep convolutional neural network (DCNN) are investigated in the proposed work. A pretrained framework is used to extract the features from speech emotion databases. In this work, we adopt the feature selection (FS) approach to find the discriminative and most important features for SER. Many algorithms are used for the emotion classification problem. We use the random forest (RF), decision tree (DT), support vector machine (SVM), multilayer perceptron classifier (MLP), and k-nearest neighbors (KNN) to classify seven emotions. All experiments are performed by utilizing four different publicly accessible databases. Our method obtains accuracies of 92.02%, 88.77%, 93.61%, and 77.23% for Emo-DB, SAVEE, RAVDESS, and IEMOCAP, respectively, for speaker-dependent (SD) recognition with the feature selection method. Furthermore, compared to current handcrafted feature-based SER methods, the proposed method shows the best results for speaker-independent SER. For EMO-DB, all classifiers attain an accuracy of more than 80% with or without the feature selection technique.


2020 ◽  
Vol 11 (12) ◽  
pp. 6021-6031 ◽  
Author(s):  
Jhonatan Kobylarz ◽  
Jordan J. Bird ◽  
Diego R. Faria ◽  
Eduardo Parente Ribeiro ◽  
Anikó Ekárt

AbstractIn this study, we present a transfer learning method for gesture classification via an inductive and supervised transductive approach with an electromyographic dataset gathered via the Myo armband. A ternary gesture classification problem is presented by states of ’thumbs up’, ’thumbs down’, and ’relax’ in order to communicate in the affirmative or negative in a non-verbal fashion to a machine. Of the nine statistical learning paradigms benchmarked over 10-fold cross validation (with three methods of feature selection), an ensemble of Random Forest and Support Vector Machine through voting achieves the best score of 91.74% with a rule-based feature selection method. When new subjects are considered, this machine learning approach fails to generalise new data, and thus the processes of Inductive and Supervised Transductive Transfer Learning are introduced with a short calibration exercise (15 s). Failure of generalisation shows that 5 s of data per-class is the strongest for classification (versus one through seven seconds) with only an accuracy of 55%, but when a short 5 s per class calibration task is introduced via the suggested transfer method, a Random Forest can then classify unseen data from the calibrated subject at an accuracy of around 97%, outperforming the 83% accuracy boasted by the proprietary Myo system. Finally, a preliminary application is presented through social interaction with a humanoid Pepper robot, where the use of our approach and a most-common-class metaclassifier achieves 100% accuracy for all trials of a ‘20 Questions’ game.


Symmetry ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 301 ◽  
Author(s):  
Jie Cao ◽  
Da Wang ◽  
Zhaoyang Qu ◽  
Hongyu Sun ◽  
Bin Li ◽  
...  

Network traffic classification based on machine learning is an important branch of pattern recognition in computer science. It is a key technology for dynamic intelligent network management and enhanced network controllability. However, the traffic classification methods still facing severe challenges: The optimal set of features is difficult to determine. The classification method is highly dependent on the effective characteristic combination. Meanwhile, it is also important to balance the experience risk and generalization ability of the classifier. In this paper, an improved network traffic classification model based on a support vector machine is proposed. First, a filter-wrapper hybrid feature selection method is proposed to solve the false deletion of combined features caused by a traditional feature selection method. Second, to balance the empirical risk and generalization ability of support vector machine (SVM) traffic classification model, an improved parameter optimization algorithm is proposed. The algorithm can dynamically adjust the quadratic search area, reduce the density of quadratic mesh generation, improve the search efficiency of the algorithm, and prevent the over-fitting while optimizing the parameters. The experiments show that the improved traffic classification model achieves higher classification accuracy, lower dimension and shorter elapsed time and performs significantly better than traditional SVM and the other three typical supervised ML algorithms.


2020 ◽  
Vol 12 (1) ◽  
pp. 113 ◽  
Author(s):  
Andrew Hennessy ◽  
Kenneth Clarke ◽  
Megan Lewis

Hyperspectral sensing, measuring reflectance over visible to shortwave infrared wavelengths, has enabled the classification and mapping of vegetation at a range of taxonomic scales, often down to the species level. Classification with hyperspectral measurements, acquired by narrow band spectroradiometers or imaging sensors, has generally required some form of spectral feature selection to reduce the dimensionality of the data to a level suitable for the construction of a classification model. Despite the large number of hyperspectral plant classification studies, an in-depth review of feature selection methods and resultant waveband selections has not yet been performed. Here, we present a review of the last 22 years of hyperspectral vegetation classification literature that evaluates the overall waveband selection frequency, waveband selection frequency variation by taxonomic, structural, or functional group, and the influence of feature selection choice by comparing such methods as stepwise discriminant analysis (SDA), support vector machines (SVM), and random forests (RF). This review determined that all characteristics of hyperspectral plant studies influence the wavebands selected for classification. This includes the taxonomic, structural, and functional groups of the target samples, the methods, and scale at which hyperspectral measurements are recorded, as well as the feature selection method used. Furthermore, these influences do not appear to be consistent. Moreover, the considerable variability in waveband selection caused by the feature selectors effectively masks the analysis of any variability between studies related to plant groupings. Additionally, questions are raised about the suitability of SDA as a feature selection method, with it producing waveband selections at odds with the other feature selectors. Caution is recommended when choosing a feature selector for hyperspectral plant classification: We recommend multiple methods being performed. The resultant sets of selected spectral features can either be evaluated individually by multiple classification models or combined as an ensemble for evaluation by a single classifier. Additionally, we suggest caution when relying upon waveband recommendations from the literature to guide waveband selections or classifications for new plant discrimination applications, as such recommendations appear to be weakly generalizable between studies.


Author(s):  
Narongsak Chayangkoon ◽  
Anongnart Srivihok

<span>Methamphetamine addiction is a prominent problem in Southeast Asia. Drug addicts often discuss illegal activities on popular social networking services. These individuals spread messages on social media as a means of both buying and selling drugs online. This paper proposes a model, the “text classification model of methamphetamine tweets in Southeast Asia” (TMTA), to identify whether a tweet from Southeast Asia is related to methamphetamine abuse. The research addresses the weakness of bag of words (BoW) by introducing BoW and Word2Vec feature selection (BWF) techniques. A domain-based feature selection method was performed using the BoW dataset and Word2Vec. The BWF dataset provided a smaller number of features than the BoW and TF–IDF dataset. We experimented with three candidate classifiers: Support vector machine (SVM), decision tree (J48) and naive bayes (NB). We found that the J48 classifier with the BWF dataset provided the best performance for the TMTA in terms of accuracy (0.815), F-measure (0.818), Kappa (0.528), Matthews correlation coefficient (0.529) and high area under the ROC Curve (0.763). Moreover, TMTA provided the lowest runtime (3.480 seconds) using the J48 with the BWF dataset.</span>


In the context of the great change in the labor market and the higher education sector, great attention is given to individuals with an academic degree or the so-called graduates class. However, each educational institution has a different approach towards students who wish to complete their university degree. This study aims at (1) identifying the most important factors that directly affect the completion, and (2) predicting the completion rates of students for university degrees according to the system of higher education in the United States. Unlike previous studies, this project contributes to the use of the fuzzy logic technique on three methods for feature selection, namely the Correlation Attribute Evaluation, Relief Attribute Evaluation, and Gain Ratio Method. Since these three methods give different weight to the same attribute, the fuzzy logic technique has been used to get one weight for the attribute. A great challenge faced throughout this study is the curse of dimensionality, because the college scorecard dataset launched by the US Department of Education contains approximately (8000) educational institutions and (1825) features. Applying the method used in this study to identify important features lead to their reduction to only (79). Accordingly, two models have been used to predict the completion rates of students for their university studies which are the Random Forest and the Support Vector Regression with a Mean Absolute Error (MAE) value of (0.068) and (0.097) respectively.


2020 ◽  
Vol 2020 ◽  
pp. 1-26
Author(s):  
Abdurrahman Burak Guher ◽  
Sakir Tasdemir ◽  
Bulent Yaniktepe

The precise estimation of solar radiation is of great importance in solar energy applications with respect to installation and capacity. In estimate modelling on selected target locations, various computer-based and experimental methods and techniques are employed. In the present study, the Multilayer Feed-Forward Neural Network (MFFNN), K -Nearest Neighbors ( K -NN), a Library for Support Vector Machines (LibSVM), and M5 rules algorithms, which are among the Machine Learning (ML) algorithms, were used to estimate the hourly average solar radiation of two geographic locations on the same latitude. The input variables that had the most impact on solar radiation were identified and grouped as a result of 29 different applications that were developed by using 6 different feature selection methods with Waikato Environment for Knowledge Analysis (WEKA) software. Estimation models were developed by using the selected data groups and all input variables for each target location. The results show that the estimations developed with the feature selection method were more successful for target locations, and the radiation potentials were similar. The performance of the estimation models was evaluated by comparing each model with different statistical indicators and with previous studies. According to the RMSE, MAE, R 2 , and SMAPE statistical scales, the results of the most successful estimation models that were developed with MFFNN were 0.0508-0.0536, 0.0341-0.0352, 0.9488-0.9656, and 7.77%-7.79%, respectively.


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In Microarray Data, it is complicated to achieve more classification accuracy due to the presence of high dimensions, irrelevant and noisy data. And also It had more gene expression data and fewer samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features need to extract, this can be achieved by applying the feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, filter and wrapper phase in filter phase ensemble technique is used for aggregating the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses the Fuzzy Gaussian membership function ordering for aggregating the ranks. In wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF Kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model are compared with state of art feature selection methods using five benchmark datasets. For evaluation various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. Furthermore, the experimental results show that the performance of the proposed method outperforms the other feature selection methods.


Sign in / Sign up

Export Citation Format

Share Document