Feature selection for human membrane protein type classification using filter methods

As the number of protein sequences in databases increases, effective and efficient techniques are needed to make these data meaningful. Protein sequences contain redundant and irrelevant features that lower classification accuracy and increase the running time of computational algorithms. In this paper, we select the best features using the Minimum Redundancy Maximum Relevance (mRMR) and Correlation-based Feature Selection (CFS) methods. Two datasets of human membrane proteins, S1 and S2, are used. After features have been selected by mRMR and CFS, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers are used to classify the membrane proteins. Performance is measured using accuracy, specificity, sensitivity, and F-measure. The proposed algorithm achieved 76% accuracy for S1 and 73% accuracy for S2. Finally, the proposed methods give competitive results compared with previous work on membrane protein classification.
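
As an illustration of the mRMR idea named in this abstract, the following is a minimal sketch of greedy selection on discrete features, using the "relevance minus mean redundancy" (difference) criterion. The function names, the toy feature table, and the labels are hypothetical and are not taken from the paper, whose exact implementation the abstract does not describe.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr(features, labels, k):
    """Greedily pick up to k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f2 duplicates f1, f3 is independent of f1.
features = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],
    "f2": [0, 0, 1, 1, 0, 0, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [0, 0, 1, 1, 0, 1, 1, 1]
print(mrmr(features, labels, 2))
```

In this toy case the duplicate feature f2 is as relevant as f1 but fully redundant with it, so the second pick is f3 rather than f2.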

Feature Selection Based on Cross-Correlation for the Intrusion Detection System

Security and Communication Networks ◽

10.1155/2020/8875404 ◽

2020 ◽

Vol 2020 ◽

pp. 1-17

Author(s):

Gholamreza Farahani

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Computer Networks ◽

Cross Correlation ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Detection Systems ◽

Correlation Based Feature Selection

Security is one of the important issues in computer networks, so trusted communication of information is critical. For safe communication, intrusion detection systems (IDSs) must be used in addition to prevention mechanisms. There are various approaches to intrusion detection, but none of these systems is complete. In this paper, a new cross-correlation-based feature selection (CCFS) method is proposed and compared with the cuttlefish algorithm (CFA) and mutual information-based feature selection (MIFS) using four different classifiers: support vector machine (SVM), naive Bayes (NB), decision tree (DT), and K-nearest neighbor (KNN). Experimental results on the KDD Cup 99, NSL-KDD, AWID, and CIC-IDS2017 datasets show that the proposed method performs better than the other two methods in accuracy, precision, recall, and F1-score across the different classifiers. The results also show that the DT classifier works best with the proposed method.
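
The abstract does not give the details of CCFS, so the following is only a generic sketch of a correlation-based filter, not the paper's method. It assumes that features are ranked by absolute Pearson correlation with the label and that a feature is dropped when it is strongly correlated with one already kept. The threshold, feature names, and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def correlation_filter(features, labels, max_inter_corr=0.9):
    """Keep label-correlated features, skipping near-duplicates of kept ones."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], labels)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < max_inter_corr
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical traffic records: "bytes" closely tracks "duration".
features = {
    "duration": [1, 2, 3, 4, 5, 6],
    "bytes": [2, 4, 6, 8, 10, 13],
    "flag": [0, 1, 0, 1, 1, 0],
}
labels = [0, 0, 0, 1, 1, 1]
print(correlation_filter(features, labels))
```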

Genetic Algorithm Ensemble Filter Methods on Kidney Disease Classification

International Journal of Innovative Computing ◽

10.11113/ijic.v11n2.345 ◽

2021 ◽

Vol 11 (2) ◽

pp. 73-80

Author(s):

Sharin Hazlin Huspi ◽

Chong Ke Ting

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Nearest Neighbor ◽

Information Gain ◽

Computational Cost ◽

Disease Classification ◽

Support Vector ◽

K Nearest Neighbor ◽

Fisher Score ◽

Filter Methods

Kidney failure affects the human body and can lead to serious illness and even death. Machine learning plays an important role in disease classification, offering high accuracy and shorter processing time than clinical lab tests. The Chronic Kidney Disease (CKD) clinical dataset has 24 attributes, which is considered too many. To improve classification performance, filter feature selection methods are used to reduce the feature dimensions, and an ensemble algorithm then takes the union of the features selected by each filter. The filter methods implemented in this research are Information Gain (IG), Chi-Square, ReliefF, and Fisher Score. A Genetic Algorithm (GA) is used to select the best subset from the ensemble result of the filter methods. Random Forest (RF), XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naïve Bayes (NB) classifiers were used to diagnose CKD. The selected feature subsets differ and are specialised for each classifier. Removing irrelevant features through filter feature selection reduces the burden and computational cost for the genetic algorithm, which can then perform better and select the subset that improves classifier performance with fewer attributes. The accuracy of RF, XGBoost, KNN, and SVM reaches 100%, and that of NB reaches 99.17%. The proposed union of filter feature selections with the genetic algorithm improves classifier performance while using fewer features than previous work.
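
The union step described in this abstract can be sketched as follows. This is a minimal illustration that assumes each filter yields one score per feature and contributes its top-k features to the union. The score values, feature names, and the choice of k are hypothetical, and the subsequent genetic algorithm search is not shown.

```python
def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def union_of_filters(score_tables, k):
    """Union of each filter's top-k features, in first-seen order."""
    union = []
    for scores in score_tables.values():
        for name in top_k(scores, k):
            if name not in union:
                union.append(name)
    return union


# Hypothetical filter scores for five attributes a1..a5.
score_tables = {
    "info_gain": {"a1": 0.9, "a2": 0.5, "a3": 0.1, "a4": 0.3, "a5": 0.0},
    "chi_square": {"a1": 12.0, "a2": 1.0, "a3": 8.0, "a4": 2.0, "a5": 0.5},
    "fisher": {"a1": 0.2, "a2": 0.1, "a3": 0.7, "a4": 1.5, "a5": 0.05},
}
candidates = union_of_filters(score_tables, k=2)
print(candidates)
```

Only the union would be passed to the subset search, so an attribute that no filter ranks highly (a5 here) never reaches the more expensive optimisation stage.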

A Comparison of the Analysis of Methods for Feature Extraction and Classification by Wavelet Transform in SSVEP BCIs

10.21203/rs.3.rs-82008/v1 ◽

2020 ◽

Author(s):

Hoda Heidari ◽

Zahra Einalou ◽

Mehrdad Dadgostar ◽

Hamidreza Hosseinzadeh

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Wavelet Transform ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Iir Filters ◽

Wide Range ◽

New Feature

Most studies in the field of Brain-Computer Interface (BCI) based on electroencephalography have a wide range of applications. Extracting the Steady State Visual Evoked Potential (SSVEP) is regarded as one of the most useful tools in BCI systems. In this study, different methods were compared across the whole signal processing stream: feature extraction with different spectral methods (Shannon entropy, skewness, kurtosis, mean, variance; bank of filters, narrow-band IIR filters, and wavelet transform magnitude), feature selection by various methods (decision tree, principal component analysis (PCA), t-test, Wilcoxon, receiver operating characteristic (ROC)), and classification with k-nearest neighbor (k-NN), perceptron, support vector machines (SVM), Bayesian, and multilayer perceptron (MLP) classifiers. Combining these methods gave an overview of the accuracy of classical methods. In addition, the present study relied on a rather new feature selection approach based on decision tree and PCA for BCI-SSVEP systems. Finally, the accuracies were calculated on the four recorded frequencies, which represent four directions: right, left, up, and down.
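
The statistical features listed in this abstract can be sketched with the standard library alone. This is a minimal illustration that assumes population (biased) moments and an entropy computed over a normalised power spectrum. The abstract does not state which definitions the authors use, and the function names are hypothetical.

```python
from math import log2


def moments(signal):
    """Mean, variance, skewness and kurtosis (population forms).

    Assumes a non-constant signal, so the standard deviation is non-zero.
    """
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = var ** 0.5
    skew = sum((x - mean) ** 3 for x in signal) / (n * std ** 3)
    kurt = sum((x - mean) ** 4 for x in signal) / (n * std ** 4)
    return mean, var, skew, kurt


def shannon_entropy(power_spectrum):
    """Shannon entropy (bits) of a spectrum normalised to sum to 1."""
    total = sum(power_spectrum)
    probs = [p / total for p in power_spectrum if p > 0]
    return -sum(p * log2(p) for p in probs)


mean, var, skew, kurt = moments([1, 2, 3, 4, 5])
print(mean, var, skew, kurt)
print(shannon_entropy([1, 1, 1, 1]))
```

A flat spectrum over four bins gives the maximum entropy of 2 bits, while a spectrum concentrated in one bin gives 0, which is why entropy is useful for detecting a dominant stimulus frequency.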

Recognition of Common Non-Normal Walking Actions Based on Relief-F Feature Selection and Relief-Bagging-SVM

Sensors ◽

10.3390/s20051447 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1447

Author(s):

Pan Huang ◽

Yanping Li ◽

Xiaoyi Lv ◽

Wen Chen ◽

Shuxian Liu

Keyword(s):

Feature Selection ◽

Action Recognition ◽

Nearest Neighbor ◽

Health Indicators ◽

Support Vector ◽

Normal Walking ◽

K Nearest Neighbor ◽

Recognition Algorithms ◽

Medical Health ◽

Improved Algorithm

Action recognition algorithms are widely used in the fields of medical health and pedestrian dead reckoning (PDR). The classification and recognition of non-normal walking actions and normal walking actions are very important for improving the accuracy of medical health indicators and PDR steps. Existing motion recognition algorithms focus on the recognition of normal walking actions, and the recognition of non-normal walking actions common to daily life is incomplete or inaccurate, resulting in a low overall recognition accuracy. This paper proposes a microelectromechanical system (MEMS) action recognition method based on Relief-F feature selection and relief-bagging-support vector machine (SVM). Feature selection using the Relief-F algorithm reduces the dimensions by 16 and reduces the optimization time by an average of 9.55 s. Experiments show that the improved algorithm for identifying non-normal walking actions has an accuracy of 96.63%; compared with Decision Tree (DT), it increased by 11.63%; compared with k-nearest neighbor (KNN), it increased by 26.62%; and compared with random forest (RF), it increased by 11.63%. The average Area Under Curve (AUC) of the improved algorithm improved by 0.1143 compared to KNN, by 0.0235 compared to DT, and by 0.04 compared to RF.
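
To show the idea behind Relief-style feature weighting, here is a minimal sketch of the basic two-class Relief algorithm, which is simpler than the Relief-F variant named in this abstract (Relief-F averages over several neighbours and handles multiple classes). It visits every sample instead of drawing a random subset, and the sample values are hypothetical.

```python
def relief_weights(X, y):
    """Basic two-class Relief over every sample (deterministic, no sampling)."""
    n, d = len(X), len(X[0])
    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0
             for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [X[m] for m in range(n) if m != i and y[m] == y[i]]
        misses = [X[m] for m in range(n) if y[m] != y[i]]
        hit = min(hits, key=lambda r: distance(X[i], r))
        miss = min(misses, key=lambda r: distance(X[i], r))
        # Reward features that differ on the nearest miss, penalise
        # features that differ on the nearest hit.
        for j in range(d):
            w[j] += (diff(X[i], miss, j) - diff(X[i], hit, j)) / n
    return w


# Hypothetical samples: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
print(weights)
```

Features with low or negative weights are candidates for removal, which corresponds to the dimension reduction reported in the abstract.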

TOWARDS AN AUTOMATIC DIAGNOSIS SYSTEM FOR LUMBAR DISC HERNIATION: THE SIGNIFICANCE OF LOCAL SUBSET FEATURE SELECTION

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237218500448 ◽

2018 ◽

Vol 30 (06) ◽

pp. 1850044 ◽

Cited By ~ 1

Author(s):

Elias Ebrahimzadeh ◽

Farahnaz Fayaz ◽

Mehran Nikravan ◽

Fereshteh Ahmadi ◽

Mohammadjavad Rahimi Dolatabad

Keyword(s):

Feature Selection ◽

Lumbar Disc Herniation ◽

Disc Herniation ◽

Nearest Neighbor ◽

Lumbar Disc ◽

Support Vector ◽

K Nearest Neighbor ◽

Daily Lives ◽

Automatic Diagnosis ◽

Cad System

Herniation in the lumbar area is one of the most common diseases, resulting in lower back pain (LBP) that causes discomfort and inconvenience in patients’ daily lives. A computer aided diagnosis (CAD) system can be of immense benefit, as it generates diagnostic results within a short time while increasing the precision of diagnosis and eliminating human errors. We propose a new method for automatic diagnosis of lumbar disc herniation based on clinical MRI data, using T2-W sagittal and myelograph images. The method has been applied to 30 clinical cases, each containing 7 discs (210 lumbar discs in total). We employ the Otsu thresholding method to extract the spinal cord from MR images of the lumbar discs. A third-order polynomial is then fitted to the extracted spinal cords, and by the end of the preprocessing stage all the T2-W sagittal images have been prepared for specifying disc boundaries and labeling. Having extracted an ROI for each disc, we use intensity and shape features for classification. The extracted features are selected by Local Subset Feature Selection. The results show 91.90%, 92.38%, and 95.23% accuracy for artificial neural network, K-nearest neighbor, and support vector machine (SVM) classifiers respectively, indicating the superiority of the proposed method over those reported in similar studies.
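
The Otsu thresholding step mentioned in this abstract can be sketched as follows. This is a minimal pure-Python version that assumes a flat list of integer grey levels in the range 0 to 255. The pixel values are hypothetical, and the authors' preprocessing of the MR images is not reproduced.

```python
def otsu_threshold(pixels, levels=256):
    """Threshold t maximising between-class variance; pixels <= t are class 0."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their grey levels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical intensities: a dark background and a bright structure.
pixels = [10, 12, 11, 13, 200, 205, 198, 202]
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]
print(t, mask)
```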

Feature Selection Algorithm for Hyperlipidemia Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.701-702.110 ◽

2014 ◽

Vol 701-702 ◽

pp. 110-113

Author(s):

Qi Rui Zhang ◽

He Xian Wang ◽

Jiang Wei Qin

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Information Gain ◽

Classification Systems ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Document Frequency ◽

Selection Algorithms ◽

Term Weights

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three feature selection methods were evaluated: document frequency (DF), information gain (IG), and the χ² statistic (CHI). The classification systems represent each document as a vector and use tfidfie (term frequency, inverted document frequency, and inverted entropy) to compute term weights. To compare the effectiveness of feature selection, we used three classification methods: Naïve Bayes (NB), k Nearest Neighbor (kNN), and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for large text classification tasks.
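
The χ² statistic used for term selection can be written from a 2×2 contingency table of document counts. The sketch below uses the common text-categorisation form of the statistic. The argument names and the counts are hypothetical and are not taken from the paper's data set.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


# Hypothetical counts: one informative term and one uninformative term.
print(chi_square(40, 10, 10, 40))
print(chi_square(25, 25, 25, 25))
```

A term that occurs equally often inside and outside the class scores 0, so ranking terms by this value and keeping the top ones discards terms that carry no class information.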

An Ensemble-Based Feature Selection and Classification of Gene Expression using Support Vector Machine, K-Nearest Neighbor, Decision Tree

2019 International Conference on Communication and Electronics Systems (ICCES) ◽

10.1109/icces45898.2019.9002041 ◽

2019 ◽

Author(s):

Anu J Nair ◽

Rizwana Rasheed ◽

KM Maheeshma ◽

LS Aiswarya ◽

K R Kavitha

Keyword(s):

Gene Expression ◽

Support Vector Machine ◽

Feature Selection ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor

Product Review Based Customer Sentiment Analysis using an Ensemble of mRMR and Forest Optimization Algorithm (FOA)

International Journal of Applied Metaheuristic Computing ◽

10.4018/ijamc.2022010107 ◽

2022 ◽

Vol 13 (1) ◽

pp. 0-0

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Optimization Algorithm ◽

Nearest Neighbor ◽

Hybrid Approach ◽

Support Vector ◽

K Nearest Neighbor ◽

Feature Selection Technique ◽

Feature Selection Problem

This research addresses the feature selection problem for sentiment classification using an ensemble-based classifier. It proposes a hybrid feature selection approach that combines the minimum redundancy maximum relevance (mRMR) technique with the Forest Optimization Algorithm (FOA), i.e. mRMR-FOA. Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 different classification datasets publicly available in the UCI machine learning repository. Classifiers such as k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes (NB) were combined in an ensemble-based algorithm on these datasets. mRMR-FOA was applied to Blitzer’s dataset (customer reviews of electronic products) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The results are further enhanced by the ensemble of k-NN, NB, and SVM, which reaches an accuracy of 88.47% on the sentiment classification task.
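
The abstract does not say how the k-NN, NB, and SVM outputs are combined, so the following sketch assumes simple majority voting, which is one common way to build such an ensemble. The prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical predictions from three classifiers on five reviews.
knn = ["pos", "neg", "pos", "neg", "pos"]
nb = ["pos", "pos", "neg", "neg", "pos"]
svm = ["neg", "neg", "pos", "neg", "pos"]
combined = majority_vote([knn, nb, svm])
print(combined)
```

With three voters and two classes there is never a tie, so the combined label is always the one at least two classifiers agree on.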

FWHT-RF: A Novel Computational Approach to Predict Plant Protein-Protein Interactions via an Ensemble Learning Method

Scientific Programming ◽

10.1155/2021/1607946 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jie Pan ◽

Li-Ping Li ◽

Chang-Qing Yu ◽

Zhu-Hong You ◽

Zhong-Hao Ren ◽

...

Keyword(s):

Protein Interactions ◽

Nearest Neighbor ◽

Protein Sequences ◽

Evolutionary Information ◽

Support Vector ◽

Protein Protein Interactions ◽

K Nearest Neighbor ◽

Novel Approach ◽

Knn Classifier ◽

Scoring Matrix

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes. Although high-throughput techniques have produced valuable information for identifying PPIs in plants, they are usually expensive, inefficient, and extremely time-consuming. Hence, there is an urgent need to develop novel computational methods to predict PPIs in plants. In this article, we propose a novel approach that predicts PPIs in plants using only protein sequence information. Specifically, plant protein sequences are first converted into a position-specific scoring matrix (PSSM); then, the fast Walsh–Hadamard transform (FWHT) algorithm is used to extract feature vectors from the PSSM to obtain evolutionary information about plant proteins. Lastly, a rotation forest (RF) classifier is trained for prediction and produces a series of evaluation results. We named this approach FWHT-RF because FWHT and RF are used for feature extraction and classification, respectively. When FWHT-RF was applied to three plant PPI datasets, Maize, Rice, and Arabidopsis thaliana (Arabidopsis), its average accuracies under 5-fold cross validation reached 95.20%, 94.42%, and 83.85%, respectively. To further evaluate the predictive power of FWHT-RF, we compared it with the state-of-the-art support vector machine (SVM) and K-nearest neighbor (KNN) classifiers in different aspects. The experimental results demonstrate that FWHT-RF can be a useful supplementary method for predicting potential PPIs in plants.
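
The fast Walsh–Hadamard transform itself is a short butterfly algorithm. The sketch below computes the unnormalised transform in natural (Hadamard) order for an input whose length is a power of two. How the authors apply it to the PSSM is not described in the abstract, so the input vector here is only a generic example.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


coeffs = fwht([1, 0, 1, 0, 0, 1, 1, 0])
print(coeffs)
```

Applying the transform twice returns the original vector scaled by its length, which is a quick sanity check for an implementation.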

A comparison of the analysis of methods for feature extraction and classification by Wavelet transform in SSVEP BCIs

10.21203/rs.3.rs-82008/v2 ◽

2021 ◽

Author(s):

Hoda Heidari ◽

Zahra Einalou ◽

Mehrdad Dadgostar ◽

Hamidreza Hosseinzadeh

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Wavelet Transform ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Wide Range ◽

Vector Machines ◽

New Feature

Most studies in the field of Brain-Computer Interface (BCI) based on electroencephalography have a wide range of applications. Extracting the Steady State Visual Evoked Potential (SSVEP) is regarded as one of the most useful tools in BCI systems. In this study, different methods were compared across the whole signal processing stream, including 1) feature extraction with different spectral methods (Shannon entropy, skewness, kurtosis, mean, variance) and wavelet transform magnitude, 2) feature selection by various methods (decision tree, principal component analysis (PCA), t-test, Wilcoxon, receiver operating characteristic (ROC)), and 3) classification with k-nearest neighbor (k-NN), support vector machines (SVM), Bayesian, and multilayer perceptron (MLP) classifiers. Combining these methods gave an overview of the accuracy of classical methods. In addition, the present study relied on a rather new feature selection approach based on decision tree and PCA for BCI-SSVEP systems. Finally, the accuracies were calculated on the four recorded frequencies, which represent four directions: right, left, up, and down. The highest accuracy obtained was 91.39%.
