Genetic Algorithm Ensemble Filter Methods on Kidney Disease Classification

Kidney failure affects the whole body and can lead to a series of serious illnesses and even death. Machine learning plays an important role in disease classification, offering high accuracy and shorter processing time than clinical lab tests. The Chronic Kidney Disease (CKD) clinical dataset has 24 attributes, which is considered too many. To improve classification performance, filter feature selection methods are used to reduce the dimensionality of the feature set, and an ensemble step then takes the union of the features selected by each filter. The filter methods implemented in this research are Information Gain (IG), Chi-Square, ReliefF and Fisher Score. A Genetic Algorithm (GA) is then used to select the best subset from the ensemble result. Random Forest (RF), XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Naïve Bayes (NB) classifiers were used to diagnose CKD. The selected feature subsets differ and are specialised for each classifier. Removing irrelevant features through filter feature selection reduces the burden and computational cost for the genetic algorithm, which can then perform better and select the subset that improves classifier performance with fewer attributes. The accuracy of RF, XGBoost, KNN and SVM reaches 100% and that of NB reaches 99.17%. The proposed method improves classifier performance while using fewer features than previous work.
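
The two-stage idea described above (union of filter selections, then a genetic search over that union) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature names, the filter scores, the toy fitness function and all GA settings (population size, truncation selection, one-point crossover, mutation rate) are assumptions. In practice the fitness would be a classifier's cross-validated accuracy on the candidate subset.

```python
import random

def top_k(scores, k):
    """Return the names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def union_of_filters(filter_scores, k):
    """Union of the top-k features chosen by each filter method."""
    selected = set()
    for scores in filter_scores.values():
        selected |= top_k(scores, k)
    return sorted(selected)

def genetic_select(features, fitness, pop_size=20, generations=30,
                   mutation_rate=0.1, seed=0):
    """Tiny GA over bit masks; returns the best feature subset found."""
    rng = random.Random(seed)
    n = len(features)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness([f for f, bit in zip(features, mask) if bit])

    best = max(population, key=score)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]            # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)              # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit with probability mutation_rate
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
        best = max(population + [best], key=score)   # keep the best seen so far
    return [f for f, bit in zip(features, best) if bit]

# Hypothetical filter scores for five made-up features.
filter_scores = {
    "info_gain": {"f1": 0.9, "f2": 0.1, "f3": 0.7, "f4": 0.2, "f5": 0.05},
    "chi_square": {"f1": 0.8, "f2": 0.6, "f3": 0.1, "f4": 0.3, "f5": 0.02},
}
candidates = union_of_filters(filter_scores, k=2)    # ['f1', 'f2', 'f3']

# Toy fitness: reward two "useful" features, penalise subset size.
def toy_fitness(subset):
    return len({"f1", "f3"} & set(subset)) - 0.1 * len(subset)

best_subset = genetic_select(candidates, toy_fitness)
```

The filters shrink the search space before the GA runs, which is the cost saving the abstract refers to: the GA explores masks over the union only, not over all 24 attributes.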


Feature Selection Algorithm for Hyperlipidemia Classification

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.701-702.110 ◽

2014 ◽

Vol 701-702 ◽

pp. 110-113

Author(s):

Qi Rui Zhang ◽

He Xian Wang ◽

Jiang Wei Qin

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Information Gain ◽

Classification Systems ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Document Frequency ◽

Selection Algorithms ◽

Term Weights

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three feature selection methods were evaluated: document frequency (DF), information gain (IG) and a χ² statistic (CHI). The classification systems represent each document as a vector and use tfidfie (term frequency, inverted document frequency, and inverted entropy) to compute term weights. To compare the effectiveness of the feature selection methods, we used three classifiers: Naïve Bayes (NB), k Nearest Neighbor (kNN) and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for large text classification tasks.
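
As a rough illustration of the three scoring functions named above, the sketch below computes DF, χ² and IG for a single term against a single category from a 2x2 contingency table. The table layout (a, b, c, d) and the example counts are assumptions made for this sketch; the paper's own formulation for multiple categories may differ.

```python
import math

def document_frequency(a, b, c, d):
    """Number of documents that contain the term."""
    return a + b

def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category table.
    a: in category, term present    b: outside category, term present
    c: in category, term absent     d: outside category, term absent
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(a, b, c, d):
    """Reduction in class entropy after observing term presence/absence."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_present = entropy([a, b]) if a + b else 0.0
    h_absent = entropy([c, d]) if c + d else 0.0
    return h_class - (a + b) / n * h_present - (c + d) / n * h_absent

# A term that perfectly separates 5 in-category from 5 out-of-category documents.
print(chi_square(5, 0, 0, 5), information_gain(5, 0, 0, 5))
# A term that is independent of the category.
print(chi_square(5, 5, 5, 5), information_gain(5, 5, 5, 5))
```

DF ignores the class labels entirely, which is one plausible reason a label-aware score such as IG or χ² can rank terms better for classification.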


Feature selection for human membrane protein type classification using filter methods

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v8.i4.pp375-381 ◽

2019 ◽

Vol 8 (4) ◽

pp. 375

Author(s):

Glenda Anak Kaya ◽

Nor Ashikin Mohamad Kamal

Keyword(s):

Feature Selection ◽

Membrane Protein ◽

Nearest Neighbor ◽

Protein Sequences ◽

Support Vector ◽

K Nearest Neighbor ◽

Specificity And Sensitivity ◽

Filter Methods ◽

Correlation Based Feature Selection ◽

Membrane Protein Type

As the number of protein sequences in the database is increasing, effective and efficient techniques are needed to make these data meaningful. These protein sequences contain redundant and irrelevant features that lower classification accuracy and increase the running time of the computational algorithm. In this paper, we select the best features using the Minimum Redundancy Maximum Relevance (mRMR) and Correlation-based Feature Selection (CFS) methods. Two datasets of human membrane proteins, S1 and S2, are used. After the features have been selected by mRMR and CFS, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers are used to classify these membrane proteins. The performance of these techniques is measured using accuracy, specificity, sensitivity and F-measure. The proposed algorithm achieved 76% accuracy for S1 and 73% accuracy for S2. Finally, the proposed methods give competitive results compared with previous work on membrane protein classification.
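
The greedy selection behind mRMR can be sketched in a few lines. Note the substitution: mRMR is usually defined with mutual information, while this sketch uses absolute Pearson correlation as a stand-in for both relevance and redundancy so that it stays dependency-free. The feature names and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def mrmr(features, target, k):
    """Greedily pick k features, maximising relevance minus mean redundancy."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = abs(pearson(features[name], target))
            if not selected:
                return relevance
            redundancy = sum(abs(pearson(features[name], features[s]))
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: "a_dup" is a scaled copy of "a", so it adds no new information.
target = [0, 0, 0, 1, 1, 1]
features = {
    "a": [0, 0, 1, 1, 1, 1],
    "a_dup": [0, 0, 2, 2, 2, 2],
    "b": [0, 1, 0, 1, 0, 1],
}
print(mrmr(features, target, k=2))
```

After one of the two duplicated features is picked, the other is penalised by its redundancy and the weaker but independent feature "b" is chosen second.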


Analysis of gabor filter based features with PCA and GA for the detection of drusen in fundus images

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.8969 ◽

2018 ◽

Vol 7 (1) ◽

pp. 115

Author(s):

Sheela N. ◽

Basavaraj L.

Keyword(s):

Genetic Algorithm ◽

Nearest Neighbor ◽

Gabor Filter ◽

Age Related Macular Degeneration ◽

Misclassification Rate ◽

Support Vector ◽

K Nearest Neighbor ◽

Automated Method ◽

Age Related ◽

Predictive Rate

The human eye can be affected by different types of diseases. Age-Related Macular Degeneration (AMD) is one such disease, and it mainly occurs after 50 years of age. It is characterized by the occurrence of yellow spots called drusen. In this work, an automated method for the detection of drusen in fundus images has been developed and tested on 70 images, consisting of 30 normal images and 40 images with drusen. The performance of Support Vector Machine (SVM) and K Nearest Neighbor (KNN) classifiers has been evaluated with data reduction using Principal Component Analysis (PCA) and data selection using a Genetic Algorithm (GA). Performance has been evaluated in terms of accuracy, sensitivity, specificity, misclassification rate, positive predictive rate, negative predictive rate and Youden’s Index. The proposed method achieved its highest accuracy of 98.7% when data selection using the Genetic Algorithm was applied.
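
The evaluation measures listed above all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts in the example are hypothetical (chosen only to match a 40 positive / 30 negative split) and are not the paper's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic measures from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "misclassification_rate": 1 - accuracy,
        "positive_predictive_rate": tp / (tp + fp),
        "negative_predictive_rate": tn / (tn + fn),
        "youden_index": sensitivity + specificity - 1,
    }

# Hypothetical counts: 40 images with drusen, 30 normal images.
metrics = diagnostic_metrics(tp=38, fp=1, tn=29, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```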


A Comparison of the Analysis of Methods for Feature Extraction and Classification by Wavelet Transform in SSVEP BCIs

10.21203/rs.3.rs-82008/v1 ◽

2020 ◽

Author(s):

Hoda Heidari ◽

Zahra Einalou ◽

Mehrdad Dadgostar ◽

Hamidreza Hosseinzadeh

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Wavelet Transform ◽

Decision Tree ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Iir Filters ◽

Wide Range ◽

New Feature

Most studies in the field of Brain-Computer Interface (BCI) based on electroencephalography have a wide range of applications. Extracting the Steady State Visual Evoked Potential (SSVEP) is regarded as one of the most useful tools in BCI systems. This study compares methods across the whole signal processing chain: feature extraction with different spectral methods (Shannon entropy, skewness, kurtosis, mean, variance) (bank of filters, narrow-band IIR filters, and wavelet transform magnitude); feature selection by various methods (decision tree, principal component analysis (PCA), t-test, Wilcoxon, receiver operating characteristic (ROC)); and classification with k nearest neighbor (k-NN), perceptron, support vector machines (SVM), Bayesian, and multilayer perceptron (MLP) classifiers. Combining these methods gave an overview of the accuracy of classical methods. In addition, the study relied on a relatively new feature selection approach based on decision tree and PCA, which is used for BCI-SSVEP systems. Finally, the accuracies were calculated for the four recorded frequencies, which represent the four directions right, left, up, and down.
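
The five descriptors named in the feature extraction step can be computed for one signal window as sketched below. The choices here are assumptions for illustration: population (not sample) moments, non-excess kurtosis, and Shannon entropy taken over an amplitude histogram with an arbitrary number of bins. The study may define them differently.

```python
import math
from collections import Counter

def statistical_features(signal, bins=8):
    """Mean, variance, skewness, kurtosis and histogram entropy of a window."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(variance)
    skewness = sum((x - mean) ** 3 for x in signal) / (n * std ** 3) if std else 0.0
    kurtosis = sum((x - mean) ** 4 for x in signal) / (n * std ** 4) if std else 0.0

    # Shannon entropy (bits) of the amplitude histogram.
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0       # avoid zero width for a flat signal
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in signal)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())

    return {"mean": mean, "variance": variance, "skewness": skewness,
            "kurtosis": kurtosis, "entropy": entropy}

# Example: one second of a synthetic 10 Hz sine sampled at 250 Hz.
window = [math.sin(2 * math.pi * 10 * t / 250) for t in range(250)]
print(statistical_features(window))
```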


Recognition of Common Non-Normal Walking Actions Based on Relief-F Feature Selection and Relief-Bagging-SVM

Sensors ◽

10.3390/s20051447 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1447

Author(s):

Pan Huang ◽

Yanping Li ◽

Xiaoyi Lv ◽

Wen Chen ◽

Shuxian Liu

Keyword(s):

Feature Selection ◽

Action Recognition ◽

Nearest Neighbor ◽

Health Indicators ◽

Support Vector ◽

Normal Walking ◽

K Nearest Neighbor ◽

Recognition Algorithms ◽

Medical Health ◽

Improved Algorithm

Action recognition algorithms are widely used in the fields of medical health and pedestrian dead reckoning (PDR). The classification and recognition of non-normal walking actions and normal walking actions are very important for improving the accuracy of medical health indicators and PDR steps. Existing motion recognition algorithms focus on the recognition of normal walking actions, and the recognition of non-normal walking actions common to daily life is incomplete or inaccurate, resulting in a low overall recognition accuracy. This paper proposes a microelectromechanical system (MEMS) action recognition method based on Relief-F feature selection and relief-bagging-support vector machine (SVM). Feature selection using the Relief-F algorithm reduces the dimensions by 16 and reduces the optimization time by an average of 9.55 s. Experiments show that the improved algorithm for identifying non-normal walking actions has an accuracy of 96.63%; compared with Decision Tree (DT), it increased by 11.63%; compared with k-nearest neighbor (KNN), it increased by 26.62%; and compared with random forest (RF), it increased by 11.63%. The average Area Under Curve (AUC) of the improved algorithm improved by 0.1143 compared to KNN, by 0.0235 compared to DT, and by 0.04 compared to RF.
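
To show how Relief-style weighting ranks features before dimension reduction, here is a simplified sketch. It implements the basic two-class Relief update with a single nearest hit and nearest miss, not the full Relief-F used in the paper (which averages over k neighbours and handles multiple classes). Features are assumed to be scaled to [0, 1], and the toy data are made up.

```python
def relief_weights(X, y):
    """Basic Relief: a feature gains weight when it differs on the nearest
    sample of another class (miss) and loses weight when it differs on the
    nearest sample of the same class (hit)."""
    n, d = len(X), len(X[0])
    weights = [0.0] * d

    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))   # Manhattan distance

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            weights[f] += (abs(X[i][f] - X[miss][f])
                           - abs(X[i][f] - X[hit][f])) / n
    return weights

# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
keep = [f for f, w in enumerate(weights) if w > 0]   # drop non-positive weights
print(weights, keep)
```

Dropping low-weight features before the classifier is trained is what shortens the later optimization step.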


TOWARDS AN AUTOMATIC DIAGNOSIS SYSTEM FOR LUMBAR DISC HERNIATION: THE SIGNIFICANCE OF LOCAL SUBSET FEATURE SELECTION

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237218500448 ◽

2018 ◽

Vol 30 (06) ◽

pp. 1850044 ◽

Cited By ~ 1

Author(s):

Elias Ebrahimzadeh ◽

Farahnaz Fayaz ◽

Mehran Nikravan ◽

Fereshteh Ahmadi ◽

Mohammadjavad Rahimi Dolatabad

Keyword(s):

Feature Selection ◽

Lumbar Disc Herniation ◽

Disc Herniation ◽

Nearest Neighbor ◽

Lumbar Disc ◽

Support Vector ◽

K Nearest Neighbor ◽

Daily Lives ◽

Automatic Diagnosis ◽

Cad System

Herniation in the lumbar area is one of the most common diseases and results in lower back pain (LBP), causing discomfort and inconvenience in patients’ daily lives. A computer aided diagnosis (CAD) system can be of immense benefit, as it generates diagnostic results within a short time while increasing the precision of diagnosis and eliminating human errors. We propose a new method for automatic diagnosis of lumbar disc herniation based on clinical MRI data. We use T2-W sagittal and myelograph images. The method has been applied to 30 clinical cases, each containing 7 discs (210 lumbar discs in total), for herniation diagnosis. We employ the Otsu thresholding method to extract the spinal cord from MR images of the lumbar discs. A third order polynomial is then aligned on the extracted spinal cords, and by the end of the preprocessing stage all the T2-W sagittal images have been prepared for specifying disc boundaries and labeling. Having extracted an ROI for each disc, we use intensity and shape features for classification. The extracted features are selected by Local Subset Feature Selection. The results showed 91.90%, 92.38% and 95.23% accuracy for artificial neural network, K-nearest neighbor and support vector machine (SVM) classifiers respectively, indicating the superiority of the proposed method over those in similar studies.
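
Otsu thresholding, used above for spinal cord extraction, picks the grey level that maximises the between-class variance of the intensity histogram. The sketch below is a plain-Python version on a flat list of integer pixel values; the 256-level range and the synthetic two-intensity example are assumptions, and it says nothing about how the authors preprocessed their MR images.

```python
def otsu_threshold(pixels, levels=256):
    """Return t such that pixels <= t form the background class and the
    between-class variance is maximal."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue                      # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                         # everything is background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

# Synthetic "image": 50 dark pixels and 50 bright pixels.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]            # True marks the bright structure
print(t, sum(mask))
```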


Plant disease prediction using classification algorithms

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp257-264 ◽

2021 ◽

Vol 10 (1) ◽

pp. 257

Author(s):

Maria Morgan ◽

Carla Blank ◽

Raed Seetan

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Nearest Neighbor ◽

Disease Classification ◽

Future Research ◽

Support Vector ◽

Classification Algorithms ◽

Disease Prediction ◽

K Nearest Neighbor ◽

Artificial Neural

This paper investigates the capability of six existing classification algorithms (Artificial Neural Network, Naïve Bayes, k-Nearest Neighbor, Support Vector Machine, Decision Tree and Random Forest) in classifying and predicting diseases in soybean and mushroom datasets using datasets with numerical or categorical attributes. While many similar studies have been conducted on datasets of images to predict plant diseases, the main objective of this study is to suggest classification methods that can be used for disease classification and prediction in datasets that contain raw measurements instead of images. A fungus and a plant dataset, which had many differences, were chosen so that the findings in this paper could be applied to future research for disease prediction and classification in a variety of datasets which contain raw measurements. A key difference between the two datasets, other than one being a fungus and one being a plant, is that the mushroom dataset is balanced and only contained two classes while the soybean dataset is imbalanced and contained eighteen classes. All six algorithms performed well on the mushroom dataset, while the Artificial Neural Network and k-Nearest Neighbor algorithms performed best on the soybean dataset. The findings of this paper can be applied to future research on disease classification and prediction in a variety of dataset types such as fungi, plants, humans, and animals.


Feature Selection and K-nearest Neighbor for Diagnosis Cow Disease

International journal of science, engineering, and information technology ◽

10.21107/ijseit.v5i02.10218 ◽

2021 ◽

Vol 5 (02) ◽

pp. 249-253

Author(s):

Yeni Kustiyahningsih

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Disease Classification ◽

Training Data ◽

Test Results ◽

K Nearest Neighbor ◽

Data Set ◽

Cattle Disease ◽

Cattle Diseases ◽

Cattle Breeders

A large cattle population increases the potential for cattle disease to develop. Lack of knowledge about the various cattle diseases and how to handle them is one cause of decreasing cow productivity. The aim of this research is to classify cattle disease quickly and accurately, to help cattle breeders detect and handle disease sooner. This study uses the K-Nearest Neighbour (KNN) classification method with F-Score feature selection. KNN classifies disease based on the distance between training data and test data, while F-Score feature selection reduces the attribute dimensions to obtain the relevant attributes. The data set contains cattle disease data from Madura, with 350 records, 21 features and 7 classes. The data were split using K-fold cross-validation with k = 5. Based on the test results, the best accuracy was obtained with 18 features and KNN (k = 3), which gave an accuracy of 94.28571%, a recall of 0.942857 and a precision of 0.942857.
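
The evaluation protocol described above (KNN with k = 3, scored by 5-fold cross-validation) can be sketched as follows. The Euclidean distance, the interleaved fold assignment and the two-cluster toy data are assumptions for illustration; the F-Score feature selection step is omitted, and the cattle data set is not reproduced here.

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], query))
    nearest = sorted(range(len(train_X)), key=sq_dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def k_fold_accuracy(X, y, folds=5, k=3):
    """Accuracy over all samples, each predicted while held out of training."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k) == y[i]:
                correct += 1
    return correct / n

# Toy data: two well-separated clusters, interleaved so every fold sees both.
X, y = [], []
for i in range(10):
    X.append([i * 0.1, 0.0]); y.append("A")
    X.append([5.0 + i * 0.1, 0.0]); y.append("B")

print(k_fold_accuracy(X, y, folds=5, k=3))
```

With real data the samples would normally be shuffled (or stratified by class) before being assigned to folds.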


Effect of information gain on document classification using k-nearest neighbor

10.26594/register.v8i1.2397 ◽

2022 ◽

Vol 8 (1) ◽

pp. 50

Author(s):

Rifki Indra Perwira ◽

Bambang Yuwono ◽

Risya Ines Putri Siswoyo ◽

Febri Liantoni ◽

Hidayatulah Himawan

Keyword(s):

Feature Selection ◽

Test Data ◽

Nearest Neighbor ◽

Intelligent System ◽

Information Gain ◽

Training Data ◽

State Universities ◽

Features Selection ◽

K Nearest Neighbor ◽

Support Students

State universities have a library as a facility to support students’ education and science, which contains various books, journals, and final assignments. An intelligent system for classifying documents is needed to make things easier for library visitors in higher education, as a form of service to students. The documents in the library are generally the result of research. The main reasons a classification system is needed are complaints about the imbalance of text data and categories, based on irrelevant document titles and words with ambiguous meaning when searching for documents. This research uses k-Nearest Neighbor (k-NN) to categorize documents by study interest, with information gain feature selection to handle unbalanced data and cosine similarity to measure the distance between test and training data. In tests conducted with 276 training data, the best result with information gain feature selection, using 80% training data and 20% test data, was an accuracy of 87.5% with k=5. The highest accuracy of 92.9% was achieved without information gain feature selection, with 90% training data and 10% test data and k=5, 7, and 9. This paper concludes that the system is more accurate without information gain feature selection than with it, because every word in the document title is considered to play an essential role in forming the classification.
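
Cosine similarity between two document titles can be sketched with simple term-count vectors. The whitespace tokenisation, lowercasing and example titles are assumptions made here; the paper may weight terms differently (for example with TF-IDF) before measuring similarity.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical titles: the first pair shares most terms, the second none.
print(cosine_similarity("document classification using knn",
                        "text classification using knn"))
print(cosine_similarity("document classification", "image segmentation"))
```

In a k-NN classifier the k training titles with the highest similarity to the query title vote on its category, so removing words through feature selection changes which neighbours are found.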
