Adaptive Multi-level Backward Tracking for Sequential Feature Selection

In the past few decades, the large amount of available data has become a major challenge in data mining and machine learning. Feature selection is a significant preprocessing step for selecting the most informative features by removing irrelevant and redundant features, especially for large datasets. These selected features play an important role in information searching and enhancing the performance of machine learning models. In this research, we propose a new technique called One-level Forward Multi-level Backward Selection (OFMB). The proposed algorithm consists of two phases. The first phase aims to create preliminarily selected subsets. The second phase provides an improvement on the previous result by an adaptive multi-level backward searching technique. Hence, the idea is to apply an improvement step during the feature addition and an adaptive search method on the backtracking step. We have tested our algorithm on twelve standard UCI datasets based on k-nearest neighbor and naive Bayes classifiers. Their accuracy was then compared with some popular methods. OFMB showed better results than the other sequential forward searching techniques for most of the tested datasets.
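The abstract does not spell out the steps of OFMB, so the following is only a minimal sketch of the general idea it describes: add one feature at a time, then look back over the earlier picks and drop up to a few of them if that improves the score. The function name `forward_backward_select`, the `max_back` parameter, and `toy_score` are all hypothetical; in a wrapper setting the score would typically be a classifier's validation accuracy.

```python
from itertools import combinations


def forward_backward_select(n_features, score, max_back=2):
    """Greedy forward selection with a bounded backward check (illustrative only)."""
    selected = []
    best = score(selected)
    while True:
        # Forward step: add the single feature that improves the score the most.
        best_f = None
        for f in range(n_features):
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best, best_f = s, f
        if best_f is None:
            break  # no single addition helps, so stop
        selected.append(best_f)

        # Backward step: try dropping 1..max_back of the earlier features
        # (never the one just added) and accept the first improvement found.
        accepted = False
        for level in range(1, max_back + 1):
            for drop in combinations(selected[:-1], level):
                trial = [f for f in selected if f not in drop]
                s = score(trial)
                if s > best:
                    best, selected, accepted = s, trial, True
                    break
            if accepted:
                break
    return selected, best


def toy_score(subset):
    """Hypothetical score: features 0 and 2 work well together, 1 is a decoy."""
    s = set(subset)
    value = 0.3 * len(s & {0, 2}) - 0.05 * len(s)
    if 0 in s and 2 in s:
        value += 1.0
    elif 1 in s:
        value += 0.5
    return value


selected, best = forward_backward_select(4, toy_score)
print(sorted(selected), round(best, 2))
```

With this toy score the forward pass first picks the decoy feature 1, and the backward check later removes it once features 0 and 2 are both present, which is the kind of correction a purely forward search cannot make.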


Predicting heart failure using a wrapper-based feature selection

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v21.i3.pp1530-1539 ◽

2021 ◽

Vol 21 (3) ◽

pp. 1530

Author(s):

Minh Tuan Le ◽

Minh Thanh Vo ◽

Nhat Tan Pham ◽

Son V.T Dao

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Nearest Neighbor ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Medical Practitioners ◽

Machine Learning Model ◽

Heart Contraction ◽

Artificial Neural Network Ann

In the current health system, it is very difficult for medical practitioners/physicians to assess the effectiveness of heart contraction. In this research, we propose a machine learning model to predict heart contraction using an artificial neural network (ANN). We also propose a novel wrapper-based feature selection method that uses grey wolf optimization (GWO) to reduce the number of required input attributes. We compare the results of our method with those of several conventional machine learning algorithms: support vector machine, decision tree, K-nearest neighbor, naïve Bayes, random forest, and logistic regression. Computational results show not only that far fewer features are needed, but also that a higher prediction accuracy, around 87%, can be achieved. This work has the potential to be applied in clinical practice and to become a supporting tool for doctors/physicians.


Feature selection via computational intelligence techniques

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189090 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6205-6216

Author(s):

Ramazan Algin ◽

Ali Fuat Alkaya ◽

Mustafa Agaoglu

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Naive Bayes ◽

Search Algorithms ◽

Naïve Bayes ◽

Feature Subset ◽

K Nearest Neighbor ◽

Migrating Birds Optimization

Feature selection (FS) has become an essential task in overcoming high-dimensional and complex machine learning problems. FS is a process used for reducing the size of the dataset by separating or extracting unnecessary and unrelated properties from it. This process improves the performance of classification algorithms and reduces the evaluation time by enabling the use of small datasets with useful features during the classification process. FS aims to obtain a minimal feature subset in a problem domain while retaining the accuracy of the original data. In this study, four computational intelligence techniques, namely migrating birds optimization (MBO), simulated annealing (SA), differential evolution (DE) and particle swarm optimization (PSO), are implemented as search algorithms for the FS problem and compared on 17 well-known datasets taken from the UCI Machine Learning Repository, whose dimensions vary from 4 to 500. This is the first time that MBO has been applied to the FS problem. To judge the quality of the subsets generated by the search algorithms, two different subset evaluation methods are implemented in this study: probabilistic consistency-based FS (PCFS) and correlation-based FS (CFS). The algorithms are compared using three well-known classifiers: k-nearest neighbor, naive Bayes and decision tree (C4.5). As a benchmark, the accuracy values obtained by the classifiers on the datasets with all features are used. The experimental results show that our MBO-based filter approach outperforms the other three approaches in terms of accuracy. The experiments also show that, as a subset evaluator, CFS outperforms PCFS and that, as a classifier, C4.5 gets better results than k-nearest neighbor and naive Bayes.
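The abstract names correlation-based FS (CFS) as a subset evaluator without giving its scoring rule. As a rough illustration, the sketch below computes the standard CFS merit heuristic, merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation. The function and argument names are hypothetical, and the sketch assumes the correlations have already been estimated; it is not taken from the paper.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a subset of k features, given its two mean correlations."""
    if k < 1:
        raise ValueError("subset must contain at least one feature")
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Four mutually uncorrelated features, each moderately predictive of the class.
independent = cfs_merit(4, 0.5, 0.0)
# Four fully redundant features: no better than a single one of them.
redundant = cfs_merit(4, 0.5, 1.0)
single = cfs_merit(1, 0.5, 0.0)
print(independent, redundant, single)
```

The heuristic rewards subsets whose features correlate with the class but not with each other, which is why the redundant subset scores the same as a single feature.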


YouTube Spam Comment Detection Using Support Vector Machine and K–Nearest Neighbor

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v12.i2.pp612-619 ◽

2018 ◽

Vol 12 (2) ◽

pp. 612 ◽

Cited By ~ 2

Author(s):

Aqliima Aziz ◽

Cik Feresa Mohd Foozy ◽

Palaniappan Shamala ◽

Zurinah Suradi

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Feature Selection ◽

Social Networking ◽

Nearest Neighbor ◽

Good Accuracy ◽

Support Vector ◽

Learning Tools ◽

K Nearest Neighbor ◽

Accuracy Result

Social networking sites such as YouTube and Facebook are very popular nowadays. One of YouTube's strengths is that users can subscribe to channels and give their opinions in the comment section. However, this attracts spammers, who flood the comments on videos. This study therefore develops a YouTube spam detection framework using Support Vector Machine (SVM) and K-Nearest Neighbor (k-NN). The research involves five phases: Data Collection, Pre-processing, Feature Selection, Classification and Detection. The experiments were carried out using Weka and RapidMiner. Both SVM and k-NN achieved good accuracy in both machine learning tools. Another way to avoid spam attacks is not to click links in comments.


E-Learning Recommender System for Learners: A Machine Learning based Approach

International Journal of Mathematical Engineering and Management Sciences ◽

10.33889/ijmems.2019.4.4-076 ◽

2019 ◽

Vol 4 (4) ◽

pp. 957-967

Author(s):

Kamika Chaudhary ◽

Neena Gupta

Keyword(s):

Machine Learning ◽

Random Forest ◽

Web Mining ◽

Nearest Neighbor ◽

Naive Bayes ◽

Naïve Bayes ◽

Second Phase ◽

Processing Unit ◽

K Nearest Neighbor ◽

E Learning

Web mining helps surfers get the information they require, but finding the exact information is like finding a needle in a haystack. In this work, an intelligent prediction model using the TensorFlow environment for Graphics Processing Unit (GPU) devices has been designed to meet the challenges of speed and accuracy. The proposed approach is divided into two phases: pre-processing and prediction. In the first phase, the procedure starts by searching the URLs of various e-learning sites specific to computer science subjects. The content of the retrieved URLs is then read and pre-processed, and keywords related to a particular subject are generated from it. The second phase is prediction, which predicts query-specific links of e-learning websites. The proposed Intelligent E-learning through Web (IEW) has content mining, lexical analysis, classification and machine learning based prediction as its key features. Algorithms such as SVM, Naïve Bayes, K-Nearest Neighbor, and Random Forest were tested, and Random Forest gave an accuracy of 98.98%, SVM 42%, KNN 63% and Naïve Bayes 66%. Based on these results, IEW uses Random Forest for prediction.


COVID-19 Pneumonia Level Detection using Deep Learning Algorithm

10.36227/techrxiv.12619193.v1 ◽

2020 ◽

Cited By ~ 2

Author(s):

Kayhan Ghafoor

Keyword(s):

Lung Inflammation ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Second Phase ◽

K Nearest Neighbor ◽

Deep Learning Algorithm ◽

Severe Stage ◽

Efficient Detection ◽

Two Phases

The first confirmed COVID-19 case was reported in Wuhan, China, and the disease spread across the globe with unprecedented impact on humanity. Since this pandemic requires pervasive diagnosis, it is important to develop smart, fast and efficient detection techniques. To this end, we developed an Artificial Intelligence (AI) engine to classify the lung inflammation level (mild, progressive, or severe stage) of confirmed COVID-19 patients. In particular, the developed model consists of two phases. In the first phase, we calculate the volume and density of lesions and opacities in the CT images of the confirmed COVID-19 patient using morphological approaches. The second phase classifies the pneumonia level of the confirmed COVID-19 patient. To achieve precise classification of lung inflammation, we use a modified Convolutional Neural Network (CNN) and k-Nearest Neighbor (kNN). The experimental results show that the models achieve accuracies of up to 95.65% for CNN and 91.304% for kNN.


K-Nearest Neighbor with K-Fold Cross Validation and Analytic Hierarchy Process on Data Classification

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i1.1204 ◽

2021 ◽

Vol 2 (1) ◽

Author(s):

Zoelkarnain Rinanda Tembusai ◽

Herman Mawengkang ◽

Muhammad Zarlis

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Analytic Hierarchy Process ◽

Cross Validation ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Analytic Hierarchy ◽

Machine Learning Model ◽

Hierarchy Process ◽

Fold Cross Validation

This study analyzes the performance of the k-Nearest Neighbor method, with k-Fold Cross Validation as the evaluation model and the Analytic Hierarchy Process as the feature selection method, for data classification, in order to obtain the best accuracy and machine learning model. The best test result was obtained in fold 3, with an accuracy of 95%. Evaluating the k-Nearest Neighbor model with k-Fold Cross Validation yields a good machine learning model, and the Analytic Hierarchy Process as feature selection also gives optimal results and can reduce the performance of the k-Nearest Neighbor method because it uses only the features selected according to their level of importance for decision making.
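To make the evaluation procedure concrete, here is a minimal pure-Python sketch of k-nearest neighbor classification scored with k-fold cross validation. It is not the study's implementation: the function names, the round-robin fold assignment, and the toy two-cluster data are all assumptions made for illustration, and the Analytic Hierarchy Process step is not modelled.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    top_labels = [train_y[i] for i in order[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def k_fold_accuracies(X, y, k_neighbours=3, n_folds=3):
    """Return one accuracy per fold; sample i is held out in fold i % n_folds."""
    accuracies = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k_neighbours) == y[i]
            for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical toy data: two well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8)]
y = [0] * 6 + [1] * 6

fold_scores = k_fold_accuracies(X, y)
print(fold_scores)
```

Reporting the per-fold accuracies, rather than only their mean, is what allows a statement such as "the best result was in fold 3".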


Genetic Algorithm to Optimize k-Nearest Neighbor Parameter for Benchmarked Medical Datasets Classification

Jurnal Online Informatika ◽

10.15575/join.v5i2.656 ◽

2020 ◽

Vol 5 (2) ◽

pp. 153

Author(s):

Rizki Tri Prasetio

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Feature Selection ◽

Nearest Neighbor ◽

Learning Algorithm ◽

P Value ◽

Computer Assisted ◽

K Nearest Neighbor ◽

Forward Selection ◽

Backward Elimination

Computer-assisted medical diagnosis is a major machine learning problem under active research. General classifiers learn from the data itself through a training process, due to the inexperience of an expert in determining parameters. This research proposes a methodology based on the machine learning paradigm that integrates the genetic algorithm, a search heuristic inspired by natural evolution, with k-nearest neighbor, the simplest and most widely used learning algorithm. The genetic algorithm was used for feature selection and parameter optimization, while k-nearest neighbor was used as the classifier. The proposed method was tested on five benchmark medical datasets from the University of California, Irvine Machine Learning Repository and compared with the original k-NN and other feature selection algorithms, i.e., forward selection, backward elimination and greedy feature selection. Experimental results show that the proposed method achieves good performance, with a significant improvement (t-test p value of 0.0011).


Comparative analysis of proficiencies of various textures and geometric features in breast mass classification using k-nearest neighbor

Visual Computing for Industry Biomedicine and Art ◽

10.1186/s42492-021-00100-1 ◽

2022 ◽

Vol 5 (1) ◽

Author(s):

Harmandeep Singh ◽

Vipul Sharma ◽

Damanpreet Singh

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Comparative Analysis ◽

Nearest Neighbor ◽

State Of The Art ◽

Breast Mass ◽

Geometric Features ◽

K Nearest Neighbor ◽

Breast Masses ◽

Selection Algorithms

This paper introduces a comparative analysis of the proficiencies of various textures and geometric features in the diagnosis of breast masses on mammograms. An improved machine learning-based framework was developed for this study. The proposed system was tested using 106 full field digital mammography images from the INbreast dataset, containing a total of 115 breast mass lesions. The proficiencies of individual and various combinations of computed textures and geometric features were investigated by evaluating their contributions towards attaining higher classification accuracies. Four state-of-the-art filter-based feature selection algorithms (Relief-F, Pearson correlation coefficient, neighborhood component analysis, and term variance) were employed to select the top 20 most discriminative features. The Relief-F algorithm outperformed other feature selection algorithms in terms of classification results by reporting 85.2% accuracy, 82.0% sensitivity, and 88.0% specificity. A set of nine most discriminative features were then selected, out of the earlier mentioned 20 features obtained using Relief-F, as a result of further simulations. The classification performances of six state-of-the-art machine learning classifiers, namely k-nearest neighbor (k-NN), support vector machine, decision tree, Naive Bayes, random forest, and ensemble tree, were investigated, and the obtained results revealed that the best classification results (accuracy = 90.4%, sensitivity = 92.0%, specificity = 88.0%) were obtained for the k-NN classifier with the number of neighbors having k = 5 and squared inverse distance weight. The key findings include the identification of the nine most discriminative features, that is, FD26 (Fourier Descriptor), Euler number, solidity, mean, FD14, FD13, periodicity, skewness, and contrast out of a pool of 125 texture and geometric features. The results revealed that the selected nine features can be used for the classification of breast masses in mammograms.
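The best-performing configuration reported above is k-NN with k = 5 and squared inverse distance weighting. The sketch below illustrates that voting rule in pure Python, assuming Euclidean distance and weights of 1/d²; the function name, the handling of zero distances, and the toy points are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=5):
    """k-NN vote where each neighbour counts with weight 1 / distance**2."""
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    # A neighbour at distance zero would have infinite weight, so it decides.
    for distance, label in neighbours:
        if distance == 0.0:
            return label
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance * distance)
    return max(votes, key=votes.get)


# Hypothetical toy points: two close "A" neighbours, three distant "B" ones.
train_X = [(1, 0), (-1, 0), (3, 0), (4, 0), (5, 0)]
train_y = ["A", "A", "B", "B", "B"]

prediction = weighted_knn_predict(train_X, train_y, (0, 0), k=5)
print(prediction)
```

In this toy case an unweighted majority vote over the five neighbours would pick "B" (three votes to two), while the squared inverse distance weights let the two close "A" points dominate.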


COVID-19 Pneumonia Level Detection using Deep Learning Algorithm

10.36227/techrxiv.12619193 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kayhan Ghafoor

Keyword(s):

Lung Inflammation ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Second Phase ◽

K Nearest Neighbor ◽

Deep Learning Algorithm ◽

Severe Stage ◽

Efficient Detection ◽

Two Phases


Accurate Detection and Quantization of Leaf- Diseases through Soft Computing

International Journal of Computational Physics Series ◽

10.29167/a1i1p236-247 ◽

2018 ◽

Vol 1 (1) ◽

pp. 236-247

Author(s):

Divya Srivastava ◽

Rajitha B. ◽

Suneeta Agarwal

Keyword(s):

Machine Learning ◽

Image Processing ◽

Agricultural Production ◽

Bacterial Blight ◽

Early Stage ◽

Second Phase ◽

Computationally Efficient ◽

Stage Of Disease ◽

Accurate Detection ◽

Two Phases

Diseases in leaves can cause a significant reduction in both the quality and quantity of agricultural production. If early and accurate detection of leaf diseases can be automated, the proper remedy can be applied in time. This paper presents a simple and computationally efficient approach for detecting diseases on leaves. Detecting a disease is of limited use without knowing its stage, so the paper also determines the stage of the disease by quantizing the affected area of the leaves using digital image processing and machine learning. Although a variety of leaf diseases exist, bacterial and fungal spots (Early Scorch, Late Scorch, and Leaf Spot) are the most prominent. With this in mind, the paper deals with the detection of Bacterial Blight and Fungal Spot at both an early stage (Early Scorch) and a late stage (Late Scorch) on a variety of leaves. The proposed approach is divided into two phases. In the first phase, it identifies one or more diseases present on the leaves. In the second phase, the amount of area affected by the disease is calculated. The experimental results showed 97% accuracy using the proposed approach.
