Adaptive Multi-level Backward Tracking for Sequential Feature Selection

2021 ◽  
Vol 15 (1) ◽  
pp. 1-20
Author(s):  
Knitchepon Chotchantarakun ◽  
Ohm Sornil

In the past few decades, the large amount of available data has become a major challenge in data mining and machine learning. Feature selection is a significant preprocessing step for selecting the most informative features by removing irrelevant and redundant features, especially for large datasets. These selected features play an important role in information searching and enhancing the performance of machine learning models. In this research, we propose a new technique called One-level Forward Multi-level Backward Selection (OFMB). The proposed algorithm consists of two phases. The first phase aims to create preliminarily selected subsets. The second phase provides an improvement on the previous result by an adaptive multi-level backward searching technique. Hence, the idea is to apply an improvement step during the feature addition and an adaptive search method on the backtracking step. We have tested our algorithm on twelve standard UCI datasets based on k-nearest neighbor and naive Bayes classifiers. Their accuracy was then compared with some popular methods. OFMB showed better results than the other sequential forward searching techniques for most of the tested datasets.
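The abstract does not spell out the OFMB steps, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of forward addition followed by conditional backward removal, using the well-known sequential floating forward selection scheme as a stand-in. The function name `floating_forward_selection`, the `toy_score` evaluator, and its weights are hypothetical; in practice the score would be a classifier's validation accuracy on the candidate subset.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Subset = FrozenSet[int]


def floating_forward_selection(
    features: Iterable[int],
    score: Callable[[Subset], float],
    target_size: int,
) -> Dict[int, Tuple[float, Subset]]:
    """Return the best (score, subset) found for each subset size visited."""
    pool = sorted(features)
    selected: Subset = frozenset()
    best = {0: (score(selected), selected)}
    while len(selected) < target_size:
        remaining = [f for f in pool if f not in selected]
        if not remaining:
            break
        # Forward step: add the single feature that helps the most.
        add = max(remaining, key=lambda f: score(selected | {f}))
        selected = selected | {add}
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, selected)
        # Backward steps: keep dropping one feature while the smaller subset
        # strictly beats the best subset of that size seen so far.
        while len(selected) > 2:
            drop = max(sorted(selected), key=lambda f: score(selected - {f}))
            reduced = selected - {drop}
            reduced_score = score(reduced)
            if reduced_score <= best[len(reduced)][0]:
                break
            best[len(reduced)] = (reduced_score, reduced)
            selected = reduced
    return best


def toy_score(subset: Subset) -> int:
    """Hypothetical evaluator: features 0 and 1 are redundant, 2 and 3 are synergistic."""
    weights = {0: 5, 1: 4, 2: 3, 3: 3}
    total = sum(weights[f] for f in subset)
    if {0, 1} <= subset:
        total -= 4  # redundancy penalty
    if {2, 3} <= subset:
        total += 5  # synergy bonus
    return total


result = floating_forward_selection(range(4), toy_score, target_size=3)
```

On this toy evaluator, a purely greedy forward search would settle on the pair {0, 2}. The backward step revisits size two after feature 3 is added and finds the stronger pair {2, 3}, which is the kind of improvement backtracking is meant to provide.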

Author(s):  
Minh Tuan Le ◽  
Minh Thanh Vo ◽  
Nhat Tan Pham ◽  
Son V.T Dao

In the current health system, it is very difficult for medical practitioners/physicians to diagnose the effectiveness of heart contraction. In this research, we proposed a machine learning model to predict heart contraction using an artificial neural network (ANN). We also proposed a novel wrapper-based feature selection method that uses grey wolf optimization (GWO) to reduce the number of required input attributes. We compared the results achieved using our method with those of several conventional machine learning algorithms such as support vector machine, decision tree, k-nearest neighbor, naïve Bayes, random forest, and logistic regression. Computational results show that not only are far fewer features needed, but a higher prediction accuracy of around 87% can also be achieved. This work has the potential to be applicable to clinical practice and to become a supporting tool for doctors/physicians.


2020 ◽  
Vol 39 (5) ◽  
pp. 6205-6216
Author(s):  
Ramazan Algin ◽  
Ali Fuat Alkaya ◽  
Mustafa Agaoglu

Feature selection (FS) has become an essential task in overcoming high-dimensional and complex machine learning problems. FS is a process used for reducing the size of the dataset by separating or extracting unnecessary and unrelated properties from it. This process improves the performance of classification algorithms and reduces the evaluation time by enabling the use of small-sized datasets with useful features during the classification process. FS aims to gain a minimal feature subset in a problem domain while retaining the accuracy of the original data. In this study, four computational intelligence techniques, namely migrating birds optimization (MBO), simulated annealing (SA), differential evolution (DE) and particle swarm optimization (PSO), are implemented for the FS problem as search algorithms and compared on 17 well-known datasets taken from the UCI machine learning repository, where the dimension of the tackled datasets varies from 4 to 500. This is the first time that MBO has been applied to solving the FS problem. In order to judge the quality of the subsets generated by the search algorithms, two different subset evaluation methods are implemented in this study: probabilistic consistency-based FS (PCFS) and correlation-based FS (CFS). Performance comparison of the algorithms is done using three well-known classifiers: k-nearest neighbor, naive Bayes and decision tree (C4.5). As a benchmark, the accuracy values found by the classifiers using the datasets with all features are used. Results of the experiments show that our MBO-based filter approach outperforms the other three approaches in terms of accuracy values. In the experiments, it is also observed that, as a subset evaluator, CFS outperforms PCFS and that, as a classifier, C4.5 gets better results than k-nearest neighbor and naive Bayes.
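The abstract does not say which CFS variant was used. As an illustration only, the merit heuristic commonly associated with correlation-based feature selection (Hall's formulation) can be sketched as below. The function name `cfs_merit` and its parameter names are assumptions, and the inputs are taken to be mean absolute correlations that the caller has already computed.

```python
import math


def cfs_merit(
    k: int,
    mean_feature_class_corr: float,
    mean_feature_feature_corr: float,
) -> float:
    """Merit of a k-feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-class correlation and r_ff the mean pairwise
    feature-feature correlation, both assumed to lie in [0, 1].
    """
    if k < 1:
        raise ValueError("the subset must contain at least one feature")
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator


# Four features that are equally predictive: uncorrelated vs. redundant.
independent = cfs_merit(4, 0.5, 0.0)
redundant = cfs_merit(4, 0.5, 0.5)
```

The merit rises with the features' correlation to the class and falls as the features become correlated with one another, so a search algorithm that maximizes it is steered toward subsets that are predictive but not redundant.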


Author(s):  
Aqliima Aziz ◽  
Cik Feresa Mohd Foozy ◽  
Palaniappan Shamala ◽  
Zurinah Suradi

Social networking sites such as YouTube, Facebook and others are very popular nowadays. The best thing about YouTube is that users can subscribe and also give their opinions in the comment section. However, this attracts spammers, who spam the comments on those videos. Thus, this study develops a YouTube spam detection framework using Support Vector Machine (SVM) and K-Nearest Neighbor (k-NN). There are five (5) phases involved in this research: Data Collection, Pre-processing, Feature Selection, Classification and Detection. The experiments are done using Weka and RapidMiner. The results of SVM and k-NN show good accuracy in both machine learning tools. Another way to avoid spam attacks is to not click the links in comments.


Author(s):  
Kamika Chaudhary ◽  
Neena Gupta

Web mining helps surfers get the information they require, but finding the exact information is like finding a needle in a haystack. In this work, an intelligent prediction model using the TensorFlow environment for Graphics Processing Unit (GPU) devices has been designed to meet the challenges of speed and accuracy. The proposed approach is divided into two phases: pre-processing and prediction. In the first phase, the procedure starts by searching the URLs of various e-learning sites specific to computer science subjects. Next, the content of the retrieved URLs is read and, after pre-processing of the content, keywords related to a particular subject are generated from it. The second phase is prediction, which predicts query-specific links of e-learning websites. The proposed Intelligent E-learning through Web (IEW) has content mining, lexical analysis, classification and machine learning based prediction as its key features. Algorithms such as SVM, Naïve Bayes, K-Nearest Neighbor, and Random Forest were tested, and it was found that Random Forest gave an accuracy of 98.98%, SVM 42%, KNN 63% and Naïve Bayes 66%. Based on these results, IEW uses Random Forest for prediction.


Author(s):  
Kayhan Ghafoor

The first confirmed COVID-19 case was reported in Wuhan, China, and the disease spread across the globe with unprecedented impact on humanity. Since this pandemic requires pervasive diagnosis, it is important to develop smart, fast and efficient detection techniques. To this end, we developed an Artificial Intelligence (AI) engine to classify the lung inflammation level (mild, progressive, severe stage) of confirmed COVID-19 patients. In particular, the developed model consists of two phases. In the first phase, we calculate the volume and density of lesions and opacities in the CT images of the confirmed COVID-19 patient using morphological approaches. In the second phase, the model classifies the pneumonia level of the confirmed COVID-19 patient. To achieve precise classification of lung inflammation, we use a modified Convolutional Neural Network (CNN) and k-Nearest Neighbor (kNN). The results of the experiments show that the models achieve accuracy of up to 95.65% for CNN and 91.304% for kNN.


Author(s):  
Zoelkarnain Rinanda Tembusai ◽  
Herman Mawengkang ◽  
Muhammad Zarlis

This study analyzes the performance of the k-Nearest Neighbor method, with k-Fold Cross Validation as the evaluation model and the Analytic Hierarchy Process as the feature selection method, for data classification, in order to obtain the best accuracy and machine learning model. The best test result is in fold 3, which achieves an accuracy of 95%. Evaluating the k-Nearest Neighbor model with k-Fold Cross Validation yields a good machine learning model, and the Analytic Hierarchy Process as a feature selection method also gets optimal results and can reduce the performance of the k-Nearest Neighbor method because it only uses features that have been selected based on their level of importance for decision making.
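The evaluation scheme described above, a k-nearest neighbor classifier scored fold by fold, can be sketched in a few lines. This is an illustrative sketch rather than the study's code: the names `knn_predict` and `k_fold_accuracies`, the round-robin fold assignment, and the toy two-cluster data are all assumptions, and the feature selection step is left out.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: Sequence[Sequence[float]], train_y: Sequence[str],
                x: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def k_fold_accuracies(X: Sequence[Sequence[float]], y: Sequence[str],
                      n_folds: int = 5, k: int = 3) -> List[float]:
    """Per-fold accuracy; sample i is assigned to fold i % n_folds."""
    accuracies = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Toy data: two well-separated clusters on a line (hypothetical).
X = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 0.0) for i in range(10)]
y = ["a"] * 10 + ["b"] * 10
fold_scores = k_fold_accuracies(X, y, n_folds=5, k=3)
```

Reporting the per-fold list rather than only the mean makes it possible to single out the best-performing fold, as the abstract does. For real data the samples would normally be shuffled or stratified before folds are assigned.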


2020 ◽  
Vol 5 (2) ◽  
pp. 153
Author(s):  
Rizki Tri Prasetio

Computer-assisted medical diagnosis is a major machine learning problem that has been researched recently. General classifiers learn from the data itself through a training process, due to the inexperience of an expert in determining parameters. This research proposes a methodology based on the machine learning paradigm that integrates the genetic algorithm, a search heuristic inspired by natural evolution, with the simplest and most used learning algorithm, k-nearest neighbor. The genetic algorithm was used for feature selection and parameter optimization, while k-nearest neighbor was used as the classifier. The proposed method is tested on five benchmark medical datasets from the University of California, Irvine Machine Learning Repository and compared with the original k-NN and other feature selection algorithms, i.e., forward selection, backward elimination and greedy feature selection. Experimental results show that the proposed method achieves good performance, with a significant improvement (t-test p-value of 0.0011).


Author(s):  
Harmandeep Singh ◽  
Vipul Sharma ◽  
Damanpreet Singh

This paper introduces a comparative analysis of the proficiencies of various texture and geometric features in the diagnosis of breast masses on mammograms. An improved machine learning-based framework was developed for this study. The proposed system was tested using 106 full-field digital mammography images from the INbreast dataset, containing a total of 115 breast mass lesions. The proficiencies of individual features and of various combinations of computed texture and geometric features were investigated by evaluating their contributions towards attaining higher classification accuracies. Four state-of-the-art filter-based feature selection algorithms (Relief-F, Pearson correlation coefficient, neighborhood component analysis, and term variance) were employed to select the top 20 most discriminative features. The Relief-F algorithm outperformed the other feature selection algorithms in terms of classification results, reporting 85.2% accuracy, 82.0% sensitivity, and 88.0% specificity. A set of the nine most discriminative features was then selected, out of the 20 features obtained using Relief-F, as a result of further simulations. The classification performances of six state-of-the-art machine learning classifiers, namely k-nearest neighbor (k-NN), support vector machine, decision tree, Naive Bayes, random forest, and ensemble tree, were investigated, and the best classification results (accuracy = 90.4%, sensitivity = 92.0%, specificity = 88.0%) were obtained for the k-NN classifier with k = 5 neighbors and squared inverse distance weighting. The key findings include the identification of the nine most discriminative features, that is, FD26 (Fourier Descriptor), Euler number, solidity, mean, FD14, FD13, periodicity, skewness, and contrast, out of a pool of 125 texture and geometric features. The results revealed that the selected nine features can be used for the classification of breast masses in mammograms.
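To make the best-performing configuration concrete, here is a minimal sketch of k-NN voting with squared inverse distance weights, where each of the k nearest neighbors votes with weight 1/d². It is not the paper's implementation: the function name `weighted_knn_predict`, the small `eps` guard against zero distances, the labels "A" and "B", and the toy points are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Sequence


def weighted_knn_predict(train_X: Sequence[Sequence[float]],
                         train_y: Sequence[str],
                         x: Sequence[float],
                         k: int = 5,
                         eps: float = 1e-12) -> str:
    """Predict by summing 1 / d^2 votes over the k nearest training points."""
    nearest = sorted(
        (math.dist(point, x), label) for point, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in nearest:
        # eps (an assumption) avoids division by zero for exact matches.
        votes[label] += 1.0 / (distance * distance + eps)
    return max(votes, key=votes.get)


# Toy example: one very close "A" neighbor against four farther "B" neighbors.
points = [(0.1, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
labels = ["A", "B", "B", "B", "B"]
prediction = weighted_knn_predict(points, labels, (0.0, 0.0), k=5)
```

With unweighted voting the four "B" neighbors would win. With squared inverse distance weights the single close "A" neighbor contributes a vote of about 100 against a combined vote of about 4, so the prediction is "A". The weighting lets nearby samples dominate even when they are outnumbered.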


2018 ◽  
Vol 1 (1) ◽  
pp. 236-247
Author(s):  
Divya Srivastava ◽  
Rajitha B. ◽  
Suneeta Agarwal

Diseases in leaves can cause a significant reduction in both the quality and quantity of agricultural production. If early and accurate detection of diseases in leaves can be automated, then the proper remedy can be applied in time. A simple and computationally efficient approach is presented in this paper for detecting diseases on leaves. Detecting the disease alone is not beneficial without knowing its stage, so the paper also determines the stage of the disease by quantifying the affected area of the leaves using digital image processing and machine learning. Though a variety of diseases occur on leaves, bacterial and fungal spots (Early Scorch, Late Scorch, and Leaf Spot) are the most prominent. Keeping this in mind, the paper deals with the detection of Bacterial Blight and Fungal Spot at both an early stage (Early Scorch) and a late stage (Late Scorch) on a variety of leaves. The proposed approach is divided into two phases: in the first phase, it identifies one or more diseases existing on the leaves; in the second phase, the amount of area affected by the disease is calculated. The experimental results showed 97% accuracy using the proposed approach.

