Feature Optimization of Exhaled Breath Signals Based on Pearson-BPSO

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Lijun Hao ◽  
Min Zhang ◽  
Gang Huang

Feature optimization, the theme of this paper, is the selective choice of input variables when building a predictive model. In this paper, an improved feature optimization algorithm for breath signals based on Pearson-BPSO is proposed and applied to distinguish hepatocellular carcinoma by electronic nose (eNose). First, multidimensional features of the breath curves of hepatocellular carcinoma patients and healthy controls in the training samples were extracted; then, features with little relevance to the classification were removed according to the Pearson correlation coefficient; next, a fitness function was constructed from the K-Nearest Neighbor (KNN) classification error and the feature dimension, and a feature optimization transformation matrix was obtained with BPSO. The transformation matrix was then applied to optimize the test samples' features. Finally, the performance of the optimization algorithm was evaluated by the classifier. The experimental results show that the Pearson-BPSO algorithm effectively improved classification performance compared with the BPSO and PCA optimization methods. The accuracy of the SVM and RF classifiers was 86.03% and 90%, respectively, and the sensitivity and specificity were about 90% and 80%. Consequently, the Pearson-BPSO feature optimization algorithm should help improve the accuracy of hepatocellular carcinoma detection by eNose and promote the clinical application of intelligent detection.
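The Pearson-based filtering step described above can be sketched in a few lines. This is a minimal illustration only, not the authors' full Pearson-BPSO pipeline; the threshold and toy data below are invented for demonstration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def filter_features(X, labels, threshold=0.5):
    """Keep indices of features whose |correlation with the label| >= threshold."""
    kept = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if abs(pearson(column, labels)) >= threshold:
            kept.append(j)
    return kept

# Toy data: feature 0 tracks the label exactly, feature 1 is weakly related.
X = [[0, 1], [1, 2], [0, 3], [1, 4], [0, 5], [1, 6]]
labels = [0, 1, 0, 1, 0, 1]
print(filter_features(X, labels, threshold=0.5))  # only feature 0 survives
```

In the paper, the surviving features would then be passed to BPSO for the second-stage optimization.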

2008 ◽  
Vol 18 (06) ◽  
pp. 459-467 ◽  
Author(s):  
ROBERTO GIL-PITA ◽  
XIN YAO

The k-nearest neighbor method is a classifier based on the evaluation of the distances to each pattern in the training set. The edited version of this method applies the classifier with a subset of the complete training set from which some training patterns are excluded, in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper, we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes the definition of a novel mean-square-error-based fitness function, a novel clustered crossover technique, and a fast smart mutation scheme. To evaluate the performance of the proposed method, results using the breast cancer, diabetes, and letter recognition databases from the UCI machine learning benchmark repository are included. Both error rate and computational cost are considered in the analysis. The obtained results show the improvement achieved by the proposed editing method.
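The paper evolves the edited subset with a genetic algorithm; as a simpler baseline that illustrates what "editing" means, the classical Wilson editing rule keeps only the patterns that the rest of the training set classifies correctly. The sketch and data below are illustrative, not the authors' GA-based method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (monotone in distance, so fine for ranking)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train, point, k=3):
    """Majority vote among the k nearest training patterns."""
    nearest = sorted(train, key=lambda t: sq_dist(t[0], point))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def wilson_edit(train, k=3):
    """Keep only patterns that the remaining training set classifies correctly."""
    return [(x, y) for i, (x, y) in enumerate(train)
            if knn_predict(train[:i] + train[i + 1:], x, k) == y]

# Two clean clusters plus one mislabeled outlier inside cluster 'a'.
train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'),
         ((5, 5), 'b'), ((5, 6), 'b'), ((6, 5), 'b'),
         ((0.5, 0.5), 'b')]
edited = wilson_edit(train, k=3)  # the outlier is removed
```

A GA-based editor, as in the paper, searches over inclusion masks instead of applying this fixed rule, guided by a fitness function.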


Author(s):  
Amit Saxena ◽  
John Wang

This paper presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and then tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase of the proposed work, an unsupervised approach to selecting a subset of features is applied. The GA is used to stochastically select a reduced number of features with Sammon Error as the fitness function, and different subsets of features are obtained. In the second phase, each reduced feature set is used to test the CA of the dataset. The CA of a dataset is validated using the supervised k-nearest neighbor (k-nn) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is investigated for CA using k-nn classification with different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Final results are presented with extensive simulations on seven real and one synthetic data set. The proposed investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
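The second-phase idea, k-nn under different Minkowski norms, is easy to sketch. This is a minimal illustration with invented toy data; the paper's actual experiments use the GA-selected feature subsets and UCI datasets.

```python
def minkowski(a, b, p):
    """L_p distance between two points; p=2 is the Euclidean norm, p=1 the Manhattan norm."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def knn_classify(train, point, k=3, p=2):
    """Classify a point by majority vote of its k nearest neighbours under the L_p norm."""
    nearest = sorted(train, key=lambda t: minkowski(t[0], point, p))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

train = [((0, 0), 'a'), ((0, 1), 'a'), ((5, 5), 'b'), ((5, 6), 'b')]
print(knn_classify(train, (1, 1), k=3, p=2))  # Euclidean norm
print(knn_classify(train, (1, 1), k=3, p=1))  # Manhattan norm
```

On real data the neighbour ranking, and hence the CA, can differ between values of p, which is exactly the effect the paper investigates.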


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1206
Author(s):  
Hui Xu ◽  
Krzysztof Przystupa ◽  
Ce Fang ◽  
Andrzej Marciniak ◽  
Orest Kochan ◽  
...  

With the widespread use of the Internet, network security issues have attracted more and more attention, and network intrusion detection has become one of the main security technologies. In network intrusion detection, the original data source typically has high dimensionality and a large volume of data, which greatly affect both efficiency and accuracy. Thus, feature selection and the classifier both play a significant role in improving the performance of network intrusion detection. This paper considers the classification optimization of weighted K-nearest neighbor (KNN) together with a feature selection algorithm, and proposes a combination strategy of feature selection based on an integrated optimization algorithm and weighted KNN, in order to improve the performance of network intrusion detection. Experimental results show that weighted KNN can increase efficiency at the expense of a small loss in accuracy, and that the proposed combination strategy can improve both the efficiency and the accuracy of network intrusion detection.
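The weighted-KNN component can be illustrated with inverse-distance weighting, a common choice in which closer neighbours get larger votes. This is a generic sketch with invented data, not the paper's specific weighting scheme or integrated optimization algorithm.

```python
import math

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_knn(train, point, k=3):
    """Inverse-distance-weighted KNN: each neighbour votes with weight 1/distance."""
    nearest = sorted(((euclid(x, point), y) for x, y in train))[:k]
    weights = {}
    for d, y in nearest:
        weights[y] = weights.get(y, 0.0) + 1.0 / (d + 1e-9)  # epsilon avoids division by zero
    return max(weights, key=weights.get)

# One very close 'b' neighbour outweighs two farther 'a' neighbours,
# whereas an unweighted 3-NN majority vote would return 'a'.
train = [((0, 0), 'b'), ((2, 0), 'a'), ((0, 2), 'a'), ((10, 10), 'b')]
print(weighted_knn(train, (0.1, 0), k=3))
```

The contrast with the plain majority vote shows why weighting changes the accuracy/efficiency trade-off discussed in the abstract.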


2022 ◽  
Vol 13 (1) ◽  
pp. 0-0

This research presents an approach to the feature selection problem for sentiment classification using an ensemble-based classifier. It employs a hybrid of the minimum redundancy maximum relevance (mRMR) technique and the Forest Optimization Algorithm (FOA), i.e., mRMR-FOA-based feature selection. Before being applied to sentiment analysis, FOA was used as a feature selection technique on 10 different classification datasets publicly available in the UCI machine learning repository. The classifiers, for example k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), and Naïve Bayes, used the ensemble-based algorithm on the available datasets. mRMR-FOA uses Blitzer's dataset (customer reviews on electronic products) to select the significant features. Sentiment classification was observed to improve by 12 to 18%. The evaluated results are further enhanced by the ensemble of k-NN, NB, and SVM, with an accuracy of 88.47% for the sentiment classification task.
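The ensemble step, combining k-NN, NB, and SVM predictions, typically reduces to a majority vote per sample. This is a generic voting sketch with stand-in classifiers, not the paper's trained models or its mRMR-FOA selector.

```python
def majority_vote(predictions):
    """Combine the per-classifier predictions for one sample by majority vote."""
    votes = {}
    for p in predictions:
        votes[p] = votes.get(p, 0) + 1
    return max(votes, key=votes.get)

def ensemble_predict(classifiers, sample):
    """Apply every base classifier to the sample and vote on the results."""
    return majority_vote([clf(sample) for clf in classifiers])

# Stand-ins for trained k-NN, NB, and SVM models (hypothetical placeholders).
classifiers = [lambda s: 'pos', lambda s: 'neg', lambda s: 'pos']
print(ensemble_predict(classifiers, 'great product'))
```

In practice, each lambda would be replaced by a model trained on the mRMR-FOA-selected features.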


2021 ◽  
Vol 6 (4) ◽  
Author(s):  
Aminat B. Yusuf ◽  
Ogar O. Austin ◽  
Shinaigo Y. Tadi ◽  
Fatsuma Jauro

The medical industry contains a large amount of sensitive data that must be evaluated in order to gain insight into records. The nonlinearity, non-normality, and correlation structures of complicated diabetic medical records, however, make accurate prediction difficult. The Pima Indian Diabetes dataset is one such case, owing to its class imbalance, large number of missing values, and the difficulty of identifying high-risk factors. Some of these challenges have been addressed with computational approaches such as machine learning methods, but these have not performed ideally, and pre-processing techniques are recognized as critical to achieving correct findings. The goal of this work is to apply multiple pre-processing approaches to increase the accuracy of some simple models. These pre-processing techniques are median imputation, in which null values are substituted by the median of the input variable computed according to whether or not the patient is diabetic, followed by oversampling and under-sampling procedures on the minority and majority classes, applied to address the class imbalance problem pointed out in the literature. Finally, Pearson correlation is used for dimension reduction to detect high-risk features, since it is effective at quantifying the information between attributes and their labels. In this study, these techniques are applied in the same order to Linear Regression, Naive Bayes, Decision Tree, K Nearest Neighbor, Random Forest, and Gaussian Boosting classifiers. The utility of the techniques on the mentioned classifiers is validated using performance measures such as Accuracy, Precision, and Recall. The Random Forest classifier is found to be the best-improved model, with 95 percent accuracy, 94.25 percent precision, and 95.35 percent recall. Medical practitioners may find the provided strategies beneficial in improving the efficiency of diabetes analysis.
Keywords— Classifiers, diabetes, Pima Indian Diabetes dataset, pre-processing techniques
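The class-conditional median imputation described above can be sketched directly: missing values are filled with the median of that feature computed within the sample's own class (diabetic vs. non-diabetic). The toy data below are invented for illustration.

```python
def class_median_impute(rows, labels, missing=None):
    """Replace each missing value with the median of that feature
    computed over the samples sharing the same class label."""
    n_feat = len(rows[0])
    medians = {}
    for c in set(labels):
        for j in range(n_feat):
            vals = sorted(rows[i][j] for i in range(len(rows))
                          if labels[i] == c and rows[i][j] is not missing)
            mid = len(vals) // 2
            medians[c, j] = (vals[mid] if len(vals) % 2
                             else (vals[mid - 1] + vals[mid]) / 2)
    return [[medians[labels[i], j] if v is missing else v
             for j, v in enumerate(row)]
            for i, row in enumerate(rows)]

rows = [[1, None], [3, 4], [None, 6], [7, 8]]
labels = [0, 0, 1, 1]
print(class_median_impute(rows, labels))
```

Oversampling/under-sampling and Pearson-based feature reduction would then follow, in the order the abstract describes.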


2020 ◽  
Vol 12 (2) ◽  
pp. 168-175
Author(s):  
Sumarni Sumarni ◽  
Suhardi Rustam

The final-project topic concerns a form of scientific writing that contains the results of observations from a study of problems, using methods related to a particular field of science. Every student in every program of study must complete a final project, and before beginning to write it, each student must choose a topic area; selecting the topic is the initial step before working on the final task. One way to choose a topic is to look at the grades of general and concentration courses: the grades that dominate indicate a suitable scope for the research topic. This research applies the K-Nearest Neighbor (KNN) method to categorize concentration-course grades into research-topic areas; all grades in the dataset are classified by KNN and optimized with the Particle Swarm Optimization (PSO) algorithm. The experimental categorization of final projects is built with training data from students of Universitas Ichsan Gorontalo that had been classified previously, and with test data derived from course grades whose categories are not yet known. In the experiments, the KNN algorithm achieved its best accuracy of 72.46% with K=3 and K folds = 10, while the KNN-PSO algorithm achieved its best accuracy of 89.86% with K=3 and K folds = 10, showing that accuracy is better with the optimization algorithm.
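The K-fold evaluation protocol mentioned in the abstract (K folds = 10) can be sketched as a simple index splitter; each fold in turn serves as the test set while the rest trains the classifier. This is a generic sketch, not the authors' PSO-optimized pipeline.

```python
def kfold_indices(n, folds):
    """Yield (train, test) index lists for k-fold cross-validation,
    distributing any remainder across the first folds."""
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# 10 samples split into 5 folds of 2 test samples each.
splits = list(kfold_indices(10, 5))
print(splits[0])  # first fold: train on indices 2..9, test on 0..1
```

Averaging the per-fold accuracies gives the figures reported in the abstract (72.46% for KNN, 89.86% for KNN-PSO).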


2022 ◽  
Vol 000 (000) ◽  
pp. 000-000
Author(s):  
Chuanli Liu ◽  
Hongli Yang ◽  
Yuemin Feng ◽  
Cuihong Liu ◽  
Fajuan Rui ◽  
...  
