The Construction of Support Vector Machine Classifier Using the Firefly Algorithm

Computational Intelligence and Neuroscience ◽

10.1155/2015/212719 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 21

Author(s):

Chih-Feng Chao ◽

Ming-Huwi Horng

Keyword(s):

Feature Selection ◽

Firefly Algorithm ◽

Binary Classification ◽

Classification Performance ◽

Support Vector ◽

Lagrangian Multiplier ◽

Data Sets ◽

Smoothness Parameter ◽

Multiclass Svm ◽

The One

The setting of parameters in the support vector machines (SVMs) is very important with regard to its accuracy and efficiency. In this paper, we employ the firefly algorithm to train all parameters of the SVM simultaneously, including the penalty parameter, smoothness parameter, and Lagrangian multiplier. The proposed method is called the firefly-based SVM (firefly-SVM). This tool is not considered the feature selection, because the SVM, together with feature selection, is not suitable for the application in a multiclass classification, especially for the one-against-all multiclass SVM. In experiments, binary and multiclass classifications are explored. In the experiments on binary classification, ten of the benchmark data sets of the University of California, Irvine (UCI), machine learning repository are used; additionally the firefly-SVM is applied to the multiclass diagnosis of ultrasonic supraspinatus images. The classification performance of firefly-SVM is also compared to the original LIBSVM method associated with the grid search method and the particle swarm optimization based SVM (PSO-SVM). The experimental results advocate the use of firefly-SVM to classify pattern classifications for maximum accuracy.

Download Full-text

A Semisupervised Feature Selection with Support Vector Machine

Journal of Applied Mathematics ◽

10.1155/2013/416320 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11 ◽

Cited By ~ 3

Author(s):

Kun Dai ◽

Hong-Yi Yu ◽

Qing Li

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Feature Selection Method ◽

Classification Performance ◽

Learning Problems ◽

Support Vector ◽

Data Sets ◽

Alternating Direction ◽

Optimal Classification ◽

Highly Correlated

Feature selection has proved to be a beneficial tool in learning problems with the main advantages of interpretation and generalization. Most existing feature selection methods do not achieve optimal classification performance, since they neglect the correlations among highly correlated features which all contribute to classification. In this paper, a novel semisupervised feature selection algorithm based on support vector machine (SVM) is proposed, termed SENFS. In order to solve SENFS, an efficient algorithm based on the alternating direction method of multipliers is then developed. One advantage of SENFS is that it encourages highly correlated features to be selected or removed together. Experimental results demonstrate the effectiveness of our feature selection method on simulation data and benchmark data sets.

Download Full-text

A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval processing. It reduces the number of features by removing noise, extraneous data. In this paper, one feature subset selection algorithm based on damping oscillation theory and support vector machine classifier is proposed. This algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on Kendall coefficient and Euclidean distance is proposed, which is used to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved grey wolf optimization algorithm, in which its position update formula has been improved in order to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by the damping oscillation theory to achieve the effect of finding an optimal feature subset. Therefore, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets have demonstrated the higher classification accuracy of the MKMDIGWO algorithm than that of other four state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than other algorithms on 10 data sets.

Download Full-text

Phybrata Sensors and Machine Learning for Enhanced Neurophysiological Diagnosis and Treatment

Sensors ◽

10.3390/s21217417 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7417

Author(s):

Alex J. Hope ◽

Utkarsh Vashisth ◽

Matthew J. Parker ◽

Andreas B. Ralston ◽

Joshua M. Roper ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Random Forest ◽

Binary Classification ◽

Classification Performance ◽

Support Vector ◽

Use Case ◽

Signal Features ◽

Test Population

Concussion injuries remain a significant public health challenge. A significant unmet clinical need remains for tools that allow related physiological impairments and longer-term health risks to be identified earlier, better quantified, and more easily monitored over time. We address this challenge by combining a head-mounted wearable inertial motion unit (IMU)-based physiological vibration acceleration (“phybrata”) sensor and several candidate machine learning (ML) models. The performance of this solution is assessed for both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments. Results are compared with previously reported approaches to ML-based concussion diagnostics. Using phybrata data from a previously reported concussion study population, four different machine learning models (Support Vector Machine, Random Forest Classifier, Extreme Gradient Boost, and Convolutional Neural Network) are first investigated for binary classification of the test population as healthy vs. concussion (Use Case 1). Results are compared for two different data preprocessing pipelines, Time-Series Averaging (TSA) and Non-Time-Series Feature Extraction (NTS). Next, the three best-performing NTS models are compared in terms of their multiclass prediction performance for specific concussion-related impairments: vestibular, neurological, both (Use Case 2). For Use Case 1, the NTS model approach outperformed the TSA approach, with the two best algorithms achieving an F1 score of 0.94. For Use Case 2, the NTS Random Forest model achieved the best performance in the testing set, with an F1 score of 0.90, and identified a wider range of relevant phybrata signal features that contributed to impairment classification compared with manual feature inspection and statistical data analysis. The overall classification performance achieved in the present work exceeds previously reported approaches to ML-based concussion diagnostics using other data sources and ML models. This study also demonstrates the first combination of a wearable IMU-based sensor and ML model that enables both binary classification of concussion patients and multiclass predictions of specific concussion-related neurophysiological impairments.

Download Full-text

Feature Selection Models Based on Hybrid Firefly Algorithm with Mutation Operator for Network Intrusion Detection

International Journal of Intelligent Engineering and Systems ◽

10.22266/ijies2021.0228.19 ◽

2021 ◽

Vol 14 (1) ◽

pp. 192-202

Author(s):

Karrar Alwan ◽

◽

Ahmed AbuEl-Atta ◽

Hala Zayed ◽

◽

...

Keyword(s):

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Firefly Algorithm ◽

Detection System ◽

Binary Classification ◽

Feature Reduction ◽

Detection Accuracy ◽

Multi Classification ◽

Modified Firefly Algorithm

Accurate intrusion detection is necessary to preserve network security. However, developing efficient intrusion detection system is a complex problem due to the nonlinear nature of the intrusion attempts, the unpredictable behaviour of network traffic, and the large number features in the problem space. Hence, selecting the most effective and discriminating feature is highly important. Additionally, eliminating irrelevant features can improve the detection accuracy as well as reduce the learning time of machine learning algorithms. However, feature reduction is an NPhard problem. Therefore, several metaheuristics have been employed to determine the most effective feature subset within reasonable time. In this paper, two intrusion detection models are built based on a modified version of the firefly algorithm to achieve the feature selection task. The first and, the second models have been used for binary and multiclass classification, respectively. The modified firefly algorithm employed a mutation operation to avoid trapping into local optima through enhancing the exploration capabilities of the original firefly. The significance of the selected features is evaluated using a Naïve Bayes classifier over a benchmark standard dataset, which contains different types of attacks. The obtained results revealed the superiority of the modified firefly algorithm against the original firefly algorithm in terms of the classification accuracy and the number of selected features under different scenarios. Additionally, the results assured the superiority of the proposed intrusion detection system against other recently proposed systems in both binary classification and multi-classification scenarios. The proposed system has 96.51% and 96.942% detection accuracy in binary classification and multi-classification, respectively. Moreover, the proposed system reduced the number of attributes from 41 to 9 for binary classification and to 10 for multi-classification.

Download Full-text

On the parameter optimization of Support Vector Machines for binary classification

Journal of Integrative Bioinformatics ◽

10.1515/jib-2012-201 ◽

2012 ◽

Vol 9 (3) ◽

pp. 33-43 ◽

Cited By ~ 30

Author(s):

Paulo Gaspar ◽

Jaime Carbonell ◽

José Luís Oliveira

Keyword(s):

Support Vector Machines ◽

Binary Classification ◽

Classification Performance ◽

Biological Data ◽

Parameters Optimization ◽

Support Vector ◽

Minimal Risk ◽

Class Separation ◽

Vector Machines ◽

Analyse Data

Summary Classifying biological data is a common task in the biomedical context. Predicting the class of new, unknown information allows researchers to gain insight and make decisions based on the available data. Also, using classification methods often implies choosing the best parameters to obtain optimal class separation, and the number of parameters might be large in biological datasets.Support Vector Machines provide a well-established and powerful classification method to analyse data and find the minimal-risk separation between different classes. Finding that separation strongly depends on the available feature set and the tuning of hyper-parameters. Techniques for feature selection and SVM parameters optimization are known to improve classification accuracy, and its literature is extensive.In this paper we review the strategies that are used to improve the classification performance of SVMs and perform our own experimentation to study the influence of features and hyper-parameters in the optimization process, using several known kernels.

Download Full-text

Artificial bee colony algorithm for feature selection and improved support vector machine for text classification

Information Discovery and Delivery ◽

10.1108/idd-09-2018-0045 ◽

2019 ◽

Vol 47 (3) ◽

pp. 154-170

Author(s):

Janani Balakumar ◽

S. Vijayarani Mohan

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Classification ◽

Support Vector ◽

Data Sets ◽

Selection Algorithm ◽

Data Set ◽

Content Type ◽

Benchmark Data ◽

Bee Colony

Purpose Owing to the huge volume of documents available on the internet, text classification becomes a necessary task to handle these documents. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify the personal computer documents based on their content. Design/methodology/approach This paper proposes a new algorithm for feature selection based on artificial bee colony (ABCFS) to enhance the text classification accuracy. The proposed algorithm (ABCFS) is scrutinized with the real and benchmark data sets, which is contrary to the other existing feature selection approaches such as information gain and χ2 statistic. To justify the efficiency of the proposed algorithm, the support vector machine (SVM) and improved SVM classifier are used in this paper. Findings The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents that were stored in the personal computer, and the benchmark data set was collected from Reuters and 20 Newsgroups corpus. The results prove the performance of the proposed feature selection algorithm by enhancing the text document classification accuracy. Originality/value This paper proposes a new ABCFS algorithm for feature selection, evaluates the efficiency of the ABCFS algorithm and improves the support vector machine. In this paper, the ABCFS algorithm is used to select the features from text (unstructured) documents. Although, there is no text feature selection algorithm in the existing work, the ABCFS algorithm is used to select the data (structured) features. The proposed algorithm will classify the documents automatically based on their content.

Download Full-text

Modified Firefly Algorithm With Chaos Theory for Feature Selection

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2019040101 ◽

2019 ◽

Vol 10 (2) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

Sujata Dash ◽

Ruppa Thulasiram ◽

Parimala Thulasiraman

Keyword(s):

Feature Selection ◽

Firefly Algorithm ◽

Optimization Problems ◽

Search Algorithm ◽

Heuristic Method ◽

High Dimensional ◽

Support Vector ◽

Combinatorial Optimization Problems ◽

Meta Search ◽

Combined Algorithm

Conventional algorithms such as gradient-based optimization methods usually struggle to deal with high-dimensional non-linear problems and often land up with local minima. Recently developed nature-inspired optimization algorithms are the best approaches for finding global solutions for combinatorial optimization problems like microarray datasets. In this article, a novel hybrid swarm intelligence-based meta-search algorithm is proposed by combining a heuristic method called conditional mutual information maximization with chaos-based firefly algorithm. The combined algorithm is computed in an iterative manner to boost the sharing of information between fireflies, enhancing the search efficiency of chaos-based firefly algorithm and reduces the computational complexities of feature selection. The meta-search model is implemented using a well-established classifier, such as support vector machine as the modeler in a wrapper approach. The chaos-based firefly algorithm increases the global search mobility of fireflies. The efficiency of the model is studied over high-dimensional disease datasets and compared with standard firefly algorithm, particle swarm optimization, and genetic algorithm in the same experimental environment to establish its superiority of feature selection over selected counterparts.

Download Full-text

Feature Selection for Ordinal Text Classification

Neural Computation ◽

10.1162/neco_a_00558 ◽

2014 ◽

Vol 26 (3) ◽

pp. 557-591 ◽

Cited By ~ 23

Author(s):

Stefano Baccianella ◽

Andrea Esuli ◽

Fabrizio Sebastiani

Keyword(s):

Feature Selection ◽

Supervised Learning ◽

Opinion Mining ◽

Rating Scale ◽

Learning Algorithms ◽

Learning Task ◽

Support Vector ◽

Data Sets ◽

Product Review ◽

Ordinal Classification

Ordinal classification (also known as ordinal regression) is a supervised learning task that consists of estimating the rating of a data item on a fixed, discrete rating scale. This problem is receiving increased attention from the sentiment analysis and opinion mining community due to the importance of automatically rating large amounts of product review data in digital form. As in other supervised learning tasks such as binary or multiclass classification, feature selection is often needed in order to improve efficiency and avoid overfitting. However, although feature selection has been extensively studied for other classification tasks, it has not for ordinal classification. In this letter, we present six novel feature selection methods that we have specifically devised for ordinal classification and test them on two data sets of product review data against three methods previously known from the literature, using two learning algorithms from the support vector regression tradition. The experimental results show that all six proposed metrics largely outperform all three baseline techniques (and are more stable than these others by an order of magnitude), on both data sets and for both learning algorithms.

Download Full-text

Large Margin Feature Selection for Support Vector Machine

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.274.161 ◽

2013 ◽

Vol 274 ◽

pp. 161-164 ◽

Cited By ~ 1

Author(s):

Wei Pan ◽

Pei Jun Ma ◽

Xiao Hong Su

Keyword(s):

Feature Selection ◽

Optimal Solution ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

L1 Norm ◽

A Algorithm ◽

Regularization Technique ◽

Feature Weight ◽

Selection For

Feature selection is an preprocessing step in pattern analysis and machine learning. In this paper, we design a algorithm for feature subset. We present L1-norm regularization technique for sparse feature weight. Margin loss are introduced to evaluate features, and we employs gradient descent to search the optimal solution to maximize margin. The proposed technique is tested on UCI data sets. Compared with four margin based loss functions for SVM, the proposed technique is effective and efficient.

Download Full-text

Automatic recognition of self-acknowledged limitations in clinical research literature

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocy038 ◽

2018 ◽

Vol 25 (7) ◽

pp. 855-861 ◽

Cited By ~ 4

Author(s):

Halil Kilicoglu ◽

Graciela Rosemblat ◽

Mario Malički ◽

Gerben ter Riet

Keyword(s):

Machine Learning ◽

Clinical Research ◽

Binary Classification ◽

Classification Performance ◽

Research Literature ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Rule Based ◽

Research Transparency

Abstract Objective To automatically recognize self-acknowledged limitations in clinical research publications to support efforts in improving research transparency. Methods To develop our recognition methods, we used a set of 8431 sentences from 1197 PubMed Central articles. A subset of these sentences was manually annotated for training/testing, and inter-annotator agreement was calculated. We cast the recognition problem as a binary classification task, in which we determine whether a given sentence from a publication discusses self-acknowledged limitations or not. We experimented with three methods: a rule-based approach based on document structure, supervised machine learning, and a semi-supervised method that uses self-training to expand the training set in order to improve classification performance. The machine learning algorithms used were logistic regression (LR) and support vector machines (SVM). Results Annotators had good agreement in labeling limitation sentences (Krippendorff’s α = 0.781). Of the three methods used, the rule-based method yielded the best performance with 91.5% accuracy (95% CI [90.1-92.9]), while self-training with SVM led to a small improvement over fully supervised learning (89.9%, 95% CI [88.4-91.4] vs 89.6%, 95% CI [88.1-91.1]). Conclusions The approach presented can be incorporated into the workflows of stakeholders focusing on research transparency to improve reporting of limitations in clinical studies.

Download Full-text