Ensemble Based Classification of Sentiments Using Forest Optimization Algorithm

Mehreen Naz; Kashif Zafar; Ayesha Khan

doi:10.3390/data4020076

Data ◽

10.3390/data4020076 ◽

2019 ◽

Vol 4 (2) ◽

pp. 76 ◽

Cited By ~ 2

Author(s):

Mehreen Naz ◽

Kashif Zafar ◽

Ayesha Khan

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Subset Selection ◽

Feature Subset Selection ◽

Selection Problem ◽

Support Vector ◽

Feature Subset ◽

Hybrid Technique ◽

Computational Performance

Feature subset selection is the process of choosing a set of relevant features from a high-dimensional dataset to improve classifier performance. The meaningful words extracted from the data form the feature set for sentiment analysis. Many evolutionary algorithms, such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), have been applied to the feature subset selection problem, but computational performance can still be improved. This research presents a solution to the feature subset selection problem for sentiment classification using ensemble-based classifiers. It consists of a hybrid feature selection technique that combines minimum redundancy maximum relevance (mRMR) with the Forest Optimization Algorithm (FOA). Ensemble-based classification is implemented to improve on the results of individual classifiers. FOA as a feature selection technique has been applied to various classification datasets from the UCI machine learning repository. The classifiers used in the ensemble for the UCI repository datasets are k-Nearest Neighbor (k-NN) and Naïve Bayes (NB). For sentiment classification, a 15–20% improvement has been recorded. The dataset used for sentiment classification is Blitzer’s dataset, which consists of reviews of electronic products. The results are further improved by an ensemble of k-NN, NB, and Support Vector Machine (SVM), which reaches an accuracy of 95% on the sentiment classification task.
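The mRMR stage named in this abstract can be illustrated with a minimal sketch. It assumes the common "relevance minus mean redundancy" form of mRMR on discrete features, with mutual information as the measure; the abstract does not state which mRMR variant the authors used, and the FOA stage is not shown. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): each column is a binary "word present" feature over 8 reviews.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # f0: strongly informative
    [1, 1, 1, 0, 0, 0, 0, 0],  # f1: exact duplicate of f0 (redundant)
    [1, 0, 1, 0, 1, 0, 1, 0],  # f2: independent of the label
    [1, 0, 1, 1, 0, 1, 0, 0],  # f3: weakly informative, little overlap with f0
]
chosen = mrmr_select(columns, labels, k=2)
print(chosen)
```

In this toy run the duplicate feature is passed over in the second step because its redundancy with the first pick outweighs its relevance, which is the behavior mRMR is meant to provide.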

A novel feature selection algorithm based on damping oscillation theory

PLoS ONE ◽

10.1371/journal.pone.0255307 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255307

Author(s):

Fujun Wang ◽

Xing Wang

Keyword(s):

Feature Selection ◽

Optimization Algorithm ◽

Euclidean Distance ◽

Oscillation Theory ◽

Feature Subset Selection ◽

Support Vector ◽

Data Sets ◽

Feature Subset ◽

Selection Algorithm ◽

Filter Model

Feature selection is an important task in big data analysis and information retrieval. It reduces the number of features by removing noisy and extraneous data. In this paper, a feature subset selection algorithm based on damping oscillation theory and a support vector machine classifier is proposed. The algorithm is called the Maximum Kendall coefficient Maximum Euclidean Distance Improved Gray Wolf Optimization algorithm (MKMDIGWO). In MKMDIGWO, first, a filter model based on the Kendall coefficient and Euclidean distance is proposed to measure the correlation and redundancy of the candidate feature subset. Second, the wrapper model is an improved gray wolf optimization algorithm whose position update formula has been improved to achieve optimal results. Third, the filter model and the wrapper model are dynamically adjusted by damping oscillation theory in order to find an optimal feature subset. As a result, MKMDIGWO achieves both the efficiency of the filter model and the high precision of the wrapper model. Experimental results on five UCI public data sets and two microarray data sets demonstrate that MKMDIGWO achieves higher classification accuracy than four other state-of-the-art algorithms. The maximum ACC value of the MKMDIGWO algorithm is at least 0.5% higher than that of the other algorithms on 10 data sets.
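As a rough illustration of a filter built from the Kendall coefficient and Euclidean distance, the sketch below scores a candidate subset by two quantities: the mean absolute Kendall tau between each feature and the label, and the mean pairwise Euclidean distance between feature columns. How MKMDIGWO actually combines these measures is not given in the abstract, so the function `filter_scores`, the use of tau-a, and the toy data are assumptions.

```python
from itertools import combinations
from math import sqrt


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / all pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def euclidean(xs, ys):
    """Euclidean distance between two equal-length numeric sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)))


def filter_scores(columns, labels, subset):
    """Return (relevance, spread) for a subset of feature indices (hypothetical criterion).

    relevance: mean |tau| between each selected feature and the label.
    spread: mean pairwise distance between selected columns (0.0 for one feature).
    """
    relevance = sum(abs(kendall_tau(columns[j], labels)) for j in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    if not pairs:
        return relevance, 0.0
    spread = sum(euclidean(columns[a], columns[b]) for a, b in pairs) / len(pairs)
    return relevance, spread


# Toy data (assumed): features are taken to be scaled to [0, 1] already.
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],   # f0: ordered like the label
    [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],   # f1: duplicate of f0
    [0.9, 0.1, 0.5, 0.4, 0.95, 0.6],  # f2: different pattern
]
rel_dup, spread_dup = filter_scores(columns, labels, (0, 1))
rel_mix, spread_mix = filter_scores(columns, labels, (0, 2))
print(rel_dup, spread_dup, rel_mix, spread_mix)
```

A duplicated pair of features gets a spread of zero, so a criterion that rewards both quantities would prefer subsets whose features are relevant but not copies of one another.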

An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset

The Scientific World Journal ◽

10.1155/2015/821798 ◽

2015 ◽

Vol 2015 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Senthilkumar Devaraj ◽

S. Paulraj

Keyword(s):

Feature Selection ◽

Subset Selection ◽

Feature Subset Selection ◽

Complex Nature ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Multidimensional Datasets ◽

Study Results

Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Because of this complex nature, selecting features and building a classifier from an MDD are typically more expensive or time-consuming. A robust feature selection technique is therefore needed to select a single optimum feature subset of the MDD for further analysis or classifier design. In this paper, an efficient feature selection algorithm is proposed for the classification of MDD. The proposed multidimensional feature subset selection (MFSS) algorithm yields a unique feature subset for further analysis or for building a classifier, and it offers a computational advantage on MDD over existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. Using the proposed MFSS, the number of features was reduced to a minimum of 3% and a maximum of 30%. In conclusion, the study results show that MFSS is an efficient feature selection algorithm that does not affect classification accuracy even with the reduced number of features. The proposed MFSS algorithm is also suitable for both problem transformation and algorithm adaptation, and it has great potential in applications that generate multidimensional datasets.

Feature Subset Selection Problem using Wrapper Approach in Supervised Learning

International Journal of Computer Applications ◽

10.5120/169-295 ◽

2010 ◽

Vol 1 (7) ◽

pp. 13-17 ◽

Cited By ~ 42

Author(s):

Asha Gowda Karegowda ◽

A.S. Manjunath ◽

M.A. Jayaram

Keyword(s):

Supervised Learning ◽

Subset Selection ◽

Feature Subset Selection ◽

Selection Problem ◽

Feature Subset ◽

Wrapper Approach

Feature subset selection for support vector machines by incremental regularized risk minimization

2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541) ◽

10.1109/ijcnn.2004.1380930 ◽

2005 ◽

Cited By ~ 5

Author(s):

H. Frohlich ◽

A. Zell

Keyword(s):

Support Vector Machines ◽

Subset Selection ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset ◽

Risk Minimization ◽

Vector Machines ◽

Selection For ◽

Regularized Risk Minimization

Improved Intrusion Detection Algorithm based on TLBO and GA Algorithms

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/2/5 ◽

2021 ◽

Vol 18 (2) ◽

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Optimization Algorithm ◽

Feature Subset Selection ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Feature Subset ◽

Teaching Learning Based Optimization ◽

Teaching Learning

Optimization algorithms are widely used for intrusion identification. This is attributable to the increasing number of audit data features and the decreasing performance of human-based smart Intrusion Detection Systems (IDS) in terms of classification accuracy and training time. In this paper, an improved method for intrusion detection for binary classification is presented and discussed in detail. The proposed method combines the New Teaching-Learning-Based Optimization algorithm (NTLBO), used for feature selection and weighting, with the supervised machine learning techniques Support Vector Machine (SVM), Extreme Learning Machine (ELM), and Logistic Regression (LR) for Feature Subset Selection (FSS). Selecting the smallest number of features without affecting result accuracy in FSS is treated as a multi-objective optimization problem. NTLBO is proposed as an FSS mechanism, and its algorithm-specific parameter-less concept (which requires no parameter tuning during optimization) is explored. The experiments were performed on the prominent intrusion machine-learning datasets KDDCUP’99 and CICIDS 2017, where significant enhancements were observed with the suggested NTLBO algorithm compared to the classical Teaching-Learning-Based Optimization algorithm (TLBO). NTLBO presented better results than TLBO and many existing works. The results showed that NTLBO reached 100% accuracy on the KDDCUP’99 dataset and 97% on the CICIDS dataset.
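For context on the baseline this abstract compares against, here is a minimal sketch of classical TLBO minimizing a continuous function. It is not the NTLBO variant and does not perform feature selection; it only shows the teacher and learner phases and why no algorithm-specific parameter needs tuning (the teaching factor is drawn at random as 1 or 2). The function name `tlbo_minimize`, the sphere objective, and all default values are assumptions made for illustration.

```python
import random


def tlbo_minimize(objective, dim, bounds, pop_size=10, iterations=30, seed=0):
    """Classical TLBO sketch: teacher phase, then learner phase, with greedy acceptance."""
    rng = random.Random(seed)
    low, high = bounds

    def clip(vec):
        return [min(max(v, low), high) for v in vec]

    population = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [objective(p) for p in population]
    history = [min(fitness)]  # best objective value after each iteration

    for _ in range(iterations):
        # Teacher phase: shift each learner by r * (teacher - TF * class mean).
        teacher = population[fitness.index(min(fitness))]
        mean = [sum(p[d] for p in population) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # teaching factor, random rather than tuned
            candidate = clip([
                population[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                for d in range(dim)
            ])
            cand_fit = objective(candidate)
            if cand_fit < fitness[i]:
                population[i], fitness[i] = candidate, cand_fit

        # Learner phase: move away from a worse random peer or toward a better one.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            direction = 1 if fitness[i] < fitness[j] else -1
            candidate = clip([
                population[i][d]
                + direction * rng.random() * (population[i][d] - population[j][d])
                for d in range(dim)
            ])
            cand_fit = objective(candidate)
            if cand_fit < fitness[i]:
                population[i], fitness[i] = candidate, cand_fit

        history.append(min(fitness))

    best = fitness.index(min(fitness))
    return population[best], fitness[best], history


def sphere(vec):
    """Toy objective (assumed): sum of squares, minimized at the origin."""
    return sum(x * x for x in vec)


best_vec, best_val, history = tlbo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
print(best_val)
```

Because a candidate replaces a learner only when it improves that learner's objective value, the best value in `history` never gets worse from one iteration to the next. Applying this to feature selection would additionally require mapping positions to feature masks, which is not shown here.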

A New Hybrid Feature Subset Selection Framework Based on Binary Genetic Algorithm and Information Theory

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026819500202 ◽

2019 ◽

Vol 18 (03) ◽

pp. 1950020 ◽

Cited By ~ 13

Author(s):

Alok Kumar Shukla ◽

Pradeep Singh ◽

Manu Vardhan

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Classification Accuracy ◽

B Cell Lymphoma ◽

Feature Subset Selection ◽

Classification Model ◽

Significant Feature ◽

Support Vector ◽

Feature Subset ◽

Binary Genetic Algorithm

The explosion of high-dimensional datasets in scientific repositories has been encouraging interdisciplinary research on data mining, pattern recognition and bioinformatics. The fundamental problem for an individual Feature Selection (FS) method is to extract informative features for the classification model and to identify malignant disease at low computational cost. In addition, existing FS approaches overlook the fact that, for a given cardinality, there can be several subsets with similar information. This paper introduces a novel hybrid FS algorithm, called Filter-Wrapper Feature Selection (FWFS), for classification problems and addresses the limitations of existing methods. In the proposed model, a front-end filter ranking method, Conditional Mutual Information Maximization (CMIM), selects the high-ranked feature subset, while the succeeding method, a Binary Genetic Algorithm (BGA), accelerates the search for significant feature subsets. One merit of the proposed method is that, unlike an exhaustive method, it speeds up the FS procedure without loss of classification accuracy on the reduced dataset when a learning model is applied to the selected feature subsets. The efficacy of the proposed FWFS method is examined with a Naive Bayes (NB) classifier, which serves as the fitness function. The effectiveness of the selected feature subset is evaluated using numerous classifiers on five biological datasets and five UCI datasets of varied dimensionality and number of instances. The experimental results show that the proposed method significantly reduces the number of features and outperforms the existing methods. For the microarray datasets, the lowest classification accuracy is 61.24% on the SRBCT dataset and the highest is 99.32% on Diffuse large B-cell lymphoma (DLBCL). For the UCI datasets, the lowest classification accuracy is 40.04% on Lymphography using k-nearest neighbor (k-NN) and the highest is 99.05% on Ionosphere using support vector machine (SVM).
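The wrapper stage of a filter-wrapper scheme like this one can be sketched as a binary genetic algorithm over feature masks. The sketch below assumes tournament selection, one-point crossover, bit-flip mutation and elitism, none of which the abstract specifies. In the paper the fitness function is a Naive Bayes classifier; here `toy_fitness` is a hypothetical stand-in that rewards three "informative" indices and penalizes mask size, and the CMIM filter stage is omitted.

```python
import random


def binary_ga(fitness, n_features, pop_size=12, generations=25,
              mutation_rate=0.1, seed=0):
    """Elitist binary GA sketch; each individual is a 0/1 mask over features."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]  # best fitness per generation

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


def toy_fitness(mask):
    """Stand-in for classifier accuracy (assumed): reward hits, penalize subset size."""
    informative = (0, 2, 5)
    hits = sum(mask[i] for i in informative)
    return hits - 0.1 * sum(mask)


best_mask, history = binary_ga(toy_fitness, n_features=8)
print(best_mask, history[-1])
```

With elitism the best fitness cannot decrease between generations. Replacing `toy_fitness` with the cross-validated accuracy of a real classifier on the masked columns would turn this into a wrapper in the usual sense, at a much higher cost per evaluation.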

Addressing Low Dimensionality Feature Subset Selection: ReliefF(-k) or Extended Correlation-Based Feature Selection(eCFS)?

Advances in Intelligent Systems and Computing - 14th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2019) ◽

10.1007/978-3-030-20055-8_24 ◽

2019 ◽

pp. 251-260 ◽

Cited By ~ 1

Author(s):

Antonio J. Tallón-Ballesteros ◽

Luís Cavique ◽

Simon Fong

Keyword(s):

Feature Selection ◽

Subset Selection ◽

Feature Subset Selection ◽

Feature Subset ◽

Low Dimensionality ◽

Correlation Based Feature Selection

Solving feature subset selection problem by a Parallel Scatter Search

European Journal of Operational Research ◽

10.1016/j.ejor.2004.08.010 ◽

2006 ◽

Vol 169 (2) ◽

pp. 477-489 ◽

Cited By ~ 119

Author(s):

Félix García López ◽

Miguel García Torres ◽

Belén Melián Batista ◽

José A. Moreno Pérez ◽

J. Marcos Moreno-Vega

Keyword(s):

Scatter Search ◽

Subset Selection ◽

Feature Subset Selection ◽

Selection Problem ◽

Feature Subset

Using a Feature Subset Selection method and Support Vector Machine to address curse of dimensionality and redundancy in Hyperion hyperspectral data classification

The Egyptian Journal of Remote Sensing and Space Science ◽

10.1016/j.ejrs.2017.02.003 ◽

2018 ◽

Vol 21 (1) ◽

pp. 27-36 ◽

Cited By ~ 7

Author(s):

Amir Salimi ◽

Mansour Ziaii ◽

Ali Amiri ◽

Mahdieh Hosseinjani Zadeh ◽

Sadegh Karimpouli ◽

...

Keyword(s):

Support Vector Machine ◽

Subset Selection ◽

Data Classification ◽

Curse Of Dimensionality ◽

Selection Method ◽

Hyperspectral Data ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset

Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines

Knowledge and Information Systems ◽

10.1007/s10115-018-1185-y ◽

2018 ◽

Vol 58 (1) ◽

pp. 139-167 ◽

Cited By ~ 17

Author(s):

Syed Muhammad Saqlain ◽

Muhammad Sher ◽

Faiz Ali Shah ◽

Imran Khan ◽

Muhammad Usman Ashraf ◽

...

Keyword(s):

Matthews Correlation Coefficient ◽

Subset Selection ◽

Disease Diagnosis ◽

Feature Subset Selection ◽

Support Vector ◽

Feature Subset ◽

Fisher Score ◽

Vector Machines ◽

Selection For ◽

Heart Disease Diagnosis