Hybrid feature selection approach to identify optimal features of profile metadata to detect social bots in Twitter

2021, Vol 11 (1)
Author(s): Eiman Alothali, Kadhim Hayawi, Hany Alashwal

Abstract: The last few years have revealed that social bots in social networks have become more sophisticated in design, as they adapt their features to avoid detection systems. The ability of these bots to mimic human users convincingly stems from advances in artificial intelligence and chatbots, which allow bots to learn and adjust very quickly. Therefore, finding the optimal features needed to detect them is an area for further investigation. In this paper, we propose a hybrid feature selection (FS) method to evaluate profile metadata features and find these optimal features, which are then evaluated using random forest, naïve Bayes, support vector machines, and neural networks. We found that the cross-validation attribute evaluation performed best compared to other FS methods. Our results show that the random forest classifier with six optimal features achieved the best score of 94.3% for the area under the curve, while maintaining overall 89% accuracy, 83.8% precision, and 83.3% recall for the bot class. We also found that using four features (favorites_count, verified, statuses_count, and average_tweets_per_day) achieves good performance for bot detection (84.1% precision, 81.2% recall).
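A minimal sketch of this kind of evaluation setup, assuming a labeled account table that contains the four named metadata columns; the file name, label column, and classifier settings below are hypothetical and illustrative, not the paper's exact configuration.

```python
# Minimal sketch: evaluating the four named profile-metadata features with a
# random forest, assuming a labeled CSV of accounts (hypothetical file name).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["favorites_count", "verified", "statuses_count", "average_tweets_per_day"]

df = pd.read_csv("twitter_accounts.csv")   # hypothetical labeled dataset
X = df[FEATURES].astype(float)             # profile metadata only
y = df["is_bot"]                           # assumed label column: 1 = bot, 0 = human

clf = RandomForestClassifier(n_estimators=200, random_state=42)
# 10-fold cross-validated area under the ROC curve for the bot class
auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
print(f"mean AUC: {auc:.3f}")
```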

F1000Research, 2021, Vol 9, pp. 1255
Author(s): Malik Yousef, Burcu Bakir-Gungor, Amhar Jabeer, Gokhan Goy, Rehman Qureshi, ...

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented it in Matlab. Interest in this approach has grown over time, and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships among genes during the selection process, which led to our development of SVM-RCE-R. SVM-RCE-R further enhances the capabilities of SVM-RCE by adding a novel user-specified ranking function. This ranking function enables the user to stipulate the weights of accuracy, sensitivity, specificity, f-measure, area under the curve, and precision in the ranking, a flexibility that allows the user to select for greater sensitivity or greater specificity as needed for a specific project. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have also now implemented the SVM-RCE-R algorithm in Knime in order to make it easier to apply. The use of SVM-RCE-R in Knime is simple and intuitive and allows researchers to begin their analysis immediately without having to consult an information technology specialist. The input for the Knime-implemented tool is an Excel file (or text or CSV) with a simple structure, and the output is also an Excel file. The Knime version also incorporates new features not available in SVM-RCE. The results show that the inclusion of the ranking function has a significant impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics.
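As a rough illustration of how such a user-weighted ranking could be expressed, the sketch below combines the six metrics with user-chosen weights; the weight values, metric dictionary, and function are assumptions for illustration, not the actual SVM-RCE-R Knime interface.

```python
# Minimal sketch of a user-weighted ranking score over per-cluster metrics.
# The weights and example metric values are illustrative assumptions only.
def rank_score(metrics: dict, weights: dict) -> float:
    """Weighted sum of performance metrics for one gene cluster."""
    return sum(weights[name] * metrics[name] for name in weights)

# Weights chosen to favor sensitivity over the other metrics.
weights = {"accuracy": 0.1, "sensitivity": 0.4, "specificity": 0.1,
           "f_measure": 0.1, "auc": 0.2, "precision": 0.1}

cluster_metrics = {"accuracy": 0.91, "sensitivity": 0.88, "specificity": 0.93,
                   "f_measure": 0.90, "auc": 0.95, "precision": 0.92}

print(f"cluster score: {rank_score(cluster_metrics, weights):.3f}")
```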


2019, Vol 8 (4), pp. 5160-5165

Feature selection is a powerful tool to identify the important characteristics of data for prediction. It can therefore help avoid overfitting, improve prediction accuracy, and reduce execution time. Feature selection procedures are particularly important for support vector machines, which are used for prediction on large datasets: the larger the dataset, the more computationally demanding it is to build a predictive model with the support vector classifier. This paper investigates how a feature selection approach based on the analysis of variance (ANOVA) can be optimized for Support Vector Machines (SVMs) to improve their execution time and accuracy. We introduce new conditions on the SVMs prior to running the ANOVA to optimize the performance of the support vector classifier, and we establish the bootstrap procedure as an alternative to cross-validation for model selection. We run our experiments on popular datasets and compare our results to existing modifications of SVMs with feature selection. We propose a number of ANOVA-SVM modifications that are simple to perform while significantly improving the accuracy and computing time of the SVMs compared to existing methods such as the Mixed Integer Linear Feature Selection approach.
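A minimal sketch of ANOVA-based filtering in front of an SVM, using scikit-learn's F-test selector on a synthetic dataset; it illustrates the general ANOVA-SVM idea only and does not reproduce the paper's added conditions or its bootstrap-based model selection.

```python
# Minimal sketch: ANOVA F-test feature filtering ahead of an SVM classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in dataset; the paper uses popular benchmark datasets instead.
X, y = make_classification(n_samples=500, n_features=50, n_informative=8, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("anova", SelectKBest(f_classif, k=10)),   # keep the 10 features with the highest F-score
    ("svm", SVC(kernel="linear", C=1.0)),
])
print("cv accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```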


2004, Vol 13 (04), pp. 791-800
Author(s): Holger Fröhlich, Olivier Chapelle, Bernhard Schölkopf

The problem of feature selection is a difficult combinatorial task in Machine Learning and of high practical relevance, e.g. in bioinformatics. Genetic Algorithms (GAs) offer a natural way to solve this problem. In this paper we present a special Genetic Algorithm which takes into account the existing bounds on the generalization error for Support Vector Machines (SVMs). This new approach is compared to the traditional method of performing cross-validation and to other existing algorithms for feature selection.
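A minimal sketch of a GA wrapper for SVM feature selection under simplifying assumptions: fitness here is plain cross-validated accuracy rather than the SVM generalization-error bounds the paper exploits, and the dataset is synthetic.

```python
# Minimal GA sketch for SVM feature-subset selection (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=6, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of a linear SVM on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel="linear"), X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(20, X.shape[1]))            # random bit-mask population
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]                 # keep the best half
    children = []
    for _ in range(len(pop) - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                   # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(X.shape[1]) < 0.05                # bit-flip mutation
        child[flip] ^= 1
        children.append(child)
    pop = np.vstack([parents] + children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```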


Author(s): Ahmed Abdullah Farid, Gamal Selim, Hatem Khater

Breast cancer is a significant health issue across the world and the most widely diagnosed cancer in women; early-stage diagnosis and therapy improve patient outcomes. This paper proposes a composite hybrid feature selection set based on an optimized genetic algorithm (CHFS-BOGA) to predict breast cancer. This hybrid feature selection approach combines the advantages of three filter feature selection approaches with an optimized Genetic Algorithm (OGA) to select the best features and improve the performance and scalability of the classification process. We build the OGA by improving the initial population generation and the genetic operators, using the results of the filter approaches as prior information, and by using the C4.5 decision tree classifier as a fitness function instead of probabilistic and random selection. The data are the Wisconsin breast cancer dataset from the UCI machine learning repository, with a total of 569 rows and 32 columns. The dataset was analyzed using the Explorer interface of the Weka open-source data mining software. The results show that the proposed hybrid feature selection approach significantly outperforms the single filter approaches and principal component analysis (PCA) for optimal feature selection, and the selected features are good indicators for prediction. The highest accuracy achieved with the support vector machine (SVM) classifier before applying the proposed system (CHFS-BOGA) was 97.3%. The highest accuracy after (CHFS-BOGA-SVM) was 98.25% on a 70.0% train / remainder test split and 100% on the full training set. Moreover, the area under the receiver operating characteristic (ROC) curve was equal to 1.0. The results showed that the proposed CHFS-BOGA-SVM system was able to accurately classify breast tumors as malignant or benign.
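A minimal sketch of the evaluation setup on scikit-learn's copy of the Wisconsin diagnostic dataset (569 samples), with a simple chi-squared filter standing in for the CHFS-BOGA-selected subset; the filter choice, the value of k, and the SVM settings are assumptions, and CHFS-BOGA itself is not reproduced here.

```python
# Minimal sketch: SVM on the Wisconsin diagnostic data with a simple filter
# selector as a stand-in for the CHFS-BOGA feature subset (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)                  # 569 samples, 30 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=1)

pipe = Pipeline([
    ("scale", MinMaxScaler()),                               # chi2 needs non-negative inputs
    ("filter", SelectKBest(chi2, k=12)),                     # assumed subset size
    ("svm", SVC(kernel="rbf", C=10.0)),
])
pipe.fit(X_tr, y_tr)
print("test accuracy:", pipe.score(X_te, y_te))
```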


Author(s): A. Gaspar-Cunha, F. Mendes, J. Duarte, A. Vieira, B. Ribeiro, ...

In this work a Multi-Objective Evolutionary Algorithm (MOEA) was applied to feature selection for the problem of bankruptcy prediction. The algorithm maximizes the accuracy of the classifier while keeping the number of features low. A two-objective problem, the minimization of the number of features and the maximization of accuracy, was fully analyzed using Logistic Regression (LR) and Support Vector Machine (SVM) classifiers. Simultaneously, the parameters required by both classifiers were optimized, and the validity of the proposed methodology was tested using a database containing the financial statements of 1200 medium-sized private French companies. Extensive tests show that the MOEA is an efficient feature selection approach, and the best results were obtained when both the accuracy and the classifier parameters were optimized. The proposed method can provide useful information for decision makers in characterizing the financial health of a company.
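A minimal sketch of the two-objective trade-off: given candidate feature subsets scored by (number of features, accuracy), keep only the non-dominated ones. The candidate values are made up for illustration, and this shows the Pareto idea only, not the MOEA used in the paper.

```python
# Minimal sketch: extract the Pareto front of (number of features, accuracy) pairs,
# where fewer features and higher accuracy are both preferred.
def pareto_front(candidates):
    """candidates: list of (n_features, accuracy) pairs."""
    front = []
    for n, acc in candidates:
        dominated = any(n2 <= n and a2 >= acc and (n2 < n or a2 > acc)
                        for n2, a2 in candidates)
        if not dominated:
            front.append((n, acc))
    return sorted(front)

# Hypothetical candidate subsets produced by some search procedure.
candidates = [(3, 0.78), (5, 0.84), (8, 0.84), (10, 0.86), (15, 0.85)]
print(pareto_front(candidates))   # -> [(3, 0.78), (5, 0.84), (10, 0.86)]
```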


F1000Research, 2020, Vol 9, pp. 1255
Author(s): Malik Yousef, Burcu Bakir-Gungor, Amhar Jabeer, Gokhan Goy, Rehman Qureshi, ...

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented it in Matlab. Interest in this approach has grown over time, and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships among genes during the selection process, which led to our development of SVM-RCE-R. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have now implemented the SVM-RCE-R algorithm in Knime in order to make it easier to apply and more accessible to the biomedical community. The use of SVM-RCE-R in Knime is simple and intuitive, allowing researchers to begin their data analysis immediately without having to consult an information technology specialist. The input for the Knime tool is an Excel file (or text or CSV) with a simple structure, and the output is also an Excel file. The Knime version also incorporates new features not available in the previous version. One of these features is a user-specified ranking function that enables the user to provide the weights of accuracy, sensitivity, specificity, f-measure, area under the curve, and precision in the ranking function, allowing the user to select for greater sensitivity or greater specificity as needed. The results show that the ranking function has an impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics. This finding motivates future studies to suggest the optimal ranking function.

