Feature Selection for Bankruptcy Prediction

In this work, a Multi-Objective Evolutionary Algorithm (MOEA) was applied to feature selection for bankruptcy prediction. The algorithm maximizes the accuracy of the classifier while keeping the number of features low. The two-objective problem, minimizing the number of features while maximizing accuracy, was analyzed in full using Logistic Regression (LR) and Support Vector Machine (SVM) classifiers. The parameters required by both classifiers were optimized simultaneously, and the validity of the proposed methodology was tested on a database containing financial statements of 1200 medium-sized private French companies. Extensive tests show that the MOEA is an efficient feature selection approach. The best results were obtained when both the accuracy and the classifiers' parameters were optimized. The proposed method can provide decision makers with useful information for characterizing the financial health of a company.
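
The trade-off between the two objectives can be illustrated with a Pareto filter over candidate feature subsets. The sketch below is not the MOEA from the abstract: it only shows the dominance test such an algorithm relies on, and the candidate values and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Tuple

# Each candidate is (number_of_features, accuracy). Values are made up for illustration.
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_more_features = a[0] <= b[0]
    at_least_as_accurate = a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_more_features and at_least_as_accurate and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [(3, 0.80), (5, 0.86), (5, 0.84), (8, 0.86), (10, 0.90)]
front = pareto_front(candidates)
print(front)
```

Here `(5, 0.84)` and `(8, 0.86)` are dropped because `(5, 0.86)` is at least as good on both objectives. The remaining points are the subsets a decision maker would choose among.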

Bankruptcy Prediction of Engineering Companies in the EU Using Classification Methods

Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis ◽

10.11118/actaun201866051347 ◽

2018 ◽

Vol 66 (5) ◽

pp. 1347-1356 ◽

Cited By ~ 1

Author(s):

Michaela Staňková ◽

David Hampel

Keyword(s):

Logistic Regression ◽

Support Vector Machines ◽

Binary Classification ◽

Classification Tree ◽

Classification Trees ◽

Bankruptcy Prediction ◽

Support Vector ◽

Type I ◽

Vector Machines ◽

The Eu

This article addresses the binary classification of 902 small and medium-sized engineering companies active in the EU, together with an additional 51 companies that went bankrupt in 2014. Logistic regression was selected as a basic statistical method, and support vector machines and classification trees as representatives of machine learning, to construct models for bankruptcy prediction. Different settings were tested for each method. Furthermore, the models were estimated both on the complete data and on identified artificial factors. To evaluate the quality of prediction, we consider not only total accuracy and the type I and type II errors but also the area under the ROC curve. The results clearly show that the predictive ability of all models decreases as the distance to bankruptcy increases. The classification tree method leads to rather simple models. The best classification results were achieved by logistic regression based on artificial factors; moreover, this procedure provides good and stable results regardless of other settings. Artificial factors also appear to be suitable variables for support vector machine models, whereas classification trees achieved better results using the original data.
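
The evaluation criteria named in this abstract can be computed as in the following minimal sketch. It assumes that label 1 means bankrupt and that a type I error is a bankrupt company classified as healthy; the abstract does not state its convention, and the opposite one is also in use. The data and the function names `error_rates` and `auc` are hypothetical.

```python
from typing import Dict, List


def error_rates(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy and error rates; 1 = bankrupt, 0 = healthy (assumed labelling)."""
    pairs = list(zip(y_true, y_pred))
    bankrupt = sum(1 for t, _ in pairs if t == 1)
    healthy = len(pairs) - bankrupt
    missed = sum(1 for t, p in pairs if t == 1 and p == 0)       # bankrupt called healthy
    false_alarm = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy called bankrupt
    return {
        "accuracy": 1 - (missed + false_alarm) / len(pairs),
        "type_I": missed / bankrupt,       # assumed convention, see lead-in
        "type_II": false_alarm / healthy,
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """Area under the ROC curve as the share of (bankrupt, healthy) pairs
    in which the bankrupt company gets the higher score; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: two bankrupt and four healthy companies with model scores.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

rates = error_rates(y_true, y_pred)
area = auc(y_true, scores)
print(rates, area)
```

With only 51 bankrupt companies against 902 active ones, total accuracy alone can look good while most bankruptcies are missed, which is one reason to report the error types and the area under the ROC curve as well.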

Sparse Least Squares Support Vector Machines Based on Genetic Algorithms: A Feature Selection Approach

Advances in Computational Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-030-20518-8_42 ◽

2019 ◽

pp. 500-511

Author(s):

Pedro Hericson Machado Araújo ◽

Ajalmar R. Rocha Neto

Keyword(s):

Genetic Algorithms ◽

Feature Selection ◽

Support Vector Machines ◽

Least Squares ◽

Support Vector ◽

Vector Machines ◽

Selection Approach ◽

Feature Selection Approach

Bankruptcy Prediction Using Support Vector Machines and Feature Selection During the Recent Financial Crisis

International Journal of Economics and Finance ◽

10.5539/ijef.v7n8p182 ◽

2015 ◽

Vol 7 (8) ◽

Cited By ~ 4

Author(s):

Umberto Dellepiane ◽

Michele Di Marcantonio ◽

Enrico Laghi ◽

Stefania Renzi

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

Financial Crisis ◽

Bankruptcy Prediction ◽

Support Vector ◽

Vector Machines ◽

Recent Financial Crisis

Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME

F1000Research ◽

10.12688/f1000research.26880.1 ◽

2020 ◽

Vol 9 ◽

pp. 1255 ◽

Cited By ~ 1

Author(s):

Malik Yousef ◽

Burcu Bakir-Gungor ◽

Amhar Jabeer ◽

Gokhan Goy ◽

Rehman Qureshi ◽

...

Keyword(s):

Feature Selection ◽

Simple Structure ◽

Selection Process ◽

Ranking Function ◽

Support Vector ◽

Scientific Publications ◽

Vector Machines ◽

Feature Selection Approach ◽

Sensitivity Specificity ◽

Excel File

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented it in MATLAB. Interest in this approach has grown over time, and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships of those genes in the selection process. This led to our development of SVM-RCE-R. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have now implemented the SVM-RCE-R algorithm in KNIME to make it easier to apply and more accessible to the biomedical community. The use of SVM-RCE-R in KNIME is simple and intuitive, allowing researchers to begin their data analysis immediately without having to consult an information technology specialist. The input for the KNIME tool is an Excel file (or a text or CSV file) with a simple structure, and the output is also an Excel file. The KNIME version also incorporates new features not available in the previous version. One of these is a user-specified ranking function that lets the user provide the weights of the accuracy, sensitivity, specificity, F-measure, area under the curve and precision in the ranking function, so that the user can select for greater sensitivity or greater specificity as needed. The results show that the ranking function has an impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics. This finding motivates future studies to suggest the optimal ranking function.

SVM-Based Credit Rating and Feature Selection

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.618.573 ◽

2014 ◽

Vol 618 ◽

pp. 573-577 ◽

Cited By ~ 1

Author(s):

Yu Qiang Qin ◽

Yu Dong Qi ◽

Hui Ying

Keyword(s):

Logistic Regression ◽

Feature Selection ◽

Financial Institutions ◽

Credit Card ◽

Credit Rating ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Vector Machines ◽

Reference Agency

The assessment of the risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit rating to determine the likelihood of default based on consumer application and credit reference agency data. We test support vector machines (SVM) against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover the features that are most significant in determining the risk of default.

Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME

F1000Research ◽

10.12688/f1000research.26880.2 ◽

2021 ◽

Vol 9 ◽

pp. 1255

Author(s):

Malik Yousef ◽

Burcu Bakir-Gungor ◽

Amhar Jabeer ◽

Gokhan Goy ◽

Rehman Qureshi ◽

...

Keyword(s):

Feature Selection ◽

Selection Process ◽

Area Under The Curve ◽

Ranking Function ◽

Support Vector ◽

Scientific Publications ◽

Vector Machines ◽

Feature Selection Approach ◽

Sensitivity Specificity ◽

Excel File

In our earlier study, we proposed a novel feature selection approach, Recursive Cluster Elimination with Support Vector Machines (SVM-RCE), and implemented it in MATLAB. Interest in this approach has grown over time, and several researchers have incorporated SVM-RCE into their studies, resulting in a substantial number of scientific publications. This increased interest encouraged us to reconsider how feature selection, particularly in biological datasets, can benefit from considering the relationships of those genes in the selection process. This led to our development of SVM-RCE-R. SVM-RCE-R further enhances the capabilities of SVM-RCE by adding a novel user-specified ranking function. This ranking function lets the user stipulate the weights of the accuracy, sensitivity, specificity, F-measure, area under the curve and precision in the ranking function. This flexibility allows the user to select for greater sensitivity or greater specificity as needed for a specific project. The usefulness of SVM-RCE-R is further supported by the development of the maTE tool, which uses a similar approach to identify microRNA (miRNA) targets. We have also now implemented the SVM-RCE-R algorithm in KNIME to make it easier to apply. The use of SVM-RCE-R in KNIME is simple and intuitive and allows researchers to begin their analysis immediately without having to consult an information technology specialist. The input for the KNIME tool is an Excel file (or a text or CSV file) with a simple structure, and the output is also an Excel file. The KNIME version also incorporates new features not available in SVM-RCE. The results show that the inclusion of the ranking function has a significant impact on the performance of SVM-RCE-R. Some of the clusters that achieve high scores for a specified ranking can also have high scores in other metrics.
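
A user-weighted ranking function of the kind described can be sketched as a weighted sum of the six metrics. This is an assumption about its form, not the KNIME implementation; the cluster scores, the weights and the name `rank_score` are hypothetical.

```python
from typing import Dict

# The six metrics listed in the abstract.
METRICS = ("accuracy", "sensitivity", "specificity", "f_measure", "auc", "precision")


def rank_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the metrics; a metric without a weight contributes nothing."""
    return sum(weights.get(name, 0.0) * metrics[name] for name in METRICS)


# Hypothetical per-cluster scores, for illustration only.
cluster_a = {"accuracy": 0.80, "sensitivity": 0.90, "specificity": 0.70,
             "f_measure": 0.82, "auc": 0.85, "precision": 0.75}
cluster_b = {"accuracy": 0.80, "sensitivity": 0.70, "specificity": 0.90,
             "f_measure": 0.78, "auc": 0.85, "precision": 0.88}
clusters = {"a": cluster_a, "b": cluster_b}

favor_sensitivity = {"sensitivity": 1.0}
favor_specificity = {"specificity": 1.0}

best_for_sensitivity = max(clusters, key=lambda k: rank_score(clusters[k], favor_sensitivity))
best_for_specificity = max(clusters, key=lambda k: rank_score(clusters[k], favor_specificity))
print(best_for_sensitivity, best_for_specificity)
```

Both clusters have the same accuracy and AUC here, so the choice of weights alone decides which one is ranked first.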

Optimization of the ANOVA Procedure for Support Vector Machines

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7375.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 5160-5165

Keyword(s):

Feature Selection ◽

Support Vector Machines ◽

Execution Time ◽

Computing Time ◽

Mixed Integer ◽

Support Vector ◽

Support Vector Classifier ◽

Vector Machines ◽

Selection Approach ◽

Feature Selection Approach

Feature selection is a powerful tool for identifying the characteristics of data that matter for prediction. It can therefore help to avoid overfitting, improve prediction accuracy and reduce execution time. Feature selection procedures are particularly important for support vector machines (SVMs), which are used for prediction on large datasets. The larger the dataset, the more computationally demanding it is to build a predictive model using the support vector classifier. This paper investigates how the feature selection approach based on the analysis of variance (ANOVA) can be optimized for SVMs to improve their execution time and accuracy. We introduce new conditions on the SVMs prior to running the ANOVA to optimize the performance of the support vector classifier. We also establish the bootstrap procedure as an alternative to cross-validation for model selection. We run our experiments on popular datasets and compare our results to existing modifications of SVMs with feature selection procedures. We propose a number of ANOVA-SVM modifications that are simple to perform and that significantly improve the accuracy and computing time of SVMs in comparison to existing methods such as the Mixed Integer Linear Feature Selection approach.
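
For orientation, ANOVA-based feature selection scores each feature by a one-way ANOVA F statistic across the classes and keeps the highest-scoring features. The sketch below shows only this textbook statistic, not the modifications proposed in the paper; the data and the name `anova_f` are hypothetical, and it assumes at least two classes and a nonzero within-class variance.

```python
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class sum of squares: how far the class means are from the grand mean.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of the values around their own class mean.
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))


labels = [0, 0, 0, 1, 1, 1]
features = {
    "informative": [1.0, 2.0, 3.0, 7.0, 8.0, 9.0],  # class means differ
    "noise": [1.0, 5.0, 9.0, 2.0, 5.0, 8.0],         # class means coincide
}

f_scores = {name: anova_f(col, labels) for name, col in features.items()}
ranking = sorted(f_scores, key=f_scores.get, reverse=True)
print(f_scores, ranking)
```

A feature whose class means coincide gets an F score of zero and falls to the bottom of the ranking, so it would be removed before the SVM is trained.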

Hybrid feature selection approach to identify optimal features of profile metadata to detect social bots in Twitter

Social Network Analysis and Mining ◽

10.1007/s13278-021-00786-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Eiman Alothali ◽

Kadhim Hayawi ◽

Hany Alashwal

Keyword(s):

Feature Selection ◽

Random Forest ◽

Cross Validation ◽

Performance Metrics ◽

Area Under The Curve ◽

Support Vector ◽

Detection Systems ◽

Vector Machines ◽

Selection Approach ◽

Feature Selection Approach

The last few years have revealed that social bots in social networks have become more sophisticated in design as they adapt their features to avoid detection systems. The deceptive nature of bots to mimic human users is due to the advancement of artificial intelligence and chatbots, where these bots learn and adjust very quickly. Therefore, finding the optimal features needed to detect them is an area for further investigation. In this paper, we propose a hybrid feature selection (FS) method to evaluate profile metadata features to find these optimal features, which are evaluated using random forest, naïve Bayes, support vector machines, and neural networks. We found that the cross-validation attribute evaluation performance was the best when compared to other FS methods. Our results show that the random forest classifier with six optimal features achieved the best score of 94.3% for the area under the curve. The results maintained overall 89% accuracy, 83.8% precision, and 83.3% recall for the bot class. We found that using four features: favorites_count, verified, statuses_count, and average_tweets_per_day, achieves good performance metrics for bot detection (84.1% precision, 81.2% recall).

Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

Most startup failures in Indonesia are caused by teams that are not solid and competent. Programmers are an integral part of a startup team. Social media can be used as a strategic tool for recruiting the best programmer candidates for a company. This tool takes the form of an automatic classification system for the social media posts of prospective programmers. The classification results are expected to predict each candidate's performance pattern, labelled as good or bad performance. An effective tool requires the classification method with the best accuracy, so a comparison of several methods is needed. This study compares the Support Vector Machines (SVM) algorithm, Random Forest (RF) and Stochastic Gradient Descent (SGD). With 10-fold cross-validation (k = 10), the accuracy reaches 81.3% for SVM, 74.4% for RF and 80.1% for SGD, so the SVM method is chosen as the model for classifying programmer performance from social media activity.
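
The k = 10 cross-validation behind these accuracy figures splits the data into ten folds, with each fold serving once as the test set. The sketch below shows only the index bookkeeping, under the assumption of contiguous, unshuffled folds; the abstract does not say how the folds were formed, and the name `k_fold_indices` is hypothetical.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs over contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(25, k=10)
for train, test in folds[:2]:
    print(len(train), test)
```

The reported accuracy of a method would then be the mean of its accuracy over the ten test folds, with the model refitted on each training part.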
