A novel multi-stage ensemble model with a hybrid genetic algorithm for credit scoring on imbalanced data

Advances in machine learning have improved the performance of credit scoring. Ensemble learning, one of the most widely recognized machine learning approaches, has demonstrated significant improvements in predictive accuracy over individual machine learning models for credit scoring. This study proposes a novel multi-stage ensemble model with multiple K-means-based selective undersampling for credit scoring. First, a new multiple K-means-based undersampling method is proposed to deal with imbalanced data. Then, a new selective sampling mechanism is proposed to adaptively select the better-performing base classifiers. Finally, a new feature-enhanced stacking method is proposed to construct an effective ensemble model from the shortlisted base classifiers. In the experiments, four datasets and four evaluation indicators are used to evaluate the proposed model, and the results show that it outperforms the other benchmark models.
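The abstract does not spell out how its multiple K-means-based undersampling works, so the following is only a minimal sketch of generic cluster-based undersampling, not the authors' method. It assumes the majority class is clustered into as many groups as there are minority samples and that one real point per cluster (the one nearest the centroid) is kept. The names `kmeans_undersample`, `majority`, and `minority`, and the toy data, are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

def kmeans_undersample(majority, n_keep, seed=0):
    """Keep one real majority point per cluster (assumed selection rule)."""
    centroids, labels = kmeans(majority, n_keep, seed=seed)
    kept = []
    for c, centroid in enumerate(centroids):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if members:  # skip clusters that ended up empty
            kept.append(min(members, key=lambda p: sq_dist(p, centroid)))
    return kept

# Hypothetical toy data: 8 majority samples in two groups, 2 minority samples.
majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
minority = [(5.0, 5.0), (5.0, 6.0)]
kept = kmeans_undersample(majority, n_keep=len(minority))
balanced = kept + minority
```

Keeping real points rather than the centroids themselves means every retained majority sample is an actual observation; a method built on several K-means runs, as the abstract suggests, would presumably repeat this with different settings.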

A novel multi-stage ensemble model for credit scoring based on synthetic sampling and feature transformation

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-211467 ◽ 2021 ◽ pp. 1-16

Author(s): Fang He ◽ Wenyu Zhang ◽ Zhijia Yan

Keyword(s): Credit Scoring ◽ Imbalanced Data ◽ Transformation Method ◽ Classification Performance ◽ Ensemble Model ◽ Feature Transformation ◽ Learning Methods ◽ Scoring Model ◽ Multi Stage ◽ Credit Scoring Model

Credit scoring has become increasingly important for financial institutions. With the advancement of artificial intelligence, machine learning methods, especially ensemble learning methods, have become increasingly popular for credit scoring. However, the problems of imbalanced data distribution and underutilized feature information have not been sufficiently addressed. To make the credit scoring model more adaptable to imbalanced datasets, the original model-based synthetic sampling method is extended herein to balance the datasets by generating appropriate minority samples that alleviate class overlap. To enable the credit scoring model to extract inherent correlations from features, a new bagging-based feature transformation method is proposed, which transforms features using a tree-based algorithm and selects features using the chi-square statistic. Furthermore, a two-layer ensemble method that combines the advantages of dynamic ensemble selection and stacking is proposed to improve the classification performance of the proposed multi-stage ensemble model. Finally, four standardized datasets and six evaluation metrics are used to evaluate the performance of the proposed ensemble model. The experimental results confirm that the proposed ensemble model is effective in improving classification performance and is superior to other benchmark models.
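The abstract mentions selecting features with the chi-square statistic. As an illustration only, the sketch below computes the classical chi-square statistic between a categorical feature and a class label from their contingency table, then keeps the top-scoring features. It does not reproduce the paper's bagging-based transformation; the feature names (`leaf_a`, `leaf_b`, `leaf_c`), the helper `select_top_k`, and the toy data are hypothetical.

```python
from collections import Counter

def chi_square(feature, labels):
    """Chi-square statistic: sum over cells of (observed - expected)^2 / expected."""
    n = len(labels)
    f_counts = Counter(feature)              # marginal counts per feature value
    y_counts = Counter(labels)               # marginal counts per class
    observed = Counter(zip(feature, labels)) # joint counts per (value, class) cell
    stat = 0.0
    for f, fc in f_counts.items():
        for y, yc in y_counts.items():
            expected = fc * yc / n           # count expected under independence
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat

def select_top_k(columns, labels, k):
    """Rank named feature columns by chi-square score and keep the best k."""
    scores = {name: chi_square(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical binary features, e.g. indicators derived from a tree model.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "leaf_a": [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly aligned with the label
    "leaf_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
    "leaf_c": [0, 0, 0, 1, 0, 1, 1, 1],  # partially aligned
}
selected = select_top_k(columns, labels, k=2)
```

A higher statistic indicates a stronger departure from independence between the feature and the label, which is why the uninformative `leaf_b` is the one dropped here.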

A novel multi-stage ensemble model with enhanced outlier adaptation for credit scoring

Expert Systems with Applications ◽ 10.1016/j.eswa.2020.113872 ◽ 2021 ◽ Vol 165 ◽ pp. 113872

Author(s): Wenyu Zhang ◽ Dongqi Yang ◽ Shuai Zhang ◽ Jose H. Ablanedo-Rosas ◽ Xin Wu ◽ ...

Keyword(s): Credit Scoring ◽ Ensemble Model ◽ Multi Stage

Study on Multi-stage Logistics System Design Problem with Inventory Considering Demand Change by Hybrid Genetic Algorithm

IEEJ Transactions on Electronics Information and Systems ◽ 10.1541/ieejeiss.130.704 ◽ 2010 ◽ Vol 130 (4) ◽ pp. 704-711

Author(s): Hisaki Inoue ◽ Mitsuo Gen

Keyword(s): Genetic Algorithm ◽ System Design ◽ Design Problem ◽ Hybrid Genetic Algorithm ◽ Logistics System ◽ Multi Stage ◽ Logistics System Design

A novel multi-stage hybrid model with enhanced multi-population niche genetic algorithm: An application in credit scoring

Expert Systems with Applications ◽ 10.1016/j.eswa.2018.12.020 ◽ 2019 ◽ Vol 121 ◽ pp. 221-232 ◽ Cited By ~ 16

Author(s): Wenyu Zhang ◽ Hongliang He ◽ Shuai Zhang

Keyword(s): Genetic Algorithm ◽ Hybrid Model ◽ Credit Scoring ◽ Multi Stage

A Multi-Stage Self-Adaptive Classifier Ensemble Model With Application in Credit Scoring

IEEE Access ◽ 10.1109/access.2019.2922676 ◽ 2019 ◽ Vol 7 ◽ pp. 78549-78559 ◽ Cited By ~ 4

Author(s): Shanshan Guo ◽ Hongliang He ◽ Xiaoling Huang

Keyword(s): Credit Scoring ◽ Classifier Ensemble ◽ Ensemble Model ◽ Multi Stage ◽ Self Adaptive

Evaluation of Accidental Death Records Using Hybrid Genetic Algorithm

SSRN Electronic Journal ◽ 10.2139/ssrn.3563084 ◽ 2020 ◽ Cited By ~ 2

Author(s): Nikhil Sharma ◽ Ila Kaushik ◽ Rajat Rathi ◽ Santosh Kumar

Keyword(s): Genetic Algorithm ◽ Hybrid Genetic Algorithm ◽ Accidental Death

Design of GA and Ontology based NLP Frameworks for Online Opinion Mining

Recent Patents on Engineering ◽ 10.2174/1872212112666180115162726 ◽ 2019 ◽ Vol 13 (2) ◽ pp. 159-165

Author(s): Manik Sharma ◽ Gurvinder Singh ◽ Rajinder Singh

Keyword(s): Genetic Algorithm ◽ Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Opinion Mining ◽ Hybrid Genetic Algorithm ◽ Online Reviews ◽ Middle Tier ◽ Complete Set ◽ Mining Model

Background: For almost every domain, a tremendous amount of data is accessible both online and offline. Billions of users post their views or opinions daily through online applications such as WhatsApp, Facebook, Twitter, blogs, and Instagram. Objective: These reviews are constructive for the progress of the venture, civilization, state and even nation. However, this large volume of information is useful only if it is collectively and effectively mined. Methodology: Opinion mining is used to extract thoughts, expressions, emotions, criticism, and appraisals from the data posted by different people. It is one of the prevailing research techniques that combine and employ features from natural language processing. Here, an amalgamated approach has been employed to mine online reviews. Results: To improve the results of a genetic algorithm based opinion mining patent, a hybrid genetic algorithm and ontology based 3-tier natural language processing framework named GAO_NLP_OM has been designed. The first tier is used for preprocessing and corrosion of the sentences. The middle tier is composed of a genetic algorithm based searching module, an ontology for English sentences, base words for the review, and a complete set of English words with items and their features. The genetic algorithm is used to expedite the polarity mining process. The last tier is responsible for semantic, discourse and feature summarization. Furthermore, the use of ontology assists in developing a more accurate opinion mining model. Conclusion: GAO_NLP_OM is expected to improve the performance of the genetic algorithm based opinion mining patent. The amalgamation of genetic algorithm, ontology and natural language processing seems to produce faster and more precise results. The proposed framework is able to mine simple as well as compound sentences. However, affirmative preceded interrogative, hidden feature and mixed language sentences remain a challenge for the proposed framework.
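The abstract does not describe how the genetic algorithm inside GAO_NLP_OM is configured, so the following is only a generic sketch of a genetic algorithm applied to a toy polarity task, not the framework itself. It assumes a chromosome that assigns each vocabulary word a weight of -1, 0, or +1, and a fitness equal to the number of reviews whose summed weights match the labeled polarity. The vocabulary, reviews, operators (tournament selection, one-point crossover, elitism), and parameter values are all hypothetical.

```python
import random

VOCAB = ["good", "great", "bad", "poor", "fast", "slow"]
# Hypothetical toy reviews: (words, label), +1 positive and -1 negative.
REVIEWS = [
    (["good", "fast"], 1),
    (["great"], 1),
    (["bad", "slow"], -1),
    (["poor"], -1),
    (["great", "fast"], 1),
    (["bad"], -1),
]

def fitness(weights):
    """Count reviews whose summed word weights have the labeled sign."""
    lookup = dict(zip(VOCAB, weights))
    correct = 0
    for words, label in REVIEWS:
        score = sum(lookup[w] for w in words)
        if score * label > 0:
            correct += 1
    return correct

def evolve(pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve word-weight chromosomes; returns (best, best-fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.choice((-1, 0, 1)) for _ in VOCAB] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best chromosome over unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random candidates, twice.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, len(VOCAB))  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally resample a gene.
            child = [rng.choice((-1, 0, 1)) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
```

Because the best chromosome is copied unchanged into each new generation, the recorded best fitness never decreases; how quickly it reaches the maximum depends on the seed and parameters.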
