Prediction of credit risk with an ensemble model: a correlation-based classifier selection approach

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Zhibin Xiong ◽  
Jun Huang

Purpose Ensemble models that combine multiple base classifiers have been widely used to improve prediction performance in credit risk evaluation. However, an arbitrary selection of base classifiers is problematic. The purpose of this paper is to develop a framework for selecting base classifiers to improve the overall classification performance of an ensemble model. Design/methodology/approach In this study, selecting base classifiers is treated as a feature selection problem, where the output of a base classifier can be considered a feature. The proposed method, correlation-based classifier selection using the maximum information coefficient (MIC-CCS), selects the features (classifiers) via nonlinear optimization programming that optimizes, based on MIC, the trade-off between the accuracy and diversity of the base classifiers. Findings The empirical results show that ensemble models perform better than stand-alone ones, and that the ensemble model based on MIC-CCS outperforms both ensemble models with unselected base classifiers and ensemble models based on traditional forward and backward selection methods. Additionally, the ensemble model in which correlation is measured with MIC classifies better than the one in which correlation is measured with the Pearson correlation coefficient. Research limitations/implications The study provides an alternative way to select base classifiers that differ significantly from one another, so that they provide complementary information; because the selected classifiers also have good predictive capabilities, the classification performance of the ensemble model improves.
Originality/value This paper introduces MIC to the correlation-based selection process to better capture nonlinear and nonfunctional relationships in a complex credit data structure and construct a novel nonlinear programming model for base classifiers selection that has not been used in other studies.
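The selection principle behind MIC-CCS, keeping base classifiers that are individually accurate but mutually dissimilar, can be sketched greedily. This is only a stand-in for the paper's method: the authors solve a nonlinear program, and computing MIC needs a dedicated library, so plain Pearson correlation serves here as the dependence measure; the accuracy-diversity weight `alpha` and all names are illustrative.

```python
import numpy as np

def select_classifiers(outputs, y, k=3, alpha=0.5):
    """Greedy sketch of accuracy-vs-diversity classifier selection.
    `outputs` is (n_classifiers, n_samples) of 0/1 predictions and
    `y` the true labels; Pearson correlation stands in for MIC."""
    acc = (outputs == y).mean(axis=1)          # accuracy of each base classifier
    chosen = [int(np.argmax(acc))]             # seed with the most accurate one
    while len(chosen) < k:
        best, best_score = None, -np.inf
        for j in range(len(outputs)):
            if j in chosen:
                continue
            # mean absolute correlation with the classifiers already chosen
            div = np.mean([abs(np.corrcoef(outputs[j], outputs[c])[0, 1])
                           for c in chosen])
            score = acc[j] - alpha * div       # reward accuracy, penalize redundancy
            if score > best_score:
                best, best_score = j, score
        chosen.append(best)
    return chosen
```

With a strong diversity weight, a slightly weaker but uncorrelated classifier is preferred over an exact copy of the best one.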

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Aydin Shishegaran ◽  
Behnam Karami ◽  
Elham Safari Danalou ◽  
Hesam Varaee ◽  
Timon Rabczuk

Purpose The resistance of steel plate shear walls (SPSWs) under explosive loads is evaluated using nonlinear FE analysis and surrogate methods. Design/methodology/approach This study uses the conventional weapons effects program (CONWEP) model for the explosive load and the Johnson-Cook model for the steel plate. Based on the Taguchi method, 25 of 100 samples are selected for a parametric study that predicts the damaged zones and the maximum deflection of SPSWs under explosive loads. Multiple linear regression (MLR), multiple Ln equation regression (MLnER), gene expression programming (GEP), adaptive network-based fuzzy inference (ANFIS) and an ensemble model are then used to predict the maximum deflection of SPSWs. Several statistical parameters and error terms are used to evaluate the accuracy of the different surrogate models. Findings The results show that the cross-section in the y-direction and the plate thickness have the most significant effects on the maximum deflection of SPSWs, and that the maximum deflection is governed by the scaled distance, at a value of 0.383. Originality/value The ensemble model performs better than all other models for predicting the maximum deflection of SPSWs under explosive loads.
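The simplest of the surrogate models, MLR, fits a linear map from design parameters to maximum deflection. The sketch below is a minimal stand-in: the thickness/scaled-distance inputs and deflection values are invented for illustration and are not the paper's FE results.

```python
import numpy as np

# Hypothetical design points: plate thickness (mm) and scaled distance,
# with made-up maximum deflections (mm) playing the role of FE outputs.
X = np.array([[4.0, 0.383], [6.0, 0.383], [8.0, 0.50],
              [10.0, 0.60], [12.0, 0.70]])
y = np.array([95.0, 78.0, 55.0, 38.0, 24.0])

A = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least-squares fit

def predict(thickness, scaled_distance):
    """Evaluate the fitted MLR surrogate at a new design point."""
    return coef @ np.array([1.0, thickness, scaled_distance])
```

The same design matrix would feed the nonlinear surrogates (GEP, ANFIS); only the fitting step changes.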


2018 ◽  
Vol 10 (2) ◽  
pp. 185-205
Author(s):  
Hassan Akram ◽  
Khalil ur Rahman

Purpose This study aims to examine and compare the credit risk management (CRM) scenario of Islamic banks (IBs) and conventional banks (CBs) in Pakistan, keeping in view the phenomenal growth of Islamic banking and its future implications. Design/methodology/approach A sample of five CBs and four IBs was chosen from the whole banking industry. Secondary data obtained from the banks' annual financial reports for 13 years, from 2004 to 2016, were analyzed. Multiple regression, correlation and descriptive analysis were used in the examination of the data. Findings The results show that loan quality (LQ) has a positive and significant impact on CRM for both IBs and CBs. Asset quality (AQ), on the other hand, has a negative impact on CRM in the case of IBs, but a significantly positive relation with CRM in the case of CBs. The impact of 16 ratios measuring LQ and AQ has also been checked individually on CRM, using a regression model with a financial-crisis dummy variable for robust comparison between CBs and IBs. The model proved significant, and the CRM performance of IBs was observed to be better than that of CBs. Moreover, the mean average values of the financial ratios used to measure these variables show that the CRM performance of IBs operating in Pakistan was better than that of CBs over the period of the study. Practical implications The research findings are expected to help bankers, investors, academics and policy makers build a better understanding of CRM practices as adopted by CBs and IBs. The findings would be useful in formulating policy measures for the progress of the banking industry in Pakistan. Originality/value This research is unique in its approach to analyzing and comparing the CRM performance of CBs and IBs. Such work has not been carried out before in the Pakistani banking industry.
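The dummy-variable regression used for the robustness comparison can be sketched as a linear model of a CRM proxy on a loan-quality ratio plus a crisis indicator. Every number below is synthetic and purely illustrative; only the 2004-2016 window and the crisis-dummy idea come from the study.

```python
import numpy as np

years = np.arange(2004, 2017)                                # 13 annual observations
crisis = ((years >= 2008) & (years <= 2009)).astype(float)   # financial-crisis dummy
lq = np.linspace(3.0, 5.0, 13)                               # hypothetical loan-quality ratio
crm = 2.0 + 0.5 * lq - 1.0 * crisis                          # synthetic CRM proxy

# OLS: CRM ~ intercept + LQ + crisis dummy
X = np.column_stack([np.ones(13), lq, crisis])
beta, *_ = np.linalg.lstsq(X, crm, rcond=None)
```

A negative coefficient on the dummy would indicate weaker CRM performance during the crisis years, holding the ratio constant.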


2021 ◽  
Author(s):  
Dongyup Shin ◽  
Hye Jin Kam ◽  
Min-Seok Jeon ◽  
Ha Young Kim

BACKGROUND Korean institutions and enterprises collect electronic medical examination results from multiple medical institutions in non-standardized, non-unified formats. To offer consistent services, groups of nurses experienced in examination work have manually classified individual test results following keyword-based classification guidelines. However, rule-based algorithms and such labor-intensive manual work are time-consuming and prone to error. We investigated natural language processing (NLP) architectures and propose ensemble models to create automated classifiers. OBJECTIVE This study aimed to develop practical deep learning models, using electronic medical records (EMRs) from 284 healthcare institutions and open-source corpus datasets, for automatically classifying three thyroid condition labels: healthy, caution-required and critical. The primary goal is to increase overall classification accuracy, but there are also practical and industrial needs to correctly predict healthy (negative) thyroid conditions, which make up most medical examination results, and to minimize the false-negative rate among predictions of the healthy condition. METHODS The datasets included thyroid and comprehensive medical examination reports. The textual data are documented not only in complete sentences but also as lists of words or phrases. We therefore propose static and contextualized ensemble NLP neTwork (SCENT) systems to reflect both static and contextual information and to handle incomplete sentences. We prepared convolutional neural network (CNN), long short-term memory (LSTM) and Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA) based ensemble models by training or fine-tuning each of them multiple times.
Through comprehensive experiments, we propose two versions of the ensemble model, SCENT-v1 and SCENT-v2, alongside the single-architecture CNN, LSTM and ELECTRA ensemble models, for the best classification performance and for practical use, respectively. SCENT-v1 is an ensemble of the CNN and ELECTRA ensemble models; SCENT-v2 is a hierarchical ensemble of the CNN, LSTM and ELECTRA ensemble models. SCENT-v2 first classifies the three labels using the ELECTRA ensemble model and then reclassifies reports using an ensemble of the CNN and LSTM ensemble models whenever the ELECTRA ensemble model predicts a "healthy" label. RESULTS SCENT-v1 outperformed all the suggested models, with the highest F1-score (92.56%). SCENT-v2 had the second-highest recall (94.44%) and the fewest misclassifications of the caution-required condition, while maintaining zero classification error for the critical condition among predictions of the healthy condition. CONCLUSIONS The proposed SCENT systems demonstrate good classification performance despite the unique characteristics of the Korean data and problems of data scarcity and imbalance, especially the extremely small amount of critical-condition data. The SCENT-v1 results indicate that combining static and contextual input token representations can enhance classification performance. SCENT-v2 has a strong impact on the prediction of healthy thyroid conditions.
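SCENT-v2's two-stage control flow, re-examining only the reports the first model calls "healthy" so that false negatives get a second look, is simple enough to sketch. The stage models here are placeholder callables, not the actual ELECTRA or CNN+LSTM ensembles.

```python
def scent_v2_predict(stage1, stage2, texts):
    """Hierarchical sketch: `stage1` (standing in for the ELECTRA
    ensemble) labels every report; `stage2` (standing in for the
    CNN+LSTM ensemble) re-checks only the reports stage1 called
    'healthy'. Both are any callables mapping a text to one of
    'healthy', 'caution-required', 'critical'."""
    labels = [stage1(t) for t in texts]
    return [stage2(t) if lab == "healthy" else lab
            for t, lab in zip(texts, labels)]
```

Because stage 2 can only move reports out of the "healthy" bucket, the scheme trades a few extra cautions for fewer missed positives.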


2020 ◽  
Vol 12 (4) ◽  
pp. 495-529
Author(s):  
Mohamad Hassan ◽  
Evangelos Giouvris

Purpose This study investigates shareholders' value adjustment in response to financial institution (FI) merger announcements in the immediate event window and in the extended event window. It also investigates accounting measures of performance, comparing post-merger to pre-merger results, including several cash flow measures and not just profitability measures, as the empirical literature suggests. Finally, the authors examine whether FI merger orientations of diversification or focus create more value for shareholders (in the immediate announcement window and several months afterward) and/or generate better cash flows, higher profitability and less credit risk. Design/methodology/approach This study examines the effect of FI mergers on bidders' shareholder value and observed performance, deploying three techniques simultaneously: (a) an event study analysis, to estimate abnormal returns (ARs) and cumulative abnormal returns (CARs) in narrow windows around the merger announcement; (b) a buy-and-hold event study analysis, to estimate ARs in the wider window of +50 to +230 days after the announcement; and (c) an observed performance analysis of financial and capital efficiency measures before and after the announcement: return on equity, liquidity, cost-to-income ratio, capital-to-total-assets ratio, net loans to total loans, credit risk, loans-to-deposits ratio, other expenses, total assets, economic value added, weighted average cost of capital and return on invested capital. Deal criteria of value, mega-deals, strategic orientation (as in Ansoff's (1980) growth strategies), acquiring bank size and payment method are set individually as control variables. Findings The results show that FI mergers destroy share value for bidding firms pursuing a market penetration strategy. Market development and product development strategies enable shareholder value creation over short and long horizons.
Diversification strategies do not influence bidding shareholders' value. Local bank-to-bank mergers create shareholder value and enhance liquidity and economic value in the short run. Cross-border bank-to-bank mergers create value for bidders in the long term but are associated with high costs and higher risks. Originality/value A significant advancement over the current literature is in assessing mergers not only for bank bidders but for the three pillar FIs of the financial sector: banks, real-estate companies and investment companies. The study also improves on the current finance literature by deploying two different strategies in the analysis. At the univariate level, shareholder value creation and market reaction to merger announcements are examined over short (−5 to +5 days) and long (+230 days) event windows. This is followed, at the multivariate level, by regressing the resultant CARs and BHARs on financial performance variables.
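The CAR computation in the narrow event window follows the standard market-model recipe: estimate alpha and beta over an estimation window, then sum actual-minus-expected returns over the event window. A minimal sketch with synthetic return series (the estimation length and window indices are illustrative):

```python
import numpy as np

def car(stock, market, est_end, win_start, win_end):
    """Market-model event study: fit alpha/beta by OLS on the first
    `est_end` days, then cumulate abnormal returns (actual minus
    model-expected) over [win_start, win_end)."""
    A = np.column_stack([np.ones(est_end), market[:est_end]])
    (alpha, beta), *_ = np.linalg.lstsq(A, stock[:est_end], rcond=None)
    ar = stock[win_start:win_end] - (alpha + beta * market[win_start:win_end])
    return ar.sum()
```

A buy-and-hold abnormal return (BHAR) would instead compound (1 + return) over the wider +50 to +230 day window before differencing.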


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Angel Kit Yi Wong ◽  
Sylvia Yee Fan Tang ◽  
Dora Dong Yu Li ◽  
May May Hung Cheng

Purpose The purpose of this paper is threefold. First, a new concept, teacher buoyancy, is introduced; given the importance of studying how teachers bounce back from the minor, frequent setbacks of their daily work (versus the major adversities emphasized in resilience research), and building on Martin and Marsh's research on buoyancy, a dual-component framework is proposed to conceptualize it. Second, the development of a new instrument, the Teacher Buoyancy Scale (TBS), is presented. Third, results of a study using the TBS are reported, which provide insights into how teacher buoyancy can be fostered. Design/methodology/approach The study employed a quantitative design. A total of 258 teachers taking a part-time initial teacher education (ITE) program completed the TBS. Their responses were analyzed by exploratory factor analysis (EFA). In addition to descriptive statistics and reliability coefficients, Pearson correlation coefficients were calculated to examine the relationships among the factors. Findings The data analysis indicated five factors, namely, Coping with difficulties, Bouncing back cognitively and emotionally, Working hard and appraising difficulties positively, Caring for one's well-being and Striving for professional growth. These factors can be readily interpreted within the dual-component framework. Correlations among the factors further revealed that enabling factors can be subdivided into more proximal personal strengths relating to direct coping and more distal personal assets pertaining to personal well-being; it is the latter that correlates most highly with perceived teacher buoyancy. Originality/value The most original contribution of this paper is the proposal of the new concept of teacher buoyancy: teachers' capacity to deal with the everyday challenges that most teachers face in their teaching. The delineation between buoyancy and resilience sharpens the focus of the problem domain that is most relevant to teachers.
The development of the TBS provides a useful and reliable instrument to examine teacher buoyancy in future studies.
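The factor-correlation step of the analysis reduces to a Pearson correlation matrix over per-respondent factor scores. The scores below are random placeholders (matching only the study's n = 258 and five factors), not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(258, 5))        # 258 respondents x 5 factor scores (synthetic)
corr = np.corrcoef(scores, rowvar=False)  # 5x5 Pearson correlation matrix
```

Off-diagonal entries of `corr` are what the study inspects to split enabling factors into proximal and distal groups.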


2021 ◽  
Vol 21 (S2) ◽  
Author(s):  
Kun Zeng ◽  
Yibin Xu ◽  
Ge Lin ◽  
Likeng Liang ◽  
Tianyong Hao

Abstract Background Eligibility criteria are the primary strategy for screening target participants of a clinical trial. Automated classification of clinical trial eligibility criteria text using machine learning methods improves recruitment efficiency and reduces the cost of clinical research. However, existing methods suffer from poor classification performance owing to the complexity and imbalance of eligibility criteria text data. Methods An ensemble learning-based model with metric learning is proposed for eligibility criteria classification. The model integrates a set of pre-trained models including Bidirectional Encoder Representations from Transformers (BERT), A Robustly Optimized BERT Pretraining Approach (RoBERTa), XLNet, Pre-training Text Encoders as Discriminators Rather Than Generators (ELECTRA) and Enhanced Representation through Knowledge Integration (ERNIE). Focal loss is used as the loss function to address the data imbalance problem. Metric learning is employed to train the embedding of each base model so that features are more distinguishable. Soft voting produces the final classification of the ensemble model. The dataset, from standard evaluation task 3 of the 5th China Health Information Processing Conference, contains 38,341 eligibility criteria texts in 44 categories. Results Our ensemble method achieved an accuracy of 0.8497, a precision of 0.8229 and a recall of 0.8216 on the dataset. The macro F1-score was 0.8169, outperforming state-of-the-art baseline methods by 0.84% on average. In addition, the performance improvement had a p-value of 2.152e-07 under a standard t-test, indicating that our model achieved a significant improvement. Conclusions A model for classifying the eligibility criteria text of clinical trials, based on multi-model ensemble learning and metric learning, was proposed. The experiments demonstrated that our ensemble model significantly improved classification performance.
In addition, metric learning improved the word embedding representation, and the focal loss reduced the impact of data imbalance on model performance.
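Two ingredients of the method, focal loss for class imbalance and soft voting for the final decision, are compact enough to sketch directly. Binary focal loss is shown for brevity (the paper's task has 44 classes), and the probability matrices are toy values.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: down-weights well-classified examples by
    (1 - p_t)^gamma so rare-class errors dominate the gradient.
    `p` is the predicted probability of class 1, `y` the 0/1 label."""
    pt = np.where(y == 1, p, 1 - p)
    return -np.mean((1 - pt) ** gamma * np.log(pt))

def soft_vote(prob_list):
    """Soft voting: average the per-model class-probability matrices
    (each shape (n_samples, n_classes)) and take the argmax."""
    return np.mean(prob_list, axis=0).argmax(axis=1)
```

A confident correct prediction incurs far less focal loss than a marginal one, which is exactly the imbalance-handling effect the paper relies on.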


2020 ◽  
pp. 1-17
Author(s):  
Dongqi Yang ◽  
Wenyu Zhang ◽  
Xin Wu ◽  
Jose H. Ablanedo-Rosas ◽  
Lingxiao Yang ◽  
...  

With the rapid development of commercial credit mechanisms, credit funds have become fundamental in promoting the development of manufacturing corporations. However, large-scale, imbalanced credit application information poses a challenge to accurate bankruptcy prediction. A novel multi-stage ensemble model with fuzzy clustering and optimized classifier composition is proposed herein by combining the fuzzy clustering-based classifier selection method, the random subspace (RS)-based classifier composition method, and the genetic algorithm (GA)-based classifier compositional optimization method to achieve accuracy in predicting bankruptcy among corporations. To overcome the inherent inflexibility of traditional hard clustering methods, a new fuzzy clustering-based classifier selection method is proposed based on the mini-batch k-means algorithm to obtain the best performing base classifiers for generating classifier compositions. The RS-based classifier composition method was applied to enhance the robustness of candidate classifier compositions by randomly selecting several subspaces in the original feature space. The GA-based classifier compositional optimization method was applied to optimize the parameters of the promising classifier composition through the iterative mechanism of the GA. Finally, six datasets collected from the real world were tested with four evaluation indicators to assess the performance of the proposed model. The experimental results showed that the proposed model outperformed the benchmark models with higher predictive accuracy and efficiency.
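Of the three stages, the random-subspace step is the easiest to show in isolation: each candidate classifier composition trains on a random subset of the original features. A minimal sketch of the index bookkeeping (the clustering and GA stages are omitted, and all sizes are illustrative):

```python
import numpy as np

def random_subspaces(n_features, n_subspaces, subspace_size, seed=0):
    """Draw `n_subspaces` random feature-index subsets without
    replacement; each base classifier in a composition would then be
    trained only on the columns of one subset."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_features, size=subspace_size, replace=False)
            for _ in range(n_subspaces)]
```

Training the same base learner on different subspaces decorrelates the members of a composition, which is the robustness effect the RS step targets.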


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 870
Author(s):  
Jiansheng Zhang ◽  
Hongli Fu ◽  
Yan Xu

In recent years, scientists have found a close correlation between DNA methylation and aging in epigenetics. With in-depth research in the field of DNA methylation, researchers have established quantitative statistical relationships to predict individual ages. This work used human blood tissue samples to study the association between age and DNA methylation. We built two predictors, based on healthy and disease data, respectively. For the healthy data, we retrieved a total of 1191 samples from four previous reports. By calculating the Pearson correlation coefficient between age and DNA methylation values, 111 age-related CpG sites were selected. Gradient boosting regression was used to build the predictive model, obtaining an R2 of 0.86 and a MAD of 3.90 years on the testing dataset, better than four other regression methods as well as Horvath's results. For the disease data, 354 rheumatoid arthritis samples were retrieved from a previous study. Then, 45 CpG sites were selected to build the predictor; the corresponding MAD and R2 on the testing dataset were 3.11 years and 0.89, respectively, which showed the robustness of our predictor and again bettered the four other regression methods. Finally, we analyzed the twenty-four CpG sites common to the healthy and disease datasets, which illustrated the functional relevance of the selected sites.
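The two figures of merit reported here, MAD in years and R2, are straightforward to compute from true and predicted ages. A minimal sketch with made-up values:

```python
import numpy as np

def mad(y_true, y_pred):
    """Mean absolute deviation between true and predicted ages, in years."""
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot
```

These would be evaluated on a held-out testing split, as in the study's 0.86/3.90-year and 0.89/3.11-year results.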


2020 ◽  
Vol 13 (3) ◽  
pp. 365-388
Author(s):  
Asha Sukumaran ◽  
Thomas Brindha

Purpose Humans are gifted with the ability to recognize others by their uniqueness, along with other demographic characteristics such as ethnicity (or race), gender and age. Over the decades, many researchers in the psychological, biological and cognitive sciences have explored how the human brain characterizes, perceives and memorizes faces, and computational advances have yielded several insights into this issue. Design/methodology/approach This paper proposes a new race detection model using face shape features. The proposed model includes two key phases: (a) feature extraction and (b) detection. In the feature extraction stage, face color- and shape-based features are mined: maximally stable extremal regions (MSER) and speeded-up robust features (SURF) are extracted as shape features, and dense color features are extracted as color features. Since the extracted features are high-dimensional, they are reduced using principal component analysis (PCA), a strong remedy for the "curse of dimensionality". The dimensionally reduced features are then fed to a deep belief network (DBN), which detects the race. Further, to improve prediction, the weights of the DBN are fine-tuned with a new hybrid algorithm referred to as the lion mutated and updated dragon algorithm (LMUDA), a conceptual hybridization of the lion algorithm (LA) and the dragonfly algorithm (DA). Findings The performance of the proposed work is compared with other state-of-the-art models in terms of accuracy and error. LMUDA attains high accuracy at the 100th iteration with 90% training, which is 11.1, 8.8, 5.5 and 3.3% better than the performance at learning percentages (LP) of 50%, 60%, 70% and 80%, respectively.
More particularly, the proposed DBN + LMUDA performs 22.2, 12.5 and 33.3% better than the traditional classifiers DCNN, DBN and LDA, respectively. Originality/value This paper achieves the objective of detecting human races from faces. MSER and SURF features are extracted as shape features, and dense color features as color features. As a novelty, to make race detection more accurate, the weights of the DBN are fine-tuned with the new hybrid LMUDA algorithm, a conceptual hybridization of LA and DA.
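The PCA step that tames the high-dimensional MSER/SURF/color features before the DBN can be sketched with a plain SVD; the matrix sizes below are arbitrary placeholders.

```python
import numpy as np

def pca_reduce(X, k):
    """Project an (n_samples, n_features) matrix onto its top-k
    principal components: center the columns, take the SVD, and keep
    the k leading right-singular directions."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```

The reduced matrix preserves the directions of greatest variance, which is what makes it a workable input for the downstream DBN classifier.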


2016 ◽  
Vol 24 (3) ◽  
pp. 343-362
Author(s):  
Latif Cem Osken ◽  
Ceylan Onay ◽  
Gözde Unal

Purpose This paper aims to analyze the dynamics of the security lending process and lending markets to identify the market-wide variables reflecting the characteristics of the stock borrowed and to measure the credit risk arising from lending contracts. Design/methodology/approach Using data provided by Istanbul Settlement and Custody Bank on the equity lending contracts of the Securities Lending and Borrowing Market between 2010 and 2012, together with data provided by Borsa Istanbul on Equity Market transactions over the same timeframe, this paper analyzes whether stock price volatility, stock returns, return per unit of risk and the relative liquidity of the lending market and the equity market affect defaults on lending contracts, using both linear regression and ordinary least squares regression for robustness and proxying the concepts of relative liquidity, volatility and return by more than one variable to corroborate the findings. Findings The results illustrate a statistically significant relationship between volatility and the default state of lending contracts but fail to establish a connection between default states and stock returns or the relative liquidity of the markets. Research limitations/implications With increasing pressure to clear security lending contracts through central counterparties, it is imperative for both central counterparties and regulators to be able to precisely measure the risk exposure due to security lending transactions. The results, gained from a limited set of lending transactions, merit further studies to identify non-borrower and non-systemic credit risk determinants. Originality/value This is the first study to analyze the non-borrower and non-systemic credit risk determinants in security lending markets.
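The volatility-default relationship can be illustrated with a linear probability model on synthetic contracts: a 0/1 default indicator regressed on stock-price volatility. The data-generating numbers below are invented solely so the expected positive slope appears; they are not the Takasbank/Borsa Istanbul data.

```python
import numpy as np

rng = np.random.default_rng(1)
vol = rng.uniform(0.1, 0.6, size=200)                      # contract-level volatility (synthetic)
default = (vol + rng.normal(0, 0.1, 200) > 0.45).astype(float)  # 0/1 default state

# Linear probability model: default ~ intercept + volatility
X = np.column_stack([np.ones(200), vol])
beta, *_ = np.linalg.lstsq(X, default, rcond=None)
```

A positive slope on volatility mirrors the paper's finding; the study's actual specification also proxies liquidity and return with several variables each.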

