SME Default Prediction Framework with the Effective Use of External Public Credit Data

2020 ◽  
Vol 12 (18) ◽  
pp. 7575
Author(s):  
Zhichao Luo ◽  
Pingyu Hsu ◽  
Ni Xu

Traditional default prediction models mainly rely on financial data. However, financial data on small and medium-sized enterprises (SMEs) are difficult to obtain, and even when they are available, their opaqueness may hinder analysis. Therefore, traditional prediction models encounter serious problems when used to predict SME defaults. In this paper, a novel prediction framework utilizing only external public credit data is proposed. The external public credit data used include SMEs’ basic information (BI), credit information from the government (CIG), and court verdict information (CVI), which can be collected from publicly accessible websites. Records on 15,605 sample companies were collected from approximately 300,000 companies; of these, 8183 had defaulted. The empirical data were used to construct prediction models using logistic regression, the classification and regression tree (CART) model, and LightGBM. The best results achieved an accuracy of 0.87 and an area under the receiver operating characteristic curve (AUC) of 0.92. The results show that a model using only external credit data has significant predictive ability, and that CIG variables offer the best predictive capacity.
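The AUC reported above can be read as the probability that a randomly chosen defaulted company receives a higher predicted risk score than a randomly chosen non-defaulted one. A minimal Python sketch of that pairwise definition, alongside accuracy at a threshold, follows; the scores are made-up outputs of some classifier (for example a CART or LightGBM model), the 0.5 threshold is an assumption, and none of this is the authors' code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Labels are 1 for defaulted and 0 for non-defaulted companies; ties count 0.5.
    This pairwise loop is O(n_pos * n_neg), fine for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of companies whose thresholded prediction matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))   # 0.75
print(accuracy(labels, scores))  # 0.5
```

For a dataset the size of the one described here, a rank-based AUC computation would be much faster than the pairwise loop; the loop is kept because it mirrors the definition directly.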

2015 ◽  
Vol 68 ◽  
pp. 405-410
Author(s):  
W.R. Henshall ◽  
G.N. Hill ◽  
R.M. Beresford

Measured surface wetness duration is often used in disease risk prediction models but is only available from a few weather stations. Wetness can be modelled from more widely available weather station networks using other meteorological variables. This study compared wetness duration measured using different methods of interpreting wetness sensor output, and from different sensor types, with wetness calculated from a classification and regression tree (CART) model. The model calculated wetness from temperature, relative humidity and wind speed. Different wetness sensors, and different wetness calculation methods applied to the same sensor, made little difference to recorded wetness duration. Total wetness duration was greater for modelled than for measured wetness at all but one of the seven sites investigated. Using modelled and measured wetness as inputs to a grape botrytis prediction model indicated that modelled wetness is unsuitable for use in New Zealand without calibration for local conditions.
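A CART classification tree is grown by repeatedly choosing the feature and threshold that best separate the classes. The sketch below finds one such split for hourly wet/dry records described by temperature, relative humidity and wind speed, scoring candidates by weighted Gini impurity. Gini is a common CART criterion, but the abstract does not say which criterion or data layout the authors used; the feature names and values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (1 = wet, 0 = dry)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(rows: List[Dict[str, float]], labels: List[int]) -> Optional[Tuple[str, float, float]]:
    """Return (feature, threshold, weighted Gini) of the best binary split `x <= threshold`."""
    n = len(rows)
    best: Optional[Tuple[str, float, float]] = None
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[feature] <= t]
            right = [y for r, y in zip(rows, labels) if r[feature] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, t, score)
    return best


# Hypothetical hourly records: temperature (°C), relative humidity (%), wind speed (m/s).
rows = [
    {"temp": 12.0, "rh": 95.0, "wind": 1.0},
    {"temp": 20.0, "rh": 92.0, "wind": 2.0},
    {"temp": 14.0, "rh": 70.0, "wind": 1.5},
    {"temp": 18.0, "rh": 60.0, "wind": 3.0},
]
wet = [1, 1, 0, 0]
print(best_split(rows, wet))  # ('rh', 81.0, 0.0)
```

A full tree would apply `best_split` recursively to each side until a stopping rule (depth, minimum node size, or pruning) is met.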


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Serena Cabaro ◽  
Vittoria D’Esposito ◽  
Tiziana Di Matola ◽  
Silvia Sale ◽  
Michele Cennamo ◽  
...  

In Europe, multiple waves of infections with SARS-CoV-2 (COVID-19) have been observed. Here, we have investigated whether common patterns of cytokines could be detected in individuals with mild and severe forms of COVID-19 in two pandemic waves, and whether a machine learning approach could be useful to identify the best predictors. An increasing trend of multiple cytokines was observed in patients with mild or severe/critical symptoms of COVID-19, compared with healthy volunteers. Linear Discriminant Analysis (LDA) clearly distinguished the three groups based on cytokine patterns. Classification and Regression Tree (CART) analysis further indicated that IL-6 discriminated controls from COVID-19 patients, whilst IL-8 defined disease severity. During the second wave of the pandemic, a less intense cytokine storm was observed than during the first. IL-6 was the most robust predictor of infection and discriminated moderate COVID-19 patients from healthy controls, regardless of the epidemic peak curve. Thus, serum cytokine patterns provide biomarkers useful for COVID-19 diagnosis and prognosis. Further characterization of individual cytokines may help identify novel therapeutic options and pave the way for innovative diagnostic tools.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Kaizhou Huang ◽  
Feiyang Ji ◽  
Zhongyang Xie ◽  
Daxian Wu ◽  
Xiaowei Xu ◽  
...  

Artificial liver support systems (ALSS) are widely used to treat patients with hepatitis B virus-related acute-on-chronic liver failure (HBV-ACLF). The aims of the present study were to identify the subgroups of patients with HBV-ACLF who may benefit from ALSS therapy, and the relevant patient-specific factors. A total of 489 ALSS-treated HBV-ACLF patients were enrolled and served as derivation and validation cohorts for classification and regression tree (CART) analysis. CART analysis identified three factors prognostic of survival, namely hepatic encephalopathy (HE), prothrombin time (PT), and total bilirubin (TBil) level, and two distinct risk groups: low risk (28-day mortality 10.2–39.5%) and high risk (63.8–91.1%). The CART model showed that patients lacking HE, with a PT ≤ 27.8 s and a TBil level ≤ 455 μmol/L, had lower 28-day mortality after ALSS therapy. For HBV-ACLF patients with HE and a PT > 27.8 s, mortality remained high after such therapy. Patients lacking HE with a PT ≤ 27.8 s and a TBil level ≤ 455 μmol/L may benefit markedly from ALSS therapy, whereas unnecessary ALSS therapy should be avoided in high-risk HBV-ACLF patients. The CART model is a novel, user-friendly tool for screening HBV-ACLF patients' eligibility for ALSS therapy and will aid clinicians through ACLF risk stratification and therapeutic guidance.
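The two rules quoted in this abstract can be encoded directly, which illustrates why CART output is often described as user-friendly. The Python sketch below covers only those two rules; the remaining leaves of the published tree are not given here, so other combinations return "unspecified". The function name is hypothetical, and this is an illustration, not a clinical tool.

```python
def alss_risk_group(has_he: bool, pt_seconds: float, tbil_umol_per_l: float) -> str:
    """Hypothetical encoding of the two CART rules quoted in the abstract.

    has_he: hepatic encephalopathy present; pt_seconds: prothrombin time (s);
    tbil_umol_per_l: total bilirubin (μmol/L).
    """
    if not has_he and pt_seconds <= 27.8 and tbil_umol_per_l <= 455:
        return "low"          # lower 28-day mortality after ALSS therapy
    if has_he and pt_seconds > 27.8:
        return "high"         # mortality remained high after ALSS therapy
    return "unspecified"      # leaf not described in the abstract


print(alss_risk_group(False, 20.0, 300.0))  # low
print(alss_risk_group(True, 30.0, 100.0))   # high
print(alss_risk_group(False, 30.0, 100.0))  # unspecified
```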


2019 ◽  
Vol 11 (5) ◽  
pp. 1327 ◽  
Author(s):  
Bei Zhou ◽  
Zongzhi Li ◽  
Shengrui Zhang ◽  
Xinfen Zhang ◽  
Xin Liu ◽  
...  

Hit-and-run (HR) crashes are crashes in which the driver of the offending vehicle flees the scene without aiding possible victims or alerting the authorities to request emergency medical services. This paper aims to identify significant predictors of HR and non-hit-and-run (NHR) vehicle-bicycle crashes using the classification and regression tree (CART) method. An oversampling technique is applied to deal with the data imbalance problem, in which the number of minority instances (HR crashes) is much lower than that of majority instances (NHR crashes). Police-reported data from the City of Chicago covering September 2017 to August 2018 were collected. The G-mean (geometric mean) is used to evaluate classification performance. Results indicate that, compared with the original CART model, the G-mean of the CART model incorporating data imbalance treatment increased by 171%, from 23% to 61%. The decision tree reveals that the following five variables play the most important roles in classifying HR and NHR vehicle-bicycle crashes: driver age, bicyclist safety equipment, driver action, trafficway type, and driver gender. Several countermeasures are recommended accordingly. The current study demonstrates that, by incorporating data imbalance treatment, the CART method can provide much more robust classification results.
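The G-mean combines sensitivity on the minority class (HR crashes) with specificity on the majority class (NHR crashes), so a model that labels every crash NHR scores zero despite high accuracy. The abstract does not name its oversampling technique; the sketch below uses plain random oversampling (duplicating minority rows) as a stand-in, which may differ from the method the authors used. The data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """sqrt(sensitivity * specificity), with 1 = HR (minority) and 0 = NHR (majority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def random_oversample(X: Sequence[T], y: Sequence[int], seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly drawn rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(members))
            y_out.append(label)
    return X_out, y_out


y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(round(g_mean(y_true, y_pred), 4))  # 0.6124 (sqrt(0.5 * 0.75))
print(g_mean(y_true, [0] * 6))           # 0.0: predicting NHR for everything

X_bal, y_bal = random_oversample(["a", "b", "c", "d", "e"], [1, 0, 0, 0, 0])
print(Counter(y_bal)[1], Counter(y_bal)[0])  # 4 4
```

Oversampling should be applied to the training data only, so that duplicated minority rows do not also appear in the data used to compute the G-mean.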


2021 ◽  
Author(s):  
Peng Song ◽  
Shengwei Ren ◽  
Yu Liu ◽  
Pei Li ◽  
Qingyan Zeng

The aim of this study was to develop a predictive model for subclinical keratoconus (SKC) based on decision tree (DT) algorithms. A total of 194 eyes (105 normal eyes and 89 SKC eyes) were included in this two-center retrospective study. The data were divided into separate training and validation databases. The baseline variables were derived from tomography and biomechanical imaging. DT models were generated in the training database using the chi-square automatic interaction detection (CHAID) and classification and regression tree (CART) algorithms. The discriminating rules of the CART model selected, in order, the Belin/Ambrósio deviation (BAD-D), stiffness parameter at first applanation (SPA1), back eccentricity (Becc), and maximum pachymetric progression index, while the CHAID model selected BAD-D, deformation amplitude ratio, SPA1, and Becc. The CART model discriminated between normal and SKC eyes with 92.2% accuracy, higher than that of the CHAID model (88.3%), BAD-D (82.0%), the Corvis biomechanical index (CBI, 77.3%), and the tomographic and biomechanical index (TBI, 78.1%). In the validation database, the CART model achieved 92.4% accuracy and the CHAID model 86.4%. Thus, the CART model using tomography and biomechanical imaging was an excellent model for SKC screening and provided easy-to-understand discriminating rules.
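CHAID differs from CART mainly in how it chooses splits: rather than minimizing an impurity measure over binary splits, it uses chi-square tests of association between a (grouped) predictor and the target. The sketch below computes only the Pearson chi-square statistic for a contingency table such as predictor category by normal/SKC; CHAID's category merging and adjusted p-values are omitted, and the counts are hypothetical.

```python
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table of counts.

    Rows could be categories of a (grouped) predictor and columns the classes
    normal / SKC; a larger statistic indicates a stronger association.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = predictor category (low / high), columns = (normal, SKC).
print(round(pearson_chi_square([[10, 20], [30, 40]]), 4))  # 0.7937
print(pearson_chi_square([[10, 20], [20, 40]]))            # 0.0 (no association)
```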


2020 ◽  
Vol 39 (5) ◽  
pp. 6073-6087
Author(s):  
Meltem Yontar ◽  
Özge Hüsniye Namli ◽  
Seda Yanik

Customer behavior prediction has recently been gaining importance in the banking sector, as in other sectors. This study aims to propose a model to predict whether credit card users will pay their debts. Using the proposed model, potential non-payment risks can be predicted and the necessary actions taken in time. To predict customers’ payment status for the following month, we use an Artificial Neural Network (ANN), a Support Vector Machine (SVM), Classification and Regression Tree (CART) and C4.5, which are widely used artificial intelligence and decision tree algorithms. Our dataset includes 10,713 customers’ records obtained from a well-known bank in Taiwan. These records contain customer information such as the amount of credit, gender, education level, marital status, age, past payment records, invoice amount and amount of credit card payments. We apply cross-validation and hold-out methods to divide our dataset into training and test sets. We then evaluate the algorithms with the proposed performance metrics. We also optimize the parameters of the algorithms to improve prediction performance. The results show that the model built with the CART algorithm, one of the decision tree algorithms, predicts customers’ payment status for the next month with high accuracy (about 86%). When the algorithm parameters are optimized, classification accuracy and performance increase.
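The abstract does not state the number of folds or the hold-out ratio, so the sketch below assumes 10-fold cross-validation and an 80/20 hold-out split over the 10,713 record indices. It only produces index lists; fitting ANN, SVM, CART or C4.5 models on them is left out.

```python
import random
from typing import List, Tuple


def holdout_split(n: int, test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle record indices once and reserve `test_fraction` of them for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every record is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


train, test = holdout_split(10713)
print(len(train), len(test))  # 8570 2143
splits = k_fold_indices(10713, k=10)
print(len(splits), len(splits[0][1]))  # 10 1072
```

Cross-validation averages performance over all k test folds, while the hold-out split gives a single estimate from one reserved test set.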


2021 ◽  
Vol 21 (5) ◽  
pp. 165-173
Author(s):  
Donggoo Seo ◽  
Byunghun Park ◽  
Younghyun Lee ◽  
Wonhee Lee ◽  
Jungjae Kim ◽  
...  

This study developed a model that predicts casualties (dead and injured people) using the Classification And Regression Tree (CART) method. Based on fire statistics collected over a decade, the model aims to select appropriate risk-assessment scenarios and fire prevention and safety methods applicable to individual buildings. Our evaluation indicates that this CART model can accurately predict 48 scenarios based on 5 variables related to the type of fire, fire growth rate, and evacuation situation, and can calculate the probability of each occurrence. This model is expected to improve future quantitative fire risk assessments.
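In a classification tree, the probability attached to each leaf (here, each fire scenario) is typically the class frequency among the training records that reach that leaf. The Python sketch below computes those frequencies given a function that maps a record to its leaf; the scenario key built from fire type and growth rate is a hypothetical stand-in for the study's 48 scenarios and 5 variables, and the records are made up.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Sequence


def leaf_probabilities(
    records: Sequence[dict],
    labels: Sequence[str],
    leaf_of: Callable[[dict], Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """Class frequencies among the training records that fall into each leaf."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for rec, lab in zip(records, labels):
        counts[leaf_of(rec)][lab] += 1
    return {
        leaf: {lab: c / sum(cnt.values()) for lab, c in cnt.items()}
        for leaf, cnt in counts.items()
    }


# Hypothetical fire records; the scenario key (fire type, growth rate) is illustrative only.
records = [
    {"fire_type": "A", "growth": "fast"},
    {"fire_type": "A", "growth": "fast"},
    {"fire_type": "A", "growth": "fast"},
    {"fire_type": "A", "growth": "fast"},
    {"fire_type": "B", "growth": "slow"},
    {"fire_type": "B", "growth": "slow"},
]
labels = ["casualty", "none", "casualty", "casualty", "none", "none"]
probs = leaf_probabilities(records, labels, lambda r: (r["fire_type"], r["growth"]))
print(probs[("A", "fast")])  # {'casualty': 0.75, 'none': 0.25}
print(probs[("B", "slow")])  # {'none': 1.0}
```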


2021 ◽  
Vol 37 (4) ◽  
pp. 293-304
Author(s):  
Thobela Tyasi ◽  
Amanda Tshegofatso Mkhonto ◽  
Madumetja Mathapo ◽  
Kagisho Molabe

A regression tree is a data mining algorithm that builds a model from collected data through a series of calculations. The present study aimed to develop a model to estimate body weight (BW) from the biometric traits withers height (WH), sternum height (SH), body length (BL), heart girth (HG) and rump height (RH). A total of eighty-three (n = 83) South African non-descript indigenous goats (54 females and 29 males) aged three months and above were used in the study. Pearson's correlation and classification and regression tree (CART) analysis were used as statistical techniques for data analysis. The correlation results indicated highly significant (P < 0.01) positive correlations between BW and all biometric traits in both males and females; in female goats the strongest correlation was between BW and WH (r = 0.82), while in males it was between BW and BL (r = 0.83). The CART model, with BW as the dependent variable, gave a mean BW of 29.868 kilograms (kg); BL had the most notable role in BW, followed by SH and RH, while age had the least notable role. This study suggests that South African non-descript goat farmers might use BL, SH and RH as selection criteria during breeding to improve the BW of their animals. Further studies using CART to predict BW in larger samples of South African non-descript goats or other goat breeds are needed.
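A regression tree predicts a numeric target such as BW by splitting on a trait and predicting the mean of each resulting group; CART chooses the threshold that minimizes the summed squared error of the two groups. The Python sketch below shows one such split together with Pearson's r; the body length and body weight values are made up for illustration and are not the study's data.

```python
import math
from typing import List, Optional, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the group mean (the group's prediction)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_regression_split(
    x: Sequence[float], y: Sequence[float]
) -> Optional[Tuple[float, float, float, float]]:
    """Return (threshold, total SSE, left mean, right mean) for the best split x <= threshold."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [b for a, b in zip(x, y) if a <= t]
        right = [b for a, b in zip(x, y) if a > t]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (t, total, sum(left) / len(left), sum(right) / len(right))
    return best


# Made-up body length (cm) and body weight (kg) values, not data from the study.
bl = [50.0, 52.0, 60.0, 62.0]
bw = [20.0, 22.0, 30.0, 32.0]
print(pearson_r(bl, bw))              # 1.0 (perfectly linear toy data)
print(best_regression_split(bl, bw))  # (56.0, 4.0, 21.0, 31.0)
```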


Author(s):  
Bahareh Ghasemain ◽  
Dawod Talebpoor Asl ◽  
Binh Thai Pham ◽  
Mohammadtghi Avand ◽  
Huu Duy Nguyen ◽  
...  

Shallow landslides, through land degradation, not only threaten human life and property but may also cause severe ecosystem damage. The aim of this study was to compare the performance of two decision tree machine learning algorithms, classification and regression tree (CART) and reduced error pruning tree (REPTree), for shallow landslide susceptibility mapping in Bijar, Kurdistan province, Iran. We first identified 20 conditioning factors, which were then tested with the information gain ratio (IGR) technique to select the most important ones. We then constructed a geodatabase based on the selected factors, together with a total of 111 landslide locations divided in an 80/20 ratio for calibration and validation. Model performance was assessed using the true positive rate (TP Rate), false positive rate (FP Rate), precision, recall, F1-measure, Kappa, mean absolute error, and area under the receiver operating characteristic curve (AUC). The IGR results indicated that slope angle and TWI contributed most to shallow landslide occurrence in the study area. Moreover, although both models had high goodness-of-fit and prediction accuracy, the CART model (AUC = 0.856) outperformed the REPTree model (AUC = 0.837). Therefore, the CART model can be used as a promising tool, and as a base classifier for hybridization with optimization algorithms and meta-classifiers, for spatial prediction of shallow landslide-prone areas.
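The information gain ratio divides the entropy reduction a factor provides about the target by the entropy of the factor's own value distribution (its split information), which penalizes factors with many distinct values. A minimal sketch, assuming each conditioning factor has already been discretized into classes (for example slope-angle classes) and that landslide presence is a 0/1 label; the factor names and data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain_ratio(factor: Sequence[Hashable], landslide: Sequence[int]) -> float:
    """Information gain of `factor` about `landslide`, divided by the factor's split information."""
    n = len(landslide)
    groups: Dict[Hashable, List[int]] = {}
    for v, lab in zip(factor, landslide):
        groups.setdefault(v, []).append(lab)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(landslide) - conditional
    split_info = entropy(factor)
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical, already-classified factor values for four mapping units.
slope_class = ["steep", "steep", "gentle", "gentle"]
aspect_class = ["N", "S", "N", "S"]
landslide = [1, 1, 0, 0]
print(information_gain_ratio(slope_class, landslide))   # 1.0
print(information_gain_ratio(aspect_class, landslide))  # 0.0
```

Factors with a ratio near zero add little information and would be candidates for removal before model building.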

