Corporate Social Irresponsibility and Credit Risk Prediction: A Machine Learning Approach

2020 ◽  
Vol 53 (4) ◽  
pp. 513-554
Author(s):  
Daniel V. Fauser ◽  
Andreas Gruener

This paper examines the prediction accuracy of various machine learning (ML) algorithms for firm credit risk. It marks the first attempt to leverage data on corporate social irresponsibility (CSI) to better predict credit risk in an ML context. Even though the literature on default and credit risk is vast, the potential explanatory power of CSI for firm credit risk prediction remains unexplored. Previous research has shown that CSI may jeopardize firm survival and thus potentially comes into play in predicting credit risk. We find that prediction accuracy varies considerably between algorithms, with advanced machine learning algorithms (e.g., random forests) outperforming traditional ones (e.g., linear regression). Random forest regression achieves an out-of-sample adjusted R² of 89.75%, owing to its ability to capture non-linearity and complex interaction effects in the data. We further show that including information on CSI in firm credit risk prediction does not consistently increase prediction accuracy. One possible interpretation of this result is that CSI does not (yet) seem to be systematically reflected in credit ratings, despite prior literature indicating that CSI increases credit risk. Our study contributes to improving firm credit risk predictions using a machine learning design and to exploring how CSI is reflected in credit risk ratings.
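A minimal sketch of this kind of out-of-sample comparison, assuming synthetic firm-level features and scikit-learn; the data, the adjusted-R² helper, and all column meanings are hypothetical stand-ins, not the authors' setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical data: rows are firm-years, columns are financial/CSI features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
# Non-linear target with interactions, the regime where forests shine.
y = X[:, 0] * X[:, 1] + X[:, 2] ** 2 + rng.normal(scale=0.5, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def adjusted_r2(r2, n, p):
    """Adjusted R^2 penalizes plain R^2 for the number of predictors p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

for model in (LinearRegression(), RandomForestRegressor(n_estimators=500, random_state=0)):
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    print(type(model).__name__, round(adjusted_r2(r2, len(y_test), X.shape[1]), 3))
```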

Author(s):  
David Easley ◽  
Marcos López de Prado ◽  
Maureen O’Hara ◽  
Zhibai Zhang

Understanding modern market microstructure phenomena requires large amounts of data and advanced mathematical tools. We demonstrate how machine learning can be applied to microstructural research. We find that microstructure measures continue to provide insights into the price process in current complex markets. Some microstructure features with high explanatory power exhibit low predictive power, while others with less explanatory power have more predictive power. We find that some microstructure-based measures are useful for out-of-sample prediction of various market statistics, leading to questions about market efficiency. We also show how microstructure measures can have important cross-asset effects. Our results are derived using 87 liquid futures contracts across all asset classes.
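The explanatory-versus-predictive distinction can be illustrated with a toy sketch (synthetic data and scikit-learn, not the authors' futures data or measures): a feature built to covary only contemporaneously with returns fits the current period well but forecasts the next period poorly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical feature with a purely contemporaneous relation to returns.
rng = np.random.default_rng(1)
n = 2000
feature = rng.normal(size=n)
ret = 0.8 * feature + rng.normal(scale=0.3, size=n)

X = feature.reshape(-1, 1)
explanatory = LinearRegression().fit(X[:-1], ret[:-1])  # same-period fit
predictive = LinearRegression().fit(X[:-1], ret[1:])    # today's feature vs next-period return

print("explanatory R^2:", r2_score(ret[:-1], explanatory.predict(X[:-1])))  # high
print("predictive  R^2:", r2_score(ret[1:], predictive.predict(X[:-1])))    # near zero
```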


2019 ◽  
Vol 19 (292) ◽  
Author(s):  
Nan Hu ◽  
Jian Li ◽  
Alexis Meyer-Cirkel

We compared the predictive performance of a series of machine learning and traditional methods for monthly CDS spreads, using firms' accounting-based, market-based, and macroeconomic variables over the period 2006 to 2016. We find that ensemble machine learning methods (Bagging, Gradient Boosting, and Random Forest) strongly outperform other estimators, and Bagging particularly stands out in terms of accuracy. Traditional credit risk models using OLS techniques have the lowest out-of-sample prediction accuracy. The results suggest that non-linear machine learning methods, especially ensemble methods, add considerable value to existing credit risk prediction and enable the shadow pricing of CDS for companies that lack such securities.
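A rough sketch of the horse race the abstract describes, using scikit-learn's off-the-shelf estimators; the synthetic data stands in for the accounting, market, and macro panel, and the hyper-parameters are placeholders:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical panel: features explaining monthly CDS spreads.
rng = np.random.default_rng(2)
X = rng.normal(size=(1500, 15))
y = np.exp(0.3 * X[:, 0]) + X[:, 1] * (X[:, 2] > 0) + rng.normal(scale=0.2, size=1500)

models = {
    "OLS": LinearRegression(),
    "Bagging": BaggingRegressor(n_estimators=200, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    # Out-of-sample fit via 5-fold cross-validated R^2.
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: {score:.3f}")
```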


2020 ◽  
Author(s):  
Uzair Bhatti

BACKGROUND In the era of health informatics, the exponential growth of information generated by health information systems and healthcare organizations demands expert and intelligent recommendation systems. These have become one of the most valuable tools, as they reduce problems such as information overload when selecting and suggesting doctors, hospitals, medicines, or diagnoses according to patients' interests. OBJECTIVE Hybrid filtering is one of the most popular approaches to recommendation, but its major limitations are selectivity and data integrity issues. Most existing recommendation systems and risk prediction algorithms focus on a single domain; cross-domain hybrid filtering, by contrast, can alleviate the selectivity and data integrity problems to a greater extent. METHODS We propose a novel algorithm, pr-KNN, which uses KNN together with machine learning algorithms and artificial intelligence (AI) for the prediction and recommendation of diseases. We identify the factors that directly impact diseases and propose an approach for predicting the correct diagnosis of different diseases. We constructed a series of models with good reliability for predicting different surgery complications and identified several novel clinical associations. RESULTS We compared the performance of our algorithm with other machine learning algorithms and found that it performed better, with predictive accuracy improving by 3.61%. CONCLUSIONS The potential to directly integrate these predictive tools into EHRs may enable personalized medicine and decision-making at the point of care, for patient counseling, and as a teaching tool. CLINICALTRIAL Dataset for the trials of patients attached.
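The authors' pr-KNN variant is not spelled out in the abstract, so the following is only a plain-KNN baseline sketch on synthetic patient features (scikit-learn; all data, labels, and parameters are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Hypothetical patient features and binary diagnosis labels.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Distance-weighted KNN as the prediction component.
knn = KNeighborsClassifier(n_neighbors=7, weights="distance")
knn.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, knn.predict(X_test)))

# For recommendation, the same neighbor structure can surface the k most
# similar past patients (and, by extension, their outcomes or treatments).
distances, indices = knn.kneighbors(X_test[:1], n_neighbors=5)
print("nearest-patient indices:", indices[0])
```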


Author(s):  
Anik Das ◽  
Mohamed M. Ahmed

Accurate lane-change prediction information in real time is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AV deployment, when AVs will interact with human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics, using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB), based on six different sets of features. In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%) when considering all features. Notably, an even higher overall prediction accuracy of 97.3% and F1-score of 95.9% were observed for the XGBoost model based on vehicle kinematics features alone. Moreover, XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set with practical implementation in mind. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.
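A sketch of the Boruta-then-XGBoost pipeline, assuming the third-party boruta package (BorutaPy) and the xgboost library are installed; the synthetic features, labels, and hyper-parameters are placeholders, not the study's SHRP2 setup:

```python
import numpy as np
from boruta import BorutaPy
from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical trajectory-level features; labels: 1 = lane change, 0 = lane keep.
rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 30))
y = (X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=5000) > 0).astype(int)

# Wrapper-based feature selection with Boruta, driven here by a random
# forest as the importance source (a common BorutaPy configuration).
selector = BorutaPy(RandomForestClassifier(n_jobs=-1), n_estimators="auto", random_state=0)
selector.fit(X, y)
X_sel = X[:, selector.support_]

X_train, X_test, y_train, y_test = train_test_split(X_sel, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "F1:", f1_score(y_test, pred))
```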


2020 ◽  
pp. 1-12
Author(s):  
Cao Yanli

Research on the risk pricing of Internet finance online loans not only enriches the theory and methods of online loan pricing, but also helps to improve the level of online loan risk pricing. In order to improve the efficiency of Internet financial supervision, this article builds an Internet financial supervision system based on machine learning algorithms and improved neural network algorithms. Moreover, after factor analysis and discretization of the loan data, this paper selects the relatively mature logistic regression model to evaluate the credit risk of the borrower, taking into account the comprehensive management of credit risk and its matching with income. In addition, following the provisions of the New Basel Accord on expected losses and economic capital, this article combines the credit risk assessment results with relevant factors obtained through regional research to conduct an empirical analysis. The research results show that the model constructed in this paper is reasonably reliable.
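A minimal sketch of the discretize-then-logistic-regression step, with synthetic borrower data standing in for the loan records; the binning scheme, feature meanings, and default labels are assumptions:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical borrower features (income, loan amount, history length, ...).
rng = np.random.default_rng(4)
X = rng.normal(size=(3000, 8))
y = (0.9 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(size=3000) > 0).astype(int)  # 1 = default

# Discretize continuous variables into quantile bins, then score with
# logistic regression, mirroring the discretize-then-Logistic pipeline.
pipeline = Pipeline([
    ("bins", KBinsDiscretizer(n_bins=5, encode="onehot", strategy="quantile")),
    ("logit", LogisticRegression(max_iter=1000)),
])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
pipeline.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, pipeline.predict_proba(X_test)[:, 1]))
```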


2019 ◽  
Author(s):  
Lei Zhang ◽  
Xianwen Shang ◽  
Subhashaan Sreedharan ◽  
Xixi Yan ◽  
Jianbin Liu ◽  
...  

BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of health data available and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms, based on a large retrospective population cohort of over 230,000 people enrolled in the study during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years based on three machine-learning approaches and the conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), which was significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women at 6.20% (6.00%-6.40%). The incidence of T2DM was doubled in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) compared with that of nonobese individuals. The gradient boosting machine model showed the best performance among the four models (area under the curve of 79% in 3-year prediction and 75% in 10-year prediction). All machine-learning models identified BMI as the most significant factor contributing to diabetes onset, which explained 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could be hypothetically reduced to a healthy range, the 10-year probability of diabetes onset would be significantly reduced from 8.3% to 2.8% (P < .001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
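A toy sketch of the gradient-boosting setup on synthetic survey-style data with a BMI-dominated risk and roughly the study's ~6% incidence; nothing here reproduces the 45 and Up Study variables:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical survey features; BMI in column 0 so its dominant role
# in the abstract can be mimicked. Labels: T2DM onset within the horizon.
rng = np.random.default_rng(5)
n = 20000
bmi = rng.normal(27, 5, size=n)
other = rng.normal(size=(n, 9))
X = np.column_stack([bmi, other])
risk = 0.25 * (bmi - 25) + other[:, 0] + rng.normal(scale=2.0, size=n)
y = (risk > np.quantile(risk, 0.94)).astype(int)  # ~6% incidence, as in the study

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
gbm = GradientBoostingClassifier(random_state=0)
gbm.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, gbm.predict_proba(X_test)[:, 1]))
print("BMI importance:", gbm.feature_importances_[0])
```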


2021 ◽  
Vol 9 ◽  
Author(s):  
Huanhuan Zhao ◽  
Xiaoyu Zhang ◽  
Yang Xu ◽  
Lisheng Gao ◽  
Zuchang Ma ◽  
...  

Hypertension is a widespread chronic disease. Risk prediction of hypertension is an intervention that contributes to the early prevention and management of hypertension. Implementing such an intervention requires an effective and easy-to-implement hypertension risk prediction model. This study evaluated and compared the performance of four machine learning algorithms in predicting the risk of hypertension based on easy-to-collect risk factors. A dataset of 29,700 samples collected through physical examinations was used for model training and testing. First, we identified easy-to-collect risk factors of hypertension through univariate logistic regression analysis. Then, based on the selected features, 10-fold cross-validation was used to optimize four models, random forest (RF), CatBoost, MLP neural network, and logistic regression (LR), and to find the best hyper-parameters on the training set. Finally, the performance of the models was evaluated by AUC, accuracy, sensitivity, and specificity on the test set. The experimental results showed that the RF model outperformed the other three models, achieving an AUC of 0.92, an accuracy of 0.82, a sensitivity of 0.83, and a specificity of 0.81. In addition, Body Mass Index (BMI), age, family history, and waist circumference (WC) are the four primary risk factors for hypertension. These findings show that it is feasible to use machine learning algorithms, especially RF, to predict hypertension risk without clinical or genetic data. The technique can provide a non-invasive and economical way to prevent and manage hypertension in a large population.
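A sketch of the tuning-and-evaluation protocol (10-fold cross-validated grid search on the training set, then AUC/sensitivity/specificity on a held-out test set), shown here for the RF model on synthetic stand-in data with an assumed hyper-parameter grid:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical easy-to-collect features (BMI, age, family history, WC, ...).
rng = np.random.default_rng(6)
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + 0.7 * X[:, 1] + rng.normal(size=5000) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 10-fold cross-validated grid search on the training set.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [200, 500], "max_depth": [5, 10, None]},
    cv=10, scoring="roc_auc",
)
grid.fit(X_train, y_train)

pred = grid.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print("AUC:", roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1]))
print("sensitivity:", tp / (tp + fn), "specificity:", tn / (tn + fp))
```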


2021 ◽  
Author(s):  
Kate Bentley ◽  
Kelly Zuromski ◽  
Rebecca Fortgang ◽  
Emily Madsen ◽  
Daniel Kessler ◽  
...  

Background: Interest in developing machine learning algorithms that use electronic health record data to predict patients’ risk of suicidal behavior has recently proliferated. Whether and how such models might be implemented and useful in clinical practice, however, remains unknown. In order to ultimately make automated suicide risk prediction algorithms useful in practice, and thus better prevent patient suicides, it is critical to partner with key stakeholders (including the frontline providers who will be using such tools) at each stage of the implementation process. Objective: The aim of this focus group study was to inform ongoing and future efforts to deploy suicide risk prediction models in clinical practice. The specific goals were to better understand hospital providers’ current practices for assessing and managing suicide risk; determine providers’ perspectives on using automated suicide risk prediction algorithms; and identify barriers, facilitators, recommendations, and factors to consider for initiatives in this area. Methods: We conducted 10 two-hour focus groups with a total of 40 providers from psychiatry, internal medicine and primary care, emergency medicine, and obstetrics and gynecology departments within an urban academic medical center. Audio recordings of open-ended group discussions were transcribed and coded for relevant and recurrent themes by two independent study staff members. All coded text was reviewed and discrepancies resolved in consensus meetings with doctoral-level staff. Results: Though most providers reported using standardized suicide risk assessment tools in their clinical practices, existing tools were commonly described as unhelpful and providers indicated dissatisfaction with current suicide risk assessment methods. Overall, providers’ general attitudes toward the practical use of automated suicide risk prediction models and corresponding clinical decision support tools were positive. Providers were especially interested in the potential to identify high-risk patients who might be missed by traditional screening methods. Some expressed skepticism about the potential usefulness of these models in routine care; specific barriers included concerns about liability, alert fatigue, and increased demand on the healthcare system. Key facilitators included presenting specific patient-level features contributing to risk scores, emphasizing changes in risk over time, and developing systematic clinical workflows and provider trainings. Participants also recommended considering risk-prediction windows, timing of alerts, who will have access to model predictions, and variability across treatment settings. Conclusions: Providers were dissatisfied with current suicide risk assessment methods and open to the use of a machine learning-based risk prediction system to inform clinical decision-making. They also raised multiple concerns about potential barriers to the usefulness of this approach and suggested several possible facilitators. Future efforts in this area will benefit from incorporating systematic qualitative feedback from providers, patients, administrators, and payers on the use of new methods in routine care, especially given the complex, sensitive, and unfortunately still stigmatized nature of suicide risk.


2020 ◽  
Author(s):  
Xueyan Li ◽  
Genshan Ma ◽  
Xiaobo Qian ◽  
Yamou Wu ◽  
Xiaochen Huang ◽  
...  

Background: We aimed to assess the performance of machine learning algorithms for predicting the risk of postoperative ileus (POI) in patients who underwent laparoscopic colorectal surgery for malignant lesions. Methods: We conducted a retrospective observational study of a total of 637 patients at Suzhou Hospital of Nanjing Medical University. Four machine learning algorithms (logistic regression, decision tree, random forest, gradient boosting decision tree) were considered for predicting the risk of POI. The cases were randomly divided into training and testing data sets at a ratio of 8:2. The performance of each model was evaluated by area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score. Results: The incidence of POI in this study was 19.15% (122/637). The gradient boosting decision tree reached the highest AUC (0.76) and was the best model for POI risk prediction. In addition, the importance matrix of the gradient boosting decision tree showed that the five most important variables were time to first passage of flatus, opioids during POD3, duration of surgery, height, and weight. Conclusions: The gradient boosting decision tree was the optimal model to predict the risk of POI in patients who underwent laparoscopic colorectal surgery for malignant lesions. The results of our study could be useful for clinical guidelines on POI risk prediction.
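A sketch of the described protocol (8:2 split, gradient boosting decision tree, AUC/precision/recall/F1, feature importance) on synthetic stand-ins for the 637-patient cohort; feature meanings and the importance ranking are illustrative only:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

# Hypothetical perioperative features; labels: 1 = postoperative ileus (POI).
rng = np.random.default_rng(7)
X = rng.normal(size=(637, 12))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=637) > 1.5).astype(int)

# 8:2 train/test split, matching the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
gbdt = GradientBoostingClassifier(random_state=0)
gbdt.fit(X_train, y_train)

pred = gbdt.predict(X_test)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, pred, average="binary", zero_division=0
)
print("AUC:", roc_auc_score(y_test, gbdt.predict_proba(X_test)[:, 1]))
print("precision:", prec, "recall:", rec, "F1:", f1)
print("top-5 feature indices:", np.argsort(gbdt.feature_importances_)[::-1][:5])
```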


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4368 ◽  
Author(s):  
Chun-Wei Chen ◽  
Chun-Chang Li ◽  
Chen-Yu Lin

An energy baseline is an important method for measuring the energy-saving benefits of a chiller system; the benefits can be calculated by comparing prediction models and actual results. Currently, machine learning is often adopted to build prediction models for energy baselines. Common models include regression, ensemble learning, and deep learning models. In this study, we first reviewed several machine learning algorithms that were used to establish prediction models. Then, clustering was adopted to preprocess the chiller data. Data mining, K-means clustering, and the gap statistic were used to identify the critical variables for clustering chiller operating modes. Applying these key variables effectively enhanced the quality of the chiller data, and combining the clustering results with the machine learning model effectively improved the prediction accuracy of the model and the reliability of the energy baselines.
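A compact sketch of using the gap statistic to pick the number of K-means clusters, in the spirit of Tibshirani et al.; the chiller variables are simulated here with two synthetic operating modes, and this is not the authors' pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def gap_statistic(X, k_max=8, n_refs=10, seed=0):
    """Gap statistic: compare log within-cluster dispersion on the data
    against uniform reference data; a larger gap indicates a better k."""
    rng = np.random.default_rng(seed)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        log_wk = np.log(KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_)
        ref_log_wks = [
            np.log(KMeans(n_clusters=k, n_init=10, random_state=seed)
                   .fit(rng.uniform(mins, maxs, size=X.shape)).inertia_)
            for _ in range(n_refs)
        ]
        gaps.append(np.mean(ref_log_wks) - log_wk)
    return np.array(gaps)

# Hypothetical chiller data with two operating modes.
rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (300, 4)), rng.normal(4, 1, (300, 4))])
gaps = gap_statistic(X)
print("best k:", int(np.argmax(gaps)) + 1)  # expect k = 2
```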

