The Application of Tree-based model to Unbalanced German Credit Data Analysis

2018 ◽  
Vol 232 ◽  
pp. 01005
Author(s):  
Zhengye Chen

With the development of financial consumption, demand for credit has soared. Since banks hold detailed client data, it is important to build effective models that distinguish high-risk from low-risk groups. However, traditional credit evaluation methods, including expert opinion, credit rating, and credit scoring, are subjective and inaccurate. Moreover, the data are highly unbalanced, since high-risk clients are far less numerous than low-risk clients. Progress in machine learning makes accurate credit analysis possible, and tree-based models are particularly well suited to unbalanced credit data because they can weight individual credit records to offset the class imbalance. We apply a series of tree-based machine learning models to the German Credit Data from the UCI Repository of Machine Learning Databases.
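As a concrete illustration of the weighting idea, here is a minimal sketch fitting a class-weighted random forest to the same German Credit Data via its OpenML mirror ("credit-g"); the preprocessing and model settings are our assumptions, since the abstract does not specify them.

```python
# A minimal sketch, assuming the OpenML mirror ("credit-g") of the UCI
# German Credit Data; preprocessing and model settings are illustrative.
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data = fetch_openml("credit-g", version=1, as_frame=True)
X = pd.get_dummies(data.data)            # one-hot encode categorical features
y = (data.target == "bad").astype(int)   # 1 = high-risk; roughly 30% of samples

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" reweights each individual inversely to its class
# frequency -- the weighting idea the abstract describes for unbalanced data
clf = RandomForestClassifier(n_estimators=500, class_weight="balanced",
                             random_state=0)
clf.fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```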

2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 330-330
Author(s):  
Teja Ganta ◽  
Stephanie Lehrman ◽  
Rachel Pappalardo ◽  
Madalene Crow ◽  
Meagan Will ◽  
...  

Background: Machine learning models are well positioned to transform cancer care delivery by providing oncologists with more accurate or accessible information to augment clinical decisions. Many machine learning projects, however, focus on model accuracy without considering the impact of using the model in real-world settings, and rarely carry forward to clinical implementation. We present a human-centered systems engineering approach to addressing clinical problems with workflow interventions that utilize machine learning algorithms. Methods: We aimed to develop a mortality prediction tool, using a Random Forest algorithm, to identify oncology patients at high risk of death within 30 days and move advance care planning (ACP) discussions earlier in the illness trajectory. First, a project sponsor defined the clinical need and the requirements of an intervention. Data scientists developed the predictive algorithm using data available in the electronic health record (EHR). A multidisciplinary workgroup was assembled, including oncology physicians, advanced practice providers, nurses, social workers, a chaplain, clinical informaticists, and data scientists. Meeting bi-monthly, the group utilized human-centered design (HCD) methods to understand clinical workflows and identify points of intervention. The workgroup completed a workflow redesign workshop, a 90-minute facilitated group discussion, to integrate the model into a future-state workflow. An EHR (Epic) analyst built the user interface to support the intervention per the group's requirements. The workflow was piloted in thoracic oncology and bone marrow transplant, with plans to scale to other cancer clinics. Results: Our predictive model's performance on test data was acceptable (sensitivity 75%, specificity 75%, F1 score 0.71, AUC 0.82). The workgroup identified a "quality of life coordinator" who: reviews an EHR report of patients scheduled in the upcoming 7 days who have a high risk of 30-day mortality; works with the oncology team to determine the clinical appropriateness of ACP; documents the need for ACP; identifies potential referrals to supportive oncology, social work, or chaplaincy; and coordinates the oncology appointment. The oncologist receives a reminder on the day of the patient's scheduled visit. Conclusions: This workgroup process is a viable approach that can be replicated at other institutions to address clinical needs and realize the full potential of machine learning models in healthcare. The next steps for this project are to address end-user feedback from the pilot, expand the intervention to other cancer disease groups, and track clinical metrics.
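To make the reported evaluation concrete, here is a minimal sketch of a Random Forest mortality classifier scored with the abstract's metrics (sensitivity, specificity, F1, AUC); the features and data are synthetic placeholders, not the paper's EHR inputs.

```python
# A minimal sketch: Random Forest on stand-in "EHR" features, evaluated
# with the metrics quoted in the abstract. Data are synthetic assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                  # placeholder for EHR features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("F1:", f1_score(y_te, pred))
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```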


2021 ◽  
Vol 12 (02) ◽  
pp. 372-382
Author(s):  
Christine Xia Wu ◽  
Ernest Suresh ◽  
Francis Wei Loong Phng ◽  
Kai Pik Tai ◽  
Janthorn Pakdeethai ◽  
...  

Abstract Objective To develop a risk score for the real-time prediction of readmissions, using patient-specific information captured in electronic medical records (EMR) in Singapore, to enable the prospective identification of high-risk patients for enrolment in timely interventions. Methods Machine-learning models were built to estimate the probability of a patient being readmitted within 30 days of discharge. EMR data of 25,472 patients discharged from the medicine department at Ng Teng Fong General Hospital between January 2016 and December 2016 were extracted retrospectively for training and internal validation of the models. We developed and implemented real-time 30-day readmission risk score generation in the EMR system, which enabled the flagging of high-risk patients to care providers in the hospital. Based on the daily high-risk patient list, the various interfaces and flowsheets in the EMR were configured according to the information needs of the various stakeholders, such as the inpatient medical, nursing, case management, emergency department, and postdischarge care teams. Results Overall, the machine-learning models achieved good performance, with areas under the receiver operating characteristic curve ranging from 0.77 to 0.81. The models were used to proactively identify and attend to patients at risk of readmission before an actual readmission occurred. This approach successfully reduced the 30-day readmission rate for patients admitted to the medicine department from 11.7% in 2017 to 10.1% in 2019 (p < 0.01) after risk adjustment. Conclusion Machine-learning models can be deployed in the EMR system to provide real-time forecasts for a more comprehensive outlook on decision-making and care provision.
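A minimal sketch of the daily flagging step described above follows: score each patient with a trained model and surface those above a risk cut-off to care teams. The feature names, the logistic model, and the 0.3 threshold are illustrative assumptions; the abstract does not state the cut-off used.

```python
# A minimal sketch, not the paper's pipeline: the feature set, logistic
# model, and 0.3 cut-off are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
features = ["prior_admissions", "length_of_stay", "num_medications"]
train = pd.DataFrame(rng.poisson([2, 5, 8], size=(2000, 3)), columns=features)
labels = ((train["prior_admissions"] + rng.normal(size=2000)) > 3).astype(int)

model = LogisticRegression().fit(train, labels)

# Daily scoring step: flag today's inpatients whose predicted 30-day
# readmission probability exceeds the cut-off, for review by care teams.
today = pd.DataFrame(rng.poisson([2, 5, 8], size=(30, 3)), columns=features)
today["readmission_risk"] = model.predict_proba(today[features])[:, 1]
high_risk = today[today["readmission_risk"] >= 0.3]
print(high_risk.sort_values("readmission_risk", ascending=False))
```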


2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Madapuri Rudra Kumar ◽  
Vinit Kumar Gunjan

Introduction: Increases in computing power and the deeper use of robust computing systems in the financial sector are propelling business growth, improving the operational efficiency of financial institutions, and increasing the effectiveness of transaction-processing solutions. Problem: Although financial institutions rely on credit scoring to analyze the creditworthiness of clients, many aspects of the credit score evaluation process still need improvement. Objective: Machine learning offers immense potential in the fintech space for determining personal credit scores, and by applying deep learning and machine learning techniques, organizations can reach individuals who are not served by traditional financial institutions. Methodology: A key insight is that traditional banking intelligence solutions are predominantly programmed models aligned with the information and banking systems the banks already use, whereas machine-learning models that rely on algorithmic systems require more intensive, intrinsic computation. Results: Test analysis of the proposed machine learning model indicates a more effective and refined analysis process than non-machine-learning solutions, and the comparison of various classifiers points to ways in which the solution can be extended. Conclusion: If such systems are developed along more pragmatic lines of analysis, they can improve customer profile analysis, with process models built for comprehensive analysis and a sustainable solution for credit system management. Originality: The proposed solution is conceptualized to improve existing credit scoring patterns. Limitations: The model was tested in isolation and not compared with any existing credit scoring system.
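Because the abstract reports only that "various classifiers" were compared, the following is a hypothetical sketch of such a comparison on synthetic, imbalanced credit-style data; the actual features and model list are not given in the abstract.

```python
# A hypothetical sketch of a classifier comparison for credit scoring;
# the data and model choices are assumptions, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic, imbalanced credit-style data (20% "bad" clients)
X, y = make_classification(n_samples=3000, n_features=15,
                           weights=[0.8, 0.2], random_state=0)

for name, clf in [("logistic", LogisticRegression(max_iter=1000)),
                  ("random forest", RandomForestClassifier(random_state=0)),
                  ("gradient boosting", GradientBoostingClassifier(random_state=0))]:
    auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5).mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```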


2018 ◽  
Author(s):  
Jaram Park ◽  
Jeong-Whun Kim ◽  
Borim Ryu ◽  
Eunyoung Heo ◽  
Se Young Jung ◽  
...  

BACKGROUND Prevention and management of chronic diseases are the main goals of national health maintenance programs. Previously widespread screening tools, such as the Health Risk Appraisal, are limited in achieving this goal by their static characteristics, accessibility, and generalizability. Hypertension is one of the most important chronic diseases requiring management through the nationwide health maintenance program, and health care providers should inform patients about their risk of complications caused by hypertension. OBJECTIVE Our goal was to develop and compare machine learning models that predict high-risk vascular diseases in hypertensive patients so that they can manage their blood pressure based on their risk level. METHODS We used a 12-year longitudinal dataset of the nationwide sample cohort, which contains the data of 514,866 patients and allows tracking of patients' medical history across all health care providers in Korea (N=51,920). To ensure the generalizability of our models, we conducted an external validation using another national sample cohort dataset, comprising one million different patients, published by the National Health Insurance Service. From these datasets, we obtained the data of 74,535 and 59,738 patients with essential hypertension, respectively, and developed machine learning models for predicting cardiovascular and cerebrovascular events. Six machine learning models were developed, and their performance was compared using validation metrics. RESULTS Machine learning algorithms enabled us to detect high-risk patients based on their medical history. The long short-term memory (LSTM)-based algorithm performed best on the internal test (F1-score=.772; external test F1-score=.613), while the random forest-based algorithm generalized better (internal test F1-score=.757; external test F1-score=.705). On the internal test, the LSTM-based algorithm outperformed the others regardless of the number of features; on the external test, the random forest-based algorithm was best, irrespective of the number of features. CONCLUSIONS We developed and compared machine learning models that predict high-risk vascular diseases in hypertensive patients so that they may manage their blood pressure based on their risk level. Using such a prediction model, a government can identify high-risk patients at the nationwide level and establish health care policies in advance.
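As an illustration of the sequence-model approach, here is a minimal Keras sketch of an LSTM over longitudinal visit features predicting a vascular event. The shapes, hyperparameters, and random data are assumptions; the paper's architecture is not described in the abstract.

```python
# A minimal sketch: an LSTM over 12 years of per-patient visit features
# predicting a binary vascular event. All data and settings are stand-ins.
import numpy as np
import tensorflow as tf

n_patients, n_years, n_features = 1000, 12, 50   # 12-year longitudinal records
X = np.random.rand(n_patients, n_years, n_features).astype("float32")
y = np.random.randint(0, 2, size=n_patients)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_years, n_features)),
    tf.keras.layers.LSTM(64),                    # summarizes the visit history
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
model.fit(X, y, epochs=3, batch_size=64, validation_split=0.2)
```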


2020 ◽  
Author(s):  
Kaixiang Sheng ◽  
Ping Zhang ◽  
Xi Yao ◽  
Jiawei Li ◽  
Yongchun He ◽  
...  

BACKGROUND The first-year survival rate among patients undergoing hemodialysis remains poor. Current mortality risk scores for patients undergoing hemodialysis employ regression techniques and have limited applicability and robustness. OBJECTIVE We aimed to develop a machine learning model utilizing clinical factors to predict first-year mortality in patients undergoing hemodialysis that could assist physicians in classifying high-risk patients. METHODS Training and testing cohorts consisted of 5351 patients from a single center and 5828 patients from 97 renal centers undergoing hemodialysis (incident only). The outcome was all-cause mortality during the first year of dialysis. Extreme gradient boosting was used for algorithm training and validation. Two models were established based on the data obtained at dialysis initiation (model 1) and data 0-3 months after dialysis initiation (model 2), and 10-fold cross-validation was applied to each model. The area under the curve (AUC), sensitivity (recall), specificity, precision, balanced accuracy, and F1 score were used to assess the predictive ability of the models. RESULTS In the training and testing cohorts, 585 (10.93%) and 764 (13.11%) patients, respectively, died during the first-year follow-up. Of 42 candidate features, the 15 most important features were selected. The performance of model 1 (AUC 0.83, 95% CI 0.78-0.84) was similar to that of model 2 (AUC 0.85, 95% CI 0.81-0.86). CONCLUSIONS We developed and validated 2 machine learning models to predict first-year mortality in patients undergoing hemodialysis. Both models could be used to stratify high-risk patients at the early stages of dialysis.
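For orientation, here is a minimal sketch of an extreme-gradient-boosting setup with 10-fold cross-validation, using the xgboost library on synthetic data with roughly the reported 11-13% event rate; the features and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# A minimal sketch of extreme gradient boosting with 10-fold
# cross-validation; data, prevalence, and hyperparameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# ~11% positive rate loosely mimics the reported first-year mortality
X, y = make_classification(n_samples=5000, n_features=15,
                           weights=[0.89, 0.11], random_state=0)

clf = XGBClassifier(n_estimators=300, max_depth=4)
aucs = cross_val_score(clf, X, y, scoring="roc_auc", cv=10)
print(f"10-fold AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")
```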


2020 ◽  
Author(s):  
Mahdieh Montazeri ◽  
Roxana ZahediNasab ◽  
Ali Farahani ◽  
Hadis Mohseni ◽  
Fahimeh Ghasemian

BACKGROUND Accurate and timely diagnosis and effective prognosis of COVID-19 are important for providing the best possible care to patients and reducing the burden on the health care system. Machine learning methods can play a vital role in diagnosing COVID-19 by processing chest x-ray images. OBJECTIVE The aim of this study was to summarize information on the use of intelligent models for the diagnosis and prognosis of COVID-19 to help with early and timely diagnosis, minimize prolonged diagnosis, and improve overall health care. METHODS A systematic search of databases, including PubMed, Web of Science, IEEE, ProQuest, Scopus, bioRxiv, and medRxiv, was performed for COVID-19–related studies published up to May 24, 2020. This study was performed in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. All original research articles describing the application of image processing to the prediction and diagnosis of COVID-19 were considered. Two reviewers independently assessed the published papers for eligibility. Risk of bias was evaluated using the Prediction Model Risk of Bias Assessment Tool. RESULTS Of the 629 articles retrieved, 44 were included. We identified 4 prognosis models for predicting disease severity and estimating confinement time for individual patients, and 40 diagnostic models for distinguishing COVID-19 from normal cases or other pneumonias. Most included studies used deep learning methods based on convolutional neural networks, which have been widely used as classification algorithms. The most frequently reported predictors of prognosis in patients with COVID-19 included age, computed tomography data, gender, comorbidities, symptoms, and laboratory findings. Deep convolutional neural networks obtained better results than non–neural network–based methods. Moreover, all of the models were found to be at high risk of bias owing to a lack of information about the study population and intended groups, and inappropriate reporting. CONCLUSIONS Machine learning models used for the diagnosis and prognosis of COVID-19 showed excellent discriminative performance. However, these models were at high risk of bias for various reasons, such as inadequate information about study participants and the randomization process, and a lack of external validation, which may have led to optimistic reporting. Hence, our findings do not support the use of any of the current models in practice for the diagnosis and prognosis of COVID-19.


2020 ◽  
Author(s):  
Guangyao Wu ◽  
Pei Yang ◽  
Henry C. Woodruff ◽  
Xiangang Rao ◽  
Julien Guiot ◽  
...  

Key Points: Question: How do nomograms and machine-learning algorithms for severity risk prediction and triage of COVID-19 patients perform at hospital admission? Findings: The model was prospectively validated on six test datasets comprising 426 patients and yielded AUCs ranging from 0.816 to 0.976, accuracies ranging from 70.8% to 93.8%, sensitivities ranging from 83.7% to 100%, and specificities ranging from 41.0% to 95.7%. The cut-off probability values for the low-, medium-, and high-risk groups were 0.072 and 0.244. Meaning: The findings suggest that the models perform well in diagnosing and predicting progression to severe or critical illness in COVID-19 patients and could be used for triage at hospital admission. IMPORTANCE The outbreak of coronavirus disease 2019 (COVID-19) has globally strained medical resources and caused significant mortality among severely and critically ill patients. However, the availability of validated nomograms and machine-learning models to predict severity risk and triage affected patients is limited. OBJECTIVE To develop and validate nomograms and machine-learning models for severity risk assessment and triage of COVID-19 patients at hospital admission. DESIGN, SETTING, AND PARTICIPANTS A retrospective cohort of 299 consecutively hospitalized COVID-19 patients at The Central Hospital of Wuhan, China, from December 23, 2019, to February 13, 2020, was used to train and validate the models. Six cohorts with 426 patients from eight centers in China, Italy, and Belgium, from February 20, 2020, to March 21, 2020, were used to prospectively validate the models. MAIN OUTCOMES AND MEASURES The main outcome was the onset of severe or critical illness during hospitalization. Model performance was quantified using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity. RESULTS Of the 299 hospitalized COVID-19 patients in the retrospective cohort, the median age was 50 years (interquartile range, 35.5-63.0; range, 20-94 years) and 137 (45.8%) were men. Of the 426 hospitalized COVID-19 patients in the prospective cohorts, the median age was 62.0 years (interquartile range, 50.0-72.0; range, 19-94 years) and 236 (55.4%) were men. The model was prospectively validated on six cohorts, yielding AUCs ranging from 0.816 to 0.976, accuracies ranging from 70.8% to 93.8%, sensitivities ranging from 83.7% to 100%, and specificities ranging from 41.0% to 95.7%. The cut-off values for the low-, medium-, and high-risk probabilities were 0.072 and 0.244. The developed online calculators can be found at https://covid19risk.ai/. CONCLUSIONS AND RELEVANCE The machine-learning models, nomograms, and online calculators might be useful for predicting the onset of severe or critical illness in COVID-19 patients and for triage at hospital admission. Further prospective research and clinical feedback are necessary to evaluate their clinical usefulness and to determine whether they can help optimize medical resources and reduce mortality rates compared with current clinical practice.
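The reported cut-off probabilities imply a simple triage rule; the sketch below (the function name is ours) maps a predicted severity probability to the low-, medium-, and high-risk groups using the 0.072 and 0.244 thresholds from the abstract.

```python
# A minimal sketch of the triage step implied by the reported cut-offs;
# the function name and example probabilities are ours.
def triage(prob: float) -> str:
    """Assign a COVID-19 patient to a risk group at hospital admission."""
    if prob < 0.072:
        return "low risk"
    elif prob < 0.244:
        return "medium risk"
    return "high risk"

for p in (0.05, 0.15, 0.60):
    print(f"predicted severity probability {p:.2f} -> {triage(p)}")
```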


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Ayten Kayi Cangir ◽  
Kaan Orhan ◽  
Yusuf Kahya ◽  
Hilal Özakıncı ◽  
Betül Bahar Kazak ◽  
...  

Abstract Introduction Radiomics methods are used to analyze various medical images, including computed tomography (CT), magnetic resonance, and positron emission tomography images, to provide information regarding the diagnosis, patient outcome, tumor phenotype, and gene-protein signatures of various diseases. In low-risk thymoma, complete surgical resection is typically sufficient, whereas high-risk thymoma usually requires adjuvant therapy; it is therefore important to distinguish between the two. This study evaluated the CT radiomics features of thymomas to discriminate between low- and high-risk thymoma groups. Materials and methods In total, 83 patients with thymoma treated between 2004 and 2019 were included in this study. We used the Radcloud platform (Huiying Medical Technology Co., Ltd.) to manage the imaging and clinical data and perform the radiomics statistical analysis. The training and validation datasets were separated randomly with a ratio of 2:8 (random seed 502). The histopathological diagnosis was taken from the pathology report. Results Four radiomics features were identified that differentiate the low-risk thymoma group from the high-risk thymoma group: Energy, Zone Entropy, Long Run Low Gray Level Emphasis, and Large Dependence Low Gray Level Emphasis. Conclusions The results demonstrated that a machine-learning model with a multilayer perceptron classifier can be applied to CT images to predict low- and high-risk thymomas. This combination could be a useful preoperative method for determining the surgical approach for thymoma.
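As a rough illustration of the final classification step, here is a sketch of a multilayer perceptron trained on four radiomics features, echoing the abstract's 2:8 split (read here as an 80/20 train/validation division) and seed 502. The data and architecture are stand-ins, not the authors' Radcloud pipeline.

```python
# A minimal sketch: an MLP on four stand-in radiomics features for 83
# patients; the data, architecture, and split interpretation are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(502)                 # seed echoes the abstract
X = rng.normal(size=(83, 4))                     # 4 radiomics features, 83 patients
y = (X[:, 0] + rng.normal(scale=0.5, size=83) > 0).astype(int)

# 80/20 train/validation split, our reading of the abstract's 2:8 ratio
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=502)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                  random_state=0))
clf.fit(X_tr, y_tr)
print("validation accuracy:", clf.score(X_va, y_va))
```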

