A Study of Dimensionality Reduction Techniques with Machine Learning Methods for Credit Risk Prediction

Author(s): E. Sivasankar, C. Selvi, C. Mala
2021, Vol 15
Author(s): Jinyu Zang, Yuanyuan Huang, Lingyin Kong, Bingye Lei, Pengfei Ke, ...

Recently, machine learning techniques have been widely applied in discriminative studies of schizophrenia (SZ) patients with multimodal magnetic resonance imaging (MRI); however, the effects of brain atlases and machine learning methods remain largely unknown. In this study, we collected MRI data for 61 first-episode SZ patients (FESZ), 79 chronic SZ patients (CSZ) and 205 normal controls (NC) and calculated four MRI measurements: regional gray matter volume (GMV), regional homogeneity (ReHo), amplitude of low-frequency fluctuation and degree centrality. We systematically analyzed the performance of two classification tasks (SZ vs NC; FESZ vs CSZ) across combinations of three brain atlases, five classifiers, two cross-validation methods and three dimensionality reduction algorithms. Our results showed that the groupwise whole-brain atlas with 268 ROIs outperformed the other two brain atlases. In addition, leave-one-out cross-validation was the best cross-validation method for selecting the best hyperparameter set, whereas classification performance was quite similar across classifiers and dimensionality reduction algorithms. Importantly, the GMV and ReHo features of brain regions in the prefrontal and temporal gyri contributed most to both classification tasks. Furthermore, an ensemble learning method was used to build an integrated model, which improved classification performance. Taken together, these findings indicate how these factors affect the construction of effective classifiers for psychiatric diseases and show that the integrated model has the potential to improve the clinical diagnosis and treatment evaluation of SZ.
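The leave-one-out cross-validation the abstract highlights can be sketched in a few lines: each subject is held out once, the model is fit on the rest, and accuracy is averaged over the held-out predictions. The toy feature vectors and the 1-nearest-neighbour classifier below are illustrative assumptions, not the study's actual pipeline.

```python
from math import dist

# Toy feature vectors (e.g. two imaging measures per subject) and labels
# (0 = control, 1 = patient). All values are made up for illustration.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
y = [0, 0, 0, 1, 1, 1]

def predict_1nn(train_X, train_y, query):
    """Classify `query` by the label of its nearest training point."""
    nearest = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[nearest]

def loocv_accuracy(X, y):
    """Leave-one-out CV: hold out each sample once, train on the rest."""
    correct = 0
    for held_out in range(len(X)):
        train_X = [x for i, x in enumerate(X) if i != held_out]
        train_y = [t for i, t in enumerate(y) if i != held_out]
        correct += predict_1nn(train_X, train_y, X[held_out]) == y[held_out]
    return correct / len(X)

print(loocv_accuracy(X, y))  # well-separated toy clusters -> 1.0
```

With only a few hundred subjects, as here, the cost of refitting once per sample is modest, which is why leave-one-out is a practical choice at this scale.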


2019, Vol 22 (03), pp. 1950021
Author(s): Huei-Wen Teng, Michael Lee

Machine learning has successful applications in financial technology, including credit risk management, portfolio management, automatic trading, and fraud detection. Reformulating and solving these problems adequately and accurately is problem specific and challenging, given the complexity and volume of the available data. In credit risk management, one major problem is predicting the default of credit card holders from real data. We review five machine learning methods: k-nearest neighbors, decision trees, boosting, support vector machines, and neural networks, and apply them to this problem. In addition, we give explicit Python scripts for the analysis, using a dataset of 29,999 instances with 23 features collected from a major bank in Taiwan and downloadable from the UC Irvine Machine Learning Repository. We show that the decision tree performs best in terms of validation curves.
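Of the five methods reviewed, k-nearest neighbors is the simplest to state: a new case is assigned the majority label among its k closest training cases. A minimal sketch follows; the two-feature toy records and their default labels are invented for illustration and are not drawn from the Taiwan dataset.

```python
from math import dist
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical records: (normalized credit limit, normalized payment delay)
# with label 0 = repaid, 1 = defaulted. All values are made up.
train_X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.2, 0.9), (0.1, 0.8), (0.3, 0.7)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.85, 0.15)))  # near the repaid cluster -> 0
print(knn_predict(train_X, train_y, (0.20, 0.80)))  # near the default cluster -> 1
```

In practice k is the hyperparameter swept when drawing the validation curves the abstract uses for comparison: training and validation scores are plotted against k to locate under- and overfitting.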


Entropy, 2016, Vol 18 (5), pp. 195
Author(s): You Zhu, Chi Xie, Gang-Jin Wang, Xin-Guo Yan

Cutting-edge techniques have added great value to Artificial Intelligence (AI) and Machine Learning (ML), which are rapidly attracting interest across many kinds of research. Clustering and dimensionality reduction are among the trending methods used in machine learning today. Fundamentally, clustering techniques such as K-means and hierarchical clustering group the data into the required clusters. Clustering can be used in recommendation frameworks, in the analysis of users of social media platforms, and to categorize patients with particular diseases in specific age groups, among other applications. Dimensionality reduction methods such as Principal Component Analysis and Linear Discriminant Analysis resemble clustering in some respects, but they reduce the size of the data before the clusters are plotted. In this paper, a comparative and predictive analysis of four distinct techniques is performed using three datasets, IRIS, Wine, and Seed, from the UCI machine learning benchmark. Class prediction for each dataset is carried out with a Flask app. The main aim is to form a good clustering pattern for each dataset under each technique. The experimental analysis measures the accuracy of the resulting clusters using different machine learning classifiers, namely Logistic Regression, K-nearest neighbors, Support Vector Machine, Gaussian Naïve Bayes, Decision Tree Classifier, and Random Forest Classifier. Cohen's kappa is used as a further accuracy indicator to compare the obtained classification results. It is observed that K-means and hierarchical clustering provide a better clustering pattern for the input datasets than the dimensionality reduction techniques, although the cluster design is well formed under all techniques. The KNN classifier provides improved accuracy under all techniques and datasets.
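Cohen's kappa, used above alongside raw accuracy, corrects observed agreement for the agreement two raters would reach by chance given their label frequencies. A minimal sketch, with two invented label sequences standing in for ground truth and classifier output:

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both labelings pick the same class at random.
    chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (observed - chance) / (1 - chance)

# Illustrative three-class labels (e.g. the three IRIS species).
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]
print(round(cohen_kappa(y_true, y_pred), 3))
```

Kappa is 1 for perfect agreement and 0 for chance-level agreement, which makes it a fairer comparison than accuracy when the datasets' class sizes differ.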


2014, Vol 2 (40), pp. 1-48
Author(s): Alex Bottle, Rene Gaudoin, Rosalind Goudie, Simon Jones, Paul Aylin

Background: NHS hospitals collect a wealth of administrative data covering accident and emergency (A&E) department attendances, inpatient and day case activity, and outpatient appointments. Such data are increasingly being used to compare units and services, but adjusting for risk is difficult.

Objectives: To derive robust risk-adjustment models for various patient groups, including those admitted for heart failure (HF), acute myocardial infarction, colorectal and orthopaedic surgery, and outcomes adjusting for available patient factors such as comorbidity, using England's Hospital Episode Statistics (HES) data. To assess if more sophisticated statistical methods based on machine learning such as artificial neural networks (ANNs) outperform traditional logistic regression (LR) for risk prediction. To update and assess for the NHS the Charlson index for comorbidity. To assess the usefulness of outpatient data for these models.

Main outcome measures: Mortality, readmission, return to theatre, outpatient non-attendance. For HF patients we considered various readmission measures such as diagnosis-specific and total within a year.

Methods: We systematically reviewed studies comparing two or more comorbidity indices. Logistic regression, ANNs, support vector machines and random forests were compared for mortality and readmission. Models were assessed using discrimination and calibration statistics. Competing risks proportional hazards regression and various count models were used for future admissions and bed-days.

Results: Our systematic review and empirical analysis suggested that for general purposes comorbidity is currently best described by the set of 30 Elixhauser comorbidities plus dementia. Model discrimination was often high for mortality and poor, or at best moderate, for other outcomes, for example c = 0.62 for readmission and c = 0.73 for death following stroke. Calibration was often good for procedure groups but poorer for diagnosis groups, with overprediction of low risk a common cause. The machine learning methods we investigated offered little beyond LR for their greater complexity and implementation difficulties. For HF, some patient-level predictors differed by primary diagnosis of readmission but not by length of follow-up. Prior non-attendance at outpatient appointments was a useful, strong predictor of readmission. Hospital-level readmission rates for HF did not correlate with readmission rates for non-HF; hospital performance on national audit process measures largely correlated only with HF readmission rates.

Conclusions: Many practical risk-prediction or casemix adjustment models can be generated from HES data using LR, though an extra step is often required for accurate calibration. Including outpatient data in readmission models is useful. The three machine learning methods we assessed added little with these data. Readmission rates for HF patients should be divided by diagnosis on readmission when used for quality improvement.

Future work: As HES data continue to develop and improve in scope and accuracy, they can be used more, for instance A&E records. The return to theatre metric appears promising and could be extended to other index procedures and specialties. While our data did not warrant the testing of a larger number of machine learning methods, databases augmented with physiological and pathology information, for example, might benefit from methods such as boosted trees. Finally, one could apply the HF readmissions analysis to other chronic conditions.

Funding: The National Institute for Health Research Health Services and Delivery Research programme.
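The c statistics quoted in the Results (e.g. c = 0.62 for readmission) measure discrimination: the probability that a randomly chosen patient who had the outcome was assigned a higher predicted risk than one who did not, with ties counting half. A minimal sketch, using invented risk scores rather than anything from the study:

```python
from itertools import product

def c_statistic(risks_event, risks_no_event):
    """Concordance: share of (event, no-event) pairs in which the event case
    received the higher predicted risk; tied pairs count 0.5."""
    pairs = list(product(risks_event, risks_no_event))
    score = sum(1.0 if e > n else 0.5 if e == n else 0.0 for e, n in pairs)
    return score / len(pairs)

# Hypothetical predicted risks of death from some model (illustrative only).
died     = [0.9, 0.7, 0.6, 0.4]
survived = [0.5, 0.3, 0.3, 0.2, 0.1]

print(round(c_statistic(died, survived), 3))  # 19 of 20 pairs concordant -> 0.95
```

A value of 0.5 is no better than chance and 1.0 is perfect separation, which is why the report reads c = 0.62 as poor-to-moderate and c = 0.73 as moderate discrimination.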

