Improving Explainability of Major Risk Factors in Artificial Neural Networks for Auto Insurance Rate Regulation

In insurance rate-making, the use of statistical machine learning techniques such as artificial neural networks (ANN) is an emerging approach, and many insurance companies have been using them for pricing. However, due to the complexity of model specification and its implementation, model explainability may be essential to meet insurance pricing transparency for rate regulation purposes. This requirement may imply the need for estimating or evaluating the variable importance when complicated models are used. Furthermore, from both rate-making and rate-regulation perspectives, it is critical to investigate the impact of major risk factors on the response variables, such as claim frequency or claim severity. In this work, we consider the modelling problems of how claim counts, claim amounts and average loss per claim are related to major risk factors. ANN models are applied to meet this goal, and variable importance is measured to improve the model’s explainability due to the models’ complex nature. The results obtained from different variable importance measurements are compared, and dominant risk factors are identified. The contribution of this work is in making advanced mathematical models possible for applications in auto insurance rate regulation. This study focuses on analyzing major risks only, but the proposed method can be applied to more general insurance pricing problems when additional risk factors are being considered. In addition, the proposed methodology is useful for other business applications where statistical machine learning techniques are used.

Download Full-text

FRI0585 HIGH-THROUGHPUT METHODOLOGY FOR EMR-BASED IDENTIFICATION OF CLINICAL SUB-PHENOTYPES IN COMPLEX PATIENT POPULATIONS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3489 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 897.2-897

Author(s):

M. Maurits ◽

T. Huizinga ◽

M. Reinders ◽

S. Raychaudhuri ◽

E. Karlson ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Dimensionality Reduction ◽

High Throughput ◽

Brain Cancer ◽

Machine Learning Techniques ◽

Summary Statistics ◽

Medical Problems ◽

Learning Techniques ◽

Icd Codes

Background:Heterogeneity in disease populations complicates discovery of risk factors. To identify risk factors for subpopulations of diseases, we need analytical methods that can deal with unidentified disease subgroups.Objectives:Inspired by successful approaches from the Big Data field, we developed a high-throughput approach to identify subpopulations within patients with heterogeneous, complex diseases using the wealth of information available in Electronic Medical Records (EMRs).Methods:We extracted longitudinal healthcare-interaction records coded by 1,853 PheCodes[1] of the 64,819 patients from the Boston’s Partners-Biobank. Through dimensionality reduction using t-SNE[2] we created a 2D embedding of 32,424 of these patients (set A). We then identified distinct clusters post-t-SNE using DBscan[3] and visualized the relative importance of individual PheCodes within them using specialized spectrographs. We replicated this procedure in the remaining 32,395 records (set B).Results:Summary statistics of both sets were comparable (Table 1).Table 1.Summary statistics of the total Partners Biobank dataset and the 2 partitions.Set-Aset-BTotalEntries12,200,31112,177,13124,377,442Patients32,42432,39564,819Patientyears369,546.33368,597.92738,144.2unique ICD codes25,05624,95326,305unique Phecodes1,8511,8531,853We found 284 clusters in set A and 295 in set B, of which 63.4% from set A could be mapped to a cluster in set B with a median (range) correlation of 0.24 (0.03 – 0.58).Clusters represented similar yet distinct clinical phenotypes; e.g. patients diagnosed with “other headache syndrome” were separated into four distinct clusters characterized by migraines, neurofibromatosis, epilepsy or brain cancer, all resulting in patients presenting with headaches (Fig. 1 & 2). Though EMR databases tend to be noisy, our method was also able to differentiate misclassification from true cases; SLE patients with RA codes clustered separately from true RA cases.Figure 1.Two dimensional representation of Set A generated using dimensionality reduction (tSNE) and clustering (DBScan).Figure 2.Phenotype Spectrographs (PheSpecs) of four clusters characterized by “Other headache syndromes”, driven by codes relating to migraine, epilepsy, neurofibromatosis or brain cancer.Conclusion:We have shown that EMR data can be used to identify and visualize latent structure in patient categorizations, using an approach based on dimension reduction and clustering machine learning techniques. Our method can identify misclassified patients as well as separate patients with similar problems into subsets with different associated medical problems. Our approach adds a new and powerful tool to aid in the discovery of novel risk factors in complex, heterogeneous diseases.References:[1] Denny, J.C. et al. Bioinformatics (2010)[2]van der Maaten et al. Journal of Machine Learning Research (2008)[3] Ester, M. et al. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining. (1996)Disclosure of Interests:Marc Maurits: None declared, Thomas Huizinga Grant/research support from: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Consultant of: Ablynx, Bristol-Myers Squibb, Roche, Sanofi, Marcel Reinders: None declared, Soumya Raychaudhuri: None declared, Elizabeth Karlson: None declared, Erik van den Akker: None declared, Rachel Knevel: None declared

Download Full-text

Prediction of Clinical Risk Factors of Diabetes Using Multiple Machine Learning Techniques Resolving Class Imbalance

2020 23rd International Conference on Computer and Information Technology (ICCIT) ◽

10.1109/iccit51783.2020.9392694 ◽

2020 ◽

Author(s):

Kazi Amit Hasan ◽

Md. Al Mehedi Hasan

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Class Imbalance ◽

Clinical Risk Factors ◽

Machine Learning Techniques ◽

Clinical Risk ◽

Learning Techniques

Download Full-text

Variable Importance Analysis in Default Prediction using Machine Learning Techniques

Proceedings of the 7th International Conference on Data Science, Technology and Applications ◽

10.5220/0006872400560062 ◽

2018 ◽

Author(s):

Başak Gültekin ◽

Betül Erdoğdu Şakar

Keyword(s):

Machine Learning ◽

Variable Importance ◽

Machine Learning Techniques ◽

Learning Techniques ◽

Default Prediction ◽

Importance Analysis

Download Full-text

Survival prediction models since liver transplantation - comparisons between Cox models and machine learning techniques

BMC Medical Research Methodology ◽

10.1186/s12874-020-01153-1 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Georgios Kantidakis ◽

Hein Putter ◽

Carlo Lancia ◽

Jacob de Boer ◽

Andries E. Braat ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neural Networks ◽

Liver Transplantation ◽

Prediction Models ◽

Machine Learning Techniques ◽

Brier Score ◽

Cox Models ◽

Learning Techniques ◽

Random Survival Forest

Abstract Background Predicting survival of recipients after liver transplantation is regarded as one of the most important challenges in contemporary medicine. Hence, improving on current prediction models is of great interest.Nowadays, there is a strong discussion in the medical field about machine learning (ML) and whether it has greater potential than traditional regression models when dealing with complex data. Criticism to ML is related to unsuitable performance measures and lack of interpretability which is important for clinicians. Methods In this paper, ML techniques such as random forests and neural networks are applied to large data of 62294 patients from the United States with 97 predictors selected on clinical/statistical grounds, over more than 600, to predict survival from transplantation. Of particular interest is also the identification of potential risk factors. A comparison is performed between 3 different Cox models (with all variables, backward selection and LASSO) and 3 machine learning techniques: a random survival forest and 2 partial logistic artificial neural networks (PLANNs). For PLANNs, novel extensions to their original specification are tested. Emphasis is given on the advantages and pitfalls of each method and on the interpretability of the ML techniques. Results Well-established predictive measures are employed from the survival field (C-index, Brier score and Integrated Brier Score) and the strongest prognostic factors are identified for each model. Clinical endpoint is overall graft-survival defined as the time between transplantation and the date of graft-failure or death. The random survival forest shows slightly better predictive performance than Cox models based on the C-index. Neural networks show better performance than both Cox models and random survival forest based on the Integrated Brier Score at 10 years. Conclusion In this work, it is shown that machine learning techniques can be a useful tool for both prediction and interpretation in the survival context. From the ML techniques examined here, PLANN with 1 hidden layer predicts survival probabilities the most accurately, being as calibrated as the Cox model with all variables. Trial registration Retrospective data were provided by the Scientific Registry of Transplant Recipients under Data Use Agreement number 9477 for analysis of risk factors after liver transplantation.

Download Full-text

Classification of Neurodegenerative Disorders Based on Major Risk Factors Employing Machine Learning Techniques

International Journal of Engineering and Technology ◽

10.7763/ijet.2010.v2.146 ◽

2010 ◽

Vol 2 (4) ◽

pp. 350-355 ◽

Cited By ~ 5

Author(s):

Sandhya Joshi ◽

P. Deepa Shenoy ◽

Vibhudendra Simha G.G. ◽

Venugopal K. R ◽

L.M. Patnaik

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neurodegenerative Disorders ◽

Machine Learning Techniques ◽

Learning Techniques

Download Full-text

Application of Machine Learning Techniques to Identify Data Reliability and Factors Affecting Outcome After Stroke Using Electronic Administrative Records

Frontiers in Neurology ◽

10.3389/fneur.2021.670379 ◽

2021 ◽

Vol 12 ◽

Author(s):

Santu Rana ◽

Wei Luo ◽

Truyen Tran ◽

Svetha Venkatesh ◽

Paul Talman ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Ischemic Stroke ◽

Machine Learning Techniques ◽

Discharge Destination ◽

Clinical Factors ◽

Administrative Records ◽

Factors Associated ◽

Learning Techniques ◽

Discharge Outcomes

Aim: To use available electronic administrative records to identify data reliability, predict discharge destination, and identify risk factors associated with specific outcomes following hospital admission with stroke, compared to stroke specific clinical factors, using machine learning techniques.Method: The study included 2,531 patients having at least one admission with a confirmed diagnosis of stroke, collected from a regional hospital in Australia within 2009–2013. Using machine learning (penalized regression with Lasso) techniques, patients having their index admission between June 2009 and July 2012 were used to derive predictive models, and patients having their index admission between July 2012 and June 2013 were used for validation. Three different stroke types [intracerebral hemorrhage (ICH), ischemic stroke, transient ischemic attack (TIA)] were considered and five different comparison outcome settings were considered. Our electronic administrative record based predictive model was compared with a predictive model composed of “baseline” clinical features, more specific for stroke, such as age, gender, smoking habits, co-morbidities (high cholesterol, hypertension, atrial fibrillation, and ischemic heart disease), types of imaging done (CT scan, MRI, etc.), and occurrence of in-hospital pneumonia. Risk factors associated with likelihood of negative outcomes were identified.Results: The data was highly reliable at predicting discharge to rehabilitation and all other outcomes vs. death for ICH (AUC 0.85 and 0.825, respectively), all discharge outcomes except home vs. rehabilitation for ischemic stroke, and discharge home vs. others and home vs. rehabilitation for TIA (AUC 0.948 and 0.873, respectively). Electronic health record data appeared to provide improved prediction of outcomes over stroke specific clinical factors from the machine learning models. Common risk factors associated with a negative impact on expected outcomes appeared clinically intuitive, and included older age groups, prior ventilatory support, urinary incontinence, need for imaging, and need for allied health input.Conclusion: Electronic administrative records from this cohort produced reliable outcome prediction and identified clinically appropriate factors negatively impacting most outcome variables following hospital admission with stroke. This presents a means of future identification of modifiable factors associated with patient discharge destination. This may potentially aid in patient selection for certain interventions and aid in better patient and clinician education regarding expected discharge outcomes.

Download Full-text

Analytical Statistics Techniques of Classification and Regression in Machine Learning

10.5772/intechopen.84922 ◽

2020 ◽

Author(s):

Pramod Kumar ◽

Sameer Ambekar ◽

Manish Kumar ◽

Subarna Roy

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Machine Learning Techniques ◽

Statistical Machine Learning ◽

Building Models ◽

Learning Techniques ◽

Implementation Techniques ◽

Regression Techniques ◽

The Common ◽

Classification And Regression

This chapter aims to introduce the common methods and practices of statistical machine learning techniques. It contains the development of algorithms, applications of algorithms and also the ways by which they learn from the observed data by building models. In turn, these models can be used to predict. Although one assumes that machine learning and statistics are not quite related to each other, it is evident that machine learning and statistics go hand in hand. We observe how the methods used in statistics such as linear regression and classification are made use of in machine learning. We also take a look at the implementation techniques of classification and regression techniques. Although machine learning provides standard libraries to implement tons of algorithms, we take a look on how to tune the algorithms and what parameters of the algorithm or the features of the algorithm affect the performance of the algorithm based on the statistical methods.

Download Full-text

Development of a Simulation Prediction System Using Statistical Machine Learning Techniques

KIPS Transactions on Software and Data Engineering ◽

10.3745/ktsde.2016.5.11.593 ◽

2016 ◽

Vol 5 (11) ◽

pp. 593-606

Author(s):

Ki Yong Lee ◽

YoonJae Shin ◽

YeonJeong Choe ◽

SeonJeong Kim ◽

Young-Kyoon Suh ◽

...

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Prediction System ◽

Statistical Machine Learning ◽

Learning Techniques

Download Full-text

Determining the parameters of high amplification microlensing events by means of statistical machine learning techniques

Proceedings of the International Astronomical Union ◽

10.1017/s1743921316012977 ◽

2016 ◽

Vol 12 (S325) ◽

pp. 213-216

Author(s):

Elena Fedorova

Keyword(s):

Machine Learning ◽

Dark Matter ◽

Machine Learning Techniques ◽

Statistical Machine Learning ◽

Density Profiles ◽

Important Clue ◽

Gravitational Microlensing ◽

Learning Techniques

AbstractStrong gravitational microlensing (GM) events provide us a possibility to determine both the parameters of microlensed source and microlens. GM can be an important clue to understand the nature of dark matter on comparably small spatial and mass scales (i.e. substructure), especially when speaking about the combination of astrometrical and photometrical data about high amplification microlensing events (HAME). In the same time, fitting of HAME lightcurves of microlensed sources is quite time-consuming process. That is why we test here the possibility to apply the statistical machine learning techniques to determine the source and microlens parameters for the set of HAME lightcurves, using the simulated set of amplification curves of sources microlensed by point masses and clumps of DM with various density profiles.

Download Full-text