scholarly journals Predicting Motor Insurance Claims Using Telematics Data—XGBoost versus Logistic Regression

Risks ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 70 ◽  
Author(s):  
Jessica Pesantez-Narvaez ◽  
Montserrat Guillen ◽  
Manuela Alcañiz

XGBoost is recognized as an algorithm with exceptional predictive capacity. Models for a binary response indicating the existence of accident claims versus no claims can be used to identify the determinants of traffic accidents. This study compared the relative performances of logistic regression and XGBoost approaches for predicting the existence of accident claims using telematics data. The dataset contained information from an insurance company about individuals’ driving patterns, including total annual distance driven and percentage of total distance driven in urban areas. Our findings showed that logistic regression is a suitable model given its interpretability and good predictive capacity. XGBoost requires numerous model-tuning procedures to match the predictive performance of the logistic regression model, and greater effort with regard to interpretation.
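
The interpretability attributed to logistic regression above comes from the model's form: each coefficient is a log odds ratio. The following is a minimal sketch of that property only. The coefficient values and the single predictor (distance driven, in thousands of km) are hypothetical and are not taken from the paper.

```python
import math


def claim_probability(intercept: float, beta_km: float, km_thousands: float) -> float:
    """Logistic model: P(claim) = 1 / (1 + exp(-(intercept + beta_km * x)))."""
    linear_predictor = intercept + beta_km * km_thousands
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Hypothetical coefficients, chosen only for illustration.
INTERCEPT = -3.0
BETA_KM = 0.05

p_10 = claim_probability(INTERCEPT, BETA_KM, 10.0)
p_11 = claim_probability(INTERCEPT, BETA_KM, 11.0)

# One extra unit of the predictor multiplies the odds by exp(beta),
# regardless of the starting value of the predictor.
odds_ratio = odds(p_11) / odds(p_10)
print(f"P(claim | 10 thousand km) = {p_10:.4f}")
print(f"odds ratio per extra thousand km = {odds_ratio:.4f}")
```

A boosted tree ensemble has no single coefficient per predictor, which is one reason its interpretation takes more effort.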


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 636.1-636
Author(s):  
Y. Santamaria-Alza ◽  
J. Sanchez-Bautista ◽  
T. Urrego Callejas ◽  
J. Moreno ◽  
F. Jaimes ◽  
...  

Background: The most common complication in patients with SLE is infection, and its clinical presentation is often indistinguishable from that of SLE flares. Therefore, laboratory ratios have been evaluated to differentiate between these events. Among them, the ESR/CRP [1], neutrophil/lymphocyte (NLR) [2], and platelet/lymphocyte (PLR) [3] ratios have previously been assessed with acceptable performance; however, these ratios have not been validated in our SLE population.
Objectives: To examine the capacity of the lymphocyte/C4 (LC4R), lymphocyte/C3 (LC3R), and ferritin/ESR (FER) ratios to predict infection in SLE patients, and to evaluate the performance of the ESR/CRP, NLR, and PLR ratios in our SLE population.
Methods: We conducted a cross-sectional study of SLE patients admitted to the emergency service at Hospital San Vicente Fundación (HSVF). The HSVF ethics committee approved the execution of the project. Patients were categorized into four groups according to the main cause of hospitalization: (1) infection, (2) flare, (3) infection and flare, and (4) neither infection nor flare. We calculated the median values of the ratios and their respective interquartile ranges for each group. Then, we compared those summary measures using the Kruskal-Wallis test. Subsequently, we assessed the capacity of each ratio to predict infection using ROC curves. Finally, we fitted a logistic regression model.
Results: A total of 246 patients were included, of whom 90.7% were women. The median age was 28 years (IQR: 20-35 years). Regarding the outcomes, 37.0% of the patients had flares, 30.9% had neither infection nor flare, 16.7% had an infection, and 15.5% had infection and flare simultaneously. When the four groups were compared, statistically significant differences (p<0.05) were observed. The area under the ROC curve (AUC) for infection prediction was as follows: 0.752 (sensitivity 60.5%, specificity 80.5%) for LC4R, 0.740 (sensitivity 73.2%, specificity 68.3%) for FER, and 0.731 (sensitivity 77.6%, specificity 80.5%) for LC3R. In the logistic regression modeling, an increased risk of infection was associated with an LC4R below 66.7 (OR: 6.3, CI: 2.7-14.3, p<0.0001), a FER greater than 13.6 (OR: 5.9, CI: 2.8-12.1, p<0.0001), and an LC3R below 11.2 (OR: 4.9, CI: 2.4-9.8, p<0.0001). The ESR/CRP and PLR performed poorly, with AUCs of 0.580 and 0.655, respectively. In contrast, the NLR showed better performance (AUC of 0.709, with a sensitivity of 80.2% and specificity of 55.7%).
Figure 1. ROC curves of the evaluated ratios
Conclusion: These laboratory ratios could be easy-to-assay and inexpensive biomarkers to differentiate between infection and activity in SLE patients. The LC4R, FER, and LC3R have significant diagnostic performance for detecting infection among SLE patients. Of the previously evaluated ratios (ESR/CRP, PLR, and NLR), only the last performs adequately in our population.
References:
[1] Littlejohn E, Marder W, Lewis E, et al. The ratio of erythrocyte sedimentation rate to C-reactive protein is useful in distinguishing infection from flare in systemic lupus erythematosus patients presenting with fever. Lupus. 2018;27(7):1123-1129.
[2] Broca-Garcia BE, Saavedra MA, Martínez-Bencomo MA, et al. Utility of neutrophil-to-lymphocyte ratio plus C-reactive protein for infection in systemic lupus erythematosus. Lupus. 2019;28(2):217-222.
[3] Soliman WM, Sherif NM, Ghanima IM, EL-Badawy MA. Neutrophil to lymphocyte and platelet to lymphocyte ratios in systemic lupus erythematosus: Relation with disease activity and lupus nephritis. Reumatol Clin. 2020;16(4):255-261s.
Disclosure of Interests: None declared
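
The AUC, sensitivity, and specificity figures reported above can be understood through a small sketch. It computes the AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and it computes sensitivity and specificity at one cutoff. The scores below are made up for illustration and are not study data. The sketch assumes that higher scores indicate infection; for a ratio where low values indicate infection (as with the LC4R cutoff above), the score would be negated first.

```python
def auc(pos_scores, neg_scores):
    """AUC as P(positive outscores negative); ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sens_spec(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is called positive."""
    sensitivity = sum(s >= cutoff for s in pos_scores) / len(pos_scores)
    specificity = sum(s < cutoff for s in neg_scores) / len(neg_scores)
    return sensitivity, specificity


# Made-up scores: higher is assumed to mean "more likely infected".
infected = [0.9, 0.8, 0.4]
not_infected = [0.7, 0.3, 0.2]

print("AUC:", auc(infected, not_infected))
print("sensitivity, specificity at cutoff 0.5:",
      sens_spec(infected, not_infected, 0.5))
```

Moving the cutoff trades sensitivity against specificity, while the AUC summarizes performance over all cutoffs.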


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

With recent advances in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We sought to develop a prehospital stroke scale using ML. We conducted a multicenter retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort using accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
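
Most of the validation metrics named above can be derived from a confusion matrix. The sketch below shows the arithmetic for one outcome treated as positive against all others. The counts are hypothetical, and the F score is assumed to be the F1 score, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one outcome treated as positive versus everything else."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def classification_metrics(c: ConfusionCounts) -> dict:
    """Accuracy, PPV, sensitivity, specificity, and F1 from raw counts."""
    total = c.tp + c.fp + c.fn + c.tn
    ppv = c.tp / (c.tp + c.fp)
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    # F1 is the harmonic mean of PPV (precision) and sensitivity (recall).
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, not taken from the study.
example = ConfusionCounts(tp=40, fp=10, fn=20, tn=130)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

With several outcome classes, as in the study, these quantities would be computed once per class in this one-versus-rest fashion.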


2018 ◽  
Vol 8 (1) ◽  
pp. 16 ◽  
Author(s):  
Irina Matijosaitiene ◽  
Peng Zhao ◽  
Sylvain Jaume ◽  
Joseph Gilkey Jr

Predicting the exact urban places where crime is most likely to occur is of great interest to police departments. The goal of the research presented in this paper is therefore to identify, for every hour of the day, the specific urban areas in Manhattan, NY where a crime could happen. The outputs of this research are: (i) prediction of the land uses that generate the three most committed crimes in Manhattan, using machine learning (random forest and logistic regression); (ii) identification of the exact hours when most assaults are committed, together with the hot spots during these hours, using time series and hot spot analysis; and (iii) hourly prediction models for assaults based on land use, built with logistic regression. Assault, defined in criminal law as a physical attack on someone, is the third most committed crime in Manhattan. A land use (residential, commercial, recreational, mixed use, etc.) is assigned to every area or lot in Manhattan and determines the actual use or activities within that lot. Plotting assaults on the map for every hour showed that the hot spots where assaults occur were ‘moving’ and not confined to specific lots within Manhattan. This raises a number of questions: Why are hot spots of assaults not static in an urban environment? What makes them ‘move’: is it a particular urban pattern? Is the ‘movement’ of hot spots related to human activities during the day and night? Answering these questions helps to build the initial frame for assault prediction within every hour of the day. Knowing the vulnerability of a specific land use to assault during each hour can help police departments allocate forces to risky areas during those hours.
For the analysis, the study uses two datasets, each obtained from open databases: a crime dataset with the geographical location, date, and time of each crime, and a geographic dataset of land uses with a land use code for every lot. The study joins the two datasets by spatial location and classifies the data into 24 classes according to the time range in which the assault occurred. Machine learning methods reveal the effect of land uses on larceny, harassment, and assault, the three most committed crimes in Manhattan. Finally, logistic regression provides hourly prediction models and reveals the types of land use where assaults could occur during each hour of the day and night.


2021 ◽  
pp. 1-10
Author(s):  
I. Krug ◽  
J. Linardon ◽  
C. Greenwood ◽  
G. Youssef ◽  
J. Treasure ◽  
...  

Background: Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors.
Method: Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used.
Results: All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN).
Conclusions: Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.
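
One way to see why an L1 (LASSO) penalty tends to produce parsimonious models is the soft-thresholding operator, which coordinate-descent and proximal-gradient solvers for L1-penalised regression apply to coefficient updates. The sketch below illustrates that general mechanism only; it is not the authors' fitting procedure, and the coefficient values and the penalty `lam` are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam, and set it exactly to zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalised coefficient updates for five predictors.
raw_coefs = [0.80, -0.05, 0.30, 0.02, -0.60]
lam = 0.10  # hypothetical penalty strength

shrunk = [soft_threshold(b, lam) for b in raw_coefs]
kept = [i for i, b in enumerate(shrunk) if b != 0.0]

print("shrunk coefficients:", shrunk)
print("indices of predictors kept:", kept)
```

Coefficients whose magnitude is below the penalty are set exactly to zero, so the corresponding predictors drop out of the model, while the remaining coefficients are shrunk.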


2018 ◽  
Vol 16 (4) ◽  
pp. 296-306
Author(s):  
Justin T McDaniel ◽  
Robert J McDermott ◽  
Mary P Martinasek ◽  
Robin M White

Objective: We sought to determine variables associated with asthma among children from military and non-military families.
Methods: We performed a secondary data analysis of the 2016 Behavioral Risk Factor Surveillance System. Parents with and without military experience (n = 61,079) were asked whether a child had ever had asthma and whether the child currently has asthma. We used two multiple logistic regression models to determine the influence of rurality and geographic region on “ever” and “current” asthma in children of military and non-military families, while controlling for socio-demographic and behavioral variables.
Results: In 2016, overall childhood asthma prevalence was lower for children in military families than for children in non-military families (ever, 9.7% vs. 12.9%; currently, 6.2% vs. 8.2%). However, multiple logistic regression showed variation in “ever” and “current” asthma among children of military and non-military families by rurality and race.
Discussion: Developers of public health asthma interventions should consider targeting African-American children of military families living in urban areas. This population is approximately twice as likely to have asthma as Caucasian children of non-military families.


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Philip Wilson ◽  
Fiona McQuaige ◽  
Lucy Thompson ◽  
Alex McConnachie

Aims. To investigate factors associated with language delay in a cohort of 30-month-old children and determine if identification of language delay requires active contact with families. Methods. Data were collected at a pilot universal 30-month health contact. Health visitors used a simple two-item language screen. Data were obtained for 315 children; language delay was found in 33. The predictive capacity of 13 variables which could realistically be known before the 30-month contact was analysed. Results. Seven variables were significantly associated with language delay in univariate analysis, but in logistic regression only five of these variables remained significant. Conclusion. The presence of one or more risk factors had a sensitivity of 89% and specificity of 45%, but a positive predictive value of only 15%. The presence of one or more of these risk factors thus cannot reliably be used to identify language-delayed children, nor is it possible to define an “at risk” population, because male gender was the only significant demographic factor and it had an unacceptably low specificity (52.5%). It is not possible to predict which children will have language delay at 30 months. Identification of this important ESSENCE disorder requires direct clinical contact with all families.
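
The low positive predictive value reported above follows from the low prevalence of language delay (33 of 315 children) combined with the modest specificity. A minimal sketch of the calculation using Bayes' rule and the rounded figures from the abstract is given below. It yields roughly 0.16, close to the reported 15%; the small gap presumably reflects rounding of the inputs, which is an assumption, since the raw counts are not given here.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """PPV = P(condition | positive screen), by Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Figures from the abstract: 33 of 315 children had language delay;
# sensitivity 89% and specificity 45% for "one or more risk factors".
prevalence = 33 / 315
ppv = positive_predictive_value(0.89, 0.45, prevalence)
print(f"prevalence = {prevalence:.3f}, PPV = {ppv:.3f}")
```

Even with high sensitivity, more than half of the unaffected children screen positive at this specificity, and they greatly outnumber the affected children.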


2020 ◽  
Vol 2 (1) ◽  
pp. 107
Author(s):  
Nesyana Dewi ◽  
Melti Roza Adry

This study aims to determine the effect of education, income per capita, age, and knowledge on waste management in urban areas of West Sumatera. It uses secondary cross-sectional data for urban West Sumatera, obtained from BPS-Susenas West Sumatera, and applies logistic regression analysis. The results indicate that (1) education has no significant effect on waste management in urban areas of West Sumatera, (2) income per capita has no significant effect, (3) age has no significant effect, and (4) knowledge has a significant effect.


2022 ◽  
Vol 12 (1) ◽  
pp. 112
Author(s):  
Rui Guo ◽  
Renjie Zhang ◽  
Ran Liu ◽  
Yi Liu ◽  
Hao Li ◽  
...  

Spontaneous intracerebral hemorrhage (SICH) is common in China and has high morbidity and mortality rates. This study aims to develop a machine learning (ML)-based predictive model for the 90-day evaluation after SICH. We retrospectively reviewed 751 patients with a diagnosis of SICH and analyzed clinical, radiographic, and laboratory data. A modified Rankin scale (mRS) score of 0–2 was defined as a favorable functional outcome, and an mRS score of 3–6 as an unfavorable functional outcome. We evaluated 90-day functional outcome and mortality to develop six ML-based predictive models and compared their efficacy with that of a traditional risk stratification scale, the intracerebral hemorrhage (ICH) score. Predictive performance was evaluated by the area under the receiver operating characteristic curve (AUC). A total of 553 patients (73.6%) reached the functional outcome at the 3rd month, with a 90-day mortality rate of 10.2%. Logistic regression (LR) and logistic regression CV (LRCV) showed the best predictive performance for functional outcome (AUC = 0.890 and 0.887, respectively), and category boosting showed the best predictive performance for mortality (AUC = 0.841). Therefore, ML might be of assistance in predicting the prognosis of SICH.

