Predict Health Insurance Cost by using Machine Learning and DNN Regression Models

Author(s):  
Mohamed Hanafy ◽  
Omar M. A. Mahmoud

Insurance is a policy that eliminates or reduces the cost of losses incurred through various risks. Numerous factors influence the cost of insurance, and these considerations feed into the formulation of insurance policies. Machine learning (ML) can make the drafting of insurance policies in the industry more efficient. This study demonstrates how different regression models can forecast insurance costs, and we compare the results of models such as Multiple Linear Regression, Generalized Additive Model, Support Vector Machine, Random Forest Regressor, CART, XGBoost, k-Nearest Neighbors, Stochastic Gradient Boosting, and Deep Neural Network. The best-performing approach was the Stochastic Gradient Boosting model, with an MAE value of 0.17448, an RMSE value of 0.38018, and an R-squared value of 85.8295.
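As an illustration of the kind of pipeline such a comparison involves, the minimal sketch below fits a gradient boosting regressor to an insurance-cost table. It is not the authors' code; the file name and column names follow the common Kaggle medical-cost schema (age, sex, bmi, children, smoker, region, charges) and are assumptions.

```python
# Minimal sketch, not the authors' pipeline. Assumes a CSV with the common
# Kaggle medical-cost columns: age, sex, bmi, children, smoker, region, charges.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

df = pd.read_csv("insurance.csv")                       # assumed file name
X, y = df.drop(columns="charges"), df["charges"]

preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"])],
    remainder="passthrough",                            # keep numeric columns as-is
)
model = Pipeline([("prep", preprocess),
                  ("sgb", GradientBoostingRegressor(random_state=0))])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)

print("MAE :", mean_absolute_error(y_te, pred))
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("R2  :", r2_score(y_te, pred))
```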

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Ch. Anwar ul Hassan ◽  
Jawaid Iqbal ◽  
Saddam Hussain ◽  
Hussain AlSalman ◽  
Mogeeb A. A. Mosleh ◽  
...  

In the domains of computational and applied mathematics, soft computing, fuzzy logic, and machine learning (ML) are well-known research areas. ML is one of the computational intelligence approaches that can address diverse difficulties in a wide range of applications and systems when it comes to exploiting historical data. Predicting medical insurance costs using ML approaches remains a problem in the healthcare industry that requires investigation and improvement. Using a series of machine learning algorithms, this study provides a computational intelligence approach for predicting healthcare insurance costs. The proposed research approach uses Linear Regression, Support Vector Regression, Ridge Regressor, Stochastic Gradient Boosting, XGBoost, Decision Tree, Random Forest Regressor, Multiple Linear Regression, and k-Nearest Neighbors. A medical insurance cost dataset was acquired from the KAGGLE repository for this purpose, and machine learning methods were used to show how different regression models can forecast insurance costs and to compare the models' accuracy. The results show that the Stochastic Gradient Boosting (SGB) model outperforms the others, with a cross-validation value of 0.858 and an RMSE value of 0.340, and gives 86% accuracy.
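The cross-validated comparison described above might be sketched as below. This is not the authors' code: make_regression merely stands in for the preprocessed KAGGLE insurance features, and the model subset and hyperparameters are assumptions.

```python
# Minimal sketch of a cross-validated regressor comparison (assumptions only).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Placeholder for the preprocessed insurance feature matrix and target.
X, y = make_regression(n_samples=1300, n_features=8, noise=15.0, random_state=0)

models = {
    "linear":        LinearRegression(),
    "ridge":         Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=300, random_state=0),
    "sgb":           GradientBoostingRegressor(random_state=0),
}
cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, est in models.items():
    r2 = cross_val_score(est, X, y, cv=cv, scoring="r2")
    rmse = -cross_val_score(est, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    print(f"{name:14s}  R2={r2.mean():.3f}  RMSE={rmse.mean():.3f}")
```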


2021 ◽  
pp. 289-301
Author(s):  
B. Martín ◽  
J. González–Arias ◽  
J. A. Vicente–Vírseda

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio-temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k-nearest neighbours, stochastic gradient boosting, support vector machines, neural networks, and random forest, an enhanced form of bootstrap aggregation). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in the abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains derived from open-source datasets, which have previously been used to characterize the oceanographic habitats of seabirds, were quantified and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that combining machine learning techniques with the massive data provided by open data sources is a useful approach for identifying the long-term spatio-temporal distribution of species at regional spatial scales.
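A minimal sketch of the best-performing setup, a random forest regressor fitted to oceanographic predictors, is shown below. The synthetic data is only a placeholder for the eBird counts and the frontal/productivity proxies; nothing here reproduces the study's actual workflow.

```python
# Minimal sketch (assumptions only): random forest abundance model with the
# metrics reported above (RMSE, R2, observed-vs-predicted correlation).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Placeholder for shearwater counts and oceanographic predictors.
X, y = make_regression(n_samples=2000, n_features=6, noise=10.0, random_state=1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
rf = RandomForestRegressor(n_estimators=500, random_state=1).fit(X_tr, y_tr)
pred = rf.predict(X_te)

print("RMSE:", mean_squared_error(y_te, pred) ** 0.5)
print("R2  :", r2_score(y_te, pred))
print("corr(obs, pred):", np.corrcoef(y_te, pred)[0, 1])
```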


2020 ◽  
Vol 12 (23) ◽  
pp. 3925
Author(s):  
Ivan Pilaš ◽  
Mateo Gašparović ◽  
Alan Novkinić ◽  
Damir Klobučar

The presented study demonstrates a bi-sensor approach suitable for rapid, precise, and up-to-date mapping of forest canopy gaps over a larger spatial extent. The approach makes use of Unmanned Aerial Vehicle (UAV) red, green and blue (RGB) images over smaller areas for highly precise forest canopy mask creation. Sentinel-2 was used as a scaling platform for transferring information from the UAV to the wider spatial extent. Several approaches to improving the predictive performance were examined: (I) the highest R2 of a single satellite index was 0.57, (II) the highest R2 using multiple features obtained from the single-date S-2 image was 0.624, and (III) the highest R2 on the multitemporal set of S-2 images was 0.697. Satellite indices such as the Atmospherically Resistant Vegetation Index (ARVI), Infrared Percentage Vegetation Index (IPVI), Normalized Difference Index (NDI45), Pigment-Specific Simple Ratio Index (PSSRa), Modified Chlorophyll Absorption Ratio Index (MCARI), Color Index (CI), Redness Index (RI), and Normalized Difference Turbidity Index (NDTI) were the dominant predictors in most of the Machine Learning (ML) algorithms. The more complex ML algorithms, such as Support Vector Machines (SVM), Random Forest (RF), Stochastic Gradient Boosting (GBM), Extreme Gradient Boosting (XGBoost), and CatBoost, provided the best performance on the training set but exhibited weaker generalization capabilities. Therefore, the simpler and more robust Elastic Net (ENET) algorithm was chosen for the final map creation.
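The final ENET step could look roughly like the sketch below: an elastic-net regression on standardized satellite-index features. This is not the study's workflow; the feature frame and the synthetic gap-fraction target are illustrative assumptions.

```python
# Minimal sketch (assumptions only): Elastic Net on standardized index features.
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
indices = ["ARVI", "IPVI", "NDI45", "PSSRa", "MCARI", "CI", "RI", "NDTI"]
X = pd.DataFrame(rng.normal(size=(500, len(indices))), columns=indices)
# Synthetic stand-in for the UAV-derived canopy-gap response.
y = 0.6 * X["ARVI"] - 0.3 * X["NDTI"] + rng.normal(scale=0.2, size=500)

enet = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=0),
)
enet.fit(X, y)
print("CV-selected alpha:", enet.named_steps["elasticnetcv"].alpha_)
print("train R2:", enet.score(X, y))
```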


Neurosurgery ◽  
2019 ◽  
Vol 85 (4) ◽  
pp. E671-E681 ◽  
Author(s):  
Aditya V Karhade ◽  
Quirina C B S Thio ◽  
Paul T Ogink ◽  
Christopher M Bono ◽  
Marco L Ferrone ◽  
...  

Abstract BACKGROUND Increasing prevalence of metastatic disease has been accompanied by increasing rates of surgical intervention. Current tools have poor to fair predictive performance for intermediate (90-d) and long-term (1-yr) mortality. OBJECTIVE To develop predictive algorithms for spinal metastatic disease at these time points and to provide patient-specific explanations of the predictions generated by these algorithms. METHODS Retrospective review was conducted at 2 large academic medical centers to identify patients undergoing initial operative management for spinal metastatic disease between January 2000 and December 2016. Five models (penalized logistic regression, random forest, stochastic gradient boosting, neural network, and support vector machine) were developed to predict 90-d and 1-yr mortality. RESULTS Overall, 732 patients were identified with 90-d and 1-yr mortality rates of 181 (25.1%) and 385 (54.3%), respectively. The stochastic gradient boosting algorithm had the best performance for 90-d mortality and 1-yr mortality. On global variable importance assessment, albumin, primary tumor histology, and performance status were the 3 most important predictors of 90-d mortality. The final models were incorporated into an open access web application able to provide predictions as well as patient-specific explanations of the results generated by the algorithms. The application can be found at https://sorg-apps.shinyapps.io/spinemetssurvival/. CONCLUSION Preoperative estimation of 90-d and 1-yr mortality was achieved with assessment of more flexible modeling techniques such as machine learning. Integration of these models into applications and patient-centered explanations of predictions represent opportunities for incorporation into healthcare systems as decision tools in the future.
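A rough sketch of the modeling-plus-importance step is given below. It is not the published model or dataset: the cohort is synthetic, and the feature names are assumptions meant only to echo the predictors named above.

```python
# Minimal sketch (assumptions only): gradient boosting for 90-day mortality
# with a global variable-importance readout via permutation importance.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = ["albumin", "histology", "performance_status", "age", "hemoglobin"]
X, y = make_classification(n_samples=732, n_features=5, n_informative=3,
                           weights=[0.75, 0.25], random_state=0)  # ~25% events

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(clf, X_te, y_te, n_repeats=20, random_state=0)
for name, score in sorted(zip(feature_names, imp.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name:20s} {score:.4f}")
```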


2020 ◽  
Vol 493 (3) ◽  
pp. 3429-3441
Author(s):  
Paulo A A Lopes ◽  
André L B Ribeiro

ABSTRACT We introduce a new method to determine galaxy cluster membership based solely on photometric properties. We adopt a machine learning approach to recover a cluster membership probability from galaxy photometric parameters and finally derive a membership classification. After testing several machine learning techniques (such as stochastic gradient boosting, model averaged neural network and k-nearest neighbours), we found the support vector machine algorithm to perform better when applied to our data. Our training and validation data are from the Sloan Digital Sky Survey main sample. Hence, to be complete to $M_r^* + 3$, we limit our work to 30 clusters with $z_{\rm phot-cl} \leq 0.045$. Masses (M200) are larger than $\sim 0.6\times 10^{14} \, \mathrm{M}_{\odot }$ (most above $3\times 10^{14} \, \mathrm{M}_{\odot }$). Our results are derived taking into account all galaxies in the line of sight of each cluster, with no photometric redshift cuts or background corrections. Our method is non-parametric, making no assumptions on the number density or luminosity profiles of galaxies in clusters. Our approach delivers extremely accurate results (completeness, C $\sim 92$ per cent, and purity, P $\sim 87$ per cent) within R200, so we named our code reliable photometric membership. We discuss possible dependencies on magnitude, colour, and cluster mass. Finally, we present some applications of our method, stressing its impact on galaxy evolution and cosmological studies based on future large-scale surveys, such as eROSITA, EUCLID, and LSST.
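For orientation, the sketch below shows an SVM membership classifier in which completeness and purity map onto recall and precision for the member class. It is not the published code; the synthetic photometric features and member fraction are assumptions.

```python
# Minimal sketch (assumptions only): SVM membership classification with
# completeness (recall) and purity (precision) on the "member" class.
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder for galaxy photometric parameters; label 1 = cluster member.
X, y = make_classification(n_samples=5000, n_features=6, n_informative=4,
                           weights=[0.7, 0.3], random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
svm.fit(X_tr, y_tr)
pred = svm.predict(X_te)

print("completeness (recall):", recall_score(y_te, pred))
print("purity (precision)   :", precision_score(y_te, pred))
```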


2021 ◽  
Author(s):  
Mostafa Sa'eed Yakoot ◽  
Adel Mohamed Salem Ragab ◽  
Omar Mahmoud

Abstract Constructing and maintaining integrity for different types of wells requires accurate assessment of the posed risk level, especially when one barrier element or a group of barriers fails. Risk assessment and well integrity (WI) categorization are typically conducted using traditional spreadsheets and in-house software that contain their own inherent errors, mainly because they are subject to the understanding and interpretation of the team assigned to the WI data. Because of these limitations, industrial practice involves the collection and analysis of failure data to estimate risk level through certain established probability/likelihood matrices. However, those matrices have become less efficient due to possible bias in the failure data and the consequent misleading assessments. The main objective of this work is to utilize machine learning (ML) algorithms to develop a powerful model and predict the WI risk category of gas-lifted wells. The ML algorithms implemented in this study are logistic regression, decision trees, random forest, support vector machines, k-nearest neighbors, and gradient boosting. In addition, those algorithms are used to develop a physical equation to predict the risk category. Three thousand WI and gas-lift datasets were collected, preprocessed, and fed into the ML model. The newly developed model can predict well risk level and provides a unique methodology to convert the associated failure risk of each element in the well envelope into a tangible value, showing the total potential risk and hence the overall status of well-barrier integrity. The implementation of ML can enhance brownfield asset operations, reduce intervention costs, improve control of WI across the field, improve business performance, and optimize production.
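The classifier comparison described above could be set up roughly as in the sketch below. This is not the authors' model: make_classification stands in for the 3,000 WI/gas-lift records, and the three-class risk labels, features, and hyperparameters are assumptions.

```python
# Minimal sketch (assumptions only): cross-validated comparison of the listed
# classifiers for a multi-class well-integrity risk category.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder for preprocessed WI/gas-lift features and risk categories.
X, y = make_classification(n_samples=3000, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    "logistic":          LogisticRegression(max_iter=1000),
    "decision_tree":     DecisionTreeClassifier(random_state=0),
    "random_forest":     RandomForestClassifier(n_estimators=300, random_state=0),
    "svm":               SVC(),
    "knn":               KNeighborsClassifier(),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, est in models.items():
    acc = cross_val_score(est, X, y, cv=cv, scoring="accuracy")
    print(f"{name:18s} accuracy={acc.mean():.3f}")
```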


2020 ◽  
Vol 10 (17) ◽  
pp. 5942 ◽  
Author(s):  
Juan de la Torre ◽  
Javier Marin ◽  
Sergio Ilarri ◽  
Jose J. Marin

Given the exponentially growing availability of data in health centers and the massive sensorization that is expected, there is an increasing need to manage and analyze these data in an effective way. For this purpose, data mining (DM) and machine learning (ML) techniques would be helpful. However, due to the specific characteristics of the healthcare field, a suitable DM and ML methodology adapted to these particularities is required. The applied methodology must structure the different stages needed for data-driven healthcare, from the acquisition of raw data to decision-making by clinicians, considering the specific requirements of this field. In this paper, we focus on a case study of cervical assessment, where the goal is to predict the potential presence of cervical pain in patients affected by whiplash diseases, which is important, for example, in insurance-related investigations. By analyzing this case study in detail in a real scenario, we show how taking care of those particularities enables the generation of reliable predictive models in the field of healthcare. Using a database of 302 samples, we have generated several predictive models, including logistic regression, support vector machines, k-nearest neighbors, gradient boosting, decision trees, random forest, and neural network algorithms. The results show that it is possible to reliably predict the presence of cervical pain (accuracy, precision, and recall above 90%). We expect that the proposed procedure for applying ML techniques in the field of healthcare will help technologists, researchers, and clinicians to create more objective systems that provide support to objectify the diagnosis, improve test treatment efficacy, and save resources.
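As a small illustration of evaluating one of the listed models on a dataset of this size, the sketch below trains a scaled logistic regression and reports accuracy, precision, and recall. It is not the study's pipeline; the synthetic 302-sample dataset and feature count are assumptions.

```python
# Minimal sketch (assumptions only): scaled logistic regression for a binary
# cervical-pain label, reported with accuracy, precision, and recall.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=302, n_features=10, n_informative=5,
                           random_state=0)   # placeholder for the 302 samples
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
```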


2021 ◽  
Author(s):  
Íris Viana dos Santos Santana ◽  
Andressa C. M. da Silveira ◽  
Álvaro Sobrinho ◽  
Lenardo Chaves e Silva ◽  
Leandro Dias da Silva ◽  
...  

BACKGROUND Controlling the COVID-19 outbreak in Brazil is considered a challenge of continental proportions due to the high population and urban density, weak implementation and maintenance of social distancing strategies, and limited testing capabilities. OBJECTIVE To contribute to addressing this challenge, we present the implementation and evaluation of supervised Machine Learning (ML) models to assist COVID-19 detection in Brazil based on early-stage symptoms. METHODS First, we conducted data preprocessing and applied the Chi-squared test to a Brazilian dataset, mainly composed of early-stage symptoms, to perform statistical analyses. Afterward, we implemented ML models using the Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) algorithms. We evaluated the ML models using precision, accuracy score, recall, the area under the curve, and the Friedman and Nemenyi tests. Based on the comparison, we grouped the top five ML models and measured feature importance. RESULTS The MLP model presented the highest mean accuracy score, with more than 97.85%, compared to GBM (> 97.39%), RF (> 97.36%), DT (> 97.07%), XGBoost (> 97.06%), KNN (> 95.14%), and SVM (> 94.27%). Based on the statistical comparison, we grouped MLP, GBM, DT, RF, and XGBoost as the top five ML models, because their evaluation results are statistically indistinguishable. The features most important to the ML models' predictions include gender, profession, fever, sore throat, dyspnea, olfactory disorder, cough, runny nose, taste disorder, and headache. CONCLUSIONS Supervised ML models can effectively assist decision making in medical diagnosis and public administration (e.g., testing strategies) based on early-stage symptoms that do not require advanced and expensive exams.
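A compressed sketch of a pipeline in this spirit, chi-squared screening of binary symptom indicators followed by an MLP classifier, is shown below. It is not the authors' implementation; the synthetic 0/1 symptom matrix, feature count, and label construction are assumptions.

```python
# Minimal sketch (assumptions only): chi-squared feature screening + MLP
# classifier on binary early-stage symptom indicators.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 10)).astype(float)        # binary symptom flags
y = (X[:, 0] + X[:, 3] + rng.random(2000) > 1.5).astype(int)  # synthetic label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(
    SelectKBest(chi2, k=6),                                   # chi-squared screening
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
)
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```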


Author(s):  
Donald Douglas Atsa'am ◽  
Ruth Wario

The coronavirus disease 2019 (COVID-19) pandemic is an ongoing concern that requires research in all disciplines to tame its spread. Nine classification algorithms were selected and evaluated to determine the most appropriate for predicting the prevalent COVID-19 transmission mode in a geographic area. These include: multinomial logistic regression, k-nearest neighbour, support vector machines, linear discriminant analysis, naïve Bayes, C5.0, bagged classification and regression trees, random forest, and stochastic gradient boosting. Five COVID-19 datasets were employed for classification. Predictive accuracy was determined using 10-fold cross-validation with three repeats. The Friedman test was conducted, and the outcome showed that the algorithms' performances differ significantly. Stochastic gradient boosting yielded the highest predictive accuracy, 81%. This finding should be valuable to health informaticians, health analysts, and others when deciding which machine learning tool to adopt in efforts to detect the dominant transmission mode of the virus within localities.
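The evaluation protocol, repeated 10-fold cross-validation followed by a Friedman test on the fold-wise scores, could be sketched as below. It is not the chapter's code: make_classification stands in for the five COVID-19 datasets, and only three of the nine algorithms are shown for brevity.

```python
# Minimal sketch (assumptions only): repeated 10-fold CV + Friedman test.
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=15, n_informative=8,
                           random_state=0)   # placeholder dataset

models = {
    "logistic":          LogisticRegression(max_iter=1000),
    "random_forest":     RandomForestClassifier(n_estimators=300, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)
scores = {name: cross_val_score(est, X, y, cv=cv, scoring="accuracy")
          for name, est in models.items()}

for name, acc in scores.items():
    print(f"{name:18s} mean accuracy={acc.mean():.3f}")
stat, p = friedmanchisquare(*scores.values())
print(f"Friedman chi2={stat:.2f}, p={p:.4f}")
```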


2019 ◽  
Vol 11 (10) ◽  
pp. 2848 ◽  
Author(s):  
Irina Matijosaitiene ◽  
Anthony McDowald ◽  
Vishal Juneja

This research aims to identify spatial and time patterns of theft in Manhattan, NY, to reveal urban factors that contribute to thefts from motor vehicles, and to build a prediction model for thefts. Methods include time series and hot spot analysis, linear regression, elastic-net, support vector machines (SVM) with radial and linear kernels, decision tree, bagged CART, random forest, and stochastic gradient boosting. The machine learning methods reveal that linear models (linear regression, elastic-net) perform better on our data, indicating that a higher number of subway entrances, graffiti, and restaurants on streets contributes to higher rates of theft from motor vehicles. Although the prediction model for thefts meets almost all assumptions (five of six), its accuracy is 77%, suggesting that there are other undiscovered factors contributing to the generation of thefts. As a final output demonstrating the results, an application prototype for finding safer parking in Manhattan, NY, based on the prediction model has been developed.
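An elastic-net regression of the kind favored above might be sketched as follows. The data frame is synthetic and its column names (subway_entrances, graffiti, restaurants) are illustrative assumptions echoing the predictors named in the abstract, not the study's data.

```python
# Minimal sketch (assumptions only): elastic-net regression of theft counts
# on street-level urban features.
import numpy as np
import pandas as pd
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "subway_entrances": rng.poisson(2, 400),
    "graffiti":         rng.poisson(5, 400),
    "restaurants":      rng.poisson(8, 400),
})
thefts = (0.8 * df["subway_entrances"] + 0.4 * df["graffiti"]
          + 0.2 * df["restaurants"] + rng.normal(0, 1, 400))   # synthetic target

model = make_pipeline(StandardScaler(),
                      ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5, random_state=0))
model.fit(df, thefts)
coefs = model.named_steps["elasticnetcv"].coef_
print(dict(zip(df.columns, coefs.round(3))))
print("R2:", model.score(df, thefts))
```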

