A two-stage super learner for healthcare expenditures

Objective. To improve the estimation of healthcare expenditures by introducing a novel method that is well suited to situations where data exhibit strong skewness and zero-inflation. Data Sources. Simulations and two sources of real-world data: the 2016-2017 Medical Expenditure Panel Survey (MEPS) and the Back Pain Outcomes using Longitudinal Data (BOLD) datasets. Study Design. The super learner is an ensemble machine learning approach that can combine several algorithms to improve estimation. We propose a two-stage super learner that is well suited for use with healthcare expenditure data: it separately estimates the probability of any healthcare expenditure and the mean amount of healthcare expenditure conditional on having any expenditure. These estimates can be combined to yield a single estimate of expenditures for each observation. The method can flexibly incorporate a range of individual estimation approaches for each stage of estimation, including both regression-based approaches and machine learning algorithms such as random forests. We compare the performance of the two-stage super learner with a one-stage super learner and with multiple individual algorithms for estimation of healthcare cost under a broad range of data settings in simulated and real data. Predictive performance was compared using mean squared error (MSE) and R2. Data Collection/Extraction Methods. MEPS data include only adults and exclude observations with missingness; BOLD data include observations without missingness. Principal Findings. Our results indicate that the two-stage super learner outperforms the one-stage super learner and individual algorithms for healthcare cost estimation under a wide variety of settings in simulations and empirical analyses. The improvement of the two-stage super learner over the one-stage super learner was particularly evident in settings where zero-inflation is high. Conclusions. The two-stage super learner provides researchers with an effective approach for healthcare cost analyses in environments where they cannot know the best single algorithm a priori. Keywords. Semicontinuous data, two-part models, zero-inflation, super learning, healthcare expenditure.
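
The two-stage combination described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' super learner: each stage here uses a deliberately trivial learner (a per-group proportion and a per-group mean), whereas the paper combines several algorithms at each stage. The function names, the group labels, and the toy cost values are all hypothetical. The sketch only shows how a stage-1 estimate of P(cost > 0) and a stage-2 estimate of E[cost | cost > 0] multiply into one prediction, and how MSE and R2 would then be computed.

```python
from collections import defaultdict


def fit_two_part(groups, costs):
    """Fit a toy two-part model: one (P(cost > 0), E[cost | cost > 0]) pair per group."""
    n = defaultdict(int)          # observations per group
    n_pos = defaultdict(int)      # observations with positive cost per group
    sum_pos = defaultdict(float)  # total positive cost per group
    for g, y in zip(groups, costs):
        n[g] += 1
        if y > 0:
            n_pos[g] += 1
            sum_pos[g] += y
    model = {}
    for g in n:
        p_any = n_pos[g] / n[g]                                  # stage 1
        mean_pos = sum_pos[g] / n_pos[g] if n_pos[g] else 0.0    # stage 2
        model[g] = (p_any, mean_pos)
    return model


def predict_two_part(model, groups):
    """Combine the stages: E[cost] = P(cost > 0) * E[cost | cost > 0]."""
    return [model[g][0] * model[g][1] for g in groups]


def mse(y, yhat):
    """Mean squared error between observed and predicted values."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical zero-inflated costs for two illustrative groups.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
costs = [0.0, 0.0, 100.0, 300.0, 0.0, 50.0, 150.0, 400.0]

model = fit_two_part(groups, costs)
preds = predict_two_part(model, groups)
print(model, preds, mse(costs, preds), r2(costs, preds))
```

With learners this simple, the product of the two stages reduces to the plain group mean, so the split adds nothing. A benefit can only arise when each stage uses a more flexible learner suited to its own target (a binary outcome in stage 1, a skewed positive outcome in stage 2).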

Two-stage detection of north Atlantic right whale upcalls using local binary patterns and machine learning algorithms

Applied Acoustics; DOI 10.1016/j.apacoust.2017.01.025; 2017; Vol 120; pp. 158-166; Cited By ~ 4

Author(s): Mahdi Esfahanian, Nurgun Erdol, Edmund Gerstein, Hanqi Zhuang

Keyword(s): Machine Learning, North Atlantic, Learning Algorithms, Local Binary Patterns, Machine Learning Algorithms, Two Stage, North Atlantic Right Whale, Right Whale

Prediction of Water Saturation from Well Log Data by Machine Learning Algorithms: Boosting and Super Learner

Journal of Marine Science and Engineering; DOI 10.3390/jmse9060666; 2021; Vol 9 (6); pp. 666

Author(s): Fahimeh Hadavimoghaddam, Mehdi Ostadhassan, Mohammad Ali Sadri, Tatiana Bondarenko, Igor Chebyshev, ...

Keyword(s): Machine Learning, Water Saturation, Machine Learning Algorithms, Rock Properties, Gradient Boosting, Data Set, Log Data, Gamma Density, Super Learner, Resistivity Log

Intelligent predictive methods have the power to reliably estimate water saturation (Sw) compared to conventional experimental methods commonly performed by petrophysicists. However, due to nonlinearity and uncertainty in the data set, the prediction might not be accurate. New machine learning (ML) algorithms such as gradient boosting techniques have shown significant success in other disciplines yet have not been examined for predicting Sw or other reservoir or rock properties in the petroleum industry. To bridge this gap in the literature, in this study, for the first time, a total of five ML programs, comprising the Super Learner family along with the boosting algorithms XGBoost, LightGBM, CatBoost, and AdaBoost, are developed to predict water saturation without relying on resistivity log data. This is important because conventional methods of water saturation prediction that rely on the resistivity log can become problematic in particular formations such as shale or tight carbonates. To do so, two datasets were constructed by collecting several types of well logs (Gamma, density, neutron, sonic, PEF, and without PEF) to evaluate the robustness and accuracy of the models by comparing the results with laboratory-measured data. Super Learner and XGBoost produced the most accurate output (R2: 0.999 and 0.993, respectively), while CatBoost and LightGBM ranked third and fourth, respectively, by a considerable margin. Ultimately, both XGBoost and Super Learner produced negligible errors, but the latter is considered the best of all.

A Two‐Stage Data‐Driven Spatiotemporal Analysis to Predict Failure Risk of Urban Sewer Systems Leveraging Machine Learning Algorithms

Risk Analysis; DOI 10.1111/risa.13742; 2021

Author(s): John E. Fontecha, Puneet Agarwal, María N. Torres, Sayanti Mukherjee, Jose L. Walteros, ...

Keyword(s): Machine Learning, Learning Algorithms, Spatiotemporal Analysis, Machine Learning Algorithms, Data Driven, Two Stage, Sewer Systems, Failure Risk

The Prediction Model of Medical Expenditure Appling Machine Learning Algorithm in CABG Patients

Healthcare; DOI 10.3390/healthcare9060710; 2021; Vol 9 (6); pp. 710

Author(s): Yen-Chun Huang, Shao-Jung Li, Mingchih Chen, Tian-Shyug Lee

Keyword(s): Machine Learning, Health Insurance, Prediction Model, Healthcare Management, Machine Learning Algorithms, Medical Expenditure, Research Database, CABG Surgery, Medical Expenses, CABG Patients

Most patients face expensive healthcare management after coronary artery bypass grafting (CABG) surgery, which places a substantial financial burden on the government. The National Health Insurance Research Database (NHIRD) is a complete database containing the medical information of over 99% of individuals in Taiwan. Our research used the latest data, selecting patients who underwent their first CABG surgery between January 2014 and December 2017 (n = 12,945), to identify which factors affect medical expenses and to build prediction models using different machine learning algorithms. Our results showed that the surgical expenditure (X4), the 1-year medical expenditure before the CABG operation (X14), and the number of hemodialysis (X15) were the key factors affecting the 1-year medical expenses of CABG patients after discharge. Furthermore, the XGBoost and SVR methods were the best-performing predictive models. Thus, our research suggests enhancing healthcare management for patients with kidney-related diseases to avoid costly complications. We provide helpful information for medical management, which may decrease health insurance burdens in the future.

Health Costs of Older Opioid Users with Pain and Comorbid Hypercholesterolemia or Hypertension in the United States

Diseases; DOI 10.3390/diseases9020041; 2021; Vol 9 (2); pp. 41

Author(s): David R. Axon, Srujitha Marupuru, Shannon Vaffis

Keyword(s): United States, Healthcare Expenditure, Medical Expenditure Panel Survey, Prescription Medication, The United States, Medical Expenditure, Future Research, Linear Regression Models, Healthcare Expenditures, Opioid Users

This retrospective cross-sectional database study used 2018 Medical Expenditure Panel Survey data to quantify and assess differences in healthcare expenditures between opioid users and non-users among a non-institutionalized sample of older (≥50 years) United States adults with pain in the past four weeks and a diagnosis of comorbid hypercholesterolemia (pain–hypercholesterolemia group) or hypertension (pain–hypertension group). Hierarchical multivariable linear regression models were constructed by using logarithmically transformed positive cost data and adjusting for relevant factors to assess cost differences between groups. Percent difference between opioid users and non-users was calculated by using semi-logarithmic equations. Healthcare costs included inpatient, outpatient, office-based, emergency room, prescription medication, other, and total costs. In adjusted analyses, compared to non-users, opioid users in the pain–hypercholesterolemia and pain–hypertension groups respectively had 66% and 60% greater inpatient expenditure, 46% and 55% greater outpatient expenditure, 67% and 72% greater office-based expenditure, 50% and 60% greater prescription medication expenditure, 24% and 22% greater other healthcare expenditure, and 85% and 93% greater total healthcare expenditure. In conclusion, adjusted total healthcare expenditures were 85–93% greater among opioid users versus non-users in older United States adults with pain and comorbid hypercholesterolemia or hypertension. Future research is needed to identify opioid use predictors among these populations and reduce expenditures.
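
The abstract does not state the semi-logarithmic equations it used. One common convention for a log-transformed outcome, assumed here rather than taken from the paper, converts the regression coefficient on a binary indicator (for example, opioid user versus non-user) into a percent difference of 100 × (exp(β) − 1). The function names below are hypothetical, and the sketch is only meant to show the arithmetic behind statements such as "85% greater total healthcare expenditure".

```python
import math


def percent_difference(beta):
    """Percent difference implied by coefficient beta in a log-outcome model.

    Assumes the usual semi-log interpretation: 100 * (exp(beta) - 1).
    """
    return 100.0 * (math.exp(beta) - 1.0)


def coefficient_for(percent):
    """Inverse mapping: the log-scale coefficient implying a given percent difference."""
    return math.log(1.0 + percent / 100.0)


# Coefficient that would correspond to an 85% greater expenditure
# under the assumed convention, and the round trip back to percent.
beta_total = coefficient_for(85.0)
print(beta_total, percent_difference(beta_total))
```

Under this convention an 85% difference corresponds to a coefficient of log(1.85) on the log-cost scale, which is why the coefficient itself should not be read directly as a percentage when effects are this large.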

Obesity-Related Healthcare Expenditures in the US 2002–2016

Current Developments in Nutrition; DOI 10.1093/cdn/nzaa063_100; 2020; Vol 4 (Supplement_2); pp. 1702-1702

Author(s): Hong Xue, Shuo-yu Lin, Xiaolu Cheng

Keyword(s): Prescription Drugs, Healthcare Expenditure, Medical Expenditure Panel Survey, Temporal Trends, Medical Expenditure, Office Visits, Healthcare Expenditures, The US, Obese Population, Per Capita

Objectives: This study examined the temporal trends of obesity-related healthcare expenditures in the US between 2002 and 2016, and assessed the disparities across age, gender, and race/ethnicity groups. Methods: Nationally representative data from the Medical Expenditure Panel Survey (MEPS) between 2002 and 2016 were used. About 290,000 adults were included in the analyses. A two-part regression model was used to estimate the expenditures attributable to obesity. Results: Between 2002 and 2016, obesity-related per capita healthcare expenditures increased from $4431 to $5638 in overweight and from $4898 to $5900 in obese populations (inflation-adjusted to 2016 USD). Our estimates suggested that obesity-related annual per capita healthcare expenditure across the lifespan (from 19 to 85 years old) for obese women could increase from $3356 to $13,630, significantly higher than for their male counterparts (from $2473 to $10,813, P = 0.001). From age 19 to 85, obesity-related healthcare expenditure could increase from $3188 to $13,178 in non-Hispanic whites, greater than in Hispanic (from $2210 to $9769, P < 0.001) and black ($2583 to $11,126, P = 0.02) populations. Office visits and prescription drugs contributed most to the growth of obesity-related healthcare costs between 2002 and 2016 in the obese population, accounting for 24% and 29% of total healthcare expenditure respectively in 2016, compared to 22% and 25% in 2002. Conclusions: Obesity-related healthcare expenditure increased in the US between 2002 and 2016, with evident disparities across gender and racial/ethnic subpopulations. Physician office visits and prescription drugs are the key contributing factors to the increase in the obese population. Funding Sources: N/A.

An Efficient Road Surveillance Approach to Detect, Recognize & Tracking Vehicles Using Deep Learning Methods

International Journal of Scientific Research in Computer Science Engineering and Information Technology; DOI 10.32628/cseit2174106; 2021; pp. 503-512

Author(s): Vinod Kumar Yadav, Dr. Pritaj Yadav, Dr. Shailja Sharma

Keyword(s): Computer Vision, Deep Learning, Vehicle Detection, Machine Learning Algorithms, Detection Methods, Motor Vehicles, Detection Accuracy, Two Stage, One Stage, Traffic Regulation

With the number of motor vehicles increasing day by day, traffic regulation faces many challenges in intelligent road surveillance and governance, making this an important research area in artificial intelligence and deep learning. Among various technologies, computer vision and machine learning algorithms are the most efficient, as a huge amount of road vehicle video and image data is available for study. In this paper, we propose an efficient computer vision-based approach to vehicle detection, recognition, and tracking. We merge one-stage (YOLOv4) and two-stage (R-FCN) detector methods to improve vehicle detection accuracy and speed. Two-stage object detection methods provide high localization and object recognition precision, while one-stage detectors achieve high inference and test speed. The Deep-SORT tracker is applied to the detected bounding boxes to estimate trajectories. We analyze the performance of the Mask RCNN benchmark, YOLOv3, and the proposed YOLOv4 + R-FCN on the UA-DETRAC dataset, using metrics such as mean Average Precision (mAP), precision, and recall.

A Novel Two-Stage Selection of Feature Subsets in Machine Learning

Engineering, Technology & Applied Science Research; DOI 10.48084/etasr.2735; 2019; Vol 9 (3); pp. 4169-4175

Author(s): R. F. Kamala, P. R. J. Thangaiah

Keyword(s): Machine Learning, Variable Selection, Selection Procedure, Classification Performance, Machine Learning Algorithms, Feature Subset Selection, Feature Subset, Two Stage, Fitness Evaluation, Selection Of

In feature subset selection, the variable selection procedure selects a subset of the most relevant features. Filter and wrapper methods are two categories of variable selection methods. Feature subset selection is similar to data pre-processing and is applied to reduce feature dimensions in very large datasets. In this paper, to deal with this kind of problem, feature subset selection based on the fitness evaluation of the classifier is introduced to ease the classification task and improve classification performance. To reduce the dimensions of the feature space, a novel two-stage selection of feature subsets (TSFS) method for selecting optimal features is developed, both theoretically and experimentally. The results of this method include improvements in performance measures such as efficiency, accuracy, and scalability of machine learning algorithms. The proposed method is compared with known relevant methods using benchmark databases. It performs better than the earlier hybrid feature selection methodologies discussed in related work in terms of classifier accuracy and error.
