Imputation of Contiguous Gaps and Extremes of Subhourly Groundwater Time Series Using Random Forests

Author(s):  
Dipankar Dwivedi ◽  
Utkarsh Mital ◽  
Boris Faybishenko ◽  
Baptiste Dafflon ◽  
Charuleka Varadharajan ◽  
...  
10.2196/14609 ◽  
2019 ◽  
Vol 21 (10) ◽  
pp. e14609 ◽  
Author(s):  
Lawrence Fulton ◽  
Clemens Scott Kruse

Background: Hospital-based back surgery in the United States increased by 60% from January 2012 to December 2017, yet the supply of neurosurgeons remained relatively constant. During this time, adult obesity grew by 5%. Objective: This study aimed to evaluate the demand and associated costs for hospital-based back surgery by geolocation over time in order to assess provider practice variation. The study then leveraged hierarchical time series to generate tight demand forecasts on an unobserved test set. Finally, explanatory financial, technical, workload, geographical, and temporal factors, as well as state-level obesity rates, were investigated as predictors of the demand for hospital-based back surgery. Methods: Hospital data from January 2012 to December 2017 were used to generate geospatial-temporal maps and a video of claims with Current Procedural Terminology codes beginning with 63. Hierarchical time series modeling provided forecasts for each state, the census regions, and the nation for an unobserved test set and then again for the out-years of 2018 and 2019. Stepwise regression, lasso regression, ridge regression, elastic net, and gradient-boosted random forests were built on a training set and evaluated on a test set to identify the variables important in explaining the demand for hospital-based back surgery. Results: Widespread, unexplained practice variation over time was seen using geographical information systems (GIS) multimedia mapping. Hierarchical time series provided accurate forecasts on a blind dataset and suggested 6.52% growth of hospital-based back surgery in 2018 (from 497,325 procedures in 2017 to 529,777 in 2018) and up to 13.00% growth by 2019 (from 497,325 procedures in 2017 to 563,023 procedures in 2019). The increase in payments by 2019 is estimated to be US $323.9 million. Extreme gradient-boosted random forests beat constrained and unconstrained regression models on a 20% unobserved test set and suggested that obesity is one of the most important factors in explaining the increase in demand for hospital-based back surgery. Conclusions: Practice variation and obesity are factors to consider when estimating demand for hospital-based back surgery. Federal, state, and local planners should evaluate demand-side and supply-side interventions for this emerging problem.
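The state, census-region, and national forecast levels described in the abstract above form a hierarchy. The abstract does not say how the levels were reconciled, so the following is only a minimal sketch of one common option, bottom-up aggregation, in which state forecasts are summed to regions and then to the nation. The function name `bottom_up`, the state-to-region mapping, and all numbers are hypothetical toy values, not data from the study.

```python
from collections import defaultdict

# Hypothetical toy forecasts (procedures per year) for four states; the
# values and the state-to-region mapping are illustrative assumptions only.
state_forecasts = {"TX": 120.0, "OK": 30.0, "NY": 90.0, "NJ": 60.0}
state_to_region = {"TX": "South", "OK": "South", "NY": "Northeast", "NJ": "Northeast"}


def bottom_up(state_forecasts, state_to_region):
    """Aggregate state-level forecasts to region totals and a national total."""
    regions = defaultdict(float)
    for state, value in state_forecasts.items():
        regions[state_to_region[state]] += value
    nation = sum(regions.values())
    return dict(regions), nation


regions, nation = bottom_up(state_forecasts, state_to_region)
```

By construction the region totals add up to the national total, which is the coherence property a hierarchical forecast is meant to guarantee.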


2020 ◽  
Vol 77 (4) ◽  
pp. 1379-1390 ◽  
Author(s):  
Roland Proud ◽  
Richard Mangeni-Sande ◽  
Robert J Kayanda ◽  
Martin J Cox ◽  
Chrisphine Nyamweya ◽  
...  

Abstract Biomass of the schooling fish Rastrineobola argentea (dagaa) is presently estimated in Lake Victoria by acoustic survey following the simple “rule” that dagaa is the source of most echo energy returned from the top third of the water column. Dagaa have, however, been caught in the bottom two-thirds, and other species occur towards the surface: a more robust discrimination technique is required. We explored the utility of a school-based random forest (RF) classifier applied to 120 kHz data from a lake-wide survey. Dagaa schools were first identified manually using expert opinion informed by fishing. These schools contained a lake-wide biomass of 0.68 million tonnes (MT). Only 43.4% of identified dagaa schools occurred in the top third of the water column, and 37.3% of all schools in the bottom two-thirds were classified as dagaa. School metrics (e.g. length, echo energy) for 49 081 manually classified dagaa and non-dagaa schools were used to build an RF school classifier. The best RF model had a classification test accuracy of 85.4%, driven largely by school length, and yielded a biomass of 0.71 MT, only c. 4% different from the manual estimate. The RF classifier offers an efficient method to generate a consistent dagaa biomass time series.
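The depth "rule" described in the abstract above can be written down directly, which makes it easy to see how it might disagree with manual labels. This is a minimal sketch under stated assumptions: the function name `top_third_rule` and the four toy schools (school depth, water depth, manual label) are hypothetical and are not survey data.

```python
def top_third_rule(school_depth_m, water_depth_m):
    """Label a school as dagaa if it lies in the top third of the water column."""
    return school_depth_m <= water_depth_m / 3.0


# Hypothetical toy schools: (school depth in m, water depth in m, manual label),
# where the label True means the school was manually identified as dagaa.
schools = [
    (5.0, 60.0, True),
    (30.0, 60.0, True),
    (50.0, 60.0, False),
    (10.0, 45.0, False),
]

# Fraction of schools on which the depth rule agrees with the manual label.
agree = sum(top_third_rule(d, w) == label for d, w, label in schools)
agreement = agree / len(schools)
```

In this toy set the rule misses a deep dagaa school and mislabels a shallow non-dagaa school, the two failure modes the abstract describes.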


2020 ◽  
Vol 24 ◽  
pp. 801-826
Author(s):  
Benjamin Goehry

Random forests were introduced by Breiman in 2001. We study theoretical aspects of both Breiman’s original random forests and a simplified version, the centred random forests. Under the independent and identically distributed (i.i.d.) hypothesis, Scornet, Biau and Vert proved the consistency of Breiman’s random forest, while Biau studied the simplified version and obtained a rate of convergence in the sparse case. However, the i.i.d. hypothesis is generally not satisfied, for example when dealing with time series. We extend the previous results to the case where observations are weakly dependent, more precisely when the sequences are stationary β-mixing.
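To make the simplified model concrete, here is a minimal sketch of a centred random forest as it is commonly defined: each cell of [0, 1]^d is split at its centre along a coordinate drawn uniformly at random, independently of the data, and a prediction averages the responses in the query's leaf cell. The function names, the toy regression target, and the choice to let empty cells contribute 0 are assumptions made for illustration; the sketch does not cover the sparse-case coordinate weighting or the β-mixing setting studied in the paper.

```python
import random


def build_centred_tree(dim, depth, rng):
    """Draw a split coordinate uniformly at random for every internal node.

    Nodes are keyed by the path of left/right moves (0/1) from the root.
    """
    tree = {}
    paths = [()]
    for _ in range(depth):
        next_paths = []
        for path in paths:
            tree[path] = rng.randrange(dim)
            next_paths += [path + (0,), path + (1,)]
        paths = next_paths
    return tree


def leaf_of(tree, depth, x):
    """Return the leaf path of a point x in [0, 1]^dim, splitting cells at their centre."""
    lo = [0.0] * len(x)
    hi = [1.0] * len(x)
    path = ()
    for _ in range(depth):
        j = tree[path]
        mid = (lo[j] + hi[j]) / 2.0
        if x[j] < mid:
            hi[j] = mid
            path += (0,)
        else:
            lo[j] = mid
            path += (1,)
    return path


def forest_predict(trees, depth, X, y, x):
    """Average, over trees, the mean response of training points sharing x's leaf."""
    estimates = []
    for tree in trees:
        target = leaf_of(tree, depth, x)
        in_leaf = [yi for xi, yi in zip(X, y) if leaf_of(tree, depth, xi) == target]
        # Assumption: an empty cell contributes 0 to the average.
        estimates.append(sum(in_leaf) / len(in_leaf) if in_leaf else 0.0)
    return sum(estimates) / len(estimates)


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [xi[0] + xi[1] for xi in X]  # toy noiseless target, true value 1.0 at the query
trees = [build_centred_tree(dim=2, depth=4, rng=rng) for _ in range(50)]
prediction = forest_predict(trees, 4, X, y, [0.3, 0.7])
```

Because the partition never looks at the responses, the estimator is far easier to analyse than Breiman's original algorithm, which is why it serves as the simplified version in theoretical work.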


2020 ◽  
Vol 14 (2) ◽  
pp. 3644-3671 ◽  
Author(s):  
Richard A. Davis ◽  
Mikkel S. Nielsen

2020 ◽  
Vol 30 (16) ◽  
pp. 2050250
Author(s):  
Angeliki Papana ◽  
Ariadni Papana-Dagiasis ◽  
Elsa Siggiridou

Transfer entropy (TE) captures the directed relationships between two variables. Partial transfer entropy (PTE) accounts for the presence of all confounding variables of a multivariate system and infers only direct causality. However, the computation of partial transfer entropy involves high-dimensional distributions and thus may not be robust when there are many variables. In this work, different variants of the partial transfer entropy are introduced by building a reduced set of confounding variables based on different scenarios in terms of their interrelationships with the driving or response variable. Connectivity-based PTE variants utilizing the random forests (RF) methodology are evaluated on synthetic time series. The empirical findings indicate the superiority of the suggested variants over transfer entropy and partial transfer entropy, especially in the case of high-dimensional systems. These findings are further highlighted when applying the causality measures to financial time series.
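For orientation, bivariate transfer entropy can be estimated on discrete series by simple frequency counts. This is a minimal sketch, assuming a history length of 1 and a plug-in (count-based) estimator; the function name `transfer_entropy` and the toy series are hypothetical. It covers plain TE only, not the partial transfer entropy or the RF-based variants proposed in the paper.

```python
import math
import random
from collections import Counter


def transfer_entropy(x, y):
    """Plug-in estimate (in bits) of transfer entropy from x to y, history length 1."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_now, x_now)
    pairs_yx = Counter(zip(y[:-1], x[:-1]))        # (y_now, x_now)
    pairs_yy = Counter(zip(y[1:], y[:-1]))         # (y_next, y_now)
    singles = Counter(y[:-1])                      # y_now
    n = len(y) - 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        # p(y_next | y_now, x_now) versus p(y_next | y_now)
        p_cond_full = count / pairs_yx[(y_now, x_now)]
        p_cond_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += (count / n) * math.log2(p_cond_full / p_cond_self)
    return te


rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(5000)]
y = [0] + x[:-1]  # toy coupling: y copies x with a one-step delay, so x drives y
te_xy = transfer_entropy(x, y)
te_yx = transfer_entropy(y, x)
```

On this toy pair the estimate from x to y should be close to 1 bit and the reverse direction close to 0, reflecting the asymmetry that makes TE a directed measure.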

