Evaluating prediction of continuous clinical values: a glucose case study (Preprint)
Background: It would be useful to assess the utility of predictive models of continuous values before clinical trials are carried out.
Objective: To compare metrics for assessing the potential clinical utility of models that produce continuous value forecasts.
Methods: We ran a set of data assimilation forecast algorithms on time series of glucose measurements from intensive care unit patients. We evaluated the forecasts using four sets of metrics: glucose root mean square (RMS) error, a set of metrics on a transformed glucose value, the estimated effect on clinical care based on an insulin guideline, and a glucose measurement error grid (Parkes grid). We assessed correlation among the metrics and created a set of factor models.
Results: The metrics generally correlated with each other, but those that estimated the effect on clinical care correlated least with the others and were generally associated with their own independent factors. The other metrics appeared to separate into those that emphasized errors in low glucose and those that emphasized errors in high glucose. The Parkes grid correlated well with the transformed glucose metrics but not with the estimated effect on clinical care.
Conclusions: Our results indicate that we should be careful before assuming that commonly used metrics, such as RMS error in raw glucose, or even metrics such as the Parkes grid that are designed to measure the importance of differences, will correlate well with the actual effect on clinical care processes. A combination of metrics appeared to explain the most variance between cases. As prediction algorithms move into practice, it will be important to measure their actual effects.
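The comparison described in the Methods (score each forecast case with several metrics, then correlate the metrics across cases) can be sketched roughly as follows. This is a minimal illustration, not the study's implementation. The function names, the use of a log transform as a stand-in for the transformed glucose value, the choice of Pearson correlation, and the synthetic data are all assumptions made for the example. The insulin-guideline and Parkes grid metrics are not reproduced here.

```python
import numpy as np


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def log_rmse(predicted, observed):
    """RMSE on log-transformed values.

    Assumption: the log is only a hypothetical stand-in for the
    transformed glucose value mentioned in the abstract.
    """
    return rmse(np.log(predicted), np.log(observed))


def metric_correlations(cases, metrics):
    """Score every case with every metric, then correlate metrics across cases.

    cases:   list of (predicted, observed) pairs, one per forecast case.
    metrics: dict mapping a metric name to a function(predicted, observed).
    Returns (names, scores, corr): scores has shape (n_cases, n_metrics)
    and corr is the Pearson correlation matrix between metric columns.
    """
    names = list(metrics)
    scores = np.array([[metrics[n](p, o) for n in names] for p, o in cases])
    corr = np.corrcoef(scores, rowvar=False)
    return names, scores, corr


# Synthetic data for illustration only; these are not patient measurements.
rng = np.random.default_rng(0)
cases = []
for _ in range(20):
    observed = rng.uniform(70.0, 250.0, size=30)   # glucose in mg/dL
    noise_sd = rng.uniform(5.0, 40.0)              # forecast quality varies by case
    predicted = observed + rng.normal(0.0, noise_sd, size=30)
    predicted = np.clip(predicted, 20.0, None)     # keep values positive for the log
    cases.append((predicted, observed))

names, scores, corr = metric_correlations(
    cases, {"rmse": rmse, "log_rmse": log_rmse}
)
print(names)
print(np.round(corr, 2))
```

Further metrics, such as a guideline-based dose difference or an error grid zone score, could be added as extra entries in the `metrics` dictionary, and the resulting `scores` matrix could then be passed to a factor analysis routine.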