Statistical Modeling for Predicting Correct Drug Dose in the Presence of Conflicting Dose Information Extracted from Electronic Health Records

2020 ◽  
Vol 4 (s1) ◽  
pp. 51-51
Author(s):  
Michael Lee Williams ◽  
Hannah L Weeks ◽  
Cole Beck ◽  
Elizabeth McNeer ◽  
Leena Choi

OBJECTIVES/GOALS: Diverse medication-based studies require longitudinal drug dose information. EHRs can provide such data, but multiple mentions of a drug in the same clinical note can yield conflicting doses. We aimed to develop statistical methods that address this challenge by predicting the valid dose when conflicting doses are extracted. METHODS/STUDY POPULATION: We extracted dose information for two test drugs, tacrolimus and lamotrigine, from Vanderbilt EHRs using a natural language processing system, medExtractR, which was developed by our team. A random forest classifier was used to estimate the probability of correctness for each extracted dose based on each subject's longitudinal dosing pattern and the surrounding EHR note context. Using this estimated probability of correctness and other features, such as a summary of each subject's dosing history, we developed several statistical models to predict the valid dose from the extracted doses. The supervised methods included a separate random forest regression model, a transition model, and a boosting model. We also considered unsupervised methods and developed a Bayesian hierarchical model. RESULTS/ANTICIPATED RESULTS: We compared model-predicted doses to physician-validated doses to evaluate model performance. A random forest regression model outperformed all other proposed models. As this model is supervised, its utility depends on the availability of validated dose data. Our preliminary results showed that the Bayesian hierarchical model can be a promising alternative, although it performed somewhat worse. The Bayesian hierarchical model would be especially useful when validated dose data are not available, as it was developed in an unsupervised modeling framework and hence does not require validated doses, which can be difficult and time-consuming to obtain.
We evaluated the feasibility of each method for automatic implementation in the drug dose extraction and processing system we have been developing. DISCUSSION/SIGNIFICANCE OF IMPACT: We will incorporate the developed methods into our complete medication extraction system, which will allow researchers to automatically prepare large longitudinal medication dose datasets. Availability of such data will enable diverse medication-based studies with drastically reduced barriers to data collection.
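The core idea of combining extracted doses with per-mention correctness probabilities can be sketched as follows. This is a minimal illustration, not the authors' actual models: it assumes each mention already carries a correctness probability from an upstream classifier (as in the abstract's random forest step) and resolves conflicts with a simple probability-weighted average.

```python
# Minimal sketch of resolving conflicting extracted doses. The weighting
# rule and example values are illustrative assumptions, not the paper's
# supervised or Bayesian models.

def resolve_dose(mentions):
    """mentions: list of (extracted_dose, correctness_probability).
    Returns a probability-weighted average as the predicted valid dose."""
    total_weight = sum(p for _, p in mentions)
    if total_weight == 0:
        raise ValueError("no usable dose mentions")
    return sum(dose * p for dose, p in mentions) / total_weight

# Example: three conflicting dose mentions from one clinical note.
mentions = [(2.0, 0.9), (4.0, 0.1), (2.0, 0.8)]
predicted = resolve_dose(mentions)  # close to 2.0, the dominant mention
```

A supervised model such as random forest regression would instead learn the mapping from these features to the validated dose, which is why it requires labeled data.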

Author(s):  
William Smith

A Bayesian hierarchical model that integrated information about state and observation processes was used to estimate the number of adult Delta Smelt entrained into the southern Sacramento−San Joaquin Delta during water export operations by the California State Water Project and the Central Valley Project. The model hierarchy accounted for dynamic processes of transport, survival, sampling efficiency, and observation. Water export, mark−recapture, and fish facility count data informed each process. Model diagnostics and simulation testing indicated a good fit of the model, and that parameters were jointly estimable in the Bayesian hierarchical model framework. The model was limited, however, by sparse data to estimate survival and State Water Project sampling efficiency. Total December to March entrainment of adult Delta Smelt ranged from an estimated 142,488 fish in 2000 to 53 fish in 2014, and the efficiency of louvers used to divert entrained fish to fish facilities appeared to decline at high and low primary intake channel velocities. Though applied to Delta Smelt, the hierarchical modeling framework was sufficiently flexible to estimate the entrainment of other pelagic species.
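The layered structure described above, an entrained population thinned successively by survival and louver sampling efficiency before being counted at a fish facility, can be illustrated with a simple moment-based expansion estimate. This is a hedged sketch of the thinning logic only, not the paper's full Bayesian hierarchical model; the parameter values are made up.

```python
# Illustrative expansion estimator: if E[count] = N * survival * efficiency,
# a point estimate of total entrainment N inverts the thinning. The full
# model in the abstract instead places priors on these processes and
# propagates uncertainty jointly.

def expanded_entrainment(facility_count, survival, louver_efficiency):
    """Invert the thinning chain to estimate total entrained fish."""
    detection = survival * louver_efficiency
    if detection <= 0:
        raise ValueError("detection probability must be positive")
    return facility_count / detection

# Example: 50 counted fish, 50% survival to the facility, and 50% louver
# efficiency imply roughly 200 entrained fish.
estimate = expanded_entrainment(50, 0.5, 0.5)  # → 200.0
```

The hierarchical formulation matters precisely because survival and efficiency are themselves uncertain and data-sparse, as the abstract notes; a plug-in expansion like this understates that uncertainty.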


2020 ◽  
Vol 16 (4) ◽  
pp. 271-289
Author(s):  
Nathan Sandholtz ◽  
Jacob Mortensen ◽  
Luke Bornn

Abstract Every shot in basketball has an opportunity cost; one player’s shot eliminates all potential opportunities from their teammates for that play. For this reason, player-shot efficiency should ultimately be considered relative to the lineup. This aspect of efficiency—the optimal way to allocate shots within a lineup—is the focus of our paper. Allocative efficiency should be considered in a spatial context since the distribution of shot attempts within a lineup is highly dependent on court location. We propose a new metric for spatial allocative efficiency by comparing a player’s field goal percentage (FG%) to their field goal attempt (FGA) rate in the context of both their four teammates on the court and the spatial distribution of their shots. Leveraging publicly available data provided by the National Basketball Association (NBA), we estimate player FG% at every location in the offensive half court using a Bayesian hierarchical model. Then, by ordering a lineup’s estimated FG%s and pairing these rankings with the lineup’s empirical FGA rate rankings, we detect areas where the lineup exhibits inefficient shot allocation. Lastly, we analyze the impact that sub-optimal shot allocation has on a team’s overall offensive potential, demonstrating that inefficient shot allocation correlates with reduced scoring.
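The rank-pairing step described above can be sketched directly: within one lineup and one court region, order players by estimated FG% and by FGA rate, then flag players whose attempt-rate rank exceeds their efficiency rank. The player labels and numbers below are made-up illustrations, not values from the paper's NBA data.

```python
# Sketch of detecting inefficient shot allocation in one court region.
# fg_pct would come from the paper's Bayesian hierarchical FG% estimates;
# fga_rate is the lineup's empirical share of attempts in that region.

def allocation_mismatches(fg_pct, fga_rate):
    """fg_pct, fga_rate: dicts mapping player -> value for one region.
    Returns players who shoot more often than their efficiency rank warrants."""
    by_eff = sorted(fg_pct, key=fg_pct.get, reverse=True)
    by_rate = sorted(fga_rate, key=fga_rate.get, reverse=True)
    eff_rank = {p: i for i, p in enumerate(by_eff)}
    rate_rank = {p: i for i, p in enumerate(by_rate)}
    return sorted(p for p in fg_pct if rate_rank[p] < eff_rank[p])

fg_pct = {"A": 0.45, "B": 0.40, "C": 0.35}    # estimated FG% in the region
fga_rate = {"A": 0.20, "B": 0.25, "C": 0.55}  # share of lineup attempts
mismatches = allocation_mismatches(fg_pct, fga_rate)  # → ["C"]
```

Here player C takes the most shots from the region despite being the least efficient option, the kind of spatial misallocation the proposed metric is designed to surface.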


2019 ◽  
Vol 15 (4) ◽  
pp. 313-325 ◽  
Author(s):  
Martin Ingram

Abstract A well-established assumption in tennis is that point outcomes on each player’s serve in a match are independent and identically distributed (iid). With this assumption, it is enough to specify the serve probabilities for both players to derive a wide variety of event distributions, such as the match winner and the number of sets and games. However, models using this assumption, which we will refer to as “point-based”, have typically performed worse than other models in the literature at predicting the match winner. This paper presents a point-based Bayesian hierarchical model for predicting the outcome of tennis matches. The model predicts the probability of winning a point on serve given surface, tournament and match date. Each player is given a serve and return skill which is assumed to follow a Gaussian random walk over time. In addition, each player’s skill varies by surface, and tournaments are given tournament-specific intercepts. When evaluated on the ATP’s 2014 season, the model outperforms other point-based models, predicting match outcomes with greater accuracy (68.8% vs. 66.3%) and lower log loss (0.592 vs. 0.641). The results are competitive with approaches modelling the match outcome directly, demonstrating the forecasting potential of the point-based modelling approach.
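A worked consequence of the iid assumption helps show why serve point probabilities suffice to derive event distributions: given a fixed probability p of winning a point on serve, the probability of holding a service game has the standard closed form (paths to winning at 40-0, 40-15, 40-30, plus the geometric deuce series). This is the textbook point-based identity, not the paper's full hierarchical model.

```python
# Probability of winning a service game when each point is won iid with
# probability p. Standard closed form under the iid assumption.

def hold_probability(p):
    q = 1.0 - p
    win_to_love = p**4
    win_to_15 = 4 * p**4 * q
    win_to_30 = 10 * p**4 * q**2
    # Reach deuce (3 points each), then win two points in a row before
    # the returner does: geometric series sums to p^2 / (1 - 2pq).
    deuce = 20 * p**3 * q**3 * (p**2 / (1 - 2 * p * q))
    return win_to_love + win_to_15 + win_to_30 + deuce

hold_probability(0.5)  # → 0.5 by symmetry
hold_probability(0.6)  # ≈ 0.736: small point edges compound at game level
```

Chaining such identities up through games, sets and matches is what lets point-based models like the one above output full match forecasts from per-point serve probabilities.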

