Abstract
Background
The Observatory Evidence Service (OES) at Public Health Wales supports evidence-informed decision making by conducting evidence reviews, which follow systematic review methodology, on complex public health topics. Machine-learning technologies have the potential to aid in screening studies for inclusion in reviews, and the OES has tested one such system, RobotAnalyst, to assess its accuracy and to determine whether it would increase the efficiency of the review process.
Methods
Retrospective testing was undertaken using three previously completed evidence reviews. For each test, references were uploaded into RobotAnalyst and the decisions made by the original reviewers were entered in blocks of 25 to form a training set. At each test point, the “update predictions” function generated a predicted inclusion decision for the remaining references, and these predictions were compared to the original review decisions.
We calculated RobotAnalyst's sensitivity, specificity, positive predictive value, false include and false exclude rates, and the proportion of missed references.
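The comparison described above can be illustrated with a minimal sketch that treats the original reviewers' decisions as the reference standard. This is not the code used in the study. The function name `screening_metrics` is hypothetical, and because the abstract does not define the false include rate, the false exclude rate, or the proportion of missed references, the definitions below are assumptions chosen for illustration only.

```python
def screening_metrics(human, predicted):
    """Compare predicted screening decisions with the human decisions.

    human, predicted: equal-length sequences of booleans (True = include).
    The human decisions are treated as the reference standard.
    """
    if len(human) != len(predicted):
        raise ValueError("decision lists must have the same length")

    pairs = list(zip(human, predicted))
    tp = sum(1 for h, p in pairs if h and p)          # correctly included
    fp = sum(1 for h, p in pairs if not h and p)      # wrongly included
    fn = sum(1 for h, p in pairs if h and not p)      # wrongly excluded
    tn = sum(1 for h, p in pairs if not h and not p)  # correctly excluded

    def ratio(numerator, denominator):
        # Return None rather than dividing by zero when a class is empty.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "positive_predictive_value": ratio(tp, tp + fp),
        # ASSUMPTION: false includes as a share of human-excluded references.
        "false_include_rate": ratio(fp, fp + tn),
        # ASSUMPTION: false excludes as a share of human-included references.
        "false_exclude_rate": ratio(fn, fn + tp),
        # ASSUMPTION: missed includes as a share of all references compared.
        "proportion_missed": ratio(fn, len(pairs)),
    }


# Invented decisions for eight references, for illustration only.
human_decisions = [True, True, True, False, False, False, False, False]
predicted_decisions = [True, True, False, True, False, False, False, False]
metrics = screening_metrics(human_decisions, predicted_decisions)
```

In a retrospective test of this kind, the function would be applied at each test point to the references not yet in the training set.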
Results
Mixed levels of performance were observed. An overall increase in sensitivity as more studies were added to the training set was detected for two of the three reference sets when screened at title stage, but only in one case did RobotAnalyst produce relatively high levels of sensitivity (over 90%). This was observed in reference test set one (n = 500 references), where sensitivity increased from 51% at the start of testing to 91% after 250 references had been manually marked on the system. Although performance tended to be higher as more studies were added to the training sets, the increases were not always linear.
Conclusions
There may be some promise in using RobotAnalyst as a second screener, especially for larger reference sets, where the human resource demands of duplicate screening are considerable. We are continuing to test RobotAnalyst both retrospectively and prospectively.
Key messages
Retrospective testing of RobotAnalyst showed mixed levels of performance. RobotAnalyst could potentially be used as a second screener for evidence reviews.