An evaluation of DistillerSR’s machine learning-based prioritization tool for title/abstract screening – impact on reviewer-relevant outcomes

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
C. Hamel ◽  
S. E. Kelly ◽  
K. Thavorn ◽  
D. B. Rice ◽  
G. A. Wells ◽  
...  

Abstract
Background: Systematic reviews often require substantial resources, partially due to the large number of records identified during searching. Although artificial intelligence may not be ready to fully replace human reviewers, it may accelerate screening and reduce the screening burden. Using DistillerSR (May 2020 release), we evaluated the performance of the prioritization simulation tool to determine the reduction in screening burden and time savings.
Methods: Using a true recall @ 95%, response sets from 10 completed systematic reviews were used to evaluate: (i) the reduction of screening burden; (ii) the accuracy of the prioritization algorithm; and (iii) the hours saved when a modified screening approach was implemented. To account for variation in the simulations, and to introduce randomness (through shuffling the references), 10 simulations were run for each review. Means, standard deviations, medians and interquartile ranges (IQR) are presented.
Results: Among the 10 systematic reviews, using true recall @ 95%, there was a median reduction in screening burden of 47.1% (IQR: 37.5 to 58.0%). A median of 41.2% (IQR: 33.4 to 46.9%) of the excluded records needed to be screened to achieve true recall @ 95%. The median title/abstract screening hours saved using a modified screening approach at a true recall @ 95% was 29.8 h (IQR: 28.1 to 74.7 h). This increased to a median of 36 h (IQR: 32.2 to 79.7 h) when considering the time saved by not retrieving and screening the full texts of the remaining 5% of records not yet identified as included at title/abstract. Among the 100 simulations (10 per review), none of these 5% of records was a final included study in the systematic review. Screening to a true recall @ 95% rather than @ 100% reduced the screening burden by a median of 40.6% (IQR: 38.3 to 54.2%).
Conclusions: The prioritization tool in DistillerSR can reduce screening burden. A modified or stop-screening approach once a true recall @ 95% is achieved appears to be a valid method for rapid reviews, and perhaps systematic reviews. This needs to be further evaluated in prospective reviews using the estimated recall.
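
The central quantity in this evaluation, the reduction in screening burden at a target recall, can be illustrated with a short sketch. This is not DistillerSR's internal algorithm; given a reference list already ranked by predicted relevance and the known inclusion decisions, it counts how many records must be screened from the top of the ranking to capture 95% of the true includes and reports the fraction of screening avoided. The function name and the example labels are illustrative.

```python
# Minimal sketch of "screening burden reduction at true recall @ 95%".
# Not DistillerSR's code; labels and numbers are made up for illustration.
import math

def burden_reduction_at_recall(ranked_labels, target_recall=0.95):
    """ranked_labels: 0/1 inclusion labels ordered by the tool's predicted
    relevance (most relevant first). Returns (records screened, fraction of
    title/abstract screening avoided)."""
    total_includes = sum(ranked_labels)
    needed = math.ceil(target_recall * total_includes)
    found = 0
    for i, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            return i, 1 - i / len(ranked_labels)
    return len(ranked_labels), 0.0

# Example: 20 includes among 200 records, with most includes ranked early.
labels = [1] * 18 + [0] * 150 + [1] * 2 + [0] * 30
n, reduction = burden_reduction_at_recall(labels)
print(f"Screened {n} of {len(labels)} records; burden reduced by {reduction:.1%}")
```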

2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Gerald Gartlehner ◽  
Gernot Wagner ◽  
Linda Lux ◽  
Lisa Affengruber ◽  
Andreea Dobrescu ◽  
...  

Abstract
Background: Web applications that employ natural language processing technologies to support systematic reviewers during abstract screening have become more common. The goal of our project was to conduct a case study exploring a screening approach that temporarily replaces a human screener with a semi-automated screening tool.
Methods: We evaluated the accuracy of the approach using DistillerAI as a semi-automated screening tool. A published comparative effectiveness review served as the reference standard. Five teams of professional systematic reviewers screened the same 2472 abstracts in parallel. Each team trained DistillerAI with 300 randomly selected abstracts that the team screened dually. For all remaining abstracts, DistillerAI replaced one human screener and provided predictions about the relevance of records. A single reviewer also screened all remaining abstracts. A second human screener resolved conflicts between the single reviewer and DistillerAI. We compared the decisions of the machine-assisted approach, single-reviewer screening, and screening with DistillerAI alone against the reference standard.
Results: The combined sensitivity of the machine-assisted screening approach across the five screening teams was 78% (95% confidence interval [CI], 66 to 90%), and the combined specificity was 95% (95% CI, 92 to 97%). By comparison, the sensitivity of single-reviewer screening was similar (78%; 95% CI, 66 to 89%); however, the sensitivity of DistillerAI alone was substantially worse (14%; 95% CI, 0 to 31%) than that of the machine-assisted screening approach. Specificities for single-reviewer screening and DistillerAI were 94% (95% CI, 91 to 97%) and 98% (95% CI, 97 to 100%), respectively. Machine-assisted screening and single-reviewer screening had similar areas under the curve (0.87 and 0.86, respectively); by contrast, the area under the curve for DistillerAI alone was only slightly better than chance (0.56). The interrater agreement between human screeners and DistillerAI, measured with a prevalence-adjusted kappa, was 0.85 (95% CI, 0.84 to 0.86).
Conclusions: The accuracy of DistillerAI is not yet adequate to temporarily replace a human screener during abstract screening for systematic reviews. Semi-automation tools may have greater utility for rapid reviews, which do not require detecting the totality of the relevant evidence, than for traditional systematic reviews.
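
A minimal sketch of the accuracy metrics reported here (sensitivity, specificity, and a prevalence-adjusted kappa) is shown below. One common form, the prevalence- and bias-adjusted kappa (PABAK), equals 2 × observed agreement − 1; the study may have used a different variant. The 2x2 counts are hypothetical values chosen only to land near the reported figures; they are not the study data.

```python
# Sketch of screening accuracy metrics from 2x2 counts.
# Counts are hypothetical, chosen to land near the reported values.

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

def prevalence_adjusted_kappa(agreements, total):
    """PABAK: 2 * observed proportion agreement - 1."""
    return 2 * (agreements / total) - 1

# Hypothetical decisions against the reference standard (2472 abstracts)
tp, fn, tn, fp = 39, 11, 2300, 122
print(f"sensitivity = {sensitivity(tp, fn):.2f}")    # ~0.78
print(f"specificity = {specificity(tn, fp):.2f}")    # ~0.95
# Hypothetical human-vs-DistillerAI agreement on 2290 of 2472 records
print(f"PABAK = {prevalence_adjusted_kappa(2290, 2472):.2f}")  # ~0.85
```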


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Candyce Hamel ◽  
Mona Hersi ◽  
Shannon E. Kelly ◽  
Andrea C. Tricco ◽  
Sharon Straus ◽  
...  

Abstract
Background: Systematic reviews are the cornerstone of evidence-based medicine. However, systematic reviews are time consuming, and there is growing demand to produce evidence more quickly while maintaining robust methods. In recent years, artificial intelligence and active machine learning (AML) have been implemented in several systematic review (SR) software applications. Because barriers to adopting new technologies include the challenges of set-up and of deciding how best to use them, we provide situations and considerations for knowledge synthesis teams to weigh when using artificial intelligence and AML for title and abstract screening.
Methods: We retrospectively evaluated the implementation and performance of AML across a set of ten historically completed systematic reviews. Based on these findings, and on the barriers we have encountered and navigated over the past 24 months while using these tools prospectively in our research, we developed a series of practical recommendations for research teams seeking to implement AML tools for citation screening in their workflow.
Results: We developed a seven-step framework and provide guidance for when and how to integrate artificial intelligence and AML into the title and abstract screening process:
(1) Consulting with the knowledge user/expert panel;
(2) Developing the search strategy;
(3) Preparing your review team;
(4) Preparing your database;
(5) Building the initial training set;
(6) Ongoing screening; and
(7) Truncating screening.
During Steps 6 and/or 7, you may also choose to optimize your team by shifting some members to other review stages (e.g., full-text screening, data extraction).
Conclusion: Artificial intelligence and, more specifically, AML are well-developed tools for title and abstract screening and can be integrated into the screening process in several ways. Regardless of the method chosen, transparent reporting of these methods is critical for future studies evaluating artificial intelligence and AML.
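
Step 7 of the framework (truncating screening) leaves the exact stopping rule to the review team. The sketch below illustrates one commonly discussed heuristic, stopping once a fixed run of consecutive records surfaced by the AML tool has been excluded; the heuristic, the function name, and the run length are illustrative assumptions, not a rule the authors prescribe.

```python
# One possible truncation heuristic (an assumption, not the authors' rule):
# stop screening after a fixed run of consecutive irrelevant records in the
# AML-prioritized order.

def truncation_point(decisions, run_length=200):
    """decisions: iterable of True/False screening decisions in the order the
    AML tool surfaces records (True = included). Returns the 1-based index at
    which screening would stop, or None if the threshold is never reached."""
    consecutive_excludes = 0
    for i, included in enumerate(decisions, start=1):
        consecutive_excludes = 0 if included else consecutive_excludes + 1
        if consecutive_excludes >= run_length:
            return i
    return None

# Example: includes cluster early, then a long tail of excludes.
decisions = [True] * 30 + [False] * 500
print(truncation_point(decisions, run_length=200))  # stops at record 230
```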


2020 ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract
Background: We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening, and explored whether Abstrackr’s predictions varied by review- or study-level characteristics.
Methods: For 16 reviews, we screened a 200-record training set in Abstrackr and downloaded the predicted relevance of the remaining records. We retrospectively simulated the liberal-accelerated screening approach: one reviewer screened the records predicted as relevant; a second reviewer screened those predicted as irrelevant and those excluded by the first reviewer. We estimated the time savings and the proportion of records missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review- and study-level characteristics using Fisher’s exact tests and unpaired t-tests.
Results: Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records but saved a median (IQR) 26 (33) hours of screening time. Removing missed studies from meta-analyses did not alter the reviews’ conclusions. Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P=0.37) or intervention type (simple or complex, P=0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P=0.01), or that included only trials (95%) vs. multiple designs (86%) (P=0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P=0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P=0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P=0.02), were more often correctly predicted as relevant.
Conclusion: Our screening approach saved time and may be suitable when a limited risk of missing relevant records is acceptable. ML-assisted screening may be most trustworthy for reviews that seek to include only trials. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited.
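
The ML-assisted liberal-accelerated workflow described in the Methods can be read as a simple routing rule. The sketch below expresses that rule under assumed interfaces (record ids, a set of ids the tool predicts as relevant, and two reviewer callables); it is an illustration, not the authors' code or Abstrackr's API.

```python
# Sketch of the ML-assisted liberal-accelerated routing logic.
# Interfaces (ids, prediction set, reviewer callables) are assumptions.

def ml_assisted_liberal_accelerated(records, predicted_relevant,
                                    reviewer1, reviewer2):
    """records: iterable of record ids; predicted_relevant: set of ids the ML
    tool flags as relevant; reviewer1/reviewer2: callables returning True to
    include. Returns the set of ids advanced to full-text screening."""
    advanced = set()
    needs_second_look = []
    for rec in records:
        if rec in predicted_relevant:
            if reviewer1(rec):
                advanced.add(rec)          # a single inclusion is enough
            else:
                needs_second_look.append(rec)
        else:
            needs_second_look.append(rec)  # predicted irrelevant by the tool
    for rec in needs_second_look:
        if reviewer2(rec):
            advanced.add(rec)              # second reviewer can rescue a record
    return advanced

# Toy example: records 1-6, tool flags {1, 2, 3}; reviewer 1 includes odd ids,
# reviewer 2 includes ids greater than 4.
print(sorted(ml_assisted_liberal_accelerated(
    range(1, 7), {1, 2, 3}, lambda r: r % 2 == 1, lambda r: r > 4)))  # [1, 3, 5, 6]
```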


2020 ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract
Background: We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening, and explored whether Abstrackr’s predictions varied by review- or study-level characteristics.
Methods: For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and the proportion of records missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review- and study-level characteristics.
Results: Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) hours of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) -1.53 (-2.92, -0.15) to -1.17 (-2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P=0.37) or intervention type (simple or complex, P=0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P=0.01), or that included only trials (95%) vs. multiple designs (86%) (P=0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P=0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P=0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P=0.02), were more often correctly predicted as relevant.
Conclusion: Our screening approach saved time and may be suitable when a limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited.
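
The Results report how a pooled mean difference shifted after removing a missed study. A minimal sketch of that kind of sensitivity check is below, assuming an inverse-variance fixed-effect model and made-up per-study values; the review itself may have used a different model, and these numbers are not the study data.

```python
# Sketch of re-pooling a mean difference after removing one study.
# Fixed-effect inverse-variance pooling is an assumption; values are made up.
import math

def pooled_md(effects, ses):
    """effects: per-study mean differences; ses: their standard errors.
    Returns (pooled MD, 95% CI lower, 95% CI upper)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

studies = {"A": (-2.1, 1.0), "B": (-0.8, 0.9), "C": (-1.9, 1.1)}  # hypothetical
all_in = pooled_md(*zip(*studies.values()))
without_c = pooled_md(*zip(*(v for k, v in studies.items() if k != "C")))
print("all studies:     MD %.2f (%.2f, %.2f)" % all_in)
print("study C removed: MD %.2f (%.2f, %.2f)" % without_c)
```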


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract
Background: We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening and explored whether Abstrackr’s predictions varied by review or study-level characteristics.
Methods: For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review and study-level characteristics.
Results: Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) h of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) −1.53 (−2.92, −0.15) to −1.17 (−2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple designs (86%) (P = 0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P = 0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P = 0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P = 0.02) were more often correctly predicted as relevant.
Conclusion: Our screening approach saved time and may be suitable in conditions where the limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder, but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
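
The subgroup comparisons above (for example, prediction correctness in reviews with multiple vs. single research questions) are proportions contrasted across two groups, the kind of comparison a Fisher's exact test handles. The sketch below uses hypothetical 2x2 counts chosen only to sit near the reported percentages; it is not the study data.

```python
# Sketch of a Fisher's exact test on prediction correctness by subgroup.
# The 2x2 counts are hypothetical, not the study data.
from scipy.stats import fisher_exact

#                      correctly predicted, incorrectly predicted
multiple_questions = [534, 66]   # ~89% correct
single_question    = [168, 34]   # ~83% correct

odds_ratio, p_value = fisher_exact([multiple_questions, single_question])
print(f"odds ratio = {odds_ratio:.2f}, P = {p_value:.3f}")
```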


2020 ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract
Background: We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening, and explored whether Abstrackr’s predictions varied by review- or study-level characteristics.
Methods: For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and the proportion of records missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review- and study-level characteristics.
Results: Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) hours of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) -1.53 (-2.92, -0.15) to -1.17 (-2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P=0.37) or intervention type (simple or complex, P=0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P=0.01), or that included only trials (95%) vs. multiple designs (86%) (P=0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P=0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P=0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P=0.02), were more often correctly predicted as relevant.
Conclusion: Our screening approach saved time and may be suitable when a limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder, but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
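
As a rough illustration of where the reported screening-time savings come from, the sketch below converts avoided screening decisions into hours. The per-record screening time and the record counts are assumed values for illustration; they are not the figures used by the authors.

```python
# Sketch of estimating screening hours saved versus dual independent screening.
# The per-record time (30 seconds) and record counts are assumptions.

def hours_saved(total_records, decisions_under_ml_approach, seconds_per_record=30):
    """Dual independent screening requires 2 * total_records decisions; the
    ML-assisted approach requires fewer. Converts the difference to hours."""
    decisions_avoided = 2 * total_records - decisions_under_ml_approach
    return decisions_avoided * seconds_per_record / 3600

# Hypothetical review: 5000 records, 6800 total decisions under the ML approach.
print(f"{hours_saved(5000, 6800):.1f} hours saved")
```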


BMJ Open ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. e043665
Author(s):  
Srinivasa Rao Kundeti ◽  
Manikanda Krishnan Vaidyanathan ◽  
Bharath Shivashankar ◽  
Sankar Prasad Gorthi

Introduction: The use of artificial intelligence (AI) to support the diagnosis of acute ischaemic stroke (AIS) could improve patient outcomes and facilitate accurate tissue and vessel assessment. However, the evidence in published AI studies is inadequate and difficult to interpret, which reduces the accountability of the diagnostic results in clinical settings. This study protocol describes a rigorous systematic review of the accuracy of AI in the diagnosis of AIS and the detection of large-vessel occlusions (LVOs).
Methods and analysis: We will perform a systematic review and meta-analysis of the performance of AI models for diagnosing AIS and detecting LVOs. We will adhere to the Preferred Reporting Items for Systematic Reviews and Meta-analyses Protocols guidelines. Literature searches will be conducted in eight databases. For data screening and extraction, two reviewers will use a modified Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies checklist. We will assess the included studies using the Quality Assessment of Diagnostic Accuracy Studies guidelines. We will conduct a meta-analysis if sufficient data are available. We will use hierarchical summary receiver operating characteristic curves to estimate the summary operating points, including the pooled sensitivity and specificity with 95% CIs, if pooling is appropriate. Furthermore, if sufficient data are available, we will use Grading of Recommendations, Assessment, Development and Evaluations profiler software to summarise the main findings of the systematic review as a summary of results.
Ethics and dissemination: There are no ethical considerations associated with this study protocol, as the systematic review focuses on the examination of secondary data. The systematic review results will be used to report on the accuracy, completeness and standard procedures of the included studies. We will disseminate our findings by publishing our analysis in a peer-reviewed journal and, if required, we will communicate with the stakeholders of the studies and bibliographic databases.
PROSPERO registration number: CRD42020179652.
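
The protocol plans hierarchical summary ROC models, which are normally fitted with dedicated meta-analysis software. As a deliberately simplified stand-in that only shows the shape of the calculation, the sketch below pools per-study sensitivities on the logit scale with inverse-variance weights; this is not the planned hierarchical analysis, and the counts are hypothetical.

```python
# Simplified sketch of pooling per-study sensitivities on the logit scale.
# This is a stand-in for illustration, not the hierarchical SROC model the
# protocol specifies. Counts are hypothetical.
import math

def pooled_logit(successes, totals):
    """Pool per-study proportions (e.g., sensitivities) with inverse-variance
    weights on the logit scale. Returns (pooled estimate, 95% CI bounds)."""
    logits, weights = [], []
    for s, n in zip(successes, totals):
        p = s / n
        logits.append(math.log(p / (1 - p)))
        weights.append(n * p * (1 - p))       # inverse variance of a logit
    pooled = sum(w * l for w, l in zip(weights, logits)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    to_p = lambda x: 1 / (1 + math.exp(-x))
    return to_p(pooled), to_p(pooled - 1.96 * se), to_p(pooled + 1.96 * se)

# Hypothetical per-study true positives and numbers of AIS cases
tp = [45, 88, 30]
cases = [50, 100, 40]
print("pooled sensitivity %.2f (95%% CI %.2f to %.2f)" % pooled_logit(tp, cases))
```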


2021 ◽  
Author(s):  
Alaa Abd-Alrazaq ◽  
Jens Schneider ◽  
Dari Alhuwail ◽  
Carla T Toro ◽  
Arfan Ahmed ◽  
...  

BACKGROUND: Diagnosing mental disorders is usually not an easy task and requires substantial time and effort given the complex nature of these conditions. Artificial intelligence (AI) has been exploited successfully in diagnosing many mental disorders, and numerous systematic reviews summarize the evidence on the accuracy of AI models in diagnosing different mental disorders.
OBJECTIVE: This umbrella review aims to synthesize the results of previous systematic reviews on the performance of AI models in diagnosing mental disorders.
METHODS: To identify relevant systematic reviews, we searched 11 electronic databases, checked the reference lists of the included reviews, and checked the reviews that cited the included reviews. Two reviewers independently selected the relevant reviews, extracted their data, and appraised their quality. We synthesized the extracted data using a narrative approach; specifically, results of the included reviews were grouped according to the target mental disorders that the AI classifiers distinguish.
RESULTS: From the 852 citations identified by searching all databases, we included 15 systematic reviews. The included reviews assessed the performance of AI models in diagnosing Alzheimer’s disease (n=7), mild cognitive impairment (n=6), schizophrenia (n=3), bipolar disease (n=2), autism spectrum disorder (n=1), obsessive-compulsive disorder (n=1), post-traumatic stress disorder (n=1), and psychotic disorders (n=1). The reported performance of the AI models in diagnosing these mental disorders ranged between 21% and 100%.
CONCLUSIONS: AI technologies offer great promise in diagnosing mental health disorders, and the reported performance metrics suggest a promising future for AI in this field. To expedite progress towards incorporating these technologies into routine practice, we recommend that healthcare professionals in the field cautiously and consciously begin to explore the opportunities of AI-based tools in their daily routine. It would also be encouraging to see more meta-analyses and further systematic reviews on the performance of AI models in diagnosing other common mental disorders such as depression and anxiety.
CLINICALTRIAL: CRD42021231558


2019 ◽  
Vol 116 ◽  
pp. 98-105 ◽  
Author(s):  
Ingrid Arevalo-Rodriguez ◽  
Paloma Moreno-Nunez ◽  
Barbara Nussbaumer-Streit ◽  
Karen R. Steingart ◽  
Laura del Mar González Peña ◽  
...  
