Decoding semi-automated title-abstract screening: findings from a convenience sample of reviews


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract Background We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening, and explored whether Abstrackr’s predictions varied by review- or study-level characteristics. Methods For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach, in which one reviewer screens the records predicted as relevant and a second reviewer screens those predicted as irrelevant and those excluded by the first reviewer. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review- and study-level characteristics (Fisher’s exact and unpaired t-tests). Results Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) of 26 (9, 42) hours of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) −1.53 (−2.92, −0.15) to −1.17 (−2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple study designs (86%) (P = 0.003). At the study level, trials (91%), mixed-methods studies (100%), and qualitative studies (93%) were more often correctly predicted as relevant than observational studies (79%) or reviews (83%) (P = 0.0006). Studies at high or unclear (88%) vs. low (80%) risk of bias (P = 0.039), and those published more recently (mean (SD) year 2008 (7) vs. 2006 (10), P = 0.02), were more often correctly predicted as relevant. Conclusion Our screening approach saved time and may be suitable where the limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses precluded strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
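
The ML-assisted liberal-accelerated workflow described above reduces to a small decision rule applied to each record. The following is a minimal Python sketch, not the authors' analysis code, of how the simulation and the two headline metrics (time saved and proportion of final-report records missed) could be computed; the Record fields, helper names, and the per-abstract screening time are illustrative assumptions rather than values from the study.

```python
# A minimal sketch (not the authors' code) of the ML-assisted liberal-accelerated
# screening simulation. Record fields, helper names, and SECONDS_PER_ABSTRACT are
# illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List

SECONDS_PER_ABSTRACT = 60  # assumed average time for one reviewer to screen one record


@dataclass
class Record:
    predicted_relevant: bool  # Abstrackr's prediction after the 200-record training set
    reviewer1_includes: bool  # decision of the reviewer assigned to predicted-relevant records
    reviewer2_includes: bool  # decision of the second reviewer, when consulted
    in_final_report: bool     # whether the record was included in the published review


def liberal_accelerated_include(r: Record) -> bool:
    """One reviewer screens records predicted relevant; a second reviewer screens
    records predicted irrelevant and those excluded by the first reviewer."""
    if r.predicted_relevant:
        return r.reviewer1_includes or r.reviewer2_includes
    return r.reviewer2_includes


def evaluate(records: List[Record]) -> Dict[str, float]:
    # Screens performed under the simulated workflow vs. dual independent screening.
    screens_used = sum(1 + (r.predicted_relevant and not r.reviewer1_includes) for r in records)
    screens_dual = 2 * len(records)
    hours_saved = (screens_dual - screens_used) * SECONDS_PER_ABSTRACT / 3600

    final_report = [r for r in records if r.in_final_report]
    missed = [r for r in final_report if not liberal_accelerated_include(r)]
    return {
        "hours_saved": round(hours_saved, 1),
        "records_missed": len(missed),
        "proportion_missed": len(missed) / len(final_report) if final_report else 0.0,
    }
```

Applied to one review's screened records, evaluate() yields analogues of the hours saved and proportion missed reported above, under the stated timing assumption.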



2019 ◽  
Vol 10 (3) ◽  
pp. 330-342 ◽  
Author(s):  
Joshua R. Polanin ◽  
Terri D. Pigott ◽  
Dorothy L. Espelage ◽  
Jennifer K. Grotpeter

2019 ◽  
Vol 20 (2) ◽  
pp. 305-318
Author(s):  
Rachael Vriezen ◽  
Jan M. Sargeant ◽  
Ellen Vriezen ◽  
Charlotte B. Winder ◽  
Annette M. O'Connor

Abstract To implement effective stewardship in food animal production, it is essential that producers and veterinarians are aware of preventive interventions to reduce illness in livestock. Systematic reviews and meta-analyses (SR/MA) provide transparent, replicable, and quality-assessed overviews. At present, it is unknown how many SR/MA evaluate preventive antibiotic use or management practices aimed at reducing disease risk in animal agriculture. Further, the quality of existing reviews is unknown. Our aim was to identify reviews investigating these topics and to provide an assessment of their quality. Thirty-eight relevant reviews were identified. Quality assessment was based on the AMSTAR 2 framework for the critical appraisal of systematic reviews. The quality of most of the captured reviews was classified as critically low (84.2%, n = 32/38), and only a small percentage of the evaluated reviews contained no critical weaknesses (7.9%, n = 3/38). In particular, few reviews reported the development of an a priori protocol (15.8%, n = 6/38) or stated that key review steps were conducted in duplicate (study selection/screening: 26.3%, n = 10/38; data extraction: 15.8%, n = 6/38). The development of high-quality reviews summarizing evidence on approaches to antibiotic reduction is essential, and greater adherence to quality conduct guidelines for synthesis research is therefore crucial.
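
The appraisal above maps per-domain AMSTAR 2 judgements to an overall confidence rating. Below is a minimal Python sketch of the commonly used AMSTAR 2 rating rule (Shea et al. 2017); the domain identifiers are placeholders, and the set of critical domains is the usual default rather than necessarily the exact scheme applied in this review.

```python
# Illustrative sketch of the usual AMSTAR 2 overall-confidence rule; domain names are
# placeholders and the critical set is the commonly used default, which review teams
# may adapt, so this is not necessarily the exact scheme used in the study above.
CRITICAL_DOMAINS = frozenset({
    "protocol_registered", "adequate_search", "exclusions_justified",
    "risk_of_bias_assessed", "appropriate_meta_analysis",
    "rob_considered_in_interpretation", "publication_bias_investigated",
})


def overall_confidence(unsatisfied_domains: set, critical=CRITICAL_DOMAINS) -> str:
    """unsatisfied_domains: AMSTAR 2 domains answered 'No' for the review being appraised."""
    critical_flaws = unsatisfied_domains & critical
    non_critical_weaknesses = unsatisfied_domains - critical
    if len(critical_flaws) > 1:
        return "critically low"  # more than one critical flaw
    if len(critical_flaws) == 1:
        return "low"             # exactly one critical flaw
    return "high" if len(non_critical_weaknesses) <= 1 else "moderate"


# e.g. a review with no registered protocol and no list of excluded studies:
print(overall_confidence({"protocol_registered", "exclusions_justified"}))  # critically low
```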


2020 ◽  
Author(s):  
Matthew James Page ◽  
Joanne McKenzie ◽  
Patrick Bossuyt ◽  
Isabelle Boutron ◽  
Tammy Hoffmann ◽  
...  

Background: The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement, published in 2009, was designed to help systematic reviewers transparently report why the review was done, what the authors did and what they found. Over the last decade, there have been many advances in systematic review methodology and terminology, which have necessitated an update to the guideline. Objectives: To develop the PRISMA 2020 statement for reporting systematic reviews. Methods: We reviewed 60 documents with reporting guidance for systematic reviews to generate suggested modifications to the PRISMA 2009 statement. We sought feedback on the suggested modifications through an online survey of 110 systematic review methodologists and journal editors. The results of the review and survey were discussed at a 21-member in-person meeting. Following the meeting, drafts of the PRISMA 2020 checklist, abstract checklist, explanation and elaboration and flow diagram were generated and refined iteratively based on feedback from co-authors and a convenience sample of 15 systematic reviewers. Results: In this statement paper, we present the PRISMA 2020 27-item checklist, an expanded checklist that details reporting recommendations for each item, the PRISMA 2020 abstract checklist, and the revised flow diagrams for original and updated reviews. The checklist includes new reporting guidance that reflects advances in methods to identify, select, appraise and synthesise studies. The structure and presentation of the items have been modified to facilitate implementation. The PRISMA 2020 statement replaces the 2009 statement. Conclusions: The PRISMA 2020 statement is intended to facilitate transparent, complete and accurate reporting of systematic reviews. Improved reporting should benefit users of reviews, including guideline developers, policy makers, health care providers, patients and other stakeholders. In order to achieve this, we encourage authors, editors and peer-reviewers to adopt the guideline.


2020 ◽  
pp. 1-6
Author(s):  
Huda Anshasi ◽  
Muayyad Ahmad

Abstract Objectives This study aimed to evaluate the methodological quality of systematic reviews and meta-analyses of mind–body interventions (MBIs) for the management of cancer-related fatigue. Methods A comprehensive search of multiple databases was conducted to identify relevant systematic reviews and meta-analyses published from January 2008 to December 2019. Two authors independently selected reviews, extracted data, and evaluated the methodological quality of included reviews using Assessing the Methodological Quality of Systematic Reviews (AMSTAR). Results Sixteen reviews published between 2010 and 2018 were eligible for inclusion. The methodological quality of the 16 included systematic reviews ranged from moderate (score 4–7) to high (score ≥ 8) on the 11-point AMSTAR scale. The most common methodological weaknesses were the lack of a list of excluded studies (n = 15, 93.8%) and the lack of an a priori protocol (n = 14, 87.5%). Furthermore, most of the systematic reviews did not search the gray literature for eligible studies (n = 13, 81.3%). Significance of the study This study has revealed the need for methodologically high-quality systematic reviews of MBIs for the management of cancer-related fatigue. Further research should therefore focus on methodologically strong systematic reviews that provide an a priori design, do not limit the publication type, and provide a list of excluded primary studies. Additionally, researchers should conduct systematic reviews according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guideline.
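
For reference, the original 11-item AMSTAR score is a simple count of satisfied items mapped to the cut-points cited above (4–7 moderate, ≥ 8 high). A minimal Python sketch follows; treating scores below 4 as "low" follows common usage and is an assumption here, not a statement from the review.

```python
# Minimal sketch of scoring on the original 11-item AMSTAR checklist using the
# cut-points cited above; the "low" band for scores below 4 is an assumption.
from typing import Sequence


def amstar_quality(items_met: Sequence[bool]) -> str:
    if len(items_met) != 11:
        raise ValueError("The original AMSTAR checklist has 11 items")
    score = sum(items_met)
    if score >= 8:
        return "high"
    if score >= 4:
        return "moderate"
    return "low"


print(amstar_quality([True] * 9 + [False] * 2))  # a review meeting 9 of 11 items -> "high"
```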


2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Gerald Gartlehner ◽  
Gernot Wagner ◽  
Linda Lux ◽  
Lisa Affengruber ◽  
Andreea Dobrescu ◽  
...  

Abstract Background Web applications that employ natural language processing technologies to support systematic reviewers during abstract screening have become more common. The goal of our project was to conduct a case study to explore a screening approach that temporarily replaces a human screener with a semi-automated screening tool. Methods We evaluated the accuracy of the approach using DistillerAI as a semi-automated screening tool. A published comparative effectiveness review served as the reference standard. Five teams of professional systematic reviewers screened the same 2472 abstracts in parallel. Each team trained DistillerAI with 300 randomly selected abstracts that the team screened dually. For all remaining abstracts, DistillerAI replaced one human screener and provided predictions about the relevance of records. A single reviewer also screened all remaining abstracts. A second human screener resolved conflicts between the single reviewer and DistillerAI. We compared the decisions of the machine-assisted approach, single-reviewer screening, and screening with DistillerAI alone against the reference standard. Results The combined sensitivity of the machine-assisted screening approach across the five screening teams was 78% (95% confidence interval [CI], 66 to 90%), and the combined specificity was 95% (95% CI, 92 to 97%). By comparison, the sensitivity of single-reviewer screening was similar (78%; 95% CI, 66 to 89%); however, the sensitivity of DistillerAI alone was substantially worse (14%; 95% CI, 0 to 31%) than that of the machine-assisted screening approach. Specificities for single-reviewer screening and DistillerAI were 94% (95% CI, 91 to 97%) and 98% (95% CI, 97 to 100%), respectively. Machine-assisted screening and single-reviewer screening had similar areas under the curve (0.87 and 0.86, respectively); by contrast, the area under the curve for DistillerAI alone was just slightly better than chance (0.56). The interrater agreement between human screeners and DistillerAI, assessed with a prevalence-adjusted kappa, was 0.85 (95% CI, 0.84 to 0.86). Conclusions The accuracy of DistillerAI is not yet adequate to replace a human screener temporarily during abstract screening for systematic reviews. Semi-automation tools may have greater utility for rapid reviews, which do not require detecting the totality of the relevant evidence, than for traditional systematic reviews.
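
The accuracy measures reported above follow directly from a 2x2 cross-tabulation of screening decisions against a comparator (the reference standard, or a second screener for agreement). Below is a minimal Python sketch; the counts in the usage line are illustrative only, not the study's data, and the prevalence-adjusted kappa is assumed here to be PABAK (2 x observed agreement - 1), which may differ from the exact formula the authors used.

```python
# Minimal sketch of the screening accuracy metrics discussed above. Example counts
# are illustrative, and PABAK is an assumed reading of "prevalence-adjusted kappa".
from typing import Dict


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """tp/fn: relevant records (per the comparator) kept/missed by the approach;
    fp/tn: irrelevant records wrongly kept/correctly excluded."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed_agreement = (tp + tn) / total
    pabak = 2 * observed_agreement - 1  # prevalence- and bias-adjusted kappa
    return {"sensitivity": sensitivity, "specificity": specificity, "pabak": pabak}


print(screening_metrics(tp=40, fp=100, fn=10, tn=1850))  # illustrative counts only
```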


2021 ◽  
Author(s):  
Lin Li ◽  
Iriagbonse Rotimi Asemota ◽  
Bolun Liu ◽  
Javier Gomez-Valenzia ◽  
Lifeng Lin ◽  
...  

Abstract Background: The MeaSurement Tool to Assess systematic Reviews (AMSTAR) 2 is a critical appraisal tool for systematic reviews (SRs) and meta-analyses (MAs) of interventions. We aimed to perform the first AMSTAR 2-based quality assessment of heart failure-related studies. Methods: Eleven high-impact journals were searched from 2009 to 2019. The included studies were assessed on the basis of 16 domains, seven of which were deemed critical for high-quality studies. On the basis of performance across these 16 domains, weighted differently, overall ratings were generated and the quality was determined to be “high,” “moderate,” “low,” or “critically low.” Results: Eighty-one heart failure-related SRs with MAs were included. Overall, 79 studies were of “critically low quality” and two were of “low quality.” These findings were attributed to insufficiency in the following critical domains: a priori protocols (compliance rate, 5%), complete list of exclusions with justification (5%), risk of bias assessment (69%), meta-analysis methodology (78%), and investigation of publication bias (60%). Conclusions: The low ratings for these potentially high-quality heart failure-related SRs and MAs challenge the discrimination capacity of AMSTAR 2. In addition to identifying certain areas of insufficiency, these findings indicate the need to justify or modify AMSTAR 2’s rating rules.


2020 ◽  
Author(s):  
Matthew James Page ◽  
Joanne McKenzie ◽  
Patrick Bossuyt ◽  
Isabelle Boutron ◽  
Tammy Hoffmann ◽  
...  

Background: The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement, published in 2009, aimed to help systematic reviewers prepare a transparent report of their review. Advances in systematic review methodology and terminology over the last decade necessitated an update to the guideline. A detailed description of the updating process may provide a useful roadmap for others embarking on a similar initiative. Objectives: To (i) describe the processes used to update the PRISMA 2009 statement for reporting systematic reviews, (ii) present results of a survey conducted to inform the update, (iii) summarise decisions made at the PRISMA update meeting, and (iv) describe and justify changes made to the guideline. Methods: We reviewed 60 documents with reporting guidance for systematic reviews to generate suggested modifications to the PRISMA 2009 statement. We invited 220 systematic review methodologists and journal editors to complete a survey about the suggested modifications. The results of these projects were discussed at a 21-member in-person meeting. Following the meeting, we drafted the PRISMA 2020 statement and refined it based on feedback from co-authors and a convenience sample of 15 systematic reviewers. Results: The review of 60 documents with reporting guidance for systematic reviews resulted in a bank of 221 unique reporting items and revealed that all topics addressed by the PRISMA 2009 statement could be modified or supplemented with additional guidance. Of the 110 respondents to the survey, more than 66% recommended keeping six of the 27 PRISMA 2009 checklist items as they were and modifying 15 of the checklist items using wording suggested by us; there was no consensus on what to do with the remaining six items. Attendees at the in-person meeting supported the revised wording for several items but suggested rewording for most items to enhance clarity, and further refinements were made over six drafts of the guideline. Conclusions: The PRISMA 2020 statement consists of updated reporting guidance for systematic reviews and reflects advances over the last decade in methods to identify, select, appraise and synthesise studies. We hope that providing this detailed description of the development process will enhance the acceptance and uptake of the guideline and assist those developing and updating future reporting guidelines.

