Decoding semi-automated title-abstract screening: a retrospective exploration of the review, study, and publication characteristics associated with accurate relevance predictions

2020 ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract Background We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening, and explored whether Abstrackr’s predictions varied by review- or study-level characteristics. Methods For 16 reviews, we screened a 200-record training set in Abstrackr and downloaded the predicted relevance of the remaining records. We retrospectively simulated the liberal-accelerated screening approach: one reviewer screened the records predicted as relevant; a second reviewer screened those predicted as irrelevant and those excluded by the first reviewer. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review- and study-level characteristics using Fisher’s exact tests and unpaired t-tests. Results Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records but saved a median (IQR) 26 (33) hours of screening time. Removing missed studies from meta-analyses did not alter the reviews’ conclusions. Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P=0.37) or intervention type (simple or complex, P=0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P=0.01), or that included only trials (95%) vs. multiple designs (86%) (P=0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P=0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P=0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P=0.02) were more often correctly predicted as relevant. Conclusion Our screening approach saved time and may be suitable in conditions where the limited risk of missing relevant records is acceptable. ML-assisted screening may be most trustworthy for reviews that seek to include only trials. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited.
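To make the liberal-accelerated workflow described above concrete, the following is a minimal sketch that simulates it on hypothetical data. All parameters (corpus size, relevance prevalence, ML and human screening accuracy, minutes per record) are assumptions for illustration only; this is not the authors' implementation and not Abstrackr's algorithm.

```python
# Minimal sketch of an ML-assisted liberal-accelerated screening workflow.
# All parameters (prevalence, ML and human accuracy, minutes per record) are
# illustrative assumptions, not values from the study.
import random

random.seed(1)
N_RECORDS = 2000
MINUTES_PER_RECORD = 0.5  # assumed average title-abstract screening time per reviewer

def simulate_record():
    relevant = random.random() < 0.10                            # ~10% truly relevant
    predicted = random.random() < (0.90 if relevant else 0.20)   # imperfect ML predictions
    return {"relevant": relevant, "predicted_relevant": predicted}

def human_screen(record, sensitivity=0.95):
    # A human screener assumed to include a truly relevant record 95% of the time.
    return record["relevant"] and random.random() < sensitivity

records = [simulate_record() for _ in range(N_RECORDS)]

single_screens = 0  # number of single-reviewer screening passes actually performed
for rec in records:
    rec["included"] = False
    if rec["predicted_relevant"]:
        # Reviewer 1 screens the records the tool predicts as relevant ...
        single_screens += 1
        if human_screen(rec):
            rec["included"] = True
            continue
    # ... reviewer 2 screens predicted-irrelevant records and reviewer 1's exclusions.
    single_screens += 1
    rec["included"] = human_screen(rec)

relevant_total = sum(r["relevant"] for r in records)
missed = sum(r["relevant"] and not r["included"] for r in records)
dual_screens = 2 * N_RECORDS  # dual independent screening: every record screened twice

print(f"Relevant records missed: {missed}/{relevant_total} ({missed / relevant_total:.1%})")
hours_saved = (dual_screens - single_screens) * MINUTES_PER_RECORD / 60
print(f"Estimated time saved vs. dual independent screening: {hours_saved:.1f} hours")
```

The time saving in this sketch comes entirely from records that only one reviewer needs to screen; tightening or loosening the assumed ML accuracy shifts the trade-off between hours saved and relevant records missed.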


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Allison Gates ◽  
Michelle Gates ◽  
Daniel DaRosa ◽  
Sarah A. Elliott ◽  
Jennifer Pillay ◽  
...  

Abstract Background We evaluated the benefits and risks of using the Abstrackr machine learning (ML) tool to semi-automate title-abstract screening and explored whether Abstrackr’s predictions varied by review- or study-level characteristics. Methods For a convenience sample of 16 reviews for which adequate data were available to address our objectives (11 systematic reviews and 5 rapid reviews), we screened a 200-record training set in Abstrackr and downloaded the relevance (relevant or irrelevant) of the remaining records, as predicted by the tool. We retrospectively simulated the liberal-accelerated screening approach. We estimated the time savings and proportion missed compared with dual independent screening. For reviews with pairwise meta-analyses, we evaluated changes to the pooled effects after removing the missed studies. We explored whether the tool’s predictions varied by review- and study-level characteristics. Results Using the ML-assisted liberal-accelerated approach, we wrongly excluded 0 to 3 (0 to 14%) records that were included in the final reports, but saved a median (IQR) 26 (9, 42) h of screening time. One missed study was included in eight pairwise meta-analyses in one systematic review. The pooled effect for just one of those meta-analyses changed considerably (from MD (95% CI) −1.53 (−2.92, −0.15) to −1.17 (−2.70, 0.36)). Of 802 records in the final reports, 87% were correctly predicted as relevant. The correctness of the predictions did not differ by review type (systematic or rapid, P = 0.37) or intervention type (simple or complex, P = 0.47). The predictions were more often correct in reviews with multiple (89%) vs. single (83%) research questions (P = 0.01), or that included only trials (95%) vs. multiple designs (86%) (P = 0.003). At the study level, trials (91%), mixed methods (100%), and qualitative (93%) studies were more often correctly predicted as relevant compared with observational studies (79%) or reviews (83%) (P = 0.0006). Studies at high or unclear (88%) vs. low risk of bias (80%) (P = 0.039), and those published more recently (mean (SD) 2008 (7) vs. 2006 (10), P = 0.02) were more often correctly predicted as relevant. Conclusion Our screening approach saved time and may be suitable in conditions where the limited risk of missing relevant records is acceptable. Several of our findings are paradoxical and require further study to fully understand the tasks to which ML-assisted screening is best suited. The findings should be interpreted in light of the fact that the protocol was prepared for the funder but not published a priori. Because we used a convenience sample, the findings may be prone to selection bias. The results may not be generalizable to other samples of reviews, ML tools, or screening approaches. The small number of missed studies across reviews with pairwise meta-analyses hindered strong conclusions about the effect of missed studies on the results and conclusions of systematic reviews.
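The sensitivity check described above (re-pooling a meta-analysis after removing a study missed at screening) can be illustrated with a small calculation. The sketch below uses made-up study estimates and a simple inverse-variance fixed-effect model; it is not the review's actual meta-analysis, which may well have used a random-effects model.

```python
# Illustrative inverse-variance (fixed-effect) pooling of mean differences,
# with and without one study. Hypothetical data, not the review's meta-analysis.
from math import sqrt

# (mean difference, standard error) per included study -- made-up values.
studies = {
    "study_A": (-1.8, 0.7),
    "study_B": (-1.2, 0.9),
    "study_C": (-0.9, 1.1),  # pretend this is the study missed at screening
}

def pooled_md(data):
    weights = {k: 1 / se ** 2 for k, (_, se) in data.items()}
    total_w = sum(weights.values())
    md = sum(w * data[k][0] for k, w in weights.items()) / total_w
    se = sqrt(1 / total_w)
    return md, md - 1.96 * se, md + 1.96 * se

for label, data in [("All studies", studies),
                    ("Missed study removed", {k: v for k, v in studies.items() if k != "study_C"})]:
    md, low, high = pooled_md(data)
    print(f"{label}: MD {md:.2f} (95% CI {low:.2f} to {high:.2f})")
```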


2021 ◽  
Vol 10 (1) ◽  
pp. 96
Author(s):  
Saeid Eslami ◽  
Raheleh Ganjali

Introduction: In March 2020, the World Health Organization (WHO) declared the spread of SARS-CoV-2 infection across most countries a pandemic. COVID-19 is mainly disseminated through human-to-human transmission via direct contact and respiratory droplets. Telehealth and telemedicine technologies are useful tools for dealing with pandemics of communicable infections. The purpose of this systematic review is to summarize the functionalities, applications, and technologies of telemedicine during the COVID-19 outbreak. Material and Methods: This review will be carried out in accordance with the Cochrane Handbook and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reporting guidelines. The PubMed and Scopus databases were searched for related articles. Randomized and non-randomized controlled trials published in English in scientific journals were identified and assessed for eligibility. Articles on telemedicine services (TMS) during the COVID-19 outbreak (2019-2020) were included for evaluation. Results: The literature search in the PubMed and Scopus databases identified and retrieved a total of 1118 and 485 articles, respectively. After eliminating duplicate articles, title and abstract screening was performed for the remaining 1440 articles. The findings are anticipated to serve as a guide for researchers, decision makers, and managers to design, implement, and assess TMS during the COVID-19 crisis. Conclusion: To our knowledge, this is the first systematic review to comprehensively evaluate telemedicine methods and technologies developed to control and manage the COVID-19 pandemic. This study highlights important applications of telemedicine in pandemic conditions, which future health systems could employ to control and manage communicable infections when an outbreak occurs.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
C. Hamel ◽  
S. E. Kelly ◽  
K. Thavorn ◽  
D. B. Rice ◽  
G. A. Wells ◽  
...  

Abstract Background Systematic reviews often require substantial resources, partially due to the large number of records identified during searching. Although artificial intelligence may not be ready to fully replace human reviewers, it may accelerate and reduce the screening burden. Using DistillerSR (May 2020 release), we evaluated the performance of the prioritization simulation tool to determine the reduction in screening burden and time savings. Methods Using a true recall @ 95%, response sets from 10 completed systematic reviews were used to evaluate: (i) the reduction of screening burden; (ii) the accuracy of the prioritization algorithm; and (iii) the hours saved when a modified screening approach was implemented. To account for variation in the simulations, and to introduce randomness (through shuffling the references), 10 simulations were run for each review. Means, standard deviations, medians and interquartile ranges (IQR) are presented. Results Among the 10 systematic reviews, using true recall @ 95% there was a median reduction in screening burden of 47.1% (IQR: 37.5 to 58.0%). A median of 41.2% (IQR: 33.4 to 46.9%) of the excluded records needed to be screened to achieve true recall @ 95%. The median title/abstract screening hours saved using a modified screening approach at a true recall @ 95% was 29.8 h (IQR: 28.1 to 74.7 h). This increased to a median of 36 h (IQR: 32.2 to 79.7 h) when also counting the time saved by not retrieving and screening the full texts of the remaining 5% of records not yet identified as included at title/abstract. Among the 100 simulations (10 simulations per review), none of these 5% of records was a final included study in the systematic review. Compared with screening to true recall @ 100%, stopping at true recall @ 95% reduced the screening burden by a median of 40.6% (IQR: 38.3 to 54.2%). Conclusions The prioritization tool in DistillerSR can reduce screening burden. A modified, or "stop screening", approach once a true recall @ 95% is achieved appears to be a valid method for rapid reviews, and perhaps systematic reviews. This needs to be further evaluated in prospective reviews using the estimated recall.
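As a generic illustration of a "true recall @ 95%" stopping rule (not DistillerSR's prioritization algorithm), the sketch below ranks simulated references by an assumed relevance score, walks down the ranked list until 95% of the truly included references have been seen, and reports the resulting reduction in screening burden. The corpus size, inclusion rate, and score quality are all invented for the example.

```python
# Generic sketch of a 'true recall @ 95%' stopping rule on a ranked reference list.
# The ranking is simulated; this is not DistillerSR's prioritization algorithm.
import math
import random

random.seed(7)

# Simulate 3000 references, ~5% truly included, with a prioritization score that
# tends (imperfectly) to rank included references earlier.
refs = []
for _ in range(3000):
    included = random.random() < 0.05
    score = random.gauss(1.0 if included else 0.0, 0.8)  # assumed score quality
    refs.append((score, included))
refs.sort(key=lambda r: r[0], reverse=True)  # screen highest-scored references first

total_included = sum(inc for _, inc in refs)
target = math.ceil(0.95 * total_included)  # true recall @ 95%

found = 0
for position, (_, inc) in enumerate(refs, start=1):
    found += inc
    if found >= target:
        break

burden_reduction = 1 - position / len(refs)
print(f"Screen {position} of {len(refs)} references to reach 95% of included studies")
print(f"Reduction in title/abstract screening burden: {burden_reduction:.1%}")
```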


2013 ◽  
Vol 12 (4) ◽  
pp. 157-169 ◽  
Author(s):  
Philip L. Roth ◽  
Allen I. Huffcutt

The topic of what interviews measure has received a great deal of attention over the years. One line of research has investigated the relationship between interviews and the construct of cognitive ability. A previous meta-analysis reported an overall corrected correlation of .40 (Huffcutt, Roth, & McDaniel, 1996). A more recent meta-analysis reported a noticeably lower corrected correlation of .27 (Berry, Sackett, & Landers, 2007). After reviewing both meta-analyses, it appears that the two studies posed different research questions. Further, there were a number of coding judgments in Berry et al. that merit review, and there was no moderator analysis for educational versus employment interviews. As a result, we reanalyzed the work by Berry et al. and found a corrected correlation of .42 for employment interviews (.15 higher than Berry et al., a 56% increase). Further, educational interviews were associated with a corrected correlation of .21, supporting their influence as a moderator. We suggest a better estimate of the correlation between employment interviews and cognitive ability is .42, and this takes us “back to the future” in that the better overall estimate of the employment interviews–cognitive ability relationship is roughly .40. This difference has implications for what is being measured by interviews and their incremental validity.
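For readers unfamiliar with what a "corrected correlation" involves, the sketch below applies one standard psychometric correction, the correction for attenuation due to measurement unreliability. The values are made up, and the meta-analyses discussed above apply additional corrections (for example, for range restriction), so this is only a simplified illustration of the general idea.

```python
# Simplified illustration of correcting an observed correlation for attenuation
# due to measurement unreliability. Values are hypothetical; the meta-analyses
# above also correct for range restriction, which is not shown here.
from math import sqrt

r_observed = 0.25  # hypothetical observed interview-cognitive ability correlation
rxx = 0.80         # assumed reliability of the interview ratings
ryy = 0.90         # assumed reliability of the cognitive ability measure

r_corrected = r_observed / sqrt(rxx * ryy)
print(f"Corrected correlation: {r_corrected:.2f}")  # about 0.29 with these values
```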


2021 ◽  
pp. 001440292110508
Author(s):  
Gena Nelson ◽  
Soyoung Park ◽  
Tasia Brafford ◽  
Nicole A. Heller ◽  
Angela R. Crawford ◽  
...  

Researchers and practitioners alike often look to meta-analyses to identify effective practices to use with students with disabilities. The number of meta-analyses in special education has also expanded in recent years. The purpose of this systematic review is to evaluate the quality of reporting in meta-analyses focused on mathematics interventions for students with or at risk of disabilities. We applied 53 quality indicators (QIs) across eight categories based on recommendations from Talbott et al. to 22 mathematics intervention meta-analyses published between 2000 and 2020. Overall, the meta-analyses met 61% of QIs. They most frequently met QIs related to providing a clear purpose (95%) and a data analysis plan (77%), and least frequently met QIs related to describing participants (39%) and explaining the abstract screening process (48%). We discuss the variation in quality indicator scores within and across the quality categories and provide recommendations for future researchers.


2022 ◽  
Vol 22 (1) ◽  
pp. 1-46
Author(s):  
Sarah Heckman ◽  
Jeffrey C. Carver ◽  
Mark Sherriff ◽  
Ahmed Al-zubidy

Context. Computing Education Research (CER) is critical to help the computing education community and policy makers support the increasing population of students who need to learn computing skills for future careers. For a community to systematically advance knowledge about a topic, the members must be able to understand published work thoroughly enough to perform replications, conduct meta-analyses, and build theories. There is a need to understand whether published research allows the CER community to systematically advance knowledge and build theories. Objectives. The goal of this study is to characterize the reporting of empiricism in Computing Education Research literature by identifying whether publications include content necessary for researchers to perform replications, meta-analyses, and theory building. We answer three research questions related to this goal: (RQ1) What percentage of papers in CER venues have some form of empirical evaluation? (RQ2) Of the papers that have empirical evaluation, what are the characteristics of the empirical evaluation? (RQ3) Of the papers that have empirical evaluation, do they follow norms (both for inclusion and for labeling of information needed for replication, meta-analysis, and, eventually, theory-building) for reporting empirical work? Methods. We conducted a systematic literature review of the 2014 and 2015 proceedings or issues of five CER venues: Technical Symposium on Computer Science Education (SIGCSE TS), International Symposium on Computing Education Research (ICER), Conference on Innovation and Technology in Computer Science Education (ITiCSE), ACM Transactions on Computing Education (TOCE), and Computer Science Education (CSE). We developed and applied the CER Empiricism Assessment Rubric to the 427 papers accepted and published at these venues over 2014 and 2015. Two people evaluated each paper using the Base Rubric to characterize the paper. A single reviewer applied the other rubrics to characterize the norms of reporting, as appropriate for the paper type. Discrepancies and questions were resolved through discussion among multiple reviewers. Results. We found that over 80% of papers accepted across all five venues had some form of empirical evaluation. Quantitative evaluation methods were the most frequently reported. Papers most frequently reported results on interventions around pedagogical techniques, curriculum, community, or tools. Papers were roughly split on whether they compared an intervention against another dataset or baseline. Most papers reported related work, following the expectations for doing so in the SIGCSE and CER community. However, many papers lacked properly reported research objectives, goals, research questions, or hypotheses; descriptions of participants; study design; data collection; and threats to validity. These results align with prior surveys of the CER literature. Conclusions. CER authors are contributing empirical results to the literature; however, not all norms for reporting are met. We encourage authors to provide clear, labeled details about their work so readers can use the study methodologies and results for replications and meta-analyses. As our community grows, our reporting of CER should mature to help establish computing education theory to support the next generation of computing learners.


2020 ◽  
Author(s):  
Daniel Lakens ◽  
Lisa Marie DeBruine

Making scientific information machine-readable greatly facilitates its re-use. Many scientific articles aim to test a hypothesis, so making the tests of statistical predictions easier to find and access could be very beneficial. We propose an approach that can be used to make hypothesis tests machine-readable. We believe there are two benefits to specifying a hypothesis test in a way that a computer can evaluate whether the statistical prediction is corroborated or not. First, hypothesis tests will become more transparent, falsifiable, and rigorous. Second, scientists will benefit if information related to hypothesis tests in scientific articles is easily findable and re-usable, for example when performing meta-analyses, during peer review, and when examining meta-scientific research questions. We examine what a machine-readable hypothesis test should look like, and demonstrate the feasibility of machine-readable hypothesis tests in a real-life example using the fully operational prototype R package scienceverse.
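The proposal above is implemented in the R package scienceverse; the sketch below is only a rough illustration (in Python, with an invented schema and field names) of the core idea: storing a statistical prediction as data and letting a computer evaluate whether it was corroborated. It is not the scienceverse API.

```python
# Sketch of a machine-readable hypothesis test: the prediction is stored as data
# and evaluated automatically against the analysis result. The schema and field
# names are invented for illustration; this is not the scienceverse API.
from scipy import stats

hypothesis = {
    "id": "H1",
    "description": "Group A scores higher than group B",
    "test": "independent_t_test",
    "criteria": {"p_max": 0.05, "min_mean_difference": 0.0},
}

# Hypothetical raw data for the two groups.
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.4]
group_b = [4.6, 4.9, 4.4, 4.8, 4.5, 4.7, 5.0, 4.3]

t, p = stats.ttest_ind(group_a, group_b, alternative="greater")
mean_diff = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

# The machine-readable criteria decide whether the prediction is corroborated.
corroborated = (p <= hypothesis["criteria"]["p_max"]
                and mean_diff > hypothesis["criteria"]["min_mean_difference"])
print(f"{hypothesis['id']}: p={p:.4f}, mean difference={mean_diff:.2f}, "
      f"corroborated={corroborated}")
```

Because the criteria live alongside the data rather than in prose, a meta-analyst or reviewer could re-run the evaluation, or harvest the structured predictions across many articles, without re-reading each paper.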


2021 ◽  
Author(s):  
Alireza Razzaghi ◽  
Fatemeh Sadat Asgarian ◽  
Hossein Akbari

BACKGROUND The Covid-19 pandemic started in China in late 2019 and has spread rapidly around the world. Psychological problems such as suicidal ideation are among the main consequences of this pandemic and need more attention. OBJECTIVE This study aims to provide a comprehensive estimate of the global prevalence of suicidal ideation in patients with Covid-19. METHODS The review will be based on the following databases: Web of Science, Medline/PubMed, ProQuest, Scopus, and Science Direct. This study is limited to original studies published in peer-reviewed journals in English. RESULTS The quality of the included studies will be assessed using the Joanna Briggs Institute (JBI) critical appraisal checklist for reporting prevalence data. The overall synthesized estimate will be presented as a prevalence with 95% confidence intervals. CONCLUSIONS This review and meta-analysis will be the first study to explore the prevalence of suicidal ideation related to Covid-19. Summarizing the related data can clarify the dimensions of the problem across the world and inform prevention planning.
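To illustrate what the planned synthesis might look like, the sketch below pools hypothetical study prevalences on the logit scale with inverse-variance weights and back-transforms the result to a prevalence with a 95% confidence interval. The data are invented and the fixed-effect pooling is a simplification; the actual review would likely use a random-effects model and established meta-analysis software.

```python
# Illustrative pooled prevalence with a 95% CI: logit-transform each study's
# prevalence, pool with inverse-variance weights, and back-transform.
# Hypothetical data; a published review would likely use a random-effects model.
from math import exp, log, sqrt

# (cases with suicidal ideation, sample size) per study -- made-up values.
studies = [(12, 150), (30, 400), (9, 120), (22, 260)]

def logit(p):
    return log(p / (1 - p))

def inv_logit(x):
    return exp(x) / (1 + exp(x))

weights, estimates = [], []
for cases, n in studies:
    p = cases / n
    var = 1 / cases + 1 / (n - cases)  # approximate variance of logit(p)
    weights.append(1 / var)
    estimates.append(logit(p))

pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
se = sqrt(1 / sum(weights))

low, high = inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)
print(f"Pooled prevalence: {inv_logit(pooled):.1%} (95% CI {low:.1%} to {high:.1%})")
```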

