Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies

BMJ ◽  
2020 ◽  
pp. m689 ◽  
Author(s):  
Myura Nagendran ◽  
Yang Chen ◽  
Christopher A Lovejoy ◽  
Anthony C Gordon ◽  
Matthieu Komorowski ◽  
...  

Abstract Objective To systematically examine the design, reporting standards, risk of bias, and claims of studies comparing the performance of diagnostic deep learning algorithms for medical imaging with that of expert clinicians. Design Systematic review. Data sources Medline, Embase, Cochrane Central Register of Controlled Trials, and the World Health Organization trial registry from 2010 to June 2019. Eligibility criteria for selecting studies Randomised trial registrations and non-randomised studies comparing the performance of a deep learning algorithm in medical imaging with a contemporary group of one or more expert clinicians. Medical imaging has seen a growing interest in deep learning research. The main distinguishing feature of convolutional neural networks (CNNs) in deep learning is that when CNNs are fed with raw data, they develop their own representations needed for pattern recognition. The algorithm learns for itself the features of an image that are important for classification rather than being told by humans which features to use. The selected studies aimed to use medical imaging for predicting absolute risk of existing disease or classification into diagnostic groups (eg, disease or non-disease). For example, raw chest radiographs are tagged with a label such as pneumothorax or no pneumothorax, and the CNN learns which pixel patterns suggest pneumothorax. Review methods Adherence to reporting standards was assessed by using CONSORT (consolidated standards of reporting trials) for randomised studies and TRIPOD (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) for non-randomised studies. Risk of bias was assessed by using the Cochrane risk of bias tool for randomised studies and PROBAST (prediction model risk of bias assessment tool) for non-randomised studies.
Results Only 10 records were found for deep learning randomised clinical trials, two of which have been published (with low risk of bias, except for lack of blinding, and high adherence to reporting standards) and eight are ongoing. Of 81 non-randomised clinical trials identified, only nine were prospective and just six were tested in a real world clinical setting. The median number of experts in the comparator group was only four (interquartile range 2-9). Full access to all datasets and code was severely limited (unavailable in 95% and 93% of studies, respectively). The overall risk of bias was high in 58 of 81 studies and adherence to reporting standards was suboptimal (<50% adherence for 12 of 29 TRIPOD items). 61 of 81 studies stated in their abstract that performance of artificial intelligence was at least comparable to (or better than) that of clinicians. Only 31 of 81 studies (38%) stated that further prospective studies or trials were required. Conclusions Few prospective deep learning studies and randomised trials exist in medical imaging. Most non-randomised trials are not prospective, are at high risk of bias, and deviate from existing reporting standards. Data and code availability are lacking in most studies, and human comparator groups are often small. Future studies should diminish risk of bias, enhance real world clinical relevance, improve reporting and transparency, and appropriately temper conclusions. Study registration PROSPERO CRD42019123605.

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Andre Esteva ◽  
Katherine Chou ◽  
Serena Yeung ◽  
Nikhil Naik ◽  
Ali Madani ◽  
...  

Abstract A decade of unprecedented progress in artificial intelligence (AI) has demonstrated the potential for many fields, including medicine, to benefit from the insights that AI techniques can extract from data. Here we survey recent progress in the development of modern computer vision techniques, powered by deep learning, for medical applications, focusing on medical imaging, medical video, and clinical deployment. We start by briefly summarizing a decade of progress in convolutional neural networks, including the vision tasks they enable, in the context of healthcare. Next, we discuss several example medical imaging applications that stand to benefit, including cardiology, pathology, dermatology, and ophthalmology, and propose new avenues for continued work. We then expand into general medical video, highlighting ways in which clinical workflows can integrate computer vision to enhance care. Finally, we discuss the challenges and hurdles required for real-world clinical deployment of these technologies.


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Adel Elfeky ◽  
Katie Gillies ◽  
Heidi Gardner ◽  
Cynthia Fraser ◽  
Timothy Ishaku ◽  
...  

Abstract Background Retention of participants is essential to ensure the statistical power and internal validity of clinical trials. Poor participant retention reduces power and can bias the estimates of intervention effect. There is sparse evidence from randomised comparisons of effective strategies to retain participants in randomised trials. Currently, non-randomised evaluations of trial retention interventions embedded in host clinical trials are excluded from the Cochrane review of strategies to improve retention because it includes only randomised evaluations. However, the systematic assessment of non-randomised evaluations may inform trialists’ decision-making about retention methods that have been evaluated in a trial context. Therefore, we performed a systematic review to synthesise evidence from non-randomised evaluations of retention strategies in order to supplement existing randomised trial evidence. Methods We searched MEDLINE, EMBASE, and Cochrane CENTRAL from 2007 to October 2017. Two reviewers independently screened abstracts and full-text articles for non-randomised studies that compared two or more strategies to increase participant retention in randomised trials. The retention trials had to be nested in real ‘host’ trials (including feasibility studies) but not hypothetical trials. Two investigators independently rated the risk of bias of included studies using the ROBINS-I tool and determined the certainty of evidence using the GRADE (Grading of Recommendations Assessment, Development and Evaluation) framework. Results Fourteen non-randomised studies of retention were included in this review. Most retention strategies (in 10 studies) aimed to increase questionnaire response rate. Favourable strategies for increasing questionnaire response rate were telephone follow-up compared with postal questionnaire completion, online questionnaire follow-up compared with postal questionnaire, shortened questionnaires compared with longer questionnaires, electronically transferred monetary incentives compared with cash incentives, cash compared with no incentive, and reminders to non-responders (telephone or text messaging). However, each retention strategy was evaluated in a single observational study. This, together with risk of bias concerns, meant that the overall GRADE certainty was low or very low for all included studies. Conclusions This systematic review provides low or very low certainty evidence on the effectiveness of retention strategies evaluated in non-randomised studies. Some strategies need further evaluation to provide confidence around the size and direction of the underlying effect.


2020 ◽  
Vol 50 (8) ◽  
pp. 1233-1240 ◽  
Author(s):  
Emily R. Cox ◽  
Katie F. M. Marwick ◽  
Robert W. Hunter ◽  
Josef Priller ◽  
Stephen M. Lawrie

Abstract Increasing evidence suggests that circulating factors and immune dysfunction may contribute to the pathogenesis of schizophrenia. In particular, proinflammatory cytokines, complement and autoantibodies against CNS epitopes have recently been associated with psychosis. Related concepts in previous decades led to several clinical trials of dialysis and plasmapheresis as treatments for schizophrenia. These trials may have relevance for the current understanding of schizophrenia. We aimed to identify whether dialysis or plasmapheresis are beneficial interventions in schizophrenia. We conducted a systematic search in major electronic databases for high-quality studies (double-blinded randomised trials with sham controls) applying either haemodialysis or plasmapheresis as an intervention in patients with schizophrenia, published in English from the start of records until September 2018. We found nine studies meeting inclusion criteria, reporting on 105 patients in total who received either sham or active intervention. One out of eight studies reported a beneficial effect of haemodialysis on schizophrenia, one a detrimental effect and six no effect. The sole trial of plasmapheresis found it to be ineffective. Adverse events were reported in 23% of patients. Studies were at unclear or high risk of bias. It is unlikely that haemodialysis is a beneficial treatment in schizophrenia, although the studies were of small size and could not consider potential subgroups. Plasmapheresis was only addressed by one study and warrants further exploration as a treatment modality in schizophrenia.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Melina von Wernsdorff ◽  
Martin Loef ◽  
Brunna Tuschen-Caffier ◽  
Stefan Schmidt

Abstract Open-label placebos (OLPs) are placebos without deception in the sense that patients know that they are receiving a placebo. The objective of our study is to systematically review and analyze the effect of OLPs in comparison to no treatment in clinical trials. A systematic literature search was carried out in February 2020. Randomized controlled trials of any medical condition or mental disorder comparing OLPs to no treatment were included. Data extraction and risk of bias rating were independently assessed. 1246 records were screened and thirteen studies were included in the systematic review. Eleven trials were eligible for meta-analysis. These trials assessed effects of OLPs on back pain, cancer-related fatigue, attention deficit hyperactivity disorder, allergic rhinitis, major depression, irritable bowel syndrome and menopausal hot flushes. Risk of bias was moderate among all studies. We found a significant overall effect (standardized mean difference = 0.72, 95% CI 0.39–1.05, p < 0.0001, I² = 76%) of OLP. Thus, OLPs appear to be a promising treatment in different conditions but the respective research is in its infancy. More research is needed, especially with respect to different medical and mental disorders and instructions accompanying the OLP administration as well as the role of expectations and mindsets.
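The summary statistics reported in this abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 95% interval is a symmetric normal-approximation interval (the abstract does not say how it was computed), recovers the implied standard error, and checks that the resulting two-sided p value is below 0.0001. The function names are hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z_crit)

def two_sided_p(z):
    """Two-sided p value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Values reported in the abstract: SMD and its 95% CI.
smd, ci_low, ci_high = 0.72, 0.39, 1.05

se = se_from_ci(ci_low, ci_high)  # assumption: normal-approximation interval
z = smd / se
p = two_sided_p(z)

print(f"SE = {se:.3f}, z = {z:.2f}, p below 0.0001: {p < 0.0001}")
```

Under that assumption the implied z statistic is a little above 4, which agrees with the reported p < 0.0001.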


Author(s):  
Falk Schwendicke ◽  
Akhilanand Chaurasia ◽  
Lubaina Arsiwala ◽  
Jae-Hong Lee ◽  
Karim Elhennawy ◽  
...  

Abstract Objectives Deep learning (DL) has been increasingly employed for automated landmark detection, e.g., for cephalometric purposes. We performed a systematic review and meta-analysis to assess the accuracy and underlying evidence for DL for cephalometric landmark detection on 2-D and 3-D radiographs. Methods Diagnostic accuracy studies published in 2015–2020 in Medline/Embase/IEEE/arXiv and employing DL for cephalometric landmark detection were identified and extracted by two independent reviewers. Random-effects meta-analysis, subgroup, and meta-regression were performed, and study quality was assessed using QUADAS-2. The review was registered (PROSPERO no. 227498). Data From 321 identified records, 19 studies (published 2017–2020), all employing convolutional neural networks, mainly on 2-D lateral radiographs (n=15), using data from publicly available datasets (n=12) and testing the detection of a mean of 30 (SD: 25; range: 7–93) landmarks, were included. The reference test was established by two experts (n=11), 1 expert (n=4), 3 experts (n=3), and a set of annotators (n=1). Risk of bias was high, and applicability concerns were detected for most studies, mainly regarding the data selection and reference test conduct. Landmark prediction error centered around a 2-mm error threshold (mean –0.581 mm; 95% confidence interval –1.264 to 0.102 mm). The proportion of landmarks detected within this 2-mm threshold was 0.799 (0.770 to 0.824). Conclusions DL shows relatively high accuracy for detecting landmarks on cephalometric imagery. The overall body of evidence is consistent but suffers from high risk of bias. Demonstrating robustness and generalizability of DL for landmark detection is needed. Clinical significance Existing DL models show consistent and largely high accuracy for automated detection of cephalometric landmarks. The majority of studies so far focused on 2-D imagery; data on 3-D imagery are sparse, but promising. Future studies should focus on demonstrating generalizability, robustness, and clinical usefulness of DL for this objective.
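To make the "proportion of landmarks detected within the 2-mm threshold" concrete, here is a minimal sketch of how such a proportion could be computed for one image. The coordinates are hypothetical and not taken from any included study, the function name is illustrative, and treating an error of exactly 2 mm as a success is an assumption.

```python
import math

def success_detection_rate(predicted, reference, threshold_mm=2.0):
    """Fraction of landmarks whose Euclidean error is within the threshold."""
    errors = [math.dist(p, r) for p, r in zip(predicted, reference)]
    within = sum(e <= threshold_mm for e in errors)  # assumption: <= counts as a hit
    return within / len(errors)

# Hypothetical 2-D landmark positions in millimetres (not real study data).
reference = [(10.0, 20.0), (35.5, 42.0), (60.0, 15.0), (80.0, 55.0), (25.0, 70.0)]
predicted = [(10.5, 20.5), (36.0, 44.5), (60.0, 16.0), (81.0, 56.0), (25.0, 70.0)]

sdr = success_detection_rate(predicted, reference)
print(f"Proportion within 2 mm: {sdr:.2f}")
```

In this toy case four of the five predictions fall within 2 mm of the reference annotation, so the proportion is 0.80.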


2021 ◽  
Vol 54 (6) ◽  
pp. 1-35
Author(s):  
Ninareh Mehrabi ◽  
Fred Morstatter ◽  
Nripsuta Saxena ◽  
Kristina Lerman ◽  
Aram Galstyan

With the widespread use of artificial intelligence (AI) systems and applications in our everyday lives, accounting for fairness has gained significant importance in the design and engineering of such systems. AI systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently, some work has been developed in traditional machine learning and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases that these applications can contain and are attempting to address them. In this survey, we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems. In addition, we examined different domains and subdomains in AI, showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and ways they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.


2021 ◽  
Vol 15 ◽  
pp. 175346662110280
Author(s):  
Roberto Ariel Abeldaño Zuñiga ◽  
Ruth Ana María González-Villoria ◽  
María Vanesa Elizondo ◽  
Anel Yaneli Nicolás Osorio ◽  
David Gómez Martínez ◽  
...  

Aims: Given the variability of previously reported results, this systematic review aims to determine the clinical effectiveness of convalescent plasma employed in the treatment of hospitalized patients diagnosed with COVID-19. Methods: We conducted a systematic review of controlled clinical trials assessing treatment with convalescent plasma for hospitalized patients diagnosed with SARS-CoV-2 infection. The outcomes were mortality, clinical improvement, and ventilation requirement. Results: A total of 51 studies were retrieved from the databases. Five articles were finally included in the data extraction and qualitative and quantitative synthesis of results. The overall risk of bias was rated as low in only two trials. The meta-analysis suggests that there is no benefit of convalescent plasma compared with standard care or placebo in reducing the overall mortality and the ventilation requirement. However, there could be a benefit for clinical improvement in patients treated with plasma. Conclusion: Current results suggest that convalescent plasma transfusion does not reduce mortality or ventilation requirement in hospitalized patients diagnosed with SARS-CoV-2 infection. More controlled clinical trials conducted with methodologies that ensure a low risk of bias are still needed. The reviews of this paper are available via the supplemental material section.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Shelly Soffer ◽  
Eyal Klang ◽  
Orit Shimon ◽  
Yiftach Barash ◽  
Noa Cahan ◽  
...  

Abstract Computed tomographic pulmonary angiography (CTPA) is the gold standard for pulmonary embolism (PE) diagnosis. However, PE is still susceptible to misdiagnosis. In this study, we aimed to perform a systematic review of current literature applying deep learning for the diagnosis of PE on CTPA. MEDLINE/PubMed was searched for studies that reported on the accuracy of deep learning algorithms for PE on CTPA. The risk of bias was evaluated using the QUADAS-2 tool. Pooled sensitivity and specificity were calculated. Summary receiver operating characteristic curves were plotted. Seven studies met our inclusion criteria. A total of 36,847 CTPA studies were analyzed. All studies were retrospective. Five studies provided enough data to calculate summary estimates. The pooled sensitivity and specificity for PE detection were 0.88 (95% CI 0.803–0.927) and 0.86 (95% CI 0.756–0.924), respectively. Most studies had a high risk of bias. Our study suggests that deep learning models can detect PE on CTPA with satisfactory sensitivity and an acceptable number of false positive cases. Yet, these are only preliminary retrospective works, indicating the need for future research to determine the clinical impact of automated PE detection on patient care. Deep learning models are gradually being implemented in hospital systems, and it is important to understand the strengths and limitations of these algorithms.
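One way to read the pooled sensitivity and specificity is through likelihood ratios, which follow directly from the two reported point estimates. The sketch below is not part of the review's analysis: the function names are illustrative, and the 20% pre-test probability is a purely hypothetical value chosen to show the calculation.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a binary test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Pooled point estimates reported in the abstract.
sens, spec = 0.88, 0.86
lr_pos, lr_neg = likelihood_ratios(sens, spec)

pre_test = 0.20  # hypothetical pre-test probability of PE, not from the review
p_if_positive = post_test_probability(pre_test, lr_pos)
p_if_negative = post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability: {p_if_positive:.2f} (positive), {p_if_negative:.2f} (negative)")
```

With these point estimates the positive likelihood ratio is about 6.3 and the negative likelihood ratio about 0.14; the confidence intervals around the pooled values are ignored here.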

