Combining Results of Performance-Based and Informant Test Accuracy Studies: Bayes or Boole?

Author(s):  
Andrew J. Larner

Background/Aims: Since screening and diagnostic tests for dementia do not have perfect accuracy, more than one test is often administered when assessing patients with cognitive complaints. Use of both patient performance tests and informant questionnaires has been recommended. Combination of individual test results may be based on methods originally defined by Thomas Bayes (revision or updating of pretest probabilities to post-test probabilities given the test results) and by George Boole (application of the associative “AND” or “OR” operator). This study sought to apply these methods in clinical practice. Methods: Using the dataset of a pragmatic test accuracy study of the Six-Item Cognitive Impairment Test (6CIT) and informant Ascertain Dementia 8 (AD8), post-test probabilities for the combination were calculated using Bayes’ formula and compared to the Boolean “AND” combination. Combined test sensitivity and specificity were calculated using either the Boolean “AND” or “OR” operator and compared to results using equations based on individual test sensitivity and specificity. Results: Both Bayesian and Boolean methods produced similar improvements from pretest probability (0.288) to combined post-test probability for dementia (≈0.5). Likewise, the 2 different methods for calculating combined sensitivities and specificities gave similar results, with, as anticipated, the “AND” combination improving overall specificity (to ≈0.65) whereas the “OR” combination improved sensitivity (to ≈1.00). Conclusion: Combination of individual screening test results using Bayesian and Boolean methods is relatively straightforward and may add to clinicians’ intuitive judgements when combining test results.
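
The two ways of combining test results described in this abstract can be illustrated with a minimal sketch. It assumes the two tests are conditionally independent given disease status, which the abstract does not state, and it uses hypothetical sensitivity and specificity values rather than the study's 6CIT and AD8 figures. Only the pretest probability of 0.288 is taken from the abstract. All function names are illustrative.

```python
def post_test_probability(pretest, likelihood_ratios):
    """Bayes' formula in odds form: post-test odds = pretest odds x product of LRs."""
    odds = pretest / (1.0 - pretest)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


def combine_and(sens_a, spec_a, sens_b, spec_b):
    """Combined result is positive only if BOTH tests are positive."""
    return sens_a * sens_b, 1.0 - (1.0 - spec_a) * (1.0 - spec_b)


def combine_or(sens_a, spec_a, sens_b, spec_b):
    """Combined result is positive if EITHER test is positive."""
    return 1.0 - (1.0 - sens_a) * (1.0 - sens_b), spec_a * spec_b


pretest = 0.288               # pretest probability reported in the abstract
sens_a, spec_a = 0.90, 0.50   # hypothetical values for the first test
sens_b, spec_b = 0.95, 0.30   # hypothetical values for the second test

# Bayesian route: multiply the pretest odds by each positive likelihood ratio.
lr_pos_a = sens_a / (1.0 - spec_a)
lr_pos_b = sens_b / (1.0 - spec_b)
print("Bayes, both tests positive:",
      round(post_test_probability(pretest, [lr_pos_a, lr_pos_b]), 3))

# Boolean route: sensitivity and specificity of the combined rule.
print("AND (sens, spec):", combine_and(sens_a, spec_a, sens_b, spec_b))
print("OR  (sens, spec):", combine_or(sens_a, spec_a, sens_b, spec_b))
```

Under the independence assumption, the positive likelihood ratio of the "AND" rule equals the product of the two individual positive likelihood ratios, so the Bayesian and Boolean routes agree exactly. With real, correlated test results they need only be similar.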

2021 ◽  
Author(s):  
Alfred Kipyegon Keter ◽  
Lutgarde Lynen ◽  
Alastair van Heerden ◽  
Els Goetghebeur ◽  
Bart K.M. Jacobs

Abstract Background Lack of a perfect reference standard for pulmonary tuberculosis (PTB) diagnosis complicates assessment of accuracy of new diagnostic tests. Alternative strategies such as discrepant resolution and use of composite reference standards may lead to incorrect inferences on disease prevalence and diagnostic test sensitivity and specificity. Latent class analysis (LCA), a statistical method for analyzing diagnostic test results in the absence of a gold standard, allows correct estimation under strict assumptions. The model assumes that the diagnostic tests are independent conditional on the true disease status and that the diagnostic test sensitivity and specificity remain constant across subpopulations. These assumptions are violated when a factor such as severe comorbidity affects the prevalence and/or alters the diagnostic test performance. We aim to provide guidance on correct estimation of the prevalence and diagnostic test accuracy based on LCA when a known factor induces dependence among the diagnostic tests. If unaccounted for, this dependence may lead to misleading inferences. Methods Through likelihood evaluation and simulation we examined implications of likely model violations on estimation of prevalence, sensitivity and specificity among passive case-finding presumptive PTB patients with or without HIV. We generated independent results for five diagnostic tests conditional on PTB and HIV. We performed Bayesian LCA, separately for five and three diagnostic tests using four working models with or without constant PTB prevalence and diagnostic test accuracy across HIV subpopulations. Results In evaluating three diagnostic tests, the models accounting for heterogeneity in diagnostic accuracy produced consistent estimates while the models ignoring it produced biased estimates. The model ignoring heterogeneity in PTB prevalence is less problematic. When evaluating five diagnostic tests, the models were robust to violation of the assumptions. 
Conclusions Well-chosen covariate-specific adaptations of the model can avoid bias implied by recognized heterogeneity in PTB patient populations generating otherwise dependent test results in LCA.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Srinivasa Murthy Doreswamy ◽  
Amulya Ramakrishnegowda

Abstract Objectives Neonates who develop moderate to severe encephalopathy following perinatal asphyxia will benefit from therapeutic hypothermia. Current National Institute of Child Health and Human Development (NICHD) criteria for identifying encephalopathic neonates needing therapeutic hypothermia have high specificity. As a result, they correctly identify neonates who have already developed moderate to severe encephalopathy but miss many potential beneficiaries who progress to moderate to severe encephalopathy later. The need is therefore not just to diagnose encephalopathy, but to predict development of encephalopathy and extend the therapeutic benefit to all eligible neonates. The primary objective of the study was to develop and validate a statistical model for prediction of moderate to severe encephalopathy following perinatal asphyxia and compare it with current NICHD criteria. Methods The study was designed as a prospective observational study. It was carried out in a single-center Level 3 perinatal unit in India. Neonates >35 weeks of gestation and requiring resuscitation at birth were included. Levels of resuscitation and blood gas lactate were used to determine the pre-test probability; the Thompson score between 3 and 5 h of life was used to determine the post-test probability of developing encephalopathy. Primary outcome measure: Validation of the Prediction of Encephalopathy in Perinatal Asphyxia (PEPA) score by the Holdout method. Results A total of 55 babies were included in the study. The PEPA score was validated by the Holdout method, where the fitted receiver-operating characteristic (ROC) areas for the training and test samples were comparable (p=0.758). The sensitivity and specificity of various PEPA scores for prediction of encephalopathy ranged between 74 and 100%, in contrast to 42% for the NICHD criteria. A PEPA score of 30 had the best combination of sensitivity and specificity, at 95% and 89% respectively.
Conclusions The PEPA score has a higher sensitivity than the NICHD criteria for prediction of encephalopathy in asphyxiated neonates.


2007 ◽  
Vol 53 (10) ◽  
pp. 1725-1729 ◽  
Author(s):  
Corné Biesheuvel ◽  
Les Irwig ◽  
Patrick Bossuyt

Abstract Before a new test is introduced in clinical practice, its accuracy should be assessed. In the past decade, researchers have put an increased emphasis on exploring differences in test sensitivity and specificity between patient subgroups. If the reference standard is imperfect and the prevalence of the target condition differs among subgroups, apparent differences in test sensitivity and specificity between subgroups may be caused by reference standard misclassification. We provide guidance on how to determine whether observed differences may be explained by reference standard misclassification. Such misclassification may be ascertained by examining how the apparent sensitivity and specificity change with the prevalence of the target condition in the subgroups.
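
The dependence of apparent accuracy on prevalence can be sketched numerically. The following is a minimal illustration, not the authors' method: it assumes the index test and the imperfect reference standard err independently given true disease status, and all parameter values and function names are hypothetical.

```python
def apparent_accuracy(prevalence, sens_t, spec_t, sens_r, spec_r):
    """Apparent sensitivity and specificity of an index test (t) when it is
    judged against an imperfect reference standard (r).

    Assumes the two tests are conditionally independent given true status.
    """
    p = prevalence
    # Probability that the reference is positive / negative.
    ref_pos = p * sens_r + (1 - p) * (1 - spec_r)
    ref_neg = p * (1 - sens_r) + (1 - p) * spec_r
    # Joint probabilities of agreement between index test and reference.
    both_pos = p * sens_t * sens_r + (1 - p) * (1 - spec_t) * (1 - spec_r)
    both_neg = p * (1 - sens_t) * (1 - sens_r) + (1 - p) * spec_t * spec_r
    return both_pos / ref_pos, both_neg / ref_neg


# Same true index-test accuracy in every subgroup; only prevalence differs.
for prevalence in (0.1, 0.3, 0.5):
    app_sens, app_spec = apparent_accuracy(prevalence, 0.90, 0.90, 0.95, 0.95)
    print(f"prevalence={prevalence:.1f}  "
          f"apparent sens={app_sens:.3f}  apparent spec={app_spec:.3f}")
```

In this sketch the true sensitivity and specificity of the index test are fixed, yet the apparent values shift from subgroup to subgroup purely because prevalence changes, which is the pattern the abstract suggests examining.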


2017 ◽  
Vol 20 (10) ◽  
pp. 955-961 ◽  
Author(s):  
Matthew R Krecic ◽  
Brian A DiGangi ◽  
Brenda Griffin

Objectives The aim of this study was to determine the accuracy of a commercial luteinizing hormone (LH) test as an aid in distinguishing between sexually intact and ovariectomized or castrated domestic cats. Methods Convenience serum samples collected from sexually intact female and male cats (n = 67) undergoing elective sterilization surgery and archived sera from ovariectomized and castrated cats (n = 54) were tested for LH using a commercial diagnostic assay. Test results were compared with the known reproductive status of the cats. Additionally, sera from sexually intact (n = 54) and ovariectomized (n = 94) queens were collected at specific times of the year to evaluate possible seasonal effects on test results. Results Overall test sensitivity was 89.3% (95% confidence interval [CI] 82.3–94.2%), specificity was 92.6% (95% CI 87.1–96.2%) and accuracy was 91.1%. Analysis of the results of female cats (n = 216) – sexually intact (n = 87) and ovariectomized (n = 129) – yielded a test sensitivity of 90.8% (95% CI 82.7–96.0%), a specificity of 92.3% (95% CI 86.2–96.2%) and accuracy of 91.7%. Analysis of the results of male cats (n = 53) – sexually intact (n = 19) and neutered (n = 34) – yielded a test sensitivity of 85.3% (95% CI 68.9–95.1%), a specificity of 94.7% (95% CI 74.0–99.9%) and accuracy of 88.7%. The sera of 10 intact queens unexpectedly yielded positive LH results; two of these cats were in estrus, based on visual inspection at the time of ovariohysterectomy. Test accuracy was 94.6% for the 148 samples collected at specific times of the year, with two samples in each of three 3-month periods yielding false-positive results. Conclusions and relevance The commercial point-of-care LH test is a useful adjunct to historical and physical examination findings for determination of reproductive status in domestic cats. Repeat testing 24 h later should be considered for those female cats with signs of estrus and initial positive test results.


Author(s):  
Zoe Brooks ◽  
Saswati Das ◽  
Tom Pliura

During the coronavirus pandemic, testing for and identifying the virus has been a unique and constant challenge for the scientific community. In this paper, we discuss a practical solution to help guide clinicians and public health staff with the interpretation of the probability that a positive, or negative, COVID-19 test result indicates an infected person, based on their clinical estimate of pre-test probability of infection. The LinkedIn survey confirmed that the pre-test probability of COVID-19 increases with patient age, known contact, and severity of symptoms, as well as prevalence of disease in the local population. PPA (Positive Percent Agreement, sensitivity) and NPA (Negative Percent Agreement, specificity) differ between individual methods. Results vary between laboratories and the manufacturer for the same method. The confidence intervals of results vary with the number of samples tested, often adding a large range of possibilities to the reported test result. The online calculator met the objective. The authors postulated that the clinical pre-test probability of COVID-19 increases relative to local prevalence of disease plus patient age, known contact, and severity of symptoms. We conducted a small survey on LinkedIn to confirm that hypothesis. We examined results of PPA (Positive Percent Agreement, sensitivity) and NPA (Negative Percent Agreement, specificity) from 73 individual laboratory experiments for molecular tests for SARS-CoV-2 as reported to the FIND database (1), and for selected methods in FDA EUA submissions (2,3). We calculated likelihood ratios to convert pre-test to post-test probability of disease, then further calculated the number of true and false results expected in every ten positive or negative test results, plus an estimate that one in ‘x’ test results is true. We designed an online calculator to create graphics and text to fulfill the objective.
A positive or negative test result from one laboratory conveys a higher probability for the presence or absence of COVID-19 than the same result from another laboratory, depending on clinical pre-test probability of disease plus proven method PPA and NPA in each laboratory. Likelihood ratios and confidence intervals provide valuable information but are seldom used in clinical settings. We recommend that testing laboratories verify PPA and NPA, and utilize a tool such as the “Clinician’s Probability Calculator” to verify acceptable test performance and create reports to help guide clinicians and public health staff with estimation of post-test probability of COVID-19.
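
The conversion from pre-test probability to expected true and false results per ten tests can be sketched as follows. This is a hedged illustration, not the authors' online calculator: the function name and the example values are hypothetical, and the post-test probabilities are computed with Bayes' theorem directly, which is algebraically equivalent to the likelihood-ratio route and avoids dividing by zero when NPA is 100%.

```python
def results_per_ten(pretest, ppa, npa):
    """Expected true/false results in every ten positive or negative results.

    pretest: clinical pre-test probability of infection (0..1)
    ppa:     positive percent agreement (sensitivity), as a fraction
    npa:     negative percent agreement (specificity), as a fraction
    """
    true_pos = pretest * ppa
    false_pos = (1 - pretest) * (1 - npa)
    true_neg = (1 - pretest) * npa
    false_neg = pretest * (1 - ppa)

    prob_infected_if_pos = true_pos / (true_pos + false_pos)
    prob_clear_if_neg = true_neg / (true_neg + false_neg)

    return {
        "true_per_10_positives": 10 * prob_infected_if_pos,
        "false_per_10_positives": 10 * (1 - prob_infected_if_pos),
        "true_per_10_negatives": 10 * prob_clear_if_neg,
        "false_per_10_negatives": 10 * (1 - prob_clear_if_neg),
    }


# Hypothetical example: the same method at two different pre-test probabilities.
for pretest in (0.05, 0.50):
    summary = results_per_ten(pretest, ppa=0.95, npa=0.98)
    print(pretest, {k: round(v, 1) for k, v in summary.items()})
```

Running the same PPA and NPA at different pre-test probabilities shows why an identical result can carry a different weight depending on the clinical setting.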


Nutrients ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 640
Author(s):  
Carlo Caffarelli ◽  
Carla Mastrorilli ◽  
Angelica Santoro ◽  
Massimo Criscione ◽  
Michela Procaccianti

Hazelnuts commonly elicit allergic reactions starting from childhood and adolescence, with a rare resolution over time. The definite diagnosis of a hazelnut allergy relies on an oral food challenge. The role of component resolved diagnostics in reducing the need for oral food challenges in the diagnosis of hazelnut allergies is still debated. Therefore, three electronic databases were systematically searched for studies on the diagnostic accuracy of specific-IgE (sIgE) on hazelnut proteins for identifying children with a hazelnut allergy. Studies regarding IgE testing on at least one hazelnut allergen component in children whose final diagnosis was determined by oral food challenges or a suggestive history of serious symptoms due to a hazelnut allergy were included. Study quality was assessed by the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Eight studies enrolling 757 children were identified. Overall, sensitivity, specificity, area under the curve and diagnostic odds ratio of Cor a 1 sIgE were lower than those of Cor a 9 and Cor a 14 sIgE. When the test results were positive, the post-test probability of a hazelnut allergy was 34% for Cor a 1 sIgE, 60% for Cor a 9 sIgE and 73% for Cor a 14 sIgE. When the test results were negative, the post-test probability of a hazelnut allergy was 55% for Cor a 1 sIgE, 16% for Cor a 9 sIgE and 14% for Cor a 14 sIgE. Measurement of IgE levels to Cor a 9 and Cor a 14 might have the potential to improve specificity in detecting clinically tolerant children among hazelnut-sensitized ones, reducing the need to perform oral food challenges.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 369
Author(s):  
Wouter Aukema ◽  
Bobby Rajesh Malhotra ◽  
Simon Goddek ◽  
Ulrike Kämmerer ◽  
Peter Borger ◽  
...  

The performance of diagnostic tests crucially depends on the disease prevalence, test sensitivity, and test specificity. However, these quantities are often not well known when tests are performed outside defined routine lab procedures, which makes the rating of the test results somewhat problematic. A current example is the mass testing taking place within the context of the world-wide SARS-CoV-2 crisis. Here, for the first time in history, laboratory test results have a dramatic impact on political decisions. Therefore, transparent, comprehensible, and reliable data are mandatory. It is in the nature of wet lab tests that their quality and outcome are influenced by multiple factors that reduce their performance, such as handling procedures, underlying test protocols, and analytical reagents. These limitations in sensitivity and specificity have to be taken into account when calculating the real test results. As a resolution method, we have developed a Bayesian calculator, the Bayes Lines Tool (BLT), for analyzing disease prevalence, test sensitivity, test specificity, and, therefore, true positive, false positive, true negative, and false negative numbers from official test outcome reports. The calculator performs a simple SQL (Structured Query Language) query and can easily be implemented on any system supporting SQL. We provide an example of influenza test results from California, USA, as well as two examples of SARS-CoV-2 test results from official government reports from The Netherlands and Germany-Bavaria, to illustrate the possible parameter space of prevalence, sensitivity, and specificity consistent with the observed data. Finally, we discuss this tool’s multiple applications, including its putative importance for informing policy decisions.
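
The idea of a parameter space consistent with observed data can be illustrated with a small grid search. This is not the Bayes Lines Tool or its SQL query. It is a hypothetical Python sketch that assumes only the total number of tests and the number of positives are reported, and it keeps every grid combination of prevalence, sensitivity, and specificity whose expected positive count matches the observed one within a tolerance.

```python
def consistent_parameters(n_tests, n_positive, steps=20, tolerance=0.5):
    """Enumerate (prevalence, sensitivity, specificity) grid points whose
    expected number of positives lies within `tolerance` of `n_positive`."""
    grid = [i / steps for i in range(steps + 1)]
    matches = []
    for prevalence in grid:
        for sensitivity in grid:
            for specificity in grid:
                true_pos = n_tests * prevalence * sensitivity
                false_pos = n_tests * (1 - prevalence) * (1 - specificity)
                if abs(true_pos + false_pos - n_positive) <= tolerance:
                    matches.append({
                        "prevalence": prevalence,
                        "sensitivity": sensitivity,
                        "specificity": specificity,
                        "TP": true_pos,
                        "FP": false_pos,
                        "FN": n_tests * prevalence * (1 - sensitivity),
                        "TN": n_tests * (1 - prevalence) * specificity,
                    })
    return matches


# Hypothetical report: 1000 tests, 100 positives.
candidates = consistent_parameters(1000, 100)
print(len(candidates), "parameter combinations are consistent with the report")
for row in candidates[:5]:
    print(row)
```

Many different combinations reproduce the same positive count, which is why a reported number of positives alone does not pin down prevalence without knowledge of sensitivity and specificity.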


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 761
Author(s):  
Andrianto Andrianto ◽  
Ni Made Mertaniasih ◽  
Parama Gandi ◽  
Makhyan Jibril Al-Farabi ◽  
Yusuf Azmi ◽  
...  

Introduction: Xpert MTB/RIF is a rapid diagnostic instrument for pulmonary tuberculosis (TB). However, studies reported varied accuracy of Xpert MTB/RIF in detecting Mycobacterium tuberculosis in pericardial effusion. Methods: We performed a systematic review of literature in PubMed, published up to February 1, 2020, according to PRISMA guidelines. We screened cross-sectional studies, observational cohort studies, and randomized control trials that evaluated the accuracy of Xpert MTB/RIF in diagnosing TB pericarditis. Papers with noninterpretable results of sensitivity and specificity, non-English articles, and unpublished studies were excluded. The primary outcomes were the sensitivity and specificity of Xpert MTB/RIF. We conducted a quality assessment using QUADAS-2 to evaluate the quality of the studies. A bivariate model pooled the overall sensitivity, specificity, positive likelihood ratios (PLRs), and negative likelihood ratios (NLRs) of included studies. Results: In total, 581 subjects from nine studies were analyzed in this meta-analysis. Our pooled analysis showed that the overall sensitivity, specificity, PLRs and NLRs of included studies were 0.676 (95% CI: 0.580–0.759), 0.994 (95% CI: 0.919–1.000), 110.11 (95% CI: 7.65–1584.57) and 0.326 (95% CI: 0.246–0.433), respectively. Conclusions: Xpert MTB/RIF had a robust specificity but unsatisfactory sensitivity in diagnosing TB pericarditis. These findings indicated that although positive Xpert MTB/RIF test results might be valuable in swiftly distinguishing the diagnosis of TB pericarditis, negative test results might not be able to rule out TB pericarditis. Registration: PROSPERO CRD42020167480 28/04/2020


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Karin Due Bruun ◽  
Hanne Irene Jensen ◽  
Morten Rune Blichfeldt-Eckhardt ◽  
Henrik Bjarke Vaegter ◽  
Palle Toft ◽  
...  

Abstract Objectives With the International Classification of Diseases 11th revision (classifying fibromyalgia as a primary pain disorder) soon to be implemented, the importance of pain physicians being able to identify patients with fibromyalgia is emphasized. The diagnostic criteria proposed in 2016 are based on self-reported pain distribution and symptom severity. The study aimed to evaluate the diagnostic accuracy of the 2016 diagnostic criteria for fibromyalgia applied in a population of patients with high impact chronic pain referred for pain rehabilitation. Methods The study was performed as a diagnostic accuracy study at two Danish interdisciplinary pain rehabilitation centers, including 215 participants. All participants were evaluated clinically to identify patients with fibromyalgia. The diagnosis was based on expert opinion, but the minimum requirements were: (1) pain in all four body quadrants and axially for at least three months and (2) minimum 8 of 18 positive tender points. Participants filled in the fibromyalgia survey questionnaire, the patient version of the 2016 diagnostic criteria. Sensitivity, specificity, likelihood ratios, and positive and negative post-test probabilities were calculated using a clinical diagnosis of fibromyalgia as the reference standard. Results Based on clinical diagnosis 45% of the participants were diagnosed with fibromyalgia; of these, only 19% had been diagnosed previously. The 2016 diagnostic criteria demonstrated a sensitivity of 88.5%, a specificity of 81.5%, a positive likelihood ratio of 4.79, a negative likelihood ratio of 0.14, a positive post-test probability of 79.4%, and a negative post-test probability of 10.2%. Conclusions Fibromyalgia was severely under-diagnosed among patients with high impact chronic pain referred to tertiary care in two pain rehabilitation centers in Denmark. 
The 2016 diagnostic criteria showed sufficient discriminatory properties suggesting that the fibromyalgia survey questionnaire can be used as a screening tool assisting the identification of fibromyalgia in this patient population.
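
The likelihood ratios and post-test probabilities in this abstract follow from the reported sensitivity, specificity, and prevalence. The sketch below recomputes them from the rounded figures given in the abstract (88.5%, 81.5%, and 45%), so small differences from the published values are expected. The function name is illustrative.

```python
def likelihood_ratios_and_post_test(sensitivity, specificity, pretest):
    """Return (LR+, LR-, post-test probability after a positive result,
    post-test probability after a negative result)."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    pretest_odds = pretest / (1 - pretest)
    odds_pos = pretest_odds * lr_pos
    odds_neg = pretest_odds * lr_neg
    return lr_pos, lr_neg, odds_pos / (1 + odds_pos), odds_neg / (1 + odds_neg)


# Rounded inputs taken from the abstract.
lr_pos, lr_neg, post_pos, post_neg = likelihood_ratios_and_post_test(
    sensitivity=0.885, specificity=0.815, pretest=0.45
)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"post-test probability, positive result: {post_pos:.1%}")
print(f"post-test probability, negative result: {post_neg:.1%}")
```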

