Too good to be true: when overwhelming evidence fails to convince

Author(s):  
Lachlan J. Gunn ◽  
François Chapeau-Blondeau ◽  
Mark D. McDonnell ◽  
Bruce R. Davis ◽  
Andrew Allison ◽  
...  

Is it possible for a large sequence of measurements or observations that support a hypothesis to counterintuitively decrease our confidence? Can unanimous support be too good to be true? The assumption of independence is often made in good faith; however, consideration is rarely given to whether a systemic failure has occurred. Taking this into account can cause certainty in a hypothesis to decrease as the evidence for it becomes apparently stronger. In this paper, we investigate the effects of small error rates in a set of measurements or observations, performing a probabilistic Bayesian analysis of this effect with examples based on (i) archaeological evidence, (ii) weighing of legal evidence, and (iii) cryptographic primality testing. We find that even with very low systemic failure rates, high confidence is surprisingly difficult to achieve; in particular, we find that certain analyses of cryptographically important numerical tests are highly optimistic, underestimating their false-negative rate by as much as a factor of 2^80.
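
A minimal sketch of the effect, under an assumed toy model rather than the authors' exact analysis: each observation supports the hypothesis with probability p when it is true and f when it is false, but with probability q the measurement process has systemically failed and reports unanimous support regardless of the truth. As the run of unanimous observations grows, the failure explanation dominates and the posterior falls back toward the prior.

```python
# Toy Bayesian model of unanimous evidence with possible systemic failure.
# p, f, q are illustrative values, not taken from the paper.
def posterior(n, prior=0.5, p=0.9, f=0.01, q=0.01):
    """P(hypothesis | n unanimous supporting observations)."""
    like_true  = (1 - q) * p**n + q   # healthy process, or failed-and-unanimous
    like_false = (1 - q) * f**n + q
    return like_true * prior / (like_true * prior + like_false * (1 - prior))

for n in (1, 3, 5, 10, 20, 50):
    print(n, round(posterior(n), 4))
# Confidence peaks after only a few unanimous observations, then decays toward
# the prior: more unanimity eventually signals systemic failure, not truth.
```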

2018 ◽  
Vol 25 (12) ◽  
pp. 1618-1625 ◽  
Author(s):  
George Hripcsak ◽  
Matthew E Levine ◽  
Ning Shang ◽  
Patrick B Ryan

Abstract Objective: To study the effect on patient cohorts of mapping condition (diagnosis) codes from source billing vocabularies to a clinical vocabulary. Materials and Methods: Nine International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) concept sets were extracted from eMERGE network phenotypes, translated to Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT) concept sets, and applied to patient data that were mapped from source ICD9-CM and ICD10-CM codes to SNOMED CT codes using Observational Health Data Sciences and Informatics (OHDSI) Observational Medical Outcomes Partnership (OMOP) vocabulary mappings. The original ICD9-CM concept set and a concept set extended to ICD10-CM were used to create patient cohorts that served as gold standards. Results: Four phenotype concept sets translated to SNOMED CT without ambiguity and performed perfectly with respect to the gold standards. The other 5 lost performance when 2 or more ICD9-CM or ICD10-CM codes mapped to the same SNOMED CT code. The patient cohorts had a total error (false positive and false negative) of up to 0.15% compared to querying ICD9-CM source data and up to 0.26% compared to querying ICD9-CM and ICD10-CM data. Knowledge engineering was required to produce that performance; simple automated methods to generate concept sets had errors up to 10% (one outlier at 250%). Discussion: The translation of data from source vocabularies to SNOMED CT resulted in very small error rates, an order of magnitude smaller than other error sources. Conclusion: It appears possible to map diagnoses from disparate vocabularies to a single clinical vocabulary and carry out research using a single set of definitions, thus improving efficiency and transportability of research.
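
The failure mode is easy to see in miniature: when two source codes collapse onto one clinical concept, a concept set defined at the clinical level can no longer distinguish them. A hypothetical sketch (the codes and mappings below are invented, not OMOP's actual mappings):

```python
# Hypothetical illustration of how many-to-one vocabulary mappings introduce
# cohort errors; the codes and concepts below are made up for illustration.
icd_to_snomed = {
    "250.00": "DM2",   # type 2 diabetes  -> concept "DM2"
    "250.01": "DM1",   # type 1 diabetes  -> concept "DM1"
    "648.00": "DM2",   # a different ICD code collapses onto the same concept
}

phenotype_icd    = {"250.00"}          # source-vocabulary concept set
phenotype_snomed = {icd_to_snomed[c] for c in phenotype_icd}

patients = {"A": ["250.00"], "B": ["648.00"], "C": ["250.01"]}

icd_cohort    = {p for p, codes in patients.items() if set(codes) & phenotype_icd}
snomed_cohort = {p for p, codes in patients.items()
                 if {icd_to_snomed[c] for c in codes} & phenotype_snomed}

print(icd_cohort)                 # {'A'}       -- gold standard cohort
print(snomed_cohort)              # {'A', 'B'}  -- 'B' is a false positive
print(snomed_cohort - icd_cohort) # {'B'}
```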


2015 ◽  
Vol 36 (6) ◽  
pp. 3671 ◽  
Author(s):  
Gilberto Rodrigues Liska ◽  
Fortunato Silva de Menezes ◽  
Marcelo Angelo Cirillo ◽  
Flávio Meira Borém ◽  
Ricardo Miguel Cortez ◽  
...  

Automatic classification methods have been widely used in numerous situations, and the boosting method has become known for constructing a classifier by repeatedly applying a base classification algorithm to reweighted versions of the training set. Given this characteristic, the aim of this study is to assess a sensory experiment related to acceptance tests of specialty coffees, with reference to both trained and untrained consumer groups. For both consumer groups, four sensory characteristics were evaluated, namely aroma, body, sweetness, and final score, attributed to four types of specialty coffees. In order to obtain a classification rule that discriminates trained from untrained tasters, we used conventional Fisher's Linear Discriminant Analysis (LDA) and discriminant analysis via the boosting algorithm (AdaBoost). The criteria used in the comparison of the two approaches were sensitivity, specificity, false-positive rate, false-negative rate, and accuracy. Additionally, to evaluate the performance of the classifiers, success and error rates were obtained by Monte Carlo simulation over 100 replicates of a random partition assigning 70% of the data to the training set and the remainder to the test set. It was concluded that the boosting method applied to discriminant analysis yielded a higher sensitivity with regard to the trained panel, at 80.63%, and hence a lower false-negative rate, at 19.37%. Thus, the boosting method may be used as a means of improving the LDA classifier for the discrimination of trained tasters.
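
A minimal sketch of this comparison using scikit-learn, with synthetic data standing in for the four sensory scores (aroma, body, sweetness, final score) and labels for trained (1) vs. untrained (0) tasters; the estimator settings are illustrative, not the paper's exact configuration:

```python
# LDA vs. AdaBoost sensitivity estimated over repeated random 70/30 splits.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic stand-in for the coffee data: 4 features, binary taster label.
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

def mc_sensitivity(model, n_reps=100):
    """Mean sensitivity (recall for class 1) over n_reps random partitions."""
    scores = []
    for rep in range(n_reps):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=0.7, stratify=y, random_state=rep)
        scores.append(recall_score(y_te, model.fit(X_tr, y_tr).predict(X_te)))
    return np.mean(scores)

print("LDA sensitivity:     ", mc_sensitivity(LinearDiscriminantAnalysis()))
print("AdaBoost sensitivity:", mc_sensitivity(AdaBoostClassifier(n_estimators=100)))
```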


2021 ◽  
Vol 8 (11) ◽  
Author(s):  
Yair Daon ◽  
Amit Huppert ◽  
Uri Obolski

Pooling is a method of simultaneously testing multiple samples for the presence of pathogens. Pooling of SARS-CoV-2 tests is increasing in popularity due to its high testing throughput. A popular scheme is Dorfman pooling: N individuals are tested simultaneously; if the pooled test is positive, each individual is then tested separately, and otherwise all are declared negative. Most analyses of the error rates of pooling schemes assume that including more than a single infected sample in a pooled test does not increase the probability of a positive outcome. We challenge this assumption with experimental data and suggest a novel and parsimonious probabilistic model for the outcomes of pooled tests. As an application, we analyse the false-negative rate (i.e. the probability of a negative result for an infected individual) of Dorfman pooling. We show that the false-negative rate under Dorfman pooling increases as the prevalence of infection decreases. However, low infection prevalence is exactly the condition under which Dorfman pooling achieves its highest throughput efficiency. We therefore urge the cautious use of pooling and the development of pooling schemes that correctly account for tests' error rates.
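
A Monte Carlo sketch of the effect, under an assumed pooled-sensitivity model (the functional form and parameters below are illustrative, not the paper's fitted model): when prevalence is low, positive pools usually contain a single infected sample, so the pool stage is at its least sensitive and more infections slip through.

```python
# Simulated false-negative rate of Dorfman pooling when a pool's sensitivity
# grows with the number of infected samples it contains.
import random

def pooled_sensitivity(k, se=0.85):
    # Probability a pool with k infected samples tests positive; illustrative
    # form in which each infected sample independently "triggers" the test.
    return 1 - (1 - se) ** k

def dorfman_fnr(prevalence, pool_size=8, se_individual=0.85, trials=200_000):
    misses, infected_total = 0, 0
    for _ in range(trials):
        pool = [random.random() < prevalence for _ in range(pool_size)]
        k = sum(pool)
        if k == 0:
            continue
        infected_total += k
        if random.random() < pooled_sensitivity(k):
            # Pool positive: each individual is retested separately.
            misses += sum(1 for inf in pool
                          if inf and random.random() >= se_individual)
        else:
            misses += k   # pool missed: every infected member declared negative
    return misses / infected_total

for prev in (0.001, 0.01, 0.05):
    print(f"prevalence={prev:5.3f}  FNR≈{dorfman_fnr(prev):.3f}")
# The printed FNR is highest at the lowest prevalence, as the paper argues.
```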


2017 ◽  
Vol 11 (3-4) ◽  
pp. 118 ◽  
Author(s):  
Rashid Khalid Sayyid ◽  
Dharmendra Dingar ◽  
Katherine Fleshner ◽  
Taylor Thorburn ◽  
Joshua Diamond ◽  
...  

Introduction: Repeat prostate biopsies in active surveillance patients are associated with significant complications. Novel imaging and blood/urine-based non-invasive tests are being developed to better predict disease grade and volume progression. We conducted a theoretical study to determine what performance characteristics and costs a non-invasive test would require for patients and their physicians to comfortably forgo biopsy. Methods: Surveys were administered to two populations to determine an acceptable false-negative rate and cost for such a test. Active surveillance patients were recruited at the time of followup in clinic at Princess Margaret Cancer Centre. Physician members of the Society of Urological Oncology were targeted via an online survey. Participants were questioned about their demographics and other characteristics that might influence chosen error rates and cost. Results: 136 patients and 670 physicians were surveyed, with 130 (95.6%) and 104 (15.5%) responses obtained, respectively. A vast majority of patients (90.6%) were comfortable with a non-invasive test in place of biopsy, with 64.8% accepting a false-negative rate of 5‒20%. Most physicians (93.3%) were comfortable with a non-invasive test, with 77.9% accepting a rate of 5‒20%. Most patients and physicians felt that a cost of less than $1000 per administration would be reasonable. Conclusions: Most patients and physicians are comfortable with a non-invasive test. Although a 5% error rate seems acceptable to many, a substantial subset feels that a negative predictive value of 99% or higher is required. A personalized approach with shared decision-making between patients and physicians is thus essential to optimize patient care in such situations.


1997 ◽  
Vol 31 (3) ◽  
pp. 391-397 ◽  
Author(s):  
Fiona K. Judd ◽  
Alexandra Cockram ◽  
Anne Mijch ◽  
Dean McKenzie

Objective: To provide an overview of the work of a liaison psychiatry service to an HIV/AIDS inpatient unit, and particularly to examine the identification of mood and related disorders by referring doctors. Method: The MICRO-CARES prospective clinical database system was used to obtain data on all patients referred to the HIV/AIDS consultation–liaison psychiatry service in an infectious diseases hospital in Melbourne. Results: Three hundred and ninety-two inpatient referrals were made in the 2 years from 1993 to 1995, a referral rate of 16.7%. The most frequent reasons for referral were evaluation of coping problems (42%), assessment of possible depression (31%), and assessment of psychotropic medication (24.5%). The most common psychiatric diagnoses were mood disorders (36.5%), psychoactive substance use disorders (22.7%) and organic mental disorders (18.1%). Overall concordance between recognition of depression by the referring doctor and diagnosis of depression by the consultant psychiatrist was 79%, with a 20% false-positive rate and a 23% false-negative rate. Conclusions: Psychiatric comorbidity is common in patients with HIV/AIDS. Reasons for referral differ from those seen in other inpatient settings. Previously noted problems, such as the misdiagnosis of psychiatric disorder and the mislabelling as depression of the syndrome recognised by psychiatrists, were also noted here.
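
For readers unfamiliar with how these three figures fit together, a small sketch with a hypothetical 2x2 agreement table, reconstructed only to roughly match the reported rates (the study's actual cell counts are not given in the abstract):

```python
# Concordance, false-positive and false-negative rates from a 2x2 agreement
# table (referring doctor's recognition vs. psychiatrist's diagnosis).
tp, fp = 61, 16   # doctor flagged depression: psychiatrist agreed / disagreed
fn, tn = 18, 64   # doctor missed a depression / both negative

total = tp + fp + fn + tn
concordance = (tp + tn) / total      # overall agreement: ~79%
fpr = fp / (fp + tn)                 # flagged but not depressed: ~20%
fnr = fn / (fn + tp)                 # depressed but not flagged: ~23%

print(f"concordance={concordance:.0%}, FPR={fpr:.0%}, FNR={fnr:.0%}")
```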


2021 ◽  
Author(s):  
Asher Cook

Electronic bioacoustic techniques are providing new and effective ways of monitoring birds and have a number of advantages over traditional monitoring methods. Given the increasing popularity of bioacoustic methods, and the difficulties associated with automated analyses (e.g. high Type I error rates), it is important that the most effective ways of scoring audio recordings are investigated. In Chapter Two I describe a novel sub-sampling and scoring technique (the ‘10 in 60 sec’ method) which estimates the vocal conspicuousness of bird species through repeated presence-absence counts, and compare its performance with a current manual method. The ‘10 in 60 sec’ approach reduced variability in estimates of vocal conspicuousness, significantly increased the number of species detected per count and reduced temporal autocorrelation. I propose that the ‘10 in 60 sec’ method will have greater overall ability to detect changes in underlying birdsong parameters and hence provide more informative data to scientists and conservation managers.

It is often anecdotally suggested that forests ‘fall silent’ and are devoid of birdsong following aerial 1080 operations. However, it is difficult to objectively assess the validity of this claim without quantitative information that addresses it specifically. In Chapter Three I therefore applied the methodological framework outlined in Chapter Two to answer a controversial conservation question: do New Zealand forests ‘fall silent’ after aerial 1080 operations? At the community level I found no evidence for a reduction in birdsong after the 1080 operation, and eight of the nine bird taxa showed no evidence for a decline in vocal conspicuousness. Only one species, tomtit (Petroica macrocephala), showed evidence for a decline in vocal conspicuousness, though this effect was non-significant after applying a correction for multiple tests.

In Chapter Four I used tomtits as a case study to compare manual and automated approaches, in order to: (1) estimate vocal conspicuousness and (2) determine the feasibility of using an automated detector on a New Zealand passerine. I found that data from the automated method were significantly positively correlated with the manual method, although the relationship was not particularly strong (Pearson’s r = 0.62, P < 0.0001). The automated method suffered from a relatively high false-negative rate, and the data it produced did not reveal a decline in tomtit call rates following the 1080 drop. Given the relatively poor performance of the automated method, I propose that the automatic detector developed in this thesis requires further refinement before it is suitable for answering management-level questions for tomtit populations. However, as pattern recognition technology continues to improve, automated methods are likely to become more viable in the future.
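
A minimal sketch of how a ‘10 in 60 sec’-style score might be computed, as read from the abstract (the thesis' exact scoring rules are assumed, not quoted): split each one-minute recording into ten 6-second subsamples, record presence or absence of the species in each, and take the proportion of subsamples with a detection.

```python
# '10 in 60 sec'-style vocal-conspicuousness score for one 60 s recording.
from typing import List, Tuple

def ten_in_sixty(detections: List[Tuple[float, float]]) -> float:
    """detections: (start, end) times in seconds of calls within a 60 s clip."""
    present = [False] * 10
    for start, end in detections:
        for i in range(10):
            lo, hi = 6.0 * i, 6.0 * (i + 1)
            if start < hi and end > lo:   # call overlaps this 6 s subsample
                present[i] = True
    return sum(present) / 10.0

# One call at 3-5 s and one at 40-47 s -> detected in 3 of 10 subsamples.
print(ten_in_sixty([(3, 5), (40, 47)]))   # 0.3
```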


2009 ◽  
Vol 14 (3) ◽  
pp. 230-238 ◽  
Author(s):  
Xiaohua Douglas Zhang ◽  
Shane D. Marine ◽  
Marc Ferrer

For hit selection in genome-scale RNAi research, we do not want to miss small interfering RNAs (siRNAs) with large effects; at the same time, we do not want to include siRNAs with small or no effects in the list of selected hits. There is therefore a strong need to control both the false-negative rate (FNR), in which siRNAs with large effects are not selected as hits, and the restricted false-positive rate (RFPR), in which siRNAs with no or small effects are selected as hits. An error control method based on the strictly standardized mean difference (SSMD) has been proposed to maintain a flexible and balanced control of FNR and RFPR. In this article, the authors illustrate how to maintain a balanced control of both FNR and RFPR using the plot of error rate versus SSMD, and how to maintain high power using the plot of power versus SSMD, in RNAi high-throughput screening experiments. There are close relationships among FNR, RFPR, Type I and II errors, and power; understanding the differences and links among these concepts is essential to using statistical terminology correctly and effectively in the analysis of genome-scale RNAi screens, and the authors explore these differences and links here. (Journal of Biomolecular Screening 2009:230-238)
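
A minimal sketch of the statistic the error control is built on: SSMD for an siRNA well versus a negative-control reference is the mean difference divided by the standard deviation of the difference. The data and hit threshold below are illustrative, not the paper's.

```python
# SSMD (strictly standardized mean difference) for two independent groups.
import numpy as np

def ssmd(sample: np.ndarray, control: np.ndarray) -> float:
    # SSMD = (mean difference) / (std of the difference), assuming independence.
    return (sample.mean() - control.mean()) / np.sqrt(sample.var(ddof=1)
                                                      + control.var(ddof=1))

rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 16)
strong  = rng.normal(3.0, 1.0, 16)   # siRNA with a large effect
weak    = rng.normal(0.5, 1.0, 16)   # siRNA with a small effect

for name, well in [("strong", strong), ("weak", weak)]:
    beta = ssmd(well, control)
    print(f"{name}: SSMD={beta:.2f}  hit={beta >= 1.645}")  # illustrative cutoff
```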


Methodology ◽  
2019 ◽  
Vol 15 (3) ◽  
pp. 97-105
Author(s):  
Rodrigo Ferrer ◽  
Antonio Pardo

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform with respect to false negatives. For this purpose, we simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of false positives. Our results reveal unacceptable rates of false negatives even for effects of very large size, ranging from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Our results therefore suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. We also offer some considerations regarding effect size and the commonly used cut-off points that allow more precise estimates.
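
A Monte Carlo sketch of how such a false-negative rate arises for one distribution-based criterion, a Jacobson-Truax style reliable change index (RCI); the paper evaluates five methods across many scenarios, and the parameters here are illustrative only.

```python
# Simulated FNR of an RCI criterion when every case has truly changed.
import numpy as np

rng = np.random.default_rng(1)
n, effect_size, reliability, sd = 100_000, 2.0, 0.8, 1.0

sem = sd * np.sqrt(1 - reliability)        # standard error of measurement
s_diff = np.sqrt(2) * sem                  # SE of a pre-post difference score

true = rng.normal(0.0, sd * np.sqrt(reliability), n)
pre  = true + rng.normal(0.0, sem, n)
post = true + effect_size * sd + rng.normal(0.0, sem, n)  # all truly changed

rci = (post - pre) / s_diff
fnr = np.mean(np.abs(rci) < 1.96)          # true changes the criterion misses
print(f"false-negative rate: {fnr:.1%}")   # nonzero even at an effect size of 2.0
```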


Author(s):  
Brian M. Katt ◽  
Casey Imbergamo ◽  
Fortunato Padua ◽  
Joseph Leider ◽  
Daniel Fletcher ◽  
...  

Abstract Introduction: There is a known false-negative rate when using electrodiagnostic studies (EDS) to diagnose carpal tunnel syndrome (CTS). This can pose a management dilemma for patients whose signs and symptoms correlate with CTS but whose EDS are normal. While corticosteroid injection into the carpal tunnel has been used in this setting for diagnostic purposes, there is little data in the literature supporting this practice. The purpose of this study is to evaluate the prognostic value of a carpal tunnel corticosteroid injection in patients with a normal electrodiagnostic study but signs and symptoms suggestive of carpal tunnel syndrome who proceed with a carpal tunnel release. Materials and Methods: The group included 34 patients presenting to an academic orthopedic practice over the years 2010 to 2019 who had negative EDS, a carpal tunnel corticosteroid injection, and a carpal tunnel release. One patient (2.9%), for whom the response to the corticosteroid injection was not documented, was excluded from the study, yielding a study cohort of 33 patients. Three patients had bilateral disease, yielding 36 hands for evaluation. Statistical analysis was performed using chi-square analysis for nonparametric data. Results: Thirty-two hands (88.9%) demonstrated complete or partial relief of neuropathic symptoms after the corticosteroid injection, while four (11.1%) did not experience any improvement. Thirty-one hands (86.1%) had symptom improvement following surgery, compared with five (13.9%) which did not. Of the 32 hands that demonstrated relief following the injection, 29 (90.6%) improved after surgery. Of the four hands that did not demonstrate relief after the injection, two (50%) improved after surgery. This difference was statistically significant (p = 0.03). Conclusion: Patients with a high index of suspicion for CTS do well with operative intervention despite a normal electrodiagnostic test if they have had a positive response to a preoperative injection. The injection can provide reassurance to both the patient and surgeon before proceeding to surgery. Although patients with a normal electrodiagnostic test and no response to cortisone can still do well with surgical intervention, the surgeon should carefully review both the history and physical examination, as surgical success may decrease when both diagnostic tests are negative. A corticosteroid injection is an additional diagnostic tool to consider in the management of patients with CTS and normal electrodiagnostic testing.
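
A quick recomputation of the reported 2x2 comparison from the counts in the abstract; an uncorrected chi-square roughly reproduces the reported p = 0.03, and with cells this small Fisher's exact test is a common alternative worth showing alongside.

```python
# Injection response vs. surgical improvement, counts from the abstract.
from scipy.stats import chi2_contingency, fisher_exact

#            improved  not improved   (after carpal tunnel release)
table = [[29, 3],    # relief after injection (32 hands)
         [ 2, 2]]    # no relief after injection (4 hands)

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square: {chi2:.2f}, p = {p:.3f}")

odds_ratio, p_exact = fisher_exact(table)
print(f"Fisher exact p = {p_exact:.3f}")
```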

