Preprint of Too good to be false: Nonsignificant results revisited

Due to its probabilistic nature, Null Hypothesis Significance Testing (NHST) is subject to decision errors. The concern for false positives has overshadowed the concern for false negatives in the recent debates in psychology. This is unwarranted, since reported statistically nonsignificant findings may just be 'too good to be false'. We examined evidence for false negatives in nonsignificant results in three different ways. We adapted the Fisher method to detect the presence of at least one false negative in a set of statistically nonsignificant results. Simulations show that the adapted Fisher method generally is a powerful method to detect false negatives. We examined evidence for false negatives in the psychology literature in three applications of the adapted Fisher method. These applications indicate that (i) the observed effect size distribution of nonsignificant effects exceeds the expected distribution assuming a null-effect, and approximately two out of three (66.7%) psychology articles reporting nonsignificant results contain evidence for at least one false negative, (ii) nonsignificant results on gender effects contain evidence of true nonzero effects, and (iii) the statistically nonsignificant replications from the Reproducibility Project Psychology (RPP) do not warrant conclusions about the absence or presence of true zero effects underlying these nonsignificant results. We conclude that false negatives deserve more attention in the current debate on statistical practices in psychology. Neglecting effects due to a lack of statistical power can lead to a waste of research resources and stifle the scientific discovery process.
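The abstract above does not spell out the adapted Fisher method, so the following is only a minimal sketch under a stated assumption: each nonsignificant p-value is rescaled to the unit interval as p* = (p - alpha) / (1 - alpha), and the rescaled values are combined with the usual Fisher statistic, chi-square = -2 * sum(ln p*), on 2k degrees of freedom. The function names `adapted_fisher` and `chi2_sf_even_df` are hypothetical.

```python
import math

def chi2_sf_even_df(x, k):
    """Upper-tail probability of a chi-square variable with 2k degrees of freedom.

    For even degrees of freedom the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

def adapted_fisher(p_values, alpha=0.05):
    """Combine nonsignificant p-values (assumed rescaling, see lead-in)."""
    nonsig = [p for p in p_values if p > alpha]
    if not nonsig:
        raise ValueError("no nonsignificant p-values to combine")
    # Rescale so that, under a true null effect, the values are uniform on (0, 1].
    rescaled = [(p - alpha) / (1.0 - alpha) for p in nonsig]
    chi2 = -2.0 * sum(math.log(p) for p in rescaled)
    k = len(nonsig)
    return chi2, 2 * k, chi2_sf_even_df(chi2, k)

chi2, df, p_combined = adapted_fisher([0.06, 0.08, 0.11, 0.40])
print(f"chi2 = {chi2:.2f}, df = {df}, combined p = {p_combined:.4f}")
```

A small combined p-value would be read as evidence that at least one of the nonsignificant results is a false negative; it does not say which one.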


CORP: Minimizing the chances of false positives and false negatives

Journal of Applied Physiology ◽

10.1152/japplphysiol.00937.2016 ◽

2017 ◽

Vol 122 (1) ◽

pp. 91-95 ◽

Cited By ~ 16

Author(s):

Douglas Curran-Everett

Keyword(s):

Hypothesis Testing ◽

Experimental Technique ◽

False Positive ◽

Scientific Discovery ◽

False Negative ◽

False Positives ◽

Experimental Result ◽

False Negatives ◽

Scientific Equipment ◽

Recent Initiative

Statistics is essential to the process of scientific discovery. An inescapable tenet of statistics, however, is the notion of uncertainty which has reared its head within the arena of reproducibility of research. The Journal of Applied Physiology’s recent initiative, “Cores of Reproducibility in Physiology,” is designed to improve the reproducibility of research: each article is designed to elucidate the principles and nuances of using some piece of scientific equipment or some experimental technique so that other researchers can obtain reproducible results. But other researchers can use some piece of equipment or some technique with expert skill and still fail to replicate an experimental result if they neglect to consider the fundamental concepts of statistics of hypothesis testing and estimation and their inescapable connection to the reproducibility of research. If we want to improve the reproducibility of our research, then we want to minimize the chance that we get a false positive and—at the same time—we want to minimize the chance that we get a false negative. In this review I outline strategies to accomplish each of these things. These strategies are related intimately to fundamental concepts of statistics and the inherent uncertainty embedded in them.


Clinically Meaningful Change

Methodology ◽

10.1027/1614-2241/a000168 ◽

2019 ◽

Vol 15 (3) ◽

pp. 97-105

Author(s):

Rodrigo Ferrer ◽

Antonio Pardo

Keyword(s):

Effect Size ◽

False Negative ◽

False Negative Rate ◽

Point Of View ◽

Skewed Distribution ◽

Effect Sizes ◽

False Negatives ◽

Large Size ◽

Before And After ◽

Post Test

Abstract. In a recent paper, Ferrer and Pardo (2014) tested several distribution-based methods designed to assess when test scores obtained before and after an intervention reflect a statistically reliable change. However, we still do not know how these methods perform from the point of view of false negatives. For this purpose, we have simulated change scenarios (different effect sizes in a pre-post-test design) with distributions of different shapes and with different sample sizes. For each simulated scenario, we generated 1,000 samples. In each sample, we recorded the false-negative rate of the five distribution-based methods with the best performance from the point of view of the false positives. Our results have revealed unacceptable rates of false negatives even with effects of very large size, starting from 31.8% in an optimistic scenario (effect size of 2.0 and a normal distribution) to 99.9% in the worst scenario (effect size of 0.2 and a highly skewed distribution). Therefore, our results suggest that the widely used distribution-based methods must be applied with caution in a clinical context, because they need huge effect sizes to detect a true change. However, we made some considerations regarding the effect size and the cut-off points commonly used which allow us to be more precise in our estimates.


No evidence of false-negative Plasmodium falciparum rapid diagnostic results in Monrovia, Liberia

Malaria Journal ◽

10.1186/s12936-021-03774-3 ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Mandella King ◽

Alexander E. George ◽

Pau Cisteró ◽

Christine K. Tarr-Attia ◽

Beatriz Arregui ◽

...

Keyword(s):

Plasmodium Falciparum ◽

False Negative ◽

Malaria Diagnosis ◽

Rapid Diagnostic Tests ◽

Molecular Testing ◽

False Negatives ◽

Gene Deletions ◽

History Of ◽

Catholic Hospital ◽

Or History

Abstract Background Malaria diagnosis in many malaria-endemic countries relies mainly on the use of rapid diagnostic tests (RDTs). The majority of commercial RDTs used in Africa detect the Plasmodium falciparum histidine-rich protein 2 (PfHRP2). pfhrp2/3 gene deletions can therefore lead to false-negative RDT results. This study aimed to evaluate the frequency of PCR-confirmed, false-negative P. falciparum RDT results in Monrovia, Liberia. Methods PfHRP2-based RDT (Paracheck Pf®) and microscopy results from 1038 individuals with fever or history of fever (n = 951) and pregnant women at first antenatal care (ANC) visit (n = 87) enrolled in the Saint Joseph’s Catholic Hospital (Monrovia) from March to July 2019 were used to assess the frequency of false-negative RDT results. True–false negatives were confirmed by detecting the presence of P. falciparum DNA by quantitative PCR in samples from individuals with discrepant RDT and microscopy results. Samples that were positive by 18S rRNA qPCR but negative by PfHRP2-RDT were subjected to multiplex qPCR assay for detection of pfhrp2 and pfhrp3. Results One-hundred and eighty-six (19.6%) and 200 (21.0%) of the 951 febrile participants had a P. falciparum-positive result by RDT and microscopy, respectively. Positivity rate increased with age and the reporting of joint pain, chills and shivers, vomiting and weakness, and decreased with the presence of coughs and nausea. The positivity rate at first ANC visit was 5.7% (n = 5) and 8% (n = 7) by RDT and microscopy, respectively. Out of 207 Plasmodium infections detected by microscopy, 22 (11%) were negative by RDT. qPCR confirmed absence of P. falciparum DNA in the 16 RDT-negative but microscopy-positive samples which were available for molecular testing. Among the 14 samples that were positive by qPCR but negative by RDT and microscopy, 3 only amplified pfldh, and among these 3 all were positive for pfhrp2 and pfhrp3. 
Conclusion There is no qPCR-confirmed evidence of false-negative RDT results due to pfhrp2/pfhrp3 deletions in this study conducted in Monrovia (Liberia). This indicates that these deletions are not expected to affect the performance of PfHRP2-based RDTs for the diagnosis of malaria in Liberia. Nevertheless, active surveillance for the emergence of PfHRP2 deletions is required.


Testing Segmentation Popular Loss and Variations in Three Multiclass Medical Imaging Problems

Journal of Imaging ◽

10.3390/jimaging7020016 ◽

2021 ◽

Vol 7 (2) ◽

pp. 16

Author(s):

Pedro Furtado

Keyword(s):

False Negative ◽

Magnetic Resonance Images ◽

Medical Image Segmentation ◽

Cross Entropy ◽

Loss Functions ◽

Optic Disk ◽

False Negatives ◽

Convolutional Network ◽

Percentage Points ◽

Class Background

Image structures are segmented automatically using deep learning (DL) for analysis and processing. The three most popular base loss functions are cross entropy (crossE), intersect-over-the-union (IoU), and dice. Which should be used, is it useful to consider simple variations, such as modifying formula coefficients? How do characteristics of different image structures influence scores? Taking three different medical image segmentation problems (segmentation of organs in magnetic resonance images (MRI), liver in computer tomography images (CT) and diabetic retinopathy lesions in eye fundus images (EFI)), we quantify loss functions and variations, as well as segmentation scores of different targets. We first describe the limitations of metrics, since loss is a metric, then we describe and test alternatives. Experimentally, we observed that DeeplabV3 outperforms UNet and fully convolutional network (FCN) in all datasets. Dice scored 1 to 6 percentage points (pp) higher than cross entropy over all datasets, IoU improved 0 to 3 pp. Varying formula coefficients improved scores, but the best choices depend on the dataset: compared to crossE, different false positive vs. false negative weights improved MRI by 12 pp, and assigning zero weight to background improved EFI by 6 pp. Multiclass segmentation scored higher than n-uniclass segmentation in MRI by 8 pp. EFI lesions score low compared to more constant structures (e.g., optic disk or even organs), but loss modifications improve those scores significantly 6 to 9 pp. Our conclusions are that dice is best, it is worth assigning 0 weight to class background and to test different weights on false positives and false negatives.
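As a rough illustration of the scores named in this abstract, the sketch below computes Dice, IoU and binary cross entropy for flat binary masks. The abstract does not give the formula used for weighting false positives against false negatives; the Tversky index is swapped in here as one common way to do that, and it is an assumption, not the author's stated method. All function names are hypothetical.

```python
import math

def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives for binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn

def tversky(tp, fp, fn, w_fp=0.5, w_fn=0.5):
    """Tversky index: TP / (TP + w_fp * FP + w_fn * FN)."""
    denom = tp + w_fp * fp + w_fn * fn
    return tp / denom if denom else 1.0  # empty prediction and empty target agree

def dice(tp, fp, fn):
    """Dice score, i.e. the Tversky index with both weights set to 0.5."""
    return tversky(tp, fp, fn, 0.5, 0.5)

def iou(tp, fp, fn):
    """Intersection over union, i.e. the Tversky index with both weights set to 1."""
    return tversky(tp, fp, fn, 1.0, 1.0)

def binary_cross_entropy(probs, target, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for q, t in zip(probs, target):
        q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(q) + (1 - t) * math.log(1.0 - q)
    return total / len(target)

pred = [1, 0, 0, 0, 1]
target = [1, 1, 0, 1, 1]
tp, fp, fn = confusion_counts(pred, target)
print(dice(tp, fp, fn), iou(tp, fp, fn), tversky(tp, fp, fn, w_fp=0.3, w_fn=0.7))
```

Raising `w_fn` above `w_fp` penalises missed pixels more heavily than spurious ones, which is the kind of coefficient variation the abstract reports testing. A training loss would typically be one minus the score.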


Characteristics of patients who had a stroke not initially identified during emergency prehospital assessment: a systematic review

Emergency Medicine Journal ◽

10.1136/emermed-2020-209607 ◽

2021 ◽

pp. emermed-2020-209607

Author(s):

Stephanie P Jones ◽

Janet E Bray ◽

Josephine ME Gibson ◽

Graham McClelland ◽

Colette Miller ◽

...

Keyword(s):

Systematic Review ◽

Diagnostic Accuracy ◽

False Negative ◽

Visual Disturbance ◽

Screening Tools ◽

False Negatives ◽

Presenting Symptoms ◽

Ischaemic Attack ◽

Stroke Type ◽

Key Terms

Background Around 25% of patients who had a stroke do not present with typical ‘face, arm, speech’ symptoms at onset, and are challenging for emergency medical services (EMS) to identify. The aim of this systematic review was to identify the characteristics of acute stroke presentations associated with inaccurate EMS identification (false negatives). Method We performed a systematic search of MEDLINE, EMBASE, CINAHL and PubMed from 1995 to August 2020 using key terms: stroke, EMS, paramedics, identification and assessment. Studies included: patients who had a stroke or patient records; ≥18 years; any stroke type; prehospital assessment undertaken by health professionals including paramedics or technicians; data reported on prehospital diagnostic accuracy and/or presenting symptoms. Data were extracted and study quality assessed by two researchers using the Quality Assessment of Diagnostic Accuracy Studies V.2 tool. Results Of 845 studies initially identified, 21 observational studies met the inclusion criteria. Of the 6934 stroke and Transient Ischaemic Attack patients included, there were 1774 (26%) false negative patients (range from 4 (2%) to 247 (52%)). Commonly documented symptoms in false negative cases were speech problems (n=107; 13%–28%), nausea/vomiting (n=94; 8%–38%), dizziness (n=86; 23%–27%), changes in mental status (n=51; 8%–25%) and visual disturbance/impairment (n=43; 13%–28%). Conclusion Speech problems and posterior circulation symptoms were the most commonly documented symptoms among stroke presentations that were not correctly identified by EMS (false negatives). However, the addition of further symptoms to stroke screening tools requires valuation of subsequent sensitivity and specificity, training needs and possible overuse of high priority resources.
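The pooled false-negative share reported in this abstract can be checked with simple arithmetic. The sketch below uses only the two counts given in the text (1774 false negatives among 6934 patients); treating one minus that share as a pooled sensitivity is a naive assumption that ignores differences between the 21 studies.

```python
def false_negative_share(false_negatives, total_cases):
    """Fraction of true cases that were missed."""
    return false_negatives / total_cases

share = false_negative_share(1774, 6934)
print(f"false-negative share: {share:.1%}")              # 25.6%, reported as 26%
print(f"naive pooled sensitivity: {1 - share:.1%}")      # 74.4%
```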


Validity of negative bone biopsy in suspicious bone lesions

Acta Radiologica Open ◽

10.1177/20584601211030662 ◽

2021 ◽

Vol 10 (7) ◽

pp. 205846012110306

Author(s):

Mine B Lange ◽

Lars J Petersen ◽

Michael B Nielsen ◽

Helle D Zacho

Keyword(s):

Bone Biopsy ◽

False Negative ◽

Exclusion Criterion ◽

Bone Lesions ◽

False Negatives ◽

Patient Morbidity ◽

Bone Biopsies ◽

Initial Biopsy ◽

Valid Criterion

Background The presence of malignant cells in bone biopsies is considered gold standard to verify occurrence of cancer, whereas a negative bone biopsy can represent a false negative, with a risk of increasing patient morbidity and mortality and creating misleading conclusions in cancer research. However, a paucity of literature documents the validity of negative bone biopsy as an exclusion criterion for the presence of skeletal malignancies. Purpose To investigate the validity of a negative bone biopsy in bone lesions suspicious of malignancy. Material and Method A retrospective cohort of 215 consecutive targeted non-malignant skeletal biopsies from 207 patients (43% women, 57% men, median age 64, and range 94) representing suspicious focal bone lesions, collected from January 1, 2011, to July 31, 2013, was followed over a 2-year period to examine any additional biopsy, imaging, and clinical follow-up information to categorize the original biopsy as truly benign, malignant, or equivocal. Standard deviations and 95% confidence intervals were calculated. Results 210 of 215 biopsies (98%; 95% CI 0.94–0.99) showed to be truly benign 2 years after initial biopsy. Two biopsies were false negatives (1%; 95% CI 0.001–0.03), and three were equivocal (lack of imaging description). Conclusion Our study documents negative bone biopsy as a valid criterion for the absence of bone metastasis. Since only 28% had a confirmed diagnosis of prior cancer and not all patients received adequately sensitive imaging, our results might not be applicable to all cancer patients with suspicious bone lesions.
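The abstract reports 95% confidence intervals for its proportions but does not say which interval method was used. As a sketch only, the Wilson score interval below is one standard choice for a binomial proportion; because the method is an assumption, its bounds need not match the reported ones exactly. The function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width

# Counts taken from the abstract: 210 truly benign and 2 false negatives out of 215.
for label, count in [("truly benign", 210), ("false negative", 2)]:
    low, high = wilson_interval(count, 215)
    print(f"{label}: {count / 215:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better for proportions close to 0 or 1, which is the situation for both counts.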


Incorporating false negative tests in epidemiological models for SARS-CoV-2 transmission and reconciling with seroprevalence estimates

Scientific Reports ◽

10.1038/s41598-021-89127-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Rupam Bhattacharyya ◽

Ritoban Kundu ◽

Ritwik Bhaduri ◽

Debashree Ray ◽

Lauren J. Beesley ◽

...

Keyword(s):

False Negative ◽

Population Based ◽

Training Data ◽

Epidemiological Models ◽

Rt Pcr ◽

Antibody Prevalence ◽

Igg Antibody ◽

False Negatives ◽

Seir Model ◽

Study Population

Abstract Susceptible-Exposed-Infected-Removed (SEIR)-type epidemiologic models, modeling unascertained infections latently, can predict unreported cases and deaths assuming perfect testing. We apply a method we developed to account for the high false negative rates of diagnostic RT-PCR tests for detecting an active SARS-CoV-2 infection in a classic SEIR model. The number of unascertained cases and false negatives being unobservable in a real study, population-based serosurveys can help validate model projections. Applying our method to training data from Delhi, India, during March 15–June 30, 2020, we estimate the underreporting factor for cases at 34–53 (deaths: 8–13) on July 10, 2020, largely consistent with the findings of the first round of serosurveys for Delhi (done during June 27–July 10, 2020) with an estimated 22.86% IgG antibody prevalence, yielding estimated underreporting factors of 30–42 for cases. Together, these imply approximately 96–98% cases in Delhi remained unreported (July 10, 2020). Updated calculations using training data during March 15–December 31, 2020 yield estimated underreporting factor for cases at 13–22 (deaths: 3–7) on January 23, 2021, which are again consistent with the latest (fifth) round of serosurveys for Delhi (done during January 15–23, 2021) with an estimated 56.13% IgG antibody prevalence, yielding an estimated range for the underreporting factor for cases at 17–21. Together, these updated estimates imply approximately 92–96% cases in Delhi remained unreported (January 23, 2021). Such model-based estimates, updated with latest data, provide a viable alternative to repeated resource-intensive serosurveys for tracking unreported cases and deaths and gauging the true extent of the pandemic.
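The link between an underreporting factor and the share of unreported cases can be made explicit. The sketch below assumes the factor is defined as total cases divided by reported cases, so the unreported share is 1 - 1/factor; the abstract does not state the definition, and the function name is hypothetical.

```python
def unreported_fraction(underreporting_factor):
    """Share of cases that go unreported, assuming factor = total / reported."""
    if underreporting_factor < 1:
        raise ValueError("an underreporting factor below 1 is not meaningful here")
    return 1.0 - 1.0 / underreporting_factor

# Factors quoted in the abstract for the two time points.
for factor in (30, 34, 42, 53, 13, 17, 21, 22):
    print(f"factor {factor:>2}: {unreported_fraction(factor):.1%} unreported")
```

Under this assumption, factors of 30 to 53 correspond to roughly 97% to 98% unreported and factors of 13 to 22 to roughly 92% to 95%, broadly in line with the approximate ranges the abstract quotes.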


Evaluation of Positive T- and B-Cell Gene Rearrangement Studies Among Patients Without a Definitive Diagnosis by Other Assays

American Journal of Clinical Pathology ◽

10.1093/ajcp/aqz112.067 ◽

2019 ◽

Vol 152 (Supplement_1) ◽

pp. S35-S36

Author(s):

Hadrian Mendoza ◽

Christopher Tormey ◽

Alexa Siddon

Keyword(s):

T Cell ◽

False Positive ◽

Gene Rearrangement ◽

Hematologic Malignancy ◽

False Negative ◽

False Positives ◽

False Negatives ◽

True Negative ◽

Flow Cytometric ◽

Pathology Reports

Abstract In the evaluation of bone marrow (BM) and peripheral blood (PB) for hematologic malignancy, positive immunoglobulin heavy chain (IG) or T-cell receptor (TCR) gene rearrangement results may be detected despite unrevealing results from morphologic, flow cytometric, immunohistochemical (IHC), and/or cytogenetic studies. The significance of positive rearrangement studies in the context of otherwise normal ancillary findings is unknown, and as such, we hypothesized that gene rearrangement studies may be predictive of an emerging B- or T-cell clone in the absence of other abnormal laboratory tests. Data from all patients who underwent IG or TCR gene rearrangement testing at the authors’ affiliated VA hospital between January 1, 2013, and July 6, 2018, were extracted from the electronic medical record. Date of testing; specimen source; and morphologic, flow cytometric, IHC, and cytogenetic characterization of the tissue source were recorded from pathology reports. Gene rearrangement results were categorized as true positive, false positive, false negative, or true negative. Lastly, patient records were reviewed for subsequent diagnosis of hematologic malignancy in patients with positive gene rearrangement results with negative ancillary testing. A total of 136 patients, who had 203 gene rearrangement studies (50 PB and 153 BM), were analyzed. In TCR studies, there were 2 false positives and 1 false negative in 47 PB assays, as well as 7 false positives and 1 false negative in 54 BM assays. Regarding IG studies, 3 false positives and 12 false negatives in 99 BM studies were identified. Sensitivity and specificity, respectively, were calculated for PB TCR studies (94% and 93%), BM IG studies (71% and 95%), and BM TCR studies (92% and 83%). Analysis of PB IG gene rearrangement studies was not performed due to the small number of tests (3; all true negative). 
None of the 12 patients with false-positive IG/TCR gene rearrangement studies later developed a lymphoproliferative disorder, although 2 patients were later diagnosed with acute myeloid leukemia. Of the 14 false negatives, 10 (71%) were related to a diagnosis of plasma cell neoplasms. Results from the present study suggest that positive IG/TCR gene rearrangement studies are not predictive of lymphoproliferative disorders in the context of otherwise negative BM or PB findings. As such, when faced with equivocal pathology reports, clinicians can be practically advised that isolated positive IG/TCR gene rearrangement results may not indicate the need for closer surveillance.
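Sensitivity and specificity follow directly from the four confusion-matrix counts. The abstract gives only the false positives (3), false negatives (12) and total (99) for the bone marrow IG studies; the true-positive and true-negative counts in the sketch below are assumptions chosen to be consistent with those figures, not values reported by the authors.

```python
def sensitivity(tp, fn):
    """Share of truly positive cases that the test flags as positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of truly negative cases that the test flags as negative."""
    return tn / (tn + fp)

# fp, fn and the total of 99 come from the abstract; tp = 30 and tn = 54 are assumed.
tp, fn, fp, tn = 30, 12, 3, 54
print(f"sensitivity: {sensitivity(tp, fn):.0%}")  # 71%
print(f"specificity: {specificity(tn, fp):.0%}")  # 95%
```

Other splits of the remaining 84 studies (for example 29 true positives and 55 true negatives) round to the same percentages, so the reported figures do not pin down the underlying counts.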


Application of Immunosignatures for Diagnosis of Valley Fever

Clinical and Vaccine Immunology ◽

10.1128/cvi.00228-14 ◽

2014 ◽

Vol 21 (8) ◽

pp. 1169-1177 ◽

Cited By ~ 13

Author(s):

Krupa Arun Navalkar ◽

Stephen Albert Johnston ◽

Neal Woodbury ◽

John N. Galgiani ◽

D. Mitchell Magee ◽

...

Keyword(s):

Bacterial Infections ◽

Random Sequence ◽

False Negative ◽

False Negative Rate ◽

Peptide Array ◽

Igg Antibodies ◽

False Negatives ◽

Peptide Microarray ◽

Valley Fever ◽

Antibody Levels

Abstract Valley fever (VF) is difficult to diagnose, partly because the symptoms of VF are confounded with those of other community-acquired pneumonias. Confirmatory diagnostics detect IgM and IgG antibodies against coccidioidal antigens via immunodiffusion (ID). The false-negative rate can be as high as 50% to 70%, with 5% of symptomatic patients never showing detectable antibody levels. In this study, we tested whether the immunosignature diagnostic can resolve VF false negatives. An immunosignature is the pattern of antibody binding to random-sequence peptides on a peptide microarray. A 10,000-peptide microarray was first used to determine whether valley fever patients can be distinguished from 3 other cohorts with similar infections. After determining the VF-specific peptides, a small 96-peptide diagnostic array was created and tested. The performances of the 10,000-peptide array and the 96-peptide diagnostic array were compared to that of the ID diagnostic standard. The 10,000-peptide microarray classified the VF samples from the other 3 infections with 98% accuracy. It also classified VF false-negative patients with 100% sensitivity in a blinded test set versus 28% sensitivity for ID. The immunosignature microarray has potential for simultaneously distinguishing valley fever patients from those with other fungal or bacterial infections. The same 10,000-peptide array can diagnose VF false-negative patients with 100% sensitivity. The smaller 96-peptide diagnostic array was less specific for diagnosing false negatives. We conclude that the performance of the immunosignature diagnostic exceeds that of the existing standard, and the immunosignature can distinguish related infections and might be used in lieu of existing diagnostics.


Nipple Confusion

PEDIATRICS ◽

10.1542/peds.92.2.300 ◽

1993 ◽

Vol 92 (2) ◽

pp. 300-301

Author(s):

DOREN FREDRICKSON

Keyword(s):

Breast Feeding ◽

Statistical Power ◽

White Women ◽

Small Sample Size ◽

False Negative ◽

Small Sample ◽

Bottle Feeding ◽

Feeding Duration ◽

Breast Feed ◽

The Impact

To the Editor.— I wish to comment on the study reported by Cronenwett et al,1 which was a fascinating prospective study among married white women who planned to breast-feed. Women were randomly selected to perform either exclusive breast-feeding or partial breast-feeding with bottled human milk supplements to determine the impact of infant temperament and limited bottle-feeding on breast-feeding duration. The authors admit that small sample size and lack of statistical power make a false-negative result possible.
