Preregistration is important, but not enough: Many statistical analyses can inflate the risk of false-positives

2021 ◽  
Author(s):  
Jacob Dalgaard Christensen ◽  
Jacob Lund Orquin ◽  
Sonja Perkovic ◽  
Carl Johan Lagerkvist

Even with a small number of variables, researchers can test many possible models of their data, thus increasing the risk of false-positive results. Using combinatorics, we show that one key independent variable and three covariates can generate 95 possible models, while six covariates can generate over 2.3 million models. Such large model sets nearly guarantee false-positive results. Using simulation, we show that preregistering a single analysis with a key independent variable greatly reduces the risk of false positives. Even so, many models produce false-positive results with a probability much higher than the expected 5%. The worst cases are models with interactions between binary dummy-coded variables and omitted main effects; such models can generate false-positive results up to 34.5% of the time. While preregistration is a crucial step towards reducing false-positive results, researchers need to consider carefully which analyses they plan, and we provide recommendations for which analyses to avoid. Our findings also suggest that interpreting p-values in exploratory analyses might be meaningless, considering the high false-positive probability.
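The statement that large model sets nearly guarantee a false positive can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that every model is tested independently at a 5% threshold when no true effect exists. Models fitted to the same data are correlated, so this is a rough guide and not the simulation described in the abstract. The function name `familywise_false_positive_rate` is hypothetical.

```python
def familywise_false_positive_rate(alpha: float, n_models: int) -> float:
    """Probability of at least one false positive across n_models tests.

    Assumes independent tests and no true effect. This independence is an
    illustrative simplification, not the method used in the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_models < 0:
        raise ValueError("n_models must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_models


if __name__ == "__main__":
    # 95 is the model count the abstract reports for one key
    # independent variable plus three covariates.
    for n in (1, 95):
        print(n, familywise_false_positive_rate(0.05, n))
```

Under this simplified assumption, a single test keeps the 5% rate, while 95 tests push the chance of at least one false positive above 99%.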

2021 ◽  
Author(s):  
Yan Lei ◽  
Xiaolan Lu ◽  
Daiyong Mou ◽  
Qin Du ◽  
Guo Bin ◽  
...  

Abstract There have been several false-positive results in antibody detection for COVID-19. This study aims to analyze the distribution characteristics of SARS-CoV-2 IgM and IgG in false-positive results detected using chemiluminescent immunoassay. The characteristics of the false-positive results in SARS-CoV-2 IgM and IgG testing were retrospectively analyzed, and the dynamic changes in SARS-CoV-2 IgM and IgG antibody results were observed. The false-positive proportion of the single SARS-CoV-2 IgM positive results was 95.88%, which was significantly higher than that of the single SARS-CoV-2 IgG positive results (67.50%) (P < 0.001) and of the SARS-CoV-2 IgM & IgG positive results (29.55%) (P < 0.001). The S/CO of SARS-CoV-2 IgM and IgG in false-positive results ranged from 1.0 to 50.0. The false-positive probability of SARS-CoV-2 IgM in the S/CO range 1.0 ~ 3.0 was 91.73% (77/84), and that of SARS-CoV-2 IgG in the S/CO range 1.0 ~ 2.0 was 85.71% (24/28). Dynamic monitoring showed that the S/CO values of IgM in false-positive results decreased or remained unchanged, whereas the S/CO values of IgG in false-positive results only decreased. The probability of a false positive was high for both single SARS-CoV-2 IgM positive and single SARS-CoV-2 IgG positive results. As the S/CO value decreased, the probability of a false positive increased, especially among the single SARS-CoV-2 IgM positive results.
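The false-positive probabilities quoted with counts are simple proportions, so they can be checked directly. The helper below is a hypothetical illustration and not part of the study. It reproduces 85.71% from 24/28, while 77/84 evaluates to about 91.67%, slightly below the printed 91.73%, so the counts or the percentage may have been rounded or transcribed differently in the source.

```python
def proportion_percent(count: int, total: int) -> float:
    """Return count / total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


if __name__ == "__main__":
    # Counts as printed in the abstract.
    print("IgG, S/CO 1.0-2.0:", round(proportion_percent(24, 28), 2))
    print("IgM, S/CO 1.0-3.0:", round(proportion_percent(77, 84), 2))
```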


PEDIATRICS ◽  
1973 ◽  
Vol 52 (1) ◽  
pp. 64-68
Author(s):  
Iraj Rezvani ◽  
P. J. Collipp ◽  
Angelo M. DiGeorge

A recently developed spot test, "MPS paper," has been added to other screening tests for urinary mucopolysaccharides. The effectiveness of this test has been compared to that of the cetyltrimethylammonium bromide and the acid albumin gross turbidity tests in normal children and in patients with mucopolysaccharidoses. Although all these tests are effective in the detection of excessive mucopolysaccharides in urine, their excessive sensitivity yields many weak false-positives. We found the "MPS paper" test to yield 34% false-positive results, compared to 42% for cetyltrimethylammonium bromide and 8% for the acid albumin gross turbidity test. We have concluded that the acid albumin gross turbidity test is the most reliable screening test for detection of mucopolysaccharide disorders. The "MPS paper" spot test has the advantage of being simple and practical, but weak positive results should be interpreted with great caution; it has the added disadvantage of being the most costly of the screening tests at the present time.


2004 ◽  
Vol 50 (6) ◽  
pp. 1012-1016 ◽  
Author(s):  
Andrew W Roddam ◽  
Christopher P Price ◽  
Naomi E Allen ◽  
Anthony Milford Ward ◽  

Abstract Background: Prostate-specific antigen (PSA) is the most widely used serum biomarker to differentiate between malignant and benign prostate disease. Assays that measure PSA can be biased and/or nonequimolar and hence report significantly different PSA values for samples with the same nominal amount. This report investigates the effects of biased and nonequimolar assays on the decision to recommend a patient for a prostate biopsy based on age-specific PSA values. Methods: A simulation model, calibrated to the distribution of PSA values in the United Kingdom, was developed to estimate the effects of bias, nonequimolarity, and analytical imprecision in terms of the rates of men who are recommended to have a biopsy on the basis of their assay-reported PSA values when their true PSA values are below the threshold (false positives) or vice versa (false negatives). Results: False recommendation rates for a calibrated equimolar assay are 0.5–0.9% for analytical imprecision between 5% and 10%. Positive bias leads to significant increases in false positives and significant decreases in false negatives, whereas negative bias has the opposite effect. False-positive rates for nonequimolar assays increase from 0.5% to 13% in the worst-case scenario, whereas false-negative rates are almost always 0%. Conclusions: Biased and nonequimolar assays can have major detrimental effects on both false-negative and false-positive rates for recommending biopsy. PSA assays should therefore be calibrated to the International Standards and be unbiased and equimolar in response to minimize the likelihood of incorrect clinical decisions, which are potentially detrimental for both patient and healthcare provider.
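The direction of the bias effect described above can be illustrated with a small Monte Carlo sketch. Everything numerical here is an assumption made for illustration: the log-normal population, the single 4.0 threshold (the study uses age-specific values), the multiplicative error model, and the function name `false_recommendation_rates`. It is not the calibrated model from the report and will not reproduce its rates.

```python
import random


def false_recommendation_rates(bias, cv, threshold=4.0, n=50_000, seed=0):
    """Estimate (false-positive rate, false-negative rate) for a biopsy rule.

    bias: relative assay bias, e.g. 0.10 for +10% (hypothetical).
    cv:   analytical imprecision as a coefficient of variation.
    The population and threshold are illustrative assumptions only.
    """
    rng = random.Random(seed)
    false_pos = false_neg = 0
    for _ in range(n):
        true_psa = rng.lognormvariate(0.5, 0.8)   # hypothetical population
        noise = 1.0 + rng.gauss(0.0, cv)          # analytical imprecision
        measured = true_psa * (1.0 + bias) * noise
        if measured >= threshold and true_psa < threshold:
            false_pos += 1    # biopsy recommended, true value below threshold
        elif measured < threshold and true_psa >= threshold:
            false_neg += 1    # biopsy not recommended, true value above
    return false_pos / n, false_neg / n


if __name__ == "__main__":
    for bias in (-0.10, 0.0, 0.10):
        fp, fn = false_recommendation_rates(bias, cv=0.05)
        print(f"bias={bias:+.2f}  FP={fp:.4f}  FN={fn:.4f}")
```

Reusing the same seed for every bias value means each run sees the same simulated patients and noise, so differences between runs come from the bias alone.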


Author(s):  
Thomas Scheier ◽  
Cyril Shah ◽  
Michael Huber ◽  
Hugo Sax ◽  
Barbara Hasse ◽  
...  

Abstract The rapid spread of the coronavirus disease 2019 pandemic demanded immense testing capacity as one cornerstone of infection control. Many institutions opened outpatient SARS-CoV-2 test centers to allow a large number of tests in comparatively short time frames. With increasing positive test rates, concerns were raised that airborne or droplet contamination of specimens could lead to false-positive results. In our experimental series performed in a dedicated SARS-CoV-2 test center, 40 open collection tubes placed for defined time periods in proximity to individuals were found to be SARS-CoV-2 negative. These findings argue against false-positive SARS-CoV-2 results due to droplet or airborne contamination.


2021 ◽  
pp. 39-55
Author(s):  
R. Barker Bausell

This chapter explores three empirical concepts (the p-value, the effect size, and statistical power) integral to the avoidance of false-positive scientific results. Their relationship to reproducibility is explained in a nontechnical manner without formulas or statistical jargon, with p-values and statistical power presented in terms of probabilities from zero to 1.0, the values of most interest to scientists being 0.05 (synonymous with a positive, hence publishable, result) and 0.80 (the most commonly recommended probability that a positive result will be obtained if the hypothesis that generated it is correct and the study is properly designed and conducted). Unfortunately, many scientists circumvent both by artifactually inflating the 0.05 criterion, overstating the available statistical power, and engaging in a number of other questionable research practices. These issues are discussed via statistical models from the genetic and psychological fields and then extended to a number of different p-values, statistical power levels, effect sizes, and the prevalence of “true” effects expected to exist in the research literature. Among the basic conclusions of these modeling efforts is that employing more stringent p-values and larger sample sizes constitutes the most effective statistical approach for increasing the reproducibility of published results in all empirically based scientific literatures. This chapter thus lays the necessary foundation for understanding and appreciating the effects of appropriate p-values, sufficient statistical power, realistic effect sizes, and the avoidance of questionable research practices upon the production of reproducible results.
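How the significance criterion, power, and the prevalence of true effects interact can be sketched with the standard positive-predictive-value calculation. This is a generic textbook formula offered as an illustration, not necessarily the model used in the chapter; the function name and the 10% prevalence in the example are assumptions.

```python
def positive_predictive_value(alpha: float, power: float, prevalence: float) -> float:
    """Share of statistically significant results that reflect true effects.

    alpha:      significance criterion (false-positive rate under the null).
    power:      probability of a significant result when the effect is real.
    prevalence: assumed fraction of tested hypotheses that are true.
    """
    for name, value in (("alpha", alpha), ("power", power), ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_positives = power * prevalence
    false_positives = alpha * (1.0 - prevalence)
    if true_positives + false_positives == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical literature in which 10% of tested hypotheses are true.
    for alpha in (0.05, 0.005):
        print(alpha, positive_predictive_value(alpha, power=0.80, prevalence=0.10))
```

With these illustrative inputs, tightening the criterion from 0.05 to 0.005 raises the share of positive results that are true, which is the direction of the chapter's conclusion about more stringent p-values.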


2020 ◽  
Vol 6 (1) ◽  
pp. 16 ◽  
Author(s):  
Gang Peng ◽  
Yishuo Tang ◽  
Tina M. Cowan ◽  
Gregory M. Enns ◽  
Hongyu Zhao ◽  
...  

Newborn screening (NBS) for inborn metabolic disorders is a highly successful public health program that by design is accompanied by false-positive results. Here we trained a Random Forest machine learning classifier on screening data to improve prediction of true and false positives. Data included 39 metabolic analytes detected by tandem mass spectrometry and clinical variables such as gestational age and birth weight. Analytical performance was evaluated for a cohort of 2777 screen positives reported by the California NBS program, which consisted of 235 confirmed cases and 2542 false positives for one of four disorders: glutaric acidemia type 1 (GA-1), methylmalonic acidemia (MMA), ornithine transcarbamylase deficiency (OTCD), and very long-chain acyl-CoA dehydrogenase deficiency (VLCADD). Without changing the sensitivity to detect these disorders in screening, Random Forest-based analysis of all metabolites reduced the number of false positives for GA-1 by 89%, for MMA by 45%, for OTCD by 98%, and for VLCADD by 2%. All primary disease markers and previously reported analytes such as methionine for MMA and OTCD were among the top-ranked analytes. Random Forest’s ability to classify GA-1 false positives was found similar to results obtained using Clinical Laboratory Integrated Reports (CLIR). We developed an online Random Forest tool for interpretive analysis of increasingly complex data from newborn screening.


2017 ◽  
Author(s):  
Yasset Perez-Riverol ◽  
Max Kun ◽  
Juan Antonio Vizcaíno ◽  
Marc-Phillip Hitz ◽  
Enrique Audain

Abstract We are moving into the age of ‘Big Data’ in biomedical research and bioinformatics. This trend can be encapsulated in a simple formula: D = S × F, where the volume of data generated (D) increases in both dimensions: the number of samples (S) and the number of sample features (F). Frequently, a typical bioinformatics problem (e.g. classification) includes redundant and irrelevant features that can result, in the worst-case scenario, in false-positive results. Feature Selection (FS) therefore constitutes an enormous challenge. Despite the number and diversity of algorithms available, the proper choice of an approach for a specific problem often falls in a ‘grey zone’. In this study, we select a subset of FS methods to develop an efficient workflow and an R package for bioinformatics machine learning problems. We cover relevant issues concerning FS, ranging from domain problems to algorithmic solutions and computational tools. Finally, we use seven different proteomics and gene expression datasets to evaluate the workflow and guide the FS process.


2021 ◽  
Vol 8 (Supplement_1) ◽  
pp. S460-S461
Author(s):  
Andrew Grose ◽  
Rima McLeod

Abstract Background: Part of an essential “toolbox” to eliminate Toxoplasma gondii infection is prompt recognition of acute infection acquired during gestation, in order to initiate treatment for congenital toxoplasmosis (CT). From conception to one month post-partum, screening seronegative pregnant women monthly for antibody to the parasite enables treatment that prevents trans-placental transmission of newly acquired maternal Toxoplasma, or that attenuates signs and symptoms of CT. Tests that are highly sensitive and specific, and that meet the other World Health Organization ASSURED criteria for diagnostics, are very useful for this kind of screening. Herein, we evaluated the accuracy of a test that meets these criteria, the LDBIO Toxoplasma ICT IgG-IgM device (LDBIO), and whether it eliminated difficulties of other tests with false-positive IgM results. Methods: Both parts of this study examined results generated by the LDBIO device, a point-of-care immunochromatography test for Toxoplasma IgG and IgM, using serum and whole blood samples. With whole blood, thirty microliters were collected using a glass micro hematocrit tube. With both sera and whole blood, samples were loaded into the well of the LDBIO device, which took 20 minutes to generate results. In the first part of this study, we summarized results from three published U.S. studies and added new data from an ongoing clinical trial at the University of Chicago Medical Center (UCMC). In the second part of this study, we compiled data on how the LDBIO device performed on a total of 69 samples from U.S. and French studies that had led to false-positive results when tested with commercially available comparator tests. Four of these false positives came from the UCMC trial. Results: LDBIO had only one false negative for a total of 664 samples from three earlier U.S. studies and the UCMC feasibility study. Meanwhile, out of 69 total false positives from various non-reference laboratory comparator tests, such as the Bio-Rad Platelia and Siemens kits, the LDBIO generated zero false positives. Conclusion: As LDBIO shows high sensitivity and specificity and can avoid confounding false-positive results, this device merits consideration as a high-quality screening test that can assist public health efforts to improve CT care worldwide.
Figure: World Health Organization A.S.S.U.R.E.D. criteria. These are criteria for ideal screening or diagnostic tests, as described in a September 2017 paper in the Bulletin of the World Health Organization. Our study focused mostly on sensitivity and specificity for the LDBIO immunochromatography test for IgG and IgM specific to Toxoplasma gondii.
Figure: UCMC Feasibility Study Flowchart. Flowchart for the ongoing feasibility study on the LDBIO device at the University of Chicago Medical Center. Data from this study may inform whether the LDBIO test, which already has the CE Mark for use in Europe, will receive 510(k) approval from the Food and Drug Administration in the U.S.
Figure: Steps for Using LDBIO Device. (A,B) Clean fingertip; prick with lancet (if collecting whole blood only). (C,D) Collect 30 uL in capillary tube (WB only). (E,F) Apply serum or blood sample to well; add four drops buffer and wait about 20 minutes. (G) How to interpret results: black line under “T” corresponds to IgG and/or IgM to T. gondii.
Figure: LDBIO's Performance on U.S. Samples Since 2014. In all four U.S. studies (total 664 patients), the LDBIO device generated one false-negative result and zero false-positive results.
Figure: LDBIO vs. Comparator Tests Since 2017. In these three clinical settings (69 total samples), LDBIO correctly avoided generating the same false positive that had been generated by a test already cleared for widespread use in the U.S. or France.
Figure: Countries Working to Implement Regular Prenatal Screening for CT Prevention. The countries in green represent countries currently working with the University of Chicago to implement regular prenatal screening programs for Toxoplasma gondii: U.S., Panama, Colombia, Brazil, Morocco, and France. Screening programs in all six countries rely on low-cost, highly accurate screening technology that meets the WHO's ASSURED criteria. The LDBIO test, which is already in use in France, may become a usable resource in the other five countries if it gains FDA approval.
Disclosures: All Authors: No reported disclosures


Chemotherapy ◽  
2018 ◽  
Vol 63 (6) ◽  
pp. 324-329 ◽  
Author(s):  
Michael S. Ewer ◽  
Jay Herson

Purpose: Cardiac ultrasound provides important structural and functional information that makes identification of cardiac abnormalities possible. Left ventricular ejection fraction (LVEF) is the most commonly used parameter for recognition of treatment-related cardiac dysfunction. Random reading variance and physiologic factors influence LVEF and make the reported value imperfect. We attempt to quantify the likelihood of false-positive events by computer simulation. Methods: We simulated four visits in hypothetical trials. We assumed a baseline LVEF of 55% and a normal distribution for reading error and physiologic variation. We simulated 1,000 trials of sample size 1,500. In a separate simulation, 1,000 patients entered with LVEFs of 45, 43, and 41% to estimate true-positive incidence. Results: At each examination, the false-positive rate was below 1.0%. The cumulative false-positive rate over four visits was 3.60%. True cardiotoxicity identification is satisfactory only when LVEF declines substantially. Conclusion: A 3.60% false-positive rate in trials where the expected level of toxicity is low suggests that false positives are troubling and may exceed true-positive results. Strategies to reduce the number of false-positive results include making confirmatory studies mandatory. Evaluating increases along with decreases provides some estimate of variance.
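The link between the per-visit and cumulative rates can be checked with a short back-calculation. The sketch assumes the four visits behave as independent chances of a false positive, which is a simplification introduced here and not a statement of the authors' simulation design; both function names are hypothetical.

```python
def cumulative_rate(per_visit: float, visits: int) -> float:
    """Chance of at least one false positive across independent visits."""
    return 1.0 - (1.0 - per_visit) ** visits


def implied_per_visit_rate(cumulative: float, visits: int) -> float:
    """Per-visit rate that would produce the given cumulative rate."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return 1.0 - (1.0 - cumulative) ** (1.0 / visits)


if __name__ == "__main__":
    # 3.60% over four visits is the figure reported in the abstract.
    per_visit = implied_per_visit_rate(0.036, 4)
    print(f"implied per-visit rate: {per_visit:.4%}")
    print(f"round trip: {cumulative_rate(per_visit, 4):.4%}")
```

Under the independence assumption, the implied per-visit rate falls just under 1%, which agrees with the per-examination figure in the Results.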


1981 ◽  
Vol 46 (03) ◽  
pp. 652-654 ◽  
Author(s):  
Dieter Lockner ◽  
Christer Paul ◽  
Birger Hedlund ◽  
Sam Schulman ◽  
Dag Nyman

Summary 161 consecutively admitted medical patients with the clinical suspicion of acute deep venous thrombosis (DVT) were thermographed and phlebographed in order to study the congruence of these methods. The sensitivity of thermography in the detection of DVT was found to be 99%, whereas the specificity was only 49%. The low specificity is explained by the fact that all thermographs suggestive of DVT were classified as pathologic to keep the sensitivity of the method as high as possible. Patients with dilated veins which may closely resemble DVT on thermography may in these cases give false positive results. Of 76 patients with phlebographically verified DVT, 22% became thermographically normal within 22 days, whereas 78% did not normalize within the mean observation time of 31 days. In another part of the study all medical patients (101) who were residing in our wards during a period of a week were screened by means of thermography. From this unselected group 17 patients were found to have thermographs suggestive of DVT. In 5 of these patients no reason for pathological thermography could be found. Thermography is a cheap and highly sensitive screening method for DVT, but findings of false positives caused by older thromboses and dilated veins are not unusual. The frequency of such false positives may be minimized by performing thermography after exercise.

