Reverse Bayesian Implications of p-Values Reported in Critical Care Randomized Trials

2021 ◽  
pp. 088506662110537
Author(s):  
Sarah Nostedt ◽  
Ari R. Joffe

Background Misinterpretations of the p-value in null-hypothesis statistical testing are common. We aimed to determine the implications of observed p-values in critical care randomized controlled trials (RCTs). Methods We included three cohorts of published RCTs: Adult-RCTs reporting a mortality outcome, Pediatric-RCTs reporting a mortality outcome, and recent Consecutive-RCTs reporting p-value ≤.10 in six higher-impact journals. We recorded descriptive information from RCTs. Reverse Bayesian implications of obtained p-values were calculated, reported as percentages with inter-quartile ranges. Results Obtained p-value was ≤.005 in 11/216 (5.1%) Adult-RCTs, 2/120 (1.7%) Pediatric-RCTs, and 37/90 (41.1%) Consecutive-RCTs. An obtained p-value .05–.0051 had high False Positive Rates; in Adult-RCTs, minimum (assuming prior probability of the alternative hypothesis was 50%) and realistic (assuming prior probability of the alternative hypothesis was 10%) False Positive Rates were 16.7% [11.2, 21.8] and 64.3% [53.2, 71.4]. An obtained p-value ≤.005 had lower False Positive Rates; in Adult-RCTs the realistic False Positive Rate was 7.7% [7.7, 16.0]. The realistic probability of the alternative hypothesis for obtained p-value .05–.0051 (ie, Positive Predictive Value) was 28.0% [24.1, 34.8], 30.6% [27.7, 48.5], 29.3% [24.3, 41.0], and 32.7% [24.1, 43.5] for Adult-RCTs, Pediatric-RCTs, Consecutive-RCTs primary and secondary outcome, respectively. The maximum Positive Predictive Value for p-value category .05–.0051 was median 77.8%, 79.8%, 78.8%, and 81.4% respectively. To have maximum or realistic Positive Predictive Value >90% or >80%, RCTs needed to have obtained p-value ≤.005. The credibility of p-value .05–.0051 findings was easy to challenge, and the credibility to rule out an effect with p-value >.05 to .10 was low. The probability that a replication study would obtain p-value ≤.05 did not approach 90% unless the obtained p-value was ≤.005. 
Conclusions Unless the obtained p-value was ≤.005, the False Positive Rate was high, and the Positive Predictive Value and probability of replication of “statistically significant” findings were low.
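
The dependence of the False Positive Rate and Positive Predictive Value on the prior probability of the alternative hypothesis can be illustrated with the textbook Bayes'-rule relation between significance threshold, power, and prior. The sketch below is not the authors' reverse-Bayesian calculation, which the abstract does not specify; the function names and the 80% power are assumptions chosen for illustration, so the printed values are not expected to reproduce the reported medians.

```python
def false_positive_risk(alpha, power, prior_h1):
    """P(H0 is true | result is significant), by Bayes' rule.

    alpha    : significance threshold (probability of significance under H0)
    power    : probability of significance under H1 (assumed, not from the paper)
    prior_h1 : prior probability that the alternative hypothesis is true
    """
    if not (0 < alpha < 1 and 0 < power <= 1 and 0 < prior_h1 < 1):
        raise ValueError("alpha, power and prior_h1 must be probabilities")
    false_pos = alpha * (1 - prior_h1)  # H0 true and test significant
    true_pos = power * prior_h1         # H1 true and test significant
    return false_pos / (false_pos + true_pos)


def positive_predictive_value(alpha, power, prior_h1):
    """P(H1 is true | result is significant): the complement of the risk above."""
    return 1 - false_positive_risk(alpha, power, prior_h1)


if __name__ == "__main__":
    # Priors of 50% and 10% mirror the "minimum" and "realistic" scenarios.
    for alpha in (0.05, 0.005):
        for prior in (0.5, 0.1):
            fpr = false_positive_risk(alpha, power=0.8, prior_h1=prior)
            print(f"alpha={alpha}, prior={prior}: FPR={fpr:.1%}, PPV={1 - fpr:.1%}")
```

Under these assumed inputs, lowering the threshold from .05 to .005 at a 10% prior moves the false positive risk from 36% to about 5%, the same direction of effect the abstract describes.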

2019 ◽  
Author(s):  
Rayees Rahman ◽  
Arad Kodesh ◽  
Stephen Z Levine ◽  
Sven Sandin ◽  
Abraham Reichenberg ◽  
...  

Abstract Importance Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improve developmental course and outcome. Objective Develop a machine learning (ML) method predicting the diagnosis of ASD in offspring in a general population sample, using parental electronic medical records (EMR) available before childbirth. Design Prognostic study of EMR data within a single Israeli health maintenance organization, for the parents of 1,397 ASD children (ICD-9/10), and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. The complete EMR record of the parents was used to develop various ML models to predict the risk of having a child with ASD. Main outcomes and measures Routinely available parental sociodemographic information, medical histories and prescribed medications data until offspring’s birth were used to generate features to train various machine learning algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross validation, by computing C statistics, sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value, PPV). Results All ML models tested had similar performance, achieving an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85% for predicting ASD in this dataset. Conclusion and relevance ML algorithms combined with EMR capture early life ASD risk. 
Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children. Key points Question Can autism risk in children be predicted using the pre-birth electronic medical record (EMR) of the parents? Findings In this population-based study that included 1,397 children with autism spectrum disorder (ASD) and 94,741 non-ASD children, we developed a machine learning classifier for predicting the likelihood of childhood diagnosis of ASD with an average C statistic of 0.70, sensitivity of 28.63%, specificity of 98.62%, accuracy of 96.05%, false positive rate of 1.37%, and positive predictive value of 45.85%. Meaning The results presented serve as a proof-of-principle of the potential utility of EMR for the identification of a large proportion of future children at a high risk of ASD.


Blood ◽  
2007 ◽  
Vol 110 (11) ◽  
pp. 2398-2398
Author(s):  
Steven M. Kornblau ◽  
YiHua Qiu ◽  
Wenjing Chen ◽  
Srdan Verstovsek ◽  
Kevin R. Coombes ◽  
...  

Abstract We have used Reverse Phase Protein Arrays (RPPA) to perform proteomic profiling in Acute Myelogenous Leukemia (AML) focusing on cell cycle, apoptosis and signal transduction pathway proteins (ASH 2006, abstract #107). Protein expression signatures were derived from this dataset of 436 AML patients, analyzed for 30 total and 22 phospho-proteins. The predictive ability of these RPPA derived protein expression signatures has not been prospectively tested to determine if they are valid. This dataset presented an opportunity to validate this as there was a population of patients with known FLT3-ITD and D835 mutation status (n=297) and another population where the status was unknown (n=139), among which 55 had sufficient sample available for mutation analysis. Prior to performing the mutation analysis a predictive model was built using linear regression with part of the data utilized for training and the remainder for validation. The model was designed to predict the presence of mutation, either ITD or D835, although there are differences in the signature of each. The total population had 85 cases with FLT3-ITD and 15 with the D835 mutation. The optimal model that was developed, using 30%, 50% and 70% of the samples for training and the remainder for validation, had a median validation accuracy of 68%, 70% and 72% respectively. Prospective predictions of FLT3-ITD or D835 mutation status were then made for all samples lacking FLT3-ITD or D835 mutation data. Mutation analysis was then performed using PCR amplification followed by 2-D gel electrophoresis (FLT3-ITD) to evaluate for PCR product size, or sequencing (D835) on 55 samples. This revealed 9 cases with FLT3-ITD, 3 with a D835 mutation, 1 with both and 43 without mutation. Among these 55 cases the model correctly predicted that 8 of the 12 mutant cases would be mutant including 8 of 10 with a FLT3-ITD, but 0 of 2 with only the D835 mutation. 
Among the 43 wildtype cases 36 were accurately predicted to be wildtype, while 7 were incorrectly predicted to have the mutation. This yields an overall accuracy (OA) of 80%, Sensitivity = 66%, Specificity = 90%, Positive Predictive Value (PPV) of 53%, and False Positive Rate (FPR) of 16%. Since most patients with FLT3-ITD have Diploid cytogenetics we also looked at the predictive accuracy of the protein expression signature in that population. Among 23 patients with Diploid cytogenetics the overall accuracy (OA) was 83%, Sensitivity = 75%, Specificity = 87%, Positive Predictive Value (PPV) 75%, and False Positive Rate (FPR) 13%. Since FLT3-ITD and D835 carry different prognostic impact, and had different protein expression signatures, greater accuracy might be achieved if separate models were developed for each mutation individually. The model demonstrated that RPPA derived protein expression signatures can accurately be used to predict mutation status, providing the first prospective validation of protein expression signatures in AML.


2020 ◽  
Vol 38 (4_suppl) ◽  
pp. 288-288
Author(s):  
Takeyuki Wada ◽  
Takaki Yoshikawa ◽  
Ayako Kamiya ◽  
Keichi Date ◽  
Tsutomu Hayashi ◽  
...  

288 Background: D2 surgery is required for clinical T1 gastric cancer with nodal swelling; however, D2 has a higher risk for morbidity than D1/D1+. Moreover, a previous study demonstrated that the false positive rate for nodal diagnosis in clinical T1 was very high. To select optimal surgery with high probability, we explored risk factors for false positivity in clinical T1 disease. Methods: Patients who underwent radical gastrectomy for clinical T1 gastric cancer between April 2015 and June 2019 were enrolled. Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for nodal diagnosis were retrospectively investigated. The risk factors for false positivity were also analyzed using the following factors: age, sex, histological type, tumor size, tumor depth, location, tumor type, presence of ulcer, and timing of CT, that is, (1) patients who underwent primary endoscopic submucosal dissection (ESD) that resulted in non-curative resection and then received CT before proceeding to surgery (delayed CT group) or (2) the other patients, who had received CT before primary surgery or before non-curative ESD (primary CT group). Results: A total of 679 patients were examined in the present study. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were 83.5% (567/679), 14.3% (13/91), 94.2% (554/588), 27.7% (13/47), and 87.7% (554/632), respectively. The false positive rate was 72.3% (34/47). In univariate analysis, differentiated tumor (p = 0.012) and delayed CT (p < 0.001) were associated with false positivity. Multivariate analysis revealed that delayed CT (OR, 4.534; p < 0.001) was the sole significant risk factor for false positivity. False positive rate was 100% (13/13) in the delayed CT group and 61.8% (21/34) in the primary CT group (p = 0.009). Conclusions: False positive rate was high in clinical T1 disease, especially when the patients received delayed CT after non-curative ESD. 
D2 surgery would be unnecessary even though nodal swelling was detected in CT after non-curative ESD.
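
The fractions reported in this abstract imply a 2×2 table of 13 true positives, 34 false positives, 78 false negatives, and 554 true negatives. The sketch below, with a hypothetical `Confusion` class introduced only for illustration, recomputes the reported metrics from those counts. It also shows that the abstract's "false positive rate" of 34/47 is the share of positive calls that were wrong (1 − PPV, often called the false discovery rate), which differs from the conventional definition FP/(FP+TN).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a 2x2 diagnostic table (hypothetical helper for illustration)."""
    tp: int  # node-positive on CT and on pathology
    fp: int  # node-positive on CT, node-negative on pathology
    fn: int  # node-negative on CT, node-positive on pathology
    tn: int  # node-negative on CT and on pathology

    def metrics(self):
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "accuracy": (self.tp + self.tn) / total,
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
            # Share of positive calls that are wrong: the abstract's 34/47.
            "false_discovery_rate": self.fp / (self.tp + self.fp),
            # Conventional false positive rate: 1 - specificity.
            "false_positive_rate": self.fp / (self.fp + self.tn),
        }


# Counts derived from the fractions given in the abstract.
nodal = Confusion(tp=13, fp=34, fn=78, tn=554)
m = nodal.metrics()

if __name__ == "__main__":
    for name, value in m.items():
        print(f"{name}: {value:.1%}")
```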


Author(s):  
Pamela Reinagel

Abstract After an experiment has been completed and analyzed, a trend may be observed that is “not quite significant”. Sometimes in this situation, researchers incrementally grow their sample size N in an effort to achieve statistical significance. This is especially tempting in situations when samples are very costly or time-consuming to collect, such that collecting an entirely new sample larger than N (the statistically sanctioned alternative) would be prohibitive. Such post-hoc sampling or “N-hacking” is condemned, however, because it leads to an excess of false positive results. Here Monte-Carlo simulations are used to show why and how incremental sampling causes false positives, but also to challenge the claim that it necessarily produces alarmingly high false positive rates. In a parameter regime that would be representative of practice in many research fields, simulations show that the inflation of the false positive rate is modest and easily bounded. But the effect on false positive rate is only half the story. What many researchers really want to know is the effect N-hacking would have on the likelihood that a positive result is a real effect that will be replicable: the positive predictive value (PPV). This question has not been considered in the reproducibility literature. The answer depends on the effect size and the prior probability of an effect. Although in practice these values are not known, simulations show that for a wide range of values, the PPV of results obtained by N-hacking is in fact higher than that of non-incremented experiments of the same sample size and statistical power. This is because the increase in false positives is more than offset by the increase in true positives. Therefore, in many situations, adding a few samples to shore up a nearly-significant result is in fact statistically beneficial. 
In conclusion, if samples are added after an initial hypothesis test this should be disclosed, and if a p value is reported it should be corrected. But, contrary to widespread belief, collecting additional samples to resolve a borderline p value is not invalid, and can confer previously unappreciated advantages for efficiency and positive predictive value.
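
The kind of Monte-Carlo experiment described above can be sketched in a few lines. This is a simplified illustration, not the author's simulation code: it uses a two-sided z-test with known unit variance rather than whatever test the paper used, and every parameter value (initial N of 12, increments of 6, cap of 30, retry window .05 ≤ p < .10) is an assumption chosen for the example. All data are drawn under the null hypothesis, so every "significant" result is a false positive.

```python
import math
import random


def two_sided_p(sample_sum, n):
    """Two-sided p-value of a z-test for mean 0 with known unit variance."""
    z = sample_sum / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null(n_sims=20000, n_init=12, n_add=6, n_max=30,
                  alpha=0.05, p_retry=0.10, seed=0):
    """Return (false positive rate without, with) incremental sampling."""
    rng = random.Random(seed)
    fixed_hits = 0   # significant at the initial sample size
    hacked_hits = 0  # significant after optional incremental sampling
    for _ in range(n_sims):
        n = n_init
        total = sum(rng.gauss(0.0, 1.0) for _ in range(n))
        p = two_sided_p(total, n)
        if p < alpha:
            fixed_hits += 1
        # Add samples only while the result is "not quite significant".
        while alpha <= p < p_retry and n + n_add <= n_max:
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_add))
            n += n_add
            p = two_sided_p(total, n)
        if p < alpha:
            hacked_hits += 1
    return fixed_hits / n_sims, hacked_hits / n_sims


if __name__ == "__main__":
    fixed, hacked = simulate_null()
    print(f"false positive rate, fixed N:        {fixed:.3f}")
    print(f"false positive rate, incremental N:  {hacked:.3f}")
```

Because samples are added only when the initial p-value falls in the retry window, the inflated rate in this sketch can never exceed the fraction of experiments whose initial p-value is below the upper edge of that window, which is one way to see why the inflation is bounded. Estimating PPV would additionally require simulating experiments with a real effect and an assumed prior, which this sketch omits.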


Author(s):  
Lili Yang ◽  
Yu Zhang ◽  
Jianbin Yang ◽  
Xinwen Huang

Background Birth weight influences profiles of dried blood amino-acids and acylcarnitines in newborn screening. This study aimed to define a more appropriate cut-off value to reduce the false positive rate and the number of recalled patients in newborn screening. Methods All babies who underwent newborn screening in our center were included; they were divided into groups by birth weight: 2500–3999 g (comparator group), <1000 g (group 1), 1000–1499 g (group 2), 1500–2499 g (group 3), and >4000 g (group 4). The 0.5th and 99.5th percentiles were used as the cut-off values. Amino acid and acylcarnitine concentrations were compared between the groups. False positive rate, positive predictive value, and false positive rate corrected for birth weight were determined. Results Data on a total of 578,287 newborn infants were included in the analysis. The total false positive rate was 0.75%, and positive predictive value 2.89%. The false positive rate was 0.69%, 0.54% and 5.31% in infants with normal birth weight, birth weight of >4000 g (group 4) and low birth weight of <2500 g (groups 1, 2 and 3), respectively. Low-birth weight infants had much higher phenylalanine, tyrosine, methionine, arginine, propionylcarnitine, isovalerylcarnitine and octadecanoylcarnitine concentrations. Free carnitine and palmitoylcarnitine concentrations were lower. After adjusting for birth weight, false positive rate of all indices decreased to 0.53%, and positive predictive value increased to 4.31%. Conclusions Amino acid and carnitine concentrations in low-birth weight newborn infants may differ from those of normal term newborn infants. The cut-off values of individual metabolites should be adjusted based on birth weight, to reduce false positive rate and increase positive predictive value.


Author(s):  
M Fabre ◽  
S Ruiz-Martinez ◽  
ME Monserrat Cantera ◽  
A Cortizo Garrido ◽  
Z Beunza Fabra ◽  
...  

Background An increasing body of evidence has revealed that SARS-CoV-2 infection in pregnant women could increase the risk of adverse maternal and fetal outcomes. Careful monitoring of pregnancies with COVID-19 and measures to prevent neonatal infection are warranted. Therefore, rapid antibody tests have been suggested as an efficient screening tool during pregnancy. Cases We analysed the clinical performance during pregnancy of a rapid, lateral-flow immunochromatographic assay for qualitative detection of SARS-CoV-2 IgG/IgM antibodies. We performed a universal screening including 169 patients during their last trimester of pregnancy. We present a series of 14 patients with a positive SARS-CoV-2 immunochromatographic assay rapid test result. Immunochromatographic assay results were always confirmed by chemiluminescent microparticle immunoassays for quantitative detection of SARS-CoV-2 IgG and IgM+IgA antibodies as the gold standard. We observed a positive predictive value of 50% and a false positive rate of 50% in pregnant women, indicating significantly lower diagnostic performance than reported in non-pregnant patients. Discussion Our data suggest that although immunochromatographic assay rapid tests may be a fast and profitable screening tool for SARS-CoV-2 infection, they may have a high false positive rate and low positive predictive value in pregnant women. Therefore, immunochromatographic assay results for qualitative detection of SARS-CoV-2 IgG/IgM antibodies must be verified by another test in pregnant patients.


2020 ◽  
Vol 25 (Supplement_2) ◽  
pp. e11-e11
Author(s):  
Danny Jomaa ◽  
Matthew Henderson ◽  
Steven Hawken ◽  
Pranesh Chakraborty

Abstract Background Newborn screening for congenital adrenal hyperplasia is performed using a two-tier approach. The first tier involves comparison of neonate 17-hydroxyprogesterone levels to gestational age (GA)-based thresholds. When GA is unreported, which occurs in approximately 5% of births, birth weight (BW)-based thresholds are the only available option. However, these have a lower specificity and result in more false positive results. Recently, a predictive model was developed to estimate GA based on newborn demographics and the screening analytes measured in a blood sample. Objectives The objective of this study was to determine whether supplying a predicted GA to newborns with unreported GA, and subsequent GA-based screening, has a higher positive predictive value than BW-based screening. Design/Methods Screening data was obtained for approximately 700,000 births that occurred in Canada between 2011 and 2015. Predicted GA was calculated using a model composed of demographic and screening analyte factors. The positive predictive values of BW- and predicted GA-based screening were calculated for newborns with unreported GA. A sequential approach was then developed whereby newborns with unreported GA were first screened by BW-based screening. Newborns that screened positive were then supplied with their predicted GA and screened using GA-based thresholds. Results First-tier CAH screening using GA-based 17-hydroxyprogesterone thresholds had a higher positive predictive value than using BW-based thresholds (1.30% vs. 0.82%). In the study time period, 3.61% of newborns had an unreported GA. For these newborns, predicted GA-based screening had a higher positive predictive value than BW-based screening (0.83% vs. 0.76%) and correctly identified the 2 infants with CAH whose GA was unreported. A sequential screening approach was then used: BW-based screening and, for the screen positive population, predicted GA-based screening. 
This further increased the positive predictive value compared to BW-based screening (0.95% vs. 0.76%), reduced the false positive rate, and correctly identified true positive cases. Conclusion Reducing the false positive rate of CAH screening is important to prevent unnecessary second-tier screening and referrals. For newborns with unreported GA (4-5% of all births), BW-based screening is the only currently available approach. However, this approach has a poor specificity and a high false positive rate compared to GA-based screening. This study is the first to demonstrate an alternative screening strategy with a higher positive predictive value for newborns with unreported GA.


2020 ◽  
pp. 019459982095309
Author(s):  
Scott H. Troob ◽  
Quinn Self ◽  
Deniz Gerecci ◽  
Macgregor Hodgson ◽  
Javier González-Castro ◽  
...  

Objective To describe the utility of venous flow couplers in monitoring free tissue flaps in the immediate postoperative setting. Study Design Retrospective case series. Setting Otolaryngology department at a single tertiary care institution. Methods A retrospective case series of free flap reconstructions in which venous flow couplers were employed to supplement flap monitoring. All free flap cases performed over the past 4 years were reviewed. Inclusion criteria were venous flow coupler and arterial flow Doppler monitored for 5 days postoperatively. Results From July 2014 through May 2018, the venous flow coupler was used with the arterial flow Doppler and clinical monitoring in 228 cases. Eleven cases did not meet criteria for inclusion; thus, 217 cases were analyzed. Twenty cases (9.2%) returned to the operating room with concern for flap compromise, and 16 were salvaged. The combination of venous flow coupler and arterial flow Doppler identified 19 of these flaps. Venous flow couplers identified 5 compromised flaps before there was an arterial signal change, and all were salvaged. Additionally, there was a 24.1% false-positive rate when 2 venous flow couplers were used in parallel. For the venous flow coupler, the positive predictive value was 64.3% and the negative predictive value, 98.9%. The false-positive rate in the series was 5.1%. The sensitivity was 90% and the specificity, 94.9%. Conclusion The venous flow coupler is able to detect venous thrombosis in the absence of arterial thrombosis and may contribute to improved flap salvage rates.
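
The reported predictive values can be related to sensitivity, specificity, and the rate of flap compromise through Bayes' rule. The sketch below assumes that the 20 of 217 cases returned to the operating room are the reference-standard positives (an inference from the reported 90% sensitivity, not something the abstract states); the function name is hypothetical. Because the inputs are rounded percentages, the outputs only approximate the reported 64.3% and 98.9%.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from test characteristics and prevalence via Bayes' rule."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed prevalence: 20 flaps with suspected compromise out of 217 analyzed cases.
ppv, npv = predictive_values(sensitivity=0.90, specificity=0.949,
                             prevalence=20 / 217)

if __name__ == "__main__":
    print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With a compromise rate under 10%, even a specificity near 95% leaves roughly one in three alarms false, which is consistent with the modest positive predictive value reported.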


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Wanaporn Burivong ◽  
Thanatorn Sricharoen ◽  
Apichart Thachang ◽  
Sunsiree Soodchuen ◽  
Panitpong Maroongroge ◽  
...  

Objective. The purpose of this study is to compare the early radiologic diagnosis of pulmonary infection between serial chest radiography (chest film) and single chest computed tomography (CT chest) in the first seven days of febrile neutropenia. Methods. This study included 78 patients with hematologic malignancies who developed 107 episodes of febrile neutropenia from January 2012 to October 2017 and had a chest film performed within the first seven days. Demographic and radiographic data were retrospectively reviewed. Three radiologists independently and blindly evaluated chest films and CT chests. The sensitivity and specificity of chest film, and its correlation with absolute neutrophil count, were assessed. Results. A total of 222 chest films were performed during this period, and thirty-nine episodes (36.4%) showed radiographic active pulmonary infection. Pulmonary infection was clinically diagnosed in 44.8% (48/107) of episodes. Sensitivity, specificity, positive predictive value, and negative predictive value of serial chest film in the early radiologic diagnosis of pulmonary infection were 50%, 74%, 61%, and 64%, respectively. The false-positive rate was 14%, and the false-negative rate was 22%. For single CT chest examinations, twenty-six studies were assessed, and 42.3% were indicative of radiographic active pulmonary infection. Sensitivity, specificity, positive predictive value, and negative predictive value of CT chest in the early radiologic diagnosis of pulmonary infection were 91%, 40%, 53%, and 86%, respectively. The false-positive rate was 60%. The absolute neutrophil count was not useful for predicting radiographic active pulmonary infection. Conclusion. Serial chest film for early radiologic diagnosis of pulmonary infection within the first seven days of febrile neutropenia has lower sensitivity with higher specificity as compared to a single CT chest. 
Conversely, CT chest may have a higher sensitivity in determining early pulmonary infection but also a higher rate of false-positives.

