Assessing Equivalence Tests with Respect to their Expected p-Value

2002 ◽  
Vol 44 (8) ◽  
pp. 1015-1027 ◽  
Author(s):  
Rafael Pflüger ◽  
Torsten Hothorn
Keyword(s):  
P Value ◽  
2018 ◽  
Author(s):  
Daniel Lakens ◽  
Marie Delacre

To move beyond the limitations of null-hypothesis tests, statistical approaches have been developed where the observed data are compared against a range of values that are equivalent to the absence of a meaningful effect. Specifying a range of values around zero allows researchers to statistically reject the presence of effects large enough to matter, and prevents practically insignificant effects from being interpreted as a statistically significant difference. We compare the behavior of the recently proposed second generation p-value (Blume, D’Agostino McGowan, Dupont, & Greevy, 2018) with the more established Two One-Sided Tests (TOST) equivalence testing procedure (Schuirmann, 1987). We show that the two approaches yield almost identical results under optimal conditions. Under suboptimal conditions (e.g., when the confidence interval is wider than the equivalence range, or when confidence intervals are asymmetric) the second generation p-value becomes difficult to interpret as a descriptive statistic. The second generation p-value is interpretable in a dichotomous manner (i.e., when the SGPV equals 0 or 1 because the confidence interval lies completely within or outside of the equivalence range), but this dichotomous interpretation does not require calculations. We conclude that equivalence tests yield more consistent p-values, distinguish between datasets that yield the same second generation p-value, and allow for easier control of Type I and Type II error rates.
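As a rough illustration of the two quantities compared in this abstract, the following Python sketch computes a TOST p-value and a second generation p-value (SGPV) for a single estimate. It is not taken from the paper: the function names, the normal (z) approximation used in place of a t-test, and the example numbers are all assumptions made for illustration. The SGPV follows the definition in Blume et al. (2018): the proportion of the interval estimate that overlaps the equivalence range, multiplied by a correction factor when the interval is more than twice as wide as that range.

```python
from statistics import NormalDist


def tost_p_value(estimate, se, low, high):
    """TOST p-value under a normal approximation (illustrative assumption).

    Runs two one-sided z-tests against the equivalence bounds [low, high]
    and returns the larger of the two p-values.
    """
    norm = NormalDist()
    p_lower = 1.0 - norm.cdf((estimate - low) / se)  # H0: effect <= low
    p_upper = norm.cdf((estimate - high) / se)       # H0: effect >= high
    return max(p_lower, p_upper)


def sgpv(ci_low, ci_high, low, high):
    """Second generation p-value for an interval estimate [ci_low, ci_high].

    Overlap proportion times the correction factor max(|I| / (2|H0|), 1),
    where |I| is the interval length and |H0| the equivalence range length.
    """
    ci_len = ci_high - ci_low
    null_len = high - low
    overlap = max(0.0, min(ci_high, high) - max(ci_low, low))
    return (overlap / ci_len) * max(ci_len / (2.0 * null_len), 1.0)
```

With hypothetical equivalence bounds of -0.5 and 0.5, `sgpv(-0.196, 0.196, -0.5, 0.5)` and `sgpv(-0.4, 0.4, -0.5, 0.5)` both return 1.0 because each interval lies entirely inside the range, whereas `tost_p_value(0.0, 0.1, -0.5, 0.5)` and `tost_p_value(0.0, 0.2, -0.5, 0.5)` return different p-values. An interval much wider than the range, such as `sgpv(-2.0, 2.0, -0.5, 0.5)`, returns 0.5 through the correction factor.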


2018 ◽  
Author(s):  
Daniel Lakens ◽  
Neil McLatchie ◽  
Peder Mortvedt Isager ◽  
Anne M. Scheel ◽  
Zoltan Dienes

Researchers often conclude an effect is absent when a null-hypothesis significance test yields a non-significant p-value. However, it is neither logically nor statistically correct to conclude an effect is absent when a hypothesis test is not significant. We present two methods to evaluate the presence or absence of effects: Equivalence testing (based on frequentist statistics) and Bayes factors (based on Bayesian statistics). In four examples from the gerontology literature we illustrate different ways to specify alternative models that can be used to reject the presence of a meaningful or predicted effect in hypothesis tests. We provide detailed explanations of how to calculate, report, and interpret Bayes factors and equivalence tests. We also discuss how to design informative studies that can provide support for a null model or for the absence of a meaningful effect. The conceptual differences between Bayes factors and equivalence tests are discussed, and we also note when and why they might lead to similar or different inferences in practice. It is important that researchers are able to falsify predictions or can provide support for predicted null-effects. Bayes factors and equivalence tests provide useful statistical tools to improve inferences about null effects.


2018 ◽  
Vol 75 (1) ◽  
pp. 45-57 ◽  
Author(s):  
Daniël Lakens ◽  
Neil McLatchie ◽  
Peder M Isager ◽  
Anne M Scheel ◽  
Zoltan Dienes

Researchers often conclude an effect is absent when a null-hypothesis significance test yields a nonsignificant p value. However, it is neither logically nor statistically correct to conclude an effect is absent when a hypothesis test is not significant. We present two methods to evaluate the presence or absence of effects: Equivalence testing (based on frequentist statistics) and Bayes factors (based on Bayesian statistics). In four examples from the gerontology literature, we illustrate different ways to specify alternative models that can be used to reject the presence of a meaningful or predicted effect in hypothesis tests. We provide detailed explanations of how to calculate, report, and interpret Bayes factors and equivalence tests. We also discuss how to design informative studies that can provide support for a null model or for the absence of a meaningful effect. The conceptual differences between Bayes factors and equivalence tests are discussed, and we also note when and why they might lead to similar or different inferences in practice. It is important that researchers are able to falsify predictions or can quantify the support for predicted null effects. Bayes factors and equivalence tests provide useful statistical tools to improve inferences about null effects.


2020 ◽  
Vol 4 ◽  
Author(s):  
Daniël Lakens ◽  
Marie Delacre

To move beyond the limitations of null-hypothesis tests, statistical approaches have been developed where the observed data are compared against a range of values that are equivalent to the absence of a meaningful effect. Specifying a range of values around zero allows researchers to statistically reject the presence of effects large enough to matter, and prevents practically insignificant effects from being interpreted as a statistically significant difference. We compare the behavior of the recently proposed second generation p-value (Blume, D’Agostino McGowan, Dupont, & Greevy, 2018) with the more established Two One-Sided Tests (TOST) equivalence testing procedure (Schuirmann, 1987). We show that the two approaches yield almost identical results under optimal conditions. Under suboptimal conditions (e.g., when the confidence interval is wider than the equivalence range, or when confidence intervals are asymmetric) the second generation p-value becomes difficult to interpret. The second generation p-value is interpretable in a dichotomous manner (i.e., when the SGPV equals 0 or 1 because the confidence interval lies completely within or outside of the equivalence range), but this dichotomous interpretation does not require calculations. We conclude that equivalence tests yield more consistent p-values, distinguish between datasets that yield the same second generation p-value, and allow for easier control of Type I and Type II error rates.


2019 ◽  
Author(s):  
Daniel Lakens

Due to the strong overreliance on p-values in the scientific literature, some researchers have argued that p-values should be abandoned or banned, and that we need to move beyond p-values and embrace practical alternatives. When proposing alternatives to p-values, statisticians often commit the ‘Statistician’s Fallacy’, where they declare which statistic researchers really ‘want to know’. Instead of telling researchers what they want to know, statisticians should teach researchers which questions they can ask. In some situations, the answer to the question they are most interested in will be the p-value. As long as null-hypothesis tests have been criticized, researchers have suggested including minimum-effect tests and equivalence tests in our statistical toolbox, and these tests (even though they return p-values) have the potential to greatly improve the questions researchers ask. It is clear there is room for improvement in how we teach p-values. If anyone really believes p-values are an important cause of problems in science, preventing the misinterpretation of p-values by developing better evidence-based education and user-centered statistical software should be a top priority. Telling researchers which statistic they should use has distracted us from examining more important questions, such as asking researchers what they want to know when they do scientific research. Before we can improve our statistical inferences, we need to improve our statistical questions.


2021 ◽  
pp. 174569162095801
Author(s):  
Daniël Lakens

Because of the strong overreliance on p values in the scientific literature, some researchers have argued that we need to move beyond p values and embrace practical alternatives. When proposing alternatives to p values, statisticians often commit the “statistician’s fallacy,” whereby they declare which statistic researchers really “want to know.” Instead of telling researchers what they want to know, statisticians should teach researchers which questions they can ask. In some situations, the answer to the question they are most interested in will be the p value. As long as null-hypothesis tests have been criticized, researchers have suggested including minimum-effect tests and equivalence tests in our statistical toolbox, and these tests have the potential to greatly improve the questions researchers ask. If anyone believes p values affect the quality of scientific research, preventing the misinterpretation of p values by developing better evidence-based education and user-centered statistical software should be a top priority. Polarized discussions about which statistic scientists should use have distracted us from examining more important questions, such as asking researchers what they want to know when they conduct scientific research. Before we can improve our statistical inferences, we need to improve our statistical questions.


Author(s):  
Stephen Thomas ◽  
Ankur Patel ◽  
Corey Patrick ◽  
Gary Delhougne

Despite advancements in surgical technique and component design, implant loosening, stiffness, and instability remain leading causes of total knee arthroplasty (TKA) failure. Patient-specific instruments (PSI) aid in surgical precision and in implant positioning and ultimately reduce readmissions and revisions in TKA. The objective of the study was to evaluate total hospital cost and readmission rate at 30, 60, 90, and 365 days in PSI-guided TKA patients. We retrospectively reviewed patients who underwent a primary TKA for osteoarthritis from the Premier Perspective Database between 2014 and 2017 Q2. TKA with PSI patients were identified using appropriate keywords from billing records and compared against patients without PSI. Patients were excluded if they were < 21 years of age, were outpatient hospital discharges, had evidence of revision TKA, or had bilateral TKA in the same discharge or in different discharges. 1:1 propensity score matching was used to control for patient, hospital, and clinical characteristics. A Generalized Estimating Equation model with appropriate distribution and link function was used to estimate hospital-related cost, while logistic regression models were used to estimate 30-, 60-, and 90-day and 1-year readmission rates. The study matched 3,358 TKA with PSI patients to TKA without PSI patients. Mean total hospital costs were statistically significantly (p < 0.0001) lower for TKA with PSI ($14,910; 95% confidence interval [CI]: $14,735–$15,087) than TKA without PSI patients ($16,018; 95% CI: $15,826–$16,212). TKA with PSI patients were 31% (odds ratio [OR]: 0.69; 95% CI: 0.51–0.95; p-value = 0.0218) less likely to be readmitted at 30 days; 35% (OR: 0.65; 95% CI: 0.50–0.86; p-value = 0.0022) less likely to be readmitted at 60 days; 32% (OR: 0.68; 95% CI: 0.53–0.88; p-value = 0.0031) less likely to be readmitted at 90 days; 28% (OR: 0.72; 95% CI: 0.60–0.86; p-value = 0.0004) less likely to be readmitted at 365 days than TKA without PSI patients.
Hospitals and health care professionals can use retrospective real-world data to make informed decisions on using PSI to reduce hospital cost and readmission rate, and improve outcomes in TKA patients.
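The percentages quoted in this abstract can be recovered from the reported odds ratios and mean costs with simple arithmetic. The short Python sketch below is an illustration only: the helper name is hypothetical, and the input values are copied from the abstract. Strictly speaking, 1 minus the odds ratio is a reduction in the odds of readmission, which the abstract words as "less likely".

```python
def percent_reduction_in_odds(odds_ratio):
    """Percent reduction in odds implied by an odds ratio below 1 (hypothetical helper)."""
    return round((1.0 - odds_ratio) * 100.0)


# Odds ratios as reported in the abstract, keyed by follow-up window in days.
reported_odds_ratios = {30: 0.69, 60: 0.65, 90: 0.68, 365: 0.72}
reductions = {
    days: percent_reduction_in_odds(odds_ratio)
    for days, odds_ratio in reported_odds_ratios.items()
}

# Difference in mean total hospital cost (USD): without PSI minus with PSI.
mean_cost_difference = 16018 - 14910
```

The computed reductions match the 31%, 35%, 32%, and 28% figures in the abstract, and the difference in mean total hospital cost comes to $1,108.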


Author(s):  
Jason D. Tegethoff ◽  
Rafael Walker-Santiago ◽  
William M. Ralston ◽  
James A. Keeney

Isolated polyethylene liner exchange (IPLE) is infrequently selected as a treatment approach for patients with primary total knee arthroplasty (TKA) prosthetic joint instability. Potential advantages of less immediate surgical morbidity, faster recovery, and lower procedural cost need to be measured against reoperation and re-revision risk. Few published studies have directly compared IPLE with combined tibial and femoral component revision to treat patients with primary TKA instability. After obtaining institutional review board (IRB) approval, we performed a retrospective comparison of 20 patients treated with IPLE and 126 patients treated with tibial and femoral component revisions at a single institution between 2011 and 2018. Patient demographic characteristics, medical comorbidities, time to initial revision TKA, and reoperation (90 days, <2 years, and >2 years) were assessed using paired Student's t-test or Fisher's exact test with a p-value <0.01 used to determine significance. Patients undergoing IPLE were more likely to undergo reoperation (60.0 vs. 17.5%, p = 0.001), component revision surgery (45.0 vs. 8.7%, p = 0.002), and component revision within 2 years (30.0 vs. 1.6%, p < 0.0001). Differences in 90-day reoperation (p = 0.14) and revision >2 years (p = 0.19) were not significant. Reoperation for instability (30.0 vs. 4.0%, p < 0.001) and infection (20.0 vs. 1.6%, p < 0.01) were both higher in the IPLE group. IPLE does not provide consistent benefits for patients undergoing TKA revision for instability. Considerations for lower immediate postoperative morbidity and cost need to be carefully measured against long-term consequences of reoperation, delayed component revision, and increased long-term costs of multiple surgical procedures. This is a level III, case–control study.


2018 ◽  
Vol 2 (3) ◽  
pp. 111
Author(s):  
Aswindar Adhi Gumilang ◽  
Tri Pitara Mahanggoro ◽  
Qurrotul Aini

Public demand for professional health services and transparent financial management led some Puskesmas in Semarang regency to change their status from public health center to BLUD. Both BLUD and non-BLUD Puskesmas require resources that work well in order to meet the expectations of the community. The aim of this study is to determine the difference in work motivation and job satisfaction between employees of BLUD and non-BLUD Puskesmas. The research method is comparative descriptive with a quantitative approach. The objects of this research are the work motivation and job satisfaction of employees in BLUD and non-BLUD Puskesmas in Semarang regency. The Sig. value (p-value) for the work motivation variable was 0.019, smaller than the α value (0.05), indicating a difference in work motivation between employees of BLUD and non-BLUD Puskesmas. The Sig. value (p-value) for the job satisfaction variable was 0.020, smaller than the α value (0.05), indicating a difference in job satisfaction between BLUD and non-BLUD employees. The average motivation of non-BLUD employees (76.59) was lower than that of BLUD employees (78.25). The average job satisfaction of BLUD employees (129.20) was higher than that of non-BLUD employees (124.26). Job satisfaction was higher among employees of BLUD Puskesmas than among non-BLUD employees.

