Noninformative Bayesian $P$-values for testing marginal homogeneity in $2 \times 2$ contingency tables

This article considers the problem of testing marginal homogeneity in $2 \times 2$ contingency tables under the multinomial sampling scheme. From the frequentist perspective, McNemar's exact $p$-value ($p_{_{\textsl ME}}$) is the most commonly used $p$-value in practice, but it can be conservative for small to moderate sample sizes. On the other hand, from the Bayesian perspective, one can construct Bayesian $p$-values by using the proper prior and posterior distributions, which are called the prior predictive $p$-value ($p_{prior}$) and the posterior predictive $p$-value ($p_{post}$), respectively. Another Bayesian $p$-value is called the partial posterior predictive $p$-value ($p_{ppost}$), first proposed by [2], which can avoid the double use of the data that occurs in $p_{post}$. For the preceding problem, we derive $p_{prior}$, $p_{post}$, and $p_{ppost}$ based on the noninformative uniform prior. Under the criterion of uniformity in the frequentist sense, comparisons among $p_{prior}$, $p_{_{{\textsl ME}}}$, $p_{post}$ and $p_{ppost}$ are given. Numerical results show that $p_{ppost}$ has the best performance for small to moderately large sample sizes.
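
As a point of reference for the frequentist baseline named in this abstract, the following is a minimal sketch of an exact McNemar $p$-value. It assumes the usual conditional form, in which the discordant count `b` is Binomial(`b + c`, 1/2) under marginal homogeneity. The function name `mcnemar_exact_p` and the argument names are illustrative and do not come from the article, and the Bayesian $p$-values derived there are not reproduced.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant cell counts.

    Assumption: under marginal homogeneity, b given n = b + c is
    Binomial(n, 1/2), and the two-sided p-value doubles the smaller tail.
    """
    n = b + c
    if n == 0:
        # No discordant pairs: the data carry no evidence against the null.
        return 1.0
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 1/2), computed with exact integers.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For example, `mcnemar_exact_p(1, 9)` evaluates to 22/1024, about 0.0215. Because the test statistic is discrete, the attainable $p$-values are coarse for small `b + c`, which is one source of the conservativeness the abstract mentions.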

Use of the p-values as a size-dependent function to address practical differences when analyzing large datasets

Scientific Reports ◽

10.1038/s41598-021-00199-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Estibaliz Gómez-de-Mariscal ◽

Vanesa Guerrero ◽

Alexandra Sneider ◽

Hasini Jayatilaka ◽

Jude M. Phillip ◽

...

Keyword(s):

Sample Size ◽

Null Hypothesis ◽

P Value ◽

Specific Situation ◽

Sample Sizes ◽

P Values ◽

Large Sample ◽

Size Dependent ◽

Depth Study

Abstract: Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widely used null hypothesis significance testing framework for identifying statistically significant differences among groups of observations, a single p-value is computed from sample data. It is then routinely compared with a threshold, commonly set to 0.05, to assess the evidence against the null hypothesis, that is, the hypothesis of no significant differences among groups. Because the estimated p-value tends to decrease as the sample size increases, applying this methodology to datasets with large sample sizes almost always results in rejection of the null hypothesis, which makes the test uninformative in this specific situation. We propose a new approach to detect differences based on the dependence of the p-value on the sample size. We introduce new descriptive parameters that overcome the effect of sample size on the interpretation of the p-value for datasets with large sample sizes, reducing the uncertainty in the decision about the existence of biological differences between the compared experiments. The methodology enables graphical and quantitative characterization of the differences between the compared experiments, guiding researchers in the decision process. An in-depth study of the methodology is carried out on simulated and experimental data. Code is available at https://github.com/BIIG-UC3M/pMoSS.
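
The premise that a p-value shrinks as the sample size grows can be illustrated with a small sketch. It assumes a two-sample z-test with unit variance in both groups and an observed standardized mean difference held fixed at `effect_size`. The function name `two_sided_z_p` is hypothetical, and this is not the pMoSS method from the article, only an illustration of the size dependence it addresses.

```python
from math import erfc, sqrt


def two_sided_z_p(effect_size: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test (unit variances assumed).

    With n observations per group, the standard error of the mean
    difference is sqrt(2 / n), so z = effect_size * sqrt(n / 2).
    """
    z = effect_size * sqrt(n_per_group / 2.0)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    # A practically negligible difference (0.05 standard deviations).
    for n in (10, 1_000, 100_000):
        print(n, two_sided_z_p(0.05, n))
```

With the same tiny observed difference, the p-value is far above 0.05 at 10 observations per group and far below it at 100,000, even though the size of the effect has not changed.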

HLA Haplotypes Are Associated with Multiple Myeloma Risk in the African American Multiple Myeloma Study (AAMMS)

Blood ◽

10.1182/blood.v128.22.3250.3250 ◽

2016 ◽

Vol 128 (22) ◽

pp. 3250-3250

Author(s):

Loren Gragert ◽

Amie Hwang ◽

Leon Bernal-Mizrachi ◽

Sikander Ailawadhi ◽

Seema Singhal ◽

...

Keyword(s):

Multiple Myeloma ◽

African American ◽

Hla Class I ◽

Hla Typing ◽

Class I ◽

European Ancestry ◽

Sample Sizes ◽

P Values ◽

Hla Alleles ◽

Large Sample

Abstract Background: Persons of African ancestry (AA) have a 2-3-fold higher risk of multiple myeloma (MM) than persons of European ancestry (EA). As with other B-cell malignancies, genome-wide association scans (GWAS) have identified MM risk variants in the HLA region in persons of EA. We conducted a case-control analysis with data from the National Marrow Donor Program (NMDP) [1], comparing MM patients typed for bone marrow transplant to donor controls matched by race-ethnicity, and found associations between specific HLA alleles/haplotypes and MM risk that varied by race and ethnicity. To confirm our results and identify additional novel signals, we have now investigated associations between HLA alleles and haplotypes and MM risk in the African American Multiple Myeloma Study (AAMMS) Cohort. Methods: The source of subjects was the AAMMS, in which AA MM patients were identified from 10 cancer centers and 4 Surveillance, Epidemiology and End-Results (SEER) Program cancer registries in order to identify genetic risk factors for MM among AAs. A GWAS was conducted using the Illumina Human Core BeadChip array on DNA samples from 1,305 AA MM patients in the AAMMS, and results were compared with those from 7,078 AA controls with GWAS data generated from the Illumina 1MDuo [2]. The major histocompatibility complex (MHC) region single nucleotide polymorphisms (SNPs) were imputed to classical HLA variants using HIBAG. Unconditional logistic regression was used to estimate HLA associations, adjusting for sex, age and the first 2 principal components. P-values were adjusted for false discovery rate (FDR) for each locus group. Results: We did not identify any single HLA alleles associated with MM risk among AAs. However, several B*07:02-containing haplotypes were associated with MM risk (odds ratios [OR] ranging from 2.38 to 2.64 and FDR P-values ranging from 1.43 × 10^-6 to 3.57 × 10^-8).
We found associations between MM risk and genotypes containing DRB3*02:02, including DRB3*02:02~DRB1*11:01+DRB3*02:02~DRB1*11:01 (OR = 1.93, P_FDR = 9.36 × 10^-5), similar to those observed in the NMDP study [1]. Novel findings included associations between MM risk and HLA Class I haplotypes B*53:01+B*57:01 (OR = 1.94, P_FDR = 0.003) and C*04:01~B*53:01+C*06:02~B*57:01 (OR = 1.96, P_FDR = 0.0050). Results from an ongoing meta-analysis between the two data sets (one based on an imputed GWAS and one based on NMDP HLA typing) will be presented. Conclusions: This study is the second to examine HLA alleles and risk of MM among AAs and is by far the largest. We confirmed a previously observed association between an HLA Class II DRB3 variant and MM risk and confirmed an association with B*07 haplotypes previously observed among EAs [1]. We also identified novel associations between other HLA Class I haplotypes and MM risk in AAs. Because HLA is highly polymorphic, many HLA alleles are rare variants for which genetic associations are difficult to detect without very large sample sizes. Further investigation with large sample sizes will be necessary to refine these associations in order to better identify the underlying causal alleles and determine the functional significance of these HLA associations. References: [1] Beksac M, Gragert L, Fingerson S, et al. HLA polymorphism and risk of multiple myeloma. Leukemia. 2016 Jul 27. doi: 10.1038/leu.2016.199. [2] Rand KA, Song C, Hwang AE, et al. Genetic susceptibility markers of multiple myeloma in African-Americans. Abstract # 2030, 56th Annual American Society of Hematology Meeting, San Francisco, California, 2014. Disclosures: Ailawadhi: Pharmacyclics: Consultancy; Novartis: Consultancy; Amgen Inc: Consultancy; Takeda Oncology: Consultancy. Nooka: Spectrum, Novartis, Onyx pharmaceuticals: Consultancy.
Zonder: Pharmacyclics: Other: DSMC membership; Prothena: Consultancy, Honoraria; Celgene: Consultancy, Honoraria, Research Funding; Bristol Myers Squibb: Consultancy, Honoraria; Seattle Genetics: Consultancy, Honoraria; Takeda: Consultancy, Honoraria; Janssen: Consultancy, Honoraria. Lonial: BMS: Consultancy; Novartis: Consultancy; Millennium: Consultancy; Celgene: Consultancy; Janssen: Consultancy; Merck: Consultancy; Celgene: Consultancy; BMS: Consultancy; Novartis: Consultancy; Onyx: Consultancy; Janssen: Consultancy; Onyx: Consultancy.

Accuracy of p-values of approximate tests in testing for equality of means under unequal variances

Mathematica Slovaca ◽

10.2478/s12175-009-0156-x ◽

2009 ◽

Vol 59 (6) ◽

Author(s):

Júlia Volaufová

Keyword(s):

Fixed Effects ◽

Linear Models ◽

Parametric Bootstrap ◽

Small Sample ◽

P Value ◽

Sample Sizes ◽

F Test ◽

P Values ◽

Unequal Variances ◽

Equality Of Means

Abstract: Seemingly, testing for fixed effects in linear models with variance-covariance components has been solved for decades. However, even in simple situations, such as the fixed one-way model with heteroscedastic variances (a multiple-means case of the Behrens-Fisher problem), questions about the statistical properties of various approximations of test statistics are still open. Here we present a brief overview of several approaches suggested in the literature, as well as those available in statistical software, accompanied by a simulation study in which the accuracy of p-values is studied. Our interest is limited here to Welch’s test, the Satterthwaite-Fai-Cornelius test, the Kenward-Roger test, the simple ANOVA F-test, and the parametric bootstrap test. We conclude that for small sample sizes, regardless of the number of compared means and the heterogeneity of variance, the ANOVA F-test p-value performs the best. For higher sample sizes (at least 5 per group), the parametric bootstrap performs well, and the Kenward-Roger test also performs well.

Conservative bounds on extreme P-values for testing the equality of two probabilities based on very large sample sizes

Institute of Mathematical Statistics Lecture Notes - Monograph Series - A Festschrift for Herman Rubin ◽

10.1214/lnms/1196285395 ◽

2004 ◽

pp. 250-254

Author(s):

Herman Chernoff

Keyword(s):

Sample Sizes ◽

P Values ◽

Large Sample

Assessing the Reliability of Blind Wine Tasting: Differentiating Levels of Clinical and Statistical Meaningfulness

Journal of Wine Economics ◽

10.1017/s1931436100000432 ◽

2007 ◽

Vol 2 (2) ◽

pp. 196-202 ◽

Cited By ~ 6

Author(s):

Domenic V. Cicchetti

Keyword(s):

Clinical Significance ◽

Statistical Significance ◽

Weighted Kappa ◽

Jel Classification ◽

The Other ◽

Sample Sizes ◽

P Values ◽

Kappa Value ◽

Clinical Meaning ◽

The U.S

Abstract: The author distinguishes between the clinical and statistical meaning of varying levels of intertaster reliability for the 11 judges who evaluated 10 Chardonnays (6 American and 4 French) in the heralded 1976 Paris wine competition. Four wines showed weighted kappa values (<0.40) that are considered poor by established biostatistical criteria. These ranged from 0.10 for the French Beaune Clos des Mouches 1973 Chardonnay to 0.33 for the U.S. Veedercrest 1972 Chardonnay. However, when levels of statistical significance of the weighted kappa (Kw) values were obtained, only the Clos des Mouches failed to reach statistical significance at the .05 level. The other three wines (the U.S. Chateau Montelena, 1973, with a Kw of 0.20; the U.S. 1973 David Bruce regular, with a weighted kappa value of .27; and the U.S. Veedercrest, with one of .33) reached statistical significance at p values of <.05, <.001, and <.0001, respectively. These findings are not specific to weighted kappa, and reveal that when sample sizes are large enough, even the most trivial of results will be statistically significant while often devoid of practical or clinical meaningfulness. A level of Kw that is clinically meaningful will most likely be statistically significant. But high levels of statistical significance are no guarantee of clinical significance. Methods for resolving this “big N phenomenon” are presented and discussed. (JEL Classification: C12, C49)

High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis

Human Genomics ◽

10.1186/s40246-021-00308-5 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Weitong Cui ◽

Huaru Xue ◽

Lei Wei ◽

Jinghua Jin ◽

Xuewen Tian ◽

...

Keyword(s):

Gene Expression ◽

Differential Expression ◽

Small Sample ◽

Differentially Expressed ◽

Cancer Type ◽

Rna Seq ◽

Sample Sizes ◽

Large Sample ◽

Expression Levels ◽

Gene Expression Levels

Abstract Background: RNA sequencing (RNA-Seq) has been widely applied in oncology for monitoring transcriptome changes. However, the emerging problem that high variation of gene expression levels caused by tumor heterogeneity may affect the reproducibility of differential expression (DE) results has rarely been studied. Here, we investigated the reproducibility of DE results for any given number of biological replicates between 3 and 24 and explored why a great many differentially expressed genes (DEGs) were not reproducible. Results: Our findings demonstrate that poor reproducibility of DE results exists not only for small sample sizes, but also for relatively large sample sizes. Quite a few of the DEGs detected are specific to the samples in use, rather than genuinely differentially expressed under different conditions. Poor reproducibility of DE results is mainly caused by high variation of gene expression levels for the same gene in different samples. Even though biological variation may account for much of the high variation of gene expression levels, the effect of outlier count data also needs to be treated seriously, as outlier data severely interfere with DE analysis. Conclusions: High heterogeneity exists not only in tumor tissue samples of each cancer type studied, but also in normal samples. High heterogeneity leads to poor reproducibility of DEGs, undermining generalization of differential expression results. Therefore, it is necessary to use large sample sizes (at least 10 if possible) in RNA-Seq experimental designs to reduce the impact of biological variability, and DE results should be interpreted cautiously unless soundly validated.

p-Value, Hypothesis Testing, Strength of Evidence: Comment on “The Role of p-Values in Judging the Strength of Evidence and Realistic Replication Expectations”

Statistics in Biopharmaceutical Research ◽

10.1080/19466315.2020.1811150 ◽

2021 ◽

Vol 13 (1) ◽

pp. 30-31

Author(s):

H. M. James Hung ◽

John Lawrence ◽

Sue-Jane Wang

Keyword(s):

Hypothesis Testing ◽

P Value ◽

P Values ◽

Strength Of Evidence

SAT0210 FACTORS ASSOCIATED WITH TIME TO SEVERE LUPUS NEPHRITIS IN A COHORT OF COLOMBIAN PATIENTS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.6282 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1048.2-1048

Author(s):

S. Herrera ◽

J. C. Diaz-Coronado ◽

D. Rojas-Gualdrón ◽

L. Betancur-Vasquez ◽

D. Gonzalez-Hurtado ◽

...

Keyword(s):

Lupus Nephritis ◽

Renal Involvement ◽

Classification Criteria ◽

P Value ◽

Systemic Lupus ◽

P Values ◽

Interval Censored ◽

Severe Lupus Nephritis ◽

Age And Sex

Background: Systemic lupus erythematosus (SLE) clinical manifestations, and their severity, vary according to age, ethnicity and socioeconomic status. Both Hispanic and Afro-American patients have a higher incidence and more severe presentation when compared to Caucasian patients with SLE. Objectives: To analyze clinical and immunological characteristics associated with time to severe renal involvement in patients with systemic lupus erythematosus in a Colombian cohort followed for one year, between January 2015 and December 2018. Methods: Retrospective follow-up study based on clinical records. Patients had an SLE diagnosis that fulfilled either the 1987 American College of Rheumatology Classification Criteria for SLE or the 2011 Systemic Lupus International Collaborating Clinics (SLICC) classification criteria for SLE. We included patients with a diagnosis of lupus nephritis according to Wallace and Dubois criteria. Patients who did not have at least two follow-up measurements or had a cause of nephritis other than lupus were excluded. The main outcome was time from diagnosis to severe renal involvement, defined as creatinine clearance ≤50 ml/min, 24-hour proteinuria ≥3.5 grams, or end-stage renal disease. We analyzed clinical and immunological characteristics. Descriptive statistical analyses of participant data during the first evaluation are reported as frequencies and percentages for categorical variables, and as medians and interquartile ranges (IQR) for quantitative variables. Age- and sex-adjusted survival functions and hazard ratios (HR) with 95% confidence intervals and p-values were estimated using parametric Weibull models for interval-censored data. P values < 0.05 were considered statistically significant. Results: 548 patients were analyzed: 67 were left-censored as they presented renal involvement at entry, 6 were interval-censored as the outcome occurred between study visits, and 475 were right-censored as involvement was not registered during follow-up.
529 (96.5%) patients were female, median age at entry was 46 (IQR = 23) and median age at diagnosis was 29.5 (IQR = 20.6). 67% were mestizo, 13% Caucasian and 0.3% Afro-Colombian. Age- and sex-adjusted variables associated with time to severe lupus nephritis were high blood pressure, HR = 3.5 (95% CI 2.2-5.6; p-value < 0.001), and anti-Ro (per unit increase), HR = 1.002 (95% CI 1.001-1.004; p-value = 0.04). Figure 1 shows the age- and sex-adjusted survival function. Conclusion: In our cohort, severe lupus nephritis appears in less than 15% of patients at 10 years. Both high blood pressure and elevated anti-Ro titers were associated with a higher rate of onset of severe lupus nephritis, as seen in some polymorphs of anti-Ro. Disclosure of Interests: Sebastian Herrera Speakers bureau: academic conference, Juan Camilo Diaz-Coronado: None declared, Diego Rojas-Gualdrón: None declared, Laura Betancur-Vasquez: None declared, Daniel Gonzalez-Hurtado: None declared, Juanita Gonzalez-Arango: None declared, Laura Uribe-Arango: None declared, Maria Fernanda Saavedra Chacón: None declared, Jorge Lacouture-Fierro: None declared, Santiago Monsalve: None declared, Sebastian Guerra-Zarama: None declared, Juan David Lopez: None declared, Juan David Serna: None declared, Julian Barbosa: None declared, Ana Sierra: None declared, Deicy Hernandez-Parra: None declared, Ricardo Pineda.Tamayo: None declared

Was This in Your Statistics Textbook? IV. Frequency Data

Experimental Agriculture ◽

10.1017/s0014479700016392 ◽

1989 ◽

Vol 25 (1) ◽

pp. 11-25

Author(s):

D. J. Finney

Keyword(s):

Statistical Analysis ◽

Contingency Tables ◽

The Other ◽

Frequency Data ◽

Significance Tests ◽

Test Statistic ◽

Jumping To Conclusions ◽

Proper Role ◽

Measure Of Association

SUMMARY: Observations that are frequencies rather than measurements often call for special types of statistical analysis. This paper comments on circumstances in which methods for one type of data can sensibly be used for the other. A section on two-way contingency tables emphasizes the proper role of χ² as a test statistic, not a measure of association; it mentions the distinction between one-tail and two-tail significance tests and reminds the reader of dangers. Multiway tables bring new complications, and the problems of interactions when additional classificatory factors are explicit or hidden are discussed at some length. A brief outline attempts to show how probit, logit, and similar techniques are related to the analysis of contingency tables. Finally, three unusual examples are described as illustrations of the care that is needed to avoid jumping to conclusions on how frequency data should be analysed.

A bootstrap method to calculate the p-value of Fisher’s combination for a large number of weakly dependent p-values

Communications in Statistics - Simulation and Computation ◽

10.1080/03610918.2021.1955265 ◽

2021 ◽

pp. 1-8

Author(s):

Jiayan Zhu ◽

Li Ma ◽

Mengying Ni ◽

Zhengbang Li

Keyword(s):

Bootstrap Method ◽

P Value ◽

P Values ◽

Weakly Dependent
