A comparative review of methods for comparing means using partially paired data

2015 ◽  
Vol 26 (3) ◽  
pp. 1323-1340 ◽  
Author(s):  
Beibei Guo ◽  
Ying Yuan

In medical experiments with the objective of testing the equality of two means, data are often partially paired by design or because of missing data. The partially paired data represent a combination of paired and unpaired observations. In this article, we review and compare nine methods for analyzing partially paired data, including the two-sample t-test, paired t-test, corrected z-test, weighted t-test, pooled t-test, optimal pooled t-test, multiple imputation method, mixed model approach, and the test based on a modified maximum likelihood estimate. We compare the performance of these methods through extensive simulation studies that cover a wide range of scenarios with different effect sizes, sample sizes, and correlations between the paired variables, as well as true underlying distributions. The simulation results suggest that when the sample size is moderate, the test based on the modified maximum likelihood estimator is generally superior to the other approaches when the data are normally distributed, and the optimal pooled t-test performs best when the data are not normally distributed, with well-controlled type I error rates and high statistical power. When the sample size is small, the optimal pooled t-test is recommended when both variables have missing data, and the paired t-test is recommended when only one variable has missing data.
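As a rough illustration of what "partially paired" means in practice, the sketch below splits records into complete pairs and unpaired observations and computes the two simplest statistics from the list above: the paired t statistic on the complete pairs and a two-sample (Welch-type) t statistic on the unpaired values. The record layout (tuples with `None` for a missing value), the function names, and the numbers are all assumptions made for illustration; the pooled, weighted, and maximum-likelihood methods reviewed in the article are not reproduced here.

```python
import math
import statistics


def split_partially_paired(records):
    """Split (x, y) records into complete pairs, x-only values, and y-only values."""
    pairs = [(x, y) for x, y in records if x is not None and y is not None]
    x_only = [x for x, y in records if x is not None and y is None]
    y_only = [y for x, y in records if x is None and y is not None]
    return pairs, x_only, y_only


def paired_t(pairs):
    """Paired t statistic: mean difference divided by its standard error."""
    diffs = [x - y for x, y in pairs]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def two_sample_t(a, b):
    """Two-sample t statistic with unpooled (Welch-type) variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical data: three complete pairs, two x-only and two y-only observations.
records = [(5.0, 4.0), (6.0, 4.0), (7.0, 6.0),
           (8.0, None), (9.0, None), (None, 3.0), (None, 5.0)]
pairs, x_only, y_only = split_partially_paired(records)
print(paired_t(pairs), two_sample_t(x_only, y_only))
```

Each statistic uses only part of the data, which is the motivation for methods that combine the paired and unpaired information.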

1990 ◽  
Vol 15 (3) ◽  
pp. 237-247 ◽  
Author(s):  
Rand R. Wilcox

Let X and Y be dependent random variables with variances σ_x² and σ_y². Recently, McCulloch (1987) suggested a modification of the Morgan-Pitman test of H0: σ_x² = σ_y². But, as this paper describes, there are situations where McCulloch’s procedure is not robust. A subsample approach, similar to the Box-Scheffe test, is also considered and found to give conservative results, in terms of Type I errors, for all situations considered, but it yields relatively low power. New results on the Sandvik-Olsson procedure are also described, but the procedure is found to be nonrobust in situations not previously considered, and its power can be low relative to the two other techniques considered here. A modification of the Morgan-Pitman test based on the modified maximum likelihood estimate of a correlation is also considered. This last procedure appears to be robust in situations where the Sandvik-Olsson (1982) and McCulloch procedures are robust, and it can have more power than the Sandvik-Olsson. But it too gives unsatisfactory results in certain situations. Thus, in terms of power, McCulloch’s procedure is found to be best, with the advantage of being simple to use. But, it is concluded that, in terms of controlling both Type I and Type II errors, a satisfactory solution does not yet exist.
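For orientation, the classical Morgan-Pitman test rests on the identity Cov(X + Y, X − Y) = Var(X) − Var(Y), so equal variances correspond to zero correlation between the sum and the difference. The sketch below computes the usual statistic t = r·sqrt((n − 2)/(1 − r²)), referred to a t distribution with n − 2 degrees of freedom under bivariate normality. It shows only the unmodified test; the McCulloch, Sandvik-Olsson, and modified-maximum-likelihood variants discussed in the abstract are not implemented, and the function names and data are illustrative assumptions.

```python
import math


def pearson_r(u, v):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def morgan_pitman_t(x, y):
    """Return (t statistic, degrees of freedom) for H0: Var(X) = Var(Y), X and Y paired."""
    sums = [a + b for a, b in zip(x, y)]
    diffs = [a - b for a, b in zip(x, y)]
    r = pearson_r(sums, diffs)
    n = len(x)
    return r * math.sqrt((n - 2) / (1 - r * r)), n - 2


# Hypothetical paired measurements; x is visibly more variable than y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.0, 3.0, 3.0, 3.0]
print(morgan_pitman_t(x, y))
```

A positive statistic points toward Var(X) > Var(Y); swapping the arguments flips the sign.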


2021 ◽  
Vol 20 ◽  
pp. 415-430
Author(s):  
Juthaphorn Sinsomboonthong ◽  
Saichon Sinsomboonthong

This paper presents a weighted maximum likelihood (WML) correlation coefficient for measuring the relationship between two variables when the dataset contains missing values and outliers. The estimator is derived by applying the conditional probability function to handle missing values, and it gives more weight to values near the center, while outliers are assigned only a slight weight. These techniques make the proposed method robust when the preliminary assumptions of the data analysis are not met. To assess the quality of the proposed estimator, six methods (WML, Pearson, median, percentage bend, biweight mid, and composite correlation coefficients) are compared on two criteria, bias and mean squared error, in a simulation study. The simulation results indicate that the WML estimator appears to perform best in the presence of missing values and outliers, especially for very small sample sizes and a large percentage of outliers, regardless of the level of missing data. For very large sample sizes, however, the median correlation coefficient appears to be a good estimator when the strength of the linear relationship between the two variables exceeds approximately 0.4, irrespective of the outlier and missing data levels.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Marjan Faghih ◽  
Zahra Bagheri ◽  
Dejan Stevanovic ◽  
Seyyed Mohhamad Taghi Ayatollahi ◽  
Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) is highly dependent on the asymptotic sampling distributions. However, for rare events data, the maximum likelihood estimation method may be biased and the asymptotic distributions may not be reliable. In this study, the performance of the regular maximum likelihood (ML) estimation is compared with two bias correction methods including weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML) to assess DIF for imbalanced or rare events data. The power and type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and the imbalanced degree (τ). Indeed, as compared with WLR and for severe imbalanced degree (τ = 0.069), there were reductions of approximately 30% and 24% under DIF = 0.4 and 27% and 23% under DIF = 0.8 in the power of the PML and ML, respectively. The present study revealed that the WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare events data.


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Keisuke Ejima ◽  
Andrew Brown ◽  
Daniel Smith ◽  
Ufuk Beyaztas ◽  
David Allison

Objectives: Rigor, reproducibility and transparency (RRT) awareness has expanded over the last decade. Although RRT can be improved from various aspects, we focused on type I error rates and power of commonly used statistical analyses testing mean differences of two groups, using small (n ≤ 5) to moderate sample sizes. Methods: We compared data from five distinct, homozygous, monogenic, murine models of obesity with non-mutant controls of both sexes. Baseline weight (7–11 weeks old) was the outcome. To examine whether type I error rate could be affected by choice of statistical tests, we adjusted the empirical distributions of weights to ensure the null hypothesis (i.e., no mean difference) in two ways: Case 1) center both weight distributions on the same mean weight; Case 2) combine data from control and mutant groups into one distribution. From these cases, 3 to 20 mice were resampled to create a ‘plasmode’ dataset. We performed five common tests (Student's t-test, Welch's t-test, Wilcoxon test, permutation test and bootstrap test) on the plasmodes and computed type I error rates. Power was assessed using plasmodes, where the distribution of the control group was shifted by adding a constant value as in Case 1, but to realize nominal effect sizes. Results: Type I error rates were unreasonably higher than the nominal significance level (type I error rate inflation) for Student's t-test, Welch's t-test and permutation especially when sample size was small for Case 1, whereas inflation was observed only for permutation for Case 2. Deflation was noted for bootstrap with small sample. Increasing sample size mitigated inflation and deflation, except for Wilcoxon in Case 1 because heterogeneity of weight distributions between groups violated assumptions for the purposes of testing mean differences. For power, a departure from the reference value was observed with small samples. Compared with the other tests, bootstrap was underpowered with small samples as a tradeoff for maintaining type I error rates. Conclusions: With small samples (n ≤ 5), bootstrap avoided type I error rate inflation, but often at the cost of lower power. To avoid type I error rate inflation for other tests, sample size should be increased. Wilcoxon should be avoided because of heterogeneity of weight distributions between mutant and control mice. Funding Sources: This study was supported in part by NIH and Japan Society for Promotion of Science (JSPS) KAKENHI grant.
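The two null-enforcing constructions described in the Methods can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not say which common mean is used in Case 1 (the pooled mean is assumed here), nor whether resampling is done with replacement (assumed here), and the weights and function names are hypothetical.

```python
import random
import statistics


def center_on_common_mean(control, mutant):
    """Case 1: shift each group so both share one mean (assumed: the pooled mean)."""
    target = statistics.mean(control + mutant)
    shifted_control = [w - statistics.mean(control) + target for w in control]
    shifted_mutant = [w - statistics.mean(mutant) + target for w in mutant]
    return shifted_control, shifted_mutant


def pool_groups(control, mutant):
    """Case 2: both groups are drawn from one combined distribution."""
    pooled = control + mutant
    return pooled, pooled


def draw_plasmode(source_a, source_b, n, rng):
    """Resample n values per group (assumed: with replacement) to form one plasmode."""
    return rng.choices(source_a, k=n), rng.choices(source_b, k=n)


# Hypothetical weights in grams, not taken from the study.
control = [20.0, 22.0, 24.0, 23.0]
mutant = [40.0, 44.0, 48.0, 52.0]
rng = random.Random(0)
null_a, null_b = center_on_common_mean(control, mutant)
group_a, group_b = draw_plasmode(null_a, null_b, 5, rng)
print(len(group_a), len(group_b))
```

Case 1 preserves each group's own spread and shape, so the groups can still differ in variance under the null, whereas Case 2 makes the two source distributions identical.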


Author(s):  
Natcha Mahapoonyanont ◽  
Suwichaya Putuptim

The power of a test is the probability that the test rejects the null hypothesis (H0) when a specific alternative hypothesis (H1) is true. The emphasis on the probability of a type I error is modelled on medical research, which tries to avoid type I errors, for example in the testing of new medicines. There, the statistical significance level must be set as small as possible, and the probability of a type II error is considered afterwards. In behavioural and social sciences research, the researcher likewise tries to avoid a type I error by fixing the level of statistical significance. It has been argued, however, that the choice of significance level can affect the errors in the findings. Independent variables may have a real influence on the dependent variables, but the researcher may fail to detect it because the significance level was set too low. Therefore, in some situations, more attention should be paid to the type II error and less to the type I error. This may yield more realistic and valid results. The objective of this research was to compare the power of the t-test under different sample sizes (n = 30, 60, 90), statistical significance levels (.001, .01, .05), and types of data (real data, transformed data, and data simulated with the Monte Carlo technique). The findings provide information that is useful for researchers conducting further research with the t-test and for improving the accuracy of research findings.
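To make the trade-off between significance level, sample size, and power concrete, the sketch below tabulates approximate power for the sample sizes and significance levels listed above. It relies on two assumptions that are not from the study: a normal approximation to a two-sided one-sample test (rather than the exact noncentral t distribution), and an illustrative standardized effect size of d = 0.5.

```python
from statistics import NormalDist


def approx_power(d, n, alpha):
    """Approximate power of a two-sided one-sample test (normal approximation).

    d is the standardized effect size, n the sample size, alpha the significance level.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * n ** 0.5
    # Probability that the test statistic falls in either rejection region under H1.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# d = 0.5 is an illustrative assumption; n and alpha follow the design in the abstract.
for alpha in (0.001, 0.01, 0.05):
    for n in (30, 60, 90):
        print(f"alpha={alpha}, n={n}, power={approx_power(0.5, n, alpha):.3f}")
```

With d and n held fixed, lowering the significance level lowers the power, which is the type I versus type II tension the abstract describes.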


Author(s):  
Janet L. Peacock ◽  
Sally M. Kerry

Chapter 8 covers analysing matched or paired data, and includes the paired t-test, non-normal data, matched case–control data, cohort data, and further reading.


Methodology ◽  
2011 ◽  
Vol 7 (1) ◽  
pp. 25-38 ◽  
Author(s):  
Wolfgang T. Wiedermann ◽  
Rainer W. Alexandrowicz

For the dependent-samples problem it is known that nonparametric tests such as the Wilcoxon signed-ranks test should be used instead of the paired t-test if the normality assumption is violated. The present study extends a family of tests for correlated samples by incorporating the concept of expected normal scores. This fusion leads to a promising significance test for paired non-normal samples, especially when distributions are highly skewed. In a simulation study we show that this modified normal scores test is robust for a wide range of non-normal distributions. Also, in most situations the test proved more powerful than traditional tests such as the paired t-test, the Wilcoxon signed-ranks test, or the Fraser normal scores test. For skewed distributions the test is also more powerful than applying the modified test to original measures or on ranks.


Author(s):  
Zhiyi Zhang ◽  
Lukun Zheng

A nonparametric estimator of mutual information is proposed and is shown to have asymptotic normality and efficiency, and a bias decaying exponentially in sample size. The asymptotic normality and the rapidly decaying bias together offer a viable inferential tool for assessing mutual information between two random elements on finite alphabets where the maximum likelihood estimator of mutual information greatly inflates the probability of type I error. The proposed estimator is illustrated by three examples in which the association between a pair of genes is assessed based on their expression levels. Several results of a simulation study are also provided.
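For reference, the maximum likelihood (plug-in) estimator that the abstract uses as its point of comparison can be sketched as below: empirical joint and marginal frequencies are substituted into the definition of mutual information. This is the baseline only, not the estimator proposed in the paper, and the function name and toy data are illustrative assumptions.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in (maximum likelihood) estimate of mutual information, in nats."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (left[x] * right[y]).
        mi += (count / n) * math.log(count * n / (left[x] * right[y]))
    return mi


# Toy samples on a binary alphabet.
dependent = [(0, 0), (1, 1)] * 2
balanced = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(plugin_mutual_information(dependent), plugin_mutual_information(balanced))
```

The plug-in estimate is never negative and equals zero only when the empirical joint distribution factorizes exactly, so in small samples it tends to be positive even for independent variables. That upward bias is the kind of problem the abstract refers to.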


Vascular ◽  
2020 ◽  
pp. 170853812095883
Author(s):  
Arindam Chaudhuri ◽  
Ayman Badawy

Objectives: Aortic endografts used for endovascular aneurysm repair (EVAR) are based on varying skeletal platforms such as stainless steel or nitinol stents, using radial force applied to seal at the aneurysm neck, and varying proximal fixation methods, applying either suprarenal or infrarenal fixation. This study assesses whether varying skeleton/fixation platforms affect neck-related outcomes after primary endostapling with Heli-FX EndoAnchors at EVAR. Methods: Retrospective analysis of a prospective database of infrarenal EVAR undertaken at a single centre. Chimney-EVAR and secondary cases were excluded. Primary outcomes analysed included neck diameter evolution from pre-EVAR to latest imaging follow-up, including a comparison of stent platforms to see if there was any outcome difference between those using stainless steel or nitinol, as well as freedom from type I endoleakage and migration. Secondary outcomes assessed included average number of EndoAnchors, and sac size patterns before and after EVAR. Results: A total of 101 patients underwent endostapled infrarenal EVAR between September 2013 and March 2020. After exclusion of ineligible patients, 84 patients (76 male, 8 female, age 73.7 ± 7.8 years) were available for analysis. 57/27 endografts used suprarenal/infrarenal fixation, whilst 16/68 devices were based on stainless steel/nitinol platforms, respectively. Mean oversizing was higher for stainless steel/suprarenal fixation endografts (p = 0.02). A total of 582 EndoAnchors were deployed, averaging 7 ± 2 per patient. Median neck diameter was 25 mm (IQR 22–31) with 22 necks having non-parallel morphology (conical, tapered or bubble). Median follow-up period was 28.5 (IQR 12–43) months. Neck evolution studies suggested aortic neck dilatation of 5 ± 4 mm (p < 0.001, paired t-test), independent of platforms employed (p = NS, ANOVA). There was no endograft migration; one immediate post-EVAR endoleak settled by eight weeks. There was a mean 5.7 ± 8.2 mm sac size reduction (p < 0.001, paired t-test). Conclusion: Aortic neck dilatation occurs after EVAR with primary endostapling, but the process may be independent of stainless steel/nitinol platforms, possibly due to the attenuating effect of EndoAnchors. Adjunct aneurysm neck fixation by primary endostapling prevents migration regardless of whether suprarenal/infrarenal fixation is the primary fixative method. Device platform choice therefore may be left to the operator's discretion if primary endostapling is applied at EVAR. Freedom from complications such as migration and endoleakage in the intermediate term suggests a higher level of ‘tolerance’ to aortic neck dilatation with primary endostapling. We would therefore suggest routine usage of EndoAnchors at EVAR when not otherwise contraindicated.

