Nonparametric Limits of Agreement for Small to Moderate Sample Sizes: A Simulation Study

Stats, 2020, Vol. 3 (3), pp. 343–355
Author(s):  
Maria E. Frey ◽  
Hans C. Petersen ◽  
Oke Gerke

The assessment of agreement in method comparison and observer variability analysis of quantitative measurements is usually done by the Bland–Altman Limits of Agreement, where the paired differences are implicitly assumed to follow a normal distribution. Whenever this assumption does not hold, the 2.5% and 97.5% percentiles are obtained by quantile estimation. In the literature, empirical quantiles have been used for this purpose. In this simulation study, we applied sample, subsampling, and kernel quantile estimators, as well as other methods for quantile estimation, to sample sizes between 30 and 150 and different distributions of the paired differences. The performance of 15 estimators in generating prediction intervals was measured by their respective coverage probability for one newly generated observation. Our results indicated that sample quantile estimators based on one or two order statistics outperformed all of the other estimators and can be used for deriving nonparametric Limits of Agreement. For sample sizes exceeding 80 observations, more advanced quantile estimators, such as the Harrell–Davis estimator and estimators of the Sfakianakis–Verginis type, which use all of the observed differences, performed equally well and may be considered intuitively more appealing than simple sample quantile estimators that are based on only two observations per quantile.
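The nonparametric limits described above reduce to estimating the 2.5% and 97.5% quantiles of the paired differences. A minimal illustrative sketch (not the authors' simulation code) using NumPy's default sample quantile, which interpolates between the one or two order statistics nearest each target quantile:

```python
import numpy as np

def nonparametric_loa(differences, lower=0.025, upper=0.975):
    """Nonparametric limits of agreement: the 2.5% and 97.5% sample
    quantiles of the paired differences. np.quantile's default method
    is a weighted average of the two order statistics bracketing the
    target quantile, i.e. a simple sample quantile (SQ) estimator."""
    d = np.asarray(differences, dtype=float)
    return float(np.quantile(d, lower)), float(np.quantile(d, upper))

# Hypothetical skewed paired differences, where normal-theory limits
# (mean ± 1.96 SD) would be misleading
rng = np.random.default_rng(0)
d = rng.exponential(scale=1.0, size=100) - 1.0  # right-skewed, mean ~0
lo, hi = nonparametric_loa(d)
```

With skewed differences the two limits are asymmetric around zero, unlike the normal-theory limits, which is exactly the situation the nonparametric approach is meant for.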

Author(s):  
Oke Gerke

Bland–Altman limits of agreement and the underlying plot are a well-established means in method comparison studies on quantitative outcomes. Normally distributed paired differences, a constant bias, and variance homogeneity across the measurement range are implicit assumptions to this end. Whenever these assumptions are not fully met and cannot be remedied by an appropriate transformation of the data or the application of a regression approach, the 2.5% and 97.5% quantiles of the differences have to be estimated nonparametrically. In earlier work, a simple sample quantile (SQ) estimator (a weighted average of the observations closest to the target quantile), the Harrell–Davis estimator (HD), and estimators of the Sfakianakis–Verginis type (SV) outperformed 10 other quantile estimators in terms of mean coverage for the next observation in a simulation study based on sample sizes between 30 and 150. Here, we investigate the variability of the coverage probability of these three and another three promising nonparametric quantile estimators with n = 50(50)200, 250(250)1000. The SQ estimator outperformed the HD and SV estimators for n = 50 and was slightly better for n = 100, whereas the SQ, HD, and SV estimators performed equally well for n ≥ 150. The similarity of the boxplots for the SQ estimator across both distributions and sample sizes was striking.
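The Harrell–Davis estimator compared above differs from the SQ estimator in that it weights all order statistics. A minimal sketch of the standard HD formula (illustrative only, using SciPy's Beta CDF; not the study's simulation code):

```python
import numpy as np
from scipy.stats import beta

def harrell_davis(x, q):
    """Harrell–Davis estimator of the q-th quantile: a weighted
    average of ALL order statistics, with weights given by the
    increments of a Beta((n+1)q, (n+1)(1-q)) CDF over [(i-1)/n, i/n]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    a, b = (n + 1) * q, (n + 1) * (1 - q)
    i = np.arange(1, n + 1)
    w = beta.cdf(i / n, a, b) - beta.cdf((i - 1) / n, a, b)
    return float(np.sum(w * x))

# Toy data: the order statistics 1, 2, ..., 100
x = np.arange(1.0, 101.0)
hd_median = harrell_davis(x, 0.5)   # by weight symmetry, close to 50.5
hd_upper = harrell_davis(x, 0.975)  # smooth estimate near the 97.5% point
```

Because every observation contributes, the estimate varies smoothly with q, which is part of its intuitive appeal relative to an SQ estimator based on at most two order statistics.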


2021, Vol. 21 (1)
Author(s):  
Ricardo A. Gonzales ◽  
Felicia Seemann ◽  
Jérôme Lamy ◽  
Per M. Arvidsson ◽  
Einar Heiberg ◽  
...  

Abstract
Background: Segmentation of the left atrium (LA) is required to evaluate atrial size and function, which are important imaging biomarkers for a wide range of cardiovascular conditions, such as atrial fibrillation, stroke, and diastolic dysfunction. LA segmentations are currently being performed manually, which is time-consuming and observer-dependent.
Methods: This study presents an automated image processing algorithm for time-resolved LA segmentation in cardiac magnetic resonance imaging (MRI) long-axis cine images of the 2-chamber (2ch) and 4-chamber (4ch) views using active contours. The proposed algorithm combines mitral valve tracking, automated threshold calculation, edge detection on a radially resampled image, edge tracking based on Dijkstra’s algorithm, and post-processing involving smoothing and interpolation. The algorithm was evaluated in 37 patients diagnosed mainly with paroxysmal atrial fibrillation. Segmentation accuracy was assessed using the Dice similarity coefficient (DSC) and Hausdorff distance (HD), with manual segmentations in all time frames as the reference standard. For inter-observer variability analysis, a second observer performed manual segmentations at end-diastole and end-systole on all subjects.
Results: The proposed automated method achieved high performance in segmenting the LA in long-axis cine sequences, with a DSC of 0.96 for 2ch and 0.95 for 4ch, and an HD of 5.5 mm for 2ch and 6.4 mm for 4ch. The manual inter-observer variability analysis had an average DSC of 0.95 and an average HD of 4.9 mm.
Conclusion: The proposed automated method achieved performance on par with human experts analyzing MRI images for evaluation of atrial size and function.
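The Dice similarity coefficient used as the accuracy measure above has a one-line definition, 2|A ∩ B| / (|A| + |B|). A minimal sketch on hypothetical binary masks (not the study's data or pipeline):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary segmentation
    masks: DSC = 2|A ∩ B| / (|A| + |B|). 1.0 means perfect overlap,
    0.0 means no overlap; empty-vs-empty is defined here as 1.0."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = int(a.sum()) + int(b.sum())
    if denom == 0:
        return 1.0
    return 2.0 * int(np.logical_and(a, b).sum()) / denom

# Toy example: each mask labels 2 pixels, 1 of which overlaps
auto_mask   = np.array([1, 1, 0, 0])
manual_mask = np.array([1, 0, 1, 0])
```

The Hausdorff distance reported alongside the DSC measures the worst-case contour deviation instead of the area overlap, so the two metrics are complementary.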


2011, Vol. 2011, pp. 1–11
Author(s):  
Erol Egrioglu ◽  
Cagdas Hakan Aladag ◽  
Cem Kadilar

Seasonal Autoregressive Fractionally Integrated Moving Average (SARFIMA) models are used in the analysis of seasonal time series with long memory. Two estimation methods, the conditional sum of squares (CSS) method and the two-staged method introduced by Hosking (1984), have been proposed for the parameters of SARFIMA models. However, no simulation study of these methods has been conducted in the literature, so it is not known how they behave under different parameter settings and sample sizes. The aim of this study is to examine the behavior of these methods in a simulation study. Based on its results, the advantages and disadvantages of both methods under different parameter settings and sample sizes are discussed by comparing the root mean square error (RMSE) obtained with the CSS and two-staged methods. The comparison shows that the CSS method produces better results than the two-staged method.
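The RMSE criterion used for the comparison is computed across simulation replicates against the known true parameter value. An illustrative sketch (the replicate estimates below are made up, not the study's results):

```python
import numpy as np

def rmse(estimates, true_value):
    """Root mean square error of parameter estimates across
    simulation replicates: sqrt(mean((estimate - true)^2)).
    Captures both bias and variance of an estimator."""
    e = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((e - true_value) ** 2)))

# Hypothetical estimates of a fractional-differencing parameter d = 0.40
# from two estimation methods over four simulation replicates
method_a = [0.42, 0.38, 0.45, 0.40]  # tightly clustered around the truth
method_b = [0.48, 0.30, 0.52, 0.36]  # more dispersed
true_d = 0.40
```

Comparing `rmse(method_a, true_d)` against `rmse(method_b, true_d)` is exactly the kind of head-to-head summary the study reports for CSS versus the two-staged method.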


2021
Author(s):  
Monsurul Hoq ◽  
Susan Donath ◽  
Paul Monagle ◽  
John Carlin

Abstract
Background: Reference intervals (RIs), which are used as an assessment tool in laboratory medicine, change with age for most biomarkers in children. Addressing this, RIs that vary continuously with age have been developed using a range of curve-fitting approaches. The choice of statistical method may be important, as different methods may produce substantially different RIs. Hence, we developed a simulation study to investigate the performance of statistical methods for estimating continuous paediatric RIs.
Methods: We compared four methods for estimating age-varying RIs: Cole’s LMS, the Generalised Additive Model for Location Scale and Shape (GAMLSS), Royston’s method based on fractional polynomials and exponential transformation, and a new method applying quantile regression using power variables in age selected by fractional polynomial regression for the mean. Data were generated using hypothetical true curves based on five biomarkers with varying complexity of association with age (linear or nonlinear, constant or nonconstant variation across age) and four sample sizes (100, 200, 400, and 1000). Root mean square error (RMSE) was used as the primary performance measure for comparison.
Results: Regression-based parametric methods performed better in most scenarios. Royston’s method and the new method performed consistently well in all scenarios for sample sizes of at least 400, while the new method had the smallest average RMSE in scenarios with nonconstant variation across age.
Conclusions: We recommend methods based on flexible parametric models for estimating continuous paediatric RIs, irrespective of the complexity of the association between biomarkers and age, for sample sizes of at least 400.
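The quantile-regression step at the heart of the new method fits a conditional quantile as a function of (powers of) age by minimising the check (pinball) loss. A much-simplified sketch of that step alone, linear in a single power of age, with hypothetical data and a generic numerical optimiser (not the authors' implementation or their fractional-polynomial selection step):

```python
import numpy as np
from scipy.optimize import minimize

def pinball_loss(params, x, y, q):
    """Check (pinball) loss for the q-th quantile under the linear
    model y ~ b0 + b1 * x; minimised by the conditional q-quantile."""
    resid = y - (params[0] + params[1] * x)
    return np.mean(np.where(resid >= 0, q * resid, (q - 1) * resid))

def fit_quantile_line(x, y, q):
    """Fit the q-th conditional quantile line by minimising the
    check loss numerically (Nelder-Mead, derivative-free)."""
    res = minimize(pinball_loss, x0=np.ones(2), args=(x, y, q),
                   method="Nelder-Mead", options={"maxiter": 2000})
    return res.x

# Hypothetical biomarker rising linearly with age, symmetric noise
rng = np.random.default_rng(1)
age = rng.uniform(1.0, 15.0, 500)
y = 2.0 + 0.5 * age + rng.normal(0.0, 1.0, 500)
b0, b1 = fit_quantile_line(age, y, 0.5)  # median line
```

Fitting q = 0.025 and q = 0.975 in the same way would give the lower and upper bounds of a continuous age-varying RI; the full method additionally chooses the power transformation of age via fractional polynomial regression.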


2016, Vol. 27 (6), pp. 1650–1660
Author(s):  
Patrick Taffé

Bland and Altman’s limits of agreement have traditionally been used in clinical research to assess the agreement between different methods of measurement for quantitative variables. However, when the variances of the measurement errors of the two methods are different, Bland and Altman’s plot may be misleading: there are settings where the regression line shows an upward or downward trend although there is no bias, or shows a zero slope although there is a bias. The goal of this paper is therefore to illustrate clearly why and when a bias arises, particularly when heteroscedastic measurement errors are expected, and to propose two new plots, the “bias plot” and the “precision plot,” to help the investigator visually and clinically appraise the performance of the new method. These plots do not have the above-mentioned defect and are still easy to interpret, in the spirit of Bland and Altman’s limits of agreement. To achieve this goal, we rely on the modeling framework recently developed by Nawarathna and Choudhary, which allows the measurement errors to be heteroscedastic and to depend on the underlying latent trait. Their estimation procedure, however, is complex and rather daunting to implement. We have therefore developed a new estimation procedure, which is much simpler to implement and yet performs very well, as illustrated by our simulations. The methodology requires several measurements with the reference standard and possibly only one with the new method for each individual.


Author(s):  
Patrick Taffé ◽  
Mingkai Peng ◽  
Vicki Stagg ◽  
Tyler Williamson

Bland and Altman's (1986, Lancet 327: 307–310) limits of agreement have been used in many clinical research settings to assess agreement between two methods of measuring a quantitative characteristic. However, when the variances of the measurement errors of the two methods differ, limits of agreement can be misleading. biasplot implements a new statistical methodology that Taffé (Forthcoming, Statistical Methods in Medical Research) recently developed to circumvent this issue and assess bias and precision of the two measurement methods (one is the reference standard, and the other is the new measurement method to be evaluated). biasplot produces three new plots introduced by Taffé: the “bias plot”, “precision plot”, and “comparison plot”. These help the investigator visually evaluate the performance of the new measurement method. In this article, we introduce the user-written command biasplot and present worked examples using simulated data included with the package. Note that the Taffé method assumes there are several measurements from the reference standard and possibly as few as one measurement from the new method for each individual.


2016, Vol. 27 (5), pp. 1559–1574
Author(s):  
Andrew Carkeet ◽  
Yee Teng Goh

Bland and Altman described approximate methods in 1986 and 1999 for calculating confidence limits for their 95% limits of agreement, approximations which assume large subject numbers. In this paper, these approximations are compared with exact confidence intervals calculated using two-sided tolerance intervals for a normal distribution. The approximations are compared in terms of the tolerance factors themselves, but also in terms of the exact confidence limits and the exact limits-of-agreement coverage corresponding to the approximate confidence interval methods. Using similar methods, the 50th percentile of the tolerance interval is compared with the k values of 1.96 and 2, which Bland and Altman used to define the limits of agreement (i.e., d̄ ± 1.96·Sd and d̄ ± 2·Sd, where d̄ is the mean difference and Sd the standard deviation of the differences). For the outer confidence intervals of the limits of agreement, Bland and Altman’s approximations are too permissive for sample sizes <40 (1999 approximation) and <76 (1986 approximation). For the inner confidence limits the approximations are poorer, being permissive for sample sizes <490 (1986 approximation) and for all practical sample sizes (1999 approximation). Exact confidence intervals for 95% limits of agreement, based on two-sided tolerance factors, can be calculated easily from tables and should be used in preference to the approximate methods, especially for small sample sizes.
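To give a feel for the tolerance factors involved: the exact two-sided factors are tabulated, but Howe's closed-form approximation tracks them closely and shows how much larger than 1.96 the factor must be at small n. This is an illustrative sketch of Howe's approximation, not the paper's exact tabulated method:

```python
import numpy as np
from scipy.stats import norm, chi2

def tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Howe's approximation to the two-sided normal tolerance factor k,
    so that xbar ± k*s covers `coverage` of the population with the
    stated confidence. An approximation that is close to, but not,
    the exact tabulated factor."""
    z = norm.ppf((1 + coverage) / 2)      # 1.96 for 95% coverage
    nu = n - 1                            # degrees of freedom
    chi = chi2.ppf(1 - confidence, nu)    # lower chi-square quantile
    return float(z * np.sqrt(nu * (1 + 1 / n) / chi))
```

For n = 30 this gives k ≈ 2.55 (the exact tabulated value is 2.549), well above the naive 1.96, and the factor decreases toward 1.96 as n grows, which is why the large-sample approximations fail for small studies.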


2021
Author(s):  
Benjamin Kearns ◽  
Matt D. Stevenson ◽  
Kostas Triantafyllopoulos ◽  
Andrea Manca

Abstract
Background: Estimates of future survival can be a key evidence source when deciding if a medical treatment should be funded. Current practice is to use standard parametric models for generating extrapolations. Several emerging, more flexible survival models are available which can provide improved within-sample fit. This study aimed to assess whether these emerging practice models also provide improved extrapolations.
Methods: Both a simulation study and a case study were used to assess the goodness of fit of five classes of survival model: current practice models, Royston–Parmar models (RPMs), fractional polynomials (FPs), generalised additive models (GAMs), and dynamic survival models (DSMs). The simulation study used a mixture-Weibull model as the data-generating mechanism, with varying lengths of follow-up and sample sizes. The case study was long-term follow-up of a prostate cancer trial. For both studies, models were fit to an early data-cut, and extrapolations were compared to the known long-term follow-up.
Results: The emerging practice models provided better within-sample fit than current practice models. For data-rich simulation scenarios (large sample sizes or long follow-up), the GAMs and DSMs provided improved extrapolations compared with current practice. Extrapolations from FPs were always very poor, whilst those from RPMs were similar to current practice. With short follow-up, all the models struggled to provide useful extrapolations. In the case study, all the models provided very similar estimates, but extrapolations were all poor, as no model was able to capture a turning point during the extrapolated period.
Conclusions: Good within-sample fit does not guarantee good extrapolation performance. Both GAMs and DSMs may be considered as candidate extrapolation models in addition to current practice. Further research is required into when these flexible models are most useful and into the role of external evidence in improving extrapolations.
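The mixture-Weibull data-generating mechanism used in the simulation study is straightforward to sketch. The parameter values and censoring time below are illustrative, not those used by the authors:

```python
import numpy as np

def sample_mixture_weibull(n, p, shape1, scale1, shape2, scale2, rng):
    """Survival times from a two-component Weibull mixture: with
    probability p a draw from Weibull(shape1, scale1), otherwise
    from Weibull(shape2, scale2). Mixtures like this can produce
    turning points in the hazard that single-component models miss."""
    from_first = rng.random(n) < p
    t1 = scale1 * rng.weibull(shape1, n)  # numpy's standard Weibull, rescaled
    t2 = scale2 * rng.weibull(shape2, n)
    return np.where(from_first, t1, t2)

rng = np.random.default_rng(42)
times = sample_mixture_weibull(300, p=0.6, shape1=1.2, scale1=2.0,
                               shape2=0.8, scale2=10.0, rng=rng)

# Mimic an early data-cut: administrative censoring at 5 years
follow_up = 5.0
observed = np.minimum(times, follow_up)
event = times <= follow_up  # True = death observed, False = censored
```

Fitting models to `(observed, event)` and comparing their extrapolations beyond `follow_up` against the uncensored `times` mirrors the study's design: within-sample fit is judged on the censored data, extrapolation performance on the known long-term truth.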

