Accuracy, Precision, And Agreement Statistical Tests For Bland-Altman Method

Author(s):  
Paulo S.P. Silveira ◽  
Joaquim E. Vieira ◽  
Alexandre A. Ferraro ◽  
Jose O. Siqueira

Abstract Background: The Bland and Altman plot method is a widely cited graphical approach to assess the equivalence of quantitative measurement techniques. Perhaps because of its graphical output, it has been widely applied, but it is often misinterpreted for lack of inferential statistical support. To compare data sets obtained from two measurement techniques, researchers may apply Pearson’s correlation, ordinary least-squares linear regression, or the Bland-Altman plot method, and fail to locate the weakness of each measurement technique. We aim to develop and distribute a statistical method in R that adds robust and suitable inferential statistics of equivalence. Methods: Three nested tests based on structural regressions are proposed to assess the equivalence of structural means (accuracy), equivalence of structural variances (precision), and concordance with the structural bisector line (agreement in measurements of data pairs obtained from the same subject) to reach statistical support for the equivalence of measurement techniques. Graphical outputs illustrating these three tests were added to follow Bland and Altman’s principles of easy communication. Results: Statistical p-values and a robust bootstrapping approach, with corresponding graphs, provide objective, robust measures of equivalence. Five pairs of data sets were analyzed in order to critique previously published articles that applied Bland and Altman’s principles, thus showing the suitability of the present statistical approach. Strict equivalence was demonstrated in one case, three cases showed partial equivalence, and one case showed poor equivalence. A package containing open code and data is available with installation instructions on SourceForge for free distribution. Conclusions: Statistical p-values and the robust approach assess the equivalence of accuracy, precision, and agreement for measurement techniques. Decomposition into three tests helps locate any disagreement as a means to fix a new technique.
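For orientation, the classic Bland-Altman quantities that the abstract builds on (the mean difference, or bias, and the 95% limits of agreement) can be sketched as follows. This is only the conventional descriptive calculation, not the authors' three nested structural-regression tests or their R package; the function name `bland_altman_limits` and the sample values are hypothetical.

```python
import statistics


def bland_altman_limits(method_a, method_b, z=1.96):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences; the limits of agreement
    are bias +/- z times the sample standard deviation of the differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must measure the same subjects")
    if len(method_a) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical paired readings from two measurement techniques.
bias, lower, upper = bland_altman_limits([1, 2, 3, 4], [0, 2, 2, 4])
print(f"bias={bias:.3f}, limits of agreement=({lower:.3f}, {upper:.3f})")
```

These limits are descriptive only, which is the gap the abstract says the proposed inferential tests are meant to fill.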

2015 ◽  
Author(s):  
William Poole ◽  
David L. Gibbs ◽  
Ilya Shmulevich ◽  
Brady Bernard ◽  
Theo Knijnenburg

Combining P-values from multiple statistical tests is a common exercise in bioinformatics. However, this procedure is non-trivial for dependent P-values. Here we discuss an empirical adaptation of Brown's Method (an extension of Fisher's Method) for combining dependent P-values which is appropriate for the correlated data sets found in high-throughput biological experiments. We show that Fisher's Method is biased when used on dependent sets of P-values with both simulated data and gene expression data from The Cancer Genome Atlas (TCGA). When applied on the same data sets, the Empirical Brown's Method provides a better null distribution and a more conservative result. The Empirical Brown's Method is available in Python, R, and MATLAB and can be obtained from https://github.com/IlyaLab/CombiningDependentPvaluesUsingEBM.
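As a point of reference, the baseline Fisher's Method mentioned above can be sketched in a few lines. This is the standard independence-assuming version, which the abstract reports to be biased for dependent P-values; it is not the Empirical Brown's Method, for which the authors' repository should be consulted. The helper names are hypothetical, and the chi-square tail uses the closed form that holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2k: sf(x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent P-values; returns (statistic, combined_p)."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    # Under the null, -2 * sum(ln p_i) follows chi-square with 2k df.
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))


stat, combined = fisher_combine([0.1, 0.1])
print(f"statistic={stat:.4f}, combined p={combined:.4f}")
```

Because the chi-square reference distribution relies on independence, positively correlated P-values tend to make the combined value too small, which is the bias the abstract describes.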


2020 ◽  
Vol 132 (6) ◽  
pp. 1970-1976
Author(s):  
Ashwin G. Ramayya ◽  
H. Isaac Chen ◽  
Paul J. Marcotte ◽  
Steven Brem ◽  
Eric L. Zager ◽  
...  

OBJECTIVE: Although it is known that intersurgeon variability in offering elective surgery can have major consequences for patient morbidity and healthcare spending, data addressing variability within neurosurgery are scarce. The authors performed a prospective peer review study of randomly selected neurosurgery cases in order to assess the extent of consensus regarding the decision to offer elective surgery among attending neurosurgeons across one large academic institution.
METHODS: All consecutive patients who had undergone standard inpatient surgical interventions of 1 of 4 types (craniotomy for tumor [CFT], nonacute redo CFT, first-time spine surgery with/without instrumentation, and nonacute redo spine surgery with/without instrumentation) during the period 2015–2017 were retrospectively enrolled (n = 9156 patient surgeries, n = 80 randomly selected individual cases, n = 20 index cases of each type randomly selected for review). The selected cases were scored by attending neurosurgeons using a need for surgery (NFS) score based on clinical data (patient demographics, preoperative notes, radiology reports, and operative notes; n = 616 independent case reviews). Attending neurosurgeon reviewers were blinded as to performing provider and surgical outcome. Aggregate NFS scores across various categories were measured. The authors employed a repeated-measures mixed ANOVA model with autoregressive variance structure to compute omnibus statistical tests across the various surgery types. Interrater reliability (IRR) was measured using Cohen’s kappa based on binary NFS scores.
RESULTS: Overall, the authors found that most of the neurosurgical procedures studied were rated as “indicated” by blinded attending neurosurgeons (mean NFS = 88.3, all p values < 0.001) with greater agreement among neurosurgeon raters than expected by chance (IRR = 81.78%, p = 0.016). Redo surgery had lower NFS scores and IRR scores than first-time surgery, both for craniotomy and spine surgery (ANOVA, all p values < 0.01). Spine surgeries with fusion had lower NFS scores than spine surgeries without fusion procedures (p < 0.01).
CONCLUSIONS: There was general agreement among neurosurgeons in terms of indication for surgery; however, revision surgery of all types and spine surgery with fusion procedures had the lowest amount of decision consensus. These results should guide efforts aimed at reducing unnecessary variability in surgical practice with the goal of effective allocation of healthcare resources to advance the value paradigm in neurosurgery.


2008 ◽  
Vol 44-46 ◽  
pp. 871-878 ◽  
Author(s):  
Chu Yang Luo ◽  
Jun Jiang Xiong ◽  
R.A. Shenoi

This paper outlines a new technique to address the paucity of data in determining fatigue life and performance based on reliability concepts. Two new randomized models are presented for estimating the safe life and p-S-N curve, using the standard procedure for statistical analysis and dealing with small samples of incomplete data. The confidence level formulations for the safe life and p-S-N curve are also given. The concepts are then applied to determine the safe life and p-S-N curve. Two sets of fatigue tests for the safe life and p-S-N curve are conducted to validate the presented method, demonstrating the practical use of the proposed technique.


Author(s):  
Ned Augenblick ◽  
Matthew Rabin

Abstract When a Bayesian learns new information and changes her beliefs, she must on average become concomitantly more certain about the state of the world. Consequently, it is rare for a Bayesian to frequently shift beliefs substantially while remaining relatively uncertain, or, conversely, become very confident with relatively little belief movement. We formalize this intuition by developing specific measures of movement and uncertainty reduction given a Bayesian’s changing beliefs over time, showing that these measures are equal in expectation and creating consequent statistical tests for Bayesianess. We then show connections between these two core concepts and four common psychological biases, suggesting that the test might be particularly good at detecting these biases. We provide support for this conclusion by simulating the performance of our test and other martingale tests. Finally, we apply our test to data sets of individual, algorithmic, and market beliefs.
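The claim that movement and uncertainty reduction are equal in expectation can be checked numerically in a toy setting. The abstract does not give the formulas, so the sketch below assumes a binary state, measures movement as the sum of squared belief changes, and measures uncertainty as p(1 - p); these definitions, the function names, and the signal probabilities are all assumptions made for illustration.

```python
from itertools import product


def posterior(prior, signal, p_sig_h, p_sig_l):
    """Bayes update of P(state = H) after one binary signal."""
    like_h = p_sig_h if signal else 1 - p_sig_h
    like_l = p_sig_l if signal else 1 - p_sig_l
    return prior * like_h / (prior * like_h + (1 - prior) * like_l)


def expected_movement_and_reduction(prior, p_sig_h, p_sig_l, n_signals):
    """Exact expectations over all signal sequences (no simulation).

    p_sig_h and p_sig_l are P(signal = 1) in states H and L and must lie
    strictly between 0 and 1.
    """
    exp_movement = 0.0
    exp_reduction = 0.0
    for signals in product((0, 1), repeat=n_signals):
        belief, prob, movement = prior, 1.0, 0.0
        for s in signals:
            # Probability of this signal under the current belief.
            pred_one = belief * p_sig_h + (1 - belief) * p_sig_l
            prob *= pred_one if s else 1 - pred_one
            new_belief = posterior(belief, s, p_sig_h, p_sig_l)
            movement += (new_belief - belief) ** 2
            belief = new_belief
        reduction = prior * (1 - prior) - belief * (1 - belief)
        exp_movement += prob * movement
        exp_reduction += prob * reduction
    return exp_movement, exp_reduction


m, r = expected_movement_and_reduction(0.3, 0.8, 0.4, 5)
print(m, r)  # the two expectations agree up to floating-point error
```

Under these assumed definitions the equality follows from beliefs being a martingale: the expected squared change in each step equals the expected drop in p(1 - p).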


2016 ◽  
Vol 16 (24) ◽  
pp. 15545-15559 ◽  
Author(s):  
Ernesto Reyes-Villegas ◽  
David C. Green ◽  
Max Priestman ◽  
Francesco Canonaco ◽  
Hugh Coe ◽  
...  

Abstract. The multilinear engine (ME-2) factorization tool is being widely used following the recent development of the Source Finder (SoFi) interface at the Paul Scherrer Institute. However, the success of this tool, when using the a-value approach, largely depends on the inputs (i.e. target profiles) applied as well as the experience of the user. A strategy to explore the solution space is proposed, in which the solution that best describes the organic aerosol (OA) sources is determined according to the systematic application of predefined statistical tests. This includes trilinear regression, which proves to be a useful tool for comparing different ME-2 solutions. Aerosol Chemical Speciation Monitor (ACSM) measurements were carried out at the urban background site of North Kensington, London, from March to December 2013, where for the first time the behaviour of OA sources and their possible environmental implications were studied using an ACSM. Five OA sources were identified: biomass burning OA (BBOA), hydrocarbon-like OA (HOA), cooking OA (COA), semivolatile oxygenated OA (SVOOA) and low-volatility oxygenated OA (LVOOA). ME-2 analysis of the seasonal data sets (spring, summer and autumn) showed a higher variability in the OA sources that was not detected in the combined March–December data set; this variability was explored with the triangle plots f44 : f43 and f44 : f60, in which a high variation of SVOOA relative to LVOOA was observed in the f44 : f43 analysis. Hence, it was possible to conclude that, when performing source apportionment on long-term measurements, important information may be lost, and that this analysis should be done over short periods of time, such as seasonally. Further analysis of the atmospheric implications of these OA sources was carried out, identifying evidence of the possible contribution of heavy-duty diesel vehicles to air pollution during weekdays compared to those fuelled by petrol.


2021 ◽  
Author(s):  
Luz Karime Atencia ◽  
María Gómez del Campo ◽  
Gema Camacho ◽  
Antonio Hueso ◽  
Ana M. Tarquis

Olive is the main fruit tree in Spain, representing 50% of the fruit tree surface, around 2,751,255 ha. Due to its adaptation to arid conditions and the scarcity of water, a regulated deficit irrigation (RDI) strategy is normally applied in traditional olive orchards and, recently, in high-density orchards. The application of RDI is one of the most important techniques used in olive hedgerow orchards. An investigation of the detection of water stress in nonhomogeneous olive tree canopies such as orchards using remote sensing imagery is presented.

In the 2018 and 2019 seasons, data on stem water potential were collected to characterize tree water status in a hedgerow olive orchard cv. Arbequina located in Chozas de Canales (Toledo). Close to the measurement dates, remote sensing images with spectral and thermal sensors were acquired. Several vegetation indices (VIs) using one or both types of sensor were estimated from the selected areas that correspond to the olive crown, avoiding the canopy shadows.

Nonparametric statistical tests between the VIs and the stem water potential were carried out to reveal the most significant correlation. The results will be discussed in the context of robustness and sensitivity between both data sets at different olive phenological stages.

ACKNOWLEDGEMENTS

Financial support provided by the Spanish Research Agency co-financed with European Union FEDER funds (AEI/FEDER, UE, AGL2016-77282-C3-2R project) and Comunidad de Madrid through calls for grants for the completion of Industrial Doctorates, is greatly appreciated.

