Reporting Standards for a Bland–Altman Agreement Analysis

Author(s):  
Oke Gerke

The Bland–Altman Limits of Agreement is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis could be reported as a stand-alone research article but it is more often conducted as a minor quality assurance project in a subgroup of patients, as a part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to brief descriptions in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland–Altman analysis. Seven proposals were identified from a MEDLINE/PubMed search on March 03, 2020, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumption, and plotting and numerically reporting both bias and the Bland–Altman Limits of Agreement, including respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569–575). The 13 key items should be applied by researchers, journal editors, and reviewers in the future, to increase the quality of reporting Bland–Altman agreement analyses.

Diagnostics ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 334 ◽  
Author(s):  
Oke Gerke

The Bland–Altman Limits of Agreement is a popular and widespread means of analyzing the agreement of two methods, instruments, or raters in quantitative outcomes. An agreement analysis could be reported as a stand-alone research article but it is more often conducted as a minor quality assurance project in a subgroup of patients, as a part of a larger diagnostic accuracy study, clinical trial, or epidemiological survey. Consequently, such an analysis is often limited to brief descriptions in the main report. Therefore, in several medical fields, it has been recommended to report specific items related to the Bland–Altman analysis. The present study aimed to identify the most comprehensive and appropriate list of items for such an analysis. Seven proposals were identified from a MEDLINE/PubMed search, three of which were derived by reviewing anesthesia journals. Broad consensus was seen for the a priori establishment of acceptability benchmarks, estimation of repeatability of measurements, description of the data structure, visual assessment of the normality and homogeneity assumption, and plotting and numerically reporting both bias and the Bland–Altman Limits of Agreement, including respective 95% confidence intervals. Abu-Arafeh et al. provided the most comprehensive and prudent list, identifying 13 key items for reporting (Br. J. Anaesth. 2016, 117, 569–575). An exemplification with interrater data from a local study accentuated the straightforwardness of transparent reporting of the Bland–Altman analysis. The 13 key items should be applied by researchers, journal editors, and reviewers in the future, to increase the quality of reporting Bland–Altman agreement analyses.
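The reviewed proposals broadly agree on some numerical items: reporting the bias and the Limits of Agreement, each with a 95% confidence interval. The minimal Python sketch below shows how those numbers can be obtained from paired data. It makes several assumptions:

- one measurement per subject and method, with no repeated measurements;
- approximately normal differences;
- 95% limits taken as bias ± 1.96 SD;
- the classic large-sample approximation SE ≈ √(3s²/n) for each limit (Bland and Altman, 1986). Exact intervals for the limits exist and may be preferable for small samples.

The Student t quantile is computed numerically so that SciPy is not needed. The function names `t_quantile` and `bland_altman` are hypothetical.

```python
import math
from statistics import mean, stdev


def t_quantile(p, df):
    """Student-t quantile for p > 0.5 (Simpson-rule CDF plus bisection; stdlib only)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    def cdf(x, steps=2000):  # valid for x >= 0: 0.5 + integral of pdf over [0, x]
        h = x / steps
        s = pdf(0.0) + pdf(x)
        for i in range(1, steps):
            s += (4 if i % 2 else 2) * pdf(i * h)
        return 0.5 + s * h / 3

    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def bland_altman(a, b, conf=0.95):
    """Bias, 95% LoA and their confidence intervals for paired measurements a[i], b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    bias, sd = mean(diffs), stdev(diffs)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    t = t_quantile(1 - (1 - conf) / 2, n - 1)
    se_bias = sd / math.sqrt(n)
    se_loa = math.sqrt(3 * sd ** 2 / n)  # large-sample approximation for each limit
    return {
        "n": n,
        "bias": bias,
        "sd": sd,
        "bias_ci": (bias - t * se_bias, bias + t * se_bias),
        "loa": loa,
        "lower_loa_ci": (loa[0] - t * se_loa, loa[0] + t * se_loa),
        "upper_loa_ci": (loa[1] - t * se_loa, loa[1] + t * se_loa),
    }
```

The returned dictionary covers the numerical reporting items. The visual checks of normality and homogeneity and the Bland–Altman plot itself still have to be produced separately.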


Author(s):  
Hugh G. Pemberton ◽  
Olivia Goodkin ◽  
Ferran Prados ◽  
Ravi K. Das ◽  
...  

Abstract
Objectives: We examined whether providing a quantitative report (QReport) of regional brain volumes improves radiologists’ accuracy and confidence in detecting volume loss, and in differentiating Alzheimer’s disease (AD) and frontotemporal dementia (FTD), compared with visual assessment alone.
Methods: Our forced-choice multi-rater clinical accuracy study used MRI from 16 AD patients, 14 FTD patients, and 15 healthy controls (age range 52–81 years). Our QReport was presented to raters with regional grey matter volumes plotted as percentiles against data from a normative population (n = 461). Nine raters with varying radiological experience (three each: consultants, registrars, ‘non-clinical image analysts’) assessed each case twice (with and without the QReport). Raters were blinded to clinical and demographic information; they classified scans as ‘normal’ or ‘abnormal’ and, if ‘abnormal’, as ‘AD’ or ‘FTD’.
Results: The QReport improved sensitivity for detecting volume loss and AD across all raters combined (p = 0.015* and p = 0.002*, respectively). Only the consultant group’s accuracy increased significantly when using the QReport (p = 0.02*). Overall, raters’ agreement (Cohen’s κ) with the ‘gold standard’ was not significantly affected by the QReport; only the consultant group improved significantly (κ from 0.41 to 0.55, p = 0.04*). Cronbach’s alpha for interrater agreement improved from 0.886 to 0.925, corresponding to an improvement from ‘good’ to ‘excellent’.
Conclusion: Our QReport, which references single-subject results to normative data alongside visual assessment, improved sensitivity, accuracy, and interrater agreement for detecting volume loss. The QReport was most effective for the consultants, suggesting that experience is needed to fully benefit from the additional information provided by quantitative analyses.
Key Points
• The use of a quantitative report alongside routine visual MRI assessment improves sensitivity and accuracy for detecting volume loss and AD compared with visual assessment alone.
• Consultant neuroradiologists’ assessment accuracy and agreement (kappa scores) significantly improved with the use of quantitative atrophy reports.
• This is the first multi-rater radiological clinical evaluation of a visual quantitative MRI atrophy report for use as a diagnostic aid in dementia.
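The abstract reports two agreement statistics: Cohen's κ against the gold standard and Cronbach's alpha across raters. It does not say exactly how either was computed. The Python sketch below gives the textbook formulas under stated assumptions:

- κ compares one rater's categorical labels with the reference labels, case by case.
- α treats each rater as an "item" and requires numeric scores, with one row per case and one column per rater.

The labels and scores in the usage comment are hypothetical, not data from the study.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(r1, r2):
    """Cohen's kappa for two equal-length label sequences (undefined if chance agreement is 1)."""
    n = len(r1)
    p_o = sum(x == y for x, y in zip(r1, r2)) / n          # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / n ** 2          # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as items: rows are cases, columns are raters."""
    k = len(ratings[0])
    item_var = sum(variance(col) for col in zip(*ratings))  # per-rater score variances
    total_var = variance([sum(row) for row in ratings])     # variance of summed scores
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical example: one rater versus a reference on four scans.
# cohens_kappa(["AD", "AD", "FTD", "normal"], ["AD", "FTD", "FTD", "normal"])
```

For the hypothetical example, observed agreement is 0.75 and chance agreement is 0.3125, so κ = 7/11 ≈ 0.64.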


2019 ◽  
Vol 12 (12) ◽  
pp. 6273-6301
Author(s):  
Edward Malina ◽  
Haili Hu ◽  
Jochen Landgraf ◽  
Ben Veihelmann

Abstract. Retrievals of methane isotopologues have the potential to differentiate between natural and anthropogenic methane source types, which can provide much-needed information about the current global methane budget. We investigate the feasibility of retrieving the second most abundant isotopologue of atmospheric methane (¹³CH₄, roughly 1.1 % of total atmospheric methane) from the shortwave infrared (SWIR) channels of the future Sentinel-5/ultra-violet, visible, near-infrared, shortwave infrared (UVNS) and current Copernicus Sentinel-5 Precursor TROPOspheric Monitoring Instrument (TROPOMI) instruments. With the intended goal of calculating the δ¹³C value, we assume that a δ¹³C uncertainty of better than 1 ‰ is sufficient to differentiate between source types, which corresponds to a ¹³CH₄ uncertainty of <0.02 ppb. Using well-established information content analysis techniques and assuming clear-sky, non-scattering conditions, we find that the SWIR3 (2305–2385 nm) channel on the TROPOMI instrument can achieve a mean uncertainty of <1 ppb, while the SWIR1 channel (1590–1675 nm) on the Sentinel-5 UVNS instrument can achieve <0.68 ppb, or <0.2 ppb in high signal-to-noise ratio (SNR) cases. These uncertainties, combined with significant spatial and/or temporal averaging, can reduce the δ¹³C uncertainty to the target magnitude or better. However, we find that ¹³CH₄ retrievals are highly sensitive to errors in a priori knowledge of temperature and pressure, and accurate knowledge of these profiles is required before ¹³CH₄ retrievals can be performed on TROPOMI and future Sentinel-5/UVNS data. In addition, we assess the assumption that scattering-induced light path errors cancel out by comparing the δ¹³C values calculated for non-scattering and scattering scenarios. We find that there is a minor bias between δ¹³C values from scattering and non-scattering retrievals, but this is unrelated to scattering-induced errors.
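The abstract links 1 ‰ in δ¹³C to 0.02 ppb in ¹³CH₄ and relies on averaging to reach that target. Both points follow from simple error propagation, and the minimal Python sketch below reproduces them. It rests on these assumptions:

- Total methane is taken as about 1850 ppb, an assumed round value that is not from the abstract.
- The ¹²CH₄ error is neglected.
- The factor (1 + δ/1000) is treated as 1.
- Single-sounding errors are taken to be independent and random, so they shrink as 1/√N. Systematic errors, such as the temperature and pressure errors noted above, do not average out this way.

The function names are hypothetical.

```python
import math

TOTAL_CH4_PPB = 1850.0          # assumed background methane level (illustrative)
C13_PPB = 0.011 * TOTAL_CH4_PPB  # ~1.1 % of methane is 13CH4, as stated in the abstract


def delta_uncertainty_permil(sigma_13ch4_ppb, c13ch4_ppb):
    """Approximate 1-sigma delta13C uncertainty (per mil) from a 13CH4 uncertainty.

    delta = (R / R_std - 1) * 1000 with R = 13CH4 / 12CH4, so
    d(delta)/d(13CH4) = (1000 + delta) / 13CH4, approximated here by 1000 / 13CH4.
    """
    return 1000.0 * sigma_13ch4_ppb / c13ch4_ppb


def soundings_needed(sigma_single, sigma_target):
    """Number of independent soundings N with sigma_single / sqrt(N) <= sigma_target."""
    # Small tolerance guards against floating-point overshoot in the ratio.
    return math.ceil((sigma_single / sigma_target) ** 2 - 1e-9)
```

With these assumptions, `delta_uncertainty_permil(0.02, C13_PPB)` is about 0.98 ‰, consistent with the stated correspondence. `soundings_needed(1.0, 0.02)` returns 2500, which shows why substantial spatial and/or temporal averaging is needed.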


2008 ◽  
Vol 8 (12) ◽  
pp. 3081-3092 ◽  
Author(s):  
S. S. Kulawik ◽  
K. W. Bowman ◽  
M. Luo ◽  
C. D. Rodgers ◽  
L. Jourdain

Abstract. Non-linear maximum a posteriori (MAP) estimates of atmospheric profiles from the Tropospheric Emission Spectrometer (TES) contain a priori information that may vary geographically. This variation is a confounding factor in the analysis and physical interpretation of an ensemble of profiles. One mitigation strategy is to transform profile estimates to a common prior using a linear operation, thereby facilitating the interpretation of profile variability. However, this operation depends on the assumption that the non-linearity near the solution of the non-linear estimate is no worse than moderate. The robustness of this assumption is tested by comparing TES atmospheric retrievals processed with a uniform prior against those processed with a variable prior and converted to a uniform prior after the non-linear retrieval. Compared with a non-linear retrieval that uses a uniform prior, linearly converting the prior after the non-linear retrieval is shown to have only a minor effect relative to the expected total error, with less than 10% of the change in the prior ending up as unbiased fluctuations in the profile estimates.
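The linear prior transformation described above can be written as x̂′ = x̂ + (I − A)(x_a′ − x_a), where A is the averaging kernel. The minimal Python sketch below implements it, assuming that the retrieval is linear around the solution and that A and both priors are expressed in the same state space (for example log VMR). Under that assumption the converted estimate exactly equals a retrieval run with the new prior. The study tests how well this holds for the actual non-linear TES retrievals. The helper names and the 2-level example numbers are hypothetical.

```python
def matvec(m, v):
    """Matrix-vector product for small nested-list matrices."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def linear_retrieval(x_true, A, xa):
    """Idealised linear estimate: x_hat = xa + A (x_true - xa)."""
    d = matvec(A, [xt - a for xt, a in zip(x_true, xa)])
    return [a + di for a, di in zip(xa, d)]


def change_prior(x_hat, A, xa_old, xa_new):
    """Re-reference an estimate to a new prior: x_hat + (I - A)(xa_new - xa_old)."""
    dxa = [n - o for n, o in zip(xa_new, xa_old)]
    a_dxa = matvec(A, dxa)
    return [x + d - ad for x, d, ad in zip(x_hat, dxa, a_dxa)]


# Hypothetical 2-level example
A = [[0.6, 0.2], [0.1, 0.5]]
x_true, xa_old, xa_new = [1.0, 2.0], [0.5, 1.5], [0.8, 1.2]
converted = change_prior(linear_retrieval(x_true, A, xa_old), A, xa_old, xa_new)
direct = linear_retrieval(x_true, A, xa_new)
```

In the linear case `converted` and `direct` agree to rounding error. When A is the identity (a perfectly sensitive retrieval), changing the prior leaves the estimate unchanged.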


1979 ◽  
Vol 10 (1) ◽  
pp. 159-174 ◽  
Author(s):  
Stephen O'Harrow

The question of a national identity for Vietnam has long plagued historians, both Vietnamese and foreign. Some see Vietnam throughout its pre-modern history as a minor appendage of the Chinese Empire, one whose culture and institutions are so thoroughly influenced by the Chinese tradition that they evade meaningful individual scrutiny. A few apply the tools of Sinology in such a way as to reach conclusions which, while cogent in themselves, cannot escape the confines of their methodology. Others, including a majority of scholars from Vietnam itself, reject the former view and are continuously searching for evidence to demonstrate the uniqueness of the Vietnamese experience. There is little merit in the a priori assumptions of either school, but this does not invalidate the question. It would be of particular interest to know not simply whether significant differences existed between Vietnamese and Chinese institutions at various points in history, but whether these institutional differences had a significant bearing on a sense of nationalism, and whether such differences resulted at least partially from a self-conception on the part of Vietnamese thinkers, one consciously held and pursued. The Binh Ngo Dai Cao provides us with some intriguing clues. It is, as well, a narrative document of great literary worth and the subject of constant allusion, the background of which could bear illumination for purely historical interest.


2013 ◽  
Vol 13 (19) ◽  
pp. 9771-9788 ◽  
Author(s):  
M. Inoue ◽  
I. Morino ◽  
O. Uchino ◽  
Y. Miyamoto ◽  
Y. Yoshida ◽  
...  

Abstract. Column-averaged dry air mole fractions of carbon dioxide (XCO2) retrieved from Greenhouse gases Observing SATellite (GOSAT) Short-Wavelength InfraRed (SWIR) observations were validated with aircraft measurements by the Comprehensive Observation Network for TRace gases by AIrLiner (CONTRAIL) project, the National Oceanic and Atmospheric Administration (NOAA), the US Department of Energy (DOE), the National Institute for Environmental Studies (NIES), the HIAPER Pole-to-Pole Observations (HIPPO) program, and the GOSAT validation aircraft observation campaign over Japan. To calculate XCO2 based on aircraft measurements (aircraft-based XCO2), tower measurements and model outputs were used for additional information near the surface and above the tropopause, respectively. Before validation, we investigated the impacts of GOSAT SWIR column averaging kernels (CAKs) and the shape of a priori profiles on the aircraft-based XCO2 calculation. The differences between aircraft-based XCO2 with and without the application of the GOSAT CAK were within ±0.4 ppm at most and within ±0.1 ppm on average. Therefore, we concluded that the GOSAT CAK has only a minor effect on the aircraft-based XCO2 calculation relative to the overall uncertainty of GOSAT XCO2. We compared GOSAT data retrieved within ±2° or ±5° latitude/longitude boxes centered at each aircraft measurement site with aircraft-based data measured on a GOSAT overpass day. The results indicated that GOSAT XCO2 over land agreed with aircraft-based XCO2, except that the former was biased by −0.68 ppm (−0.99 ppm) with a standard deviation of 2.56 ppm (2.51 ppm). Over ocean, the average differences between GOSAT XCO2 and aircraft-based XCO2 were −1.82 ppm (−2.27 ppm) with a standard deviation of 1.04 ppm (1.79 ppm) for ±2° (±5°) boxes.
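Applying a column averaging kernel to an aircraft-based profile is commonly done with the smoothing relation XCO2_smoothed = XCO2_a + Σ_j h_j a_j (x_j − x_a,j). Here h is the pressure weighting function, a the column averaging kernel and x_a the retrieval a priori profile. The minimal Python sketch below assumes this relation. The abstract does not spell out the exact GOSAT layering or weighting, so the 3-layer values are hypothetical. Comparing the result with the plain pressure-weighted column gives the kind of "with versus without CAK" difference discussed above.

```python
def pressure_weighted_column(h, profile):
    """Plain column average: sum_j h_j * x_j (h is the pressure weighting function, sum 1)."""
    return sum(hj * xj for hj, xj in zip(h, profile))


def smooth_with_cak(x_profile, xa_profile, h, cak):
    """Column value the retrieval would report for x_profile:
    XCO2_a + sum_j h_j * a_j * (x_j - xa_j)."""
    xco2_a = pressure_weighted_column(h, xa_profile)
    return xco2_a + sum(
        hj * aj * (xj - xaj) for hj, aj, xj, xaj in zip(h, cak, x_profile, xa_profile)
    )


# Hypothetical 3-layer example (ppm)
h = [0.25, 0.25, 0.5]
x_aircraft = [400.0, 405.0, 410.0]
x_prior = [395.0, 400.0, 402.0]
cak = [1.0, 0.9, 0.8]
```

With this example, the smoothed value is 405.325 ppm and the plain column is 406.25 ppm. If every kernel element is 1, the two coincide. If every element is 0, the result collapses to the a priori column.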


2013 ◽  
Vol 21 (01) ◽  
pp. 19-43 ◽  
Author(s):  
EMIEL L. EIJDENBERG ◽  
ENNO MASUREL

The objective of this study is to explore entrepreneurial motivation, which can be divided into push factors and pull factors, in a least developed country (LDC), without an a priori separation between factors that are necessity-driven and those that are opportunity-driven. This study shows that the premise "For people who start their own business in an LDC, push factors are more important than pull factors" can be rejected. In contrast to the findings of prior studies on entrepreneurship in LDCs, this study shows that push factors and pull factors are not mutually exclusive. In addition, this study shows that pull factors are even more important than push factors, and that push factors therefore play only a minor role for entrepreneurs. The overall implications are that motivation is a more combined and nuanced construct, and that the Western concept of entrepreneurial motivation and the method of measuring it are globally applicable.


2013 ◽  
Vol 13 (2) ◽  
pp. 3203-3246 ◽  
Author(s):  
M. Inoue ◽  
I. Morino ◽  
O. Uchino ◽  
Y. Miyamoto ◽  
Y. Yoshida ◽  
...  

Abstract. Column-averaged volume mixing ratios of carbon dioxide (XCO2) retrieved from Greenhouse gases Observing SATellite (GOSAT) Short-Wavelength InfraRed (SWIR) observations were compared with aircraft measurements by the Comprehensive Observation Network for TRace gases by AIrLiner (CONTRAIL) project, the National Oceanic and Atmospheric Administration (NOAA), and the National Institute for Environmental Studies (NIES). Before validation, we investigated the impacts of GOSAT SWIR column averaging kernels (CAK) and the shape of a priori profiles on the calculation of XCO2 based on aircraft measurements (aircraft-based XCO2). The differences between aircraft-based XCO2 with and without the application of GOSAT CAK were evaluated to be less than ±0.4 ppm at most, and less than 0.1 ppm on average. Therefore, we concluded that the GOSAT CAK produces only a minor effect on the aircraft-based XCO2 calculation in terms of the overall uncertainty of GOSAT XCO2. In this study, two approaches were used to validate GOSAT products (Ver. 02.00). First, we performed a comparison of GOSAT data retrieved within ±2-degree or ±5-degree latitude/longitude boxes centered at each aircraft measurement site and aircraft-based data measured on a GOSAT overpass day (i.e. extraction of temporally matched cases). As this method resulted in no matched data for observation sites where no aircraft measurement was made on the GOSAT overpass day, we also attempted to validate GOSAT products by gap-filling the aircraft-based XCO2 time series through curve fitting. Both methods indicated that GOSAT XCO2 agreed well with aircraft-based XCO2, except that the former is negatively biased by 1–2 ppm with a standard deviation of 1–3 ppm.
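The second validation approach above gap-fills the aircraft-based XCO2 time series by curve fitting. The abstract does not give the fitted function. The minimal stdlib Python sketch below therefore assumes a common, simple choice: an offset, a linear trend and one annual harmonic, fitted by least squares. It then evaluates the fit on days without aircraft data. Time is in decimal years and is centred on the mean for numerical conditioning. The model form, function names and synthetic test values are all hypothetical.

```python
import math


def _basis(t, t0):
    """Assumed model terms: offset, linear trend, annual sine and cosine."""
    w = 2.0 * math.pi * t
    return [1.0, t - t0, math.sin(w), math.cos(w)]


def _solve(m, v):
    """Solve m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_xco2_curve(times, values):
    """Least-squares fit via the normal equations; returns a callable for gap-filling."""
    t0 = sum(times) / len(times)
    rows = [_basis(t, t0) for t in times]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    coeffs = _solve(ata, atb)
    return lambda t: sum(c * b for c, b in zip(coeffs, _basis(t, t0)))
```

The returned function can be evaluated at each GOSAT overpass time to obtain a gap-filled aircraft-based value for comparison. Real CO₂ series usually need more harmonics or a smoother for residuals. A model this simple is only illustrative.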


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0244750
Author(s):  
Diego A. Caraballo ◽  
María E. Montani ◽  
Leila M. Martínez ◽  
Leandro R. Antoniazzi ◽  
Tomás C. Sambrana ◽  
...  

Bats are among the most diverse, widespread, and abundant mammals. In Argentina, 67 species of bats have been recorded, belonging to 5 families and 29 genera. These high levels of biodiversity are likely to complicate identification during fieldwork, especially between closely related species, where external morphology-based approaches are the only immediate means for a priori species assignment. The use of molecular markers can enhance species identification and is particularly relevant in capture-release studies. In this study, we discuss the extent to which the mitochondrial cytochrome b gene can be used for species identification, comparing identification based on external morphology with a molecular phylogenetic classification based on this marker, in light of current bat systematics. We analyzed 33 samples collected in an eco-epidemiological survey in the province of Santa Fe (Argentina). We further sequenced 27 museum vouchers to test the accuracy of cytochrome b-based phylogenies in the taxonomic identification of bats occurring in the Pampean/Chacoan regions of Argentina. The cytochrome b gene was successfully amplified in all Molossid and Vespertilionid species except for Eptesicus, for which we designed a new reverse primer. The resulting Bayesian phylogeny was congruent with current systematics. Cytochrome b proved useful for species-level delimitation in non-conflicting genera (Eumops, Dasypterus, Molossops) and has infrageneric resolution in more complex lineages (Eptesicus, Myotis, Molossus). We discuss four sources of incongruence that may act separately or in combination: 1) molecular processes, 2) biology, 3) limitations in identification, and 4) errors in the current taxonomy. The present study confirms the general applicability of cytochrome b-based phylogenies in eco-epidemiological studies, but its resolution and reliability depend mainly, but not solely, on the level of genetic differentiation within each bat genus.


2016 ◽  
Vol 115 (7) ◽  
pp. 1273-1280 ◽  
Author(s):  
Marijka J. Batterham ◽  
Christel Van Loo ◽  
Karen E. Charlton ◽  
Dylan P. Cliff ◽  
Anthony D. Okely

Abstract The aim of this study was to demonstrate the use of testing for equivalence in combination with the Bland and Altman method when assessing agreement between two dietary methods. A sample data set, with eighty subjects simulated from previously published studies, was used to compare a FFQ with three 24 h recalls (24HR) for assessing dietary iodine (I) intake. The mean I intake using the FFQ was 126·51 (sd 54·06) µg and using the three 24HR was 124·23 (sd 48·62) µg. The bias was −2·28 (sd 43·93) µg with a 90 % CI of −10·46, 5·89 µg. The limits of agreement (LOA) were −88·38, 83·82 µg. Four equivalence regions were compared. Using the conventional 10 % equivalence range, the methods are shown to be equivalent both by using the CI (−12·4, 12·4 µg) and by the two one-sided tests approach (lower t=−2·99 (79 df), P=0·002; upper t=2·06 (79 df), P=0·021). However, we make a case that clinical decision making should be used to set the equivalence limits, and that stricter criteria may be needed for nutrients where there are potential issues with deficiency or toxicity. If the equivalence region is narrowed to ±5 µg or ±10 µg, these methods are no longer equivalent, and if a wider limit of ±15 µg is accepted they are again equivalent. With equivalence testing, acceptable agreement must be defined a priori and justified; this makes the process of defining agreement more transparent and the results easier to interpret than relying on the LOA alone.
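The abstract's quantities can be recomputed from the reported summary statistics alone (bias −2·28 µg, SD of differences 43·93 µg, n = 80). The minimal stdlib Python sketch below does this under several assumptions:

- approximately normal paired differences;
- a one-sample, t-based two one-sided tests (TOST) procedure on the mean difference with a symmetric margin ±Δ;
- LOA computed as bias ± 1·96 SD.

The Student t CDF is evaluated by numerical integration to avoid a SciPy dependency. Because the inputs are rounded, the results can differ from the abstract in the last digit. Which one-sided test is called "lower" or "upper", and hence the sign of its t, is a convention, so the labels may be swapped relative to the abstract. The function names are hypothetical.

```python
import math


def t_cdf(x, df, steps=2000):
    """Student-t CDF via Simpson integration of the density over [0, |x|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = abs(x) / steps
    s = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    half = s * h / 3
    return 0.5 + half if x >= 0 else 0.5 - half


def t_quantile(p, df):
    """Quantile for p > 0.5 by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def equivalence_summary(bias, sd, n, margin, alpha=0.05):
    """90 % CI (for alpha = 0.05), LOA and TOST for a mean difference against +/- margin."""
    df = n - 1
    se = sd / math.sqrt(n)
    t_crit = t_quantile(1 - alpha, df)
    ci = (bias - t_crit * se, bias + t_crit * se)
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    t_vs_lower = (bias + margin) / se   # H0: true bias <= -margin
    t_vs_upper = (bias - margin) / se   # H0: true bias >= +margin
    p_vs_lower = 1 - t_cdf(t_vs_lower, df)
    p_vs_upper = t_cdf(t_vs_upper, df)
    return {
        "ci": ci,
        "loa": loa,
        "t": (t_vs_lower, t_vs_upper),
        "p": (p_vs_lower, p_vs_upper),
        "equivalent": -margin < ci[0] and ci[1] < margin,  # same decision as both p < alpha
    }
```

Calling `equivalence_summary(-2.28, 43.93, 80, 12.4)` should give a 90 % CI of about (−10·45, 5·89) and one-sided p-values of about 0·021 and 0·002. Repeating it with margins of 5, 10 and 15 µg reproduces the "not equivalent, not equivalent, equivalent" pattern described in the abstract.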

