test variability
Recently Published Documents


TOTAL DOCUMENTS: 97 (FIVE YEARS: 12)

H-INDEX: 15 (FIVE YEARS: 1)

2021 ◽  
Author(s):  
Ibukun Oloruntoba ◽  
Toan D Nguyen ◽  
Zongyuan Ge ◽  
Tine Vestergaard ◽  
Victoria Mar

BACKGROUND Convolutional neural networks (CNNs) are a type of artificial intelligence that show promise as a diagnostic aid for skin cancer. However, the majority are trained using retrospective image data sets of varying quality and image capture standardization. OBJECTIVE The aim of our study is to use CNN models with the same architecture, but different training image sets, and test variability in performance when classifying skin cancer images in different populations, acquired with different devices. Additionally, we wanted to assess the performance of the models against Danish teledermatologists when tested on images acquired from Denmark. METHODS Three CNNs with the same architecture were trained. CNN-NS was trained on 25,331 nonstandardized images taken from the International Skin Imaging Collaboration using different image capture devices. CNN-S was trained on 235,268 standardized images, and CNN-S2 was trained on 25,331 standardized images (matched for number and classes of training images to CNN-NS). Both standardized data sets (CNN-S and CNN-S2) were provided by Molemap using the same image capture device. A total of 495 Danish patients with 569 images of skin lesions predominantly involving Fitzpatrick skin types II and III were used to test the performance of the models. Four teledermatologists independently diagnosed and assessed the images taken of the lesions. Primary outcome measures were sensitivity, specificity, and area under the curve of the receiver operating characteristic (AUROC). RESULTS A total of 569 images were taken from 495 patients (n=280, 57% women, n=215, 43% men; mean age 55, SD 17 years) for this study. On these images, CNN-S achieved an AUROC of 0.861 (95% CI 0.830-0.889; P<.001), and CNN-S2 achieved an AUROC of 0.831 (95% CI 0.798-0.861; P=.009), with both outperforming CNN-NS, which achieved an AUROC of 0.759 (95% CI 0.722-0.794; P<.001; P=.009).
When the CNNs were matched to the mean sensitivity and specificity of the teledermatologists, the models’ resultant sensitivities and specificities were surpassed by the teledermatologists. However, when compared to CNN-S, the differences were not statistically significant (P=.10; P=.05). Performance across all CNN models and teledermatologists was influenced by the image quality. CONCLUSIONS CNNs trained on standardized images had improved performance and therefore greater generalizability in skin cancer classification when applied to an unseen data set. This is an important consideration for future algorithm development, regulation, and approval. Further, when tested on these unseen test images, the teledermatologists ‘clinically’ outperformed all the CNN models; however, the difference was deemed to be statistically insignificant when compared to CNN-S.
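As a rough illustration of the primary outcome measures named in this abstract, the sketch below computes AUROC, sensitivity, and specificity for a toy set of classifier scores. The function names, the threshold, and the labels and scores are assumptions made up for illustration; they are not the study's data or code, and the abstract does not say how the authors computed these measures.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Call a case positive when its score is >= threshold (an assumed rule)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = malignant, 0 = benign.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

print(auroc(toy_labels, toy_scores))
print(sensitivity_specificity(toy_labels, toy_scores, threshold=0.5))
```

Matching a model to a reader's mean sensitivity, as the abstract describes, would amount to moving `threshold` until the first returned value equals the target and then reading off the specificity.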


2021 ◽  
Author(s):  
Ibukun Oloruntoba ◽  
Tine Vestergaard ◽  
Toan D Nguyen ◽  
Zongyuan Ge ◽  
Victoria Mar

BACKGROUND Convolutional neural networks (CNNs) are a type of artificial intelligence (AI) which show promise as a diagnostic aid for skin cancer. However, the majority are trained using retrospective image datasets of varying quality and image capture standardisation. OBJECTIVE The objective of our study was to use CNN models with the same architecture, but different training image sets, and test variability in performance when classifying skin cancer images in different populations, acquired with different devices. Additionally, we wanted to assess the performance of the models against Danish tele-dermatologists, when tested on images acquired from Denmark. METHODS Three CNNs with the same architecture were trained. CNN-NS was trained on 25,331 non-standardised images taken from the International Skin Imaging Collaboration using different image capture devices. CNN-S was trained on 235,268 standardised images and CNN-S2 was trained on 25,331 standardised images (matched for number and classes of training images to CNN-NS). Both standardised datasets (CNN-S and CNN-S2) were provided by Molemap using the same image capture device. A total of 495 Danish patients with 569 images of skin lesions predominantly involving Fitzpatrick skin types II and III were used to test the performance of the models. Four tele-dermatologists independently diagnosed and assessed the images taken of the lesions. Primary outcome measures were sensitivity, specificity and area under the curve of the receiver operating characteristic (AUROC). RESULTS A total of 569 images were taken from 495 patients (280 women [57%], 215 men [43%]; mean age 55 years [SD 17]) for this study. On these images, CNN-S achieved an AUROC of 0.861 (CI 0.830 – 0.889; P=.001) and CNN-S2 achieved an AUROC of 0.831 (CI 0.798 – 0.861; P=.009), with both outperforming CNN-NS, which achieved an AUROC of 0.759 (CI 0.722 – 0.794; P=.001, P=.009) (Figure 1).
When the CNNs were matched to the mean sensitivity and specificity of the tele-dermatologists, the models’ resultant sensitivities and specificities were surpassed by the tele-dermatologists (Table 1). However, when compared to CNN-S, the differences were not statistically significant (P=.10, P=.053). Performance across all CNN models as well as tele-dermatologists was influenced by image quality. CONCLUSIONS CNNs trained on standardised images had improved performance and therefore greater generalisability in skin cancer classification when applied to an unseen dataset. This is an important consideration for future algorithm development, regulation and approval. Further, when tested on these unseen test images, the tele-dermatologists ‘clinically’ outperformed all the CNN models; however, the difference was deemed to be statistically insignificant when compared to CNN-S. CLINICALTRIAL This retrospective diagnostic comparative study was approved by the Monash University Human Ethics Committee, Melbourne, Australia (Project ID: 28130).


2021 ◽  
Vol 10 (1) ◽  
pp. 26
Author(s):  
Giovanni Montesano ◽  
Timos K. Naska ◽  
Bethany E. Higgins ◽  
David M. Wright ◽  
Ruth E. Hogg ◽  
...  

2021 ◽  
Vol 22 (Supplement_1) ◽  
Author(s):  
S Unlu ◽  
O Mirea ◽  
S Bezy ◽  
J Duchenne ◽  
ED Pagourelias ◽  
...  

Abstract Funding Acknowledgements Type of funding sources: None. Background Vendors use proprietary speckle tracking software algorithms for echocardiographic strain measurements, which results in high inter-vendor variability. Little is known about potential advantages or disadvantages of using vendor-independent software in clinical practice. Purpose We therefore investigated the reproducibility, accuracy, and ability to identify scar of strain measurements on images from different vendors by using vendor-independent software. Methods Vendor-independent software (TomTec Image Arena) was used to analyze datasets of 63 patients which were obtained on four ultrasound machines from different vendors (GE, Philips, Siemens, Toshiba). We measured the tracking feasibility, inter-vendor bias, the relative and absolute test-re-test variability of strain measurements and their ability to detect scar. Cardiac magnetic resonance delayed enhancement images were used as the reference standard of scar definition. Results Tracking feasibility differed depending on the image source (p < 0.05). Variability of global longitudinal strain (GLS) (Figure 1A) was similar (ANOVA p = 0.124) among the images of different vendors whereas variability of segmental longitudinal strain (SLS) (Figure 1B) showed modest difference (ANOVA: peak systolic strain (PS), p = 0.077; end-systolic strain (ES), p = 0.171; post-systolic strain (PSS), p = 0.020). Relative test-re-test variability of GLS showed no differences (ANOVA p = 0.360). Absolute test-re-test errors of SLS measurements showed modest differences among images of different vendors (ANOVA: PS, p = 0.018; ES, p = 0.001; PSS, p = 0.090). No relevant difference in scar detection capability was observed (Figure 1C). Conclusions Vendor-independent software leads to low bias among strain measurements on images from different vendors. Likewise, measurement variability and the ability to identify scar become similar.
Our findings suggest that vendor-independent speckle tracking software could help to overcome inter-vendor bias. To what extent such measurements would be more accurate compared to vendor-specific software remains to be determined. Abstract Figure 1


2020 ◽  
Author(s):  
Sung-Won Chae ◽  
Jae-Jun Song ◽  
Woo-Sub Kim

Abstract Although symptoms of unilateral vestibular neuritis (uVN) resolve spontaneously, the course of gait recovery remains unclear. Prospective longitudinal studies on gait parameters after uVN are lacking. In this study, 23 participants with uVN and 20 controls were included. 3D gait analyses were conducted three times after uVN onset. From the gait analysis data, spatio-temporal parameters, inclination angle (IA) representing the relationship between CoM and CoP in the frontal plane, and IA variability were obtained. Time effects on gait metrics were tested. Walking speed improved significantly between the 1st and 3rd tests, but it was within the normal range even in the 1st test. The step width of participants with uVN was significantly larger than that of controls in the 1st test and improved to normal in the 2nd test. Variability of IA on the affected side was significantly larger than that in controls in the 1st test and improved significantly in the 3rd test compared to the 1st test. Improvement of overall gait function and neural adaptation of mediolateral stability during gait continued during the recovery stage of uVN (after two months from onset). Rehabilitation intervention should be continued during the recovery stage of uVN to enhance appropriate adaptation in gait.


2020 ◽  
Vol 55 ◽  
pp. 101361
Author(s):  
Eyal Ben Dori ◽  
Carmit Avnon Ziv ◽  
Adi Auerbach ◽  
Yael Greenberg ◽  
Hagit Zaken ◽  
...  

Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Judith A Hsia ◽  
Marc P Bonaca ◽  
Robin White ◽  
Victoria Anderson ◽  
William R Hiatt

Background: The 6-minute walk test (6MWT) is well established for evaluation of functional exercise capacity in patients with conditions such as pulmonary hypertension, peripheral arterial disease and heart failure. Its popularity as an endpoint in heart failure trials has increased in parallel with health authority acceptance of the test as a measure of patients' function. Minimizing variability is key to the successful conduct and outcome of trials with 6MWT endpoints. We assessed the impact on walking distance variability of a structured training and monitoring program. Methods: After systematically observing conduct of 6MWT worldwide, our core lab developed a multifaceted approach including inspection and standardization of the walking course, standardized training, review of the first 3 tests for each test administrator and random tests thereafter, standardized data collection methods, and assessment of intra-test inconsistencies with feedback. Variability of walking distance using this structured approach is descriptively compared with 6MWT data from the literature. Results: In a multicenter trial which used the structured program, the standard deviation (SD) of distance walked was 21.7% of the mean at baseline and 22.6% at Week 4 (Table). For comparison, we reviewed 2018-19 reports of 6MWT not utilizing this structured approach and identified 5 multicenter studies of patients with heart failure which reported mean and SD of distance walked (Table). Baseline distance walked ranged from 104 to 385 m (weighted mean 220.4 m); SD of distance walked ranged from 28% to 135% of distance walked (weighted mean 70.9%). Conclusion: Standardization of the 6MWT walking course, structured training of test administrators and monitoring of test quality may reduce test variability, which could improve accuracy of treatment effect assessment and possibly require smaller sample sizes.
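To make the link between variability and sample size concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two group means, n per group = 2 (z_{1-α/2} + z_{power})² (SD/δ)². Expressing SD and the treatment effect δ as fractions of the mean lets the reported SD percentages (21.7% and 70.9%) be plugged in directly. The 10% relative treatment effect, the 5% two-sided alpha, and the 80% power are hypothetical choices, not values from the abstract, and the two SD figures come from different patient populations, so the comparison is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(cv, relative_effect, alpha=0.05, power=0.80):
    """Approximate patients per arm for a two-sample comparison of means.

    cv              -- SD of walking distance as a fraction of the mean
    relative_effect -- treatment effect as a fraction of the mean (hypothetical)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_power = NormalDist().inv_cdf(power)          # desired power
    return ceil(2 * (z_alpha + z_power) ** 2 * (cv / relative_effect) ** 2)


# SD as a fraction of the mean, taken from the abstract.
structured = n_per_group(cv=0.217, relative_effect=0.10)
literature = n_per_group(cv=0.709, relative_effect=0.10)
print(structured, literature)
```

Because the required sample size scales with the square of SD/δ, roughly tripling the relative SD raises the per-arm requirement by about a factor of ten under these assumptions.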


2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Maria Emília V. Guimarães ◽  
Carolina P. B. Gracitelli ◽  
Syril Dorairaj ◽  
Fábio N. Kanadani ◽  
Tiago S. Prata

Purpose. To evaluate factors associated with midterm visual field (VF) variability in stable glaucoma patients in Brazil. Methods. This retrospective observational study included 59 eyes of 39 stable glaucoma patients. Baseline data assessed were age, gender, educational level, intraocular pressure (IOP), central corneal thickness, best-corrected visual acuity, spherical equivalent, number of hypotensive eye drops, type of glaucoma, number of VFs performed, follow-up in years, lens status, visual field index (VFI) values from the last 5 VF (standard automated perimetry (SAP)) tests, the presence or absence of central scotoma in the VF test, and the level of glaucomatous damage according to the VF mean deviation (MD) index of the last VFs. The 5 latest VFI scores were used to calculate the mean, the standard deviation (SD), and the coefficient of variation (CV). We divided the eyes into 2 groups: group 1 comprised the 29 eyes with the lowest CV values, and group 2 comprised the 30 eyes with the highest CV values. Generalized estimating equation (GEE) models were used to compare the CV and demographic and clinical parameters of all participants. Results. Mean age of all subjects was 65.8 ± 10.1 years. 54.0% were women. Average SAP MD values for groups 1 and 2 were −2.8 ± 3.1 dB and −6.2 ± 4.1 dB, respectively (P=0.006). Average SAP VFI values for groups 1 and 2 were 95.6 ± 5.9% and 85.9 ± 11.3%, respectively (P=0.002). There was a statistically significant association between CV and SAP MD values (P=0.006). A worse SAP MD and VFI were associated with a higher CV. In addition, even adjusting for potential confounding factors (age and level of education), the association between CV and the SAP MD and between CV and VFI remained significant (P≤0.010). Conclusion. Glaucomatous patients with worse VF sensitivity scores (both MD and VFI indices) present higher VF test variability.
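The variability measure used in this abstract, the coefficient of variation of the 5 latest VFI scores, can be sketched as follows. The VFI values and eye labels are invented for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the abstract does not say which SD estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = SD / mean, using the sample SD (an assumption, see lead-in)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


# Hypothetical VFI scores (%) from the 5 most recent visual field tests.
vfi_by_eye = {
    "eye_A": [96, 95, 97, 96, 96],  # stable scores, low variability
    "eye_B": [80, 90, 85, 75, 95],  # fluctuating scores, high variability
}

cv_by_eye = {eye: coefficient_of_variation(v) for eye, v in vfi_by_eye.items()}

# Rank eyes from lowest to highest CV, as a first step toward splitting
# them into a low-variability and a high-variability group.
ranked = sorted(cv_by_eye, key=cv_by_eye.get)
print(cv_by_eye, ranked)
```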


2019 ◽  
Vol 105 (3) ◽  
pp. e477-e483 ◽  
Author(s):  
Moe Thuzar ◽  
Karen Young ◽  
Ashraf H Ahmed ◽  
Greg Ward ◽  
Martin Wolley ◽  
...  

Abstract Background In primary aldosteronism (PA), excessive, autonomous secretion of aldosterone is not suppressed by salt loading or fludrocortisone. For seated saline suppression testing (SSST), the recommended diagnostic cutoff 4-hour plasma aldosterone concentration (PAC) measured by high-performance liquid chromatography–mass spectrometry (HPLC-MS/MS) is 162 pmol/L. Most diagnostic laboratories, however, use immunoassays to measure PAC. The cutoff for SSST using immunoassay is not known. We hypothesized that the cutoff is different between the assays. Methods We analyzed 80 of the 87 SSSTs that were performed during our recent study defining the HPLC-MS/MS cutoff. PA was confirmed in 65 by positive fludrocortisone suppression testing (FST) and/or lateralization on adrenal venous sampling and excluded in 15 by negative FST. PAC was measured by a chemiluminescence immunoassay (PACIA) in the SSST samples using the DiaSorin Liaison XL analyzer, and receiver operating characteristics (ROC) analysis was performed to identify the PACIA cutoff. Results ROC revealed good performance (area under the curve = 0.893; P < .001) of 4-hour postsaline PACIA for diagnosis of PA and an optimal diagnostic cutoff of 171 pmol/L, with sensitivity and specificity of 95.4% and 80.0%, respectively. A higher cutoff of 217 pmol/L improved specificity (86.7%) with lower sensitivity (86.2%). PACIA measurements strongly correlated with PAC measured by HPLC-MS (r = 0.94, P < .001). Conclusions A higher diagnostic cutoff for SSST should be employed when PAC is measured by immunoassay rather than HPLC-MS/MS. The results suggest that (i) PA can be excluded if 4-hour PACIA is less than 171 pmol/L, and (ii) PA is highly likely if the PACIA is greater than 217 pmol/L by chemiluminescence immunoassay. A gray zone exists between the cutoffs of 171 and 217 pmol/L, likely reflecting a lower specificity of immunoassay.
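The two-cutoff reading suggested in the conclusions can be sketched as a small function, together with a check of how the reported percentages relate to the group sizes (65 confirmed PA, 15 excluded). The function names are hypothetical, placing values of exactly 171 or 217 pmol/L in the gray zone is an assumption, and the patient counts in the second part (62, 12, 56, 13) are back-calculated here as values consistent with the reported percentages; the abstract does not state them.

```python
def interpret_ssst(pac_ia_pmol_l, lower=171.0, upper=217.0):
    """Three-way reading of 4-hour PAC by immunoassay (pmol/L).

    Boundary values are placed in the gray zone, which is an assumption.
    """
    if pac_ia_pmol_l < lower:
        return "PA excluded"
    if pac_ia_pmol_l > upper:
        return "PA highly likely"
    return "gray zone"


def percent(numerator, denominator):
    """Proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


for value in (150, 190, 230):  # hypothetical PAC values
    print(value, interpret_ssst(value))

# Inferred counts (not reported in the abstract) that reproduce the
# published figures for 65 PA and 15 non-PA tests.
print(percent(62, 65), percent(12, 15))  # cutoff 171: sensitivity, specificity
print(percent(56, 65), percent(13, 15))  # cutoff 217: sensitivity, specificity
```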

