Accuracy and Reproducibility of a Software Prototype for Semi-Automated Computer-Aided Volumetry of the solid and subsolid Components of part-solid Pulmonary Nodules

Author(s):  
Sebastian Werner ◽  
Regina Gast ◽  
Rainer Grimmer ◽  
Andreas Wimmer ◽  
Marius Horger

Purpose To test the accuracy and reproducibility of a software prototype for semi-automated computer-aided volumetry (CAV) of part-solid pulmonary nodules (PSN) with separate segmentation of the solid part. Materials and Methods 66 PSNs were retrospectively identified in 34 thin-slice unenhanced chest CTs of 19 patients. CAV was performed by two medical students. Manual volumetry (MV) was carried out by two radiology residents. The reference standard was determined by an experienced radiologist in consensus with one of the residents. Visual assessment of CAV accuracy was performed. Measurement variability between CAV/MV and the reference standard (as a measure of accuracy), CAV inter- and intra-rater variability, and CAV intrascan variability between two reconstruction kernels were determined via the Bland-Altman method and intraclass correlation coefficients (ICC). Results Subjectively assessed accuracy of CAV/MV was 77 %/79 %–80 % for the solid part and 67 %/73 %–76 % for the entire nodule. Measurement variability between CAV and the reference standard ranged from –151 % to 117 % for the solid part and from –106 % to 54 % for the entire nodule. Inter-rater variability was –16 % to 16 % for the solid part (ICC 0.998) and –102 % to 65 % for the entire nodule (ICC 0.880). Intra-rater variability was –70 % to 49 % for the solid part (ICC 0.992) and –111 % to 31 % for the entire nodule (ICC 0.929). Intrascan variability between the smooth and the sharp reconstruction kernel was –45 % to 39 % for the solid part and –21 % to 46 % for the entire nodule. Conclusion Although the software prototype delivered satisfactory results when segmentation was evaluated subjectively, quantitative statistical analysis revealed room for improvement, especially regarding the segmentation accuracy of the solid part and the reproducibility of measurements of the nodule’s subsolid margins.
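The percentage ranges reported in this abstract are Bland-Altman limits of agreement. The following is a minimal sketch of how such limits might be computed, not the authors' analysis code. It assumes that each difference is expressed relative to the mean of the paired volumes and that the limits are the bias plus or minus 1.96 standard deviations; the abstract does not state either choice, and the function name `bland_altman_percent` is hypothetical.

```python
from statistics import mean, stdev


def bland_altman_percent(measured, reference):
    """Return (bias, lower_limit, upper_limit) of relative differences in percent.

    Assumption: each difference is normalised by the mean of the pair,
    and the limits of agreement are bias +/- 1.96 sample standard deviations.
    """
    if len(measured) != len(reference) or len(measured) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    # Relative difference of every measured/reference pair, in percent.
    diffs = [
        100.0 * (m - r) / ((m + r) / 2.0)
        for m, r in zip(measured, reference)
    ]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


if __name__ == "__main__":
    # Illustrative volumes in mm^3, not data from the study.
    cav = [110.0, 90.0, 105.0, 95.0]
    ref = [100.0, 100.0, 100.0, 100.0]
    bias, lower, upper = bland_altman_percent(cav, ref)
    print(f"bias {bias:.1f} %, limits of agreement {lower:.1f} % to {upper:.1f} %")
```

A wide interval such as –151 % to 117 % means that individual nodules can deviate strongly from the reference even when the mean bias is small.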

Data Mining ◽  
2013 ◽  
pp. 1794-1818
Author(s):  
William H. Horsthemke ◽  
Daniela S. Raicu ◽  
Jacob D. Furst ◽  
Samuel G. Armato

Evaluating the success of computer-aided decision support systems depends upon a reliable reference standard, a ground truth. The ideal gold standard is expected to result from the marking, labeling, and rating by domain experts of the image of interest. However, experts often disagree, and this lack of agreement challenges the development and evaluation of image-based feature prediction of expert-defined “truth.” The following discussion addresses the successes and limitations of developing computer-aided models to characterize suspicious pulmonary nodules based upon ratings provided by multiple expert radiologists. These prediction models attempt to bridge the semantic gap between images and medically meaningful, descriptive opinions about visual characteristics of nodules. The resultant computer-aided diagnostic characterizations (CADc) are directly usable for indexing and retrieval in content-based medical image retrieval and for supporting computer-aided diagnosis. The predictive performance of CADc models is directly related to the extent of agreement between radiologists; the models better predict radiologists’ opinions when radiologists agree more with each other about the characteristics of nodules.


Author(s):  
Yongfeng Gao ◽  
Jiaxing Tan ◽  
Zhengrong Liang ◽  
Lihong Li ◽  
Yumei Huo

Abstract Computer-aided detection (CADe) of pulmonary nodules plays an important role in assisting radiologists’ diagnosis and alleviating the interpretation burden for lung cancer. Current CADe systems, which aim to simulate radiologists’ examination procedure, are built upon computed tomography (CT) images with feature extraction for detection and diagnosis. The CT image that radiologists view is reconstructed from the sinogram, which is the original raw data acquired by the CT scanner. In this work, in contrast to conventional image-based CADe systems, we propose a novel sinogram-based CADe system in which the full projection information is used to explore additional effective features of nodules in the sinogram domain. Facing the challenges of limited research on this concept and unknown effective features in the sinogram domain, we design a new CADe system that utilizes the self-learning power of the convolutional neural network to learn and extract effective features from the sinogram. The proposed system was validated on 208 patient cases from the publicly available online Lung Image Database Consortium database, with each case having at least one juxtapleural nodule annotation. Experimental results demonstrated that our proposed method obtained an area under the receiver operating characteristic curve (AUC) of 0.91 based on the sinogram alone, compared with 0.89 based on the CT image alone. Moreover, a combination of sinogram and CT image could further improve the AUC to 0.92. This study indicates that pulmonary nodule detection in the sinogram domain is feasible with deep learning.
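The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive case (nodule), 0 for a negative case.
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Illustrative classifier outputs, not results from the study.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The double loop is quadratic in the number of cases, which is acceptable for a few hundred cases; a rank-based formulation would scale better.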


2010 ◽  
Vol 50 (1) ◽  
pp. 43-53 ◽  
Author(s):  
Michael C. Lee ◽  
Lilla Boroczky ◽  
Kivilcim Sungur-Stasik ◽  
Aaron D. Cann ◽  
Alain C. Borczuk ◽  
...  

Author(s):  
Henriëtte A. W. Meijer ◽  
Maurits Graafland ◽  
Miryam C. Obdeijn ◽  
Marlies P. Schijven ◽  
J. Carel Goslings

Abstract Purpose To determine the validity of wrist range of motion (ROM) measurements by the wearable-controlled ReValidate! wrist-rehabilitation game, which simultaneously acts as a digital goniometer. Furthermore, to establish the reliability of the game by contrasting ROM measurements to those found by medical experts using a universal goniometer. Methods As the universal goniometer is considered the reference standard, inter-rater reliability between surgeons was first determined. Internal validity of the game ROM measurements was determined in a test–retest setting with healthy volunteers. The reliability of the game was tested in 34 patients with a restricted range of motion, in whom the ROM was measured by experts as well as digitally. Intraclass correlation coefficients (ICCs) were determined and outcomes were analyzed using Bland–Altman plots. Results Inter-rater reliability between experts using a universal goniometer was poor, with ICCs of 0.002, 0.160 and 0.520. Internal validity testing of the game found ICCs of −0.693, 0.376 and 0.863, thus ranging from poor to good. Reliability testing of the game against medical expert measurements found that mean differences were small for the flexion–extension arc and the radial deviation–ulnar deviation arc. Conclusion The ReValidate! game is a reliable home-monitoring device digitally measuring ROM in the wrist. Interestingly, the test–retest reliability of the serious game was found to be considerably higher than the inter-rater reliability of the reference standard, that is, healthcare professionals using a universal goniometer. Trial registration number (internal hospital registration only) MEC-AMC W17_003 #17.015.


2002 ◽  
Vol 82 (4) ◽  
pp. 364-371 ◽  
Author(s):  
Douglas P Gross ◽  
Michele C Battié

Abstract Background and Purpose. Functional capacity evaluations (FCEs) are measurement tools used in predicting readiness to return to work following injury. The interrater and test-retest reliability of determinations of maximal safe lifting during kinesiophysical FCEs were examined in a sample of people who were off work and receiving workers' compensation. Subjects. Twenty-eight subjects with low back pain who had plateaued with treatment were enrolled. Five occupational therapists, trained and experienced in kinesiophysical methods, conducted testing. Methods. A repeated-measures design was used, with raters testing subjects simultaneously, yet independently. Subjects were rated on 2 occasions, separated by 2 to 4 days. Analyses included intraclass correlation coefficients (ICCs) and 95% confidence intervals. Results. The ICC values for interrater reliability ranged from .95 to .98. Test-retest values ranged from .78 to .94. Discussion and Conclusion. Inconsistencies in subjects' performance across sessions were the greatest source of FCE measurement variability. Overall, however, test-retest reliability was good and interrater reliability was excellent.
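Several abstracts on this page report intraclass correlation coefficients without naming the ICC form used. The following is a minimal sketch that assumes the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1); the studies may have used a different form, and the function name `icc_2_1` is hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Assumption: two-way random effects, absolute agreement, single measurement.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 subjects and >= 2 raters")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares of the two-way ANOVA decomposition.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)               # between-subject mean square
    ms_cols = ss_cols / (k - 1)               # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Illustrative lifting capacities in kg for three subjects and two raters.
    print(icc_2_1([[20.0, 21.0], [30.0, 31.0], [40.0, 41.0]]))
```

Because this form measures absolute agreement, a constant offset between raters lowers the coefficient even when the raters rank all subjects identically.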

