Implementation of deep learning-based auto-segmentation for radiotherapy planning structures: a workflow study at two cancer centers

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Jordan Wong ◽  
Vicky Huang ◽  
Derek Wells ◽  
Joshua Giambattista ◽  
Jonathan Giambattista ◽  
...  

Abstract

Purpose: We recently described the validation of deep learning-based auto-segmented contour (DC) models for organs at risk (OAR) and clinical target volumes (CTV). In this study, we evaluate the performance of the implemented DC models in the clinical radiotherapy (RT) planning workflow and report on user experience.

Methods and materials: DC models were implemented at two cancer centers and used to generate OARs and CTVs for all patients undergoing RT for central nervous system (CNS), head and neck (H&N), or prostate cancer. Radiation Therapists/Dosimetrists and Radiation Oncologists completed post-contouring surveys rating the degree of edits required for DCs (1 = minimal, 5 = significant) and overall DC satisfaction (1 = poor, 5 = high). Unedited DCs were compared to the edited, treatment-approved contours using the Dice similarity coefficient (DSC) and 95% Hausdorff distance (HD).

Results: Between September 19, 2019 and March 6, 2020, DCs were generated for approximately 551 eligible cases. 203 surveys were collected on 27 CNS, 54 H&N, and 93 prostate RT plans, for an overall survey compliance rate of 32%. The majority of OAR DCs required minimal edits both subjectively (mean editing score ≤ 2) and objectively (mean DSC ≥ 0.90 and mean 95% HD ≤ 2.0 mm). The mean OAR satisfaction score was 4.1 for CNS, 4.4 for H&N, and 4.6 for prostate structures. The overall CTV satisfaction score (n = 25), which encompassed the prostate, seminal vesicles, and neck lymph node volumes, was 4.1.

Conclusions: Previously validated OAR DC models for CNS, H&N, and prostate RT planning required minimal subjective and objective edits and resulted in a positive user experience, although low survey compliance was a concern. CTV DC model evaluation was even more limited, but high user satisfaction suggests that the models may have served as appropriate starting points for patient-specific edits.
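For readers who want to reproduce the geometric part of such an evaluation, the sketch below shows one common way to compute the two reported metrics for a pair of binary masks. The array names, voxel spacing, and surface-based HD95 formulation are illustrative assumptions, not the study's actual pipeline.

```python
# Minimal sketch: Dice similarity coefficient and 95% Hausdorff distance
# for two binary segmentation masks. Names and spacing are illustrative.
import numpy as np
from scipy import ndimage

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """DSC = 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def hd95(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric Hausdorff distance in mm (assumed definition).

    Surface-to-surface distances are read from Euclidean distance
    transforms of each mask's surface complement.
    """
    a, b = a.astype(bool), b.astype(bool)
    # Surface voxels: mask minus its erosion.
    surf_a = a ^ ndimage.binary_erosion(a)
    surf_b = b ^ ndimage.binary_erosion(b)
    dt_a = ndimage.distance_transform_edt(~surf_a, sampling=spacing)
    dt_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    d_ab = dt_b[surf_a]  # distances from A's surface to B's surface
    d_ba = dt_a[surf_b]
    return float(np.percentile(np.hstack([d_ab, d_ba]), 95))
```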

2021 ◽  
Vol 11 ◽  
Author(s):  
Jordan Wong ◽  
Vicky Huang ◽  
Joshua A. Giambattista ◽  
Tony Teke ◽  
Carter Kolbeck ◽  
...  

Purpose: Deep learning-based auto-segmented contour (DC) models require high-quality data for their development, and previous studies have typically used prospectively produced contours, which can be resource-intensive and time-consuming to obtain. The aim of this study was to investigate the feasibility of using retrospective peer-reviewed radiotherapy planning contours in the training and evaluation of DC models for lung stereotactic ablative radiotherapy (SABR).

Methods: Using commercial deep learning-based auto-segmentation software, DC models for lung SABR organs at risk (OAR) and gross tumor volume (GTV) were trained with a deep convolutional neural network on a median of 105 contours per structure, obtained from 160 publicly available CT scans and 50 peer-reviewed SABR planning 4D-CT scans from center A. DCs were generated for 50 additional planning CT scans from center A and 50 from center B, and compared with the clinical contours (CC) using the Dice similarity coefficient (DSC) and 95% Hausdorff distance (HD).

Results: Comparing DCs to CCs, the mean DSC and 95% HD were 0.93 and 2.85 mm for the aorta, 0.81 and 3.32 mm for the esophagus, 0.95 and 5.09 mm for the heart, 0.98 and 2.99 mm for the bilateral lungs, 0.52 and 7.08 mm for the bilateral brachial plexus, 0.82 and 4.23 mm for the proximal bronchial tree, 0.90 and 1.62 mm for the spinal cord, 0.91 and 2.27 mm for the trachea, and 0.71 and 5.23 mm for the GTV. DC-to-CC comparisons were similar between center A and center B for all OAR structures.

Conclusions: DCs developed from retrospective peer-reviewed treatment contours approximated CCs for the majority of OARs, including on an external dataset. DCs for structures with more variability tended to be less accurate and will likely require a larger number of training cases or novel training approaches to improve performance. Developing DC models from existing radiotherapy planning contours appears feasible and warrants further clinical workflow testing.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Hwa Kyung Byun ◽  
Jee Suk Chang ◽  
Min Seo Choi ◽  
Jaehee Chun ◽  
Jinhong Jung ◽  
...  

Abstract

Purpose: To study the performance of a proposed deep learning-based autocontouring system in delineating organs at risk (OARs) in breast radiotherapy, in comparison with a group of experts.

Methods: Eleven experts from two institutions delineated nine OARs in 10 cases of adjuvant radiotherapy after breast-conserving surgery. Autocontours were then provided to the experts for correction. Overall, 110 manual contours, 110 corrected autocontours, and 10 autocontours of each type of OAR were analyzed. The Dice similarity coefficient (DSC) and Hausdorff distance (HD) were used to compare the degree of agreement between the best manual contour (chosen by an independent expert committee) and each autocontour, corrected autocontour, and manual contour. Higher DSCs and lower HDs indicated better geometric overlap. The amount of time saved by the autocontouring system was examined, and user satisfaction was evaluated using a survey.

Results: Manual contours, corrected autocontours, and autocontours had similar accuracy in terms of average DSC (0.88 vs. 0.90 vs. 0.90). Ranked against the manual contours, the autocontours placed second in accuracy by DSC and first by HD. Interphysician variation among the experts was reduced in corrected autocontours compared with manual contours (DSC: 0.89–0.90 vs. 0.87–0.90; HD: 4.3–5.8 mm vs. 5.3–7.6 mm). Among the manual delineations, the breast contours showed the largest variation, which improved most markedly with the autocontouring system. The total mean time for nine OARs was 37 min for manual contours and 6 min for corrected autocontours. The survey results revealed good user satisfaction.

Conclusions: The autocontouring system performed comparably to the experts' manual contouring of OARs. This system can be valuable in improving the quality of breast radiotherapy and reducing interphysician variability in clinical practice.


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1718
Author(s):  
Chien-Hsing Chou ◽  
Yu-Sheng Su ◽  
Che-Ju Hsu ◽  
Kong-Chang Lee ◽  
Ping-Hsuan Han

In this study, we designed a four-dimensional (4D) audiovisual entertainment system called Sense. This system comprises a scene recognition system and hardware modules that provide haptic sensations for users when they watch movies and animations at home. In the scene recognition system, we used Google Cloud Vision to detect common scene elements in a video, such as fire, explosions, wind, and rain, and to further determine whether the scene depicts hot weather, rain, or snow. Additionally, for animated videos, we applied deep learning with a single shot multibox detector to detect whether the animated video contained scenes of fire-related objects. The hardware module was designed to provide six types of haptic sensations arranged line-symmetrically for a better user experience. Based on the object-detection results from the scene recognition system, the hardware generates the corresponding haptic sensations. The system integrates deep learning, auditory signals, and haptic sensations to provide an enhanced viewing experience.
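As a rough illustration of the frame-level scene recognition described here, the sketch below samples video frames with OpenCV and queries Google Cloud Vision for labels. The keyword set, sampling rate, and mapping to haptic cues are assumptions, not the authors' implementation.

```python
# Sketch of frame-level scene-element detection with Google Cloud Vision.
# The SCENE_KEYWORDS mapping and frame-sampling interval are assumptions.
import cv2
from google.cloud import vision

SCENE_KEYWORDS = {"fire", "explosion", "rain", "snow", "wind"}  # assumed mapping

client = vision.ImageAnnotatorClient()

def detect_scene_elements(video_path: str, every_n_frames: int = 30):
    """Yield (frame_index, matched_labels) for sampled frames of a video."""
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:
            ok_enc, buf = cv2.imencode(".jpg", frame)
            if ok_enc:
                image = vision.Image(content=buf.tobytes())
                response = client.label_detection(image=image)
                labels = {l.description.lower() for l in response.label_annotations}
                hits = labels & SCENE_KEYWORDS
                if hits:
                    yield idx, hits  # downstream code would trigger haptics here
        idx += 1
    cap.release()
```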


Author(s):  
Vera Puglisi ◽  
Jasmine Ghorbani ◽  
Yan Chen ◽  
Manuel Nyagisere ◽  
Grace Babalola ◽  
...  

Touch-screen GUIs have become a key feature of modern consumer electronics. The purpose of this study is to investigate the effect that reducing the number of icons on the GUI of a popular soda machine has on drink selection time and user satisfaction. Twenty subjects participated in the study, with 10 assigned to the control group and 10 to the experimental group. Time to make a drink selection was recorded and compared between groups using an unpaired t-test. User satisfaction was measured using a five-point scale questionnaire. The results suggested that, except for the display-dependability category, user satisfaction was not affected by reducing the number of icons on the soda machine GUI, and no change was observed in drink selection time.
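The between-group comparison described here is a standard unpaired t-test; a minimal sketch follows, with placeholder selection times rather than the study's data.

```python
# Sketch of the between-group comparison: an unpaired t-test on
# drink-selection times. The arrays are hypothetical placeholders.
from scipy import stats

control_times = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7, 5.3, 4.1]       # seconds, hypothetical
experimental_times = [3.9, 4.8, 4.0, 4.6, 5.2, 4.3, 4.9, 4.5, 5.0, 4.2]  # seconds, hypothetical

t_stat, p_value = stats.ttest_ind(control_times, experimental_times)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # p >= 0.05 -> no detected difference
```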


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1550
Author(s):  
Alexandros Liapis ◽  
Evanthia Faliagka ◽  
Christos P. Antonopoulos ◽  
Georgios Keramidas ◽  
Nikolaos Voros

Physiological measurements have been widely used by researchers and practitioners in order to address the stress detection challenge. So far, various datasets for stress detection have been recorded and are available to the research community for testing and benchmarking. The majority of the available stress-related datasets have been recorded while users were exposed to intense stressors, such as songs, movie clips, major hardware/software failures, image datasets, and gaming scenarios. However, it remains an open research question whether such datasets can be used for creating models that will effectively detect stress in different contexts. This paper investigates the performance of the publicly available physiological dataset named WESAD (wearable stress and affect detection) in the context of user experience (UX) evaluation. More specifically, electrodermal activity (EDA) and skin temperature (ST) signals from WESAD were used to train three traditional machine learning classifiers and a simple feed-forward deep learning artificial neural network combining continuous variables and entity embeddings. For the binary classification problem (stress vs. no stress), high accuracy (up to 97.4%) was achieved with both training approaches (deep learning and machine learning). Regarding the effectiveness of the created models in another context, namely user experience (UX) evaluation, the deep-learning model achieved high agreement when validated against a user-annotated dataset.
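A minimal sketch of the binary stress-vs-baseline setup on windowed EDA and skin-temperature features appears below. The synthetic signals, window length, feature set, and random-forest classifier are stand-ins, not the paper's features or network.

```python
# Sketch of binary stress detection from windowed EDA/ST features.
# Synthetic signals stand in for WESAD recordings (loading omitted).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def window_features(eda, st, win=700):
    """Mean/std per non-overlapping window of both signals (assumed features)."""
    n = min(len(eda), len(st)) // win
    return np.array([
        [eda[i*win:(i+1)*win].mean(), eda[i*win:(i+1)*win].std(),
         st[i*win:(i+1)*win].mean(),  st[i*win:(i+1)*win].std()]
        for i in range(n)
    ])

# Hypothetical baseline vs. stress segments (EDA in microsiemens, ST in °C).
calm   = window_features(rng.normal(0.3, 0.05, 70_000), rng.normal(33.0, 0.2, 70_000))
stress = window_features(rng.normal(0.8, 0.15, 70_000), rng.normal(34.0, 0.4, 70_000))
X = np.vstack([calm, stress])
y = np.r_[np.zeros(len(calm)), np.ones(len(stress))]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("cv accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```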


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 1952
Author(s):  
May Phu Paing ◽  
Supan Tungjitkusolmun ◽  
Toan Huy Bui ◽  
Sarinporn Visitsattapongse ◽  
Chuchart Pintavirooj

Automated segmentation methods are critical for early detection, prompt action, and immediate treatment to reduce the disability and death risks of brain infarction. This paper aims to develop a fully automated method to segment infarct lesions from T1-weighted brain scans. As a key novelty, the proposed method combines variational mode decomposition and deep learning-based segmentation to take advantage of both methods and provide better results. There are three main technical contributions in this paper. First, variational mode decomposition is applied as a pre-processing step to discriminate the infarct lesions from unwanted non-infarct tissues. Second, an overlapping-patch strategy is proposed to reduce the workload of the deep learning-based segmentation task. Finally, a three-dimensional U-Net model is developed to perform patch-wise segmentation of infarct lesions. A total of 239 brain scans from a public dataset were used to develop and evaluate the proposed method. Empirical results reveal that the proposed automated segmentation can provide promising performance, with an average Dice similarity coefficient (DSC) of 0.6684, intersection over union (IoU) of 0.5022, and average symmetric surface distance (ASSD) of 0.3932.
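The overlapping-patch strategy can be illustrated as follows: sub-volumes are segmented independently and predictions are averaged where patches overlap. The patch size, stride, and averaging scheme below are assumptions; the paper's exact settings are not reproduced.

```python
# Sketch of patch-wise 3D segmentation with overlapping patches.
# Patch size and stride are illustrative, not the paper's settings.
import numpy as np

def _starts(dim, patch, stride):
    """Patch start offsets along one axis, including a final flush offset."""
    last = max(dim - patch, 0)
    return sorted(set(range(0, last + 1, stride)) | {last})

def predict_by_patches(volume, model_fn, patch=(64, 64, 64), stride=(32, 32, 32)):
    """Run model_fn on overlapping sub-volumes and average overlapping outputs."""
    out = np.zeros(volume.shape, dtype=np.float32)
    cnt = np.zeros(volume.shape, dtype=np.float32)
    for z in _starts(volume.shape[0], patch[0], stride[0]):
        for y in _starts(volume.shape[1], patch[1], stride[1]):
            for x in _starts(volume.shape[2], patch[2], stride[2]):
                sl = (slice(z, z + patch[0]),
                      slice(y, y + patch[1]),
                      slice(x, x + patch[2]))
                out[sl] += model_fn(volume[sl])  # model_fn: patch -> probability map
                cnt[sl] += 1.0
    return out / np.maximum(cnt, 1.0)
```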


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Andreas M. Weng ◽  
Julius F. Heidenreich ◽  
Corona Metz ◽  
Simon Veldhoen ◽  
Thorsten A. Bley ◽  
...  

Abstract

Background: Functional lung MRI techniques are usually associated with time-consuming post-processing, of which manual lung segmentation is the most cumbersome part. The aim of this study was to investigate whether deep learning-based segmentation of lung images acquired with a fast UTE sequence exploiting the stack-of-spirals trajectory can provide sufficient accuracy for the calculation of functional parameters.

Methods: Lung images were acquired in 20 patients suffering from cystic fibrosis (CF) and 33 healthy volunteers using a fast UTE sequence with a stack-of-spirals trajectory and a minimum echo time of 0.05 ms. A convolutional neural network was then trained for semantic lung segmentation using 17,713 2D coronal slices, each paired with a label obtained from manual segmentation. Subsequently, the network was applied to 4920 independent 2D test images, and the results were compared with manual segmentation using the Sørensen–Dice similarity coefficient (DSC) and the Hausdorff distance (HD). Lung volumes and fractional ventilation values calculated from both segmentations were compared using Pearson's correlation coefficient and Bland–Altman analysis. To investigate generalizability to patients outside the CF collective, in particular those exhibiting larger consolidations inside the lung, the network was additionally applied to UTE images from four patients with pneumonia and one with lung cancer.

Results: The overall DSC for lung tissue was 0.967 ± 0.076 (mean ± standard deviation) and the HD was 4.1 ± 4.4 mm. Lung volumes derived from the manual and deep learning-based segmentations, as well as fractional ventilation values, showed a high overall correlation (Pearson's correlation coefficient = 0.99 and 1.00). For the additional cohort with unseen pathologies/consolidations, the mean DSC was 0.930 ± 0.083, the mean HD was 12.9 ± 16.2 mm, and the mean difference in lung volume was 0.032 ± 0.048 L.

Conclusions: Deep learning-based image segmentation in stack-of-spirals lung MRI allows accurate estimation of lung volumes and fractional ventilation values and promises to replace the time-consuming step of manual image segmentation in the future.
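A minimal sketch of the Bland–Altman agreement analysis used for the volume comparison follows; the paired per-subject measurements are placeholders.

```python
# Sketch of Bland–Altman analysis for paired lung-volume measurements
# (manual vs. network segmentation). Input arrays are hypothetical.
import numpy as np

def bland_altman(a, b):
    """Return mean difference (bias) and the 95% limits of agreement."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    diff = a - b
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)
    return bias, bias - loa, bias + loa

manual_l  = np.array([4.8, 5.1, 3.9, 6.2, 4.4])  # litres, hypothetical
network_l = np.array([4.7, 5.2, 3.9, 6.1, 4.5])
bias, lo, hi = bland_altman(manual_l, network_l)
print(f"bias = {bias:+.3f} L, 95% LoA = [{lo:+.3f}, {hi:+.3f}] L")
```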


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Reza Mirshahi ◽  
Pasha Anvari ◽  
Hamid Riazi-Esfahani ◽  
Mahsa Sardarinia ◽  
Masood Naseripour ◽  
...  

Abstract

The purpose of this study was to introduce a new deep learning (DL) model for segmentation of the foveal avascular zone (FAZ) in en face optical coherence tomography angiography (OCTA) and to compare the results with those of the device's built-in software and manual measurements in healthy subjects and diabetic patients. In this retrospective study, FAZ borders were delineated in the inner retinal slab of 3 × 3 en face OCTA images of 131 eyes of 88 diabetic patients and 32 eyes of 18 healthy subjects. To train a deep convolutional neural network (CNN) model, 126 en face OCTA images (104 eyes with diabetic retinopathy and 22 normal eyes) were used as the training/validation dataset. The accuracy of the model was then evaluated using a dataset consisting of OCTA images of 10 normal eyes and 27 eyes with diabetic retinopathy. The CNN model was based on Detectron2, an open-source modular object detection library. In addition, automated FAZ measurements were conducted using the device's built-in commercial software, and manual FAZ delineation was performed using ImageJ software. Bland–Altman analysis was used to show the 95% limits of agreement (95% LoA) between the different methods. The mean Dice similarity coefficient of the DL model was 0.94 ± 0.04 on the testing dataset. There was excellent agreement between the automated, DL model, and manual measurements of FAZ in healthy subjects (95% LoA of −0.005 to 0.026 mm² between automated and manual measurements, and 0.000 to 0.009 mm² between DL and manual FAZ area). In diabetic eyes, the agreement between DL and manual measurements was excellent (95% LoA of −0.063 to 0.095); however, there was poor agreement between the automated and manual methods (95% LoA of −0.186 to 0.331). The presence of diabetic macular edema and intraretinal cysts at the fovea was associated with erroneous FAZ measurements by the device's built-in software. In conclusion, the DL model showed excellent accuracy in detecting the FAZ border in en face OCTA images of both diabetic patients and healthy subjects. The DL and manual measurements outperformed the automated measurements of the built-in software.
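Since the model is Detectron2-based, inference might look like the sketch below. The base config, weights path, score threshold, and pixel-to-mm² scale are all assumptions, not the authors' published settings.

```python
# Sketch of FAZ inference with a Detectron2 predictor. Config choice,
# weights file, threshold, and scan scale are illustrative assumptions.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # single class: FAZ
cfg.MODEL.WEIGHTS = "faz_model.pth"            # hypothetical fine-tuned weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

im = cv2.imread("enface_octa_3x3.png")         # 3 x 3 mm en face slab (assumed)
instances = predictor(im)["instances"].to("cpu")
if len(instances):
    mask = instances.pred_masks[0].numpy()
    mm2_per_px = (3.0 / im.shape[0]) * (3.0 / im.shape[1])  # assumed scale
    print(f"FAZ area ≈ {mask.sum() * mm2_per_px:.3f} mm²")
```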


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marion R. Munk ◽  
Thomas Kurmann ◽  
Pablo Márquez-Neila ◽  
Martin S. Zinkernagel ◽  
Sebastian Wolf ◽  
...  

Abstract

In this paper we analyse the performance of machine learning methods in predicting patient information such as age or sex solely from retinal imaging modalities in a heterogeneous clinical population. Our dataset consists of N = 135,667 fundus images and N = 85,536 volumetric OCT scans. Deep learning models were trained to predict the patient's age and sex from fundus images, OCT cross-sections, and OCT volumes. For sex prediction, a ROC AUC of 0.80 was achieved for fundus images, 0.84 for OCT cross-sections, and 0.90 for OCT volumes. Mean absolute errors for age prediction of 6.328 years for fundus images, 5.625 years for OCT cross-sections, and 4.541 years for OCT volumes were observed. We assess the performance on OCT scans containing different biomarkers and note a peak performance of AUC = 0.88 for OCT cross-sections and 0.95 for volumes when there is no pathology on the scans. Performance drops when drusen, fibrovascular pigment epithelium detachment, or geographic atrophy are present. We conclude that deep learning-based methods are capable of predicting the patient's sex and age from color fundus photography and OCT for a broad spectrum of patients, irrespective of underlying disease or image quality. Non-random sex prediction using fundus images seems possible only if the fovea and optic disc are visible.
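The two headline metrics, ROC AUC for sex classification and mean absolute error for age regression, can be computed as in the sketch below; the labels and predictions are placeholders.

```python
# Sketch of the two reported evaluation metrics. All arrays are
# hypothetical placeholders, not the study's predictions.
import numpy as np
from sklearn.metrics import roc_auc_score, mean_absolute_error

sex_true  = np.array([0, 1, 1, 0, 1, 0, 1, 0])            # 0 = female, 1 = male
sex_score = np.array([0.2, 0.8, 0.7, 0.4, 0.9, 0.1, 0.6, 0.3])  # model probabilities
print("sex ROC AUC:", roc_auc_score(sex_true, sex_score))

age_true = np.array([54.0, 67.0, 71.0, 45.0, 62.0])       # years
age_pred = np.array([50.1, 70.3, 66.8, 49.9, 58.2])
print("age MAE (years):", mean_absolute_error(age_true, age_pred))
```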

