Prospectively-validated deep learning model for segmenting swallowing and chewing structures in CT

Author(s):  
Aditi Iyer ◽  
Maria Thor ◽  
Ifeanyirochukwu Onochie ◽  
Jennifer Hesse ◽  
Kaveh Zakeri ◽  
...  

Abstract
Objective: Delineating swallowing and chewing structures aids radiotherapy (RT) treatment planning to limit dysphagia, trismus, and speech dysfunction. We aim to develop an accurate and efficient method to automate this process.
Approach: CT scans of 242 head and neck (H&N) cancer patients acquired from 2004 to 2009 at our institution were used to develop auto-segmentation models for the masseters, medial pterygoids, larynx, and pharyngeal constrictor muscle using DeepLabV3+. A cascaded architecture was used, wherein models were trained sequentially to spatially constrain each structure group based on prior segmentations. Additionally, an ensemble of models combining contextual information from axial, coronal, and sagittal views was used to improve segmentation accuracy. Prospective evaluation was conducted by measuring the amount of manual editing required in 91 H&N CT scans acquired from February to May 2021.
Main results: Medians and inter-quartile ranges of Dice similarity coefficients (DSC) computed on the retrospective testing set (N=24) were 0.87 (0.85-0.89) for the masseters, 0.80 (0.79-0.81) for the medial pterygoids, 0.81 (0.79-0.84) for the larynx, and 0.69 (0.67-0.71) for the constrictor. Auto-segmentations, when compared with inter-observer variability in 10 randomly selected scans, showed better agreement (DSC) with each observer than the observers showed with each other. Prospective analysis showed that most manual modifications needed for clinical use were minor, suggesting auto-contouring could increase clinical efficiency. Trained segmentation models are available for research use upon request via https://github.com/cerr/CERR/wiki/Auto-Segmentation-models.
Significance: We developed deep learning-based auto-segmentation models for swallowing and chewing structures in CT and demonstrated their potential for use in treatment planning to limit complications post-RT. To the best of our knowledge, this is the only prospectively validated deep learning-based model for segmenting chewing and swallowing structures in CT. Additionally, the segmentation models have been made open source to facilitate reproducibility and multi-institutional research.
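The cascaded step described in the Approach can be illustrated with a minimal sketch: a previously segmented structure defines a bounding box (plus margin) that spatially constrains the next model in the cascade. Names such as `prior_mask` and `margin_vox` are illustrative, not taken from the released models.

```python
# Minimal sketch of the cascaded spatial-constraint step, assuming a
# non-empty prior segmentation mask on the same voxel grid as the CT.
import numpy as np

def crop_to_prior(volume: np.ndarray, prior_mask: np.ndarray, margin_vox: int = 16):
    """Crop a CT volume to the bounding box of a previously segmented
    structure (plus a margin), so the next model in the cascade only
    sees the spatially relevant region."""
    coords = np.argwhere(prior_mask > 0)
    lo = np.maximum(coords.min(axis=0) - margin_vox, 0)
    hi = np.minimum(coords.max(axis=0) + margin_vox + 1, volume.shape)
    slices = tuple(slice(l, h) for l, h in zip(lo, hi))
    return volume[slices], slices  # slices allow pasting results back

# usage: cropped, roi = crop_to_prior(ct, masseter_mask); run next model on `cropped`
```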

2019 ◽  
Author(s):  
Aditi Iyer ◽  
Maria Thor ◽  
Rabia Haq ◽  
Joseph O. Deasy ◽  
Aditya P. Apte

Abstract
Purpose: Delineating the swallowing and chewing structures in head and neck (H&N) CT scans is necessary for radiotherapy (RT) treatment planning to reduce the incidence of radiation-induced dysphagia, trismus, and speech dysfunction. Automating this process would decrease the manual input required and yield reproducible segmentations, but generating accurate segmentations is challenging due to the complex morphology of swallowing and chewing structures and the limited soft-tissue contrast in CT images.
Methods: We trained deep learning models using 194 H&N CT scans from our institution to segment the masseters (left and right), medial pterygoids (left and right), larynx, and pharyngeal constrictor muscle using DeepLabV3+ with a ResNet-101 backbone. Models were trained sequentially to guide the localization of each structure group based on prior segmentations. Additionally, an ensemble of models was developed using contextual information from three different views (axial, coronal, and sagittal) for robustness to occasional failures of the individual models. Output probability maps were averaged, and voxels were assigned the label of the class with the highest combined probability.
Results: The median Dice similarity coefficients (DSC) computed on a hold-out set of 24 CT scans were 0.87±0.02 for the masseters, 0.80±0.03 for the medial pterygoids, 0.81±0.04 for the larynx, and 0.69±0.07 for the constrictor muscle. The corresponding 95th percentile Hausdorff distances were 0.32±0.08 cm (masseters), 0.42±0.2 cm (medial pterygoids), 0.53±0.3 cm (larynx), and 0.36±0.15 cm (constrictor muscle). Dose-volume histogram (DVH) metrics previously found to correlate with each toxicity were extracted from manual and auto-generated contours and compared between the two sets of contours to assess clinical utility. Differences in DVH metrics were not statistically significant (p>0.05) for any of the structures. Further, inter-observer variability in contouring was studied in 10 CT scans. Automated segmentations agreed better with each of the observers than the observers agreed with each other, measured in terms of DSC.
Conclusions: We developed deep learning-based auto-segmentation models for swallowing and chewing structures in CT. The resulting segmentations can be included in treatment planning to limit complications following RT for H&N cancer. The segmentation models developed in this work are distributed for research use through the open-source platform CERR, accessible at https://github.com/cerr/CERR.
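The ensembling rule stated in the Methods (average the per-view probability maps, then take the per-voxel argmax) reduces to a few lines. This is a hedged sketch: the actual DeepLabV3+ inference and the resampling of coronal/sagittal outputs back to the axial grid are assumed to have already happened.

```python
# Sketch of the three-view ensembling described above: per-view softmax
# probability maps are averaged, and each voxel takes the class with the
# highest combined probability.
import numpy as np

def ensemble_probabilities(prob_maps: list[np.ndarray]) -> np.ndarray:
    """prob_maps: list of (C, D, H, W) softmax outputs, one per view,
    already resampled onto a common grid. Returns (D, H, W) labels."""
    combined = np.mean(np.stack(prob_maps, axis=0), axis=0)  # average views
    return np.argmax(combined, axis=0)  # highest combined probability wins
```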


2020 ◽  
Vol 36 (12) ◽  
pp. 3856-3862
Author(s):  
Di Jin ◽  
Peter Szolovits

Abstract
Motivation: In evidence-based medicine, defining a clinical question in terms of the specific patient problem helps physicians efficiently identify appropriate resources and search for the best available evidence for medical treatment. To formulate a well-defined, focused clinical question, a framework called PICO is widely used, which identifies the sentences in a given medical text that belong to the four components typically reported in clinical trials: Participants/Problem (P), Intervention (I), Comparison (C), and Outcome (O). In this work, we propose a novel deep learning model for recognizing PICO elements in biomedical abstracts. Building on the previous state-of-the-art bidirectional long short-term memory (bi-LSTM) plus conditional random field (CRF) architecture, we add another bi-LSTM layer over the sentence representation vectors so that contextual information from surrounding sentences can be gathered to help infer the interpretation of the current one. In addition, we propose two methods to further generalize and improve the model: adversarial training and unsupervised pre-training over large corpora.
Results: We tested our proposed approach on two benchmark datasets. On the PubMed-PICO dataset, our best results outperform the previous best by 5.5%, 7.9%, and 5.8% in F1 score for P, I, and O elements, respectively. On the other dataset, NICTA-PIBOSO, the improvements for P/I/O elements are 3.9%, 15.6%, and 1.3% in F1 score, respectively. Overall, our proposed deep learning model obtains unprecedented PICO element detection accuracy while avoiding the need for any manual feature selection.
Availability and implementation: Code is available at https://github.com/jind11/Deep-PICO-Detection.
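The architectural addition described above, a second bi-LSTM running over sentence representation vectors, can be sketched in PyTorch. This is an illustration only: the token-level encoder and the CRF layer from the paper are omitted, and all dimensions are assumed.

```python
# Illustrative sketch of the sentence-level bi-LSTM: sentence vectors are
# re-contextualized across the abstract before per-sentence PICO labeling.
import torch
import torch.nn as nn

class SentenceContextualizer(nn.Module):
    def __init__(self, sent_dim: int = 256, hidden: int = 128, n_labels: int = 4):
        super().__init__()
        self.bilstm = nn.LSTM(sent_dim, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_labels)  # P/I/C/O logits

    def forward(self, sent_vecs: torch.Tensor) -> torch.Tensor:
        # sent_vecs: (batch, n_sentences, sent_dim) from a token-level encoder
        ctx, _ = self.bilstm(sent_vecs)  # gather surrounding-sentence context
        return self.classifier(ctx)      # (batch, n_sentences, n_labels)
```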


2021 ◽  
Author(s):  
Parnian Afshar ◽  
Shahin Heidarian ◽  
Farnoosh Naderkhani ◽  
Moezedin Javad Rafiee ◽  
Anastasia Oikonomou ◽  
...  

2020 ◽  
Author(s):  
William Speier ◽  
Jiayun Li ◽  
Wenyuan Li ◽  
Karthik Sarma ◽  
Corey Arnold

Abstract
Automated Gleason grading can be a valuable tool for physicians when assessing risk and planning treatment for prostate cancer patients. Semantic segmentation provides pixel-wise Gleason predictions across an entire slide, which can be more informative than classification of pre-selected homogeneous regions. Deep learning methods can automatically learn visual semantics to accomplish this task, but training models on whole slides is impractical due to large image sizes and the scarcity of fully annotated data. Patch-based methods can alleviate these problems and have been shown to produce significant results in histopathology segmentation. However, the irregular contours of biopsies on slides make performance highly dependent on patch selection. In the traditional grid-based strategy, many patches lie on biopsy boundaries, reducing segmentation accuracy due to a loss of contextual information. In this paper, we propose an automatic patch selection process based on image features. The algorithm segments the biopsy and aligns patches with the tissue contour to maximize the amount of contextual information in each patch. This method was used to generate patches for a fully convolutional network to segment high-grade, low-grade, and benign tissue from a set of 59 histopathological slides, and results were compared against manual physician labels. We show that our image-based patch selection algorithm yields a significant improvement in segmentation accuracy over the traditional grid-based approach. Our results suggest that informed patch selection can be a valuable addition to an automated histopathological analysis pipeline.
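A rough sketch of the idea, under stated assumptions: the authors' exact alignment algorithm is not reproduced here; instead, "aligning patches to the tissue contour" is approximated by snapping each grid patch toward the tissue so it is maximally covered by biopsy pixels. `tissue_mask`, `patch`, and `stride` are assumed names.

```python
# Hedged illustration of contour-aware patch selection, not the authors'
# exact method: grid patches are nudged toward the local tissue center of
# mass so fewer patches straddle the biopsy boundary.
import numpy as np
from scipy import ndimage

def select_patches(tissue_mask: np.ndarray, patch: int = 256, stride: int = 256):
    """Return top-left corners of patches shifted to track the tissue."""
    corners = []
    H, W = tissue_mask.shape
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            window = tissue_mask[y:y + patch, x:x + patch]
            if window.sum() == 0:
                continue  # skip background-only patches
            # shift the patch so its center follows the tissue inside it
            cy, cx = ndimage.center_of_mass(window)
            ny = int(np.clip(y + cy - patch / 2, 0, H - patch))
            nx = int(np.clip(x + cx - patch / 2, 0, W - patch))
            corners.append((ny, nx))
    return corners
```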


2021 ◽  
pp. 1-38
Author(s):  
Josh M. Nicholson ◽  
Milo Mordaunt ◽  
Patrice Lopez ◽  
Ashish Uppala ◽  
Dominic Rosati ◽  
...  

Abstract Citation indices are tools used by the academic community for research and research evaluation which aggregate scientific literature output and measure impact by collating citation counts. Citation indices help measure the interconnections between scientific papers but fall short because they fail to communicate contextual information about a citation. The usage of citations in research evaluation without consideration of context can be problematic, because a citation that presents contrasting evidence to a paper is treated the same as a citation that presents supporting evidence. To solve this problem, we have used machine learning, traditional document ingestion methods, and a network of researchers to develop a "smart citation index" called scite, which categorizes citations based on context. Scite shows how a citation was used by displaying the surrounding textual context from the citing paper and a classification from our deep learning model that indicates whether the statement provides supporting or contrasting evidence for a referenced work, or simply mentions it. Scite has been developed by analyzing over 25 million full-text scientific articles and currently has a database of more than 880 million classified citation statements. Here we describe how scite works and how it can be used to further research and research evaluation. Peer Review: https://publons.com/publon/10.1162/qss_a_00146
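At its core, the classification task described here is three-way labeling of citation statements (supporting / contrasting / mentioning). scite's actual deep learning model is not specified in this abstract, so the sketch below substitutes a plainly named TF-IDF plus logistic-regression baseline to show the shape of the task; the example texts are invented.

```python
# Hedged stand-in for scite's classifier: a minimal three-class
# citation-statement baseline, NOT the production deep learning model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))

# toy training data, for illustration of the label scheme only:
texts = ["Our results confirm the findings of [12].",
         "In contrast to [7], we observed no such effect.",
         "Prior work [3] studied a related setting."]
labels = ["supporting", "contrasting", "mentioning"]
clf.fit(texts, labels)
print(clf.predict(["These data support the earlier report [5]."]))
```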


2022 ◽  
Vol 8 ◽  
Author(s):  
Yan Yi ◽  
Li Mao ◽  
Cheng Wang ◽  
Yubo Guo ◽  
Xiao Luo ◽  
...  

Background: The identification of aortic dissection (AD) at baseline plays a crucial role in clinical practice. Non-contrast CT scans are widely available, convenient, and easy to perform. However, detection of AD on non-contrast CT scans by radiologists currently lacks sensitivity and is suboptimal.
Methods: A total of 452 patients who underwent aortic CT angiography (CTA) were enrolled retrospectively from two medical centers in China to form the internal cohort (341 patients: 139 with AD, 202 without AD) and the external testing cohort (111 patients: 46 with AD, 65 without AD). The internal cohort was divided into the training cohort (n = 238), validation cohort (n = 35), and internal testing cohort (n = 68). Morphological characteristics were extracted from the aortic segmentation. A deep-integrated model based on the Gaussian Naive Bayes algorithm was built to differentiate AD from non-AD using the combination of the three-dimensional (3D) deep-learning model score and the morphological characteristics. Areas under the receiver operating characteristic curve (AUCs), accuracy, sensitivity, and specificity were used to evaluate model performance. The proposed model was also compared with the subjective assessment of radiologists.
Results: After combining all the morphological characteristics, our proposed deep-integrated model significantly outperformed the 3D deep-learning model alone (AUC: 0.948 vs. 0.803 in the internal testing cohort and 0.969 vs. 0.814 in the external testing cohort, both p < 0.05). The accuracy, sensitivity, and specificity of our model reached 0.897, 0.862, and 0.923 in the internal testing cohort and 0.730, 0.978, and 0.554 in the external testing cohort, respectively. Accuracy for AD detection showed no significant difference between our model and the radiologists (p > 0.05).
Conclusion: The proposed model performed well for AD detection on non-contrast CT scans, potentially enabling earlier diagnosis and prompt treatment.
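The "deep-integrated" step described in the Methods reduces to concatenating the 3D deep-learning score with the morphological features and fitting a Gaussian Naive Bayes classifier. The sketch below is a minimal illustration; feature names and dimensions are assumptions, not taken from the paper.

```python
# Hedged sketch of the deep-integrated model: deep score + morphology
# features fed to Gaussian Naive Bayes to separate AD from non-AD.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def build_features(dl_scores: np.ndarray, morph: np.ndarray) -> np.ndarray:
    """dl_scores: (N,) deep model probabilities; morph: (N, F) aortic
    morphology features extracted from the segmentation."""
    return np.column_stack([dl_scores, morph])

gnb = GaussianNB()
# gnb.fit(build_features(train_scores, train_morph), train_labels)
# p_ad = gnb.predict_proba(build_features(test_scores, test_morph))[:, 1]
```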


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e20601-e20601 ◽  
Author(s):  
Yi Yang ◽  
Jiancheng Yang ◽  
Yuxiang Ye ◽  
Tian Xia ◽  
Shun Lu

Background: Manual application of length-based tumor response criteria is the standard of care for assessing metastatic tumor response. It is technically challenging, time-consuming, and associated with low reproducibility. In this study, we present a novel automatic deep neural network (DNN)-based segmentation method for assessing tumor progression under immunotherapy. In a next stage, AI will assist physicians in assessing pseudo-progression.
Methods: A dataset of 39 lung cancer patients with 156 computed tomography (CT) scans was used for model training and validation. A 3D segmentation DNN, DenseSharp, was trained with a fixed input size on CT scans of tumors, with manually delineated volumes of interest (VOIs) as ground truth. The trained model was subsequently used to estimate the volumes of target lesions via 16 sliding windows. We refer to progression-free survival (PFS) assessed from tumor size alone as PFS-T. PFS-T assessed by longest tumor diameter (PFS-Tdiam), by tumor volume (PFS-Tvol), and by predicted tumor volume (PFS-Tpred-vol) were compared with standard PFS (as assessed by one junior and one senior clinician). Tumor progression was defined as >20% increase in the longest tumor diameter or >50% increase in tumor volume. Effective treatment was defined as a PFS of >60 days after immunotherapy.
Results: In a 4-fold cross-validation test, the DenseSharp segmentation network achieved a mean per-class intersection over union (mIoU) of 80.1%. The effectiveness rates of immunotherapy assessed using PFS-Tdiam (32/39, 82.1%), PFS-Tvol (33/39, 84.6%), and PFS-Tpred-vol (32/39, 82.1%) were the same as with standard PFS. The agreement between PFS-Tvol and PFS-Tpred-vol was 97.4% (38/39). Evaluation with the deep learning model, implemented in PyTorch 0.4.1 on a GTX 1080 GPU, was more than a hundred-fold faster than manual evaluation (1.42 s vs. 5-10 min per patient).
Conclusions: In this study, the DNN-based model demonstrated fast and stable performance for tumor progression evaluation. Automatic volumetric measurement of tumor lesions enabled by deep learning offers the potential for more efficient, objective, and sensitive measurement than linear measurement by clinicians.
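The progression rule stated in the Methods (>20% increase in longest diameter or >50% increase in volume) encodes directly as a predicate; this minimal sketch uses the thresholds from the abstract, with invented example values.

```python
# Direct encoding of the progression thresholds stated above.
def progressed(base_diam: float, diam: float,
               base_vol: float, vol: float) -> bool:
    """True if the lesion meets either the length- or volume-based criterion."""
    return (diam > 1.20 * base_diam) or (vol > 1.50 * base_vol)

# usage (illustrative numbers): diameter stable, volume criterion triggers
assert progressed(3.1, 3.5, 12.4, 20.0)
```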


2020 ◽  
Author(s):  
Wen Chen ◽  
Yimin Li ◽  
Brandon A Dyer ◽  
Xue Feng ◽  
Shyam Rao ◽  
...  

Abstract Background: Impaired function of the masticatory muscles leads to trismus. Routine delineation of these muscles during planning may improve dose tracking and facilitate dose reduction, decreasing radiation-related trismus. This study aimed to compare a deep learning model with a commercial atlas-based model for fast auto-segmentation of the masticatory muscles on head and neck computed tomography (CT) images.
Material and methods: Paired masseter (M), temporalis (T), and medial and lateral pterygoid (MP, LP) muscles were manually segmented on 56 CT images. CT images were randomly divided into training (n=27) and validation (n=29) cohorts. Two methods were used for automatic delineation of the masticatory muscles (MMs): deep learning auto-segmentation (DLAS) and atlas-based auto-segmentation (ABAS). The automatic algorithms were evaluated using the Dice similarity coefficient (DSC), recall, precision, Hausdorff distance (HD), HD95, and mean surface distance (MSD). A consolidated score was calculated by normalizing the metrics against interobserver variability and averaging over all patients. Differences in dose (∆Dose) to MMs between DLAS and ABAS segmentations were assessed. A paired t-test was used to compare the geometric and dosimetric differences between the DLAS and ABAS methods.
Results: DLAS outperformed ABAS in delineating all MMs (p < 0.05). The DLAS mean DSC for M, T, MP, and LP ranged from 0.83±0.03 to 0.89±0.02; the ABAS mean DSC ranged from 0.79±0.05 to 0.85±0.04. Mean recall, HD, HD95, and MSD also improved with DLAS. Interobserver variation was highest in DSC and MSD for both T and MP, and the highest scores were achieved for T by both automatic algorithms. With few exceptions, the mean ∆D98%, ∆D95%, ∆D50%, and ∆D2% for all structures were below 10% for both DLAS and ABAS, with no detectable statistical difference (p > 0.05). DLAS-based contours matched the dose endpoints of the manually segmented contours more closely than ABAS-based contours did.
Conclusions: DLAS auto-segmentation of the masticatory muscles for head and neck radiotherapy showed improved segmentation accuracy compared with ABAS, with no qualitative difference in dosimetric endpoints compared with manually segmented contours.
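The headline geometric metric in this and the neighboring segmentation abstracts, the Dice similarity coefficient, is worth pinning down. A minimal sketch, assuming binary masks on the same voxel grid:

```python
# Dice similarity coefficient between a binary auto-segmentation and a
# manual reference mask: 2|A ∩ M| / (|A| + |M|).
import numpy as np

def dice(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    a, m = auto_mask.astype(bool), manual_mask.astype(bool)
    inter = np.logical_and(a, m).sum()
    denom = a.sum() + m.sum()
    return 2.0 * inter / denom if denom else 1.0  # both empty -> perfect
```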


2020 ◽  
Author(s):  
Zhipeng Chen ◽  
Daniel D Zeng ◽  
Ryan G N Seltzer ◽  
Blake D Hamilton

BACKGROUND
Though shock wave lithotripsy (SWL) has become one of the most common treatments for nephrolithiasis in recent decades, its treatment planning is often a trial-and-error process based on physicians' subjective judgement. Physicians' inexperience with this modality can lead to low-quality treatment and unnecessary risks to patients.
OBJECTIVE
To improve the quality and consistency of shock wave lithotripsy treatment, we aimed to develop a deep learning model that generates the next treatment step from previous steps and preoperative patient characteristics, and to produce personalized SWL treatment plans in a step-by-step protocol based on the deep learning model.
METHODS
We developed a deep learning model to generate the optimal power level, shock rate, and number of shocks in the next step, given previous treatment steps encoded by long short-term memory neural networks and preoperative patient characteristics. We constructed a next-step dataset (N=8583) from top practices of renal SWL treatments recorded in the International Stone Registry. We then trained the deep learning model and baseline models (linear regression, logistic regression, random forest, and support vector machine) on 90% of the samples and validated them on the remaining samples.
RESULTS
The deep learning models for generating the next treatment steps outperformed the baseline models (accuracy = 98.8%, F1 = 98.0% for power levels; accuracy = 98.1%, F1 = 96.0% for shock rates; root mean squared error = 207, mean absolute error = 121 for numbers of shocks). Hypothesis testing showed no significant difference between steps generated by our model and the top practices (P=.480 for power levels; P=.782 for shock rates; P=.727 for numbers of shocks).
CONCLUSIONS
The high performance of our deep learning approach shows that its treatment planning capability is on par with top physicians. To the best of our knowledge, our framework is the first effort to implement automated planning of SWL treatment via deep learning. It is a promising technique for assisting treatment planning and physician training at low cost.
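The METHODS paragraph describes an LSTM over previous treatment steps combined with preoperative characteristics, emitting three heads (power level and shock rate as classifications, number of shocks as a regression). The PyTorch sketch below illustrates that wiring; all dimensions, class counts, and the fusion scheme are assumptions, not the paper's exact architecture.

```python
# Hedged sketch of a next-step SWL model: LSTM-encoded step history fused
# with patient features, with per-target output heads.
import torch
import torch.nn as nn

class NextStepModel(nn.Module):
    def __init__(self, step_dim=3, patient_dim=16, hidden=64,
                 n_power_levels=8, n_rates=4):
        super().__init__()
        self.lstm = nn.LSTM(step_dim, hidden, batch_first=True)
        self.patient_fc = nn.Linear(patient_dim, hidden)
        self.power_head = nn.Linear(2 * hidden, n_power_levels)  # classification
        self.rate_head = nn.Linear(2 * hidden, n_rates)          # classification
        self.shocks_head = nn.Linear(2 * hidden, 1)               # regression

    def forward(self, steps, patient):
        # steps: (batch, T, step_dim) previous (power, rate, shocks) triples
        # patient: (batch, patient_dim) preoperative characteristics
        _, (h, _) = self.lstm(steps)
        z = torch.cat([h[-1], torch.relu(self.patient_fc(patient))], dim=-1)
        return self.power_head(z), self.rate_head(z), self.shocks_head(z)
```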


2020 ◽  
Author(s):  
Wen Chen ◽  
Brandon A Dyer ◽  
Xue Feng ◽  
Yimin Li ◽  
Shyam Rao ◽  
...  

Abstract Background: Trismus is caused by impaired function of the masticatory muscles. Routine delineation of these muscles during planning may improve dose tracking and facilitate dose reduction, decreasing radiation-related trismus. This study aimed to compare a deep learning model vs. a commercial atlas-based model for fast auto-segmentation of the masticatory muscles on head and neck computed tomography (CT) images.
Material and methods: Paired masseter (M), temporalis (T), and medial and lateral pterygoid (MP, LP) muscles were manually segmented on 56 CT images. CT images were randomly divided into training (n=27) and validation (n=29) cohorts. Two methods were used for automatic delineation of the masticatory muscles (MMs): deep learning auto-segmentation (DLAS) and atlas-based auto-segmentation (ABAS). Quantitative assessment of automatic versus manually segmented contours was performed using the Dice similarity coefficient (DSC), recall, precision, Hausdorff distance (HD), HD95, and mean surface distance (MSD). The interobserver variability in manual segmentation of the MMs was also evaluated. Differences in dose (∆Dose) to MMs between DLAS and ABAS segmentations were assessed. A paired t-test was used to compare the geometric and dosimetric differences between the DLAS and ABAS methods.
Results: DLAS outperformed ABAS in delineating all MMs (p < 0.05). The DLAS mean DSC for M, T, MP, and LP ranged from 0.83±0.03 to 0.89±0.02; the ABAS mean DSC ranged from 0.79±0.05 to 0.85±0.04. Mean recall, precision, HD, HD95, and MSD also improved with DLAS and were close to the mean interobserver variation. With few exceptions, ∆D99%, ∆D95%, ∆D50%, and ∆D1% for all structures were below 10% for both DLAS and ABAS, with no detectable statistical difference (p > 0.05). DLAS-based contours matched the dose endpoints of the manually segmented contours more closely than ABAS-based contours did.
Conclusions: DLAS auto-segmentation of the masticatory muscles for head and neck radiotherapy showed improved segmentation accuracy compared with ABAS, with no qualitative difference in dosimetric endpoints compared with manually segmented contours.
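Complementing the DSC sketch shown earlier, HD95, the 95th percentile Hausdorff distance reported here, can be sketched as the 95th percentile of symmetric nearest-point distances. This naive version works on full voxel coordinates and ignores voxel spacing; real pipelines use surface voxels only and scale by spacing.

```python
# Hedged, naive HD95 sketch: 95th percentile of symmetric nearest-neighbor
# distances between two binary masks (both assumed non-empty). Memory grows
# with the product of mask sizes, so this is for illustration only.
import numpy as np
from scipy.spatial.distance import cdist

def hd95(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    pa = np.argwhere(mask_a)  # voxel coordinates of each mask
    pb = np.argwhere(mask_b)
    d = cdist(pa, pb)  # pairwise Euclidean distances
    return max(np.percentile(d.min(axis=1), 95),   # a -> b
               np.percentile(d.min(axis=0), 95))   # b -> a
```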

