Radiation-induced xerostomia, a major complication of radiation treatment for head and neck cancer, is mainly caused by excessive irradiation of the parotid glands. Megavoltage computed tomography (MVCT) imaging acquired during helical Tomotherapy treatment can be used to monitor successive changes in the parotid glands. While manual segmentation is time-consuming, laborious, and subjective, automatic segmentation is challenging because of the complicated anatomy of the head and neck and the noise in MVCT images. In this article, we propose a localization-refinement scheme to segment the parotid gland in MVCT. After data pre-processing, we use a mask region-based convolutional neural network (Mask R-CNN) in the localization stage and design a modified U-Net for the subsequent fine segmentation stage. To the best of our knowledge, this study is a pioneering work on deep learning for MVCT segmentation. Comprehensive experiments based on different data distributions of head and neck MVCTs and different segmentation models demonstrate the superiority of our approach in terms of accuracy, effectiveness, flexibility, and practicability. Our method can be adopted as a powerful tool for radiation-induced injury studies, where accurate organ segmentation is crucial.
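The hand-off between a localization stage and a refinement stage can be illustrated with a minimal sketch. It assumes the coarse stage yields a 2D binary mask, that a padded bounding box around that mask is cropped for the fine stage, and that the refined patch is pasted back into a full-size mask. The function names, the `margin` parameter, and the list-of-lists image representation are hypothetical and are not taken from the abstract; the networks themselves are not modeled.

```python
def bounding_box(mask, margin=1):
    """Return (top, bottom, left, right) around the foreground of a 2D 0/1 mask.

    The box is padded by `margin` pixels and clipped to the image borders.
    Returns None when the mask has no foreground. `margin` is an assumed parameter.
    """
    height, width = len(mask), len(mask[0])
    rows = [r for r in range(height) if any(mask[r])]
    cols = [c for c in range(width) if any(mask[r][c] for r in range(height))]
    if not rows:
        return None
    return (
        max(rows[0] - margin, 0),
        min(rows[-1] + margin, height - 1),
        max(cols[0] - margin, 0),
        min(cols[-1] + margin, width - 1),
    )


def crop(image, box):
    """Cut the region described by `box` (inclusive bounds) out of a 2D image."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def paste(shape, patch, box):
    """Place a refined patch back into an all-zero mask of the original shape."""
    height, width = shape
    full = [[0] * width for _ in range(height)]
    top, _, left, _ = box
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            full[top + r][left + c] = value
    return full


# Usage with a toy 5x6 coarse mask containing a single foreground pixel.
coarse = [[0] * 6 for _ in range(5)]
coarse[2][3] = 1
box = bounding_box(coarse, margin=1)
patch = crop(coarse, box)           # this patch would be passed to the fine stage
restored = paste((5, 6), patch, box)
```

Cropping before refinement restricts the fine model to the neighborhood of the localized organ, which is one common motivation for two-stage designs of this kind.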
In magnetic resonance imaging (MRI) segmentation, conventional approaches use U-Net models with encoder–decoder structures, segmentation models based on vision transformers, or models that combine a vision transformer with an encoder–decoder structure. However, conventional models are large and computationally slow and, in vision transformer models, the computational cost increases sharply with the image size. To overcome these problems, this paper proposes a model that combines Swin transformer blocks with a lightweight U-Net-type model that has a HarDNet block-based encoder–decoder structure. To maintain the features of the hierarchical transformer and the shifted-windows approach of the Swin transformer model, the Swin transformer is used in the first skip connection layer of the encoder instead of in the encoder–decoder bottleneck. The proposed model, called STHarDNet, was evaluated by separating the anatomical tracings of lesions after stroke (ATLAS) dataset, which comprises 229 T1-weighted MRI images, into training and validation datasets. It achieved Dice, IoU, precision, and recall values of 0.5547, 0.4185, 0.6764, and 0.5286, respectively, which are better than those of the state-of-the-art models U-Net, SegNet, PSPNet, FCHarDNet, TransHarDNet, Swin Transformer, Swin UNet, X-Net, and D-UNet. Thus, STHarDNet improves the accuracy and speed of MRI image-based stroke diagnosis.
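For readers unfamiliar with the four reported metrics, the following minimal sketch computes Dice, IoU, precision, and recall for one pair of flattened binary masks using their standard definitions. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the abstract does not state how the authors aggregate the metrics over images.

```python
def overlap_metrics(pred, truth):
    """Compute Dice, IoU, precision, and recall for two flattened 0/1 masks."""
    tp = fp = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1          # lesion pixel correctly predicted
        elif p and not t:
            fp += 1          # predicted lesion where there is none
        elif t and not p:
            fn += 1          # missed lesion pixel

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Usage on a toy 4-pixel example.
scores = overlap_metrics([1, 1, 0, 0], [1, 0, 1, 0])
print(scores)
```

For a single mask pair, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU); values averaged over many images need not satisfy this identity exactly.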
When segmenting massive amounts of remote sensing images collected from different satellites or geographic locations (cities), pre-trained deep learning models cannot always output satisfactory predictions. To deal with this issue, domain adaptation has been widely used to enhance the generalization ability of segmentation models. Most existing domain adaptation methods, which are based on image-to-image translation, first transfer the source images to pseudo-target images and then adapt the classifier from the source domain to the target domain. However, these unidirectional methods suffer from two limitations: (1) they do not consider the inverse procedure and therefore cannot fully exploit the information from the other domain, which is also beneficial, as confirmed by our experiments; (2) they may fail in cases where transferring the source images to pseudo-target images is difficult. In this paper, to solve these problems, we propose a novel framework, BiFDANet, for unsupervised bidirectional domain adaptation in the semantic segmentation of remote sensing images. It optimizes the segmentation models in two opposite directions. In the source-to-target direction, BiFDANet learns to transfer the source images to pseudo-target images and adapts the classifier to the target domain. In the opposite direction, BiFDANet transfers the target images to pseudo-source images and optimizes the source classifier. At the test stage, we make the best of the source classifier and the target classifier, which complement each other, by combining them with a simple linear combination method, further improving the performance of BiFDANet. Furthermore, we propose a new bidirectional semantic consistency loss for BiFDANet to maintain semantic consistency during the bidirectional image-to-image translation process. Experiments on two datasets, comprising satellite images and aerial images, demonstrate the superiority of our method over existing unidirectional methods.
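The test-stage linear combination of the two classifiers can be sketched as follows. This is only an illustration under stated assumptions: each classifier is assumed to output per-pixel class probabilities, the mixing `weight` and its default of 0.5 are hypothetical, and the abstract does not specify how the combination coefficient is chosen.

```python
def combine_predictions(source_probs, target_probs, weight=0.5):
    """Linearly mix two per-pixel probability maps and return class labels.

    source_probs / target_probs: one list of class probabilities per pixel,
    produced by the source and target classifiers (assumed inputs).
    weight: share given to the source classifier (assumed parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    labels = []
    for p_src, p_tgt in zip(source_probs, target_probs):
        mixed = [weight * a + (1.0 - weight) * b for a, b in zip(p_src, p_tgt)]
        labels.append(mixed.index(max(mixed)))  # argmax over classes
    return labels


# Usage: two pixels, two classes.
source_probs = [[0.9, 0.1], [0.4, 0.6]]
target_probs = [[0.2, 0.8], [0.3, 0.7]]
print(combine_predictions(source_probs, target_probs, weight=0.5))
```

Setting `weight` to 0.0 or 1.0 recovers the target-only or source-only prediction, which makes it easy to compare the combined output against either classifier alone.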
AIM: To assist with retinal vein occlusion (RVO) screening, artificial intelligence (AI) methods based on deep learning (DL) have been developed to alleviate the pressure experienced by ophthalmologists and discover and treat RVO as early as possible.
METHODS: A total of 8600 color fundus photographs (CFPs) were included for training, validation, and testing of disease recognition models and lesion segmentation models. Four disease recognition and four lesion segmentation models were established and compared. Finally, one disease recognition model and one lesion segmentation model were selected as superior. Additionally, 224 CFPs from 130 patients were included as an external test set to determine the abilities of the two selected models.
RESULTS: Using the Inception-v3 model for disease identification, the mean sensitivity, specificity, and F1 for the three disease types and normal CFPs were 0.93, 0.99, and 0.95, respectively, and the mean area under the curve (AUC) was 0.99. Using the DeepLab-v3 model for lesion segmentation, the mean sensitivity, specificity, and F1 for four lesion types (abnormally dilated and tortuous blood vessels, cotton-wool spots, flame-shaped hemorrhages, and hard exudates) were 0.74, 0.97, and 0.83, respectively.
CONCLUSION: DL models show good performance when recognizing RVO and identifying lesions using CFPs. Because of the increasing number of RVO patients and the increasing demand for trained ophthalmologists, DL models will be helpful for diagnosing RVO early and reducing vision impairment.
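As an illustration of how mean sensitivity, specificity, and F1 over several classes can be obtained, the sketch below computes one-vs-rest metrics per class and takes their unweighted (macro) average. The choice of macro averaging, the function names, and the toy labels are assumptions; the abstract does not state which averaging scheme was used.

```python
def one_vs_rest_metrics(y_true, y_pred, label):
    """Sensitivity, specificity, and F1 for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def macro_average(y_true, y_pred, labels):
    """Unweighted mean of the per-class metrics (assumed averaging scheme)."""
    per_class = [one_vs_rest_metrics(y_true, y_pred, lab) for lab in labels]
    return {
        key: sum(m[key] for m in per_class) / len(per_class)
        for key in ("sensitivity", "specificity", "f1")
    }


# Usage with hypothetical labels for four photographs.
y_true = ["rvo", "rvo", "normal", "normal"]
y_pred = ["rvo", "normal", "normal", "normal"]
print(macro_average(y_true, y_pred, labels=["rvo", "normal"]))
```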
The precise identification of micro-features on 2.25Cr1Mo0.25V steel is of great significance for understanding the mechanism of hydrogen embrittlement (HE) and evaluating the alloy’s resistance to HE. At present, convolutional neural networks (CNNs) are widely applied to identify micro-features in alloys. However, with the development of transformers for image recognition, transformer-based neural networks learn global and long-range semantic information better than CNNs and achieve higher prediction accuracy. In this work, a new transformer-based neural network model, Swin–UNet++, is proposed. Specifically, the architecture of the decoder was redesigned to detect and identify more precisely the micro-features with complex morphology (i.e., dimples) on the fracture surface of 2.25Cr1Mo0.25V steel. Swin–UNet++ was compared with other state-of-the-art (SOTA) segmentation models on the dimple dataset constructed in this work, which consists of 830 scanning electron microscopy (SEM) images of dimples on the fracture surface of 2.25Cr1Mo0.25V steel. The segmentation results show that Swin–UNet++ not only identifies dimples accurately but also achieves much higher prediction accuracy and stronger robustness than Swin–Unet and UNet. Moreover, this work provides an important reference for the identification of other micro-features with complex morphologies.
Breast cancer screening using mammography serves as the earliest defense against breast cancer, revealing anomalous tissue years before it can be detected through physical screening. Despite the use of high-resolution radiography, the presence of densely overlapping patterns challenges the consistency of human-driven diagnosis and drives interest in leveraging the state-of-the-art localization ability of deep convolutional neural networks (DCNNs). The growing availability of digitized clinical archives enables the training of deep segmentation models, but training on the most widely available form of annotation, coarse hand-drawn outlines, works against learning the precise boundary of cancerous tissue and produces results that are more aligned with the annotations than with the underlying lesions. The expense of collecting high-quality pixel-level data in the field of medical science makes this even more difficult. To surmount this fundamental challenge, we propose LatentCADx, a deep learning segmentation model capable of precisely annotating the cancer lesions underlying hand-drawn annotations, which we procedurally obtain using joint classification training and a strict segmentation penalty. We demonstrate the capability of LatentCADx on a publicly available dataset of 2,620 mammogram case files, where LatentCADx obtains a classification ROC of 0.97, an AP of 0.87, and a segmentation AP of 0.75 (IOU = 0.5), giving comparable or better performance than other models. Qualitative and precision evaluation of LatentCADx annotations on validation samples reveals that LatentCADx increases the specificity of segmentations beyond that of existing models trained on hand-drawn annotations, with pixel-level specificity reaching 0.90. Unlike other methods, it also obtains sharp boundaries around lesions, reducing the confused pixels in the output by more than 60%.
Objective: Delineating swallowing and chewing structures aids in radiotherapy (RT) treatment planning to limit dysphagia, trismus, and speech dysfunction. We aim to develop an accurate and efficient method to automate this process.
Approach: CT scans of 242 head and neck (H&N) cancer patients acquired from 2004 to 2009 at our institution were used to develop auto-segmentation models for the masseters, medial pterygoids, larynx, and pharyngeal constrictor muscle using DeepLabV3+. A cascaded architecture was used, wherein models were trained sequentially to spatially constrain each structure group based on prior segmentations. Additionally, an ensemble of models combining contextual information from axial, coronal, and sagittal views was used to improve segmentation accuracy. Prospective evaluation was conducted by measuring the amount of manual editing required in 91 H&N CT scans acquired February-May 2021.
Main results: Medians and inter-quartile ranges of Dice Similarity Coefficients (DSC) computed on the retrospective testing set (N=24) were 0.87 (0.85-0.89) for the masseters, 0.80 (0.79-0.81) for the medial pterygoids, 0.81 (0.79-0.84) for the larynx, and 0.69 (0.67-0.71) for the constrictor. In 10 randomly selected scans, the DSC between the auto-segmentations and each observer was higher than the inter-observer DSC. Prospective analysis showed that most manual modifications needed for clinical use were minor, suggesting that auto-contouring could increase clinical efficiency. Trained segmentation models are available for research use upon request via https://github.com/cerr/CERR/wiki/Auto-Segmentation-models.
Significance: We developed deep learning-based auto-segmentation models for swallowing and chewing structures in CT and demonstrated their potential for use in treatment planning to limit complications post-RT. To the best of our knowledge, this is the only prospectively validated deep learning-based model for segmenting chewing and swallowing structures in CT. Additionally, the segmentation models have been made open-source to facilitate reproducibility and multi-institutional research.
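The "median (inter-quartile range)" summaries of per-scan Dice scores can be reproduced in form with a few lines of standard-library code. The sketch below is illustrative only: the input scores are made up, the function name is hypothetical, and the "inclusive" quartile convention is an assumption, since the abstract does not state which quartile definition was used.

```python
import statistics


def median_and_iqr(scores):
    """Return (median, (q1, q3)) for a list of at least two per-scan scores.

    Quartiles use the "inclusive" method of statistics.quantiles, which is an
    assumed convention; other methods can give slightly different bounds.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return median, (q1, q3)


# Usage with hypothetical Dice scores for five scans.
dice_scores = [0.85, 0.86, 0.87, 0.88, 0.89]
median, (q1, q3) = median_and_iqr(dice_scores)
print(f"{median:.2f} ({q1:.2f}-{q3:.2f})")
```

Reporting the median with quartiles rather than the mean with standard deviation is less sensitive to a few poorly segmented scans, which is a common reason for this choice with small test sets.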