Compression is a standard procedure for making convolutional neural networks (CNNs) adhere to specific computing resource constraints. However, searching for a compressed architecture typically involves a series of time-consuming training/validation experiments to determine a good compromise between network size and accuracy. To address this, we propose an image complexity-guided network compression technique for biomedical image segmentation. Given any resource constraints, our framework utilizes data complexity and network architecture to quickly estimate a compressed model, without requiring network training. Specifically, we map the dataset complexity to the target network accuracy degradation caused by compression. Such mapping enables us to predict the final accuracy for different network sizes, based on the computed dataset complexity. Thus, one may choose a solution that meets both the network size and segmentation accuracy requirements. Finally, the mapping is used to determine the convolutional layer-wise multiplicative factor for generating a compressed network. We conduct experiments using 5 datasets, employing 3 commonly-used CNN architectures for biomedical image segmentation as representative networks. Our proposed framework is shown to be effective for generating compressed segmentation networks, retaining up to ≈95% of the full-sized network's segmentation accuracy while using, on average, ≈32× fewer trainable weights than the full-sized networks.
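The convolutional layer-wise multiplicative factor mentioned above can be illustrated with a small sketch. This is not the paper's code: the channel widths below are hypothetical, for a plain chain of 3x3 convolutions, and serve only to show how a single width factor trades parameter count for capacity (a factor of 0.18 lands in the same order as the ≈32× average reduction reported).

```python
import math

def conv_params(c_in, c_out, k=3):
    """Trainable weights in a k x k convolution (bias included)."""
    return c_in * c_out * k * k + c_out

def compress(widths, factor):
    """Scale every layer's channel count by a multiplicative factor,
    keeping at least one channel per layer."""
    return [max(1, math.floor(w * factor)) for w in widths]

def total_params(widths, c_in=1):
    """Parameter count of a plain chain of 3x3 conv layers."""
    total, prev = 0, c_in
    for w in widths:
        total += conv_params(prev, w)
        prev = w
    return total

widths = [64, 128, 256, 512]                  # hypothetical encoder widths
full = total_params(widths)
small = total_params(compress(widths, 0.18))  # compressed variant
print(full, small, full / small)
```

In this framing, choosing the factor is exactly the search the paper replaces with a complexity-to-accuracy mapping: instead of training each candidate width, the predicted accuracy for a given size is read off the mapping.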
Segmentation and reconstruction of arteries is important for a variety of medical and engineering fields, such as surgical planning and physiological modeling. However, manual methods can be laborious and subject to a high degree of human variability. In this work, we developed various convolutional neural network (CNN) architectures to segment Stanford type B aortic dissections (TBAD), characterized by a tear in the descending aortic wall creating a normal channel of blood flow called a true lumen and a pathologic channel within the wall called a false lumen. We introduced several variations to the two-dimensional (2D) U-Net, where small stacks of slices were inputted into the networks instead of individual slices or whole geometries. We compared these variations with a variety of CNN segmentation architectures and found that stacking the input data slices in the upward direction with 2D U-Net improved segmentation accuracy, as measured by the Dice similarity coefficient (DC) and point-by-point AVD. Our optimal architecture produced DC scores of 0.94, 0.88, and 0.90 and AVD values of 0.074, 0.22, and 0.11 in the whole aorta, true lumen, and false lumen, respectively. Altogether, the predicted reconstructions closely matched manual reconstructions.
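The Dice coefficient reported above (and in several of the abstracts below) is the standard overlap measure 2|A∩B| / (|A| + |B|). A minimal sketch on flat binary masks:

```python
def dice(a, b):
    """Dice similarity coefficient between two binary masks
    given as flat 0/1 lists: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(x * y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * inter / total

pred  = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
print(dice(pred, truth))  # 2*3 / (4+4) = 0.75
```

A score of 1.0 means perfect overlap, 0.0 no overlap; two empty masks are treated as a perfect match by convention here.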
Cerebral haemorrhage is a serious subtype of stroke, with most patients experiencing short-term haematoma enlargement leading to worsening neurological symptoms and death. The main hemostatic agents currently used for cerebral haemorrhage are antifibrinolytics and recombinant coagulation factor VIIa. However, there is no clinical evidence that patients with cerebral haemorrhage benefit from hemostatic treatment. We provide an overview of the mechanisms of haematoma expansion in cerebral haemorrhage and of the progress of research on commonly used hemostatic drugs. To improve the semantic segmentation accuracy of cerebral haemorrhage, a segmentation method based on RGB-D images is proposed. First, a disparity map was obtained with a semi-global stereo matching algorithm and fused with the RGB image to form a four-channel RGB-D image, building a sample library. Second, networks of 2 different convolutional neural network structures were trained with 2 different learning rate adjustment strategies. Finally, the trained networks were tested and compared. The 146 head CT images from the Chinese intracranial haemorrhage image database were divided into a training set and a test set using the random number table method. Haematoma volume was then measured on the validation set with four methods: manual segmentation, algorithmic segmentation, the exact Tada formula, and the traditional Tada formula. Manual segmentation was used as the "gold standard," and the other three methods were tested against it for consistency. The results showed that algorithmic segmentation had the lowest percentage error, 15.54 (8.41, 23.18)%, compared with the Tada formula methods.
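The four-channel fusion step described above simply appends the disparity value to each RGB pixel. A minimal sketch, assuming images are nested lists of pixel tuples (the actual pipeline's data layout is not specified in the abstract):

```python
def fuse_rgbd(rgb, disparity):
    """Append a disparity channel to each RGB pixel, forming a
    four-channel RGB-D image. `rgb` is an H x W image of (r, g, b)
    tuples; `disparity` is an H x W map of the same shape."""
    return [
        [(r, g, b, disparity[i][j]) for j, (r, g, b) in enumerate(row)]
        for i, row in enumerate(rgb)
    ]

rgb = [[(10, 20, 30), (40, 50, 60)]]  # a 1 x 2 toy image
disp = [[7, 9]]
print(fuse_rgbd(rgb, disp))  # [[(10, 20, 30, 7), (40, 50, 60, 9)]]
```

The segmentation network then consumes the 4-channel input exactly as it would a 3-channel image, with one extra input channel in its first convolution.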
Liver segmentation is instrumental to decision making in the diagnosis and treatment planning of hepatic diseases. However, manually segmenting the hundreds of CT images in a scan is tedious for medical experts; it hampers segmentation accuracy and makes the result dependent on the operator's judgment. This chapter presents a deep learning-based modified multi-scale UNet++ (M2UNet++) approach for automatic liver segmentation. The multi-scale features are modified channel-wise using adaptive feature recalibration, which improves the representation of high-level semantic information along the skip pathways and raises segmentation performance with little computational overhead. The experimental results prove the model's efficacy on the publicly available 3DIRCADb dataset, which offers significant complexity and variation. The model's Dice coefficient is 97.28%, an improvement of 7.64% over UNet and 2.24% over UNet++. The quantitative result analysis shows that the M2UNet++ model outperforms the state-of-the-art methods proposed for liver segmentation.
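Adaptive channel-wise feature recalibration is commonly realized in the squeeze-and-excitation style: pool each channel to a scalar, derive a (0, 1) gate from it, and rescale the channel. The sketch below is a simplified, hypothetical version with a per-channel affine gate; the paper's exact recalibration layers are not specified in the abstract.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def recalibrate(feature_maps, w, b):
    """SE-style channel recalibration on a list of flattened channels:
    squeeze each channel to its global average, derive a (0, 1) gate
    from a per-channel affine transform (w, b are hypothetical learned
    parameters), and rescale the channel by that gate."""
    out = []
    for ch, fmap in enumerate(feature_maps):
        mean = sum(fmap) / len(fmap)          # squeeze
        gate = sigmoid(w[ch] * mean + b[ch])  # excitation
        out.append([v * gate for v in fmap])  # rescale
    return out

maps = [[1.0, 3.0], [2.0, 2.0]]
# w chosen so channel 0 is kept (gate ~ 1) and channel 1 suppressed (gate ~ 0)
gated = recalibrate(maps, w=[10.0, -10.0], b=[0.0, 0.0])
```

The effect is a learned, input-dependent reweighting of channels, which is how such modules sharpen the semantic content carried across skip pathways.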
Printed fabric patterns contain multiple repeated pattern primitives, which have a significant impact on fabric pattern design in the textile industry. A pattern primitive is often composed of multiple elements, such as color, form, and texture structure; the more elements it contains, the more complex the primitive is. To segment fabric primitives, this paper proposes a novel convolutional neural network (CNN) method with a spatial pyramid pooling module as the feature extractor, which enables the network to learn pattern feature information and determine whether a printed fabric has periodic pattern primitives. Furthermore, by choosing pairs of activation peaks within a filter, a set of displacement vectors can be calculated. The activation peaks most consistent with the optimum displacement vector are used to pick out the final size of the primitives. The results show that the method, with the powerful feature extraction capability of the CNN, can segment the periodic pattern primitives of complex printed fabrics. Compared with a traditional algorithm, the proposed method has higher segmentation accuracy and adaptability.
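The displacement-vector step can be pictured concretely: for a periodic pattern, the displacement that recurs most often between pairs of activation peaks approximates the primitive's repeat size. A minimal sketch on hypothetical peak coordinates (the paper's peak detection itself comes from the CNN filters):

```python
from collections import Counter

def dominant_displacement(peaks):
    """Most frequent pairwise displacement between activation peaks;
    for a periodic pattern this approximates the primitive size.
    `peaks` is a list of (x, y) coordinates."""
    votes = Counter()
    for i, (x1, y1) in enumerate(peaks):
        for x2, y2 in peaks[i + 1:]:
            votes[(abs(x2 - x1), abs(y2 - y1))] += 1
    return votes.most_common(1)[0][0]

# peaks of a pattern repeating every 4 px along one row
peaks = [(0, 0), (4, 0), (8, 0), (12, 0)]
print(dominant_displacement(peaks))  # (4, 0)
```

Real detections are noisy, so in practice displacements would be binned with a tolerance before voting; exact matching is used here only to keep the sketch short.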
The automated measurement of crop phenotypic parameters is of great significance to the quantitative study of crop growth. Segmentation and classification of crop point clouds help realize the automation of phenotypic parameter measurement. At present, crop spike-shaped point cloud segmentation suffers from few samples, uneven point cloud distribution, occlusion between stem and spike, disorderly arrangement of points, and a lack of targeted network models. Traditional clustering methods can segment plant organ point clouds with relatively independent spatial locations, but their accuracy is not acceptable. This paper first builds a desktop-level point cloud scanning apparatus based on a structured-light projection module to facilitate point cloud acquisition. Then, rice ear point clouds were collected and assembled into a rice ear point cloud dataset. In addition, data augmentation is used to improve sample utilization efficiency and training accuracy. Finally, a 3D point cloud convolutional neural network called Panicle-3D was designed to achieve better segmentation accuracy. Specifically, Panicle-3D targets the multiscale characteristics of plant organs, combining the structure of PointConv with long and short skip connections, which accelerates network convergence and reduces feature loss during point cloud downsampling. In comparison experiments, the segmentation accuracy of Panicle-3D reaches 93.4%, higher than that of PointNet. Panicle-3D is also suitable for other, similar crop point cloud segmentation tasks.
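Point cloud augmentation for segmentation training typically applies label-preserving transforms such as rotation about the vertical axis and small coordinate jitter. A minimal sketch, assuming (x, y, z) tuples; the abstract does not state which augmentations Panicle-3D's pipeline used:

```python
import math
import random

def augment(points, angle, jitter=0.0, seed=0):
    """Rotate a point cloud about the z-axis by `angle` radians and
    optionally add Gaussian coordinate jitter, both of which leave the
    per-point segmentation labels unchanged."""
    rng = random.Random(seed)
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        xr, yr = c * x - s * y, s * x + c * y  # z-axis rotation
        out.append((xr + rng.gauss(0, jitter) if jitter else xr,
                    yr + rng.gauss(0, jitter) if jitter else yr,
                    z))
    return out

cloud = [(1.0, 0.0, 2.0)]
print(augment(cloud, math.pi / 2))  # point swings onto the y-axis
```

Each augmented copy counts as a new training sample, which is how augmentation raises sample utilization when labeled point clouds are scarce.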
X-ray computed tomography (X-CT) plays an important role in non-destructive quality inspection and process evaluation in metal additive manufacturing, as several types of defects such as keyhole and lack-of-fusion pores can be observed in these 3D images as local changes in material density. Segmentation of these defects often relies on threshold methods applied to the reconstructed attenuation values of the 3D image voxels. However, the segmentation accuracy is affected by unavoidable X-CT reconstruction features such as partial volume effects, voxel noise and imaging artefacts. These effects create false positives, difficulties in threshold value selection and unclear or jagged defect edges. In this paper, we present a new X-CT defect segmentation method based on preprocessing the X-CT image with a 3D total variation denoising method. By comparing the changes in the histogram, threshold selection can be made significantly better, and the resulting segmentation is of much higher quality. We derive the optimal algorithm parameter settings and demonstrate robustness to deviating settings. The technique is presented on simulated data sets, compared between low- and high-quality X-CT scans, and evaluated with optical microscopy after destructive tests.
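Why total variation denoising helps thresholding can be seen even in 1D: TV regularization flattens noise spikes while preserving the step edges that separate material from pore. The sketch below minimizes sum((x - y)^2) + weight * sum(|x[i+1] - x[i]|) by subgradient descent on a toy 1D signal; it is a didactic stand-in, not the paper's 3D algorithm or its parameter settings.

```python
def tv_denoise_1d(y, weight=0.5, step=0.05, iters=500):
    """Subgradient descent on sum((x - y)^2) + weight * TV(x),
    a 1D sketch of total-variation preprocessing."""
    x = list(y)
    for _ in range(iters):
        g = [2.0 * (xi - yi) for xi, yi in zip(x, y)]  # data term
        for i in range(len(x) - 1):                    # TV subgradient
            s = (x[i + 1] > x[i]) - (x[i + 1] < x[i])
            g[i] -= weight * s
            g[i + 1] += weight * s
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

def threshold(x, t=0.5):
    return [int(v > t) for v in x]

# a two-region signal with a noise spike (index 2) and a dropout (index 7)
noisy = [0.0, 0.0, 0.6, 0.0, 0.0, 1.0, 1.0, 0.4, 1.0, 1.0]
print(threshold(noisy))                  # naive threshold keeps both errors
print(threshold(tv_denoise_1d(noisy)))   # after TV: clean two-region mask
```

Thresholding the raw signal yields a false positive and a hole; after TV denoising the same threshold recovers the true two-region segmentation, which mirrors the histogram-separation argument made above.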
Video compact representation aims to obtain a representation that reflects the kernel modes of video content and concisely describes the video. As most information in complex videos is either noisy or redundant, some researchers have instead focused on long-term video semantics. Recent video compact representation methods rely heavily on the segmentation accuracy of video semantics. In this paper, we propose a novel framework to address these challenges. Specifically, we designed a continuous video semantic embedding model to learn the actual distribution of video words. First, an embedding model based on the continuous bag-of-words method is proposed to learn the video embeddings, integrated with a well-designed discriminative negative sampling approach that emphasizes the convincing clips in the embedding while weakening the influence of the confusing ones. Second, an aggregated distribution pooling method is proposed to capture the semantic distribution of kernel modes in videos. Finally, our well-trained model can generate compact video representations by direct inference, which gives it better generalization ability than previous methods. We performed extensive experiments on event detection and the mining of representative event parts. Experiments on the TRECVID MED11 and CCV datasets demonstrated the effectiveness of our method: it captures the semantic distribution of kernel modes in videos and shows strong potential to discover and better describe complex video patterns.
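Continuous bag-of-words training with negative sampling avoids a softmax over the full vocabulary of "video words": the true clip is scored against the averaged context, and a few sampled negatives are pushed down. A generic word2vec-style sketch of that loss, with toy 2-dimensional embeddings (the paper's discriminative sampling scheme is more elaborate than uniform negatives):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cbow_neg_sampling_loss(context_mean, target, negatives):
    """-log sigma(ctx . target) - sum(-log sigma(-(ctx . neg))):
    low when the target matches the context and negatives do not."""
    loss = -math.log(sigmoid(dot(context_mean, target)))
    for neg in negatives:
        loss += -math.log(sigmoid(-dot(context_mean, neg)))
    return loss

ctx = [0.5, 0.5]
good, bad = [1.0, 1.0], [-1.0, -1.0]
print(cbow_neg_sampling_loss(ctx, good, [bad]))  # small: correct pairing
print(cbow_neg_sampling_loss(ctx, bad, [good]))  # large: swapped pairing
```

Gradients of this loss move convincing clips toward their contexts and confusing clips away, which is the mechanism the discriminative negative sampling described above refines.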
Hypoplastic left heart syndrome (HLHS) is a severe congenital heart defect in which the right ventricle and the associated tricuspid valve (TV) alone support the circulation. TV failure is thus associated with heart failure, and the outcomes of TV repair are currently poor. 3D echocardiography (3DE) can generate high-quality images of the valve, but segmentation is necessary for precise modeling and quantification. There is currently no robust methodology for rapid TV segmentation, limiting the clinical application of these technologies in this challenging population. We utilized a Fully Convolutional Network (FCN) to segment tricuspid valves from transthoracic 3DE. We trained on 133 3DE image-segmentation pairs and validated on 28 images. We then assessed the effect of varying inputs to the FCN using Mean Boundary Distance (MBD) and Dice Similarity Coefficient (DSC). With an annular curve as an additional input, the FCN achieved a median DSC of 0.86 [IQR: 0.81–0.88] and MBD of 0.35 [0.23–0.4] mm for the merged segmentation, and an average DSC of 0.77 [0.73–0.81] and MBD of 0.6 [0.44–0.74] mm for individual TV leaflet segmentation. The addition of commissural landmarks improved individual leaflet segmentation accuracy to an MBD of 0.38 [0.3–0.46] mm. FCN-based segmentation of the tricuspid valve from transthoracic 3DE is feasible and accurate. The addition of an annular curve and commissural landmarks improved the quality of the segmentations, with MBD and DSC within the range of human inter-user variability. Fast and accurate FCN-based segmentation of the tricuspid valve in HLHS may enable rapid modeling and quantification, which in the future may inform surgical planning. We are now working to deploy this network for public use.
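Mean Boundary Distance complements Dice by measuring geometric error directly: each boundary point's distance to the nearest point of the other boundary, averaged symmetrically. A minimal sketch on 2D point sets (the paper evaluates 3D surfaces, but the definition is the same):

```python
import math

def mean_boundary_distance(a, b):
    """Symmetric mean distance between two boundary point sets:
    average each point's distance to the nearest point of the other
    set, then average the two directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(a, b) + one_way(b, a))

# two parallel boundary segments one unit apart
print(mean_boundary_distance([(0, 0), (1, 0)], [(0, 1), (1, 1)]))  # 1.0
```

Unlike Dice, this metric is in physical units (mm here), which is why the leaflet results above can be compared directly against inter-user variability.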
Objective: Delineating swallowing and chewing structures aids in radiotherapy (RT) treatment planning to limit dysphagia, trismus, and speech dysfunction. We aim to develop an accurate and efficient method to automate this process. Approach: CT scans of 242 head and neck (H&N) cancer patients acquired from 2004-2009 at our institution were used to develop auto-segmentation models for the masseters, medial pterygoids, larynx, and pharyngeal constrictor muscle using DeepLabV3+. A cascaded architecture was used, wherein models were trained sequentially to spatially constrain each structure group based on prior segmentations. Additionally, an ensemble of models combining contextual information from axial, coronal, and sagittal views was used to improve segmentation accuracy. Prospective evaluation was conducted by measuring the amount of manual editing required in 91 H&N CT scans acquired February-May 2021. Main results: Medians and inter-quartile ranges of Dice Similarity Coefficients (DSC) computed on the retrospective testing set (N=24) were 0.87 (0.85-0.89) for the masseters, 0.80 (0.79-0.81) for the medial pterygoids, 0.81 (0.79-0.84) for the larynx, and 0.69 (0.67-0.71) for the constrictor. Auto-segmentations, when compared to inter-observer variability in 10 randomly selected scans, showed better agreement (DSC) with each observer than the observers showed with each other. Prospective analysis showed that most manual modifications needed for clinical use were minor, suggesting auto-contouring could increase clinical efficiency. Trained segmentation models are available for research use upon request via https://github.com/cerr/CERR/wiki/Auto-Segmentation-models. Significance: We developed deep learning-based auto-segmentation models for swallowing and chewing structures in CT and demonstrated their potential for use in treatment planning to limit complications post-RT.
To the best of our knowledge, this is the only prospectively-validated deep learning-based model for segmenting chewing and swallowing structures in CT. Additionally, the segmentation models have been made open-source to facilitate reproducibility and multi-institutional research.
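The multi-view ensemble described in the Approach needs a rule for merging the axial, coronal, and sagittal predictions; the abstract does not specify it, so the sketch below uses the simplest common choice, per-voxel majority voting over binary label maps:

```python
def ensemble_vote(axial, coronal, sagittal):
    """Majority vote across per-view binary label maps (flattened):
    a voxel is foreground when at least two of the three views agree."""
    return [int(a + c + s >= 2) for a, c, s in zip(axial, coronal, sagittal)]

print(ensemble_vote([1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 0, 1]))  # [1, 1, 0, 1]
```

Averaging per-view probability maps before thresholding is an equally common alternative; either way, the ensemble suppresses errors that appear in only one view.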