On Improving the Training of Models for the Semantic Segmentation of Benthic Communities from Orthographic Imagery

2020 ◽  
Vol 12 (18) ◽  
pp. 3106
Author(s):  
Gaia Pavoni ◽  
Massimiliano Corsini ◽  
Marco Callieri ◽  
Giuseppe Fiameni ◽  
Clinton Edwards ◽  
...  

The semantic segmentation of underwater imagery is an important step in the ecological analysis of coral habitats. To date, scientists produce fine-scale area annotations manually, an exceptionally time-consuming task that could be efficiently automated by modern CNNs. This paper extends our previous work presented at the 3DUW’19 conference, outlining the workflow for the automated annotation of imagery from the first step of dataset preparation to the last step of prediction reassembly. In particular, we propose an ecologically inspired strategy for an efficient dataset partition, an over-sampling methodology targeted at ortho-imagery, and a score fusion strategy. We also investigate the use of different loss functions in the optimization of a DeepLabV3+ model, to mitigate the class-imbalance problem and improve prediction accuracy on coral instance boundaries. The experimental results demonstrate the effectiveness of the ecologically inspired split in improving model performance and quantify the advantages and limitations of the proposed over-sampling strategy. The extensive comparison of loss functions gives numerous insights into the segmentation task; the Focal Tversky loss, typically used in medical imaging but rarely in remote sensing, turns out to be the most convenient choice. By improving the accuracy of automated ortho-image processing, the results presented here promise to meet the fundamental challenge of increasing the spatial and temporal scale of coral reef research, giving researchers greater predictive ability to manage coral reef resilience in the context of a changing environment.
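
As a hedged illustration of the Focal Tversky loss mentioned above, the PyTorch sketch below follows the common formulation from the medical-imaging literature (a per-class Tversky index raised to a focal exponent); the hyperparameter defaults alpha, beta, and gamma are illustrative, not the settings used in the paper.

```python
# A minimal PyTorch sketch of a Focal Tversky loss (common formulation,
# not the paper's exact configuration).
import torch

def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-6):
    """probs: (N, C, H, W) softmax probabilities; targets: (N, C, H, W) one-hot labels."""
    targets = targets.float()
    dims = (0, 2, 3)                                 # sum over batch and spatial dims
    tp = (probs * targets).sum(dims)                 # soft true positives per class
    fp = (probs * (1.0 - targets)).sum(dims)         # soft false positives per class
    fn = ((1.0 - probs) * targets).sum(dims)         # soft false negatives per class
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return torch.pow(1.0 - tversky, gamma).mean()    # focal exponent, averaged over classes
```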

Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3774 ◽  
Author(s):  
Xuran Pan ◽  
Lianru Gao ◽  
Bing Zhang ◽  
Fan Yang ◽  
Wenzhi Liao

Semantic segmentation of high-resolution aerial images is of great importance in certain fields, but the increasing spatial resolution brings large intra-class variance and small inter-class differences that can lead to classification ambiguities. Building on high-level contextual features, the deep convolutional neural network (DCNN) is an effective method for the semantic segmentation of high-resolution aerial imagery. In this work, a novel dense pyramid network (DPN) is proposed for semantic segmentation. The network starts with group convolutions that process multi-sensor data channel-wise, extracting feature maps of each channel separately; by doing so, more information from each channel can be preserved. This step is followed by a channel shuffle operation to enhance the representation ability of the network. Then, four densely connected convolutional blocks are utilized to extract and take full advantage of features. A pyramid pooling module combined with two convolutional layers fuses multi-resolution and multi-sensor features through an effective global scene prior, producing the probability map for each class. Moreover, a median frequency balanced focal loss is proposed to replace the standard cross-entropy loss in the training phase to deal with the class imbalance problem. We evaluate the dense pyramid network on the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam 2D semantic labeling datasets, and the results demonstrate that the proposed framework exhibits better performance compared to state-of-the-art baselines.
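
The sketch below is one plausible reading of a median-frequency-balanced focal loss: per-class weights are the median class frequency divided by each class frequency, and a focal term down-weights well-classified pixels. The function names and the gamma default are illustrative, not taken from the paper.

```python
# A hedged sketch of median-frequency class weighting combined with a focal loss.
import numpy as np
import torch
import torch.nn.functional as F

def median_frequency_weights(label_maps, num_classes):
    """label_maps: iterable of integer label images; returns per-class weights (numpy array)."""
    counts = np.zeros(num_classes, dtype=np.float64)
    for lab in label_maps:
        counts += np.bincount(lab.ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    return np.median(freq) / np.maximum(freq, 1e-12)

def mfb_focal_loss(logits, targets, class_weights, gamma=2.0):
    """logits: (N, C, H, W); targets: (N, H, W) int64; class_weights: (C,) float tensor."""
    log_p = F.log_softmax(logits, dim=1)
    ce = F.nll_loss(log_p, targets, weight=class_weights, reduction="none")  # weighted CE per pixel
    pt = torch.exp(-F.nll_loss(log_p, targets, reduction="none"))            # prob of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

# Usage sketch:
# weights = torch.tensor(median_frequency_weights(train_labels, num_classes), dtype=torch.float32)
# loss = mfb_focal_loss(model(images), labels, weights)
```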


2020 ◽  
Vol 15 (11) ◽  
pp. 1847-1858
Author(s):  
Mina Rezaei ◽  
Janne J. Näppi ◽  
Christoph Lippert ◽  
Christoph Meinel ◽  
Hiroyuki Yoshida

Abstract. Purpose: The identification of abnormalities that are relatively rare within otherwise normal anatomy is a major challenge for deep learning in the semantic segmentation of medical images. The small number of samples of the minority classes in the training data makes the learning of optimal classification challenging, while the more frequently occurring samples of the majority class hamper the generalization of the classification boundary between infrequently occurring target objects and the remaining classes. In this paper, we developed a novel generative multi-adversarial network, called Ensemble-GAN, for mitigating this class imbalance problem in the semantic segmentation of abdominal images. Method: The Ensemble-GAN framework is composed of a single-generator, multi-discriminator variant designed to handle the class imbalance problem and to provide better generalization than existing approaches. The ensemble model aggregates the estimates of multiple models trained from different initializations and with different losses on various subsets of the training data. The single generator network takes the input image as a condition and predicts the corresponding semantic segmentation, using feedback from the ensemble of discriminator networks. To evaluate the framework, we trained it on two public datasets with different imbalance ratios and imaging modalities: CHAOS 2019 and LiTS 2017. Result: In terms of the F1 score, the accuracies of the semantic segmentation of the healthy spleen, liver, and left and right kidneys were 0.93, 0.96, 0.90, and 0.94, respectively. The overall F1 scores for simultaneous segmentation of the lesions and liver were 0.83 and 0.94, respectively. Conclusion: The proposed Ensemble-GAN framework demonstrated outstanding performance in the semantic segmentation of medical images in comparison with other approaches on popular abdominal imaging benchmarks. The Ensemble-GAN has the potential to segment abdominal images more accurately than human experts.
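
A minimal sketch (not the authors' implementation) of the single-generator, multi-discriminator idea: the generator's segmentation loss is combined with adversarial feedback averaged over an ensemble of discriminators. It assumes the discriminators accept the input image concatenated with the softmax probability map; the loss weights seg_weight and adv_weight are illustrative.

```python
# Hedged sketch of one generator training step with an ensemble of discriminators.
import torch
import torch.nn.functional as F

def generator_step(generator, discriminators, images, masks, seg_weight=1.0, adv_weight=0.1):
    """images: (N, 1, H, W); masks: (N, H, W) int64 class labels."""
    logits = generator(images)                                # (N, C, H, W)
    seg_loss = F.cross_entropy(logits, masks)                 # pixel-wise segmentation loss
    probs = torch.softmax(logits, dim=1)
    adv_loss = 0.0
    for disc in discriminators:                               # ensemble feedback, averaged
        score = disc(torch.cat([images, probs], dim=1))       # condition on the input image (assumed interface)
        adv_loss = adv_loss + F.binary_cross_entropy_with_logits(
            score, torch.ones_like(score))                    # generator tries to fool each discriminator
    adv_loss = adv_loss / len(discriminators)
    return seg_weight * seg_loss + adv_weight * adv_loss
```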


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3183 ◽  
Author(s):  
Zia Khan ◽  
Norashikin Yahya ◽  
Khaled Alsaih ◽  
Syed Saad Azhar Ali ◽  
Fabrice Meriaudeau

In this paper, we present an evaluation of four encoder–decoder CNNs for the segmentation of the prostate gland in T2W magnetic resonance imaging (MRI) images. The four selected CNNs are FCN, SegNet, U-Net, and DeepLabV3+, which were originally proposed for the segmentation of road scenes, biomedical images, and natural images. Segmentation of the prostate in T2W MRI images is an important step in the automatic diagnosis of prostate cancer, enabling better lesion detection and staging. Therefore, many research efforts have been devoted to improving the segmentation of the prostate gland in MRI images. The main challenges of prostate gland segmentation are the blurry prostate boundary and the variability of the prostate's anatomical structure. In this work, we investigated the performance of encoder–decoder CNNs for segmentation of the prostate gland in T2W MRI. Image pre-processing techniques, including image resizing, center-cropping, and intensity normalization, are applied to address inter-patient and inter-scanner variability as well as the dominance of background pixels over prostate pixels. In addition, to enrich the network with more data, increase data variation, and improve accuracy, patch extraction and data augmentation are applied prior to training the networks. Furthermore, class weight balancing is used to avoid biased networks, since the number of background pixels is much higher than the number of prostate pixels; the class imbalance problem is addressed by utilizing a weighted cross-entropy loss function during the training of the CNN models. The performance of the CNNs is evaluated in terms of the Dice similarity coefficient (DSC), and our experimental results show that patch-wise DeepLabV3+ gives the best performance, with a DSC of 92.8%. This is the highest DSC score compared to FCN, SegNet, and U-Net, and it is also competitive with recently published state-of-the-art methods for prostate segmentation.
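
A minimal sketch of the class-weight balancing described above, assuming an inverse-frequency weighting scheme (the exact formula is not spelled out here, so the weighting is illustrative) fed into PyTorch's weighted cross-entropy loss.

```python
# Hedged sketch: inverse-frequency class weights for background vs. prostate pixels.
import torch
import torch.nn as nn

def inverse_frequency_weights(masks, num_classes=2):
    """masks: (N, H, W) int64 labels; returns a (num_classes,) weight tensor."""
    counts = torch.bincount(masks.view(-1), minlength=num_classes).float()
    return counts.sum() / (num_classes * counts.clamp(min=1.0))   # rarer class -> larger weight

# Usage sketch during training:
# weights = inverse_frequency_weights(train_masks)
# criterion = nn.CrossEntropyLoss(weight=weights)
# loss = criterion(model(patches), patch_masks)
```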


2020 ◽  
Vol 12 (17) ◽  
pp. 2722
Author(s):  
Yuxuan Wang ◽  
Guangming Wu ◽  
Yimin Guo ◽  
Yifei Huang ◽  
Ryosuke Shibasaki

For efficient building outline extraction, many algorithms, both unsupervised and supervised, have been proposed over the past decades. In recent years, due to the rapid development of convolutional neural networks, especially fully convolutional networks, building extraction has been treated as a semantic segmentation task that must cope with extremely scarce positive pixels. The state-of-the-art methods, whether through direct or indirect approaches, are mainly focused on better network design. The shifts and rotations that are coarsely introduced by manually created annotations have long been ignored. Due to the limited number of positive samples, this misalignment significantly reduces the correctness of the pixel-to-pixel loss and may lead to a gradient explosion. To overcome this, we propose a nearest feature selector (NFS) that dynamically re-aligns the prediction and the slightly misaligned annotations. The NFS can be seamlessly appended to existing loss functions and prevents the training from being misled by errors or misalignment in the annotations. Experiments on a large-scale aerial image dataset with centered buildings and corresponding building outlines indicate that adding NFS brings higher performance compared to existing naive loss functions. When added to the classic L1 loss, NFS yields gains of 8.8% in F1-score, 8.9% in kappa coefficient, and 9.8% in Jaccard index.
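
One plausible reading of the nearest feature selector is sketched below, under the assumption that "dynamic re-alignment" amounts to searching a small window of integer shifts of the annotation and keeping the shift with the lowest loss; this is an illustration, not the authors' implementation.

```python
# Hedged sketch of an NFS-style re-aligned L1 loss (shift-search interpretation).
import torch
import torch.nn.functional as F

def nfs_l1_loss(pred, target, max_shift=2):
    """pred, target: (N, 1, H, W) building-probability maps; max_shift is illustrative."""
    losses = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = torch.roll(target, shifts=(dy, dx), dims=(2, 3))  # candidate alignment
            losses.append(F.l1_loss(pred, shifted))
    return torch.stack(losses).min()              # keep the nearest (lowest-loss) alignment
```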


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1906
Author(s):  
Jia-Zheng Jian ◽  
Tzong-Rong Ger ◽  
Han-Hua Lai ◽  
Chi-Ming Ku ◽  
Chiung-An Chen ◽  
...  

Diverse computer-aided diagnosis systems based on convolutional neural networks have been applied to automate the detection of myocardial infarction (MI) in electrocardiograms (ECGs) for early diagnosis and prevention. However, issues such as overfitting and underfitting have often not been taken into account; in other words, it is unclear whether a given network structure is too simple or too complex. Toward this end, the proposed models were developed by starting with the simplest structure: a multi-lead features-concatenate narrow network (N-Net) in which only two convolutional layers are included in each lead branch. Additionally, multi-scale features-concatenate networks (MSN-Net) were implemented, in which larger-scale features are extracted by pooling the signals. The best structure was obtained by tuning both the number of filters in the convolutional layers and the number of input signal scales. As a result, the N-Net reached 95.76% accuracy in the MI detection task, whereas the MSN-Net reached an accuracy of 61.82% in the MI localization task. Both networks give higher average accuracy than the state of the art, with a significant difference (p < 0.001) evaluated by the U test. The models are also smaller in size and are thus suitable for wearable devices for offline monitoring. In conclusion, testing across both simple and complex network structures is indispensable. However, the way of dealing with the class imbalance problem and the quality of the extracted features are yet to be discussed.
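
A hedged PyTorch sketch of the multi-lead features-concatenate idea: each ECG lead gets its own narrow branch of two 1-D convolutional layers, and the branch outputs are concatenated before a small classifier head. The number of leads, filter counts, kernel sizes, and pooled length are illustrative, not the paper's configuration.

```python
# Illustrative N-Net-style model: one narrow two-layer conv branch per lead, then concatenation.
import torch
import torch.nn as nn

class NNetSketch(nn.Module):
    def __init__(self, num_leads=12, num_classes=2, filters=8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(                             # two convolutional layers per lead branch
                nn.Conv1d(1, filters, kernel_size=7, padding=3), nn.ReLU(),
                nn.Conv1d(filters, filters, kernel_size=7, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool1d(16))              # fixed-length branch output
            for _ in range(num_leads)])
        self.head = nn.Linear(num_leads * filters * 16, num_classes)

    def forward(self, x):                              # x: (N, num_leads, signal_length)
        feats = [branch(x[:, i:i + 1, :]) for i, branch in enumerate(self.branches)]
        return self.head(torch.cat(feats, dim=1).flatten(1))  # concatenate branch features
```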


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2803
Author(s):  
Rabeea Jaffari ◽  
Manzoor Ahmed Hashmani ◽  
Constantino Carlos Reyes-Aldasoro

The segmentation of power lines (PLs) from aerial images is a crucial task for the safe navigation of unmanned aerial vehicles (UAVs) operating at low altitudes. Despite the advances in deep learning-based approaches for PL segmentation, these models are still vulnerable to the class imbalance present in the data: the PLs occupy only a minimal portion (1–5%) of an aerial image compared to the background region (95–99%). Generally, this class imbalance problem is addressed via PL-specific detectors in conjunction with the popular balanced binary cross-entropy (BBCE) loss function. However, PL-specific detectors do not work outside their application areas, and the BBCE loss requires non-trivial hyperparameter tuning of the class-wise weights. Moreover, the BBCE loss results in low Dice scores and precision values and thus fails to achieve an optimal trade-off between Dice scores, model accuracy, and precision–recall values. In this work, we propose a generalized focal loss function based on the Matthews correlation coefficient (MCC), or Phi coefficient, to address the class imbalance problem in PL segmentation while utilizing a generic deep segmentation architecture. We evaluate our loss function with an improved vanilla U-Net carrying an additional convolutional auxiliary classifier head (ACU-Net) for better learning and faster model convergence. Evaluation on two PL datasets, namely the Mendeley Power Line Dataset and the Power Line Dataset of Urban Scenes (PLDU), where PLs occupy around 1% and 2% of the image area, respectively, reveals that our proposed loss function outperforms the popular BBCE loss by 16% in PL Dice scores on both datasets, by 19% in precision and false detection rate (FDR) values for the Mendeley PL dataset, and by 15% in precision and FDR values for the PLDU, with a minor degradation in accuracy and recall. Moreover, the proposed ACU-Net outperforms the baseline vanilla U-Net by 1–10% on the characteristic evaluation parameters for both PL datasets. Thus, our proposed loss function with ACU-Net achieves an optimal trade-off across the characteristic evaluation parameters without any bells and whistles. Our code is available on GitHub.
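
A minimal sketch of an MCC (Phi coefficient)-based focal loss for binary segmentation, assuming a soft MCC computed from predicted probabilities and penalized via (1 − MCC)^γ; the exponent value and epsilon are illustrative, and the exact formulation in the paper may differ.

```python
# Hedged sketch: soft-MCC focal loss for a binary (PL vs. background) mask.
import torch

def focal_mcc_loss(probs, targets, gamma=1.5, eps=1e-6):
    """probs, targets: (N, 1, H, W); targets in {0, 1}."""
    p, t = probs.reshape(-1), targets.reshape(-1).float()
    tp = (p * t).sum()                               # soft confusion-matrix entries
    tn = ((1 - p) * (1 - t)).sum()
    fp = (p * (1 - t)).sum()
    fn = ((1 - p) * t).sum()
    mcc = (tp * tn - fp * fn) / torch.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn) + eps)
    return torch.pow(1.0 - mcc, gamma)               # focal penalty on (1 - MCC)
```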


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Matthew D. Guay ◽  
Zeyad A. S. Emam ◽  
Adam B. Anderson ◽  
Maria A. Aronova ◽  
Irina D. Pokrovskaya ◽  
...  

Abstract. Biologists who use electron microscopy (EM) images to build nanoscale 3D models of whole cells and their organelles have historically been limited to small numbers of cells and cellular features due to constraints in imaging and analysis. This has been a major factor limiting insight into the complex variability of cellular environments. Modern EM can produce gigavoxel image volumes containing large numbers of cells, but accurate manual segmentation of image features is slow and limits the creation of cell models. Segmentation algorithms based on convolutional neural networks can process large volumes quickly, but achieving EM task accuracy goals often challenges current techniques. Here, we define dense cellular segmentation as a multiclass semantic segmentation task for modeling cells and large numbers of their organelles, and give an example in human blood platelets. We present an algorithm using novel hybrid 2D–3D segmentation networks to produce dense cellular segmentations with accuracy levels that outperform baseline methods and approach those of human annotators. To our knowledge, this work represents the first published approach to automating the creation of cell models with this level of structural detail.
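
A hedged sketch of the general hybrid 2D–3D idea: 2-D convolutions applied slice-wise to exploit in-plane detail, followed by a 3-D convolution that mixes context across slices. This illustrates the concept only; the paper's architectures are more elaborate, and the layer sizes here are illustrative.

```python
# Illustrative hybrid 2D-3D block: slice-wise 2-D conv, then a 3-D conv over the volume.
import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv2d = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1)
        self.conv3d = nn.Conv3d(mid_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                              # x: (N, C, D, H, W) image volume
        n, c, d, h, w = x.shape
        planes = x.permute(0, 2, 1, 3, 4).reshape(n * d, c, h, w)   # treat each slice as a 2-D image
        planes = self.act(self.conv2d(planes))
        vol = planes.reshape(n, d, -1, h, w).permute(0, 2, 1, 3, 4)  # back to (N, C', D, H, W)
        return self.act(self.conv3d(vol))               # 3-D pass mixes context across slices
```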


Author(s):  
Sayan Surya Shaw ◽  
Shameem Ahmed ◽  
Samir Malakar ◽  
Laura Garcia-Hernandez ◽  
Ajith Abraham ◽  
...  

Abstract. Many real-life datasets are imbalanced in nature, which implies that the number of samples present in one class (the minority class) is exceptionally small compared to the number of samples in the other class (the majority class). Hence, if we fit these datasets directly to a standard classifier for training, it often overlooks the minority class samples while estimating the class-separating hyperplane(s) and, as a result, misclassifies them. To solve this problem, over the years, many researchers have followed different approaches. However, the selection of truly representative samples from the majority class is still considered an open research problem, and a better solution would be helpful in many applications such as fraud detection, disease prediction, and text classification. Recent studies also show that handling imbalance requires analyzing not only the disproportion between classes but also other difficulties rooted in the nature of the data, and therefore calls for a more flexible, self-adaptable, computationally efficient, and real-time method for selecting majority class samples without losing much important information. Keeping this in mind, we have proposed a hybrid model combining Particle Swarm Optimization (PSO), a popular swarm intelligence-based meta-heuristic algorithm, and the Ring Theory (RT)-based Evolutionary Algorithm (RTEA), a recently proposed physics-based meta-heuristic algorithm. We have named the algorithm RT-based PSO, or RTPSO for short. RTPSO can select the most representative samples from the majority class, as it takes advantage of the efficient exploration and exploitation phases of its parent algorithms to strengthen the search process. We have used the AdaBoost classifier to obtain the final classification results of our model. The effectiveness of our proposed method has been evaluated on 15 standard real-life datasets with low to extreme imbalance ratios. The performance of RTPSO has been compared with PSO, RTEA, and other standard undersampling methods. The obtained results demonstrate the superiority of RTPSO over the state-of-the-art class imbalance problem-solvers considered here for comparison. The source code of this work is available at https://github.com/Sayansurya/RTPSO_Class_imbalance.
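
The sketch below is a heavily simplified illustration of swarm-driven undersampling: each particle is a binary mask over the majority-class samples, and fitness is the balanced accuracy of an AdaBoost classifier trained on the selected subset. The Ring Theory operators of RTPSO are omitted, so this is not the authors' algorithm; all names, constants, and settings are illustrative.

```python
# Simplified PSO-based undersampling sketch (RT operators omitted).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

def fitness(mask, X_maj, y_maj, X_min, y_min):
    """Balanced accuracy of AdaBoost trained on the selected majority subset plus all minority samples."""
    X = np.vstack([X_maj[mask], X_min])
    y = np.concatenate([y_maj[mask], y_min])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
    clf = AdaBoostClassifier(random_state=0).fit(X_tr, y_tr)
    return balanced_accuracy_score(y_te, clf.predict(X_te))

def pso_undersample(X_maj, y_maj, X_min, y_min, n_particles=10, iters=20, seed=0):
    """Returns a boolean mask over the majority-class samples to keep."""
    rng = np.random.default_rng(seed)
    n = len(X_maj)
    pos = rng.random((n_particles, n))                 # continuous positions, thresholded at 0.5
    vel = np.zeros_like(pos)
    best_p, best_p_fit = pos.copy(), np.full(n_particles, -np.inf)
    best_g, best_g_fit = pos[0].copy(), -np.inf
    for _ in range(iters):
        for i in range(n_particles):
            mask = pos[i] > 0.5
            if not mask.any():
                continue
            fit = fitness(mask, X_maj, y_maj, X_min, y_min)
            if fit > best_p_fit[i]:                    # personal best
                best_p[i], best_p_fit[i] = pos[i].copy(), fit
            if fit > best_g_fit:                       # global best
                best_g, best_g_fit = pos[i].copy(), fit
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (best_p - pos) + 1.5 * r2 * (best_g - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
    return best_g > 0.5
```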

