Unsupervised temporal consistency improvement for microscopy video segmentation with Siamese networks

We introduce a simple mechanism by which a CNN trained to perform semantic segmentation of individual images can be re-trained, with no additional annotations, to improve its performance for segmentation of videos. We put the segmentation CNN in a Siamese setup with shared weights and train both for segmentation accuracy on annotated images and for segmentation similarity on unlabelled consecutive video frames. Our main application is live microscopy imaging of membrane-less organelles where the fluorescent groundtruth for virtual staining can only be acquired for individual frames. The method is directly applicable to other microscopy modalities, as we demonstrate by experiments on the Cell Segmentation Benchmark. Our code is available at https://github.com/kreshuklab/learning-temporal-consistency.
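
The two training signals described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that predictions are flat lists of foreground probabilities, that the supervised term is binary cross-entropy, and that the similarity term is a mean squared difference between the predictions for two consecutive frames. The function names and the `weight` parameter are hypothetical; the abstract does not state the exact losses used.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean BCE between foreground probabilities and a 0/1 mask (flat lists)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def consistency_loss(pred_t, pred_t1):
    """Mean squared difference between predictions for consecutive frames."""
    return sum((a - b) ** 2 for a, b in zip(pred_t, pred_t1)) / len(pred_t)


def siamese_loss(pred_labelled, mask, pred_t, pred_t1, weight=0.5):
    """Supervised term on an annotated image plus a weighted similarity term
    on an unlabelled pair of consecutive frames (assumed combination)."""
    supervised = binary_cross_entropy(pred_labelled, mask)
    return supervised + weight * consistency_loss(pred_t, pred_t1)
```

In a real Siamese setup both predictions of the unlabelled pair would come from the same network with shared weights, so the gradient of the similarity term updates a single set of parameters.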

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i07.6699 ◽ 2020 ◽ Vol 34 (07) ◽ pp. 10713-10720

Author(s): Mingyu Ding ◽ Zhe Wang ◽ Bolei Zhou ◽ Jianping Shi ◽ Zhiwu Lu ◽ ...

Keyword(s): Optical Flow ◽ Video Segmentation ◽ Video Clip ◽ Semantic Segmentation ◽ Temporal Consistency ◽ Flow Estimation ◽ Optical Flow Estimation ◽ Optical Flows ◽ Benchmark Datasets ◽ Spatio Temporal

A major challenge for video semantic segmentation is the lack of labeled data. In most benchmark datasets, only one frame of a video clip is annotated, which makes most supervised methods fail to utilize information from the rest of the frames. To exploit the spatio-temporal information in videos, many previous works use pre-computed optical flows, which encode the temporal consistency to improve the video segmentation. However, the video segmentation and optical flow estimation are still considered as two separate tasks. In this paper, we propose a novel framework for joint video semantic segmentation and optical flow estimation. Semantic segmentation brings semantic information to handle occlusion for more robust optical flow estimation, while the non-occluded optical flow provides accurate pixel-level temporal correspondences to guarantee the temporal consistency of the segmentation. Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference. Extensive experiments show that the proposed model makes the video semantic segmentation and optical flow estimation benefit from each other and outperforms existing methods under the same settings in both tasks.
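
To make the role of non-occluded optical flow concrete, here is a small sketch that measures how well two label maps agree once frame t is displaced by the flow. It is only an illustration: it assumes integer per-pixel displacements `(dy, dx)` and a boolean occlusion mask, and the function name `warped_consistency` is hypothetical. The framework in the abstract learns flow and segmentation jointly rather than computing this measure.

```python
def warped_consistency(seg_t, seg_t1, flow, occluded):
    """Fraction of non-occluded pixels whose label in frame t matches the
    label at the flow-displaced position in frame t+1.

    seg_t, seg_t1: 2-D lists of class labels.
    flow: 2-D list of integer (dy, dx) displacements from frame t to t+1.
    occluded: 2-D list of booleans, True where the flow is unreliable.
    """
    height, width = len(seg_t), len(seg_t[0])
    agree = counted = 0
    for y in range(height):
        for x in range(width):
            if occluded[y][x]:
                continue  # skip pixels without a valid correspondence
            dy, dx = flow[y][x]
            y1, x1 = y + dy, x + dx
            if not (0 <= y1 < height and 0 <= x1 < width):
                continue  # pixel moved out of the frame
            counted += 1
            agree += seg_t[y][x] == seg_t1[y1][x1]
    # With no valid correspondences the maps are treated as consistent.
    return agree / counted if counted else 1.0
```

Masking out occluded pixels matters because a pixel that disappears between frames has no true correspondence, so penalising its label change would push the segmentation toward wrong predictions.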

A stacked dense denoising–segmentation network for undersampled tomograms and knowledge transfer using synthetic tomograms

Machine Vision and Applications ◽ 10.1007/s00138-021-01196-4 ◽ 2021 ◽ Vol 32 (3)

Author(s): Dimitrios Bellos ◽ Mark Basham ◽ Tony Pridmore ◽ Andrew P. French

Keyword(s): Time Series ◽ Knowledge Transfer ◽ Semantic Segmentation ◽ Real World Data ◽ Transfer Scheme ◽ X Ray ◽ Segmentation Accuracy ◽ X Ray Computed ◽ Temporal Events ◽ Time Critical

Abstract: Over recent years, many approaches have been proposed for the denoising or semantic segmentation of X-ray computed tomography (CT) scans. In most cases, high-quality CT reconstructions are used; however, such reconstructions are not always available. When the X-ray exposure time has to be limited, undersampled tomograms (in terms of their component projections) are attained. This low number of projections offers low-quality reconstructions that are difficult to segment. Here, we consider CT time-series (i.e. 4D data), where the limited time for capturing fast-occurring temporal events results in the time-series tomograms being necessarily undersampled. Fortunately, in these collections, it is common practice to obtain representative highly sampled tomograms before or after the time-critical portion of the experiment. In this paper, we propose an end-to-end network that can learn to denoise and segment the time-series’ undersampled CTs, by training with the earlier highly sampled representative CTs. Our single network can offer two desired outputs while only training once, with the denoised output improving the accuracy of the final segmentation. Our method is able to outperform state-of-the-art methods in the task of semantic segmentation and offer comparable results in regard to denoising. Additionally, we propose a knowledge transfer scheme using synthetic tomograms. This not only allows accurate segmentation and denoising using less real-world data, but also increases segmentation accuracy. Finally, we make our datasets, as well as the code, publicly available.

Semantic segmentation of gonio-photographs via adaptive ROI localisation and uncertainty estimation

BMJ Open Ophthalmology ◽ 10.1136/bmjophth-2021-000898 ◽ 2021 ◽ Vol 6 (1) ◽ pp. e000898

Author(s): Andrea Peroni ◽ Anna Paviotti ◽ Mauro Campigotto ◽ Luis Abegão Pinto ◽ Carlo Alberto Cutolo ◽ ...

Keyword(s): Region Of Interest ◽ Ground Truth ◽ Semantic Segmentation ◽ Uncertainty Estimation ◽ Depth Of Field ◽ Clinical Settings ◽ Proposed Model ◽ Validation Experiment ◽ Segmentation Accuracy ◽ Ground Truth Image

Objective: To develop and test a deep learning (DL) model for semantic segmentation of anatomical layers of the anterior chamber angle (ACA) in digital gonio-photographs.

Methods and analysis: We used a pilot dataset of 274 ACA sector images, annotated by expert ophthalmologists to delineate five anatomical layers: iris root, ciliary body band, scleral spur, trabecular meshwork and cornea. Narrow depth-of-field and peripheral vignetting prevented clinicians from annotating part of each image with sufficient confidence, introducing a degree of subjectivity and feature correlation in the ground truth. To overcome these limitations, we present a DL model, designed and trained to perform two tasks simultaneously: (1) maximise the segmentation accuracy within the annotated region of each frame and (2) identify a region of interest (ROI) based on local image informativeness. Moreover, our calibrated model makes its results interpretable by returning pixel-wise classification uncertainty through Monte Carlo dropout.

Results: The model was trained and validated in a 5-fold cross-validation experiment on ~90% of available data, achieving ~91% average segmentation accuracy within the annotated part of each ground truth image of the hold-out test set. An appropriate ROI was successfully identified in all test frames. The uncertainty estimation module correctly located inaccuracies and errors in the segmentation outputs.

Conclusion: The proposed model improves on the only previously published work on gonio-photograph segmentation and may be a valid support for the automatic processing of these images to evaluate local tissue morphology. Uncertainty estimation is expected to facilitate acceptance of this system in clinical settings.
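
The pixel-wise uncertainty obtained through Monte Carlo dropout can be sketched as follows. The example covers only the aggregation step and assumes that the class-probability vectors from several stochastic forward passes (dropout kept active at inference) are already available for one pixel. Predictive entropy is used as the uncertainty score, which is a common choice but not necessarily the one used in the paper; the function name is hypothetical.

```python
import math


def mc_dropout_uncertainty(samples):
    """Aggregate stochastic forward passes for a single pixel.

    samples: list of T probability vectors, one per forward pass with
    dropout active. Returns (mean_probs, predictive_entropy in nats).
    """
    passes = len(samples)
    classes = len(samples[0])
    mean_probs = [sum(s[c] for s in samples) / passes for c in range(classes)]
    # Entropy of the averaged distribution; zero-probability terms contribute 0.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy
```

Passes that agree on one class give an entropy near zero, while passes that disagree push the mean toward a flat distribution and the entropy toward its maximum of log(number of classes).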

Weakly supervised semantic segmentation of tomographic images in the diagnosis of stroke

Journal of Physics Conference Series ◽ 10.1088/1742-6596/2099/1/012021 ◽ 2021 ◽ Vol 2099 (1) ◽ pp. 012021

Author(s): A V Dobshik ◽ A A Tulupov ◽ V B Berikov

Keyword(s): Computed Tomography ◽ Network Architecture ◽ Semantic Segmentation ◽ Training Data ◽ Neural Network Architecture ◽ Brain Images ◽ Tomographic Images ◽ Computed Tomography Images ◽ Segmentation Accuracy ◽ Weakly Supervised

Abstract: This paper presents an automatic algorithm for the segmentation of areas affected by an acute stroke in non-contrast computed tomography brain images. The proposed algorithm is designed for learning in a weakly supervised scenario in which some images are labeled accurately and others inaccurately. Wrong labels result from inaccuracies introduced by a radiologist during manual annotation of computed tomography images. We propose methods for solving the segmentation problem in the case of inaccurately labeled training data. We use the U-Net neural network architecture with several modifications. Experiments on real computed tomography scans show that the proposed methods increase the segmentation accuracy.

Residual Invertible Spatio-Temporal Network for Video Super-Resolution

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33015981 ◽ 2019 ◽ Vol 33 ◽ pp. 5981-5988 ◽ Cited By ~ 12

Author(s): Xiaobin Zhu ◽ Zhuangzi Li ◽ Xiao-Yu Zhang ◽ Changsheng Li ◽ Yaqi Liu ◽ ...

Keyword(s): Spatial Information ◽ Super Resolution ◽ Temporal Consistency ◽ Temporal Network ◽ Convolutional Network ◽ Feature Representations ◽ Video Frames ◽ Temporal Features ◽ Benchmark Datasets ◽ Spatio Temporal

Video super-resolution is a challenging task, which has attracted great attention in research and industry communities. In this paper, we propose a novel end-to-end architecture, called Residual Invertible Spatio-Temporal Network (RISTN), for video super-resolution. The RISTN can sufficiently exploit the spatial information from low-resolution to high-resolution, and effectively model the temporal consistency from consecutive video frames. Compared with existing recurrent convolutional network based approaches, RISTN is much deeper but more efficient. It consists of three major components. In the spatial component, a lightweight residual invertible block is designed to reduce information loss during feature transformation and provide robust feature representations. In the temporal component, a novel recurrent convolutional model with residual dense connections is proposed to construct a deeper network and avoid feature degradation. In the reconstruction component, a new fusion method based on the sparse strategy is proposed to integrate the spatial and temporal features. Experiments on public benchmark datasets demonstrate that RISTN outperforms the state-of-the-art methods.

Pedestrian segmentation based on a spatio-temporally consistent graph-cut with optimal transport

IPSJ Transactions on Computer Vision and Applications ◽ 10.1186/s41074-019-0062-2 ◽ 2019 ◽ Vol 11 (1)

Author(s): Yang Yu ◽ Yasushi Makihara ◽ Yasushi Yagi

Keyword(s): Optimal Transport ◽ Graph Cut ◽ Temporal Consistency ◽ Semantic Level ◽ Spatial Consistency ◽ Segmentation Accuracy ◽ Spatio Temporal ◽ Accuracy Indices ◽ Graph Cut Segmentation ◽ Data Term

Abstract: We address pedestrian segmentation in a video in a spatio-temporally consistent way. For this purpose, given a bounding box sequence of each pedestrian obtained by a conventional pedestrian detector and tracker, we construct a spatio-temporal graph on a video and segment each pedestrian on the basis of a well-established graph-cut segmentation framework. More specifically, we consider three terms as an energy function for the graph-cut segmentation: (1) a data term, (2) a spatial pairwise term, and (3) a temporal pairwise term. To maintain better temporal consistency of segmentation even under relatively large motions, we introduce a transportation minimization framework that provides a temporal correspondence. Moreover, we introduce the edge-sticky superpixel to maintain the spatial consistency of object boundaries. In experiments, we demonstrate that the proposed method improves segmentation accuracy indices, such as the average and weighted intersection over union on TUD datasets and the PETS2009 dataset at both the instance level and semantic level.
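
The three-term energy named in the abstract can be written in a generic form. The notation below is an assumption made for illustration: the node set, the two neighbourhood systems, and the weights \lambda_s and \lambda_t are not taken from the paper, which may weight or define the terms differently.

```latex
% Assumed notation: l_i is the label of node i, \mathcal{V} the node set,
% \mathcal{N}_s / \mathcal{N}_t the spatial / temporal neighbour pairs.
E(L) = \sum_{i \in \mathcal{V}} D_i(l_i)
     + \lambda_s \sum_{(i,j) \in \mathcal{N}_s} V^{s}_{ij}(l_i, l_j)
     + \lambda_t \sum_{(i,k) \in \mathcal{N}_t} V^{t}_{ik}(l_i, l_k)
```

In this reading, D_i is the data term, V^s penalises label changes between spatial neighbours within a frame, and V^t penalises label changes between nodes linked across frames by the temporal correspondence.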

Semantic Segmentation of Underwater Images Based on Improved Deeplab

Journal of Marine Science and Engineering ◽ 10.3390/jmse8030188 ◽ 2020 ◽ Vol 8 (3) ◽ pp. 188

Author(s): Fangfang Liu ◽ Ming Fang

Keyword(s): Semantic Segmentation ◽ Autonomous Driving ◽ Correction Method ◽ Target Object ◽ Original Method ◽ Indoor Navigation ◽ Fine Tuning ◽ Object Boundary ◽ Current State ◽ Segmentation Accuracy

Image semantic segmentation technology has been increasingly applied in many fields, for example, autonomous driving, indoor navigation, virtual reality and augmented reality. However, its application to underwater scenes, which hold a huge amount of marine biological resources and irreplaceable biological gene banks that need to be researched and exploited, remains limited. In this paper, image semantic segmentation technology is exploited to study underwater scenes. We extend the current state-of-the-art semantic segmentation network DeepLabv3+ and employ it as the basic framework. First, the unsupervised color correction method (UCM) module is introduced to the encoder structure of the framework to improve the quality of the image. Moreover, two up-sampling layers are added to the decoder structure to retain more target features and object boundary information. The model is trained by fine-tuning and optimizing relevant parameters. Experimental results indicate that our method improves the appearance of the segmented target object and prevents its pixels from mingling with those of other classes, enhancing the segmentation accuracy of the target boundaries and retaining more feature information. Compared with the original method, our method improves the segmentation accuracy by 3%.

Unsupervised Temporal Consistency Metric for Video Segmentation in Highly-Automated Driving

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) ◽ 10.1109/cvprw50498.2020.00176 ◽ 2020

Author(s): Serin Varghese ◽ Yasin Bayzidi ◽ Andreas Bar ◽ Nikhil Kapoor ◽ Sounak Lahiri ◽ ...

Keyword(s): Video Segmentation ◽ Temporal Consistency ◽ Automated Driving ◽ Highly Automated Driving

Investigating the Relevance of Graph Cut Parameter on Interactive and Automatic Cell Segmentation

Computational and Mathematical Methods in Medicine ◽ 10.1155/2018/7396910 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-10 ◽ Cited By ~ 1

Author(s): Kazeem Oyeyemi Oyebode ◽ Shengzhi Du ◽ Barend Jacobus van Wyk ◽ Karim Djouani

Keyword(s): Statistical Analysis ◽ Significant Role ◽ Energy Function ◽ Automatic Segmentation ◽ Graph Cut ◽ Cell Segmentation ◽ Medical Field ◽ Segmentation Strategy ◽ Segmentation Accuracy ◽ Graph Cut Segmentation

Graph cut segmentation provides a platform to analyze images through a global segmentation strategy, and as a result, it has gained wide acceptance in many interactive and automatic segmentation fields of application, such as the medical field. The graph cut energy function has a parameter that is tuned to ensure that the output is neither oversegmented (shrink bias) nor undersegmented. Models have been proposed in the literature towards the improvement of graph cut segmentation, in the context of interactive and automatic cell segmentation. Along this line of research, the graph cut parameter has been leveraged in some instances and ignored in others. Therefore, in this work, the relevance of the graph cut parameter to both interactive and automatic cell segmentation is investigated. Statistical analysis, based on F1 score, of three publicly available datasets of cells suggests that the graph cut parameter plays a more significant role in improving the segmentation accuracy of the interactive graph cut than that of the automatic graph cut.
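
The F1 score on which the statistical analysis is based can be computed for a binary segmentation as in the sketch below. It assumes flat 0/1 masks compared pixel by pixel; the abstract does not say whether the study evaluates at pixel level or object level, so this is only one possible reading, and the function name is hypothetical.

```python
def f1_score(pred, truth):
    """Pixel-wise F1 score between two flat binary masks (0/1 or booleans)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    denom = 2 * tp + fp + fn
    # Two empty masks agree perfectly, so they are scored as 1.0.
    return 2 * tp / denom if denom else 1.0
```

For example, a prediction of `[1, 1, 0, 0]` against a ground truth of `[1, 0, 1, 0]` has one true positive, one false positive and one false negative, giving an F1 score of 0.5.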

One For All: A Mutual Enhancement Method for Object Detection and Semantic Segmentation

Applied Sciences ◽ 10.3390/app10010013 ◽ 2019 ◽ Vol 10 (1) ◽ pp. 13 ◽ Cited By ~ 2

Author(s): Shichao Zhang ◽ Zhe Zhang ◽ Libo Sun ◽ Wenhu Qin

Keyword(s): Object Detection ◽ Data Augmentation ◽ Semantic Segmentation ◽ Detection Task ◽ Training Set ◽ Segmentation Accuracy ◽ Enhancement Method ◽ Road Segmentation ◽ Accuracy Performance

Generally, most approaches use methods such as cropping, rotating, and flipping to obtain more data for training models and thereby improve the accuracy of detection and segmentation. However, because such data, especially semantic segmentation data, are difficult to label, these traditional data augmentation methodologies help little when the training set is very limited. In this paper, a model named OFA-Net (One For All Network) is proposed to combine object detection and semantic segmentation tasks. A strategy called “1-N Alternation” is used to train the OFA-Net model, which fuses features from detection and segmentation data. The results show that object detection data can be used to improve segmentation accuracy, and furthermore, that segmentation data help to enhance the confidence of predictions for object detection. Finally, the OFA-Net model is trained without traditional data augmentation methodologies and tested on the KITTI test server. The model works well on the KITTI Road Segmentation challenge and performs well on the object detection task.
