Automatic Segmentation and Visualisation of the Epiretinal Membrane in OCT Scans Using Densely Connected Convolutional Networks

2021 ◽  
Vol 7 (1) ◽  
pp. 2
Author(s):  
Mateo Gende ◽  
Joaquim de Moura ◽  
Jorge Novo ◽  
Pablo Charlón ◽  
Marcos Ortega

The Epiretinal Membrane (ERM) is an ocular disease that appears as a fibro-cellular layer of tissue over the retina, specifically, over the Inner Limiting Membrane (ILM). It causes vision blurring and distortion, and its presence can be indicative of other ocular pathologies, such as diabetic macular edema. The ERM diagnosis is usually performed by visually inspecting Optical Coherence Tomography (OCT) images, a manual process which is tiresome and prone to subjectivity. In this work, we present a methodology for the automatic segmentation and visualisation of the ERM in OCT volumes using deep learning. By employing a Densely Connected Convolutional Network, every pixel in the ILM can be classified into either healthy or pathological. Thus, a segmentation of the region susceptible to ERM appearance can be produced. This methodology also produces an intuitive colour map representation of the ERM presence over a visualisation of the eye fundus created from the OCT volume. In a series of representative experiments conducted to evaluate this methodology, it achieved a Dice score of 0.826±0.112 and a Jaccard index of 0.714±0.155. The results that were obtained demonstrate the competitive performance of the proposed methodology when compared to other works in the state of the art.
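The Dice score and Jaccard index reported above are standard overlap metrics between a predicted segmentation and the ground truth. A minimal sketch for binary masks given as flat 0/1 lists:

```python
# Dice and Jaccard overlap metrics for binary segmentation masks.

def dice_score(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0

def jaccard_index(pred, truth):
    """Jaccard (IoU) = |A and B| / |A or B|."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

pred  = [1, 1, 0, 1, 0, 0]
truth = [1, 0, 0, 1, 1, 0]
d = dice_score(pred, truth)     # 2*2 / (3+3) = 0.666...
j = jaccard_index(pred, truth)  # 2 / 4 = 0.5
```

The two metrics are related by Dice = 2J / (1 + J), which is why papers often report both.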

2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model that uses an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It can partly eliminate the vanishing-gradient and performance limitations of RNNs (LSTM, GRU), which yields notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, with accuracy improved by 2.4% on the GRID dataset.
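The core of a TCN layer is a causal, dilated 1-D convolution over the feature sequence: output at step t only sees steps up to t, and dilation widens the receptive field without extra parameters. A single-channel pure-Python sketch (the real model operates on multi-channel feature maps):

```python
# Causal dilated 1-D convolution, the building block of a TCN.

def causal_dilated_conv1d(x, kernel, dilation=1):
    """x: one-channel sequence; kernel[k] multiplies x[t - k*dilation]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # left zero-padding keeps the filter causal
                acc += w * x[idx]
        out.append(acc)
    return out

seq = [1.0, 2.0, 3.0, 4.0]
y = causal_dilated_conv1d(seq, kernel=[1.0, 1.0], dilation=2)
# y = [1.0, 2.0, 4.0, 6.0]: each step sums itself and the step 2 back
```

Stacking such layers with exponentially growing dilation gives a long temporal receptive field while keeping gradients well-behaved, which is the property the abstract contrasts with RNNs.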


2021 ◽  
Vol 11 (12) ◽  
pp. 5488
Author(s):  
Wei Ping Hsia ◽  
Siu Lun Tse ◽  
Chia Jen Chang ◽  
Yu Len Huang

The purpose of this article is to evaluate the accuracy of optical coherence tomography (OCT) measurement of choroidal thickness in healthy eyes using a deep-learning method with the Mask R-CNN model. Thirty EDI-OCT scans of thirty patients were enrolled. A mask region-based convolutional neural network (Mask R-CNN) model, composed of a deep residual network (ResNet) and feature pyramid networks (FPNs) with standard convolution and fully connected heads for mask and box prediction, respectively, was used to automatically delineate the choroid layer. The average choroidal thickness and subfoveal choroidal thickness were measured. Backbones of 50 layers (R50) and 101 layers (R101) were compared: the R50 model and the R101 ∪ R50 (OR) model demonstrated the best accuracy, with average errors of 4.85 and 4.86 pixels, respectively. The R101 ∩ R50 (AND) model took the least time, with an average execution time of 4.6 s. The Mask R-CNN models showed a good prediction rate for the choroidal layer, with accuracy rates of 90% and 89.9% for average choroidal thickness and average subfoveal choroidal thickness, respectively. In conclusion, the deep-learning method using the Mask R-CNN model provides a faster and more accurate measurement of choroidal thickness. Compared with manual delineation, it is more effective and is feasible for clinical application and larger-scale research on the choroid.
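The OR and AND models above combine the R50 and R101 choroid masks by pixelwise union and intersection. A minimal sketch on flat 0/1 masks:

```python
# Pixelwise union (OR) and intersection (AND) of two binary masks,
# as used to combine the R50 and R101 predictions.

def combine_masks(mask_a, mask_b, mode="or"):
    op = (lambda a, b: a | b) if mode == "or" else (lambda a, b: a & b)
    return [op(a, b) for a, b in zip(mask_a, mask_b)]

r50  = [1, 1, 0, 0]
r101 = [1, 0, 1, 0]
union_mask = combine_masks(r50, r101, "or")   # [1, 1, 1, 0]
inter_mask = combine_masks(r50, r101, "and")  # [1, 0, 0, 0]
```

The union favours recall (any model's positive counts), while the intersection favours precision and, as the abstract notes, is cheaper to post-process.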


Author(s):  
Zhichao Huang ◽  
Xutao Li ◽  
Yunming Ye ◽  
Michael K. Ng

Graph Convolutional Networks (GCNs) have been extensively studied in recent years. Most existing GCN approaches are designed for homogeneous graphs with a single type of relation. However, heterogeneous graphs with multiple types of relations are also ubiquitous, and methodologies for tackling such graphs are lacking. Some previous studies address the issue by performing a conventional GCN on each relation separately and then blending the results. However, because the convolutional kernels neglect correlations across relations, this strategy is sub-optimal. In this paper, we propose the Multi-Relational Graph Convolutional Network (MR-GCN) framework by developing a novel convolution operator on multi-relational graphs. In particular, our multi-dimensional convolution operator extends graph spectral analysis to the eigen-decomposition of a Laplacian tensor. The eigen-decomposition is formulated with a generalized tensor product, which can correspond to any unitary transform rather than being limited to the Fourier transform. We conduct comprehensive experiments on four real-world multi-relational graphs to solve the semi-supervised node classification task, and the results show the superiority of MR-GCN over state-of-the-art competitors.
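The per-relation-then-blend baseline the paper argues against can be sketched directly: apply the standard GCN propagation H' = D^(-1/2)(A + I)D^(-1/2)H to each relation's adjacency and average the results. (MR-GCN itself replaces this with a Laplacian-tensor eigen-decomposition, which this sketch does not implement.)

```python
import numpy as np

# Standard GCN propagation applied per relation, results blended by
# averaging -- the sub-optimal baseline described in the abstract.

def gcn_propagate(adj, feats):
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats

def blended_multi_relation(adjs, feats):
    return sum(gcn_propagate(a, feats) for a in adjs) / len(adjs)

adj_r1 = np.array([[0., 1.], [1., 0.]])  # relation 1: the nodes connect
adj_r2 = np.array([[0., 0.], [0., 0.]])  # relation 2: no edges
feats = np.array([[1.0], [3.0]])
out = blended_multi_relation([adj_r1, adj_r2], feats)
# relation 1 smooths both features to 2.0; relation 2 leaves them
# unchanged; the blend averages the two views
```

Because the averaging treats each relation independently, any correlation between relations is invisible to the kernels, which is exactly the limitation MR-GCN targets.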


Algorithms ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 60 ◽  
Author(s):  
Wen Liu ◽  
Yankui Sun ◽  
Qingge Ji

Optical coherence tomography (OCT) is an optical high-resolution imaging technique for ophthalmic diagnosis. In this paper, we take advantage of multi-scale input, multi-scale side output, and a dual attention mechanism to present an enhanced nested U-Net architecture (MDAN-UNet), a new powerful fully convolutional network for automatic end-to-end segmentation of OCT images. We evaluated two versions of MDAN-UNet (MDAN-UNet-16 and MDAN-UNet-32) on two publicly available benchmark datasets, the Duke Diabetic Macular Edema (DME) dataset and the RETOUCH dataset, in comparison with other state-of-the-art segmentation methods. Our experiments demonstrate that MDAN-UNet-32 achieved the best performance, followed by MDAN-UNet-16 with fewer parameters, for multi-layer segmentation and multi-fluid segmentation, respectively.
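"Multi-scale input" above means feeding progressively downsampled copies of the scan into different depths of the network. A 1-D average-pooling pyramid illustrates the downsampling step (a sketch only; the model operates on 2-D OCT B-scans):

```python
# Average-pool pyramid: each level halves the resolution of the last,
# illustrating the multi-scale input idea in 1-D.

def downsample(signal, factor=2):
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]

def input_pyramid(signal, levels=3):
    pyramid = [signal]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

pyr = input_pyramid([1.0, 3.0, 5.0, 7.0], levels=3)
# [[1.0, 3.0, 5.0, 7.0], [2.0, 6.0], [4.0]]
```

Each pyramid level is then injected at the encoder stage whose feature map matches its resolution, so coarse context and fine detail are both available to the segmentation head.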


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1085
Author(s):  
Kaifeng Zhang ◽  
Dan Li ◽  
Jiayun Huang ◽  
Yifei Chen

The detection of pig behavior helps detect abnormal conditions such as diseases and dangerous movements in a timely and effective manner, which plays an important role in ensuring the health and well-being of pigs. Monitoring pig behavior by staff is time consuming, subjective, and impractical. Therefore, there is an urgent need to implement methods for identifying pig behavior automatically. In recent years, deep learning has been gradually applied to the study of pig behavior recognition. Existing studies judge the behavior of the pig based only on its posture in a still image frame, without considering the motion information of the behavior. However, optical flow can reflect motion information well. Thus, this study took image frames and optical flow from videos as two-stream inputs to fully extract the temporal and spatial behavioral characteristics. Two-stream convolutional network models based on deep learning were proposed, including the inflated 3D convnet (I3D) and temporal segment networks (TSN), whose feature extraction network is a Residual Network (ResNet) or an Inception architecture (e.g., Inception with Batch Normalization (BN-Inception), InceptionV3, InceptionV4, or InceptionResNetV2), to achieve pig behavior recognition. A standard pig video behavior dataset of 1000 videos covering five behavioral actions (feeding, lying, walking, scratching, and mounting) of pigs under natural conditions was created. The dataset was used to train and test the proposed models, and a series of comparative experiments was conducted. The experimental results showed that the TSN model whose feature extraction network was ResNet101 was able to recognize pig feeding, lying, walking, scratching, and mounting behaviors with a high average accuracy of 98.99%, and the average recognition time per video was 0.3163 s. The TSN model (ResNet101) is superior to the other models in solving the task of pig behavior recognition.
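A two-stream design runs one network on RGB frames (appearance) and one on optical flow (motion), then fuses their class scores. A common late-fusion scheme, assumed here for illustration since the abstract does not detail the fusion step, averages the per-class probabilities from the two streams:

```python
# Late fusion of a spatial (RGB) and a temporal (optical-flow) stream
# by averaging their per-class scores; class names follow the dataset.

BEHAVIOURS = ["feeding", "lying", "walking", "scratching", "mounting"]

def late_fusion(rgb_scores, flow_scores):
    fused = [(r + f) / 2.0 for r, f in zip(rgb_scores, flow_scores)]
    label = BEHAVIOURS[max(range(len(fused)), key=fused.__getitem__)]
    return label, fused

rgb  = [0.6, 0.1, 0.1, 0.1, 0.1]  # appearance stream favours "feeding"
flow = [0.2, 0.1, 0.5, 0.1, 0.1]  # motion stream favours "walking"
label, fused = late_fusion(rgb, flow)  # fused: [0.4, 0.1, 0.3, 0.1, 0.1]
```

Fusion lets a confident stream override an ambiguous one; here the stronger appearance evidence wins and the fused prediction is "feeding".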


GEOMATICA ◽  
2019 ◽  
Vol 73 (2) ◽  
pp. 29-44
Author(s):  
Won Mo Jung ◽  
Faizaan Naveed ◽  
Baoxin Hu ◽  
Jianguo Wang ◽  
Ningyuan Li

With the advance of deep learning networks, their applications in the assessment of pavement conditions are gaining more attention. A convolutional neural network (CNN) is the most commonly used network in image classification. In terms of pavement assessment, most existing CNNs are designed only to distinguish between cracks and non-cracks. Few networks classify cracks by severity level, yet information on the severity of pavement cracks is critical for pavement repair services. In this study, the state-of-the-art CNN used in the detection of pavement cracks was improved to localize the cracks and identify their distress levels based on three categories (low, medium, and high). In addition, a fully convolutional network (FCN) was, for the first time, utilized in the detection of pavement cracks. These designed architectures were validated using data acquired on four highways in Ontario, Canada, and compared with the ground truth that was provided by the Ministry of Transportation of Ontario (MTO). The results showed that with the improved CNN, the prediction precision on a series of test image patches was 72.9%, 73.9%, and 73.1% for cracks with severity levels of low, medium, and high, respectively. The precision of the FCN was tested on whole pavement images, resulting in 62.8%, 63.3%, and 66.4%, respectively, for cracks with severity levels of low, medium, and high. It is worth mentioning that the ground truth contained some uncertainties, which partially contributed to the relatively low precision.
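The per-severity precision figures above are standard per-class precision: of all patches predicted as a given severity, the fraction whose ground-truth label agrees. A minimal sketch:

```python
# Per-class precision over predicted/true labels, as reported per
# crack-severity level in the study above.

def precision_for_class(preds, truths, cls):
    matched_truths = [t for p, t in zip(preds, truths) if p == cls]
    if not matched_truths:
        return 0.0
    return sum(1 for t in matched_truths if t == cls) / len(matched_truths)

preds  = ["low", "low", "medium", "high", "low"]
truths = ["low", "medium", "medium", "high", "low"]
p_low = precision_for_class(preds, truths, "low")  # 2 correct of 3 -> 0.666...
```

Reporting precision per class, rather than one pooled number, exposes whether a model systematically confuses adjacent severity levels.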


2019 ◽  
Vol 11 (6) ◽  
pp. 684 ◽  
Author(s):  
Maria Papadomanolaki ◽  
Maria Vakalopoulou ◽  
Konstantinos Karantzalos

Deep learning architectures have received much attention in recent years, demonstrating state-of-the-art performance in several segmentation, classification and other computer vision tasks. Most of these deep networks are based on either convolutional or fully convolutional architectures. In this paper, we propose a novel object-based deep-learning framework for semantic segmentation in very high-resolution satellite data. In particular, we exploit object-based priors integrated into a fully convolutional neural network by incorporating an anisotropic diffusion data preprocessing step and an additional loss term during the training process. Under this constrained framework, the goal is to enforce that pixels belonging to the same object are classified into the same semantic category. We compared the novel object-based framework thoroughly with the currently dominating convolutional and fully convolutional deep networks. In particular, numerous experiments were conducted on the publicly available ISPRS WGII/4 benchmark datasets, namely Vaihingen and Potsdam, for validation and inter-comparison based on a variety of metrics. Quantitatively, experimental results indicate that, overall, the proposed object-based framework outperformed the current state-of-the-art fully convolutional networks by more than 1% in terms of overall accuracy, while intersection over union results improved for all semantic categories. Qualitatively, man-made classes with stricter geometry, such as buildings, benefited most from our method, especially along object boundaries, highlighting the great potential of the developed approach.
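The object-based constraint above pushes pixels of the same object toward one class. A simple post-hoc version of that idea (a sketch only, not the paper's training-time loss term) reassigns each pixel the majority label of its object segment:

```python
from collections import Counter

# Majority-vote label smoothing within object segments: every pixel
# inherits the most common predicted class of its segment.

def enforce_object_consistency(labels, segments):
    """labels[i]: predicted class of pixel i; segments[i]: object id."""
    majority = {}
    for seg in set(segments):
        votes = Counter(l for l, s in zip(labels, segments) if s == seg)
        majority[seg] = votes.most_common(1)[0][0]
    return [majority[s] for s in segments]

labels   = ["building", "road", "building", "road", "road"]
segments = [1, 1, 1, 2, 2]
smoothed = enforce_object_consistency(labels, segments)
# segment 1's majority is "building", segment 2's is "road"
```

The paper's loss-term formulation achieves the same effect softly during training, which preserves gradients instead of hard-overwriting predictions.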


Author(s):  
Liang Yang ◽  
Zesheng Kang ◽  
Xiaochun Cao ◽  
Di Jin ◽  
Bo Yang ◽  
...  

In the past few years, semi-supervised node classification in attributed networks has developed rapidly. Inspired by the success of deep learning, researchers adopted the convolutional neural network to develop Graph Convolutional Networks (GCN), achieving surprising classification accuracy by considering topological information and employing a fully connected network (FCN). However, the given network topology may also induce performance degradation if it is directly employed in classification, because it may be highly sparse and noisy. Moreover, the lack of learnable filters in GCN also limits performance. In this paper, we propose a novel Topology Optimization based Graph Convolutional Network (TO-GCN) that fully utilizes the potential information by jointly refining the network topology and learning the parameters of the FCN. According to our derivations, TO-GCN is more flexible than GCN, in which the filters are fixed and only the classifier can be updated during the learning process. Extensive experiments on real attributed networks demonstrate the superiority of the proposed TO-GCN over state-of-the-art approaches.


2021 ◽  
Vol 13 (24) ◽  
pp. 5100
Author(s):  
Teerapong Panboonyuen ◽  
Kulsawasd Jitkajornwanich ◽  
Siam Lawawirojwong ◽  
Panu Srestasathiern ◽  
Peerapon Vateekul

Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep-learning (DL) model that improves the semantic segmentation network in two ways. First, using the pre-trained Swin Transformer (SwinTF), a Vision Transformer (ViT) variant, as a backbone, the model adapts to downstream tasks by attaching task layers to the pretrained encoder. Second, three decoder designs, U-Net, the pyramid scene parsing (PSP) network, and the feature pyramid network (FPN), are applied to our DL network to perform pixel-level segmentation. The results are compared with other state-of-the-art (SOTA) image labeling methods, such as the global convolutional network (GCN) and ViT. Extensive experiments show that our SwinTF with decoder designs reached a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score) and the Thailand North Landsat-8 corpus (63.12% F1 score), with competitive results on ISPRS Vaihingen. Moreover, both of our best-proposed methods (SwinTF-PSP and SwinTF-FPN) even outperformed SwinTF with a supervised pre-trained ViT on ImageNet-1K across the Thailand Landsat-8 and ISPRS Vaihingen corpora.


Plants ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1451
Author(s):  
Muhammad Hammad Saleem ◽  
Sapna Khanchi ◽  
Johan Potgieter ◽  
Khalid Mahmood Arif

The identification of plant disease is an imperative part of crop monitoring systems. Computer vision and deep learning (DL) techniques have proven to be state-of-the-art for addressing various agricultural problems. This research performed the complex tasks of localization and classification of disease in plant leaves. In this regard, three DL meta-architectures, the Single Shot MultiBox Detector (SSD), the Faster Region-based Convolutional Neural Network (RCNN), and Region-based Fully Convolutional Networks (RFCN), were applied using the TensorFlow object detection framework. All the DL models were trained/tested on a controlled-environment dataset to recognize disease in plant species. Moreover, an improvement in the mean average precision of the best-obtained deep learning architecture was attempted through different state-of-the-art deep learning optimizers. The SSD model trained with an Adam optimizer exhibited the highest mean average precision (mAP) of 73.07%. The successful identification of 26 types of diseased and 12 types of healthy leaves in a single framework demonstrates the novelty of the work. In the future, the proposed detection methodology can also be adopted for other agricultural applications. Moreover, the generated weights can be reused for future real-time detection of plant disease in a controlled/uncontrolled environment.
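The mAP figure above rests on matching predicted boxes to ground-truth boxes by intersection-over-union (IoU); a prediction counts as correct only when its IoU with a true box exceeds a threshold. Minimal axis-aligned box IoU, with boxes as (x1, y1, x2, y2):

```python
# Axis-aligned bounding-box IoU, the matching criterion behind mAP
# in object-detection evaluation.

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap 1, union 7 -> 1/7
```

Averaging the precision over recall levels per class, then over classes, yields the mAP that the optimizer comparison in the study reports.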

