ENCODER-DECODER ATTENTION NETWORK (EDANET) FOR POLYP SEGMENTATION IN COLONOSCOPY IMAGES

2021 ◽  
pp. 8-10
Author(s):  
Madhura Prakash M ◽  
Krishnamurthy G. N

Colorectal cancer (CRC) is one of the most common malignancies that can develop from high-risk colon polyps. Colonoscopy is a standard for examination and detection of colorectal polyps.[1] Segmentation and distinction of polyps can play a vital role in treatment (e.g., surgical planning) and predictive decision making. This paper proposes a neural network architecture called EDANet, using attention gates to effectively combine multi-level features to yield accurate polyp segmentation. The Encoder is a fully connected Convolution Neural Network (CNN) and the decoder part is a Cascaded Partial Decoder. Encoder and Decoder sub-networks are connected through a series of nested, dense skip pathways. The skip pathways aim at reducing the semantic gap between the feature maps of the Encoder and Decoder sub-networks. The proposed system trains the model on several epochs and it unifies the previous epoch mask with the feature map of the current training epoch. The previous epoch mask is then used to provide a hard attention to the learnt feature maps at different convolutional layers. Experimental results demonstrate that the model trained and tested on the Kvasir-SEG dataset achieves a dice coefficient of 0.7874, mean Intersection over Union (mIoU) of 0.7010, recall of 0.7987, and a precision of 0.8577.
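
The four reported metrics can be illustrated with a minimal sketch on binary masks. This is not the authors' evaluation code: the function name `segmentation_metrics`, the list-of-rows mask representation and the toy masks are assumptions made for illustration, and a mean IoU would additionally average the per-image (or per-class) IoU values, which the abstract does not specify.

```python
def segmentation_metrics(pred, truth):
    """Return Dice, IoU, precision and recall for two binary masks.

    Both masks are equally sized lists of rows holding 0 (background)
    or 1 (polyp). Hypothetical helper, not taken from the paper.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # polyp pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as polyp
            elif t and not p:
                fn += 1          # polyp pixel missed

    def ratio(num, den):
        # Guard against empty masks, where the denominator is zero.
        return num / den if den else 0.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


# Toy 2x3 masks: two true positives, one false positive, one false negative.
metrics = segmentation_metrics([[1, 1, 0], [0, 1, 0]],
                               [[1, 0, 0], [0, 1, 1]])
```

For a single pair of binary masks, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU), so Dice is never below IoU.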

2019 ◽  
Vol 53 (1) ◽  
pp. 2-19 ◽  
Author(s):  
Erion Çano ◽  
Maurizio Morisio

Purpose: The fabulous results of convolution neural networks in image-related tasks have attracted the attention of text mining, sentiment analysis and other text analysis researchers. It is, however, difficult to find enough data for feeding such networks, optimize their parameters, and make the right design choices when constructing network architectures. The purpose of this paper is to present the creation steps of two big data sets of song emotions. The authors also explore usage of convolution and max-pooling neural layers on song lyrics, product and movie review text data sets. Three variants of a simple and flexible neural network architecture are also compared.
Design/methodology/approach: The intention was to spot any important patterns that can serve as guidelines for parameter optimization of similar models. The authors also wanted to identify architecture design choices which lead to high performing sentiment analysis models. To this end, the authors conducted a series of experiments with neural architectures of various configurations.
Findings: The results indicate that parallel convolutions of filter lengths up to 3 are usually enough for capturing relevant text features. Also, max-pooling region size should be adapted to the length of text documents for producing the best feature maps.
Originality/value: The top results the authors obtained come from feature maps of lengths 6–18. An improvement on future neural network models for sentiment analysis could be generating sentiment polarity prediction of documents using aggregation of predictions on smaller excerpts of the entire text.
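
The idea of parallel convolutions with filter lengths 1 to 3 followed by regional max-pooling can be sketched as follows. This is a simplified illustration, not the authors' architecture: each token is reduced to a single scalar instead of an embedding vector, the kernels are fixed rather than learned, and the names `conv1d`, `max_pool` and `parallel_conv_features` are hypothetical.

```python
def conv1d(seq, kernel):
    """Valid 1D cross-correlation of a scalar sequence with a kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]


def max_pool(values, region):
    """Non-overlapping max-pooling; the last region may be shorter."""
    return [max(values[i:i + region]) for i in range(0, len(values), region)]


def parallel_conv_features(seq, kernels, region):
    """Run every kernel over the same input and concatenate the pooled maps."""
    features = []
    for kernel in kernels:
        activations = [max(0.0, v) for v in conv1d(seq, kernel)]  # ReLU
        features.extend(max_pool(activations, region))
    return features


# Six "tokens" (assumed scalar encodings) and kernels of lengths 1, 2 and 3.
tokens = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0]
kernels = [[1.0], [1.0, 1.0], [1.0, 1.0, 1.0]]
features = parallel_conv_features(tokens, kernels, region=2)
```

The `region` argument controls the length of each pooled feature map: a larger region gives a shorter map, which is why the pooling size has to be matched to document length.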


Author(s):  
P. Bodani ◽  
K. Shreshtha ◽  
S. Sharma

This paper addresses the task of semantic segmentation of orthoimagery using multimodal data, e.g. optical RGB, infrared and digital surface model. We propose a deep convolutional neural network architecture termed OrthoSeg for semantic segmentation using multimodal, orthorectified and coregistered data. We also propose a training procedure for supervised training of OrthoSeg. The training procedure complements the inherent architectural characteristics of OrthoSeg for preventing complex co-adaptations of learned features, which may arise due to probable high dimensionality and spatial correlation in multimodal and/or multispectral coregistered data. OrthoSeg consists of parallel encoding networks for independent encoding of multimodal feature maps and a decoder designed for efficiently fusing independently encoded multimodal feature maps. A softmax layer at the end of the network uses the features generated by the decoder for pixel-wise classification. The decoder fuses feature maps from the parallel encoders locally as well as contextually at multiple scales to generate per-pixel feature maps for final pixel-wise classification resulting in segmented output. We experimentally show the merits of OrthoSeg by demonstrating state-of-the-art accuracy on the ISPRS Potsdam 2D Semantic Segmentation dataset. Adaptability is one of the key motivations behind OrthoSeg so that it serves as a useful architectural option for a wide range of problems involving the task of semantic segmentation of coregistered multimodal and/or multispectral imagery. Hence, OrthoSeg is designed to enable independent scaling of parallel encoder networks and decoder network to better match application requirements, such as the number of input channels, the effective field-of-view, and model capacity.


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4403
Author(s):  
Umme Hafsa Billah ◽  
Hung Manh La ◽  
Alireza Tavakkoli

An autonomous concrete crack inspection system is necessary for preventing hazardous incidents arising from deteriorated concrete surfaces. In this paper, we present a concrete crack detection framework to aid the process of automated inspection. The proposed approach employs a deep convolutional neural network architecture for crack segmentation, while addressing the effect of the vanishing gradient problem. A feature silencing module is incorporated in the proposed framework, capable of eliminating non-discriminative feature maps from the network to improve performance. Experimental results support the benefit of incorporating feature silencing within a convolutional neural network architecture for improving the network’s robustness, sensitivity, and specificity. An added benefit of the proposed architecture is its ability to accommodate the trade-off between sensitivity (positive class detection accuracy) and specificity (negative class detection accuracy) with respect to the target application. Furthermore, the proposed framework improves on state-of-the-art crack detection architectures in both precision rate and processing time.


IoT ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 222-235
Author(s):  
Guillaume Coiffier ◽  
Ghouthi Boukli Hacene ◽  
Vincent Gripon

Deep Neural Networks are state-of-the-art in a large number of challenges in machine learning. However, to reach the best performance they require a huge pool of parameters. Indeed, typical deep convolutional architectures present an increasing number of feature maps as we go deeper in the network, whereas spatial resolution of inputs is decreased through downsampling operations. This means that most of the parameters lie in the final layers, while a large portion of the computations are performed by a small fraction of the total parameters in the first layers. In an effort to use every parameter of a network at its maximum, we propose a new convolutional neural network architecture, called ThriftyNet. In ThriftyNet, only one convolutional layer is defined and used recursively, leading to a maximal parameter factorization. In complement, normalization, non-linearities, downsamplings and shortcuts ensure sufficient expressivity of the model. ThriftyNet achieves competitive performance on a tiny parameter budget, exceeding 91% accuracy on CIFAR-10 with fewer than 40 k parameters in total, 74.3% on CIFAR-100 with fewer than 600 k parameters, and 67.1% on ImageNet ILSVRC 2012 with no more than 4.15 M parameters. However, the proposed method typically requires more computations than existing counterparts.
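
The principle of defining one convolutional layer and using it recursively can be sketched in a few lines. This is a toy 1D illustration under stated assumptions, not the ThriftyNet update rule: it keeps only a ReLU and a shortcut (residual addition), omits normalization and downsampling, assumes an odd kernel length, and uses the hypothetical names `conv1d_same` and `recursive_shared_conv`.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation, zero-padded so the output keeps the input length.

    Assumes an odd kernel length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(len(signal))]


def recursive_shared_conv(signal, kernel, iterations):
    """Apply the SAME kernel `iterations` times with ReLU and a shortcut."""
    state = list(signal)
    for _ in range(iterations):
        activated = [max(0.0, v) for v in conv1d_same(state, kernel)]  # ReLU
        state = [s + a for s, a in zip(state, activated)]              # shortcut
    return state


kernel = [0.25, 0.5, 0.25]   # the only trainable weights in this sketch
output = recursive_shared_conv([1.0, 0.0, 2.0, 0.0], kernel, iterations=5)
parameter_count = len(kernel)  # stays at 3 however many iterations are run
```

Increasing `iterations` adds depth and computation but no parameters, which matches the trade-off the abstract describes: a small parameter budget at the cost of more computation.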


Metals ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 549
Author(s):  
Ihor Konovalenko ◽  
Pavlo Maruschak ◽  
Vitaly Brevus ◽  
Olegas Prentkovskis

Classification of steel surface defects in the steel industry is essential for their detection and also fundamental for the analysis of causes that lead to damage. Timely detection of defects makes it possible to reduce the frequency of their appearance in the final product. This paper considers classifiers for the recognition of scratches, scrapes and abrasions on metal surfaces. The classifiers are based on the ResNet50 and ResNet152 deep residual neural network architectures. The proposed technique supports the recognition of defects in images and does this with high accuracy. The binary accuracy of the classification based on the test data is 97.14%. The influence of a number of training conditions on the accuracy metrics of the model has been studied. The augmentation conditions have been found to make the greatest contribution to improving the accuracy during training. The peculiarities of damage that cause difficulties in recognition have been studied. The fields of neuron activation have been investigated in the convolutional layers of the model. Feature maps which developed in this case have been found to correspond to the location of the objects of interest. Erroneous cases of the classifier application have been considered.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 342
Author(s):  
Fabio Martinelli ◽  
Fiammetta Marulli ◽  
Francesco Mercaldo ◽  
Antonella Santone

The proliferation of info-entertainment systems in today's vehicles has provided a cheap and easy-to-deploy platform with the ability to gather information about the vehicle under analysis. To provide an architecture that increases safety and security in the automotive context, in this paper we propose a fully connected neural network architecture considering position-based features, aimed at detecting in real time: (i) the driver, (ii) the driving style and (iii) the path. The experimental analysis performed on real-world data shows that the proposed method obtains encouraging results.


2019 ◽  
Vol 11 (5) ◽  
pp. 494 ◽  
Author(s):  
Wei Zhang ◽  
Ping Tang ◽  
Lijun Zhao

Remote sensing image scene classification is one of the most challenging problems in understanding high-resolution remote sensing images. Deep learning techniques, especially the convolutional neural network (CNN), have improved the performance of remote sensing image scene classification owing to their powerful feature learning and reasoning capabilities. However, several fully connected layers are always added to the end of CNN models, which is not efficient in capturing the hierarchical structure of the entities in the images and does not fully consider the spatial information that is important to classification. Fortunately, capsule network (CapsNet), which is a novel network architecture that uses a group of neurons as a capsule or vector to replace the neuron in the traditional neural network and can encode the properties and spatial information of features in an image to achieve equivariance, has become an active area in the classification field in the past two years. Motivated by this idea, this paper proposes an effective remote sensing image scene classification architecture named CNN-CapsNet to make full use of the merits of these two models: CNN and CapsNet. First, a CNN without fully connected layers is used as an initial feature map extractor. In detail, a pretrained deep CNN model that was fully trained on the ImageNet dataset is selected as a feature extractor in this paper. Then, the initial feature maps are fed into a newly designed CapsNet to obtain the final classification result. The proposed architecture is extensively evaluated on three challenging public benchmark remote sensing image datasets: the UC Merced Land-Use dataset with 21 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that the proposed method can lead to a competitive classification performance compared with the state-of-the-art methods.


Author(s):  
Umme Billah ◽  
Hung La ◽  
Alireza Tavakkoli

An autonomous concrete crack inspection system is necessary for preventing hazardous incidents arising from deteriorated concrete surfaces. In this paper, we present a concrete crack detection framework to aid the process of automated inspection. The proposed approach employs a deep convolutional neural network architecture for crack segmentation from concrete images. The proposed network alleviates the effect of the vanishing gradient problem present in deep neural network architectures. A feature silencing module is incorporated in the crack detection framework for eliminating unnecessary feature maps from the network. The overall performance of the network significantly improves as a result. Experimental results support the benefit of incorporating feature silencing within a convolutional neural network architecture for improving the network’s robustness, sensitivity, and specificity. An added benefit of the proposed architecture is its ability to accommodate the trade-off between sensitivity (positive class detection accuracy) and specificity (negative class detection accuracy) with respect to the target application. Furthermore, the proposed framework improves on crack detection architectures present in the literature in both precision rate and processing time.

