DCNN-based Ship Classification using Enhanced Edge Information and Inception Module

Author(s):  
Bo Wang ◽  
Xiaoting Yu ◽  
Chengeng Huang ◽  
Qinghong Sheng ◽  
Yuanyuan Wang ◽  
...  

The excellent feature extraction ability of deep convolutional neural networks (DCNNs) has been demonstrated in many image processing tasks, enabling image classification to achieve high accuracy using only raw input images. However, the specific image features that influence the classification results are not readily determinable, and what lies behind the predictions remains unclear. This study proposes a ship classification method combining the Sobel and Canny operators with an Inception module. The Sobel and Canny operators extract enhanced edge features from the input images. A convolutional layer is replaced with the Inception module, which can automatically select the proper convolution kernel for ship objects in different image regions. The underlying principle is that the high-level features abstracted by the DCNN, and the features obtained by the multi-convolution concatenation of the Inception module, must ultimately derive from the edge information in the preprocessed input images. The classification results are therefore based on the input edge features, which indirectly interprets the predictions to some extent. Experimental results show that combining edge features with the Inception module improves DCNN ship classification performance. The original model trained on the raw dataset achieves an average accuracy of 88.72%, whereas with enhanced edge features as input it achieves the best performance among all models, 90.54%. The model that replaces the fifth convolutional layer with the Inception module achieves its best performance at 89.50%. It performs close to VGG-16 on the raw dataset and significantly better than other deep neural networks. The results validate the functionality and feasibility of the proposed idea.
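
A minimal sketch of the two ingredients described above, assuming OpenCV and PyTorch: Sobel/Canny edge maps stacked as input channels, and a GoogLeNet-style Inception block that concatenates parallel convolutions. Channel counts and Canny thresholds are illustrative assumptions, not values taken from the paper.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

def enhanced_edges(image_path):
    """Stack grayscale, Sobel magnitude, and Canny maps as input channels
    (a guess at the preprocessing; the paper's exact parameters are unknown)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    sobel = np.sqrt(sx**2 + sy**2)
    sobel = (255 * sobel / (sobel.max() + 1e-8)).astype(np.uint8)
    canny = cv2.Canny(gray, 100, 200)  # thresholds are illustrative
    return np.stack([gray, sobel, canny], axis=-1)

class InceptionBlock(nn.Module):
    """GoogLeNet-style Inception block: parallel 1x1, 3x3, 5x5 convolutions
    plus max pooling, concatenated along the channel axis."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 32, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 16, 1), nn.ReLU(),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))
    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
```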

2021 ◽  
Vol 10 ◽  
Author(s):  
Jiarong Zhou ◽  
Wenzhe Wang ◽  
Biwen Lei ◽  
Wenhao Ge ◽  
Yu Huang ◽  
...  

With the increasing daily workload of physicians, computer-aided diagnosis (CAD) systems based on deep learning play an increasingly important role in the pattern recognition of diagnostic medical images. In this paper, we propose a framework based on hierarchical convolutional neural networks (CNNs) for the automatic detection and classification of focal liver lesions (FLLs) in multi-phasic computed tomography (CT). A total of 616 nodules, composed of three types of malignant lesions (hepatocellular carcinoma, intrahepatic cholangiocarcinoma, and metastasis) and three types of benign lesions (hemangioma, focal nodular hyperplasia, and cyst), were randomly divided into training and test sets at an approximate ratio of 3:1. To evaluate the performance of our model, other commonly adopted CNN models and two physicians were included for comparison. Our model achieved the best FLL detection results, with an average test precision of 82.8%, recall of 93.4%, and F1-score of 87.8%. Our model first classified FLLs as malignant or benign and then classified them into more detailed classes. For the binary and six-class classification, our model achieved average accuracies of 82.5% and 73.4%, respectively, better than the other three classification neural networks. Interestingly, the model's classification performance fell between that of a junior physician and a senior physician. Overall, this preliminary study demonstrates that the proposed multi-modality and multi-scale CNN structure can locate and classify FLLs accurately on a limited dataset and could help inexperienced physicians reach a diagnosis in clinical practice.
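
The coarse-to-fine classification step could be sketched as follows; the shared backbone, feature dimension, and the way the binary output conditions the six-class head are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class HierarchicalHead(nn.Module):
    """Sketch of the coarse-to-fine idea: a shared feature extractor feeds
    a binary malignant/benign head, and the coarse probabilities are
    concatenated with the features for the six-class head. The fusion
    scheme is an assumption, not the paper's."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.coarse = nn.Linear(feat_dim, 2)    # malignant vs. benign
        self.fine = nn.Linear(feat_dim + 2, 6)  # HCC, ICC, metastasis, hemangioma, FNH, cyst
    def forward(self, feats):
        coarse_logits = self.coarse(feats)
        fine_in = torch.cat([feats, coarse_logits.softmax(dim=1)], dim=1)
        return coarse_logits, self.fine(fine_in)
```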


2019 ◽  
Author(s):  
Marek A. Pedziwiatr ◽  
Matthias Kümmerer ◽  
Thomas S.A. Wallis ◽  
Matthias Bethge ◽  
Christoph Teufel

Abstract Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic importance across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to saliency models, showing that DeepGaze II, a deep neural network trained to predict fixations based on high-level features rather than meaning, outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
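
One common way to score how well a map (an MM or a DeepGaze II prediction) predicts human fixations is normalized scanpath saliency; a minimal version is sketched below, with the caveat that the paper's exact evaluation pipeline may differ.

```python
import numpy as np

def nss(saliency_map, fixation_xy):
    """Normalized Scanpath Saliency: z-score the map, then average its
    values at fixated pixels. Higher means better fixation prediction.
    fixation_xy is an (N, 2) integer array of (x, y) coordinates."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    rows, cols = fixation_xy[:, 1], fixation_xy[:, 0]
    return s[rows, cols].mean()
```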


Author(s):  
Hannah Garcia Doherty ◽  
Roberto Arnaiz Burgueño ◽  
Roeland P. Trommel ◽  
Vasileios Papanastasiou ◽  
Ronny I. A. Harmanny

Abstract Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform the classification. Visualization of the inner network layers revealed the sections of the input image that are most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector along the channel and feature dimensions, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.
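
A minimal PyTorch sketch of a convolutional block attention module in its usual form (channel attention followed by spatial attention, after Woo et al., 2018); the reduction ratio and kernel size are common defaults rather than values reported above.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention from pooled descriptors, then spatial attention
    from channel-pooled maps, each applied multiplicatively."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))         # spatial attention
```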


Author(s):  
N Seijdel ◽  
N Tsakmakidis ◽  
EHF De Haan ◽  
SM Bohte ◽  
HS Scholte

Abstract Feedforward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans, however, suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations (‘routines’) that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to the performance of DCNNs of increasing depth, we explored if, how, and when object information is differentiated from the backgrounds on which objects appear. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them, by adding noise, manipulating background congruence, and systematically occluding parts of the image. Results indicate that with an increase in network depth comes an increase in the distinction between object and background information. For shallower networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that scene segmentation can, de facto, be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or “binding” features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
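
The stimulus manipulations described above (swapping backgrounds plus adding noise) might look roughly like the following NumPy sketch; the compositing details and parameters are assumptions, not the study's exact procedure.

```python
import numpy as np

def composite(object_img, object_mask, background, noise_std=0.0):
    """Paste a segmented object onto an arbitrary (congruent or incongruent)
    background and optionally add Gaussian noise. object_mask is a boolean
    H x W array; images are H x W x 3 uint8 arrays."""
    img = np.where(object_mask[..., None], object_img, background).astype(np.float32)
    if noise_std > 0:
        img += np.random.normal(0, noise_std, img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)
```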


2019 ◽  
Vol 277 ◽  
pp. 02006 ◽  
Author(s):  
Xubin Ni ◽  
Lirong Yin ◽  
Xiaobing Chen ◽  
Shan Liu ◽  
Bo Yang ◽  
...  

In the field of visual reasoning, image features are widely used as the input to neural networks that produce answers. However, image features are too redundant for regular networks to learn accurate characterizations. In human reasoning, by contrast, an abstract description is usually constructed to avoid irrelevant details. Inspired by this, a higher-level representation named semantic representation is introduced in this paper to make visual reasoning more efficient. The idea of the Gram matrix used in neural style transfer research is adapted here to build a relation matrix, which enables the related information between objects to be better represented. The model using semantic representation as input outperforms the same model using image features as input, verifying that more accurate results can be obtained through the introduction of high-level semantic representation in the field of visual reasoning.
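
The relation matrix borrowed from neural style transfer can be illustrated with a small sketch: a Gram-style inner-product matrix over per-object feature vectors. The normalization and exact construction are assumptions; the abstract does not specify them.

```python
import torch

def relation_matrix(object_feats):
    """Gram-style relation matrix over per-object feature vectors:
    entry (i, j) is the inner product between the (normalized) features
    of object i and object j, so pairwise relations between objects
    are represented explicitly."""
    f = object_feats / (object_feats.norm(dim=1, keepdim=True) + 1e-8)
    return f @ f.t()  # shape: (num_objects, num_objects)
```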


2019 ◽  
Vol 9 (5) ◽  
pp. 1009 ◽  
Author(s):  
Hui Fan ◽  
Meng Han ◽  
Jinjiang Li

Image degradation caused by shadows is likely to cause problems in image segmentation and target recognition. Existing shadow removal methods suffer from issues such as poor handling of small and trivial shadows, a scarcity of end-to-end automatic methods, and neglect of lighting and of high-level semantic information such as materials. An end-to-end deep convolutional neural network is proposed to further improve shadow removal. The network mainly consists of two models: an encoder–decoder network and a small refinement network. The former predicts the alpha shadow scale factor, and the latter refines the result to obtain sharper edge information. In addition, a new image database (the remove shadow database, RSDB) is constructed, and qualitative and quantitative evaluations are performed on databases such as UIUC, UCF, and the newly created RSDB with various real images. Using the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) for quantitative analysis, the algorithm shows a substantial improvement in both PSNR and SSIM over other methods. In qualitative comparisons, the network produces a clearer, shadow-free image that is consistent with the original image's color and texture, and its detail processing is much better. The experimental results show that the proposed algorithm is superior to other algorithms and is more robust in both subjective visual quality and objective quantitative measures.
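
The quantitative evaluation uses two standard metrics that scikit-image implements directly; a minimal sketch, assuming 8-bit RGB arrays of equal shape:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(shadow_free_pred, ground_truth):
    """PSNR and SSIM between a predicted shadow-free image and its
    ground truth, the two measures used in the paper's evaluation."""
    psnr = peak_signal_noise_ratio(ground_truth, shadow_free_pred, data_range=255)
    ssim = structural_similarity(ground_truth, shadow_free_pred,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```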


Author(s):  
Mehmet Sarigul ◽  
Levent Karacan

Since the invention of cameras, video shooting has become a passion for humans. However, the quality of videos recorded with devices such as handheld cameras, head cameras, and vehicle cameras may be low due to shaking, jittering, and unwanted periodic movements. Although video stabilization has been studied for decades, there is no consensus on how to measure the performance of a video stabilization method, and many studies in the literature use different metrics to compare different methods. In this study, deep convolutional neural networks are used as a decision maker for video stabilization. VGG networks with different numbers of layers are used to determine the stability status of videos. The VGG networks achieved a classification performance of up to 96.537% using only two consecutive scenes. These results show that deep learning networks can be utilized as a metric for video stabilization.
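
One plausible reading of "using only two consecutive scenes" is to stack two frames as a six-channel input to a VGG classifier with a binary stable/unstable output; the sketch below makes that assumption explicit, since the abstract does not specify the input encoding.

```python
import torch
import torchvision.models as models

# Replace the first convolution to accept two stacked RGB frames and
# the last layer to output two classes (stable vs. unstable).
vgg = models.vgg16(weights=None)
vgg.features[0] = torch.nn.Conv2d(6, 64, kernel_size=3, padding=1)
vgg.classifier[6] = torch.nn.Linear(4096, 2)

pair = torch.cat([torch.rand(1, 3, 224, 224),   # frame t
                  torch.rand(1, 3, 224, 224)],  # frame t+1
                 dim=1)
logits = vgg(pair)  # stability decision for the frame pair
```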


2020 ◽  
Vol 6 (12) ◽  
pp. 129
Author(s):  
Mario Manzo ◽  
Simone Pellino

Malignant melanoma is the deadliest form of skin cancer, and in recent years its worldwide incidence rate has been growing rapidly. The most effective approach to targeted treatment is early diagnosis. Deep learning algorithms, specifically convolutional neural networks, provide a methodology for image analysis and representation: they automate the feature design task that is essential for automatic approaches on different types of images, including medical images. In this paper, we adopt pretrained deep convolutional neural network architectures for image representation with the purpose of predicting melanoma in skin lesions. First, we apply a transfer learning approach to extract image features. Second, we use the transferred features within an ensemble classification framework. Specifically, the framework trains individual classifiers on balanced subspaces and combines their predictions through statistical measures. Experiments on skin lesion image datasets show the effectiveness of the proposed approach with respect to state-of-the-art competitors.
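
A minimal sketch of the balanced-subspace ensemble, with scikit-learn logistic regression standing in for the unspecified base classifiers and probability averaging standing in for the paper's "statistical measures":

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_balanced_ensemble(features, labels, n_members=10, seed=0):
    """Train individual classifiers on class-balanced subsamples of
    CNN-extracted features (one balanced subspace per member)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    n_min = min((labels == c).sum() for c in classes)
    members = []
    for _ in range(n_members):
        idx = np.concatenate([rng.choice(np.flatnonzero(labels == c),
                                         n_min, replace=False)
                              for c in classes])
        members.append(LogisticRegression(max_iter=1000)
                       .fit(features[idx], labels[idx]))
    return members

def predict(members, features):
    # Combine members by averaging predicted class probabilities.
    probs = np.mean([m.predict_proba(features) for m in members], axis=0)
    return probs.argmax(axis=1)
```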


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Wei Hu ◽  
Yangyu Huang ◽  
Li Wei ◽  
Fan Zhang ◽  
Hengchao Li

Recently, convolutional neural networks have demonstrated excellent performance on various visual tasks, including the classification of common two-dimensional images. In this paper, deep convolutional neural networks are employed to classify hyperspectral images directly in the spectral domain. More specifically, the proposed classifier contains five weighted layers: the input layer, the convolutional layer, the max pooling layer, the fully connected layer, and the output layer. These layers are applied to each spectral signature to discriminate it from the others. Experimental results on several hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance than traditional methods such as support vector machines and conventional deep learning-based methods.
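
The five weighted layers map directly onto a small 1-D PyTorch model over the spectral axis; the kernel sizes and channel counts below are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn

class SpectralCNN(nn.Module):
    """Per-pixel spectral classifier matching the description above:
    input -> 1-D convolution -> max pooling -> fully connected -> output."""
    def __init__(self, n_bands=200, n_classes=16):
        super().__init__()
        self.conv = nn.Conv1d(1, 20, kernel_size=11)  # convolutional layer
        self.pool = nn.MaxPool1d(3)                   # max pooling layer
        flat = 20 * ((n_bands - 10) // 3)
        self.fc = nn.Linear(flat, 100)                # fully connected layer
        self.out = nn.Linear(100, n_classes)          # output layer
    def forward(self, x):                             # x: (batch, 1, n_bands)
        x = torch.tanh(self.pool(self.conv(x)))
        return self.out(torch.tanh(self.fc(x.flatten(1))))
```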


Author(s):  
Xinhang Song ◽  
Shuqiang Jiang ◽  
Luis Herranz

Depth can complement RGB with useful cues about object volumes and scene layout. However, RGB-D image datasets are still too small for directly training deep convolutional neural networks (CNNs), in contrast to the massive monomodal RGB datasets. Previous works in RGB-D recognition typically combine two separate networks for RGB and depth data, pretrained on a large RGB dataset and then fine-tuned on the respective target RGB and depth datasets. These approaches have two limitations: 1) they use only low-level filters learned from RGB data and thus cannot properly exploit depth-specific patterns, and 2) RGB and depth features are combined only at high levels and rarely at lower levels. In this paper, we propose a framework that leverages both knowledge acquired from large RGB datasets and depth-specific cues learned from the limited depth data, obtaining more effective multi-source and multi-modal representations. We propose a multi-modal combination method that selects discriminative combinations of layers from the different source models and target modalities, capturing both high-level properties of the task and intrinsic low-level properties of both modalities.
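
The layer-selection idea can be sketched as pooling and concatenating activations from chosen layers of the two networks; which layers are kept, and concatenation as the fusion rule, are assumptions standing in for the paper's learned selection.

```python
import torch

def fuse_layers(rgb_acts, depth_acts, rgb_keep, depth_keep):
    """Build a multi-modal representation from dicts of layer-name ->
    activation tensors (B, C, H, W): globally pool the selected layers
    of each modality and concatenate the results."""
    pooled = [rgb_acts[k].mean(dim=(2, 3)) for k in rgb_keep]      # (B, C_k)
    pooled += [depth_acts[k].mean(dim=(2, 3)) for k in depth_keep]
    return torch.cat(pooled, dim=1)
```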

