From photos to sketches – how humans and deep neural networks process objects across different levels of visual abstraction

2021 ◽  
Author(s):  
Johannes Janek Daniel Singer ◽  
Katja Seeliger ◽  
Tim Christian Kietzmann ◽  
Martin N Hebart

Line drawings convey meaning with just a few strokes. Despite strong simplifications, humans can recognize objects depicted in such abstracted images without effort. To what degree do deep convolutional neural networks (CNNs) mirror this human ability to generalize to abstracted object images? While CNNs trained on natural images have been shown to exhibit poor classification performance on drawings, other work has demonstrated highly similar latent representations in the networks for abstracted and natural images. Here, we address these seemingly conflicting findings by analyzing the activation patterns of a CNN trained on natural images across a set of photos, drawings and sketches of the same objects and comparing them to human behavior. We find a highly similar representational structure across levels of visual abstraction in early and intermediate layers of the network. This similarity, however, does not translate to later stages in the network, resulting in low classification performance for drawings and sketches. We identify texture bias in CNNs as one factor contributing to the dissimilar representational structure in late layers and the poor performance on drawings. Finally, by fine-tuning late network layers with object drawings, we show that performance can be largely restored, demonstrating the general utility of features learned on natural images in early and intermediate layers for the recognition of drawings. In conclusion, generalization to abstracted images such as drawings appears to be an emergent property of CNNs trained on natural images, which is, however, suppressed by domain-related biases that arise during later processing stages in the network.
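The late-layer fine-tuning described here can be pictured with a short sketch. The following is a minimal PyTorch illustration, not the authors' exact setup: the VGG16 backbone, the ten object categories and the stand-in batch are all assumptions. Early and intermediate convolutional features stay frozen, and only the classifier layers are retrained on drawings.

```python
# Minimal sketch of late-layer fine-tuning on drawings (assumed setup:
# torchvision VGG16 backbone, 10 hypothetical object categories).
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze early and intermediate layers: their features already generalize.
for param in model.features.parameters():
    param.requires_grad = False

# Replace and retrain only the late (classifier) stage on drawings.
num_classes = 10  # hypothetical number of object categories
model.classifier[6] = nn.Linear(4096, num_classes)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One training step on a stand-in batch; real object drawings would go here.
drawings = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
model.train()
optimizer.zero_grad()
loss = criterion(model(drawings), labels)
loss.backward()
optimizer.step()
```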

Author(s):  
Hannah Garcia Doherty ◽  
Roberto Arnaiz Burgueño ◽  
Roeland P. Trommel ◽  
Vasileios Papanastasiou ◽  
Ronny I. A. Harmanny

Identification of individual humans within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform the classification. Visualization of the inner network layers revealed the sections of the input image that are most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector across the channel and spatial dimensions, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.
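For concreteness, a minimal PyTorch implementation of a convolutional block attention module in the style of Woo et al. (2018) is sketched below; the reduction ratio, kernel size and the μ-D feature-map shapes are illustrative, not taken from the paper.

```python
# Minimal CBAM sketch: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: shared MLP over average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # Weight channels first, then weight spatial locations.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        return x * torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], 1)))

# Example: applied to feature maps from a μ-D spectrogram classifier stage.
features = torch.randn(4, 64, 32, 32)
attended = CBAM(64)(features)
```

The module is drop-in: it preserves the feature-map shape, so it can be inserted after any convolutional stage of the classifier.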


Author(s):  
Bo Wang ◽  
Xiaoting Yu ◽  
Chengeng Huang ◽  
Qinghong Sheng ◽  
Yuanyuan Wang ◽  
...  

The excellent feature extraction ability of deep convolutional neural networks (DCNNs) has been demonstrated in many image processing tasks, in which image classification can achieve high accuracy from raw input images alone. However, the specific image features that influence the classification results are not readily determinable, and what lies behind the predictions remains unclear. This study proposes a method combining the Sobel and Canny operators with an Inception module for ship classification. The Sobel and Canny operators obtain enhanced edge features from the input images. A convolutional layer is replaced with the Inception module, which can automatically select the proper convolution kernel for ship objects in different image regions. The principle is that both the high-level features abstracted by the DCNN and the features obtained by the multi-convolution concatenation of the Inception module must ultimately derive from the edge information of the preprocessed input images. This indicates that the classification results are based on the input edge features, which indirectly interprets the classification results to some extent. Experimental results show that the combination of the edge features and the Inception module improves DCNN ship classification performance. The original model with the raw dataset has an average accuracy of 88.72%, while, when using enhanced edge features as input, it achieves the best performance of 90.54% among all models. The model that replaces the fifth convolutional layer with the Inception module achieves 89.50%. It performs close to VGG-16 on the raw dataset and significantly better than other deep neural networks. The results validate the functionality and feasibility of the proposed approach.
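The edge-enhancement preprocessing can be sketched with OpenCV as below. Fusing the Sobel magnitude with the Canny map by element-wise maximum is one plausible reading of the method; the thresholds, kernel size and the fusion rule are illustrative assumptions, as is the sample file name.

```python
# Sketch of Sobel + Canny edge enhancement as a preprocessing step.
import cv2
import numpy as np

def enhance_edges(image_bgr):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Sobel gradients in x and y, combined into a magnitude image.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    sobel = cv2.convertScaleAbs(np.hypot(gx, gy))
    # Canny binary edge map (thresholds are illustrative).
    canny = cv2.Canny(gray, 100, 200)
    # Fuse the two maps; the classifier then sees enhanced edges as input.
    return cv2.max(sobel, canny)

ship = cv2.imread("ship.png")  # hypothetical sample image
edges = enhance_edges(ship)
```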


2021 ◽  
Author(s):  
Akinori Minagi ◽  
Hokuto Hirano ◽  
Kazuhiro Takemoto

Transfer learning from natural images is widely used in deep neural networks (DNNs) for medical image classification to achieve computer-aided clinical diagnosis. Although the adversarial vulnerability of DNNs hinders practical applications owing to the high stakes of diagnosis, adversarial attacks are expected to be limited because the training data often required for such attacks are generally unavailable for reasons of security and privacy preservation. Nevertheless, we hypothesized that adversarial attacks are also possible using natural images, because pre-trained models do not change significantly after fine-tuning. We focused on three representative DNN-based medical image classification tasks (i.e., skin cancer, referable diabetic retinopathy, and pneumonia classification) and investigated whether medical DNN models with transfer learning are vulnerable to universal adversarial perturbations (UAPs) generated from natural images. UAPs from natural images enabled both non-targeted and targeted attacks. Their performance was significantly higher than that of random controls, although slightly lower than that of UAPs generated from the training images. Vulnerability to UAPs from natural images was observed across different natural image datasets and across different model architectures. The use of transfer learning thus opens a security hole, which decreases the reliability and safety of computer-based disease diagnosis. Model training from random initialization (without transfer learning) reduced the performance of UAPs from natural images; however, it did not completely avoid vulnerability to them. The vulnerability of medical DNNs to UAPs from natural images therefore represents a serious security threat.
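To make the threat model concrete, here is a simplified sketch of crafting a non-targeted UAP from natural images by iterative gradient ascent under an L-infinity budget. This is not the paper's exact algorithm; `model` and `natural_loader` are assumed to exist, and the budget, step size and input size are illustrative.

```python
# Simplified non-targeted UAP crafting from natural images (PyTorch).
import torch
import torch.nn.functional as F

def craft_uap(model, natural_loader, eps=0.02, step=0.002, epochs=5):
    """Grow one perturbation that degrades predictions across many images,
    projecting it back into the eps-ball after every update."""
    for p in model.parameters():
        p.requires_grad_(False)  # the victim model itself stays fixed
    model.eval()
    uap = torch.zeros(1, 3, 224, 224, requires_grad=True)
    for _ in range(epochs):
        for images, _ in natural_loader:  # labels of natural images unused
            preds = model(images).argmax(dim=1)   # current clean predictions
            loss = F.cross_entropy(model(images + uap), preds)
            loss.backward()
            with torch.no_grad():
                uap += step * uap.grad.sign()     # ascend the loss
                uap.clamp_(-eps, eps)             # L-infinity projection
            uap.grad.zero_()
    return uap.detach()
```

The key point, matching the finding above, is that `natural_loader` need not contain any medical images: the perturbation transfers because fine-tuned models stay close to their pre-trained weights.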


Author(s):  
Mehmet Sarigul ◽  
Levent Karacan

Since the invention of cameras, video shooting has become a passion for humans. However, the quality of videos recorded with devices such as handheld cameras, head cameras, and vehicle cameras may be low due to shaking, jittering and unwanted periodic movements. Although the issue of video stabilization has been studied for decades, there is no consensus on how to measure the performance of a video stabilization method, and many studies in the literature use different metrics to compare different methods. In this study, deep convolutional neural networks are used as a decision maker for video stabilization. VGG networks with different numbers of layers are used to determine the stability status of videos. The VGG networks achieved a classification performance of up to 96.537% using only two consecutive frames. These results show that deep learning networks can be utilized as a metric for video stabilization.
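One way to feed a frame pair to a VGG-style stability classifier is to stack the two frames channel-wise; the sketch below assumes this presentation (a six-channel first convolution and a two-class head), which is an illustration rather than the paper's confirmed architecture.

```python
# Sketch: VGG16 as a binary stable/unstable classifier on a frame pair.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=None)
# Replace the first convolution to accept a stacked pair of RGB frames.
model.features[0] = nn.Conv2d(6, 64, kernel_size=3, padding=1)
# Two output classes: stable vs. unstable.
model.classifier[6] = nn.Linear(4096, 2)

frame_t  = torch.randn(1, 3, 224, 224)  # stand-ins for consecutive frames
frame_t1 = torch.randn(1, 3, 224, 224)
logits = model(torch.cat([frame_t, frame_t1], dim=1))
verdict = logits.argmax(dim=1)  # e.g. 1 for "stable", 0 for "unstable"
```

Averaging such verdicts over all consecutive frame pairs of a clip yields a single stability score, which is how a classifier can serve as a stabilization metric.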


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2393 ◽  
Author(s):  
Daniel Octavian Melinte ◽  
Luige Vladareanu

The interaction between humans and a NAO robot using deep convolutional neural networks (CNNs) is presented in this paper, based on an end-to-end pipeline that applies two optimized CNNs, one for face recognition (FR) and one for facial expression recognition (FER), in order to obtain real-time inference speed for the entire process. Two different models are considered for FR: one known to be very accurate but with low inference speed (faster region-based convolutional neural network), and one that is less accurate but has high inference speed (single-shot detector convolutional neural network). For emotion recognition, transfer learning and fine-tuning of three CNN models (VGG, Inception V3 and ResNet) have been used. The overall results show that the single-shot detector convolutional neural network (SSD CNN) and faster region-based convolutional neural network (Faster R-CNN) models for face detection achieve almost the same accuracy: 97.8% for Faster R-CNN on PASCAL visual object classes (PASCAL VOC) evaluation metrics and 97.42% for SSD Inception. In terms of FER, ResNet obtained the highest training accuracy (90.14%), while the visual geometry group (VGG) network reached 87% and Inception V3 reached 81%. The results show improvements of over 10% when using two serialized CNNs instead of the FER CNN alone, while the recent optimization method called rectified adaptive moment optimization (RAdam) led to better generalization and an accuracy improvement of 3-4% on each emotion recognition CNN.
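The serialization of the two CNNs can be sketched as follows: the detector localizes faces, and only the crops are passed to the emotion classifier. The `detector(image) -> (boxes, scores)` interface and `fer_model` are invented stand-ins for the trained SSD/Faster R-CNN and the fine-tuned VGG/Inception/ResNet, so this shows the data flow rather than any real API.

```python
# Sketch of the serialized two-stage pipeline: detect faces, then classify
# each cropped face's expression.
import torch
import torch.nn.functional as F

def recognize_emotions(image, detector, fer_model, threshold=0.5):
    # Stage 1: face detection on the full image (hypothetical detector API).
    boxes, scores = detector(image)
    results = []
    for box, score in zip(boxes, scores):
        if score < threshold:
            continue
        x1, y1, x2, y2 = (int(v) for v in box)
        face = image[:, :, y1:y2, x1:x2]          # crop the (N, C, H, W) tensor
        face = F.interpolate(face, size=(224, 224), mode="bilinear")
        # Stage 2: emotion recognition runs on the cropped face only.
        emotion = fer_model(face).argmax(dim=1)
        results.append((box, emotion.item()))
    return results
```

Running FER on tight face crops rather than the full frame is what produces the reported accuracy gain over the stand-alone FER CNN.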


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Wei Hu ◽  
Yangyu Huang ◽  
Li Wei ◽  
Fan Zhang ◽  
Hengchao Li

Recently, convolutional neural networks have demonstrated excellent performance on various visual tasks, including the classification of common two-dimensional images. In this paper, deep convolutional neural networks are employed to classify hyperspectral images directly in the spectral domain. More specifically, the architecture of the proposed classifier contains five layers with weights: the input layer, the convolutional layer, the max pooling layer, the fully connected layer, and the output layer. These five layers are applied to each spectral signature to discriminate it from the others. Experimental results on several hyperspectral image data sets demonstrate that the proposed method achieves better classification performance than traditional methods such as support vector machines and conventional deep learning-based methods.
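A minimal PyTorch sketch of such a spectral-domain classifier is given below. The band count, kernel size, channel widths and class count are illustrative assumptions, not the paper's exact hyperparameters; the structure follows the five-layer description above.

```python
# Sketch: a 1-D CNN applied directly to each pixel's spectral signature.
import torch
import torch.nn as nn

n_bands, n_classes = 200, 16   # assumed band and class counts

model = nn.Sequential(
    nn.Conv1d(1, 20, kernel_size=11),   # convolutional layer over the spectrum
    nn.Tanh(),
    nn.MaxPool1d(3),                    # max pooling layer
    nn.Flatten(),
    nn.Linear(20 * ((n_bands - 10) // 3), 100),  # fully connected layer
    nn.Tanh(),
    nn.Linear(100, n_classes),          # output layer
)

spectra = torch.randn(32, 1, n_bands)   # a batch of per-pixel spectra
logits = model(spectra)                 # per-class scores for each pixel
```

Because the convolution runs along the spectral axis only, each pixel is classified from its own signature, with no spatial context required.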


Author(s):  
Mikhail Krinitskiy ◽  
Polina Verezemskaya ◽  
Kirill Grashchenkov ◽  
Natalia Tilinina ◽  
Sergey Gulev ◽  
...  

Polar mesocyclones (MCs) are small marine atmospheric vortices. Intense MCs, called polar lows, are accompanied by extremely strong surface winds and heat fluxes and thus strongly influence deep ocean water formation in the polar regions. Accurate detection of polar mesocyclones in high-resolution satellite data is challenging and time-consuming when performed manually. Existing algorithms for the automatic detection of polar mesocyclones are based on conventional analysis of cloudiness patterns and involve empirically defined thresholds of geophysical variables. As a result, different detection methods typically yield very different results when applied to the same dataset. We develop a conceptually novel approach for the detection of MCs based on deep convolutional neural networks (DCNNs). As a first step, we demonstrate that a DCNN model is capable of performing binary classification of 500 × 500 km patches of satellite imagery with respect to the presence of MC patterns. The training dataset is based on a reference database of MCs manually tracked in the Southern Hemisphere from satellite mosaics; we use the subset of this database with MC diameters in the range of 200-400 km. This dataset is used for testing several DCNN setups: a DCNN built from scratch, a DCNN based on pre-trained VGG16 weights using transfer learning, and a DCNN based on VGG16 with fine tuning. Each of these networks is applied to infrared (IR) imagery and to a combination of infrared and water vapor (IR+WV) imagery. The best skill (97% binary classification accuracy) is achieved by averaging the estimates of an ensemble of different DCNNs. The algorithm can be further extended to an automatic identification and tracking scheme and applied to other atmospheric phenomena with a distinct signature in satellite imagery.
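The ensemble averaging that gives the best skill can be sketched in a few lines. The tiny linear stand-in models and the two-channel IR+WV patch below are placeholders for the trained DCNNs and real satellite data.

```python
# Sketch: averaging binary MC/no-MC probabilities over a DCNN ensemble.
import torch
import torch.nn as nn

patch = torch.randn(1, 2, 224, 224)     # stand-in IR+WV patch
ensemble = [nn.Sequential(nn.Flatten(), nn.Linear(2 * 224 * 224, 1))
            for _ in range(3)]          # stand-ins for the trained DCNNs

probs = torch.stack([torch.sigmoid(m(patch)) for m in ensemble])
p_mc = probs.mean(dim=0)                # ensemble-averaged P(MC present)
```

Averaging sigmoid outputs rather than hard votes lets confident members dominate uncertain ones, which is one common reason such ensembles outperform their individual members.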


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Emre Kiyak ◽  
Gulay Unal

Purpose: The paper addresses a tracking algorithm based on deep learning; four deep learning tracking models were developed and compared with each other for collision avoidance and target tracking in autonomous aircraft.
Design/methodology/approach: First, detection methods were used to follow the visual target, and then the tracking methods were examined. Four models were developed: deep convolutional neural networks (DCNN), deep convolutional neural networks with fine-tuning (DCNNFN), transfer learning with deep convolutional neural networks (TLDCNN) and fine-tuning deep convolutional neural networks with transfer learning (FNDCNNTL).
Findings: Training the DCNN took 9 min 33 s, with an accuracy of 84%. For DCNNFN, training took 4 min 26 s and the accuracy was 91%. Training the TLDCNN took 34 min 49 s and the accuracy was 95%. With FNDCNNTL, training took 34 min 33 s and the accuracy was nearly 100%.
Originality/value: Compared to results in the literature ranging from 89.4% to 95.6%, FNDCNNTL achieved better results in this paper.
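The distinction between the compared setups can be illustrated briefly: transfer learning keeps the pre-trained feature extractor frozen and trains only a new head, while fine-tuning afterwards also updates the backbone at a small learning rate. The backbone choice and the mapping of acronyms to these two regimes are assumptions drawn from the abstract, not the paper's exact configuration.

```python
# Sketch: transfer learning (frozen backbone) vs. subsequent fine-tuning.
import torch
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Transfer learning (as in TLDCNN): freeze the pre-trained feature extractor
# and train only a new two-class head.
for p in backbone.parameters():
    p.requires_grad = False
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 2)

# Fine-tuning on top of transfer learning (as in FNDCNNTL): once the head
# has converged, unfreeze everything and continue at a small learning rate.
for p in backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-4, momentum=0.9)
```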


2016 ◽  
Vol 66 ◽  
pp. 295-301 ◽  
Author(s):  
Gota Gando ◽  
Taiga Yamada ◽  
Haruhiko Sato ◽  
Satoshi Oyama ◽  
Masahito Kurihara
