Choice of Neural Network Architecture when Recognizing Objects that do not Have High-Level Features

Author(s):  
Gregory Malyshev ◽  
Vyacheslav Andreev ◽  
Olga Andreeva ◽  
Oleg Chistyakov ◽  
Dmitriy Sveshnikov

This article explores the capabilities of pretrained convolutional neural networks for recognizing defects for which no abstract high-level features can be identified. The results of training the convolutional neural network AlexNet and the fully connected classifier of the VGG16 network are compared, and the efficiency of using a pretrained neural network for defect recognition is demonstrated. A graph of the change in the proportion of correctly recognized images during training of the fully connected classifier is presented. The article attempts to explain why a fully connected neural network classifier trained on a critically small dataset of defect images performs well. The operation of a convolutional neural network with a fully connected classifier is investigated. The classifier distinguishes five categories: «crack» defects, «chip» defects, «hole» defects, «multi hole» defects and «defect-free surface». The article provides examples of convolutional network activation channels, visualized for each of the five categories, and describes the defect features on which the network channels activate. The classification errors made by the network are analyzed, and the prediction probabilities below which the network output should be considered doubtful are reported. Practical recommendations for using the trained network are given.
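A minimal sketch of the general setup described above (not the authors' code): a pretrained VGG16 convolutional base is frozen and only a small fully connected classifier is trained for the five defect categories. The layer widths, image size and optimizer settings below are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # crack, chip, hole, multi hole, defect-free surface

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
conv_base = vgg.features            # pretrained convolutional part
for p in conv_base.parameters():
    p.requires_grad = False         # freeze: only the classifier is trained

classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 256),    # 512x7x7 is VGG16's conv output for 224x224 inputs
    nn.ReLU(inplace=True),
    nn.Dropout(0.5),
    nn.Linear(256, NUM_CLASSES),
)

model = nn.Sequential(conv_base, nn.AdaptiveAvgPool2d((7, 7)), classifier)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)     # a dummy batch standing in for defect images
loss = criterion(model(x), torch.randint(0, NUM_CLASSES, (4,)))
loss.backward()
optimizer.step()
```

Freezing the convolutional base is what makes training on a critically small dataset feasible: only the small classifier head has free parameters.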

2020 ◽  
Vol 2020 (10) ◽  
pp. 181-1-181-7
Author(s):  
Takahiro Kudo ◽  
Takanori Fujisawa ◽  
Takuro Yamaguchi ◽  
Masaaki Ikehara

Image deconvolution has recently become an important issue. It has two kinds of approaches: non-blind and blind. Non-blind deconvolution is the classic image-deblurring problem, which assumes that the PSF is known and spatially invariant. Recently, Convolutional Neural Networks (CNNs) have been used for non-blind deconvolution. Although CNNs can deal with complex changes for unknown images, some conventional CNN-based methods can only handle small PSFs and do not consider the large PSFs encountered in the real world. In this paper we propose a non-blind deconvolution framework based on a CNN that can remove large-scale ringing in a deblurred image. Our method has three key points. The first is that our network architecture is able to preserve both large and small features in the image. The second is that the training dataset is created to preserve the details. The third is that we extend the images to minimize the effects of large ringing on the image borders. In our experiments, we used three kinds of large PSFs and observed high-precision results from our method both quantitatively and qualitatively.
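A hedged sketch of the border-extension idea mentioned as the third key point (an assumption of how such a step could look, not the authors' implementation): the blurred image is padded by roughly half the PSF support before the deconvolution network runs, and the output is cropped back, so border ringing falls outside the returned image.

```python
import torch
import torch.nn.functional as F

def deconvolve_with_extension(net, blurred, psf_size):
    pad = psf_size // 2
    extended = F.pad(blurred, (pad, pad, pad, pad), mode="replicate")
    restored = net(extended)                      # any non-blind deconvolution CNN
    return restored[..., pad:-pad, pad:-pad]      # crop back to the original size

# dummy usage: Identity stands in for a real deconvolution network
out = deconvolve_with_extension(torch.nn.Identity(), torch.rand(1, 1, 128, 128), psf_size=31)
```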


2020 ◽  
Vol 10 (6) ◽  
pp. 2104
Author(s):  
Michał Tomaszewski ◽  
Paweł Michalski ◽  
Jakub Osuchowski

This article presents an analysis of the effectiveness of object detection in digital images with a limited quantity of input data. The possibility of using a limited training set was achieved by developing a detailed scenario for the task, which strictly defined the operating conditions of the detector in the considered convolutional neural network case. The described solution utilizes known deep neural network architectures for learning and object detection. The article compares detection results of the most popular deep neural networks while maintaining a limited training set composed of a specific number of images selected from diagnostic video. The analyzed input material was recorded during an inspection flight along high-voltage lines, and the object detector was built for a power insulator. The main contribution of the presented paper is the evidence that a limited training set (in our case, just 60 training frames) can be used for object detection, assuming an outdoor scenario with low variability of environmental conditions. Deciding which network will generate the best result for such a limited training set is not a trivial task. The conducted research suggests that deep neural networks achieve different levels of effectiveness depending on the amount of training data. The most beneficial results were obtained for two convolutional neural networks: the faster region-based convolutional neural network (Faster R-CNN) and the region-based fully convolutional network (R-FCN). Faster R-CNN reached the highest AP (average precision), at a level of 0.8 for 60 frames. The R-FCN model reached a lower AP; however, the number of input samples influences its results considerably less than it does for the other CNN models, which, in the authors' assessment, is a desirable feature in the case of a limited training set.
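For orientation, a sketch of how a pretrained Faster R-CNN can be fine-tuned for a single "power insulator" class with a small frame set; the torchvision model and predictor swap are standard, but the image sizes, box and hyper-parameters below are placeholders, not the paper's settings.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 2  # background + power insulator

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.005, momentum=0.9
)

# one dummy training step, standing in for one of the ~60 annotated frames
images = [torch.rand(3, 480, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 120.0, 220.0, 260.0]]),
    "labels": torch.tensor([1]),
}]
model.train()
loss_dict = model(images, targets)     # classification + box-regression losses
sum(loss_dict.values()).backward()
optimizer.step()
```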


2019 ◽  
Vol 9 (19) ◽  
pp. 4182 ◽  
Author(s):  
Pu Yan ◽  
Li Zhuo ◽  
Jiafeng Li ◽  
Hui Zhang ◽  
Jing Zhang

Pedestrian attributes (such as gender, age, hairstyle, and clothing) can effectively represent the appearance of pedestrians. These are high-level semantic features that are robust to illumination, deformation, etc., and can therefore be widely used in person re-identification, video structuring analysis and other applications. In this paper, a pedestrian attribute recognition method for surveillance scenarios using a multi-task lightweight convolutional neural network is proposed. Firstly, the attribute labels of each pedestrian image are integrated into a label vector. Then, a multi-task lightweight Convolutional Neural Network (CNN) is designed, consisting of five convolutional layers, three pooling layers and two fully connected layers, to extract deep features of pedestrian images. Because the data distribution of the datasets is unbalanced, the loss function is improved based on the sigmoid cross-entropy, and a scale factor is added to balance the amounts of data for the various attributes. By training the network, a mapping-relationship model between the deep features of pedestrian images and the integrated label vector of their attributes is established, which can be used to predict each attribute of a pedestrian. Experiments were conducted on two public pedestrian attribute datasets for surveillance scenarios, PETA and RAP. The results show that, compared with state-of-the-art pedestrian attribute recognition methods, the proposed method achieves a superior accuracy of 91.88% on PETA and 87.44% on RAP, respectively.
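A short sketch of the kind of class-balanced sigmoid cross-entropy described above (the exact scale-factor scheme used by the authors is not reproduced): rare attributes receive a larger positive weight so they are not drowned out by frequent ones. The per-attribute frequencies below are made-up placeholders.

```python
import torch
import torch.nn as nn

positive_ratio = torch.tensor([0.55, 0.10, 0.30, 0.05])    # assumed attribute frequencies
pos_weight = (1.0 - positive_ratio) / positive_ratio        # scale factor per attribute

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)     # weighted sigmoid cross-entropy

logits = torch.randn(8, 4)                      # network outputs for 8 images, 4 attributes
labels = torch.randint(0, 2, (8, 4)).float()    # integrated attribute label vectors
loss = criterion(logits, labels)
```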


2018 ◽  
Vol 4 (1) ◽  
pp. 297-300 ◽  
Author(s):  
Mattias P. Heinrich ◽  
Maik Stille ◽  
Thorsten M. Buzug

Low-dose CT has received increasing attention in recent years and is considered a promising method to reduce the risk of cancer in patients. However, the reduction of the dosage leads to quantum noise in the raw data, which is carried over into the reconstructed images. Two different multilayer convolutional neural network (CNN) architectures for the denoising of CT images are investigated: ResFCN, based on a fully convolutional network that consists of three blocks of 5×5 convolution filters, and ResUNet, which is trained with 10 convolutional blocks arranged in a multi-scale fashion. Both architectures feature a residual connection of the input image to ease learning. Training images are based on realistic simulations using the XCAT phantom. The ResUNet approach shows the most promising results, with a peak signal-to-noise ratio of 44.00 compared to 41.79 for ResFCN.
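An illustrative sketch of a ResFCN-style denoiser with three 5×5 convolutional blocks and a residual connection from the input, so the network only has to learn the noise component; the channel count and PSNR computation are assumptions added for the example.

```python
import torch
import torch.nn as nn

class ResFCN(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 5, padding=2),
        )

    def forward(self, x):
        return x + self.body(x)     # residual connection to the input image

model = ResFCN()
noisy = torch.rand(1, 1, 256, 256)      # simulated low-dose slice, scaled to [0, 1]
clean = torch.rand(1, 1, 256, 256)
mse = nn.functional.mse_loss(model(noisy), clean)
psnr = 10 * torch.log10(1.0 / mse)      # peak signal-to-noise ratio in dB
```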


2021 ◽  
Vol 22 (8) ◽  
pp. 4023
Author(s):  
Huimin Shen ◽  
Youzhi Zhang ◽  
Chunhou Zheng ◽  
Bing Wang ◽  
Peng Chen

Accurate prediction of the binding affinity between a protein and a ligand is a very important step in drug discovery. Although many methods based on different assumptions and rules do exist, the prediction performance for protein–ligand binding affinity is so far not satisfactory. This paper proposes a new cascade graph-based convolutional neural network architecture for dealing with non-Euclidean, irregular data. We represent the molecule as a graph and use a simple linear transformation to deal with the sparsity of the one-hot encoding of the original data. The first stage adopts an ARMA graph convolutional neural network to learn the characteristics of the atomic space in the protein–ligand complex. In the second stage, a variant of the MPNN graph convolutional neural network is introduced that incorporates chemical bond information and interactive atomic features. Finally, the architecture passes through a global add pool and a fully connected layer and outputs a single value as the predicted binding affinity. Experiments on the PDBbind v2016 dataset showed that our method is better than most current methods; it is also comparable to the state-of-the-art method on this dataset, while being more intuitive and simple.
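A rough sketch of such a two-stage cascade using PyTorch Geometric (layer sizes and the NNConv edge network are assumptions, not the paper's exact configuration): a linear embedding of the sparse one-hot atom features, an ARMA graph convolution, an MPNN-style NNConv that injects bond (edge) features, global add pooling, and a fully connected head that outputs a single affinity value.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import ARMAConv, NNConv, global_add_pool

class CascadeGCN(nn.Module):
    def __init__(self, node_dim=32, edge_dim=8, hidden=64):
        super().__init__()
        self.embed = nn.Linear(node_dim, hidden)          # linear map for sparse one-hot input
        self.arma = ARMAConv(hidden, hidden)              # stage 1: atomic space
        edge_net = nn.Sequential(nn.Linear(edge_dim, hidden * hidden))
        self.mpnn = NNConv(hidden, hidden, edge_net, aggr="add")   # stage 2: bond information
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x, edge_index, edge_attr, batch):
        h = self.embed(x)
        h = torch.relu(self.arma(h, edge_index))
        h = torch.relu(self.mpnn(h, edge_index, edge_attr))
        return self.head(global_add_pool(h, batch)).squeeze(-1)   # one value per complex
```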


2020 ◽  
Vol 39 (3) ◽  
pp. 169-185
Author(s):  
Omran Salih ◽  
Serestina Viriri

Deep learning techniques such as deep convolutional networks have achieved great success in skin lesion segmentation towards melanoma detection. Performance is, however, restrained by distinctive and challenging features of skin lesions such as irregular and fuzzy borders, the presence of noise and artefacts, and low contrast between lesions and the surrounding skin. The methods are also restricted by the scarcity of annotated lesion images for training and by limited computing resources. Recent research in convolutional neural networks (CNNs) has provided a variety of new architectures for deep learning. One interesting new architecture is the local binary convolutional neural network (LBCNN), which can reduce the workload of CNNs and improve classification accuracy. The proposed framework employs local binary convolution instead of the standard convolution in a U-net architecture, in order to obtain a reduced-size deep convolutional encoder-decoder network with a loss function suited to robust segmentation. The proposed framework replaces the encoder part of U-net with LBCNN layers and automatically learns and segments complex features of skin lesion images. The encoder stage learns contextual information by extracting discriminative features, while the decoder stage captures the lesion boundaries of the skin images. This addresses the issue of encoder-decoder networks producing coarse segmented output for challenging skin lesion appearances, such as low contrast between healthy and unhealthy tissue and fine-grained variability, and also addresses issues with multi-size, multi-scale and multi-resolution skin lesion images. The deep convolutional network adopts a reduced-size design with just five levels of encoding-decoding, which greatly reduces the consumption of computational processing resources. The system was evaluated on the publicly available ISIC and PH2 datasets and outperforms most of the existing state-of-the-art methods.
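A sketch of a local binary convolution layer in the spirit of LBCNN (an assumption about the construction, not the authors' exact code): a fixed, sparse, non-trainable {-1, 0, +1} convolution followed by a learnable 1×1 convolution, so far fewer parameters are trained than in a standard convolutional layer.

```python
import torch
import torch.nn as nn

class LocalBinaryConv(nn.Module):
    def __init__(self, in_ch, out_ch, num_anchors=16, sparsity=0.5):
        super().__init__()
        w = torch.sign(torch.randn(num_anchors, in_ch, 3, 3))
        w[torch.rand_like(w) > sparsity] = 0.0              # sparse binary anchor weights
        self.anchor = nn.Conv2d(in_ch, num_anchors, 3, padding=1, bias=False)
        self.anchor.weight = nn.Parameter(w, requires_grad=False)   # fixed, not trained
        self.combine = nn.Conv2d(num_anchors, out_ch, 1)             # only this 1x1 conv is learned

    def forward(self, x):
        return self.combine(torch.relu(self.anchor(x)))

out = LocalBinaryConv(3, 32)(torch.rand(1, 3, 64, 64))   # drop-in replacement for an encoder conv
```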


2021 ◽  
pp. 1-41
Author(s):  
Ethan Harris ◽  
Daniela Mihai ◽  
Jonathon Hare

Recent work suggests that changing convolutional neural network (CNN) architecture by introducing a bottleneck in the second layer can yield changes in learned function. Fully understanding this relationship requires a way of quantitatively comparing trained networks. The fields of electrophysiology and psychophysics have developed a wealth of methods for characterizing visual systems that permit such comparisons. Inspired by these methods, we propose an approach to obtaining spatial and color tuning curves for convolutional neurons that can be used to classify cells in terms of their spatial and color opponency. We perform these classifications for a range of CNNs with different depths and bottleneck widths. Our key finding is that networks with a bottleneck show a strong functional organization: almost all cells in the bottleneck layer become both spatially and color opponent, and cells in the layer following the bottleneck become nonopponent. The color tuning data can further be used to form a rich understanding of how a network encodes color. As a concrete demonstration, we show that shallower networks without a bottleneck learn a complex nonlinear color system, whereas deeper networks with tight bottlenecks learn a simple channel-opponent code in the bottleneck layer. We develop a method of obtaining a hue sensitivity curve for a trained CNN that enables high-level insights that complement the low-level findings from the color tuning data. We go on to train a series of networks under different conditions to ascertain the robustness of the discussed results. Ultimately our methods and findings coalesce with prior art, strengthening our ability to interpret trained CNNs and furthering our understanding of the connection between architecture and learned representation. Trained models and code for all experiments are available at https://github.com/ecs-vlc/opponency.
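A minimal sketch of obtaining a hue sensitivity curve of the kind described above (the probing stimulus and function names are assumptions; the authors' exact procedure is in the linked repository): feed uniform images whose hue sweeps the color circle and record a chosen channel's mean response.

```python
import colorsys
import torch
import torch.nn as nn

def hue_curve(model, channel, steps=64):
    responses = []
    for k in range(steps):
        r, g, b = colorsys.hsv_to_rgb(k / steps, 1.0, 1.0)
        img = torch.tensor([r, g, b]).view(1, 3, 1, 1) * torch.ones(1, 3, 64, 64)
        with torch.no_grad():
            responses.append(model(img)[0, channel].mean().item())   # mean activation at this hue
    return responses

# probe channel 5 of a small random two-layer CNN as a stand-in for a trained model
toy = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 16, 3, padding=1))
curve = hue_curve(toy, channel=5)
```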


2019 ◽  
Vol 14 ◽  
pp. 155892501989739 ◽  
Author(s):  
Zhoufeng Liu ◽  
Chi Zhang ◽  
Chunlei Li ◽  
Shumin Ding ◽  
Yan Dong ◽  
...  

Fabric defect recognition is an important measure for quality control in a textile factory. This article utilizes a deep convolutional neural network to recognize defects in fabrics that have complicated textures. Although convolutional neural networks are very powerful, their large number of parameters consumes considerable computation time and memory bandwidth. In real-world applications, however, the fabric defect recognition task needs to be carried out in a timely fashion on a computation-limited platform. To optimize a deep convolutional neural network, a novel method is introduced to reveal the input pattern that originally caused a specific activation in the network feature maps. Using this visualization technique, this study visualizes the features in a fully trained convolutional model and changes the architecture of the original neural network to reduce the computational load. After a series of improvements, a new convolutional network is obtained that is more efficient for fabric image feature extraction; its computation load and total number of parameters are 23% and 8.9%, respectively, of those of the original model. The proposed neural network is specifically tailored for fabric defect recognition in resource-constrained environments. All of the source code and pretrained models are available online at https://github.com/ZCmeteor .
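A hedged sketch of one common way to reveal which input pattern drives a given feature-map activation (a gradient-based approximation, not necessarily the visualization technique used in the article; the authors' code is in the linked repository): back-propagate the activation of one channel to the input fabric image.

```python
import torch
import torch.nn as nn

def channel_saliency(conv_layers, image, channel):
    image = image.clone().requires_grad_(True)
    feature_maps = conv_layers(image)
    feature_maps[0, channel].sum().backward()     # gradient of one channel's total activation
    return image.grad.abs()                       # input pattern the channel responds to

layers = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                       nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
saliency = channel_saliency(layers, torch.rand(1, 1, 128, 128), channel=7)
```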


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 450
Author(s):  
Xudong Li ◽  
Jianhua Zheng ◽  
Mingtao Li ◽  
Wenzhen Ma ◽  
Yang Hu

In recent years, transfer learning has been widely applied in fault diagnosis to solve the problem of inconsistent distributions between the original training dataset and the online-collected testing dataset. In particular, domain adaptation methods can handle an unlabeled testing dataset in transfer learning. The Convolutional Neural Network (CNN) is the most widely used network among existing domain adaptation approaches due to its powerful feature extraction capability. However, network design is largely empirical, and there is no design principle formulated from the frequency domain. In this paper, we propose a unified convolutional neural network architecture for domain adaptation from a frequency-domain perspective, named the Frequency-domain Fusing Convolutional Neural Network (FFCNN). FFCNN consists of two parts: a frequency-domain fusing layer and a feature extractor. The frequency-domain fusing layer uses convolution operations to filter signals in different frequency bands and combines them into new input signals, which are fed to the feature extractor to extract features and perform domain adaptation. We apply FFCNN to three domain adaptation methods, and the diagnosis accuracy is improved compared to the typical CNN.
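A sketch of a frequency-domain fusing layer in the spirit described above (filter sizes and channel counts are assumptions): several parallel 1-D convolutions act as filters for different frequency bands of the raw vibration signal, and their outputs are concatenated into the new input fed to the feature extractor.

```python
import torch
import torch.nn as nn

class FrequencyFusingLayer(nn.Module):
    def __init__(self, kernel_sizes=(9, 33, 65)):
        super().__init__()
        # longer kernels can capture lower-frequency content, shorter ones higher-frequency content
        self.bands = nn.ModuleList(
            [nn.Conv1d(1, 1, k, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x):                       # x: (batch, 1, signal_length)
        return torch.cat([band(x) for band in self.bands], dim=1)

fused = FrequencyFusingLayer()(torch.randn(4, 1, 1024))   # -> (4, 3, 1024) fused input signals
```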


2019 ◽  
Vol 8 (7) ◽  
pp. 300 ◽  
Author(s):  
Recep Can ◽  
Sultan Kocaman ◽  
Candan Gokceoglu

Several scientific processes benefit from Citizen Science (CitSci) and Volunteered Geographical Information (VGI) with the help of mobile and geospatial technologies, and studies on landslides can take advantage of these approaches to a great extent. However, the quality of the data collected by both approaches is often questionable, and automated procedures are needed to check it. In the present study, a convolutional neural network (CNN) architecture is proposed to validate landslide photos collected by citizens or non-experts and integrated into a mobile- and web-based GIS environment designed specifically for a landslide CitSci project. VGG16 was used as the base model since it allows fine-tuning, and high performance could be achieved by selecting the best hyper-parameters. Although the training dataset was small, the proposed CNN architecture was found to be effective, identifying the landslide photos with 94% precision. The accuracy of the results is sufficient for the purpose and could be improved further using a larger amount of training data, which is expected to be obtained with the help of volunteers.
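A sketch of the kind of VGG16 fine-tuning setup described above (which block is unfrozen and the head size are assumptions, not the authors' hyper-parameters): reuse the pretrained features, unfreeze only the last convolutional block, and train a small binary landslide / non-landslide head.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False
for p in model.features[24:].parameters():      # unfreeze only the last conv block for fine-tuning
    p.requires_grad = True
model.classifier[6] = nn.Linear(4096, 2)        # landslide vs. non-landslide

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-5
)
```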

