Analysis of the proficiency of fully connected neural networks in the process of classifying digital images. Benchmark of different classification algorithms on high-level image features from convolutional layers

2019 ◽  
Vol 135 ◽  
pp. 12-38 ◽  
Author(s):  
Jonathan Janke ◽  
Mauro Castelli ◽  
Aleš Popovič
Author(s):  
Bo Wang ◽  
Xiaoting Yu ◽  
Chengeng Huang ◽  
Qinghong Sheng ◽  
Yuanyuan Wang ◽  
...  

The excellent feature extraction ability of deep convolutional neural networks (DCNNs) has been demonstrated in many image processing tasks, by which image classification can achieve high accuracy with only raw input images. However, the specific image features that influence the classification results are not readily determinable, and what lies behind the predictions remains unclear. This study proposes a method combining the Sobel and Canny operators with an Inception module for ship classification. The Sobel and Canny operators obtain enhanced edge features from the input images. A convolutional layer is replaced with the Inception module, which can automatically select the proper convolution kernel for ship objects in different image regions. The principle is that the high-level features abstracted by the DCNN and the features obtained by the multi-convolution concatenation of the Inception module must ultimately derive from the edge information of the preprocessed input images. This indicates that the classification results are based on the input edge features, which provides an indirect, partial interpretation of the predictions. Experimental results show that the combination of the edge features and the Inception module improves DCNN ship classification performance. The original model on the raw dataset has an average accuracy of 88.72%, while the model using enhanced edge features as input achieves the best performance of 90.54% among all models. The model that replaces the fifth convolutional layer with the Inception module performs best among the Inception-based variants, reaching 89.50%. It performs close to VGG-16 on the raw dataset and is significantly better than other deep neural networks. The results validate the functionality and feasibility of the proposed idea.
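A minimal sketch of the edge-enhancement step described above, assuming OpenCV and arbitrary blending weights and Canny thresholds (not the authors' implementation):

```python
# Sketch of edge-feature preprocessing with Sobel and Canny (OpenCV).
# Blending weights and thresholds are illustrative assumptions, not the
# values used in the paper.
import cv2
import numpy as np

def edge_enhanced_input(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # Sobel gradients in x and y, combined into a gradient-magnitude map.
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    sobel = cv2.convertScaleAbs(np.sqrt(gx ** 2 + gy ** 2))

    # Canny edge map with hand-picked thresholds.
    canny = cv2.Canny(gray, 100, 200)

    # Blend the original image with both edge maps to enhance contours
    # before feeding the result to the DCNN.
    enhanced = cv2.addWeighted(gray, 0.6, sobel, 0.2, 0)
    enhanced = cv2.addWeighted(enhanced, 1.0, canny, 0.2, 0)
    return enhanced
```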


2019 ◽  
Vol 277 ◽  
pp. 02006 ◽  
Author(s):  
Xubin Ni ◽  
Lirong Yin ◽  
Xiaobing Chen ◽  
Shan Liu ◽  
Bo Yang ◽  
...  

In the field of visual reasoning, image features are widely used as the input of neural networks to obtain answers. However, image features are too redundant for regular networks to learn accurate characterizations. In human reasoning, by contrast, abstract descriptions are usually constructed to avoid irrelevant details. Inspired by this, a higher-level representation named semantic representation is introduced in this paper to make visual reasoning more efficient. The idea of the Gram matrix used in neural style transfer research is adopted here to build a relation matrix that better represents the relational information between objects. The model using semantic representation as input outperforms the same model using image features as input, which verifies that more accurate results can be obtained through the introduction of a high-level semantic representation in the field of visual reasoning.
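The Gram-matrix idea borrowed from style transfer can be sketched as follows: given one feature vector per detected object, the matrix of pairwise inner products summarizes how objects relate to one another. Feature dimensions and normalization here are assumptions for illustration, not the paper's exact formulation:

```python
# Sketch of a Gram-style relation matrix over per-object features.
import torch

def relation_matrix(object_features: torch.Tensor) -> torch.Tensor:
    """object_features: (num_objects, feature_dim) tensor, one row per object."""
    # Pairwise inner products between object descriptors, as in a Gram matrix.
    gram = object_features @ object_features.t()            # (N, N)
    # Normalize by feature_dim so the scale is independent of the descriptor size.
    return gram / object_features.shape[1]

# Example: 5 detected objects with 256-dimensional descriptors.
feats = torch.randn(5, 256)
rel = relation_matrix(feats)   # 5 x 5 matrix of pairwise relations
```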


Author(s):  
Hui Luo ◽  
Ming-Xing Luo ◽  
Kai Wang ◽  
Tao Xu ◽  
GuoHuai Zhu

2019 ◽  
Author(s):  
Marek A. Pedziwiatr ◽  
Matthias Kümmerer ◽  
Thomas S.A. Wallis ◽  
Matthias Bethge ◽  
Christoph Teufel

Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic importance across an image, have recently been proposed in support of the hypothesis that meaning, rather than image features, guides human gaze. MMs have the potential to be an important tool far beyond eye-movement research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to that of saliency models, showing that DeepGaze II, a deep neural network trained to predict fixations based on high-level features rather than meaning, outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
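As a generic illustration of how fixation-prediction performance is commonly scored (not necessarily the exact pipeline of this study), a prediction map can be evaluated against recorded fixation locations with an ROC-style AUC, treating fixated pixels as positives:

```python
# Sketch of AUC scoring for a fixation-prediction map.
# Map shape and fixation coordinates are made up for illustration.
import numpy as np
from sklearn.metrics import roc_auc_score

def fixation_auc(prediction_map, fixations):
    """prediction_map: 2-D array of predicted salience/meaning values.
    fixations: list of (row, col) pixel coordinates of human fixations."""
    labels = np.zeros(prediction_map.shape, dtype=int)
    for r, c in fixations:
        labels[r, c] = 1
    return roc_auc_score(labels.ravel(), prediction_map.ravel())

# Example with a random map and three fixations.
pred = np.random.rand(480, 640)
print(fixation_auc(pred, [(100, 200), (240, 320), (400, 500)]))
```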


2021 ◽  
Vol 11 (22) ◽  
pp. 11076
Author(s):  
Xabier Cid Vidal ◽  
Lorena Dieste Maroñas ◽  
Álvaro Dosil Suárez

The popularity of Machine Learning (ML) has been increasing in recent decades in almost every area, with the commercial and scientific fields being the most prominent. In particle physics, ML has proven to be a useful resource for making the most of projects such as the Large Hadron Collider (LHC). The main advantages provided by ML are a reduction in the time and effort required for the measurements carried out by experiments, and improvements in performance. With this work we aim to encourage scientists working with particle colliders to use ML and to try the different alternatives that are available, focusing on the separation of signal and background. We assess some of the most-used libraries in the field, such as the Toolkit for Multivariate Data Analysis (TMVA) with ROOT, as well as newer and more sophisticated options such as PyTorch and Keras. We also assess the suitability of some of the most common algorithms for signal-background discrimination, such as Boosted Decision Trees, and propose the use of others, namely Neural Networks. We compare the overall performance of different algorithms and libraries on simulated LHC data and produce guidelines to help analysts deal with different situations, such as the use of low- or high-level features from particle detectors or the amount of data available for training the algorithms. Our main conclusion is that the algorithms and libraries used most frequently in LHC collaborations might not always be those that provide the best results for the classification of signal candidates, and that fully connected Neural Networks trained with Keras can improve the performance scores in most of the cases we consider.
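A minimal sketch of the kind of fully connected Keras classifier advocated here for signal-background separation; the feature count, layer widths and training settings are illustrative assumptions, not those used in the paper:

```python
# Fully connected Keras network for signal vs. background classification.
# All hyperparameters below are placeholders for illustration.
import numpy as np
from tensorflow import keras

n_features = 10  # e.g. low- or high-level detector variables

model = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # output interpreted as P(signal)
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[keras.metrics.AUC()])

# Placeholder arrays standing in for simulated signal/background events.
X = np.random.rand(10000, n_features)
y = np.random.randint(0, 2, size=10000)
model.fit(X, y, epochs=5, batch_size=256, validation_split=0.2)
```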


2021 ◽  
Author(s):  
Quoc Vuong

Images are extremely effective at eliciting emotional responses in observers and have been frequently used to investigate the neural correlates of emotion. However, the image features producing this emotional response remain unclear. This study used biologically inspired computational models of the brain to test the hypothesis that these emotional responses can be attributed to the estimation of the arousal and valence of objects, scenes and facial expressions in the images. Convolutional neural networks were used to extract all, or various combinations, of high-level image features related to objects, scenes and facial expressions. Subsequent deep feedforward neural networks predicted the images' arousal and valence values. The model was provided with thousands of pre-annotated images to learn the relationship between the high-level features and the images' arousal and valence values. The relationship between arousal and valence was assessed by comparing models that learnt the two constructs either separately or together. The results confirmed the effectiveness of using these features to predict human emotion, as well as their ability to augment one another. When utilising the object, scene and facial expression information together, the model classified arousal and valence with accuracies of 88% and 87%, respectively. The effectiveness of our deep neural network of emotion perception strongly suggests that these same high-level features play a critical role in producing humans' emotional responses. Moreover, performance increased across all models when arousal and valence were learnt together, suggesting a dependent relationship between these affective dimensions. These results open up numerous avenues for future work, while also bridging the gap between affective neuroscience and computer vision.
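A rough sketch of the architecture described above: a pretrained CNN acts as a high-level feature extractor feeding a small feedforward head with two outputs, one for arousal and one for valence, learnt jointly. The backbone choice, layer sizes and loss are assumptions for illustration; input preprocessing is omitted for brevity:

```python
# Pretrained CNN features feeding a feedforward arousal/valence head (Keras).
# Backbone, layer sizes and loss are illustrative assumptions.
from tensorflow import keras

# High-level image features from a pretrained CNN (weights frozen).
backbone = keras.applications.ResNet50(include_top=False, pooling="avg",
                                       input_shape=(224, 224, 3))
backbone.trainable = False

inputs = keras.Input(shape=(224, 224, 3))
features = backbone(inputs)                                  # (batch, 2048) features
x = keras.layers.Dense(256, activation="relu")(features)
x = keras.layers.Dense(64, activation="relu")(x)
outputs = keras.layers.Dense(2, name="arousal_valence")(x)   # both dimensions learnt jointly

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```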


Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2193
Author(s):  
Rihards Novickis ◽  
Daniels Jānis Justs ◽  
Kaspars Ozols ◽  
Modris Greitāns

Artificial Neural Networks (ANNs) have become an accepted approach for a wide range of challenges. Meanwhile, the advancement of chip manufacturing processes is approaching saturation, which calls for new computing solutions. This work presents a novel approach to the development of an FPGA-based accelerator for fully connected feed-forward neural networks (FFNNs). A specialized tool was developed to facilitate different implementations; it splits an FFNN into elementary layers, allocates computational resources and generates a high-level C++ description for high-level synthesis (HLS) tools. Various topologies are implemented and benchmarked, and a comparison with related work is provided. The proposed methodology is applied to the implementation of a high-throughput virtual sensor.
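The layer-splitting and code-generation workflow can be sketched in a few lines: from a list of layer widths, a toy generator emits one C++-style function declaration per elementary fully connected layer, which an HLS flow could then synthesize. Everything below is hypothetical and only illustrates the idea, not the authors' tool or any specific HLS product's input format:

```python
# Toy sketch: split an FFNN into elementary layers and emit a C++-style
# declaration per layer for an HLS flow. Purely illustrative.
def emit_hls_layers(layer_sizes):
    """layer_sizes: e.g. [64, 128, 128, 10] for a three-layer fully connected net."""
    snippets = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes, layer_sizes[1:])):
        snippets.append(
            f"void layer{i}(const float in[{n_in}], const float w[{n_out}][{n_in}],\n"
            f"            const float b[{n_out}], float out[{n_out}]);"
        )
    return "\n".join(snippets)

print(emit_hls_layers([64, 128, 128, 10]))
```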


2020 ◽  
Vol 34 (07) ◽  
pp. 11418-11425 ◽  
Author(s):  
Xiangtai Li ◽  
Houlong Zhao ◽  
Lei Han ◽  
Yunhai Tong ◽  
Shaohua Tan ◽  
...  

Semantic segmentation generates a comprehensive understanding of scenes by densely predicting the category of each pixel. High-level features from Deep Convolutional Neural Networks have already demonstrated their effectiveness in semantic segmentation tasks; however, the coarse resolution of high-level features often leads to inferior results for small/thin objects, where detailed information is important. It is natural to consider importing low-level features to compensate for the detailed information lost in high-level features. Unfortunately, simply combining multi-level features suffers from the semantic gap among them. In this paper, we propose a new architecture, named Gated Fully Fusion (GFF), to selectively fuse features from multiple levels using gates in a fully connected way. Specifically, features at each level are enhanced by higher-level features with stronger semantics and lower-level features with more details, and gates are used to control the propagation of useful information, which significantly reduces noise during fusion. We achieve state-of-the-art results on four challenging scene parsing datasets, including Cityscapes, Pascal Context, COCO-stuff and ADE20K.
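The gating idea can be sketched as follows: each feature level is modulated by a learned per-pixel gate before being combined with the other levels, so that only useful information propagates across levels. Channel counts, the gating formula and the upsampling step below are assumptions for illustration, not the exact GFF formulation:

```python
# Sketch of gate-controlled fusion of two feature levels (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions producing per-pixel gates in [0, 1] for each level.
        self.gate_low = nn.Conv2d(channels, 1, kernel_size=1)
        self.gate_high = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, low_feat, high_feat):
        # Upsample the coarse high-level features to the low-level resolution.
        high_feat = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                  mode="bilinear", align_corners=False)
        g_low = torch.sigmoid(self.gate_low(low_feat))
        g_high = torch.sigmoid(self.gate_high(high_feat))
        # Each level contributes through its own gate, suppressing noisy signals.
        return g_low * low_feat + g_high * high_feat

# Example: fuse a 1/4-resolution and a 1/16-resolution feature map.
fuse = GatedFusion(channels=256)
low = torch.randn(1, 256, 128, 128)
high = torch.randn(1, 256, 32, 32)
out = fuse(low, high)   # (1, 256, 128, 128)
```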


Author(s):  
Y.A. Hamad ◽  
K.V. Simonov ◽  
A.S. Kents

The paper considers general approaches to image processing, analysis of visual data and computer vision. The main methods for detecting features and edges associated with these approaches are presented. A brief description of modern edge-detection and classification algorithms suitable for isolating and characterizing lung pathologies in medical images is also given.


2021 ◽  
Vol 2 (3) ◽  
Author(s):  
Gustaf Halvardsson ◽  
Johanna Peterson ◽  
César Soto-Valero ◽  
Benoit Baudry

The automatic interpretation of sign languages is a challenging task, as it requires high-level vision and high-level motion processing systems to provide accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to enable computers to interpret signs of the Swedish Sign Language (SSL) hand alphabet. Our model consists of a pre-trained InceptionV3 network and uses the mini-batch gradient descent optimization algorithm. We rely on transfer learning, building on the pre-training of the model and its data. The final accuracy of the model, based on 8 study subjects and 9,400 images, is 85%. Our results indicate that the use of CNNs is a promising approach to interpreting sign languages, and that transfer learning can be used to achieve high testing accuracy despite a small training dataset. Furthermore, we describe the implementation details of our model to interpret signs as a user-friendly web application.
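A condensed sketch of the transfer-learning setup described above: a pretrained InceptionV3 backbone with a new classification head for the hand-alphabet signs, trained with mini-batch (SGD) gradient descent. The input size, head layout, number of classes and optimizer settings are assumptions for illustration, not the authors' exact configuration:

```python
# InceptionV3 transfer learning for hand-alphabet sign classification (Keras).
# Head layout, class count and optimizer settings are illustrative assumptions.
from tensorflow import keras

num_classes = 26  # assumed number of hand-alphabet signs

base = keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                      pooling="avg", input_shape=(299, 299, 3))
base.trainable = False  # reuse pretrained features; train only the new head

inputs = keras.Input(shape=(299, 299, 3))
x = base(inputs, training=False)
x = keras.layers.Dense(256, activation="relu")(x)
outputs = keras.layers.Dense(num_classes, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
```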

