Semantic representation for visual reasoning

2019 ◽  
Vol 277 ◽  
pp. 02006 ◽  
Author(s):  
Xubin Ni ◽  
Lirong Yin ◽  
Xiaobing Chen ◽  
Shan Liu ◽  
Bo Yang ◽  
...  

In the field of visual reasoning, image features are widely used as the input of neural networks to obtain answers. However, image features are too redundant for regular networks to learn accurate characterizations from. In human reasoning, by contrast, an abstract description is usually constructed to avoid irrelevant details. Inspired by this, a higher-level representation named semantic representation is introduced in this paper to make visual reasoning more efficient. The idea of the Gram matrix used in neural style transfer research is transferred here to build a relation matrix, which enables the relational information between objects to be better represented. The model using semantic representation as input outperforms the same model using image features as input, which verifies that more accurate results can be obtained through the introduction of high-level semantic representation in the field of visual reasoning.
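The relation matrix described above follows the Gram-matrix formulation borrowed from neural style transfer. Below is a minimal sketch of that computation, assuming the per-image object features have already been extracted into a matrix of shape (num_objects, feature_dim); the variable names and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def relation_matrix(object_features: np.ndarray) -> np.ndarray:
    """Gram-matrix-style relation matrix over per-object semantic features.

    object_features: array of shape (num_objects, feature_dim), one row per
    detected object. The result is a (num_objects, num_objects) matrix whose
    (i, j) entry is the inner product of object i's and object j's features,
    capturing pairwise relatedness the same way a Gram matrix captures style
    correlations between feature channels.
    """
    return object_features @ object_features.T

# Hypothetical usage: 5 objects with 16-dimensional semantic features.
feats = np.random.rand(5, 16).astype(np.float32)
rel = relation_matrix(feats)
print(rel.shape)  # (5, 5)
```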

Author(s):  
Bo Wang ◽  
Xiaoting Yu ◽  
Chengeng Huang ◽  
Qinghong Sheng ◽  
Yuanyuan Wang ◽  
...  

The excellent feature extraction ability of deep convolutional neural networks (DCNNs) has been demonstrated in many image processing tasks, by which image classification can achieve high accuracy with only raw input images. However, the specific image features that influence the classification results are not readily determinable, and what lies behind the predictions is unclear. This study proposes a method combining the Sobel and Canny operators and an Inception module for ship classification. The Sobel and Canny operators obtain enhanced edge features from the input images. A convolutional layer is replaced with the Inception module, which can automatically select the proper convolution kernel for ship objects in different image regions. The principle is that the high-level features abstracted by the DCNN, and the features obtained by multi-convolution concatenation of the Inception module, must ultimately derive from the edge information of the preprocessed input images. This indicates that the classification results are based on the input edge features, which indirectly interpret the classification results to some extent. Experimental results show that the combination of the edge features and the Inception module improves DCNN ship classification performance. The original model with the raw dataset has an average accuracy of 88.72%, while when using enhanced edge features as input, it achieves the best performance of 90.54% among all models. The model that replaces the fifth convolutional layer with the Inception module has the best performance of 89.50%. It performs close to VGG-16 on the raw dataset and is significantly better than other deep neural networks. The results validate the functionality and feasibility of the idea posited.
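As a rough illustration of the edge-feature preprocessing step, the sketch below uses OpenCV's Sobel and Canny operators to build an edge-enhanced input image. The thresholds, kernel size, and channel arrangement are assumptions for the example, not the authors' settings.

```python
import cv2
import numpy as np

def edge_enhanced_input(image_path: str) -> np.ndarray:
    """Stack grayscale, Sobel-magnitude, and Canny channels as network input."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

    # Sobel gradients in x and y, combined into a gradient magnitude map.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    sobel_mag = cv2.magnitude(gx, gy)
    sobel_mag = cv2.normalize(sobel_mag, None, 0, 255, cv2.NORM_MINMAX)

    # Canny edges with illustrative thresholds.
    canny = cv2.Canny(gray, 100, 200)

    # Three-channel tensor: raw intensity, Sobel edges, Canny edges.
    return np.stack([gray, sobel_mag.astype(np.uint8), canny], axis=-1)
```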


Sensors ◽  
2020 ◽  
Vol 20 (5) ◽  
pp. 1495
Author(s):  
Hyun Kwon ◽  
Hyunsoo Yoon ◽  
Ki-Woong Park

Mobile devices such as sensors are used to connect to the Internet and provide services to users. Web services are vulnerable to automated attacks, which can restrict mobile devices from accessing websites. To prevent such automated attacks, CAPTCHAs are widely used as a security solution. However, when a high level of distortion has been applied to a CAPTCHA to make it resistant to automated attacks, the CAPTCHA becomes difficult for a human to recognize. In this work, we propose a method for generating a CAPTCHA image that will resist recognition by machines while maintaining its recognizability to humans. The method utilizes style transfer and creates a new image, called a style-plugged-CAPTCHA image, by incorporating the styles of other images while keeping the content of the original CAPTCHA. In our experiment, we used the TensorFlow machine learning library and six CAPTCHA datasets in use on actual websites. The experimental results show that the proposed scheme reduces the rate of recognition by the DeCAPTCHA system to 3.5% and 3.2% using one style image and two style images, respectively, while maintaining recognizability by humans.
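The style-plugged-CAPTCHA images are produced with a style transfer approach. The sketch below shows the standard content/style loss on which such methods rest, written in TensorFlow (which the abstract says was used); the layer choices, weights, and function names are placeholder assumptions rather than the authors' configuration.

```python
import tensorflow as tf

def gram(features):
    """Gram matrix of a (1, H, W, C) feature map: channel-channel correlations."""
    _, h, w, c = features.shape
    flat = tf.reshape(features, (h * w, c))
    return tf.matmul(flat, flat, transpose_a=True) / tf.cast(h * w, tf.float32)

def style_transfer_loss(content_feat, gen_content_feat,
                        style_feats, gen_style_feats,
                        content_weight=1.0, style_weight=1e-2):
    """Content term keeps the CAPTCHA text readable; style term pulls the
    generated image toward the style image's Gram statistics."""
    content_loss = tf.reduce_mean(tf.square(gen_content_feat - content_feat))
    style_loss = tf.add_n([
        tf.reduce_mean(tf.square(gram(g) - gram(s)))
        for g, s in zip(gen_style_feats, style_feats)
    ])
    return content_weight * content_loss + style_weight * style_loss
```

Minimizing this loss with respect to the generated image yields an image that still carries the original CAPTCHA content but is statistically dressed in the style images, which is what degrades automated recognition.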


2019 ◽  
Author(s):  
Marek A. Pedziwiatr ◽  
Matthias Kümmerer ◽  
Thomas S.A. Wallis ◽  
Matthias Bethge ◽  
Christoph Teufel

Eye movements are vital for human vision, and it is therefore important to understand how observers decide where to look. Meaning maps (MMs), a technique to capture the distribution of semantic importance across an image, have recently been proposed to support the hypothesis that meaning rather than image features guides human gaze. MMs have the potential to be an important tool far beyond eye-movements research. Here, we examine central assumptions underlying MMs. First, we compared the performance of MMs in predicting fixations to saliency models, showing that DeepGaze II – a deep neural network trained to predict fixations based on high-level features rather than meaning – outperforms MMs. Second, we show that whereas human observers respond to changes in meaning induced by manipulating object-context relationships, MMs and DeepGaze II do not. Together, these findings challenge central assumptions underlying the use of MMs to measure the distribution of meaning in images.
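Comparisons of this kind rest on standard fixation-prediction metrics. As a hedged illustration, the sketch below computes the normalized scanpath saliency (NSS) of a predicted map at observed fixation locations; it is a generic metric implementation, not the specific evaluation pipeline used in the study.

```python
import numpy as np

def nss(prediction: np.ndarray, fixations: np.ndarray) -> float:
    """Normalized scanpath saliency: mean z-scored prediction value at fixations.

    prediction: 2-D map (meaning map, saliency map, or DeepGaze-style density).
    fixations: binary 2-D map of the same shape, 1 where a fixation landed.
    """
    z = (prediction - prediction.mean()) / (prediction.std() + 1e-8)
    return float(z[fixations.astype(bool)].mean())

# Hypothetical usage with a random map and three fixation points.
pred = np.random.rand(480, 640)
fix = np.zeros((480, 640))
fix[100, 200] = fix[240, 320] = fix[400, 500] = 1
print(nss(pred, fix))
```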


2021 ◽  
Author(s):  
Quoc Vuong

Images are extremely effective at eliciting emotional responses in observers and have been frequently used to investigate the neural correlates of emotion. However, the image features producing this emotional response remain unclear. This study sought to use biologically inspired computational models of the brain to test the hypothesis that these emotional responses can be attributed to the estimation of arousal and valence of objects, scenes and facial expressions in the images. Convolutional neural networks were used to extract all, or various combinations, of high-level image features related to objects, scenes and facial expressions. Subsequent deep feedforward neural networks predicted the images’ arousal and valence values. The model was provided with thousands of pre-annotated images to learn the relationship between the high-level features and the images’ arousal and valence values. The relationship between arousal and valence was assessed by comparing models that either learnt the constructs separately or together. The results confirmed the effectiveness of using the features to predict human emotion alongside their ability to augment each other. When utilising the object, scene and facial expression information together, the model classified arousal and valence with accuracies of 88% and 87%, respectively. The effectiveness of our deep neural network of emotion perception strongly suggests that these same high-level features play a critical role in producing humans’ emotional response. Moreover, performance increased across all models when arousal and valence were learnt together, suggesting a dependent relationship between these affective dimensions. These results open up numerous avenues for future work, whilst also bridging the gap between affective neuroscience and computer vision.
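A minimal sketch of the two-stage design described above, assuming Keras, an ImageNet-pretrained backbone standing in for the high-level feature extractor, and illustrative layer sizes; none of these specifics are taken from the study.

```python
import tensorflow as tf

# Stage 1: a pretrained CNN provides high-level image features.
backbone = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                           weights="imagenet")
backbone.trainable = False

# Stage 2: a deep feedforward network maps those features to arousal and valence.
inputs = tf.keras.Input(shape=(224, 224, 3))
features = backbone(inputs)
x = tf.keras.layers.Dense(256, activation="relu")(features)
x = tf.keras.layers.Dense(64, activation="relu")(x)
# Two outputs learned jointly, mirroring the finding that arousal and valence
# benefit from being modeled together.
outputs = tf.keras.layers.Dense(2, name="arousal_valence")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
```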


Author(s):  
Ryan Cotterell ◽  
Hinrich Schütze

Much like sentences are composed of words, words themselves are composed of smaller units. For example, the English word questionably can be analyzed as question+able+ly. However, this structural decomposition of the word does not directly give us a semantic representation of the word’s meaning. Since morphology obeys the principle of compositionality, the semantics of the word can be systematically derived from the meaning of its parts. In this work, we propose a novel probabilistic model of word formation that captures both the analysis of a word w into its constituent segments and the synthesis of the meaning of w from the meanings of those segments. Our model jointly learns to segment words into morphemes and compose distributional semantic vectors of those morphemes. We experiment with the model on English CELEX data and German DErivBase (Zeller et al., 2013) data. We show that jointly modeling semantics increases both segmentation accuracy and morpheme F1 by between 3% and 5%. Additionally, we investigate different models of vector composition, showing that recurrent neural networks yield an improvement over simple additive models. Finally, we study the degree to which the representations correspond to a linguist’s notion of morphological productivity.
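To make the composition step concrete, the sketch below contrasts simple additive composition with an RNN-based composition over morpheme vectors, using PyTorch and made-up embeddings for question+able+ly; the paper's actual probabilistic segmentation model is not reproduced here.

```python
import torch
import torch.nn as nn

dim = 50
# Hypothetical distributional vectors for the morphemes of "questionably".
morphemes = {m: torch.randn(dim) for m in ["question", "able", "ly"]}
segments = [morphemes[m] for m in ["question", "able", "ly"]]

# Additive composition: the word vector is the sum of its morpheme vectors.
additive = torch.stack(segments).sum(dim=0)

# Recurrent composition: an RNN reads the morpheme sequence and its final
# hidden state serves as the composed word representation.
rnn = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
seq = torch.stack(segments).unsqueeze(0)        # (1, num_morphemes, dim)
_, hidden = rnn(seq)
recurrent = hidden.squeeze(0).squeeze(0)        # (dim,)

print(additive.shape, recurrent.shape)
```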


2021 ◽  
Vol 2 (3) ◽  
Author(s):  
Gustaf Halvardsson ◽  
Johanna Peterson ◽  
César Soto-Valero ◽  
Benoit Baudry

The automatic interpretation of sign languages is a challenging task, as it requires the usage of high-level vision and high-level motion processing systems for providing accurate image perception. In this paper, we use Convolutional Neural Networks (CNNs) and transfer learning to make computers able to interpret signs of the Swedish Sign Language (SSL) hand alphabet. Our model consists of the implementation of a pre-trained InceptionV3 network, and the usage of the mini-batch gradient descent optimization algorithm. We rely on transfer learning during the pre-training of the model and its data. The final accuracy of the model, based on 8 study subjects and 9400 images, is 85%. Our results indicate that the usage of CNNs is a promising approach to interpret sign languages, and transfer learning can be used to achieve high testing accuracy despite using a small training dataset. Furthermore, we describe the implementation details of our model to interpret signs as a user-friendly web application.
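A hedged sketch of the transfer learning setup described above: a pretrained InceptionV3 backbone with a new classification head, trained with mini-batch SGD. The input size, head layers, class count, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import tensorflow as tf

NUM_CLASSES = 26  # assumption: one class per hand-alphabet sign in the dataset

# Pretrained InceptionV3 without its ImageNet classifier head.
base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                          input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # keep the transferred features fixed initially

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Mini-batch gradient descent, as mentioned in the abstract.
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="categorical_crossentropy", metrics=["accuracy"])

# model.fit(train_ds, validation_data=val_ds, epochs=10)
```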


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jonathan K. George ◽  
Cesare Soci ◽  
Mario Miscuglio ◽  
Volker J. Sorger

Mirror symmetry is an abundant feature in both nature and technology. Its successful detection is critical for perception procedures based on visual stimuli and requires organizational processes. Neuromorphic computing, utilizing brain-mimicked networks, could be a technology solution providing such perceptual organization functionality, and it has furthermore made tremendous advances in computing efficiency by applying a spiking model of information. Spiking models inherently maximize efficiency in noisy environments by placing the energy of the signal in a minimal time. However, many neuromorphic computing models ignore time delay between nodes, choosing instead to approximate connections between neurons as instantaneous weighting. With this assumption, many complex time interactions of spiking neurons are lost. Here, we show that the coincidence detection property of a spiking-based feed-forward neural network enables mirror symmetry detection. Testing this algorithm, as an example, on geospatial satellite image data sets reveals how symmetry density enables automated recognition of man-made structures over vegetation. We further demonstrate that the addition of noise improves feature detectability of an image through coincidence point generation. The ability to obtain mirror symmetry from spiking neural networks can be a powerful tool for applications in image-based rendering, computer graphics, robotics, photo interpretation, image retrieval, video analysis and annotation, and multimedia, and may help accelerate the brain-machine interconnection. More importantly, it enables a technology pathway toward bridging the gap between low-level incoming sensor stimuli and the high-level interpretation of these inputs as recognized objects and scenes in the world.
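As an illustration of the coincidence detection idea underlying the symmetry measure, the sketch below shows a single coincidence detector that fires only when two input spikes (e.g. from mirrored image locations whose delays cancel at the symmetry axis) arrive within a short window; the parameters and encoding are assumptions, not the authors' network.

```python
import numpy as np

def coincidence_detector(spike_times_a, spike_times_b, window=1.0):
    """Return output spike times fired when inputs a and b spike within `window`.

    In a delay-aware spiking network, mirrored features produce spikes whose
    arrival times coincide at a detector placed on the symmetry axis, so the
    detector fires; asymmetric features do not, so it stays silent.
    """
    out = []
    for ta in spike_times_a:
        for tb in spike_times_b:
            if abs(ta - tb) <= window:
                out.append(max(ta, tb))
    return np.array(sorted(set(out)))

# Hypothetical usage: the first pair of spikes coincides, the second does not.
print(coincidence_detector([2.0, 7.5], [2.3, 12.0], window=1.0))  # [2.3]
```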


2021 ◽  
Vol 13 (4) ◽  
pp. 742
Author(s):  
Jian Peng ◽  
Xiaoming Mei ◽  
Wenbo Li ◽  
Liang Hong ◽  
Bingyu Sun ◽  
...  

Scene understanding of remote sensing images is of great significance in various applications. Its fundamental problem is how to construct representative features. Various convolutional neural network architectures have been proposed for automatically learning features from images. However, is the current practice of configuring the same architecture to learn all the data, while ignoring the differences between images, the right one? It seems contrary to our intuition: it is clear that some images are easier to recognize, and some are harder to recognize. This problem is the gap between the characteristics of the images and the learned features corresponding to specific network structures. Unfortunately, the literature so far lacks an analysis of the two. In this paper, we explore this problem from three aspects: we first build a visual-based evaluation pipeline of scene complexity to characterize the intrinsic differences between images; then, we analyze the relationship between semantic concepts and feature representations, i.e., the scalability and hierarchy of features, which are the essential elements in CNNs of different architectures, for remote sensing scenes of different complexity; thirdly, we introduce CAM, a visualization method that explains feature learning within neural networks, to analyze the relationship between scenes with different complexity and semantic feature representations. The experimental results show that a complex scene needs deeper and multi-scale features, whereas a simpler scene needs lower and single-scale features. Besides, the complex scene concept is more dependent on the joint semantic representation of multiple objects. Furthermore, we propose a framework for predicting the scene complexity of an image and utilize it to design a depth- and scale-adaptive model. It achieves higher performance with fewer parameters than the original model, demonstrating the potential significance of scene complexity.
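CAM, the visualization technique referred to above, projects the classifier weights back onto the last convolutional feature maps. The sketch below is a generic numpy implementation under the usual global-average-pooling assumption; it is not the paper's exact pipeline.

```python
import numpy as np

def class_activation_map(conv_feats: np.ndarray, fc_weights: np.ndarray,
                         class_idx: int) -> np.ndarray:
    """Class activation map for one class.

    conv_feats: last conv-layer output of shape (H, W, C) for one image.
    fc_weights: classifier weight matrix of shape (C, num_classes), assuming the
    network ends with global average pooling followed by a single dense layer.
    Returns an (H, W) map showing which regions drove the chosen class score.
    """
    cam = np.tensordot(conv_feats, fc_weights[:, class_idx], axes=([2], [0]))
    cam = np.maximum(cam, 0)                      # keep positive evidence only
    return cam / (cam.max() + 1e-8)               # normalize to [0, 1]

# Hypothetical usage: 7x7x512 features, 10 scene classes.
feats = np.random.rand(7, 7, 512)
weights = np.random.rand(512, 10)
print(class_activation_map(feats, weights, class_idx=3).shape)  # (7, 7)
```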


2021 ◽  
Vol 11 (2) ◽  
pp. 23
Author(s):  
Duy-Anh Nguyen ◽  
Xuan-Tu Tran ◽  
Francesca Iacopi

Deep Learning (DL) has contributed to the success of many applications in recent years. The applications range from simple ones, such as recognizing tiny images or simple speech patterns, to ones with a high level of complexity, such as playing the game of Go. However, this superior performance comes at a high computational cost, which makes porting DL applications to conventional hardware platforms a challenging task. Many approaches have been investigated, and the Spiking Neural Network (SNN) is one of the promising candidates. The SNN is the third generation of Artificial Neural Networks (ANNs), in which each neuron in the network uses discrete spikes to communicate in an event-based manner. SNNs have the potential advantage of achieving better energy efficiency than their ANN counterparts. While SNN models generally incur some loss of accuracy, new algorithms have helped to close the accuracy gap. For hardware implementations, SNNs have attracted much attention in the neuromorphic hardware research community. In this work, we review the basic background of SNNs, the current state and challenges of training algorithms for SNNs, and the current implementations of SNNs on various hardware platforms.
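For readers unfamiliar with the event-based communication mentioned above, the sketch below simulates a discrete-time leaky integrate-and-fire neuron, the basic building block assumed by most SNN training and hardware work; the constants are illustrative only.

```python
import numpy as np

def lif_neuron(input_spikes: np.ndarray, weight=0.5, leak=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron over a binary input spike train.

    The membrane potential decays by `leak` each step, integrates weighted
    input spikes, and emits an output spike (then resets) when it crosses
    `threshold`: the discrete, event-based communication used in SNNs.
    """
    v = 0.0
    out = np.zeros_like(input_spikes, dtype=np.int8)
    for t, s in enumerate(input_spikes):
        v = leak * v + weight * s
        if v >= threshold:
            out[t] = 1
            v = 0.0  # reset after firing
    return out

spikes_in = np.array([1, 0, 1, 1, 0, 1, 1, 1])
print(lif_neuron(spikes_in))
```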

