Implementation of Sparse Neural Networks on Fixed Size Arrays

Mining discriminative patches for script identification in natural scene images

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200260 ◽

2021 ◽

Vol 40 (1) ◽

pp. 551-563

Author(s):

Liqiong Lu ◽

Dong Wu ◽

Ziwei Tang ◽

Yaohua Yi ◽

Faliang Huang

Keyword(s):

Neural Networks ◽

Experimental Results ◽

The Other ◽

Natural Scene ◽

Fixed Size ◽

Script Identification ◽

Aspect Ratios ◽

Novel Approach ◽

Public Datasets ◽

Natural Scene Images

This paper focuses on script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot solve this problem well for two reasons. First, scene images have arbitrary aspect ratios, which is difficult for traditional CNNs that take a fixed-size image as input. Second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining Score CNN, Attention CNN, and patches. Attention CNN determines whether a patch is discriminative and calculates the contribution weight of that patch to the script identification of the whole image. Score CNN takes a discriminative patch as input and predicts the score of each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to Score CNN and Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via the two classifiers are fused to obtain the script type of the image. Using same-size patches as CNN inputs avoids the problems caused by arbitrary aspect ratios of scene images. The trained classifiers can mine discriminative patches to accurately identify some confusing scripts. The experimental results show the good performance of our approach on four public datasets.
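
The extract-then-fuse pipeline described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the fusion rule, so the attention-weighted average, the `threshold` cutoff, and the function names `extract_patches`, `fuse_scores`, and `predict_script` are all assumptions made for illustration. The per-patch scores and weights stand in for the outputs of Score CNN and Attention CNN, which are not implemented here.

```python
from typing import List, Sequence, Tuple


def extract_patches(width: int, height: int, patch: int, stride: int) -> List[Tuple[int, int]]:
    """Return (left, top) corners of fixed-size square patches tiling an image.

    Every patch has the same size regardless of the image's aspect ratio.
    """
    xs = range(0, max(width - patch, 0) + 1, stride)
    ys = range(0, max(height - patch, 0) + 1, stride)
    return [(x, y) for y in ys for x in xs]


def fuse_scores(
    patch_scores: Sequence[Sequence[float]],
    attention_weights: Sequence[float],
    threshold: float = 0.5,
) -> List[float]:
    """Fuse per-patch script scores into one image-level score vector.

    Assumption: patches whose attention weight is below `threshold` are
    treated as non-discriminative and dropped; the rest are combined by a
    weighted average using their attention weights.
    """
    kept = [(s, w) for s, w in zip(patch_scores, attention_weights) if w >= threshold]
    if not kept:
        raise ValueError("no discriminative patches above the threshold")
    total = sum(w for _, w in kept)
    n_scripts = len(kept[0][0])
    return [sum(s[k] * w for s, w in kept) / total for k in range(n_scripts)]


def predict_script(fused: Sequence[float]) -> int:
    """Return the index of the script type with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


if __name__ == "__main__":
    # A wide 96x32 text-line image yields three 32x32 patches.
    print(extract_patches(96, 32, patch=32, stride=32))
    # Hypothetical per-patch scores over two script types, with attention weights.
    scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    weights = [0.8, 0.2, 0.4]
    print(predict_script(fuse_scores(scores, weights, threshold=0.3)))
```

Because the patch grid depends only on the image dimensions, the same classifiers can be applied to images of any aspect ratio without resizing or distortion.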


Adaptive Tiling: Applying Fixed-size Systolic Arrays To Sparse Convolutional Neural Networks

2018 24th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr.2018.8545462 ◽

2018 ◽

Cited By ~ 4

Author(s):

H. T. Kung ◽

Bradley McDanel ◽

Sai Qian Zhang

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Systolic Arrays ◽

Fixed Size


Video-Based Human Action Recognition Using Spatial Pyramid Pooling and 3D Densely Convolutional Networks

Future Internet ◽

10.3390/fi10120115 ◽

2018 ◽

Vol 10 (12) ◽

pp. 115

Author(s):

Wanli Yang ◽

Yimin Chen ◽

Chen Huang ◽

Mingke Gao

Keyword(s):

Neural Networks ◽

Network Structure ◽

Three Dimensional ◽

Human Action Recognition ◽

Human Action ◽

Fixed Size ◽

Behavior Recognition ◽

Convolutional Network ◽

Spatial Pyramid Pooling ◽

Spatial Pyramid

In recent years, the application of deep neural networks to human behavior recognition has become a hot topic. Although remarkable achievements have been made in image recognition, many problems remain unsolved for video. Convolutional neural networks are well known to require a fixed-size image input, which not only limits the network structure but also affects recognition accuracy. Although this problem has been solved for images, it has not yet been solved for video. To address the fixed-size input constraint on video frames in video recognition, we propose a three-dimensional (3D) densely connected convolutional network based on spatial pyramid pooling (3D-DenseNet-SPP). As the name implies, the network structure is composed mainly of three parts: 3DCNN, DenseNet, and SPPNet. Our models were evaluated on the KTH and UCF101 datasets separately. The experimental results showed that our model performs better at video-based behavior recognition than existing models.
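
The way spatial pyramid pooling removes the fixed-size input constraint can be shown with a minimal Python sketch. This is a simplified 2D, single-channel illustration, not the 3D network from the paper; the function name `spp_max`, the pyramid levels `(1, 2, 4)`, and the choice of max pooling are assumptions made for the example.

```python
import math
from typing import List, Sequence


def spp_max(feature_map: Sequence[Sequence[float]], levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool a 2D feature map over an n x n grid for each pyramid level n.

    The output length is sum(n * n for n in levels), independent of the
    height and width of the input map.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled: List[float] = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor on the start, ceil on the end,
            # so every bin is non-empty even when h < n.
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                pooled.append(
                    max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


if __name__ == "__main__":
    small = [[float(r * 5 + c) for c in range(5)] for r in range(3)]   # 3 x 5 map
    large = [[float(r * 4 + c) for c in range(4)] for r in range(6)]   # 6 x 4 map
    # Both maps produce a vector of length 1 + 4 + 16 = 21.
    print(len(spp_max(small)), len(spp_max(large)))
```

Since the pooled vector has the same length for any input size, a fully connected layer placed after it never sees a shape change, which is what lets the network accept frames of varying size.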


Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00249 ◽

2018 ◽

Vol 6 ◽

pp. 687-702 ◽

Cited By ~ 5

Author(s):

Wenpeng Yin ◽

Hinrich Schütze

Keyword(s):

Neural Networks ◽

Sentiment Analysis ◽

Recurrent Neural Networks ◽

Representation Learning ◽

Local Context ◽

Fixed Size ◽

Convolution Operation ◽

Input Text ◽

Multiple Context ◽

Textual Entailment

In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling (i.e., it is applied to pooling) rather than as attentive convolution (i.e., it is integrated into convolution). Convolution is the differentiator of CNNs in that it can powerfully model the higher-level representation of a word by taking into account its local fixed-size context in the input text t^x. In this work, we propose an attentive convolution network, ATTCONV. It extends the context scope of the convolution operation, deriving higher-level features for a word not only from local context, but also from information extracted from nonlocal context by the attention mechanism commonly used in RNNs. This nonlocal context can come (i) from parts of the input text t^x that are distant or (ii) from extra (i.e., external) contexts t^y. Experiments on sentence modeling with zero-context (sentiment analysis), single-context (textual entailment) and multiple-context (claim verification) demonstrate the effectiveness of ATTCONV in sentence representation learning with the incorporation of context. In particular, attentive convolution outperforms attentive pooling and is a strong competitor to popular attentive RNNs.
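
The idea of folding attention into the convolution step, rather than into pooling, can be sketched in a few lines of Python. This is a simplified illustration and not the authors' exact formulation: the dot-product attention, the window width of 3, the tanh nonlinearity, and the names `attend`, `attentive_conv`, `w_local`, and `w_ctx` are all assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attend(query: Sequence[float], context: Sequence[Sequence[float]]) -> Vector:
    """Dot-product attention: weighted average of the context vectors."""
    weights = softmax([dot(query, c) for c in context])
    return [sum(w * c[k] for w, c in zip(weights, context)) for k in range(len(context[0]))]


def attentive_conv(
    tx: List[Vector],
    ty: List[Vector],
    w_local: Sequence[Sequence[float]],
    w_ctx: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[Vector]:
    """Width-3 convolution over tx with an attention term drawn from ty.

    For word i, the output is tanh(w_local @ [left; word; right]
    + w_ctx @ attended_context + bias). Passing ty = tx gives the
    case where the nonlocal context comes from the input text itself.
    """
    dim = len(tx[0])
    pad = [0.0] * dim
    padded = [pad] + tx + [pad]  # zero-pad so every word has two neighbors
    out: List[Vector] = []
    for i in range(len(tx)):
        window = padded[i] + padded[i + 1] + padded[i + 2]  # local context
        ctx = attend(tx[i], ty)                             # nonlocal context
        local_part = [dot(row, window) for row in w_local]
        ctx_part = [dot(row, ctx) for row in w_ctx]
        out.append([math.tanh(l + c + b) for l, c, b in zip(local_part, ctx_part, bias)])
    return out
```

Setting `w_ctx` to all zeros reduces this to an ordinary width-3 convolution, which makes the contribution of the attention term easy to isolate.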
