Mining discriminative patches for script identification in natural scene images

2021 ◽  
Vol 40 (1) ◽  
pp. 551-563
Author(s):  
Liqiong Lu ◽  
Dong Wu ◽  
Ziwei Tang ◽  
Yaohua Yi ◽  
Faliang Huang

This paper focuses on script identification in natural scene images. Traditional CNNs (Convolutional Neural Networks) cannot solve this problem well for two reasons: first, the arbitrary aspect ratios of scene images are difficult to handle for traditional CNNs, which take a fixed-size image as input; second, some scripts with minor differences are easily confused because they share a subset of characters with the same shapes. We propose a novel approach combining a Score CNN, an Attention CNN and image patches. The Attention CNN determines whether a patch is discriminative and calculates the contribution weight of that patch to the script identification of the whole image. The Score CNN takes a discriminative patch as input and predicts a score for each script type. First, patches of the same size are extracted from the scene images. Second, these patches are used as inputs to the Score CNN and the Attention CNN to train two patch-level classifiers. Finally, the results of multiple discriminative patches extracted from the same image via the two classifiers are fused to obtain the script type of the image. Using fixed-size patches as CNN inputs avoids the problems caused by the arbitrary aspect ratios of scene images, and the trained classifiers can mine discriminative patches to accurately identify confusing scripts. The experimental results show the good performance of our approach on four public datasets.
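
The fusion step is straightforward to illustrate. Below is a minimal sketch, assuming a PyTorch setup; `PatchCNN`, the patch size and the sliding-window extraction are illustrative stand-ins for the paper's networks, not the published architecture, and the models here are untrained, so the output is only shape-correct.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchCNN(nn.Module):
    """Small stand-in backbone shared by the two patch-level classifiers."""
    def __init__(self, out_dim):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, out_dim)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def identify_script(image, num_scripts=4, patch=64, stride=32):
    """Slide fixed-size patches over an image of arbitrary aspect ratio,
    score each patch, weight it by the attention network, and fuse."""
    score_cnn = PatchCNN(num_scripts)   # per-patch script scores
    attention_cnn = PatchCNN(1)         # per-patch contribution weight
    patches = (image.unfold(1, patch, stride)
                    .unfold(2, patch, stride)
                    .reshape(3, -1, patch, patch)
                    .permute(1, 0, 2, 3))            # (N, 3, patch, patch)
    scores = F.softmax(score_cnn(patches), dim=1)    # (N, num_scripts)
    weights = torch.sigmoid(attention_cnn(patches))  # (N, 1)
    fused = (weights * scores).sum(0) / weights.sum()
    return fused.argmax().item()

print(identify_script(torch.rand(3, 128, 256)))
```

Here the attention weights act as a soft selection of discriminative patches; the hard selection described in the abstract could be approximated by thresholding `weights` before fusion.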

Author(s):  
Sankirti Sandeep Shiravale ◽  
R. Jayadevan ◽  
Sanjeev S. Sannakki

Text present in camera-captured scene images is semantically rich and can be used for image understanding. Automatic detection, extraction and recognition of text are crucial in image-understanding applications. Text detection in natural scene images is a tedious task due to complex backgrounds, uneven lighting conditions, and multi-coloured, multi-sized fonts. Two techniques, namely 'edge detection' and 'colour-based clustering', are combined in this paper to detect text in scene images. Region properties are used to eliminate falsely generated annotations. A dataset of 1250 images is created and used for experimentation. Experimental results show that the combined approach performs better than either individual approach.
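
A rough sketch of such a combination, assuming OpenCV and scikit-learn are available; the Canny thresholds, the cluster count and the region-property rules are illustrative defaults, not the paper's tuned parameters.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def candidate_text_regions(bgr, k=4, min_area=100, max_aspect=10.0):
    # Edge channel: Canny edges, dilated to join character strokes.
    edges = cv2.Canny(cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY), 100, 200)
    edges = cv2.dilate(edges, np.ones((3, 3), np.uint8))

    # Colour channel: cluster pixels so same-coloured text groups together.
    h, w = bgr.shape[:2]
    labels = KMeans(n_clusters=k, n_init=3).fit_predict(
        bgr.reshape(-1, 3).astype(np.float32)).reshape(h, w)

    boxes = []
    for c in range(k):
        # Keep only cluster pixels that also lie on edges (the combination).
        mask = ((labels == c).astype(np.uint8) * 255) & edges
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for cnt in contours:
            x, y, bw, bh = cv2.boundingRect(cnt)
            # Region properties prune falsely generated annotations.
            if bw * bh >= min_area and max(bw, bh) <= max_aspect * min(bw, bh):
                boxes.append((x, y, bw, bh))
    return boxes
```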


2021 ◽  
Author(s):  
Khalil Boukthir ◽  
Abdulrahman M. Qahtani ◽  
Omar Almutiry ◽  
Habib Dhahri ◽  
Adel Alimi

- A novel approach based on Deep Active Learning is presented to reduce the annotation effort for Arabic text detection in natural scene images (a loop of this kind is sketched below).
- A new Arabic text image dataset (7k images), named TSVD, collected using the Google Street View service.
- A new semi-automatic method for generating natural scene text images from the streets.
- Training samples are reduced to 1/5 of the original training size on average.
- Much less training data is needed to achieve a better Dice index: 0.84.
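
A minimal sketch of the pool-based loop such a Deep Active Learning setup implies, using uncertainty sampling; `train`, `predict_entropy` and `oracle_annotate` are hypothetical callables standing in for the authors' model, scoring and annotation steps.

```python
import numpy as np

def active_learning_loop(pool_images, oracle_annotate, train, predict_entropy,
                         seed_size=50, batch_size=50, rounds=5):
    """Iteratively annotate only the most uncertain images from the pool."""
    rng = np.random.default_rng(0)
    unlabeled = list(range(len(pool_images)))
    labeled = [int(i) for i in rng.choice(unlabeled, seed_size, replace=False)]
    unlabeled = [i for i in unlabeled if i not in set(labeled)]
    dataset = {i: oracle_annotate(pool_images[i]) for i in labeled}

    for _ in range(rounds):
        model = train(dataset)                   # retrain on labels so far
        scores = predict_entropy(model, [pool_images[i] for i in unlabeled])
        picks = [unlabeled[j] for j in np.argsort(scores)[-batch_size:]]
        for i in picks:                          # annotate the hardest images
            dataset[i] = oracle_annotate(pool_images[i])
        unlabeled = [i for i in unlabeled if i not in set(picks)]
    return model, dataset
```

Stopping after a few rounds is what keeps the annotated set at a fraction of the full training size, as in the 1/5 reduction reported above.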


Author(s):  
Houda Gaddour ◽  
Slim Kanoun ◽  
Nicole Vincent

Text in scene images can provide useful and vital information for content-based image analysis. Text detection and script identification in images are therefore important tasks. In this paper, we propose a new method for text detection in natural scene images, particularly for Arabic text, based on a bottom-up approach in which four principal steps can be highlighted. First, extremely stable and homogeneous regions of interest (ROIs) are detected using the proposed Color Stability and Homogeneity Regions (CSHR) technique. These regions are then labelled as textual or non-textual ROIs using a structural approach. The textual ROIs are grouped into zones according to the spatial relations between them. Finally, the textual or non-textual nature of the constituted zones is refined, based on handcrafted features and on features learned by a Convolutional Neural Network (CNN). The proposed method was evaluated on databases used for text detection in natural scene images: the competitions organized in the 2017 edition of the International Conference on Document Analysis and Recognition (ICDAR2017), the Urdu-text database and our Natural Scene Image Database for Arabic Text detection (NSIDAT). The experimental results obtained are promising.
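
A schematic sketch of this four-step bottom-up flow; the step implementations are placeholders to be supplied, standing in for CSHR detection, the structural rules, the grouping heuristics and the handcrafted-plus-CNN refinement.

```python
def detect_arabic_text(image, detect_cshr_rois, is_textual_roi,
                       group_rois, refine_zone):
    rois = detect_cshr_rois(image)                    # 1. stable, homogeneous ROIs
    textual = [r for r in rois if is_textual_roi(r)]  # 2. structural labelling
    zones = group_rois(textual)                       # 3. spatial grouping
    return [z for z in zones if refine_zone(z)]       # 4. handcrafted + CNN check
```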


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 52669-52679 ◽  
Author(s):  
Liqiong Lu ◽  
Yaohua Yi ◽  
Faliang Huang ◽  
Kaili Wang ◽  
Qi Wang

2020 ◽  
Vol 34 (10) ◽  
pp. 13955-13956
Author(s):  
Yiru Wang ◽  
Pengda Si ◽  
Zeyang Lei ◽  
Yujiu Yang

Neural generation models have recently shown great potential in conversation generation. However, these methods tend to generate uninformative or irrelevant responses. In this paper, we present a novel topic-enhanced controllable CVAE (TEC-CVAE) model to address this issue. On the one hand, the model learns context-interactive topic knowledge through a novel multi-hop hybrid attention in the encoder. On the other hand, we design a topic-aware controllable decoder that constrains the expression of the stochastic latent variable in the CVAE to reduce irrelevant responses. Experimental results on two public datasets show that the two mechanisms work together to improve both relevance and diversity, and that the proposed model outperforms other competitive methods.
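
A minimal sketch of conditioning a CVAE's latent variable on topic information, in PyTorch; the multi-hop hybrid attention and the full decoder are simplified away here, so this illustrates the latent mechanism rather than the TEC-CVAE architecture itself.

```python
import torch
import torch.nn as nn

class TopicCVAE(nn.Module):
    def __init__(self, ctx_dim=128, topic_dim=32, z_dim=64, vocab=1000):
        super().__init__()
        self.prior = nn.Linear(ctx_dim + topic_dim, 2 * z_dim)  # p(z | c, t)
        self.decoder = nn.Sequential(                           # p(x | z, c, t)
            nn.Linear(z_dim + ctx_dim + topic_dim, 256), nn.Tanh(),
            nn.Linear(256, vocab),
        )

    def forward(self, context, topic):
        mu, logvar = self.prior(torch.cat([context, topic], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        # Topic-aware decoding: the topic vector constrains what z expresses.
        return self.decoder(torch.cat([z, context, topic], -1))

logits = TopicCVAE()(torch.rand(2, 128), torch.rand(2, 32))
print(logits.shape)  # (2, 1000) next-token logits
```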


2011 ◽  
Vol 103 ◽  
pp. 649-657
Author(s):  
Tsukasa Masuhara ◽  
Hideaki Kawano ◽  
Hideaki Orii ◽  
Hiroshi Maeda

Character recognition is a classical problem to which many researchers have devoted their efforts. Making character recognition systems widely applicable to natural scene images might open up interesting possibilities, such as using them as character input interfaces or as an annotation method for images. Nevertheless, it is still difficult to recognize all sorts of fonts, including decorated characters such as those depicted on signboards. Decorated characters are constructed using special techniques for attracting viewers' attention, so it is hard to obtain good recognition results with existing OCRs. In this paper, we propose a new character recognition system using a SOM (Self-Organizing Map). The SOM is employed to extract the essential topological structure of a character; the extracted structure is then used for matching, and recognition is performed on the basis of this topological matching. Experimental results show the effectiveness of the proposed method on most forms of characters.
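
A toy sketch of fitting a self-organizing map to a character's foreground pixel coordinates, so that the node lattice settles onto the character's topology; the grid size, the learning schedules and the synthetic 'L'-shaped stroke are illustrative, and the topological matching step itself is omitted.

```python
import numpy as np

def fit_som(points, grid=(8, 8), iters=2000, lr0=0.5, sigma0=3.0):
    rng = np.random.default_rng(0)
    nodes = rng.uniform(points.min(0), points.max(0), (grid[0] * grid[1], 2))
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    for t in range(iters):
        x = points[rng.integers(len(points))]          # random training pixel
        bmu = np.argmin(((nodes - x) ** 2).sum(1))     # best-matching unit
        frac = t / iters
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
        # Pull the BMU's lattice neighbourhood toward the sample.
        d2 = ((coords - coords[bmu]) ** 2).sum(1)
        h = np.exp(-d2 / (2 * sigma ** 2))[:, None]
        nodes += lr * h * (x - nodes)
    return nodes  # node positions approximate the character's structure

# Example: pixels of a synthetic 'L'-shaped stroke.
pts = np.array([(i, 0) for i in range(20)] + [(0, j) for j in range(12)], float)
print(fit_som(pts).round(1))
```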


Proceedings ◽  
2020 ◽  
Vol 54 (1) ◽  
pp. 25
Author(s):  
Álvaro S. Hervella ◽  
Lucía Ramos ◽  
José Rouco ◽  
Jorge Novo ◽  
Marcos Ortega

The analysis of the optic disc and cup in retinal images is important for the early diagnosis of glaucoma. In order to improve the joint segmentation of these relevant retinal structures, we propose a novel approach that applies the self-supervised multimodal reconstruction of retinal images as pre-training for deep neural networks. The proposed approach is evaluated on different public datasets. The obtained results indicate that the self-supervised multimodal reconstruction pre-training improves the performance of the segmentation. Thus, the proposed approach also shows great potential for improving the interpretable diagnosis of glaucoma.
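
A condensed sketch of this pre-train-then-fine-tune idea in PyTorch, assuming paired images of two retinal modalities (e.g. retinography and angiography); the tiny convolutional network and the training steps are placeholders for the paper's actual models and data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(          # stand-in encoder-decoder network
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)
recon_head = nn.Conv2d(16, 1, 1)   # predicts the paired modality
seg_head = nn.Conv2d(16, 2, 1)     # optic disc / optic cup masks

def pretrain_step(retino, angio, opt):
    """Self-supervised: reconstruct the paired modality, no manual labels."""
    loss = F.l1_loss(recon_head(backbone(retino)), angio)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def finetune_step(retino, masks, opt):
    """Supervised: reuse the pre-trained backbone for joint segmentation."""
    loss = F.binary_cross_entropy_with_logits(seg_head(backbone(retino)), masks)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The fine-tuning optimizer would cover both `backbone` and `seg_head` parameters, with the backbone initialized from the pre-training stage.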

