Scene Text Recognition
Recently Published Documents

TOTAL DOCUMENTS: 202 (FIVE YEARS: 125)
H-INDEX: 21 (FIVE YEARS: 7)

2021 ◽ Author(s): Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, ...

2021 ◽ Vol 2021 ◽ pp. 1-11 ◽ Author(s): MVV Prasad Kantipudi, Sandeep Kumar, Ashish Kumar Jha

Deep learning is a subfield of artificial intelligence that allows a computer to adapt and learn new rules from data. Deep learning algorithms can identify images, objects, observations, texts, and other structures. In recent years, scene text recognition has attracted many researchers from the computer vision community, yet it still needs improvement because of the limited performance of existing scene recognition algorithms. This paper proposes a novel approach to scene text recognition that integrates a bidirectional LSTM (Bi-LSTM) with a deep convolutional neural network (CNN). In the proposed method, the contour of the input image is first extracted and then fed into the CNN, which generates an ordered sequence of features from the contoured image. This feature sequence is then encoded by the Bi-LSTM, which is well suited to modeling sequential dependencies. The approach thus combines two powerful feature-extraction mechanisms, and the contour-based input makes recognition faster, giving the technique an advantage over existing methods. The proposed methodology is evaluated on the MSRATD 50 dataset, the SVHN dataset, a vehicle number-plate dataset, the SVT dataset, and random datasets, achieving accuracies of 95.22%, 92.25%, 96.69%, 94.58%, and 98.12%, respectively. Both quantitative and qualitative analyses show that the approach is promising in terms of accuracy and precision.
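
The pipeline the abstract describes (contour extraction, CNN feature sequencing, Bi-LSTM encoding) can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration of that general architecture, not the authors' implementation; the layer sizes, pooling strategy, and per-timestep classification head are placeholders.

```python
# Minimal sketch of a CNN + Bi-LSTM recognition backbone of the kind the abstract
# describes. All dimensions and the CTC-style output head are illustrative assumptions.
import torch
import torch.nn as nn

class CNNBiLSTMRecognizer(nn.Module):
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        # CNN turns the (contoured) image into a sequence of column features.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height, keep width as the time axis
        )
        # Bi-LSTM encodes left-to-right and right-to-left context over the feature sequence.
        self.rnn = nn.LSTM(256, hidden_size, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):                          # x: (batch, 1, H, W) contour image
        feats = self.cnn(x)                        # (batch, 256, 1, W')
        feats = feats.squeeze(2).permute(0, 2, 1)  # (batch, W', 256) feature sequence
        seq, _ = self.rnn(feats)                   # (batch, W', 2 * hidden_size)
        return self.classifier(seq)                # per-step character logits (e.g. for CTC)
```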


Electronics ◽ 2021 ◽ Vol 10 (22) ◽ pp. 2780 ◽ Author(s): Yue Tao, Zhiwei Jia, Runze Ma, Shugong Xu

Scene text recognition (STR) is an important bridge between images and text and has attracted abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most existing works need an extra context modeling module to help the CNN capture global dependencies, compensating for its locality-based inductive bias and strengthening the relationships between text features. Recently, the transformer has been proposed as a promising network for global context modeling through its self-attention mechanism, but one of its main shortcomings when applied to recognition is efficiency. We propose a 1-D split to address the complexity challenge and replace the CNN with a transformer encoder, reducing the need for a separate context modeling module. Furthermore, recent methods use a frozen initial embedding to guide the decoder in decoding features to text, which leads to a loss of accuracy. We instead propose a learnable initial embedding, learned from the transformer encoder, that adapts to different input images. Above all, we introduce a novel architecture for text recognition, named TRansformer-based text recognizer with Initial embedding Guidance (TRIG), composed of three stages (transformation, feature extraction, and prediction). Extensive experiments show that our approach achieves state-of-the-art results on text recognition benchmarks.
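
As a rough illustration of the initial-embedding idea, the sketch below derives the decoder's starting queries from the encoder output instead of a frozen constant. It is a minimal sketch under assumed dimensions and module choices, not TRIG's published architecture.

```python
# Minimal sketch: a transformer decoder whose initial query embedding is computed
# from the encoder features, so it adapts per image. Names and sizes are assumptions.
import torch
import torch.nn as nn

class LearnableInitEmbeddingDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=3, max_len=25):
        super().__init__()
        # Image-adaptive initial embedding: projected from pooled encoder features,
        # rather than a frozen constant vector.
        self.init_proj = nn.Linear(d_model, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(max_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, memory):                     # memory: (batch, seq, d_model) encoder output
        init = self.init_proj(memory.mean(dim=1))  # (batch, d_model), depends on the input image
        queries = init.unsqueeze(1) + self.pos_embed.unsqueeze(0)  # (batch, max_len, d_model)
        decoded = self.decoder(queries, memory)    # cross-attend to encoder features
        return self.classifier(decoded)            # (batch, max_len, vocab_size) character logits
```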


2021 ◽ Author(s): Chang Liu, Chun Yang, Hai-bo Qin, Xiaobin Zhu, Xu-Cheng Yin

Scene text recognition is a popular topic and can benefit various tasks. Although many methods have been proposed for the close-set text recognition challenge, they cannot be directly applied to open-set scenarios, where the evaluation set contains novel characters that do not appear in the training set. Conventional methods require collecting new data and retraining the model to handle these novel characters, which is an expensive and tedious process. In this paper, we propose a label-to-prototype learning framework that handles novel characters without retraining the model. In the proposed framework, novel characters are effectively mapped to their corresponding prototypes by a label-to-prototype learning module. This module is trained on characters with seen labels and generalizes easily to novel characters. Additionally, feature-level rectification is performed via a topology-preserving transformation, resulting in better alignment between visual features and the constructed prototypes while having only a small impact on model speed. Extensive experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
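
A minimal sketch of the label-to-prototype idea follows: a small mapping network turns label-side embeddings into visual-space prototypes, and recognition reduces to similarity against whichever prototype set (seen or novel characters) is supplied at test time. The module names, dimensions, and the choice of cosine similarity are assumptions for illustration, not the authors' exact design.

```python
# Minimal sketch of label-to-prototype matching for open-set character recognition.
# All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelToPrototype(nn.Module):
    def __init__(self, label_dim=300, feat_dim=256):
        super().__init__()
        # Maps a label-side embedding (e.g. a glyph or semantic vector) to a
        # prototype living in the visual feature space.
        self.mapper = nn.Sequential(
            nn.Linear(label_dim, 512), nn.ReLU(), nn.Linear(512, feat_dim),
        )

    def forward(self, label_embeddings):            # (num_classes, label_dim)
        return F.normalize(self.mapper(label_embeddings), dim=-1)

def classify(visual_feats, prototypes):
    """Score visual features against prototypes by cosine similarity.

    visual_feats: (batch, feat_dim); prototypes: (num_classes, feat_dim).
    Novel characters are handled by simply passing an enlarged prototype set.
    """
    visual_feats = F.normalize(visual_feats, dim=-1)
    return visual_feats @ prototypes.t()            # (batch, num_classes) similarity logits
```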


2021 ◽ Author(s): Yi-Li Huang, Shi-Lin Wang, Cheng-Yu Gu, Zheng Huang, Kai Chen

Author(s): Shengze Hu, Chunhui He, Chong Zhang, Zhen Tan, Bin Ge, ...

2021 ◽ Author(s): Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuan Zhang, ...
