All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting

2020, Vol 34 (07), pp. 12160-12167
Author(s): Hao Wang, Pu Lu, Hui Zhang, Mingkun Yang, Xiang Bai, ...

Recently, end-to-end text spotting, which aims to detect and recognize text in cluttered images simultaneously, has received growing interest in computer vision. Unlike existing approaches that formulate text detection as bounding box extraction or instance segmentation, we localize a set of points on the boundary of each text instance. With this boundary-point representation, we establish a simple yet effective scheme for end-to-end text spotting that can read text of arbitrary shapes. Experiments on three challenging datasets, ICDAR2015, TotalText, and COCO-Text, demonstrate that the proposed method consistently surpasses the state of the art in both scene text detection and end-to-end text recognition tasks.
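The abstract does not say how the boundary points are obtained or how many are used. As an illustration only, assuming each text instance is annotated as a top polyline and a bottom polyline, the sketch below resamples each side to a fixed number k of points spaced evenly by arc length and concatenates them into a closed polygon. `resample_polyline`, `boundary_points`, and k are hypothetical names and parameters, not the paper's.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def resample_polyline(points: List[Point], n: int) -> List[Point]:
    """Return n points spaced evenly by arc length along a polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need n >= 2 and at least two input points")
    # Cumulative arc length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out

def boundary_points(top: List[Point], bottom: List[Point], k: int) -> List[Point]:
    """Hypothetical representation: k points per long side, as a closed polygon
    (top side left-to-right, then bottom side right-to-left)."""
    return resample_polyline(top, k) + resample_polyline(bottom, k)[::-1]
```

Resampling by arc length keeps the points evenly spread along curved text, so a fixed k describes straight and curved instances alike.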

2021, pp. 1-11
Author(s): Guangcun Wei, Wansheng Rong, Yongquan Liang, Xinguang Xiao, Xiang Liu

Traditional OCR pipelines ignore the inherent connection between the text detection task and the text recognition task. To address this problem, this paper proposes a novel end-to-end text spotting framework consisting of three parts: a shared convolutional feature network, a text detector, and a text recognizer. Because the convolutional feature network is shared, the text detection and text recognition networks can be optimized jointly. This both reduces the computational burden and makes effective use of the inherent connection between detection and recognition. The model adds a TCM (Text Context Module) on top of Mask R-CNN, which effectively addresses the negative-sample problem in text detection. This paper also proposes a text recognition model based on SAM-BiLSTM (a spatial attention mechanism with a BiLSTM), which extracts the semantic information between characters more effectively. The model significantly surpasses state-of-the-art methods on several text detection and text spotting benchmarks, including ICDAR 2015 and Total-Text.
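The abstract does not give a loss function. A common way to optimize a shared feature network jointly for two tasks, assumed here for illustration, is to minimize a weighted sum of the task losses, with the detection term made of Mask R-CNN's classification, box-regression, and mask losses. Here θ_s, θ_d, and θ_r denote the shared, detector, and recognizer parameters, and the weight λ is a hypothetical hyperparameter. Because both terms depend on θ_s, their gradients add up in the shared layers, which is what lets each task shape the features the other uses.

```latex
\mathcal{L}(\theta_s, \theta_d, \theta_r)
  = \underbrace{\mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{box}} + \mathcal{L}_{\mathrm{mask}}}_{\mathcal{L}_{\mathrm{det}}(\theta_s, \theta_d)}
  + \lambda \, \mathcal{L}_{\mathrm{rec}}(\theta_s, \theta_r),
\qquad
\frac{\partial \mathcal{L}}{\partial \theta_s}
  = \frac{\partial \mathcal{L}_{\mathrm{det}}}{\partial \theta_s}
  + \lambda \, \frac{\partial \mathcal{L}_{\mathrm{rec}}}{\partial \theta_s}
```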


2020, Vol 2020, pp. 1-11
Author(s): Weijia Wu, Jici Xing, Cheng Yang, Yuxing Wang, Hong Zhou

The performance of text detection is crucial for the subsequent recognition task. The accuracy of current text detectors still needs improvement, particularly for text with irregular shapes in complex environments. We propose a pixel-wise method based on instance segmentation for scene text detection. Specifically, each text instance is split into five components, a Text Skeleton and four Directional Pixel Regions; the instance is then restored from these components, and when one component fails, the others supply complementary information. In addition, a Confidence Scoring Mechanism is designed to filter out non-text patterns that resemble text instances. Experiments on several challenging benchmarks demonstrate that our method achieves state-of-the-art results in scene text detection, with an F-measure of 84.6% on Total-Text and 86.3% on CTW1500.
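The F-measure quoted here and in other abstracts in this list is conventionally the harmonic mean of precision and recall. A minimal sketch, assuming predictions have already been matched to ground truth (benchmarks such as Total-Text define their own IoU-based matching protocols, which are not reproduced here), so that only true-positive, false-positive, and false-negative counts remain; `f_measure` and `detection_scores` are hypothetical helper names.

```python
from typing import Tuple

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def detection_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure from already-matched detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predictions that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of ground-truth instances found
    return precision, recall, f_measure(precision, recall)
```

The harmonic mean penalizes imbalance: a detector with precision 1.0 and recall 0.5 scores about 0.667, not 0.75.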


2020, Vol 34 (07), pp. 11899-11907
Author(s): Liang Qiao, Sanli Tang, Zhanzhan Cheng, Yunlu Xu, Yi Niu, ...

Many approaches have recently been proposed to detect irregular scene text, with promising results. However, their localization results may not adequately serve the subsequent text recognition stage, mainly for two reasons: 1) recognizing arbitrary-shaped text is still a challenging task, and 2) the prevalent non-trainable pipeline strategies between text detection and text recognition lead to suboptimal performance. To handle this incompatibility, we propose an end-to-end trainable text spotting approach named Text Perceptron. Concretely, Text Perceptron first employs an efficient segmentation-based text detector that learns the latent text reading order and boundary information. A novel Shape Transform Module (STM) then transforms the detected feature regions into regular shapes without extra parameters. The STM unites text detection and the subsequent recognition into a single framework, enabling global optimization of the whole network. Experiments show that our method achieves competitive performance on two standard text benchmarks, ICDAR 2013 and ICDAR 2015, and clearly outperforms existing methods on the irregular text benchmarks SCUT-CTW1500 and Total-Text.
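The abstract says the STM turns detected regions into regular shapes without extra parameters but gives no details. One parameter-free way to do this, assumed here and not taken from the paper, is to build a sampling grid from matching top and bottom boundary points: each output column follows the two boundaries, and each row blends linearly from the top boundary to the bottom one. Feature values would then be read at these coordinates with bilinear interpolation, which is differentiable, so recognition gradients can reach the features. `rectification_grid` is a hypothetical helper, and for brevity it interpolates along the boundaries by vertex index rather than by arc length.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def _interp_polyline(pts: List[Point], t: float) -> Point:
    """Point at fraction t in [0, 1] along pts, interpolating by vertex index."""
    u = t * (len(pts) - 1)
    k = min(int(u), len(pts) - 2)  # clamp so t == 1 stays on the last segment
    f = u - k
    (x0, y0), (x1, y1) = pts[k], pts[k + 1]
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))

def rectification_grid(top: List[Point], bottom: List[Point],
                       height: int, width: int) -> List[List[Point]]:
    """Map each (row, col) of a height x width output to an (x, y) input location.

    Requires at least two points on each boundary.
    """
    grid = []
    for i in range(height):
        s = i / (height - 1) if height > 1 else 0.0   # 0 = top edge, 1 = bottom edge
        row = []
        for j in range(width):
            t = j / (width - 1) if width > 1 else 0.0  # position along the text line
            tx, ty = _interp_polyline(top, t)
            bx, by = _interp_polyline(bottom, t)
            row.append(((1 - s) * tx + s * bx, (1 - s) * ty + s * by))
        grid.append(row)
    return grid
```

For a curved instance, the grid columns bend with the boundaries, so sampling along them produces a horizontal, rectangular feature strip for the recognizer.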


Author(s): Dibyajyoti Dhar, Neelotpal Chakraborty, Sayan Choudhury, Ashis Paul, Ayatullah Faruk Mollah, ...

Text detection in natural scene images is an interesting problem in the field of information retrieval. Several methods for scene text detection have been proposed over the past few decades. However, their robustness and efficiency are degraded by high sensitivity to various image complexities. Moreover, in multilingual environments where text may appear in several languages, a method may not be suitable for detecting scene text in certain languages. To counter these challenges, this paper proposes a gradient morphology-based method that proves robust against image complexities and efficiently detects scene text irrespective of language. The method is validated on low-quality images from standard multilingual datasets such as MSRA-TD500 and MLe2e. Its performance is compared with that of several state-of-the-art methods, and comparatively better results are observed.
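The abstract does not detail the method's pipeline. For reference, the standard morphological gradient is the difference between a dilation and an erosion of the image, and it responds strongly at high-contrast edges such as character strokes. The sketch below computes this textbook operator with a 3x3 square structuring element on a grayscale image stored as nested lists, using only the in-bounds part of the window at the borders; the helper names are hypothetical, and this is not the authors' full detector.

```python
from typing import Callable, List

Image = List[List[int]]

def _filter3x3(img: Image, reduce: Callable[[List[int]], int]) -> Image:
    """Apply reduce (max for dilation, min for erosion) over each 3x3 window."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            row.append(reduce(window))
        out.append(row)
    return out

def morphological_gradient(img: Image) -> Image:
    """Dilation minus erosion: large where intensity changes sharply."""
    dilated = _filter3x3(img, max)
    eroded = _filter3x3(img, min)
    return [[d - e for d, e in zip(drow, erow)] for drow, erow in zip(dilated, eroded)]
```

Flat regions give zero and the band around an edge gives the local contrast, which is why the operator is insensitive to the script of the text it outlines.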


Author(s): Rajae Moumen, Raddouane Chiheb, Rdouan Faizi

The aim of this research is to propose a fully convolutional approach to real-time scene text detection for the Arabic language. Text detection is performed with a two-step multi-scale approach. The first step uses a lightweight fully convolutional network, TextBlockDetector FCN, an adaptation of VGG-16 that eliminates non-textual elements, localizes wide-scale text, and estimates text scale. The second step uses a fully convolutional network to determine the narrow scale range of the text for maximum performance. To evaluate the system, we compare the framework's results with those obtained by a single VGG-16 fully deployed for one-shot text detection, as well as with previous state-of-the-art results. For training and testing, we built a dataset of 575 manually processed images, with data augmentation to enrich the training process. The system achieves a precision of 0.651 vs. 0.64 for the state of the art, and a frame rate of 24.3 FPS vs. 31.7 FPS for the fully deployed VGG-16.


Author(s): Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, ...

Scene text detection methods based on deep learning have achieved remarkable results in recent years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments. To tackle this issue, and mainly inspired by Mask R-CNN, we propose an effective model for scene text detection based on a Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and a shared FPN, SPCNET achieves significantly enhanced performance while introducing only marginal extra computation. Experiments on standard datasets demonstrate that SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on
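The abstract credits semantic guidance with suppressing false positives but does not state how semantic information and detection scores are combined. As a hypothetical illustration only, one could rescore each detected instance by combining its box confidence with the mean semantic text probability inside its mask, then drop instances that fall below a threshold. The geometric-mean rule, the 0.5 threshold, and the `Detection` type are assumptions for this sketch, not SPCNET's actual mechanism.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    box_score: float              # classifier confidence for the instance
    mask_text_probs: List[float]  # semantic "text" probability of each pixel in its mask

def rescore(det: Detection) -> float:
    """Assumed fusion rule: geometric mean of box score and mean in-mask text probability."""
    if not det.mask_text_probs:
        return 0.0
    semantic = sum(det.mask_text_probs) / len(det.mask_text_probs)
    return math.sqrt(det.box_score * semantic)

def suppress_false_positives(dets: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Keep only instances whose fused score reaches the (assumed) threshold."""
    return [d for d in dets if rescore(d) >= threshold]

# A confident box over a region the semantic map considers non-text is dropped.
text_like = Detection(0.9, [0.9, 0.9])
texture = Detection(0.9, [0.1, 0.1])
kept = suppress_false_positives([text_like, texture])
```

Here `texture` scores about 0.3 despite its 0.9 box confidence, so only `text_like` survives.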


Author(s): Yatesh Manghate, Prabha Nair, Pranita Chaudhary, Mitali Mishra, Ninad Bhivgade, ...

Many real-world applications and institutions now generate huge amounts of unstructured data, i.e., data in the form of images such as receipts, invoices, forms, statements, and contracts. The rich, detailed information in such text is of great significance for computer vision applications (driverless cars, assistance for blind and visually impaired people, label and package detection, automatic number plate recognition, etc.). Research effort and progress in this domain have recently increased because of its importance for data analysis and computer vision. Unstructured data poses diverse challenges, such as image sensor noise, varying viewing angles, blur, lighting conditions, resolution, and non-planar objects. Our objectives in this research are i) to detect and recognize text in the data, ii) to handle the diversity and variability of text in natural scenes, iii) to explore various datasets, and iv) to deal with the various issues that arise in scene text detection. To tackle this problem, we propose a robust scene text detection and recognition method with adaptive text region representation, built as a deep learning pipeline in OpenCV, with the EAST algorithm for detection and Tesseract for recognition. A recurrent neural network-based adaptive text region representation is proposed for text region refinement, in which a pair of boundary points is predicted at each time step until no new points are found. In this way, text regions in an image are detected and represented with an adaptive number of boundary points.
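The refinement step described above is a variable-length decoding loop: one pair of boundary points per time step, stopping when the model predicts no new pair. A minimal sketch of that control flow, assuming a hypothetical `step` function that stands in for one RNN time step and returns either a (top point, bottom point) pair or `None` together with its next state; the network itself, and the `max_steps` safety cap, are assumptions not described in the abstract.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
PointPair = Tuple[Point, Point]  # (point on top boundary, point on bottom boundary)
StepFn = Callable[[int], Tuple[Optional[PointPair], int]]

def decode_boundary(step: StepFn, init_state: int, max_steps: int = 64) -> List[Point]:
    """Collect one boundary-point pair per step until the model signals stop."""
    top: List[Point] = []
    bottom: List[Point] = []
    state = init_state
    for _ in range(max_steps):
        pair, state = step(state)
        if pair is None:  # no new points: the region is complete
            break
        top.append(pair[0])
        bottom.append(pair[1])
    # Closed polygon: top side left-to-right, then bottom side right-to-left.
    return top + bottom[::-1]

# Toy stand-in for the RNN: replays fixed pairs, with the state as an index.
pairs = [((0, 0), (0, 4)), ((5, 1), (5, 5)), ((10, 0), (10, 4))]

def toy_step(i: int) -> Tuple[Optional[PointPair], int]:
    return (pairs[i], i + 1) if i < len(pairs) else (None, i)

polygon = decode_boundary(toy_step, 0)
```

Because the loop stops on the model's own signal, a short straight word can end with two pairs while a long curved line keeps adding pairs, which is what "an adaptive number of boundary points" means.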


2020, Vol 34 (07), pp. 11474-11481
Author(s): Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as segmentation results can more accurately describe scene text of various shapes, such as curved text. However, binarization post-processing is essential for segmentation-based detection; it converts the probability maps produced by the segmentation network into bounding boxes or regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which performs the binarization process inside a segmentation network. Optimized together with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies post-processing but also improves text detection performance. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, on which it consistently achieves state-of-the-art results in terms of both detection accuracy and speed. In particular, with a lightweight backbone, the improvements from DB are significant enough that we can seek an ideal trade-off between detection accuracy and efficiency. Specifically, with a ResNet-18 backbone, our detector achieves an F-measure of 82.8 while running at 62 FPS on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.
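The abstract does not give the formula, but the DB paper replaces the hard step function with an approximate binary map B = 1 / (1 + exp(-k(P - T))), where P is the probability map, T is the learned per-pixel threshold map, and k is an amplifying factor set empirically to 50. A minimal per-pixel sketch on nested lists; the fixed threshold 0.3 used for the hard baseline is an assumed value for illustration, and the function names are hypothetical rather than the released code's API.

```python
import math
from typing import List

Map = List[List[float]]

def standard_binarization(prob: Map, t: float = 0.3) -> List[List[int]]:
    """Hard threshold with a fixed t (assumed value): a step function with zero gradient."""
    return [[1 if p >= t else 0 for p in row] for row in prob]

def differentiable_binarization(prob: Map, thresh: Map, k: float = 50.0) -> Map:
    """Approximate binary map 1 / (1 + exp(-k (P - T))), computed per pixel.

    With P and T in [0, 1], the exponent stays within [-k, k], so exp cannot overflow.
    """
    return [[1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
            for prow, trow in zip(prob, thresh)]
```

Because B is a steep sigmoid of k(P - T), its derivative with respect to P is k·B·(1 - B), which is nonzero near the threshold; that is what lets the threshold map be learned together with the segmentation network.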

