Real-Time Scene Text Detection with Differentiable Binarization

2020 ◽  
Vol 34 (07) ◽  
pp. 11474-11481 ◽  
Author(s):  
Minghui Liao ◽  
Zhaoyi Wan ◽  
Cong Yao ◽  
Kai Chen ◽  
Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as segmentation results can more accurately describe scene text of various shapes, such as curved text. However, binarization is an essential post-processing step for segmentation-based detection: it converts the probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, where it consistently achieves state-of-the-art results in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements from DB are significant, so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a ResNet-18 backbone, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.
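To make the idea of binarization inside the network concrete, here is a minimal sketch. The abstract does not state the formula; the sketch assumes the approximate step function usually associated with DB, B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T is the learned threshold map, and k is an amplifying factor (assumed here to be 50). The function names and sample values are illustrative and are not taken from the authors' repository.

```python
import math


def hard_binarization(p, t=0.5):
    """Standard binarization: a step function, not differentiable at p == t."""
    return 1.0 if p >= t else 0.0


def differentiable_binarization(p, t, k=50.0):
    """Smooth approximation of the step function (assumed form, k is assumed)."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the smooth binarization pixel by pixel to two same-sized 2D lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Illustrative 2x2 probability map with a per-pixel (adaptive) threshold map.
prob_map = [[0.9, 0.1], [0.6, 0.4]]
thresh_map = [[0.3, 0.3], [0.5, 0.5]]
approx_binary_map = binarize_map(prob_map, thresh_map)
```

Because the function is smooth, gradients can flow to both the probability map and the threshold map during training, which is what lets the thresholds be set adaptively rather than fixed by hand.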

2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Weijia Wu ◽  
Jici Xing ◽  
Cheng Yang ◽  
Yuxing Wang ◽  
Hong Zhou

The performance of text detection is crucial for the subsequent recognition task. Currently, the accuracy of text detectors still needs further improvement, particularly for text with irregular shapes in complex environments. We propose a pixel-wise method based on instance segmentation for scene text detection. Specifically, a text instance is split into five components: a Text Skeleton and four Directional Pixel Regions. The instance is then restored from these elements, receiving supplementary information from other areas when one fails. Besides, a Confidence Scoring Mechanism is designed to filter out characters similar to text instances. Experiments on several challenging benchmarks demonstrate that our method achieves state-of-the-art results in scene text detection, with an F-measure of 84.6% on Total-Text and 86.3% on CTW1500.
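For readers unfamiliar with the metric reported in these abstracts, the F-measure is the harmonic mean of precision and recall. The short sketch below shows the computation; the precision and recall values are made-up illustrations, not figures from any of the papers listed here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector: 90% precision, 80% recall.
score = f_measure(0.9, 0.8)
```

The harmonic mean penalizes imbalance, so a detector cannot reach a high F-measure by maximizing only one of the two quantities.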


Author(s):  
Dibyajyoti Dhar ◽  
Neelotpal Chakraborty ◽  
Sayan Choudhury ◽  
Ashis Paul ◽  
Ayatullah Faruk Mollah ◽  
...  

Text detection in natural scene images is an interesting problem in the field of information retrieval. Several methods have been proposed over the past few decades for scene text detection. However, the robustness and efficiency of these methods are degraded by their high sensitivity to various image complexities. Also, in a multi-lingual environment where text may occur in multiple languages, a method may not be suitable for detecting scene text in certain languages. To counter these challenges, a gradient morphology-based method is proposed in this paper that proves to be robust against image complexities and efficiently detects scene text irrespective of language. The method is validated using low quality images from standard multi-lingual datasets like MSRA-TD500 and MLe2e. The performance of the method is compared with that of some state-of-the-art methods, and comparatively better results are observed.
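The abstract does not describe the method's steps, so the following is only a sketch of the basic operation the name suggests: the morphological gradient, defined as dilation minus erosion. It assumes a grayscale image stored as a 2D list and a 3x3 square structuring element clipped at the image borders; the helper names are hypothetical and this is not the authors' pipeline.

```python
def _window(img, r, c):
    """Values in the 3x3 neighbourhood of (r, c), clipped at the image borders."""
    rows, cols = len(img), len(img[0])
    return [
        img[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]


def morphological_gradient(img):
    """Dilation (local max) minus erosion (local min) at every pixel."""
    return [
        [max(_window(img, r, c)) - min(_window(img, r, c))
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


# A vertical intensity edge: dark on the left, bright on the right.
image = [[0, 0, 255, 255] for _ in range(3)]
edges = morphological_gradient(image)
```

The result is large only where intensity changes within the neighbourhood, which is why this operation highlights stroke boundaries regardless of the script the text is written in.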


Author(s):  
Rajae Moumen ◽  
Raddouane Chiheb ◽  
Rdouan Faizi

The aim of this research is to propose a fully convolutional approach to address the problem of real-time scene text detection for the Arabic language. Text detection is performed using a two-step multi-scale approach. The first step uses a lightweight fully convolutional network, TextBlockDetector FCN, an adaptation of VGG-16, to eliminate non-textual elements, localize wide-scale text and give a text scale estimation. The second step determines a narrow scale range of text using a fully convolutional network for maximum performance. To evaluate the system, we compare the results of the framework with those obtained with a single VGG-16 fully deployed for one-shot text detection, as well as with previous state-of-the-art results. For training and testing, we build a dataset of 575 manually processed images, along with data augmentation to enrich the training process. The system scores a precision of 0.651 vs 0.64 in the state-of-the-art and an FPS of 24.3 vs 31.7 for a VGG-16 fully deployed.


Author(s):  
Enze Xie ◽  
Yuhang Zang ◽  
Shuai Shao ◽  
Gang Yu ◽  
Cong Yao ◽  
...  

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection, which is based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and sharing FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on


2020 ◽  
Vol 34 (07) ◽  
pp. 12160-12167 ◽  
Author(s):  
Hao Wang ◽  
Pu Lu ◽  
Hui Zhang ◽  
Mingkun Yang ◽  
Xiang Bai ◽  
...  

Recently, end-to-end text spotting that aims to detect and recognize text from cluttered images simultaneously has received particularly growing interest in computer vision. Different from the existing approaches that formulate text detection as bounding box extraction or instance segmentation, we localize a set of points on the boundary of each text instance. With the representation of such boundary points, we establish a simple yet effective scheme for end-to-end text spotting, which can read the text of arbitrary shapes. Experiments on three challenging datasets, including ICDAR2015, TotalText and COCO-Text demonstrate that the proposed method consistently surpasses the state-of-the-art in both scene text detection and end-to-end text recognition tasks.


Author(s):  
Yuliang Liu ◽  
Sheng Zhang ◽  
Lianwen Jin ◽  
Lele Xie ◽  
Yaqiang Wu ◽  
...  

Scene text in the wild commonly exhibits highly variable characteristics. Using a quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent research reveals that introducing quadrilateral bounding boxes for scene text detection brings a label confusion issue that is easily overlooked, and this issue may significantly undermine detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE) from which more effective methods to improve detection performance can be derived. Experiments showed that the proposed method can outperform state-of-the-art methods on many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. An ablation study also showed that simply integrating SBD into the Mask R-CNN framework substantially improves detection performance. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability.


Author(s):  
Qiangpeng Yang ◽  
Mengli Cheng ◽  
Wenmeng Zhou ◽  
Yan Chen ◽  
Minghui Qiu ◽  
...  

Incidental scene text detection, especially for multi-oriented text regions, is one of the most challenging tasks in many computer vision applications. Different from the common object detection task, scene text often suffers from a large variance of aspect ratio, scale, and orientation. To solve this problem, we propose a novel end-to-end scene text detector, IncepText, from an instance-aware segmentation perspective. We design a novel Inception-Text module and introduce deformable PSROI pooling to deal with multi-oriented text detection. Extensive experiments on the ICDAR2015, RCTW-17, and MSRA-TD500 datasets demonstrate our method's superiority in terms of both effectiveness and efficiency. Our proposed method achieves first place on the ICDAR2015 challenge and state-of-the-art performance on the other datasets. Moreover, we have released our implementation as an OCR product which is available for public access.


2020 ◽  
Vol 12 (11) ◽  
pp. 200
Author(s):  
Haiyan Li ◽  
Hongtao Lu

Text detection is a prerequisite for text recognition in scene images. Previous segmentation-based methods for detecting scene text have already achieved promising performance. However, these kinds of approaches may produce spurious text instances, as they usually confuse the boundaries of dense text instances, and then infer word/text line instances relying heavily on meticulous heuristic rules. We propose a novel Assembling Text Components (AT-text) method that accurately detects dense text in scene images. The AT-text localizes word/text line instances in a bottom-up mechanism by assembling a parsimonious component set. We employ a segmentation model that encodes multi-scale text features, considerably improving the classification accuracy of text/non-text pixels. The text candidate components are finely classified and selected via discriminative segmentation results. This allows the AT-text to efficiently filter out false-positive candidate components, and then to assemble the remaining text components into different text instances. The AT-text works well on multi-oriented and multi-language text without complex post-processing and character-level annotation. Compared with existing works, it achieves satisfactory results and a considerable balance between precision and recall without a large margin on the ICDAR2013 and MSRA-TD 500 public benchmark datasets.


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 1956
Author(s):  
Dongping Cao ◽  
Yong Zhong ◽  
Lishun Wang ◽  
Yilong He ◽  
Jiachen Dang

Scene text detection is attracting more and more attention and has become an important topic in machine vision research. With the development of mobile IoT (Internet of things) and deep learning technology, text detection research has made significant progress. This survey aims to summarize and analyze the main challenges and significant progress in scene text detection research. In this paper, we first introduce the history and progress of scene text detection and classify the traditional methods and deep learning-based methods in detail, pointing out the corresponding key issues and techniques. Then, we introduce commonly used benchmark datasets and evaluation protocols and identify state-of-the-art algorithms by comparison. Finally, we summarize and predict potential future research directions.

