SEMPANet: A Modified Path Aggregation Network with Squeeze-Excitation for Scene Text Detection

Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance. With the further expansion of text detection application scenarios, the research value of text detection has gradually increased. Text detection in natural scenes is more challenging for horizontal text based on a quadrilateral detection box and for curved text of any shape. Most networks handle balanced target samples in text detection well, but dealing with small targets and extremely unbalanced data remains challenging. We continued to use PSENet to deal with such problems in this work. In addition, because most existing scene text detection methods use ResNet and FPN as the feature extraction backbone, we improved the ResNet and FPN parts of PSENet to make them more conducive to combining features in the early stage. An anchor-free, one-stage SEMPANet framework is proposed to implement a lightweight model, as reflected in a training time of about 24 h. Finally, we selected the two most representative datasets for oriented text and curved text to conduct experiments. On ICDAR2015, the improved network’s results further verify its effectiveness; it gained 1.01% in F-measure compared with PSENet-1s. On CTW1500, the improved network performed better than the original network on average.
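
The F-measure used to compare detectors in this abstract is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition; the function name `f_measure` and the sample values are illustrative only and are not results from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only, not taken from any paper listed here.
print(round(f_measure(0.90, 0.80), 4))
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-measure by trading away recall for precision or the reverse.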


Scene Text Detection with Supervised Pyramid Context Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019038 ◽

2019 ◽

Vol 33 ◽

pp. 9038-9045 ◽

Cited By ~ 30

Author(s):

Enze Xie ◽

Yuhang Zang ◽

Shuai Shao ◽

Gang Yu ◽

Cong Yao ◽

...

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Text Detection ◽

Detection Methods ◽

Natural Scenes ◽

The Past ◽

Scene Text Detection ◽

Scene Text ◽

Previous State ◽

Feature Pyramid

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection, which is based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and the shared FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on


A decade: Review of scene text detection methods

Computer Science Review ◽

10.1016/j.cosrev.2021.100434 ◽

2021 ◽

Vol 42 ◽

pp. 100434

Author(s):

Ednawati Rainarli ◽

Suprapto ◽

Wahyono

Keyword(s):

Text Detection ◽

Detection Methods ◽

Scene Text Detection ◽

Scene Text


Scene Text Detection Using Context-Aware Pyramid Feature Extraction

2020 International Conference on Computing and Data Science (CDS) ◽

10.1109/cds49703.2020.00053 ◽

2020 ◽

Author(s):

Qishu Jian

Keyword(s):

Feature Extraction ◽

Text Detection ◽

Context Aware ◽

Scene Text Detection ◽

Scene Text


Multi-Lingual Scene Text Detection Using One-Class Classifier

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2019040104 ◽

2019 ◽

Vol 9 (2) ◽

pp. 48-65 ◽

Cited By ~ 6

Author(s):

Anirban Mukhopadhyay ◽

Sourav Kumar ◽

Souvik Roy Chowdhury ◽

Neelotpal Chakraborty ◽

Ayatullah Faruk Mollah ◽

...

Keyword(s):

Gabor Filter ◽

False Positives ◽

Text Detection ◽

Detection Methods ◽

Stroke Width ◽

Scene Text Detection ◽

Scene Text ◽

Stroke Width Transform ◽

One Class Classifier ◽

Occurrence Matrix

The main purpose of scene text recognition is to detect texts in a given image. The problem of text detection and recognition in such images has gained great attention in recent years due to rising demand from applications such as vision-based applications, multimedia and content-based retrieval. Due to the low accuracy of existing scene text detection methods, an improved pipeline is developed for the text localization task. First, candidate text regions are generated using Maximally Stable Extremal Region and Stroke Width Transform methods, which capture true positives along with many false positives. A One Class Classifier is then trained to label the candidate regions as text or non-text; this choice is suitable because the non-text class cannot be adequately represented to train a binary classifier. The one-class classifier is trained with popular feature descriptors such as Histogram of Oriented Gradients, Grey Level Co-Occurrence Matrix, Discrete Cosine Transform and Gabor filter. Experimental results show high recall for text-containing regions and a reduction in false positives.
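
One of the descriptors named in this abstract, the Grey Level Co-Occurrence Matrix, can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions: the function names `glcm` and `contrast`, the 4-level toy image, and the single horizontal offset are all hypothetical choices for illustration, not the configuration used by the authors.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, levels=4):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    return [[counts[(i, j)] for j in range(levels)] for i in range(levels)]


def contrast(matrix):
    """Contrast feature: squared grey-level difference weighted by frequency."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        (i - j) ** 2 * value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )
    return weighted / total


# Toy 4x4 patch with grey levels 0..3 (illustrative data only).
patch = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(patch)
print(matrix)
print(contrast(matrix))
```

Scalar features such as this contrast value, computed per candidate region, are the kind of input a one-class classifier could be trained on; the classifier itself is not shown here.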


A Straightforward and Efficient Instance-Aware Curved Text Detector

Sensors ◽

10.3390/s21061945 ◽

2021 ◽

Vol 21 (6) ◽

pp. 1945

Author(s):

Fan Zhao ◽

Sidi Shao ◽

Lin Zhang ◽

Zhiquan Wen

Keyword(s):

High Speed ◽

Optimization Problem ◽

Text Detection ◽

Excellent Performance ◽

Swarm Optimization ◽

Shape Approximation ◽

Scene Text Detection ◽

Scene Text ◽

Curved Text ◽

Fine Tune

A challenging aspect of scene text detection is handling curved texts. To avoid the tedious manual annotation needed to train a curved text detector, and to overcome the limitations of regression-based text detectors on irregular text, we introduce a straightforward and efficient instance-aware curved scene text detector, namely, look more than twice (LOMT), which makes the regression-based text detection results gradually change from a loosely bounded box to a compact polygon. LOMT is mainly composed of a curved text shape approximation module and a component merging network. The shape approximation module uses a particle swarm optimization-based text shape approximation method (called PSO-TSA) to fine-tune the quadrilateral text detection results to fit the curved text. The instance-aware component merging network, called ICMN, merges incomplete sub-parts of text instances into more complete polygons. Experiments on five text datasets demonstrate that our method not only achieves excellent performance but also has relatively high speed. Ablation experiments show that PSO-TSA can solve the text’s shape optimization problem efficiently, and that ICMN has a satisfactory merging effect.


Omnidirectional Scene Text Detection with Sequential-free Box Discretization

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/423 ◽

2019 ◽

Cited By ~ 9

Author(s):

Yuliang Liu ◽

Sheng Zhang ◽

Lianwen Jin ◽

Lele Xie ◽

Yaqiang Wu ◽

...

Keyword(s):

State Of The Art ◽

Detection Performance ◽

Text Detection ◽

Detection Methods ◽

Bounding Box ◽

Scene Text Detection ◽

Scene Text ◽

Art Methods ◽

In The Wild ◽

Ablation Study

Scene text in the wild commonly exhibits highly variable characteristics. Using a quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent research reveals that introducing a quadrilateral bounding box for scene text detection brings a label confusion issue that is easily overlooked, and this issue may significantly undermine detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD), which discretizes the bounding box into key edges (KE), from which more effective methods can be derived to improve detection performance. Experiments showed that the proposed method can outperform state-of-the-art methods on many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. An ablation study also showed that simply integrating SBD into the Mask R-CNN framework substantially improves detection performance. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability.
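
The label confusion issue arises because the same quadrilateral can be annotated with its four vertices in different orders. The sketch below illustrates one way an order-free target could be built, assuming the key edges are simply the sorted x and y coordinates of the vertices; the function name `key_edges` and this exact formulation are assumptions for illustration and may differ from the authors' implementation.

```python
def key_edges(quad):
    """Return order-independent key edges for four (x, y) vertices.

    The sorted x values and sorted y values do not depend on the order
    in which the annotator listed the vertices.
    """
    xs = sorted(x for x, _ in quad)
    ys = sorted(y for _, y in quad)
    return xs, ys


# The same quadrilateral, listed from two different starting vertices.
quad_a = [(10, 20), (110, 30), (105, 70), (5, 60)]
quad_b = [(105, 70), (5, 60), (10, 20), (110, 30)]

print(key_edges(quad_a))
print(key_edges(quad_a) == key_edges(quad_b))
```

Sorted coordinates alone do not say which x value pairs with which y value, so rebuilding the quadrilateral from key edges needs an additional pairing step that is not shown here.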


TextFuseNet: Scene Text Detection with Richer Fused Features

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/72 ◽

2020 ◽

Author(s):

Jian Ye ◽

Zhe Chen ◽

Juhua Liu ◽

Bo Du

Keyword(s):

Text Detection ◽

Feature Representation ◽

Natural Scenes ◽

Weak Supervision ◽

Feature Representations ◽

General Semantics ◽

Scene Text Detection ◽

Scene Text ◽

Multi Level ◽

Different Levels

Arbitrary shape text detection in natural scenes is an extremely challenging task. Unlike existing text detection approaches that only perceive texts based on limited feature representations, we propose a novel framework, namely TextFuseNet, to exploit the use of richer features fused for text detection. More specifically, we propose to perceive texts from three levels of feature representations, i.e., character-, word- and global-level, and then introduce a novel text representation fusion technique to help achieve robust arbitrary text detection. The multi-level feature representation can adequately describe texts by dissecting them into individual characters while still maintaining their general semantics. TextFuseNet then collects and merges the texts’ features from different levels using a multi-path fusion architecture which can effectively align and fuse different representations. In practice, our proposed TextFuseNet can learn a more adequate description of arbitrary shapes texts, suppressing false positives and producing more accurate detection results. Our proposed framework can also be trained with weak supervision for those datasets that lack character-level annotations. Experiments on several datasets show that the proposed TextFuseNet achieves state-of-the-art performance. Specifically, we achieve an F-measure of 94.3% on ICDAR2013, 92.1% on ICDAR2015, 87.1% on Total-Text and 86.6% on CTW-1500, respectively.


DeRPN: Taking a Further Step toward More General Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019046 ◽

2019 ◽

Vol 33 ◽

pp. 9046-9053 ◽

Cited By ~ 6

Author(s):

Lele Xie ◽

Yuliang Liu ◽

Lianwen Jin ◽

Zecheng Xie

Keyword(s):

Object Detection ◽

Text Detection ◽

Detection Methods ◽

Scene Text Detection ◽

Scene Text ◽

Dimension Decomposition ◽

Object Shapes ◽

General Object ◽

Current Detection ◽

Proper Setting

Most current detection methods have adopted anchor boxes as regression references. However, the detection performance is sensitive to the setting of the anchor boxes. A proper setting of anchor boxes may vary significantly across different datasets, which severely limits the universality of the detectors. To improve the adaptivity of the detectors, in this paper, we present a novel dimension-decomposition region proposal network (DeRPN) that can perfectly displace the traditional Region Proposal Network (RPN). DeRPN utilizes an anchor string mechanism to independently match object widths and heights, which is conducive to treating variant object shapes. In addition, a novel scale-sensitive loss is designed to address the imbalanced loss computations of different scaled objects, which can avoid the small objects being overwhelmed by larger ones. Comprehensive experiments conducted on both general object detection datasets (Pascal VOC 2007, 2012 and MS COCO) and scene text detection datasets (ICDAR 2013 and COCO-Text) all prove that our DeRPN can significantly outperform RPN. It is worth mentioning that the proposed DeRPN can be employed directly on different models, tasks, and datasets without any modifications of hyperparameters or specialized optimization, which further demonstrates its adaptivity. The code has been released at https://github.com/HCIILAB/DeRPN.
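
The idea of matching widths and heights independently can be sketched as follows. This is a hypothetical illustration, not the released DeRPN code: the anchor string values, the nearest-in-log-scale matching rule, and the names `ANCHOR_STRINGS`, `match_dimension`, and `match_object` are all assumptions made for this example.

```python
import math

# Hypothetical anchor strings: a geometric sequence of 1-D lengths.
ANCHOR_STRINGS = [16 * 2 ** i for i in range(7)]  # 16, 32, ..., 1024


def match_dimension(length, strings=ANCHOR_STRINGS):
    """Pick the anchor string closest to `length` on a log scale."""
    return min(strings, key=lambda s: abs(math.log(length / s)))


def match_object(width, height):
    """Match width and height separately instead of as a 2-D anchor box."""
    return match_dimension(width), match_dimension(height)


# A long, thin text line: width and height land on very different strings.
print(match_object(300, 20))
```

Because each dimension is matched on its own, seven strings cover every width and height combination without enumerating aspect ratios, which is one plausible reading of why the setting transfers across datasets.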


Texts as Lines: Text Detection with Weak Supervision

Mathematical Problems in Engineering ◽

10.1155/2020/3871897 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Weijia Wu ◽

Jici Xing ◽

Cheng Yang ◽

Yuxing Wang ◽

Hong Zhou

Keyword(s):

Synthetic Data ◽

Text Detection ◽

Detection Methods ◽

Deep Convolutional Neural Networks ◽

Weak Supervision ◽

Annotation Information ◽

Scene Text Detection ◽

Supervised Methods ◽

Weakly Supervised ◽

Curved Text

Scene text detection methods based on deep learning have recently shown remarkable improvement. Most text detection methods train deep convolutional neural networks with full masks requiring pixel accuracy for good quality training. Normally, a skilled engineer needs to drag tens of points to create a full mask for the curved text. Therefore, data labelling based on full masks is time consuming and laborious, particularly for curved texts. To reduce the labelling cost, a weakly supervised method is first proposed in this paper. Unlike the other detectors (e.g., PSENet or TextSnake) that use full masks, our method only needs coarse masks for training. More specifically, the coarse mask for one text instance is a line across the text region in our method. Compared with full mask labelling, data labelling using the proposed method could save labelling time while losing much annotation information. In this context, a network pretrained on synthetic data with full masks is used to enhance the coarse masks in a real image. Finally, the enhanced masks are fed back to train our network. Analysis of experiments performed using the model shows that the performance of our method is close to that of the fully supervised methods on ICDAR2015, CTW1500, Total-Text, and MSRA-TD500.


PSENet-based efficient scene text detection

EURASIP Journal on Advances in Signal Processing ◽

10.1186/s13634-021-00808-5 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Guanglong Liao ◽

Zhongjie Zhu ◽

Yongqiang Bai ◽

Tingna Liu ◽

Zhibo Xie

Keyword(s):

Feature Extraction ◽

Text Detection ◽

Residual Network ◽

Detection Scheme ◽

Backbone Network ◽

Scene Text Detection ◽

Scene Text ◽

Text Information ◽

Computer Vision Applications ◽

Extract Information

Text detection is a key technique and plays an important role in computer vision applications, but efficient and precise text detection is still challenging. In this paper, an efficient scene text detection scheme is proposed based on the Progressive Scale Expansion Network (PSENet). A Mixed Pooling Module (MPM) is designed to effectively capture the dependence of text information at different distances, where different pooling operations are employed to better extract information about text shape. The backbone network is optimized by combining two extensions of the Residual Network (ResNet), i.e., ResNeXt and Res2Net, to enhance feature extraction effectiveness. Experimental results show that the precision of our scheme is improved by more than 5% compared with the original PSENet.
