PSENet-based efficient scene text detection

Abstract: Text detection is a key technique in computer vision applications, but efficient and precise text detection remains challenging. In this paper, an efficient scene text detection scheme is proposed based on the Progressive Scale Expansion Network (PSENet). A Mixed Pooling Module (MPM) is designed to capture the dependence of text information at different distances, using different pooling operations to better extract text shape information. The backbone network is optimized by combining two extensions of the Residual Network (ResNet), namely ResNeXt and Res2Net, to make feature extraction more effective. Experimental results show that the precision of our scheme is improved by more than 5% compared with the original PSENet.
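For context, the progressive scale expansion step that gives PSENet its name can be sketched in a few lines. This is a minimal illustration of the general idea only, not the scheme proposed in the abstract above. It assumes the network has already produced binary kernel maps ordered from smallest to largest, and the function names `label_components` and `progressive_scale_expansion` are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask):
    """Label 4-connected regions of a binary grid (0 = background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_scale_expansion(kernels):
    """Grow instance labels from the smallest kernel into each larger one.

    `kernels` is a list of binary grids ordered smallest to largest
    (an assumption of this sketch). Pixels are claimed breadth-first,
    so a contested pixel goes to whichever instance reaches it first.
    """
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if (0 <= ny < h and 0 <= nx < w
                        and kernel[ny][nx] and labels[ny][nx] == 0):
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels


# Two instances that are separate in the small kernel but touch in the large one.
small = [[0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0]]
large = [[0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0]]
result = progressive_scale_expansion([small, large])
```

Because the labels come from the smallest kernel, adjacent text instances stay separate even where their full-size regions touch.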


Scene Text Detection Using Context-Aware Pyramid Feature Extraction

2020 International Conference on Computing and Data Science (CDS), 2020. DOI: 10.1109/cds49703.2020.00053

Author(s): Qishu Jian

Keyword(s): Feature Extraction, Text Detection, Context Aware, Scene Text Detection, Scene Text


MFECN: Multi-level Feature Enhanced Cumulative Network for Scene Text Detection

ACM Transactions on Multimedia Computing Communications and Applications, 2021, Vol 17 (3), pp. 1-22. DOI: 10.1145/3440087

Author(s): Zhandong Liu, Wengang Zhou, Houqiang Li

Keyword(s): Text Detection, Detection Algorithms, Features Fusion, Irregular Shapes, Scene Text Detection, Scene Text, Text Information, Multi Level, Public Datasets, High Level

Recently, many scene text detection algorithms have achieved impressive performance by using convolutional neural networks. However, most of them do not make full use of the context among hierarchical multi-level features to improve detection performance. In this article, we present an efficient multi-level features enhanced cumulative framework based on instance segmentation for scene text detection. First, we adopt a Multi-Level Features Enhanced Cumulative (MFEC) module to capture features whose representational ability is cumulatively enhanced. Then, a Multi-Level Features Fusion (MFF) module is designed to fully integrate both high-level and low-level MFEC features, which can adaptively encode scene text information. To verify the effectiveness of the proposed method, we perform experiments on six public datasets (namely, CTW1500, Total-text, MSRA-TD500, ICDAR2013, ICDAR2015, and MLT2017) and compare it with other state-of-the-art methods. Experimental results demonstrate that the proposed Multi-Level Features Enhanced Cumulative Network (MFECN) detector handles scene text instances with irregular shapes (i.e., curved, oriented, and horizontal) well and achieves better or comparable results.
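The idea of integrating high-level and low-level features can be illustrated with a generic top-down fusion step: the coarser, high-level map is upsampled to the resolution of the finer, low-level map and the two are summed element-wise. This is a minimal sketch of that generic pattern, not the MFF module described above. The function names `upsample_nearest` and `fuse_levels`, and the use of plain nested lists as feature maps, are assumptions made for illustration.

```python
def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling: repeat each value `factor` times per axis."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse_levels(low_level, high_level):
    """Upsample the coarse high-level map and add it to the fine low-level map.

    Assumes both maps are square-scaled versions of each other, i.e. the
    low-level height is an integer multiple of the high-level height.
    """
    factor = len(low_level) // len(high_level)
    upsampled = upsample_nearest(high_level, factor)
    return [[a + b for a, b in zip(fine_row, coarse_row)]
            for fine_row, coarse_row in zip(low_level, upsampled)]


# Hypothetical 4x4 low-level map and 2x2 high-level map.
low = [[1, 1, 1, 1] for _ in range(4)]
high = [[10, 20],
        [30, 40]]
fused = fuse_levels(low, high)
```

Real detectors usually apply learned convolutions before and after the sum; the sketch keeps only the resolution matching and the merge.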


SEMPANet: A Modified Path Aggregation Network with Squeeze-Excitation for Scene Text Detection

Sensors, 2021, Vol 21 (8), pp. 2657. DOI: 10.3390/s21082657

Author(s): Shuangshuang Li, Wenming Cao

Keyword(s): Feature Extraction, Early Stage, Text Detection, Detection Methods, Good Effect, Natural Scenes, Training Time, Scene Text Detection, Scene Text, Curved Text

Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance. As the application scenarios for text detection expand, the research value of the topic has gradually increased. Text detection in natural scenes remains challenging, both for horizontal text detected with quadrilateral boxes and for curved text of arbitrary shape. Most networks perform well when the target samples are balanced, but small targets and extremely unbalanced data are still difficult to handle. In this work, we continue to use PSENet to deal with such problems. We also observe that most existing scene text detection methods use ResNet and FPN as the feature extraction backbone, and we improve the ResNet and FPN parts of PSENet so that they better combine features extracted in the early stage. We propose SEMPANet, an anchor-free, one-stage framework that yields a lightweight model, as reflected in a training time of about 24 h. Finally, we selected the two most representative datasets for oriented text and curved text to conduct experiments. On ICDAR2015, the improved network's results further verify its effectiveness; it gained 1.01% in F-measure compared with PSENet-1s. On CTW1500, the improved network performed better than the original network on average.
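The F-measure used in comparisons like the one above is the harmonic mean of precision and recall. The sketch below shows the arithmetic. The function names `f_measure` and `detection_scores` are hypothetical, and the counts in the example are made up for illustration rather than taken from any of the listed papers.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(true_positives, num_detections, num_ground_truth):
    """Return (precision, recall, F-measure) from raw match counts."""
    precision = true_positives / num_detections if num_detections else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Illustrative counts only: 80 correct boxes out of 100 predicted,
# against 160 annotated text instances.
precision, recall, f_score = detection_scores(80, 100, 160)
```

Because the harmonic mean is pulled toward the lower of the two values, a detector cannot reach a high F-measure by trading away either precision or recall.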
