CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion

In recent years, methods for detecting text in real scenes have made significant progress with the rise of neural networks. However, because of the limited receptive field of convolutional neural networks (CNNs) and the simple representation of text with rectangular bounding boxes, previous methods may be insufficient for more challenging text instances. To solve this problem, this paper proposes a scene text detection network based on cross-scale feature fusion (CSFF-Net). The framework is built on the lightweight backbone network ResNet, and feature learning is enhanced by embedding a depth weighted convolution module (DWCM) while retaining the original feature information extracted by the CNN. In addition, a 3D-Attention module is introduced to merge the context information of adjacent areas and thereby refine the features at each spatial scale. Furthermore, the Feature Pyramid Network (FPN) processes cross-layer information flow by simple element-wise addition, which cannot fully resolve the interdependence between layers. This paper therefore introduces a Cross-Level Feature Fusion Module (CLFFM) on top of FPN; the resulting network is called Cross-Level Feature Pyramid Network (Cross-Level FPN). The proposed CLFFM handles cross-layer information flow better and outputs more detailed feature information, thus improving the accuracy of text region detection. Compared with the original network framework, the proposed framework performs better at detecting text in images of complex scenes, and extensive experiments on three challenging datasets validate the feasibility of the approach.
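To make the baseline concrete, the sketch below illustrates the simple element-wise addition that a standard FPN top-down pathway uses to merge two adjacent pyramid levels, which is the step the abstract says CLFFM is meant to improve on. It is a simplified illustration and not the authors' CLFFM: it assumes single-channel feature maps stored as nested lists, nearest-neighbour 2x upsampling, and it omits the 1x1 lateral convolution of a full FPN. The function names `upsample2x` and `fpn_merge` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in fmap:
        # Repeat every value twice along the width...
        wide = [v for v in row for _ in range(2)]
        # ...and every row twice along the height.
        out.append(wide)
        out.append(list(wide))
    return out


def fpn_merge(coarse, fine):
    """Baseline FPN-style merge: upsample the coarse map, then add element-wise."""
    up = upsample2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the size of the coarse map")
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# Toy inputs (illustrative values only): a 1x2 coarse map and a 2x4 fine map.
coarse = [[1.0, 2.0]]
fine = [[0.5, 0.5, 0.5, 0.5],
        [0.0, 0.0, 0.0, 0.0]]
merged = fpn_merge(coarse, fine)
print(merged)
```

Because the two levels are combined by a plain sum, every position weights the coarse and fine features equally, regardless of content; a learned fusion module would replace the addition inside `fpn_merge`.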


Scene text detection with improved receptive field and adaptive feature fusion

10.1117/12.2604527 ◽ 2021

Author(s): Liangjun Wang ◽ Weijie Gu ◽ Yuhang Ji

Keyword(s): Receptive Field ◽ Feature Fusion ◽ Text Detection ◽ Scene Text Detection ◽ Scene Text


Scene Text Detection with Supervised Pyramid Context Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33019038 ◽ 2019 ◽ Vol 33 ◽ pp. 9038-9045 ◽ Cited By ~ 30

Author(s): Enze Xie ◽ Yuhang Zang ◽ Shuai Shao ◽ Gang Yu ◽ Cong Yao ◽ ...

Keyword(s): Semantic Information ◽ State Of The Art ◽ Text Detection ◽ Detection Methods ◽ Natural Scenes ◽ The Past ◽ Scene Text Detection ◽ Scene Text ◽ Previous State ◽ Feature Pyramid

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments. To tackle this issue, and mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and a shared FPN, SPCNET obtains significantly enhanced performance while introducing only marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on
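For readers unfamiliar with the metric quoted above, the F-measure in its usual balanced (F1) form is the harmonic mean of precision and recall. The sketch below shows that standard formula; the function name `f_measure` and the precision and recall values are illustrative assumptions and are not figures from the paper.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: define the score as 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical detector with 90% precision and 80% recall.
score = f_measure(0.9, 0.8)
print(f"F-measure: {score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a detector cannot score well by trading many false positives for high recall, which is why the metric suits an evaluation focused on suppressing false positives.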
