Method for Detecting Chinese Texts in Natural Scenes Based on Improved Faster R-CNN

Author(s):  
Shuhua Liu ◽  
Hua Ban ◽  
Yu Song ◽  
Mengyu Zhang ◽  
Fengqin Yang

In this study, a natural scene text detection method based on an improved Faster region-based convolutional neural network (Faster R-CNN) is proposed. This method extracts image features with the Inception-ResNet architecture, adopts a region proposal network to generate region proposals for the extracted features, merges the fine-tuned features with the region proposals, and finally uses Fast R-CNN to classify and locate text. The proposed method addresses the problems of varying text sizes and of text being obscured in the image. Compared with the original Faster R-CNN, the multilevel Inception-ResNet network model presented in this study can extract deeper text features. The extracted feature map is further sparsely represented by Reduction B, Inception ResNet C and Avg Pool, and is then fused with text regions obtained by the text feature mapping lower layer network to acquire the exact text regions. The text detection method presented in this study is tested on the dataset of the ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17), which contains a large number of distorted and blurry texts of different scales and sizes. An accuracy of 76.4% is achieved on this benchmark, thereby demonstrating the effectiveness of the proposed method.
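As a rough illustration of how a region proposal network can cover text of varying sizes, the sketch below generates anchor boxes of several scales and aspect ratios around one feature-map location. The function name `make_anchors` and the scale and ratio values are assumptions made for this example; the abstract does not state the anchor settings used by the authors.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes centered at (cx, cy).

    Each anchor has an area of roughly scale**2 and a height/width ratio
    of `ratio`. The default scales and ratios are illustrative assumptions,
    not values taken from the abstract.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            # Keep the area fixed while changing the aspect ratio.
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    for box in make_anchors(300, 200):
        print(box)
```

Wide anchors (ratio below 1) tend to suit horizontal text lines, while tall ones suit vertical text, which is one reason multiple ratios are commonly used.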

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 122685-122694
Author(s):  
Xiao Qin ◽  
Jianhui Jiang ◽  
Chang-An Yuan ◽  
Shaojie Qiao ◽  
Wei Fan

Author(s):  
Enze Xie ◽  
Yuhang Zang ◽  
Shuai Shao ◽  
Gang Yu ◽  
Cong Yao ◽  
...  

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable number of false positives when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection, which is based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives. Benefiting from the guidance of semantic information and sharing FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms state-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on


Author(s):  
Jincheng Li ◽  
Yusheng Hao ◽  
Weilan Wang ◽  
Tiejun Wang ◽  
Qiaoqiao Li

Scene text detection is an important research branch of artificial intelligence technology whose goal is to locate text in scene images. In the Tibetan areas of China, scene images containing both Tibetan and Chinese texts are ubiquitous. Thus, detecting bilingual Tibetan-Chinese scene texts is important in promoting intelligent applications for minority languages. In this study, a scene text detection database for bilingual Tibetan-Chinese text is constructed using manual labeling and automatic synthesis, and a text detection method is proposed. First, we predict a text rectangular region and a text center region for each text instance and simultaneously learn the expansion distance of the text center region. Second, based on the classification scores of the text center region and the text rectangular region, we obtain the final confidence of each text instance and then filter out the text center regions with lower confidence. Third, using the learned expansion distance, the full text instance is recovered from each remaining text center region. The results show that our method obtains good detection performance; it achieves an accuracy of up to 75.47% during the text detection phase, laying the foundation for scene text recognition in the subsequent step.
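The three steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` class, the `recover_text_boxes` function, the use of a score product as the final confidence, the 0.5 threshold, and the uniform expansion on all four sides are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Candidate:
    center_box: Box      # predicted text center region
    center_score: float  # classification score of the center region
    rect_score: float    # classification score of the rectangular region
    expand: float        # learned expansion distance, in pixels


def recover_text_boxes(candidates: List[Candidate],
                       threshold: float = 0.5) -> List[Tuple[Box, float]]:
    """Filter center regions by confidence and expand the survivors."""
    results = []
    for c in candidates:
        # Assumption: the final confidence is the product of both scores.
        confidence = c.center_score * c.rect_score
        if confidence < threshold:
            continue  # drop low-confidence center regions
        x1, y1, x2, y2 = c.center_box
        d = c.expand
        # Assumption: grow the center region by d on every side.
        results.append(((x1 - d, y1 - d, x2 + d, y2 + d), confidence))
    return results


if __name__ == "__main__":
    cands = [
        Candidate((10.0, 10.0, 20.0, 20.0), 1.0, 0.75, 2.0),
        Candidate((50.0, 50.0, 60.0, 60.0), 0.5, 0.5, 3.0),
    ]
    print(recover_text_boxes(cands))
```

In this example the second candidate is filtered out because its combined confidence (0.25) falls below the threshold, and the first is expanded by 2 pixels on each side.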


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Lin Li ◽  
Shengsheng Yu ◽  
Luo Zhong ◽  
Xiaozhen Li

Multilingual text detection in natural scenes is still a challenging task in computer vision. In this paper, we apply an unsupervised learning algorithm to learn language-independent stroke features, and we combine unsupervised stroke feature learning with automatic multilayer feature extraction to improve the representational power of text features. We also develop a novel nonlinear network based on the traditional Convolutional Neural Network that is able to detect multilingual text regions in images. The proposed method is evaluated on standard benchmarks and a multilingual dataset and demonstrates improvement over previous work.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2657
Author(s):  
Shuangshuang Li ◽  
Wenming Cao

Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance in the final detection. With the further expansion of text detection application scenarios, the research value of text detection topics has gradually increased. Text detection in natural scenes is more challenging for horizontal text based on a quadrilateral detection box and for curved text of any shape. Most networks handle the balancing of target samples in text detection well, but it is challenging to deal with small targets and extremely unbalanced data. We continued to use PSENet to deal with such problems in this work. In addition, noting that most existing scene text detection methods use ResNet and FPN as the feature extraction backbone, we improved the ResNet and FPN parts of PSENet to make them more conducive to combining features extracted in the early stages. An anchor-free, one-stage framework, SEMPANet, is proposed to implement a lightweight model, as reflected in a training time of about 24 h. Finally, we selected the two most representative datasets for oriented text and curved text to conduct experiments. On ICDAR2015, the improved network's results further verify its effectiveness; it gained 1.01% in F-measure over PSENet-1s. On CTW1500, the improved network performed better than the original network on average.
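For reference, the F-measure reported in these abstracts is the harmonic mean of precision and recall. The short sketch below shows the calculation; the function name `f_measure` and the sample values are illustrative and are not taken from any of the papers listed here.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical precision and recall values, for illustration only.
    print(f_measure(0.9, 0.8))
```

Because the harmonic mean is dominated by the smaller of the two values, a detector cannot reach a high F-measure by maximizing precision or recall alone.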


2015 ◽  
Vol 9 (4) ◽  
pp. 500-510 ◽  
Author(s):  
Gang Zhou ◽  
Yuehu Liu ◽  
Liang Xu ◽  
Zhenhong Jia

2021 ◽  
pp. 139-153
Author(s):  
Gangyan Zeng ◽  
Yuan Zhang ◽  
Yu Zhou ◽  
Xiaomeng Yang
