scene text detection
Recently Published Documents

TOTAL DOCUMENTS: 323 (five years: 188)
H-INDEX: 25 (five years: 10)

2021, Vol 12 (3), pp. 484-489
Author(s): Francisca O Nwokoma, Juliet N Odii, Ikechukwu I Ayogu, James C Ogbonna

Camera-based scene text detection and recognition is a research area that has attracted considerable attention and made noticeable progress with advances in deep learning, computer vision, and pattern recognition. Camera-based systems are well suited to capturing text in scene images (e.g., signboards), documents with complex, multipart backgrounds, thick books, and highly fragile documents. The technology also encourages real-time processing, since handheld cameras offer high processing speed and internal memory and are easier and more flexible to use than traditional scanners, whose usability is limited: they are not portable and cannot be applied to images already captured by cameras. However, characters captured by traditional scanners pose fewer computational difficulties than camera-captured images, which are subject to diverse challenges that raise computational complexity and hinder recognition. This paper therefore reviews the factors that increase the computational difficulty of camera-based OCR and makes recommendations on best practices for camera-based OCR systems.
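One of the difficulties this review points to, uneven illumination in camera-captured images, is commonly countered with local (adaptive) binarization rather than the single global threshold that suffices for flatbed scans. A minimal numpy sketch of mean-based adaptive binarization follows; the window size and offset are illustrative assumptions, not values from the paper:

```python
import numpy as np

def adaptive_binarize(img, win=15, offset=10):
    """Binarize a grayscale image by comparing each pixel with the mean
    of its local window, tolerating uneven camera illumination."""
    h, w = img.shape
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            local_mean = padded[y:y + win, x:x + win].mean()
            out[y, x] = 255 if img[y, x] > local_mean - offset else 0
    return out

# Synthetic example: a dark-to-bright illumination gradient with darker
# "text" pixels along the diagonal.
img = np.tile(np.linspace(40, 220, 32), (32, 1)).astype(np.uint8)
np.fill_diagonal(img, img.diagonal() // 3)
binary = adaptive_binarize(img)
print(binary.shape)  # (32, 32)
```

Because each pixel is judged against its own neighbourhood, the dark diagonal "strokes" separate cleanly from the background even though bright-side strokes are lighter than dark-side background.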


Information, 2021, Vol 12 (12), pp. 524
Author(s): Yuan Li, Mayire Ibrayim, Askar Hamdulla

In recent years, methods for detecting text in real scenes have made significant progress with the rise of deep neural networks. However, due to the limited receptive field of the convolutional neural network and the simple representation of text by rectangular bounding boxes, previous methods may fall short on more challenging text instances. To solve this problem, this paper proposes a scene text detection network based on cross-scale feature fusion (CSFF-Net). The framework is built on the lightweight backbone network ResNet, and feature learning is enhanced by embedding the Depth Weighted Convolution Module (DWCM) while retaining the original feature information extracted by the CNN. A 3D-Attention module is also introduced to merge the contextual information of adjacent areas and refine the features at each spatial size. In addition, because the Feature Pyramid Network (FPN) cannot completely resolve the interdependence between layers by processing cross-layer information flow with simple element-wise addition, this paper introduces a Cross-Level Feature Fusion Module (CLFFM) into the FPN, yielding a Cross-Level Feature Pyramid Network (Cross-Level FPN). The proposed CLFFM handles cross-layer information flow better and outputs more detailed feature information, thus improving the accuracy of text region detection. Compared to the original network framework, the proposed framework performs better at detecting text in complex scene images, and extensive experiments on three challenging datasets validate the effectiveness of our approach.
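The contrast the abstract draws, plain FPN merging by element-wise addition versus a fusion module that mixes the two levels, can be illustrated in a few lines of numpy. This is only a toy sketch under my own naming (`fpn_add`, `concat_fuse`); the actual CLFFM design is learned inside the network and differs in detail:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_add(fine, coarse):
    """Plain FPN merge: upsample the coarse level, add element-wise."""
    return fine + upsample2x(coarse)

def concat_fuse(fine, coarse, w):
    """Toy cross-level fusion: concatenate both levels along channels,
    then mix them with a learned 1x1 convolution (here a plain matrix
    multiply over the channel dimension)."""
    stacked = np.concatenate([fine, upsample2x(coarse)], axis=0)   # (2C, H, W)
    c2, h, wd = stacked.shape
    return (w @ stacked.reshape(c2, -1)).reshape(-1, h, wd)        # (C, H, W)

rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 16, 16))    # fine level: more spatial detail
coarse = rng.standard_normal((8, 8, 8))    # coarse level: more semantics
w = rng.standard_normal((8, 16)) / 4.0     # stand-in for 1x1 conv weights

print(fpn_add(fine, coarse).shape)         # (8, 16, 16)
print(concat_fuse(fine, coarse, w).shape)  # (8, 16, 16)
```

Addition forces both levels into the same channel slots; concatenation followed by a learned mix lets the network weight each level's contribution per output channel, which is the kind of interdependence the CLFFM is meant to capture.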


Author(s): Jincheng Li, Yusheng Hao, Weilan Wang, Tiejun Wang, Qiaoqiao Li

Scene text detection is an important research branch of artificial intelligence whose goal is to locate text in scene images. In the Tibetan areas of China, scene images containing both Tibetan and Chinese text are ubiquitous, so detecting bilingual Tibetan-Chinese scene text is important for promoting intelligent applications for minority languages. In this study, a bilingual Tibetan-Chinese scene text detection database is constructed using manual labeling and automatic synthesis, and a text detection method is proposed. First, we predict a text rectangular region and a text center region for each text instance, and simultaneously learn the expansion distance of the text center region. Second, based on the classification scores of the text center region and the text rectangular region, we obtain a final confidence for each text instance and filter out text center regions with low confidence. Third, using the learned expansion distance, we recover the full text instance from each remaining text center region. The results show that our method achieves good detection performance, with an accuracy of up to 75.47% in the text detection phase, laying the foundation for scene text recognition in the subsequent step.
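The three-step post-processing described above (score, filter, expand) can be sketched with axis-aligned boxes. The function name, the 0.5 threshold, and the symmetric expansion are illustrative assumptions; the paper's regions are general polygons and its threshold is not stated here:

```python
import numpy as np

def recover_instances(center_boxes, scores, distances, score_thr=0.5):
    """Filter low-confidence text-center boxes, then expand each survivor
    by its learned distance to recover the full text instance.

    center_boxes : (N, 4) array of [x1, y1, x2, y2] center regions
    scores       : (N,)   final confidence per center region
    distances    : (N,)   learned expansion distance per region
    """
    keep = scores >= score_thr
    boxes = center_boxes[keep].astype(float)
    d = distances[keep][:, None]
    # Expand symmetrically: move x1/y1 outward by d, x2/y2 outward by d.
    return boxes + np.array([-1.0, -1.0, 1.0, 1.0]) * d

boxes = np.array([[10, 10, 30, 20], [50, 50, 60, 55]])
scores = np.array([0.9, 0.3])   # second instance falls below the threshold
dists = np.array([4.0, 2.0])
print(recover_instances(boxes, scores, dists))
# [[ 6.  6. 34. 24.]]
```

Only the high-confidence center region survives, and its box grows by the learned distance on every side to cover the whole text instance.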


2021, Vol 42, pp. 100434
Author(s): Ednawati Rainarli, Suprapto, Wahyono

2021, pp. 107767
Author(s): Yuanhong Zhong, Xinyu Cheng, Tao Chen, Jing Zhang, Zhaokun Zhou, ...

Author(s): Xiangyu Zhang, Jianfeng Yang, Xiumei Li, Minghao Liu, Ruichun Kang, ...

Author(s): Guanglong Liao, Zhongjie Zhu, Yongqiang Bai, Tingna Liu, Zhibo Xie

Text detection is a key technique and plays an important role in computer vision applications, but efficient and precise text detection remains challenging. In this paper, an efficient scene text detection scheme is proposed based on the Progressive Scale Expansion Network (PSENet). A Mixed Pooling Module (MPM) is designed to effectively capture dependences among text information at different distances, employing different pooling operations to better extract information about text shape. The backbone network is optimized by combining two extensions of the Residual Network (ResNet), i.e., ResNeXt and Res2Net, to enhance the effectiveness of feature extraction. Experimental results show that the precision of our scheme improves by more than 5% over the original PSENet.
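The idea of mixing pooling shapes to capture text dependence at different distances can be illustrated with strip pooling: a horizontal strip averages across the full width (matching wide text lines), a vertical strip across the full height, and both are broadcast back and blended with the input. This is a toy fixed-weight sketch of the concept only; the actual MPM combines learned pooling branches inside the network:

```python
import numpy as np

def strip_pool(x):
    """Blend horizontal and vertical strip pools of a (H, W) feature map.
    Horizontal strips capture long-range dependence along text lines;
    vertical strips capture tall or vertical text."""
    h_strip = x.mean(axis=1, keepdims=True)   # (H, 1): average over width
    v_strip = x.mean(axis=0, keepdims=True)   # (1, W): average over height
    # Broadcast both strips back to (H, W) and average with the input.
    return (x + h_strip + v_strip) / 3.0

x = np.arange(12, dtype=float).reshape(3, 4)
print(strip_pool(x).shape)  # (3, 4)
```

Each output pixel now mixes its own value with row-wide and column-wide context, so distant pixels on the same text line influence each other even when an ordinary square pooling window would never reach them.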

