Multi-Directional Scene Text Detection Based on Improved YOLOv3

To address the problem of low detection rate caused by the close alignment and multi-directional position of text words in practical application and the need to improve the detection speed of the algorithm, this paper proposes a multi-directional text detection algorithm based on improved YOLOv3, and applies it to natural text detection. To detect text in multiple directions, this paper introduces a method of box definition based on sliding vertices. Then, a new rotating box loss function MD-Closs based on CIOU is proposed to improve the detection accuracy. In addition, a step-by-step NMS method is used to further reduce the amount of calculation. Experimental results show that on the ICDAR 2015 data set, the accuracy rate is 86.2%, the recall rate is 81.9%, and the timeliness is 21.3 fps, which shows that the proposed algorithm has a good detection effect on text detection in natural scenes.

Download Full-text

A novel scene text detection algorithm based on convolutional neural network

2016 Visual Communications and Image Processing (VCIP) ◽

10.1109/vcip.2016.7805444 ◽

2016 ◽

Cited By ~ 3

Author(s):

Xiaohang Ren ◽

Kai Chen ◽

Xiaokang Yang ◽

Yi Zhou ◽

Jianhua He ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Detection Algorithm ◽

Text Detection ◽

Scene Text Detection ◽

Scene Text

Download Full-text

Scene Text Detection with Supervised Pyramid Context Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019038 ◽

2019 ◽

Vol 33 ◽

pp. 9038-9045 ◽

Cited By ~ 30

Author(s):

Enze Xie ◽

Yuhang Zang ◽

Shuai Shao ◽

Gang Yu ◽

Cong Yao ◽

...

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Text Detection ◽

Detection Methods ◽

Natural Scenes ◽

The Past ◽

Scene Text Detection ◽

Scene Text ◽

Previous State ◽

Feature Pyramid

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable amount of false positives, when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection, which is based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.Benefited from the guidance of semantic information and sharing FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms start-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on

Download Full-text

SEMPANet: A Modified Path Aggregation Network with Squeeze-Excitation for Scene Text Detection

Sensors ◽

10.3390/s21082657 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2657

Author(s):

Shuangshuang Li ◽

Wenming Cao

Keyword(s):

Feature Extraction ◽

Early Stage ◽

Text Detection ◽

Detection Methods ◽

Good Effect ◽

Natural Scenes ◽

Training Time ◽

Scene Text Detection ◽

Scene Text ◽

Curved Text

Recently, various object detection frameworks have been applied to text detection tasks and have achieved good performance in the final detection. With the further expansion of text detection application scenarios, the research value of text detection topics has gradually increased. Text detection in natural scenes is more challenging for horizontal text based on a quadrilateral detection box and for curved text of any shape. Most networks have a good effect on the balancing of target samples in text detection, but it is challenging to deal with small targets and solve extremely unbalanced data. We continued to use PSENet to deal with such problems in this work. On the other hand, we studied the problem that most of the existing scene text detection methods use ResNet and FPN as the backbone of feature extraction, and improved the ResNet and FPN network parts of PSENet to make it more conducive to the combination of feature extraction in the early stage. A SEMPANet framework without an anchor and in one stage is proposed to implement a lightweight model, which is embodied in the training time of about 24 h. Finally, we selected the two most representative datasets for oriented text and curved text to conduct experiments. On ICDAR2015, the improved network’s latest results further verify its effectiveness; it reached 1.01% in F-measure compared with PSENet-1s. On CTW1500, the improved network performed better than the original network on average.

Download Full-text

A Target Detection Algorithm for Remote Sensing Images Based on Deep Learning

Contrast Media & Molecular Imaging ◽

10.1155/2021/3474921 ◽

2021 ◽

Vol 2021 ◽

pp. 1-6

Author(s):

Yi Lv ◽

Zhengbo Yin ◽

Zhezhou Yu

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Target Detection ◽

Detection Efficiency ◽

False Positive Rate ◽

Remote Sensing Image ◽

Detection Algorithm ◽

Detection Accuracy ◽

Data Set ◽

Detection Effect

In order to improve the accuracy of remote sensing image target detection, this paper proposes a remote sensing image target detection algorithm DFS based on deep learning. Firstly, dimension clustering module, loss function, and sliding window segmentation detection are designed. The data set used in the experiment comes from GoogleEarth, and there are 6 types of objects: airplanes, boats, warehouses, large ships, bridges, and ports. Training set, verification set, and test set contain 73490 images, 22722 images, and 2138 images, respectively. It is assumed that the number of detected positive samples and negative samples is A and B, respectively, and the number of undetected positive samples and negative samples is C and D, respectively. The experimental results show that the precision-recall curve of DFS for six types of targets shows that DFS has the best detection effect for bridges and the worst detection effect for boats. The main reason is that the size of the bridge is relatively large, and it is clearly distinguished from the background in the image, so the detection difficulty is low. However, the target of the boat is very small, and it is easy to be mixed with the background, so it is difficult to detect. The MAP of DFS is improved by 12.82%, the detection accuracy is improved by 13%, and the recall rate is slightly decreased by 1% compared with YOLOv2. According to the number of detection targets, the number of false positives (FPs) of DFS is much less than that of YOLOv2. The false positive rate is greatly reduced. In addition, the average IOU of DFS is 11.84% higher than that of YOLOv2. For small target detection efficiency and large remote sensing image detection, the DFS algorithm has obvious advantages.

Download Full-text

Scene Text Detection Algorithm Based on Color Clustering of Textual Pixels

Laser & Optoelectronics Progress ◽

10.3788/lop56.071006 ◽

2019 ◽

Vol 56 (7) ◽

pp. 071006

Author(s):

李敏 Li Min ◽

郑建彬 Zheng Jianbin ◽

詹恩奇 Zhan Enqi ◽

汪阳 Wang Yang

Keyword(s):

Detection Algorithm ◽

Text Detection ◽

Scene Text Detection ◽

Scene Text ◽

Color Clustering

Download Full-text

Real-Time Scene Text Detection with Differentiable Binarization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6812 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11474-11481 ◽

Cited By ~ 9

Author(s):

Minghui Liao ◽

Zhaoyi Wan ◽

Cong Yao ◽

Kai Chen ◽

Xiang Bai

Keyword(s):

State Of The Art ◽

Text Detection ◽

Detection Accuracy ◽

Post Processing ◽

Segmentation Method ◽

Performance Improvements ◽

Scene Text Detection ◽

Scene Text ◽

Benchmark Datasets ◽

Bounding Boxes

Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.

Download Full-text

A Scene Text Detection Algorithm based on ResNet and Faster R-CNN

Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science - AICS 2019 ◽

10.1145/3349341.3349521 ◽

2019 ◽

Author(s):

Qubo Xie ◽

Ke Zhou ◽

Xiaohu Fan

Keyword(s):

Detection Algorithm ◽

Text Detection ◽

Scene Text Detection ◽

Scene Text

Download Full-text

TextFuseNet: Scene Text Detection with Richer Fused Features

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/72 ◽

2020 ◽

Author(s):

Jian Ye ◽

Zhe Chen ◽

Juhua Liu ◽

Bo Du

Keyword(s):

Text Detection ◽

Feature Representation ◽

Natural Scenes ◽

Weak Supervision ◽

Feature Representations ◽

General Semantics ◽

Scene Text Detection ◽

Scene Text ◽

Multi Level ◽

Different Levels

Arbitrary shape text detection in natural scenes is an extremely challenging task. Unlike existing text detection approaches that only perceive texts based on limited feature representations, we propose a novel framework, namely TextFuseNet, to exploit the use of richer features fused for text detection. More specifically, we propose to perceive texts from three levels of feature representations, i.e., character-, word- and global-level, and then introduce a novel text representation fusion technique to help achieve robust arbitrary text detection. The multi-level feature representation can adequately describe texts by dissecting them into individual characters while still maintaining their general semantics. TextFuseNet then collects and merges the texts’ features from different levels using a multi-path fusion architecture which can effectively align and fuse different representations. In practice, our proposed TextFuseNet can learn a more adequate description of arbitrary shapes texts, suppressing false positives and producing more accurate detection results. Our proposed framework can also be trained with weak supervision for those datasets that lack character-level annotations. Experiments on several datasets show that the proposed TextFuseNet achieves state-of-the-art performance. Specifically, we achieve an F-measure of 94.3% on ICDAR2013, 92.1% on ICDAR2015, 87.1% on Total-Text and 86.6% on CTW-1500, respectively.

Download Full-text