Challenges of Deep Learning-based Text Detection in the Wild

The reported accuracy of recent state-of-the-art text detection methods, mostly deep learning approaches, is in the order of 80% to 90% on standard benchmark datasets. These methods have relaxed some of the restrictions of structured text and environment (i.e., "in the wild") which are usually required for classical OCR to properly function. Even with this relaxation, there are still circumstances where these state-of-the-art methods fail. Several remaining challenges in wild images, like in-plane-rotation, illumination reflection, partial occlusion, complex font styles, and perspective distortion, cause exciting methods to perform poorly. In order to evaluate current approaches in a formal way, we standardize the datasets and metrics for comparison which had made comparison between these methods difficult in the past. We use three benchmark datasets for our evaluations: ICDAR13, ICDAR15, and COCO-Text V2.0. The objective of the paper is to quantify the current shortcomings and to identify the challenges for future text detection research.

Download Full-text

Deep Learning Approaches for Object Detection

Artificial Intelligence Evolution ◽

10.37256/aie.122020564 ◽

2020 ◽

pp. 123-145

Author(s):

Sushma Jaiswal ◽

Tarun Jaiswal

Keyword(s):

Deep Learning ◽

Object Detection ◽

Autonomous Driving ◽

General Idea ◽

Detection Methods ◽

Learning Approaches ◽

Detection Techniques ◽

Second Stage ◽

Fast Pace ◽

Benchmark Datasets

In computer vision, object detection is a very important, exciting and mind-blowing study. Object detection work in numerous fields such as observing security, independently/autonomous driving and etc. Deep-learning based object detection techniques have developed at a very fast pace and have attracted the attention of many researchers. The main focus of the 21st century is the development of the object-detection framework, comprehensively and genuinely. In this investigation, we initially investigate and evaluate the various object detection approaches and designate the benchmark datasets. We also delivered the wide-ranging general idea of object detection approaches in an organized way. We covered the first and second stage detectors of object detection methods. And lastly, we consider the construction of these object detection approaches to give dimensions for further research.

Download Full-text

Scene Text Detection with Supervised Pyramid Context Network

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019038 ◽

2019 ◽

Vol 33 ◽

pp. 9038-9045 ◽

Cited By ~ 30

Author(s):

Enze Xie ◽

Yuhang Zang ◽

Shuai Shao ◽

Gang Yu ◽

Cong Yao ◽

...

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Text Detection ◽

Detection Methods ◽

Natural Scenes ◽

The Past ◽

Scene Text Detection ◽

Scene Text ◽

Previous State ◽

Feature Pyramid

Scene text detection methods based on deep learning have achieved remarkable results over the past years. However, due to the high diversity and complexity of natural scenes, previous state-of-the-art text detection methods may still produce a considerable amount of false positives, when applied to images captured in real-world environments. To tackle this issue, mainly inspired by Mask R-CNN, we propose in this paper an effective model for scene text detection, which is based on Feature Pyramid Network (FPN) and instance segmentation. We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.Benefited from the guidance of semantic information and sharing FPN, SPCNET obtains significantly enhanced performance while introducing marginal extra computation. Experiments on standard datasets demonstrate that our SPCNET clearly outperforms start-of-the-art methods. Specifically, it achieves an F-measure of 92.1% on ICDAR2013, 87.2% on ICDAR2015, 74.1% on ICDAR2017 MLT and 82.9% on

Download Full-text

Omnidirectional Scene Text Detection with Sequential-free Box Discretization

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/423 ◽

2019 ◽

Cited By ~ 9

Author(s):

Yuliang Liu ◽

Sheng Zhang ◽

Lianwen Jin ◽

Lele Xie ◽

Yaqiang Wu ◽

...

Keyword(s):

State Of The Art ◽

Detection Performance ◽

Text Detection ◽

Detection Methods ◽

Bounding Box ◽

Scene Text Detection ◽

Scene Text ◽

Art Methods ◽

In The Wild ◽

Ablation Study

Scene text in the wild is commonly presented with high variant characteristics. Using quadrilateral bounding box to localize the text instance is nearly indispensable for detection methods. However, recent researches reveal that introducing quadrilateral bounding box for scene text detection will bring a label confusion issue which is easily overlooked, and this issue may significantly undermine the detection performance. To address this issue, in this paper, we propose a novel method called Sequential-free Box Discretization (SBD) by discretizing the bounding box into key edges (KE) which can further derive more effective methods to improve detection performance. Experiments showed that the proposed method can outperform state-of-the-art methods in many popular scene text benchmarks, including ICDAR 2015, MLT, and MSRA-TD500. Ablation study also showed that simply integrating the SBD into Mask R-CNN framework, the detection performance can be substantially improved. Furthermore, an experiment on the general object dataset HRSC2016 (multi-oriented ships) showed that our method can outperform recent state-of-the-art methods by a large margin, demonstrating its powerful generalization ability.

Download Full-text

Deep understanding of shopper behaviours and interactions using RGB-D vision

Machine Vision and Applications ◽

10.1007/s00138-020-01118-w ◽

2020 ◽

Vol 31 (7-8) ◽

Cited By ~ 1

Author(s):

Marina Paolanti ◽

Rocco Pietrini ◽

Adriano Mancini ◽

Emanuele Frontoni ◽

Primo Zingaretti

Keyword(s):

Machine Learning ◽

Deep Learning ◽

State Of The Art ◽

Deep Understanding ◽

Learning Approaches ◽

Multimedia Big Data ◽

People Counting ◽

The Past ◽

Feature Based ◽

Top View

Abstract In retail environments, understanding how shoppers move about in a store’s spaces and interact with products is very valuable. While the retail environment has several favourable characteristics that support computer vision, such as reasonable lighting, the large number and diversity of products sold, as well as the potential ambiguity of shoppers’ movements, mean that accurately measuring shopper behaviour is still challenging. Over the past years, machine-learning and feature-based tools for people counting as well as interactions analytic and re-identification were developed with the aim of learning shopper skills based on occlusion-free RGB-D cameras in a top-view configuration. However, after moving into the era of multimedia big data, machine-learning approaches evolved into deep learning approaches, which are a more powerful and efficient way of dealing with the complexities of human behaviour. In this paper, a novel VRAI deep learning application that uses three convolutional neural networks to count the number of people passing or stopping in the camera area, perform top-view re-identification and measure shopper–shelf interactions from a single RGB-D video flow with near real-time performances has been introduced. The framework is evaluated on the following three new datasets that are publicly available: TVHeads for people counting, HaDa for shopper–shelf interactions and TVPR2 for people re-identification. The experimental results show that the proposed methods significantly outperform all competitive state-of-the-art methods (accuracy of 99.5% on people counting, 92.6% on interaction classification and 74.5% on re-id), bringing to different and significative insights for implicit and extensive shopper behaviour analysis for marketing applications.

Download Full-text

A Comparative Study of Transfer Learning Approaches for Video Anomaly Detection

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421520030 ◽

2020 ◽

pp. 2152003

Author(s):

Matheus Gutoski ◽

Manassés Ribeiro ◽

Leandro T. Hattori ◽

Marcelo Romero ◽

André E. Lazzaretti ◽

...

Keyword(s):

Neural Network ◽

Anomaly Detection ◽

Transfer Learning ◽

Visual Analysis ◽

State Of The Art ◽

Detection Methods ◽

Support Vector ◽

Learning Approaches ◽

Vector Machines ◽

Benchmark Datasets

Recent research has shown that features obtained from pretrained Convolutional Neural Network (CNN) models can be promptly applied to a variety of problems they were not originally designed to solve. This concept, often referred to as Transfer Learning (TL), is a common practice when labeled data is limited. In some fields, such as video anomaly detection, TL is still an underexplored subject in the sense that it is not clear whether the architecture of the pretrained CNN model impacts on the video anomaly detection performance. In order to clarify this issue, we perform an extensive benchmark using 12 different pretrained CNN models on ImageNet as feature extractors and apply the features obtained to seven video anomaly detection benchmark datasets. This work presents some interesting findings about video anomaly detection using TL. The highlights of our findings were revealed by our experiments, which have shown that a simple classification process using One-Class Support Vector Machines yields similar results to state-of-the-art models. Moreover, a statistical analysis suggests that architectural differences are negligible when choosing a pretrained model for video anomaly detection, since all models presented similar performance. At last, we present an in-depth visual analysis of the Avenue dataset, and reveal several aspects that may be limiting the performance of state-of-the-art video anomaly detection methods.

Download Full-text

Named Entity Recognition and Relation Extraction

ACM Computing Surveys ◽

10.1145/3445965 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-39

Author(s):

Zara Nasar ◽

Syed Waqar Jaffry ◽

Muhammad Kamran Malik

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Named Entity Recognition ◽

Relation Extraction ◽

The State ◽

Entity Recognition ◽

Joint Models ◽

Named Entity ◽

Textual Data ◽

Benchmark Datasets

With the advent of Web 2.0, there exist many online platforms that result in massive textual-data production. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach towards effective harnessing of this unstructured textual data could be its transformation into structured text. Hence, this study aims to present an overview of approaches that can be applied to extract key insights from textual data in a structured way. For this, Named Entity Recognition and Relation Extraction are being majorly addressed in this review study. The former deals with identification of named entities, and the latter deals with problem of extracting relation between set of entities. This study covers early approaches as well as the developments made up till now using machine learning models. Survey findings conclude that deep-learning-based hybrid and joint models are currently governing the state-of-the-art. It is also observed that annotated benchmark datasets for various textual-data generators such as Twitter and other social forums are not available. This scarcity of dataset has resulted into relatively less progress in these domains. Additionally, the majority of the state-of-the-art techniques are offline and computationally expensive. Last, with increasing focus on deep-learning frameworks, there is need to understand and explain the under-going processes in deep architectures.

Download Full-text

Flying Free: A Research Overview of Deep Learning in Drone Navigation Autonomy

Drones ◽

10.3390/drones5020052 ◽

2021 ◽

Vol 5 (2) ◽

pp. 52

Author(s):

Thomas Lee ◽

Susan Mckeever ◽

Jane Courtney

Keyword(s):

Deep Learning ◽

Research Work ◽

Learning Approaches ◽

Clear Definition ◽

Top Down ◽

Research Activity ◽

Comprehensive Overview ◽

The Past ◽

Computer Vision Applications ◽

Definition Of

With the rise of Deep Learning approaches in computer vision applications, significant strides have been made towards vehicular autonomy. Research activity in autonomous drone navigation has increased rapidly in the past five years, and drones are moving fast towards the ultimate goal of near-complete autonomy. However, while much work in the area focuses on specific tasks in drone navigation, the contribution to the overall goal of autonomy is often not assessed, and a comprehensive overview is needed. In this work, a taxonomy of drone navigation autonomy is established by mapping the definitions of vehicular autonomy levels, as defined by the Society of Automotive Engineers, to specific drone tasks in order to create a clear definition of autonomy when applied to drones. A top–down examination of research work in the area is conducted, focusing on drone navigation tasks, in order to understand the extent of research activity in each area. Autonomy levels are cross-checked against the drone navigation tasks addressed in each work to provide a framework for understanding the trajectory of current research. This work serves as a guide to research in drone autonomy with a particular focus on Deep Learning-based solutions, indicating key works and areas of opportunity for development of this area in the future.

Download Full-text

Deep learning approaches for speech emotion recognition: state of the art and research challenges

Multimedia Tools and Applications ◽

10.1007/s11042-020-09874-7 ◽

2021 ◽

Author(s):

Rashid Jahangir ◽

Ying Wah Teh ◽

Faiqa Hanif ◽

Ghulam Mujtaba

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

State Of The Art ◽

Speech Emotion Recognition ◽

Learning Approaches ◽

Research Challenges

Download Full-text

A Survey of Graphical Page Object Detection with Deep Neural Networks

10.20944/preprints202104.0739.v1 ◽

2021 ◽

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components. It leads to a high-level conceptual understanding of the documents that makes digitization of documents viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved many folds. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in the document images. Therefore, we discuss the most relevant deep learning-based approaches and state-of-the-art graphical page object detection in document images. This work provides a comprehensive understanding of the current state-of-the-art and related challenges. Furthermore, we discuss leading datasets along with the quantitative evaluation. Moreover, it discusses briefly the promising directions that can be utilized for further improvements.

Download Full-text

Covid-19 detection via deep neural network and occlusion sensitivity maps

10.36227/techrxiv.14100890 ◽

2021 ◽

Author(s):

Noor Ahmad ◽

Muhammad Aminu ◽

Mohd Halim Mohd Noor

Keyword(s):

Neural Network ◽

Deep Learning ◽

Deep Neural Network ◽

State Of The Art ◽

Color Images ◽

Fine Tuning ◽

Training Dataset ◽

Learning Approaches ◽

Learning Models ◽

Sensitivity Maps

Deep learning approaches have attracted a lot of attention in the automatic detection of Covid-19 and transfer learning is the most common approach. However, majority of the pre-trained models are trained on color images, which can cause inefficiencies when fine-tuning the models on Covid-19 images which are often grayscale. To address this issue, we propose a deep learning architecture called CovidNet which requires a relatively smaller number of parameters. CovidNet accepts grayscale images as inputs and is suitable for training with limited training dataset. Experimental results show that CovidNet outperforms other state-of-the-art deep learning models for Covid-19 detection.

Download Full-text