A Survey of Graphical Page Object Detection with Deep Neural Networks

Applied Sciences ◽

10.3390/app11125344 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5344

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements such as tables, figures, and formulas contain essential information. Processing and interpreting this information requires specialized algorithms, because off-the-shelf OCR components cannot handle it reliably. Detecting these graphical components is therefore an essential step in document analysis pipelines: it leads to a high-level conceptual understanding of documents that makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. This work outlines and summarizes the deep learning approaches for detecting graphical page objects in document images. We discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection, providing a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss leading datasets along with their quantitative evaluation, and briefly discuss promising directions for further improvement.

Object Detection Based on Faster R-CNN

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.c2186.0210321 ◽

2021 ◽

Vol 10 (3) ◽

pp. 72-76

Author(s):

M. Sushma Sri ◽

B. Rajendra Naik ◽

K. Jaya Sankar

Keyword(s):

Neural Networks ◽

Image Processing ◽

Deep Learning ◽

Object Detection ◽

Deep Neural Networks ◽

Rapid Development ◽

Simple Algorithm ◽

Average Precision ◽

Rapid Improvement ◽

High Level

In recent years, object detection has improved rapidly in video analysis and image processing applications. Locating a desired object has become an important task, and numerous object detection methods have evolved. Deep learning has developed rapidly in this area because, compared with conventional techniques, it offers high-level processing, extracts deeper features, and is more reliable and flexible. In this article, the authors propose object detection with deep neural networks and faster region-based convolutional neural networks (Faster R-CNN), aiming at a simple algorithm that provides better accuracy and mean average precision.

Adversarial Attacks for Deep Learning-Based Infrared Object Detection

Journal of the Korea Institute of Military Science and Technology ◽

10.9766/kimst.2021.24.6.591 ◽

2021 ◽

Vol 24 (6) ◽

pp. 591-601

Author(s):

Hoseong Kim ◽

Jaeguk Hyun ◽

Hyunjung Yoo ◽

Chunho Kim ◽

Hyunho Jeon

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Object Detection ◽

Image Recognition ◽

Rapid Growth ◽

Deep Neural Networks ◽

State Of The Art ◽

Visible Image ◽

Adversarial Attack

Recently, infrared object detection (IOD) has been extensively studied due to the rapid growth of deep neural networks (DNNs). Adversarial attacks using imperceptible perturbations can dramatically degrade the performance of DNNs. However, most work on adversarial attacks focuses on visible image recognition (VIR), and few methods exist for IOD. We propose deep learning-based adversarial attacks for IOD by extending several state-of-the-art adversarial attacks for VIR. We validate our claim through comprehensive experiments on two challenging IOD datasets, FLIR and MSOD.

Natural Image Matting via Guided Contextual Attention

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6809 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11450-11457 ◽

Cited By ~ 2

Author(s):

Yaoyi Li ◽

Hongtao Lu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Information Flow ◽

Deep Neural Networks ◽

State Of The Art ◽

Natural Image ◽

Image Matting ◽

Transparent Objects ◽

High Level ◽

High Computational Complexity

Over the last few years, deep learning-based approaches have achieved outstanding improvements in natural image matting. Many of these methods can generate visually plausible alpha estimations, but they typically yield blurry structures or textures in semitransparent areas. This is due to the local ambiguity of transparent objects. One possible solution is to leverage far-surrounding information to estimate the local opacity. Traditional affinity-based methods often suffer from high computational complexity, which makes them unsuitable for high-resolution alpha estimation. Inspired by affinity-based methods and the success of contextual attention in inpainting, we develop a novel end-to-end approach for natural image matting with a guided contextual attention module, which is specifically designed for image matting. The guided contextual attention module directly propagates high-level opacity information globally based on the learned low-level affinity. The proposed method can mimic the information flow of affinity-based methods and simultaneously utilize the rich features learned by deep neural networks. Experimental results on the Composition-1k test set and the alphamatting.com benchmark dataset demonstrate that our method outperforms state-of-the-art approaches in natural image matting. Code and models are available at https://github.com/Yaoyi-Li/GCA-Matting.

Efficient Single-Shot Multi-Object Tracking for Vehicles in Traffic Scenarios

Sensors ◽

10.3390/s21196358 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6358

Author(s):

Youngkeun Lee ◽

Sang-ha Lee ◽

Jisang Yoo ◽

Soonchul Kwon

Keyword(s):

Deep Learning ◽

Object Detection ◽

Object Tracking ◽

State Of The Art ◽

Tracking System ◽

Single Shot ◽

Essential Information ◽

Trade Off ◽

Previous State ◽

Improved Accuracy

Multi-object tracking is a significant field in computer vision since it provides essential information for video surveillance and analysis. Several deep learning-based approaches have been developed to improve the performance of multi-object tracking by applying the most accurate and efficient combinations of object detection models and appearance embedding extraction models. However, two-stage methods show a low inference speed since the embedding extraction can only be performed after the object detection has finished. To alleviate this problem, single-shot methods, which perform object detection and embedding extraction simultaneously, have been developed and have drastically improved the inference speed. However, there is a trade-off between accuracy and efficiency. Therefore, this study proposes an enhanced single-shot multi-object tracking system that displays improved accuracy while maintaining a high inference speed. With strong feature extraction and fusion, the object detection of our model achieves an AP score of 69.93% on the UA-DETRAC dataset and outperforms previous state-of-the-art methods, such as FairMOT and JDE. Based on the improved object detection performance, our multi-object tracking system achieves a MOTA score of 68.5% and a PR-MOTA score of 24.5% on the same dataset, also surpassing the previous state-of-the-art trackers.
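For readers unfamiliar with the MOTA score quoted above, the following is a minimal sketch of the commonly used definition, MOTA = 1 - (FN + FP + IDSW) / GT, with the counts summed over all frames. The function name, parameter names, and the example counts are hypothetical and are not taken from the paper; PR-MOTA, which the UA-DETRAC benchmark derives from MOTA over a detector's precision-recall curve, is not covered here.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from error counts summed over all frames.

    num_gt is the total number of ground-truth objects over all frames.
    The result can be negative when the tracker makes more errors than
    there are ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical counts for illustration only (not results from the paper).
score = mota(false_negatives=20, false_positives=10, id_switches=5, num_gt=100)
print(f"MOTA = {score:.1%}")
```

Because the three error types are simply added, a tracker can trade missed objects against false alarms without changing the score, which is one reason MOTA is usually reported alongside other metrics.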

Implementing State-of-the-Art Deep Learning Approaches for Archaeological Object Detection in Remotely-Sensed Data: The Results of Cross-Domain Collaboration

Journal of Computer Applications in Archaeology ◽

10.5334/jcaa.78 ◽

2021 ◽

Vol 4 (1) ◽

pp. 274-289

Author(s):

Martin Olivier ◽

Wouter Verschoof-van der Vaart

Keyword(s):

Deep Learning ◽

Object Detection ◽

State Of The Art ◽

Remotely Sensed ◽

Learning Approaches ◽

Remotely Sensed Data ◽

Archaeological Object ◽

Cross Domain

Transcription Alignment of Historical Vietnamese Manuscripts without Human-Annotated Learning Samples

Applied Sciences ◽

10.3390/app11114894 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4894

Author(s):

Anna Scius-Bertrand ◽

Michael Jungo ◽

Beat Wolf ◽

Andreas Fischer ◽

Marc Bui

Keyword(s):

Object Detection ◽

State Of The Art ◽

Positive Impact ◽

Detection System ◽

Training Data ◽

Detection Accuracy ◽

Current State ◽

Alignment Task ◽

Scanned Image ◽

Automatic Transcription

The current state of the art for automatic transcription of historical manuscripts is typically limited by the requirement of human-annotated learning samples, which are necessary to train specific machine learning models for specific languages and scripts. Transcription alignment is a simpler task that aims to find a correspondence between text in the scanned image and its existing Unicode counterpart, a correspondence which can then be used as training data. The alignment task can be approached with heuristic methods dedicated to certain types of manuscripts, or with weakly trained systems that reduce the required amount of annotations. In this article, we propose a novel learning-based alignment method based on fully convolutional object detection that does not require any human annotation at all. Instead, the object detection system is initially trained on synthetic printed pages using a font and then adapted to the real manuscripts by means of self-training. On a dataset of historical Vietnamese handwriting, we demonstrate the feasibility of annotation-free alignment as well as the positive impact of self-training on the character detection accuracy, reaching a detection accuracy of 96.4% with a YOLOv5m model without using any human annotation.

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
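The idea of building a similarity graph from a batch of intermediate representations can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not say which similarity measure, sparsification rule, or graph distance the authors use, so this example picks cosine similarity, a k-nearest-neighbour rule, and a squared Frobenius distance, and all function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def latent_geometry_graph(batch, k=1):
    """Weighted adjacency matrix over a batch of latent vectors.

    Each node keeps edges to its k most similar other nodes; the result is
    symmetrised, so an edge exists if either endpoint selected it.
    """
    n = len(batch)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = cosine_similarity(batch[i], batch[j])

    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            adj[i][j] = adj[j][i] = sim[i][j]
    return adj


def graph_distance(adj_a, adj_b):
    """Squared Frobenius distance between two adjacency matrices of equal size."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(adj_a, adj_b)
        for a, b in zip(row_a, row_b)
    )


# Toy batch of three 2-D latent vectors (hypothetical values).
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
graph = latent_geometry_graph(batch, k=1)
```

Under these assumptions, a quantity like `graph_distance` between the graphs of a teacher and a student network on the same batch could serve as a penalty that encourages the student to mimic the teacher's geometry, in the spirit of problem (i) above.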

Data-Driven Structural Health Monitoring and Damage Detection through Deep Learning: State-of-the-Art Review

Sensors ◽

10.3390/s20102778 ◽

2020 ◽

Vol 20 (10) ◽

pp. 2778 ◽

Cited By ~ 12

Author(s):

Mohsen Azimi ◽

Armin Eslamlou ◽

Gokhan Pekcan

Keyword(s):

Deep Learning ◽

Structural Health Monitoring ◽

Health Monitoring ◽

High Speed ◽

Deep Neural Networks ◽

State Of The Art ◽

Data Driven ◽

Structural Health ◽

Promising Tool ◽

Significant Attention

Data-driven methods in structural health monitoring (SHM) are gaining popularity due to recent technological advancements in sensors, as well as high-speed internet and cloud-based computation. Since the introduction of deep learning (DL) in civil engineering, particularly in SHM, this emerging and promising tool has attracted significant attention among researchers. The main goal of this paper is to review the latest publications in SHM using emerging DL-based methods and provide readers with an overall understanding of various SHM applications. After a brief introduction, an overview of various DL methods (e.g., deep neural networks and transfer learning) is presented. The procedures and applications of vibration-based and vision-based monitoring are discussed, along with some of the recent technologies used for SHM, such as sensors and unmanned aerial vehicles (UAVs). The review concludes with prospects and potential limitations of DL-based methods in SHM applications.

Deep learning approaches for speech emotion recognition: state of the art and research challenges

Multimedia Tools and Applications ◽

10.1007/s11042-020-09874-7 ◽

2021 ◽

Author(s):

Rashid Jahangir ◽

Ying Wah Teh ◽

Faiqa Hanif ◽

Ghulam Mujtaba

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

State Of The Art ◽

Speech Emotion Recognition ◽

Learning Approaches ◽

Research Challenges
