Deep Learning for Historical Document Analysis and Recognition—A Survey

2020, Vol 6 (10), pp. 110
Author(s): Francesco Lombardi, Simone Marinai

Nowadays, deep learning methods are employed in a broad range of research fields, and the analysis and recognition of historical documents, as we survey in this work, is no exception. Our study analyzes the papers published on this topic in the last few years from different perspectives: we first provide a pragmatic definition of historical documents from the point of view of research in the area, then we look at the various sub-tasks addressed in this research. Guided by these tasks, we go through the input-output relations expected from the deep learning approaches used, and accordingly describe the most common models. We also discuss research datasets published in the field and their applications. This analysis shows that the latest research represents a leap forward, since it is not the mere application of recently proposed algorithms to long-standing problems: novel tasks and novel applications of state-of-the-art methods are now considered. Rather than just providing a conclusive picture of current research on the topic, we lastly suggest some potential future trends that can stimulate innovative research directions.

Sensors, 2019, Vol 20 (1), pp. 214
Author(s): Itzik Klein

One of the approaches to indoor positioning using smartphones is pedestrian dead reckoning, in which the user's step length is estimated using empirical or biomechanical formulas. Such calculation has been shown to be very sensitive to the smartphone's location on the user. In addition, knowledge of the smartphone location can also help with direct step-length estimation and heading determination. From a wider point of view, smartphone location recognition is part of human activity recognition, which is employed in many fields and applications, such as health monitoring. In this paper, we propose to use deep learning approaches to classify the smartphone's location on the user while walking, and require robustness in terms of the ability to cope with recordings that differ (in sampling rate, user dynamics, sensor type, and more) from those available in the training dataset. The contributions of the paper are: (1) a definition of the smartphone location recognition framework using accelerometers, gyroscopes, and deep learning; (2) an evaluation of the proposed approach on 107 people and 31 h of recorded data obtained from eight different datasets; and (3) enhanced algorithms for using only accelerometers in the classification process. The experimental results show that the smartphone location can be classified with high accuracy using only the smartphone's accelerometers.


Drones, 2021, Vol 5 (2), pp. 52
Author(s): Thomas Lee, Susan McKeever, Jane Courtney

With the rise of Deep Learning approaches in computer vision applications, significant strides have been made towards vehicular autonomy. Research activity in autonomous drone navigation has increased rapidly in the past five years, and drones are moving fast towards the ultimate goal of near-complete autonomy. However, while much work in the area focuses on specific tasks in drone navigation, the contribution to the overall goal of autonomy is often not assessed, and a comprehensive overview is needed. In this work, a taxonomy of drone navigation autonomy is established by mapping the definitions of vehicular autonomy levels, as defined by the Society of Automotive Engineers, to specific drone tasks in order to create a clear definition of autonomy when applied to drones. A top–down examination of research work in the area is conducted, focusing on drone navigation tasks, in order to understand the extent of research activity in each area. Autonomy levels are cross-checked against the drone navigation tasks addressed in each work to provide a framework for understanding the trajectory of current research. This work serves as a guide to research in drone autonomy with a particular focus on Deep Learning-based solutions, indicating key works and areas of opportunity for development of this area in the future.


Author(s): Jwalin Bhatt, Khurram Azeem Hashmi, Muhammad Zeshan Afzal, Didier Stricker

In any document, graphical elements like tables, figures, and formulas contain essential information, and the processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components: doing so leads to a high-level conceptual understanding of the documents that makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images, discussing the most relevant approaches and the state of the art in graphical page object detection. This work provides a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss the leading datasets along with quantitative evaluations. Finally, we briefly discuss promising directions for further improvements.


2021
Author(s): Noor Ahmad, Muhammad Aminu, Mohd Halim Mohd Noor

Deep learning approaches have attracted a lot of attention in the automatic detection of Covid-19, and transfer learning is the most common approach. However, the majority of pre-trained models are trained on color images, which can cause inefficiencies when fine-tuning the models on Covid-19 images, which are often grayscale. To address this issue, we propose a deep learning architecture called CovidNet, which requires relatively few parameters. CovidNet accepts grayscale images as inputs and is suitable for training with a limited training dataset. Experimental results show that CovidNet outperforms other state-of-the-art deep learning models for Covid-19 detection.
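The parameter saving from a single-channel input is visible already at the first convolutional layer. The arithmetic below is generic, not the paper's actual architecture; the kernel size and channel counts are illustrative assumptions.

```python
def conv_params(in_ch, out_ch, k):
    """Trainable parameters in a k x k conv layer: weights plus biases."""
    return in_ch * out_ch * k * k + out_ch

rgb_first = conv_params(3, 32, 3)   # first layer of a typical color-pretrained model
gray_first = conv_params(1, 32, 3)  # the same layer with a grayscale input
print(rgb_first, gray_first)  # 896 320
```

The grayscale layer also avoids the awkward workaround of replicating a one-channel X-ray into three identical channels just to match a color-pretrained input stem.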


Mathematics, 2020, Vol 8 (11), pp. 2075
Author(s): Óscar Apolinario-Arzube, José Antonio García-Díaz, José Medina-Moreira, Harry Luna-Aveiga, Rafael Valencia-García

Automatic satire identification can help to identify texts in which the intended meaning differs from the literal meaning, improving tasks such as sentiment analysis, fake news detection or natural-language user interfaces. Typically, satire identification is performed by training a supervised classifier to find linguistic clues that can determine whether a text is satirical or not. For this, the state of the art relies on neural networks fed with word embeddings that are capable of learning interesting characteristics regarding the way humans communicate. However, to the best of our knowledge, there are no comprehensive studies that evaluate these techniques in Spanish in the satire identification domain. Consequently, in this work we evaluate several deep-learning architectures with Spanish pre-trained word embeddings and compare the results with strong baselines based on term-counting features. This evaluation is performed with two datasets that contain satirical and non-satirical tweets written in two Spanish variants: European Spanish and Mexican Spanish. Our experimentation revealed that term-counting features achieved results similar to those of deep-learning approaches based on word embeddings, both outperforming previous results based on linguistic features. Our results suggest that term-counting features and traditional machine learning models provide competitive results for automatic satire identification, slightly outperforming state-of-the-art models.
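A term-counting baseline of the kind the paper compares against amounts to a bag-of-words count matrix fed to a traditional classifier. The sketch below shows only the feature-extraction step, with naive whitespace tokenization as a simplifying assumption (real systems would use proper tokenization and, typically, TF-IDF weighting).

```python
from collections import Counter

def term_count_features(texts):
    """Build a bag-of-words count matrix (one row per text) from raw strings."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for t in texts:
        row = [0] * len(vocab)
        for w, c in Counter(t.lower().split()).items():
            row[index[w]] = c
        matrix.append(row)
    return vocab, matrix

vocab, X = term_count_features(["breaking news shocks nobody",
                                "news anchors shocked by news"])
print(vocab)
```

The rows of `X` would then train, e.g., a linear SVM or logistic regression, which is the kind of traditional model the abstract reports as competitive.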


Sensors, 2021, Vol 22 (1), pp. 73
Author(s): Marjan Stoimchev, Marija Ivanovska, Vitomir Štruc

In the past few years, there has been a leap from traditional palmprint recognition methodologies, which use handcrafted features, to deep-learning approaches that are able to automatically learn feature representations from the input data. However, the information that is extracted from such deep-learning models typically corresponds to the global image appearance, where only the most discriminative cues from the input image are considered. This characteristic is especially problematic when data is acquired in unconstrained settings, as in the case of contactless palmprint recognition systems, where visual artifacts caused by elastic deformations of the palmar surface are typically present in spatially local parts of the captured images. In this study, we address the problem of elastic deformations by introducing a new approach to contactless palmprint recognition based on a novel CNN model, designed as a two-path architecture, where one path processes the input in a holistic manner, while the second path extracts local information from smaller image patches sampled from the input image. As elastic deformations can be assumed to most significantly affect the global appearance, while having a lesser impact on spatially local image areas, the local processing path addresses the issues related to elastic deformations, thereby supplementing the information from the global processing path. The model is trained with a learning objective that combines the Additive Angular Margin (ArcFace) loss and the well-known center loss. By using the proposed model design, the discriminative power of the learned image representation is significantly enhanced compared to standard holistic models, which, as we show in the experimental section, leads to state-of-the-art performance for contactless palmprint recognition.
Our approach is tested on two publicly available contactless palmprint datasets—namely, IITD and CASIA—and is demonstrated to perform favorably against state-of-the-art methods from the literature. The source code for the proposed model is made publicly available.
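The global-plus-local fusion idea can be sketched in a few lines. Below, random linear projections stand in for the two CNN branches, and the patch grid, image size, and embedding dimension are all illustrative assumptions, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
W_global = rng.standard_normal((128 * 128, 64))  # stand-in for the holistic CNN branch
W_local = rng.standard_normal((32 * 32, 64))     # stand-in for the patch CNN branch

def embed(img):
    """Fuse a global embedding with pooled local-patch embeddings."""
    g = img.reshape(-1) @ W_global                       # global path
    patches = [img[y:y + 32, x:x + 32].reshape(-1) @ W_local
               for y in range(0, 128, 32)
               for x in range(0, 128, 32)]               # 4x4 grid of patches
    l = np.mean(patches, axis=0)                         # pooled local path
    return np.concatenate([g, l])                        # joint 128-D descriptor

img = rng.random((128, 128))
print(embed(img).shape)  # (128,)
```

In the actual model the two branches are trained jointly under the combined ArcFace-plus-center-loss objective; the point of the sketch is only the shape of the fusion.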


2016
Author(s): Michael P. Pound, Alexandra J. Burgess, Michael H. Wilson, Jonathan A. Atkinson, Marcus Griffiths, ...

Deep learning is an emerging field that promises unparalleled results on many data analysis problems. We show the success offered by such techniques when applied to the challenging problem of image-based plant phenotyping, and demonstrate state-of-the-art results for root and shoot feature identification and localisation. We predict a paradigm shift in image-based phenotyping thanks to deep learning approaches.


Author(s): Nir Lipovetzky

Width-based algorithms search for solutions through a general definition of state novelty. These algorithms have been shown to achieve state-of-the-art performance in classical planning, and have been successfully applied to model-based and model-free settings where the dynamics of the problem are given through simulation engines. The performance of width-based algorithms is understood theoretically through the notion of planning width, which provides polynomial guarantees on their runtime and memory consumption. To facilitate synergies across research communities, this paper summarizes the area of width-based planning and surveys current and future research directions.
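The core novelty test behind the simplest width-based algorithm, IW(1), is easy to state: a newly generated state is kept only if it contains at least one atom never seen in any earlier state, and is pruned otherwise. A minimal sketch, with states represented as sets of atom strings:

```python
def make_novelty_check():
    """IW(1)-style novelty test: a state is novel iff it contains
    an atom not seen in any previously generated state."""
    seen = set()
    def is_novel(state):
        new = set(state) - seen
        seen.update(new)       # remember every atom we have ever seen
        return bool(new)
    return is_novel

is_novel = make_novelty_check()
print(is_novel({"at(A)", "clear(B)"}))  # True: both atoms are new
print(is_novel({"at(A)"}))              # False: no unseen atom, state is pruned
print(is_novel({"at(B)"}))              # True: at(B) is new
```

Higher-width variants IW(k) generalize this by tracking unseen tuples of up to k atoms rather than single atoms.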

