HybridTabNet: Towards Better Table Detection in Scanned Document Images

Applied Sciences ◽

10.3390/app11188396 ◽

2021 ◽

Vol 11 (18) ◽

pp. 8396

Author(s):

Danish Nazir ◽

Khurram Azeem Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

...

Keyword(s):

State Of The Art ◽

Superior Performance ◽

Document Images ◽

Post Processing ◽

Backbone Network ◽

Unseen Data ◽

Crucial Information ◽

Leave One Out ◽

Processing Steps ◽

Prior State

Tables are among the most important entities in document images because they contain crucial information. Accurate table detection can therefore significantly improve information extraction from documents. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses a ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network, which enables our network to detect tables of arbitrary layouts precisely. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. On every dataset except ICDAR-17 POD, HybridTabNet outperforms earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how the proposed method generalizes to unseen data, we conduct an exhaustive leave-one-out evaluation. Compared with prior state-of-the-art results, our method reduces the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (LaTeX), 41.33% on TableBank (Word), 55.73% on TableBank (LaTeX + Word), 10% on Marmot, and 9.67% on UNLV. The achieved results reflect the superior performance of the proposed method.
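The relative error reductions quoted in this abstract can be read as the share of the previous method's error that the new method removes. The sketch below is one plausible reading, assuming the error is defined as 1 minus the reported score (for example the F1-score); the abstract does not state the exact definition. The function name and the scores in the example are hypothetical and are not results from the paper.

```python
def relative_error_reduction(prior_score: float, new_score: float) -> float:
    """Return the fraction of the prior error that the new score removes.

    Assumption: error = 1 - score, with scores in [0, 1].
    """
    prior_error = 1.0 - prior_score
    new_error = 1.0 - new_score
    if prior_error <= 0.0:
        raise ValueError("prior score must be below 1.0")
    return (prior_error - new_error) / prior_error


# Hypothetical scores, for illustration only (not taken from the paper):
# moving from 0.90 to 0.95 halves the remaining error.
print(f"{relative_error_reduction(0.90, 0.95):.2%}")  # prints 50.00%
```

Under this reading, a small absolute gain on an already high score can still correspond to a large relative error reduction, which is why the two figures should not be compared directly.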


A ROBUST SYSTEM FOR THRESHOLDING AND SKEW DETECTION IN MIXED TEXT/GRAPHICS DOCUMENTS

International Journal of Image and Graphics ◽

10.1142/s0219467805001744 ◽

2005 ◽

Vol 05 (02) ◽

pp. 247-265 ◽

Cited By ~ 5

Author(s):

ADNAN AMIN ◽

SUE WU

Keyword(s):

Automatic System ◽

Real Life ◽

Detection Algorithm ◽

Document Image ◽

Superior Performance ◽

Document Images ◽

Skew Detection ◽

Robust System ◽

Graphic Images ◽

Scanned Images

This article presents an automatic system that takes in grayscale scanned images, which may be mixed text/graphic documents, and performs thresholding and skew detection on them. The system consists of two major components: multistage thresholding and skew detection. The proposed skew detection algorithm has no restriction on the detectable angle range and does not rely on large blocks of text. It works well on textual document images, graphical images, and mixed text and graphic images. The performance of the system was evaluated on over 60 images consisting of real-life documents, such as envelopes, and artificial mixed text/graphic icons. The evaluation shows that the thresholding component clearly outperforms other techniques. The skew detection algorithm is more robust than other methods when very few text lines are present in the document image.


Cascade Network with Deformable Composite Backbone for Formula Detection in Scanned Document Images

Applied Sciences ◽

10.3390/app11167610 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7610

Author(s):

Khurram Azeem Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

Muhammad Zeshan Afzal

Keyword(s):

Computer Vision ◽

Relative Error ◽

State Of The Art ◽

Error Reduction ◽

Detection Accuracy ◽

Document Images ◽

Backbone Network ◽

Mathematical Formulas ◽

Composite Connections ◽

Areas Of Interest

This paper presents a novel architecture for detecting mathematical formulas in document images, which is an important step for reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection in computer vision. In this paper, we propose two modifications to the existing Cascade Mask R-CNN architecture. First, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to spot areas of interest better. Second, it uses a dual backbone of ResNeXt-101 with composite connections at the parallel stages. In addition, our proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold with an F1-score of 0.917, reducing the relative error by 7.8%. Moreover, we achieve a detection accuracy of 81.3% on embedded formulas in the Marmot dataset, which corresponds to a relative error reduction of 30%.
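For readers unfamiliar with the IoU threshold mentioned in this abstract, the following minimal sketch shows how intersection over union (IoU) is commonly computed for two axis-aligned boxes and how a threshold turns it into a correct/incorrect decision. It is a generic illustration, not the paper's evaluation code; the box format `(x_min, y_min, x_max, y_max)`, the function names, the example boxes, and the thresholds 0.6 and 0.8 are all assumptions chosen for the example.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred: Box, truth: Box, threshold: float) -> bool:
    """A detection counts as correct when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical boxes: the prediction covers the ground truth plus extra area.
pred = (0.0, 0.0, 10.0, 10.0)
truth = (0.0, 0.0, 10.0, 7.0)
print(iou(pred, truth))              # 0.7
print(is_correct(pred, truth, 0.6))  # True
print(is_correct(pred, truth, 0.8))  # False
```

The same detection passes at the looser threshold and fails at the stricter one, which is why scores reported at a higher IoU threshold are harder to achieve.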


Cascade Network with Deformable Composite Backbone for Formula Detection in Scanned Document Images

10.20944/preprints202107.0165.v1 ◽

2021 ◽

Author(s):

Khurram Azeem Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

Muhammad Zeshan Afzal

Keyword(s):

Computer Vision ◽

Relative Error ◽

State Of The Art ◽

Error Reduction ◽

Detection Accuracy ◽

Document Images ◽

Backbone Network ◽

Mathematical Formulas ◽

Composite Connections ◽

Areas Of Interest


Simple and Efficient Document Image Binarization Technique For Degraded Document Images

International Journal of Scientific Research ◽

10.15373/22778179/may2014/65 ◽

2012 ◽

Vol 3 (5) ◽

pp. 217-220

Author(s):

Manju Joseph ◽

Jijina K.P

Keyword(s):

Document Image ◽

Document Images ◽

Image Binarization ◽

Document Image Binarization ◽

Degraded Document


Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

10.26434/chemrxiv.8266745.v1 ◽

2019 ◽

Author(s):

Wengong Jin ◽

Regina Barzilay ◽

Tommi S Jaakkola

Keyword(s):

Drug Discovery ◽

State Of The Art ◽

Molecular Graph ◽

Biochemical Properties ◽

Large Margin ◽

Previous State ◽

Translation Methods ◽

Atom Level ◽

Precursor Molecules ◽

Prior State

Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to give them better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that it outperforms previous state-of-the-art baselines by a large margin.


An enhanced binarization framework for degraded historical document images

EURASIP Journal on Image and Video Processing ◽

10.1186/s13640-021-00556-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Wei Xiong ◽

Lei Zhou ◽

Ling Yue ◽

Lirong Li ◽

Song Wang

Keyword(s):

Document Image ◽

Morphological Operations ◽

Document Images ◽

Minimum Entropy ◽

Stroke Width ◽

Background Estimation ◽

Structuring Element ◽

Document Image Binarization ◽

Benchmark Datasets ◽

Stroke Width Transform

Binarization plays an important role in document analysis and recognition (DAR) systems. In this paper, we present our winning algorithm in the ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018), which is based on background estimation and energy minimization. First, we adopt mathematical morphological operations to estimate and compensate for the document background. This step uses a disk-shaped structuring element whose radius is computed by the minimum entropy-based stroke width transform (SWT). Second, we perform Laplacian energy-based segmentation on the compensated document images. Finally, we apply post-processing to preserve text stroke connectivity and eliminate isolated noise. Experimental results indicate that the proposed method outperforms other state-of-the-art techniques on several publicly available benchmark datasets.


Utilization of Eco-Friendly Waste Generated Nanomaterials in Water-Based Drilling Fluids; State of the Art Review

Materials ◽

10.3390/ma14154171 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4171

Author(s):

Rabia Ikram ◽

Badrul Mohamed Jan ◽

Akhmal Sidek ◽

George Kenanakis

Keyword(s):

State Of The Art ◽

Superior Performance ◽

Future Research ◽

Drilling Fluids ◽

Drill Cuttings ◽

Environmental Friendly ◽

Water Based ◽

Drilling Operations ◽

Filtration Properties

An important aspect of hydrocarbon drilling is the usage of drilling fluids, which remove drill cuttings and stabilize the wellbore to provide better filtration. To stabilize these properties, several additives are used in drilling fluids that provide satisfactory rheological and filtration properties. However, commonly used additives are environmentally hazardous; when drilling fluids are disposed of after drilling operations, they are discarded with the drill cuttings and additives into water sources and cause unwanted pollution. Therefore, these additives should be substituted with additives that are environmentally friendly and provide superior performance. In this regard, biodegradable additives are required for future research. This review investigates the role of various bio-wastes as potential additives to be used in water-based drilling fluids. Furthermore, the utilization of these waste-derived nanomaterials is summarized for rheology and lubricity tests. Finally, sufficient rheological and filtration examinations were carried out on water-based drilling fluids to evaluate the effect of wastes as additives on the performance of drilling fluids.


High-Fidelity Simulated Players for Interactive Narrative Planning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/540 ◽

2018 ◽

Author(s):

Pengcheng Wang ◽

Jonathan Rowe ◽

Wookhee Min ◽

Bradford Mott ◽

James Lester

Keyword(s):

State Of The Art ◽

Data Driven ◽

High Fidelity ◽

Interactive Narrative ◽

Interaction Data ◽

Convolutional Networks ◽

Novel Approach ◽

Adaptation Policies ◽

Narrative Planning ◽

Prior State

Interactive narrative planning offers significant potential for creating adaptive gameplay experiences. While data-driven techniques have been devised that utilize player interaction data to induce policies for interactive narrative planners, they require enormously large gameplay datasets. A promising approach to addressing this challenge is creating simulated players whose behaviors closely approximate those of human players. In this paper, we propose a novel approach to generating high-fidelity simulated players based on deep recurrent highway networks and deep convolutional networks. Empirical results demonstrate that the proposed models significantly outperform the prior state-of-the-art in generating high-fidelity simulated player models that accurately imitate human players’ narrative interactions. Using the high-fidelity simulated player models, we show the advantage of more exploratory reinforcement learning methods for deriving generalizable narrative adaptation policies.


A Survey of Graphical Page Object Detection with Deep Neural Networks

10.20944/preprints202104.0739.v1 ◽

2021 ◽

Author(s):

Jwalin Bhatt ◽

Khurram Azeem Hashmi ◽

Muhammad Zeshan Afzal ◽

Didier Stricker

Keyword(s):

Deep Learning ◽

Object Detection ◽

Conceptual Understanding ◽

Deep Neural Networks ◽

State Of The Art ◽

Learning Approaches ◽

Document Images ◽

Essential Information ◽

Current State ◽

High Level

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms, since off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components. This detection leads to a high-level conceptual understanding of the documents that makes their digitization viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved manyfold. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in document images. In particular, we discuss the most relevant deep learning-based approaches and the state of the art in graphical page object detection. This work provides a comprehensive understanding of the current state of the art and related challenges. Furthermore, we discuss leading datasets along with the quantitative evaluation. Finally, we briefly discuss promising directions for further improvements.
