scholarly journals HybridTabNet: Towards Better Table Detection in Scanned Document Images

Author(s):  
Muhammad Zeshan Afzal ◽  
Khurram Hashmi ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
Danish Nazir ◽  
...  

Tables in the document image are one of the most important entities since they contain crucial information. Therefore, accurate table detection can significantly improve information extraction from tables. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses the ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network. This enables our network to detect tables of arbitrary layouts precisely. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperforms earlier state-of-the-art results without depending on pre and post-processing steps. Furthermore, to investigate how the proposed method generalizes unseen data, we conduct an exhaustive leave-one-out-evaluation. In comparison to prior state-of-the-art results, our method reduces the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on UNLV dataset. The achieved results reflect the superior performance of the proposed method.

2021 ◽  
Vol 11 (18) ◽  
pp. 8396
Author(s):  
Danish Nazir ◽  
Khurram Azeem Hashmi ◽  
Alain Pagani ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
...  

Tables in document images are an important entity since they contain crucial information. Therefore, accurate table detection can significantly improve the information extraction from documents. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses the ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network. This enables our network to detect tables of arbitrary layouts precisely. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperformed earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how the proposed method generalizes unseen data, we conduct an exhaustive leave-one-out-evaluation. In comparison to prior state-of-the-art results, our method reduced the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on the UNLV dataset. The achieved results reflect the superior performance of the proposed method.


2005 ◽  
Vol 05 (02) ◽  
pp. 247-265 ◽  
Author(s):  
ADNAN AMIN ◽  
SUE WU

This article presents an automatic system that takes in grayscale scanned images, which could be mixed text/graphic documents, and performs thresholding and skew detection on the document images. The system consists of two major components; multistage thresholding and skew detection. The proposed skew detection algorithm has no restriction on detectable angle range and does not rely on large blocks of text. It works well on textual document images, graphical images and mixed text and graphic images. The performance of the systems was evaluated using over 60 images that consist of real life documents like envelopes and artificial mixed text/graphic icons. The superior performance of thresholding is clear compared to other techniques from the evaluation. The skew detection algorithm is robust when compared with other methods when very few text lines are present in the document image.


2021 ◽  
Vol 11 (16) ◽  
pp. 7610
Author(s):  
Khurram Azeem Hashmi ◽  
Alain Pagani ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
Muhammad Zeshan Afzal

This paper presents a novel architecture for detecting mathematical formulas in document images, which is an important step for reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection in computer vision. In this paper, we suggest a couple of modifications to the existing Cascade Mask R-CNN architecture: First, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to spot areas of interest better. Second, it uses a dual backbone of ResNeXt-101, having composite connections at the parallel stages. Finally, our proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold with an f1-score of 0.917, reducing the relative error by 7.8%. Moreover, we accomplished correct detection accuracy of 81.3% on embedded formulas on the Marmot dataset, which results in a relative error reduction of 30%.


Author(s):  
Khurram Azeem Hashmi ◽  
Alain Pagani ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  
Muhammad Zeshan Afzal

This paper presents a novel architecture for detecting mathematical formulas in document images, which is an important step for reliable information extraction in several domains. Recently, Cascade Mask R-CNN networks have been introduced to solve object detection in computer vision. In this paper, we suggest a couple of modifications to the existing Cascade Mask R-CNN architecture: First, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to spot areas of interest better. Second, it uses a dual backbone of ResNeXt-101, having composite connections at the parallel stages. Finally, our proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold with an f1-score of 0.917, reducing the relative error by 7.8%. Moreover, we accomplished correct detection accuracy of 81.3% on embedded formulas on the Marmot dataset, which results in a relative error reduction of 30%.


2019 ◽  
Author(s):  
Wengong Jin ◽  
Regina Barzilay ◽  
Tommi S Jaakkola

The problem of accelerating drug discovery relies heavily on automatic tools to optimize precursor molecules to afford them with better biochemical properties. Our work in this paper substantially extends prior state-of-the-art on graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive, and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that our model outperforms previous state-of-the-art baselines by a large margin.


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Wei Xiong ◽  
Lei Zhou ◽  
Ling Yue ◽  
Lirong Li ◽  
Song Wang

AbstractBinarization plays an important role in document analysis and recognition (DAR) systems. In this paper, we present our winning algorithm in ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018), which is based on background estimation and energy minimization. First, we adopt mathematical morphological operations to estimate and compensate the document background. It uses a disk-shaped structuring element, whose radius is computed by the minimum entropy-based stroke width transform (SWT). Second, we perform Laplacian energy-based segmentation on the compensated document images. Finally, we implement post-processing to preserve text stroke connectivity and eliminate isolated noise. Experimental results indicate that the proposed method outperforms other state-of-the-art techniques on several public available benchmark datasets.


Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4171
Author(s):  
Rabia Ikram ◽  
Badrul Mohamed Jan ◽  
Akhmal Sidek ◽  
George Kenanakis

An important aspect of hydrocarbon drilling is the usage of drilling fluids, which remove drill cuttings and stabilize the wellbore to provide better filtration. To stabilize these properties, several additives are used in drilling fluids that provide satisfactory rheological and filtration properties. However, commonly used additives are environmentally hazardous; when drilling fluids are disposed after drilling operations, they are discarded with the drill cuttings and additives into water sources and causes unwanted pollution. Therefore, these additives should be substituted with additives that are environmental friendly and provide superior performance. In this regard, biodegradable additives are required for future research. This review investigates the role of various bio-wastes as potential additives to be used in water-based drilling fluids. Furthermore, utilization of these waste-derived nanomaterials is summarized for rheology and lubricity tests. Finally, sufficient rheological and filtration examinations were carried out on water-based drilling fluids to evaluate the effect of wastes as additives on the performance of drilling fluids.


Author(s):  
Pengcheng Wang ◽  
Jonathan Rowe ◽  
Wookhee Min ◽  
Bradford Mott ◽  
James Lester

Interactive narrative planning offers significant potential for creating adaptive gameplay experiences. While data-driven techniques have been devised that utilize player interaction data to induce policies for interactive narrative planners, they require enormously large gameplay datasets. A promising approach to addressing this challenge is creating simulated players whose behaviors closely approximate those of human players. In this paper, we propose a novel approach to generating high-fidelity simulated players based on deep recurrent highway networks and deep convolutional networks. Empirical results demonstrate that the proposed models significantly outperform the prior state-of-the-art in generating high-fidelity simulated player models that accurately imitate human players’ narrative interactions. Using the high-fidelity simulated player models, we show the advantage of more exploratory reinforcement learning methods for deriving generalizable narrative adaptation policies.


Author(s):  
Jwalin Bhatt ◽  
Khurram Azeem Hashmi ◽  
Muhammad Zeshan Afzal ◽  
Didier Stricker

In any document, graphical elements like tables, figures, and formulas contain essential information. The processing and interpretation of such information require specialized algorithms. Off-the-shelf OCR components cannot process this information reliably. Therefore, an essential step in document analysis pipelines is to detect these graphical components. It leads to a high-level conceptual understanding of the documents that makes digitization of documents viable. Since the advent of deep learning, the performance of deep learning-based object detection has improved many folds. In this work, we outline and summarize the deep learning approaches for detecting graphical page objects in the document images. Therefore, we discuss the most relevant deep learning-based approaches and state-of-the-art graphical page object detection in document images. This work provides a comprehensive understanding of the current state-of-the-art and related challenges. Furthermore, we discuss leading datasets along with the quantitative evaluation. Moreover, it discusses briefly the promising directions that can be utilized for further improvements.


Sign in / Sign up

Export Citation Format

Share Document