HybridTabNet: Towards Better Table Detection in Scanned Document Images

Tables in document images are an important entity because they contain crucial information. Accurate table detection can therefore significantly improve information extraction from documents. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses the ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network, which enables our network to precisely detect tables with arbitrary layouts. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperforms earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how well the proposed method generalizes to unseen data, we conduct an exhaustive leave-one-out evaluation. In comparison to prior state-of-the-art results, our method reduces the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on the UNLV dataset. These results reflect the superior performance of the proposed method.
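
Relative error reduction figures like those above are commonly computed by treating one minus the score as the error, E = 1 - score, and reporting (E_prev - E_new) / E_prev as a percentage. The sketch below assumes that definition and a score in [0, 1] such as F1; the abstract does not state which metric underlies each figure, and the example scores are hypothetical.

```python
def relative_error_reduction(prev_score: float, new_score: float) -> float:
    """Percentage of the previous method's error (1 - score) removed by the new method.

    Assumes scores in [0, 1] (e.g. F1); this is a common definition, not one
    confirmed by the abstract.
    """
    if not (0.0 <= prev_score < 1.0 and 0.0 <= new_score <= 1.0):
        raise ValueError("scores must lie in [0, 1] and the previous score must be below 1")
    prev_error = 1.0 - prev_score
    new_error = 1.0 - new_score
    return 100.0 * (prev_error - new_error) / prev_error
```

For example, moving from a hypothetical F1 of 0.90 to 0.95 halves the error (0.10 to 0.05), which is a 50% relative error reduction even though the absolute gain is only five points.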

HybridTabNet: Towards Better Table Detection in Scanned Document Images

10.20944/preprints202108.0360.v1 ◽

2021 ◽

Author(s):

Muhammad Zeshan Afzal ◽

Khurram Hashmi ◽

Marcus Liwicki ◽

Didier Stricker ◽

Danish Nazir ◽

...

Keyword(s):

State Of The Art ◽

Document Image ◽

Superior Performance ◽

Document Images ◽

Backbone Network ◽

Unseen Data ◽

Crucial Information ◽

Leave One Out ◽

Processing Steps ◽

Prior State

Tables in document images are among the most important entities because they contain crucial information. Accurate table detection can therefore significantly improve information extraction from tables. In this work, we present a novel end-to-end trainable pipeline, HybridTabNet, for table detection in scanned document images. Our two-stage table detector uses the ResNeXt-101 backbone for feature extraction and Hybrid Task Cascade (HTC) to localize the tables in scanned document images. Moreover, we replace conventional convolutions with deformable convolutions in the backbone network, which enables our network to precisely detect tables with arbitrary layouts. We evaluate our approach comprehensively on ICDAR-13, ICDAR-17 POD, ICDAR-19, TableBank, Marmot, and UNLV. Apart from the ICDAR-17 POD dataset, our proposed HybridTabNet outperforms earlier state-of-the-art results without depending on pre- and post-processing steps. Furthermore, to investigate how well the proposed method generalizes to unseen data, we conduct an exhaustive leave-one-out evaluation. In comparison to prior state-of-the-art results, our method reduces the relative error by 27.57% on ICDAR-2019-TrackA-Modern, 42.64% on TableBank (Latex), 41.33% on TableBank (Word), 55.73% on TableBank (Latex + Word), 10% on Marmot, and 9.67% on the UNLV dataset. These results reflect the superior performance of the proposed method.
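
A leave-one-out evaluation across datasets can be organized as a loop that holds out one dataset at a time. The sketch below assumes the protocol trains on all datasets except one and tests on the held-out one; the abstract does not spell out the exact protocol, and `train_and_evaluate` is a hypothetical callback standing in for the actual training and scoring code.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def leave_one_dataset_out(datasets: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training datasets, held-out test dataset) pairs, one per dataset."""
    for i, held_out in enumerate(datasets):
        train = datasets[:i] + datasets[i + 1:]
        yield train, held_out


def run_leave_one_out(
    datasets: List[str],
    train_and_evaluate: Callable[[List[str], str], float],  # hypothetical callback
) -> Dict[str, float]:
    """Collect one score per held-out dataset."""
    return {
        held_out: train_and_evaluate(train, held_out)
        for train, held_out in leave_one_dataset_out(datasets)
    }
```

With the dataset names from the abstract (for example `["ICDAR-13", "ICDAR-17 POD", "ICDAR-19", "TableBank", "Marmot", "UNLV"]`), this produces six runs, each scored only on data the model never saw during training.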

Neural Arabic Text Diacritization: State-of-the-Art Results and a Novel Approach for Arabic NLP Downstream Tasks

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3470849 ◽

2022 ◽

Vol 21 (1) ◽

pp. 1-25

Author(s):

Ali Fadel ◽

Ibraheem Tuffaha ◽

Mahmoud Al-Ayyoub

Keyword(s):

Neural Network ◽

State Of The Art ◽

Conditional Random Field ◽

Arabic Text ◽

Learning Models ◽

Post Processing ◽

Feed Forward Neural Network ◽

Novel Approach ◽

Normalized Gradient ◽

Processing Steps

In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF), and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset, and the results show that our models are either better than or on par with other models, including those that, unlike ours, require human-crafted language-dependent post-processing steps. Moreover, we show how diacritics in Arabic can be used to enhance the models of downstream NLP tasks such as Machine Translation (MT) and Sentiment Analysis (SA) by proposing the novel Translation over Diacritization (ToD) and Sentiment over Diacritization (SoD) approaches.
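
Block-Normalized Gradient, in its common formulation, divides each parameter block's gradient (typically one layer) by that block's L2 norm before the optimizer step, so every layer moves by a step of similar size regardless of its raw gradient scale. The sketch below assumes per-layer blocks, plain SGD, and a small `eps` for numerical safety; the paper's exact block choice and optimizer are not stated in the abstract.

```python
import math
from typing import Dict, List


def block_normalize(grads: Dict[str, List[float]], eps: float = 1e-8) -> Dict[str, List[float]]:
    """Divide each block's gradient by that block's L2 norm (assumed: one block per layer)."""
    normalized = {}
    for name, g in grads.items():
        norm = math.sqrt(sum(x * x for x in g))
        normalized[name] = [x / (norm + eps) for x in g]
    return normalized


def bng_sgd_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    lr: float,
    eps: float = 1e-8,
) -> Dict[str, List[float]]:
    """One SGD step using block-normalized gradients (plain SGD is an assumption)."""
    step = block_normalize(grads, eps)
    return {name: [p - lr * g for p, g in zip(params[name], step[name])] for name in params}
```

A gradient of `[3.0, 4.0]` normalizes to roughly `[0.6, 0.8]`, so the update length is set by the learning rate alone.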

Cascade Network with Deformable Composite Backbone for Formula Detection in Scanned Document Images

Applied Sciences ◽

10.3390/app11167610 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7610

Author(s):

Khurram Azeem Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

Muhammad Zeshan Afzal

Keyword(s):

Computer Vision ◽

Relative Error ◽

State Of The Art ◽

Error Reduction ◽

Detection Accuracy ◽

Document Images ◽

Backbone Network ◽

Mathematical Formulas ◽

Composite Connections ◽

Areas Of Interest

This paper presents a novel architecture for detecting mathematical formulas in document images, which is an important step for reliable information extraction in several domains. Cascade Mask R-CNN networks have recently been introduced to solve object detection in computer vision. In this paper, we propose two modifications to the existing Cascade Mask R-CNN architecture. First, the proposed network uses deformable convolutions instead of conventional convolutions in the backbone network to better spot areas of interest. Second, it uses a dual backbone of ResNeXt-101 with composite connections at the parallel stages. Finally, the proposed network is end-to-end trainable. We evaluate the proposed approach on the ICDAR-2017 POD and Marmot datasets. The proposed approach demonstrates state-of-the-art performance on ICDAR-2017 POD at a higher IoU threshold, with an F1-score of 0.917, reducing the relative error by 7.8%. Moreover, we achieve a correct detection accuracy of 81.3% on embedded formulas in the Marmot dataset, which corresponds to a relative error reduction of 30%.
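
An F1-score "at a higher IoU threshold" counts a detection as correct only if its Intersection over Union with a ground-truth box reaches the threshold. The sketch below uses axis-aligned `(x1, y1, x2, y2)` boxes and greedy one-to-one matching; it ignores confidence ordering and any dataset-specific rules, so the official ICDAR-2017 POD evaluation may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_at_iou(preds: List[Box], gts: List[Box], threshold: float) -> float:
    """F1 with greedy matching: each prediction claims its best unmatched ground truth."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, threshold
        for j, g in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(p, g)
            if overlap >= best_iou:  # match only if IoU reaches the threshold
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(preds)  # tp / (tp + false positives)
    recall = tp / len(gts)  # tp / (tp + false negatives)
    return 2 * precision * recall / (precision + recall)
```

Raising `threshold` from 0.6 to 0.8 turns loosely placed boxes into both a false positive and a false negative, which is why scores at a higher IoU threshold are the stricter comparison.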

Cascade Network with Deformable Composite Backbone for Formula Detection in Scanned Document Images

10.20944/preprints202107.0165.v1 ◽

2021 ◽

Author(s):

Khurram Azeem Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

Muhammad Zeshan Afzal

Keyword(s):

Computer Vision ◽

Relative Error ◽

State Of The Art ◽

Error Reduction ◽

Detection Accuracy ◽

Document Images ◽

Backbone Network ◽

Mathematical Formulas ◽

Composite Connections ◽

Areas Of Interest

CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution

10.20944/preprints202109.0059.v1 ◽

2021 ◽

Author(s):

Khurram Azeem Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

Muhammad Zeshan Afzal

Keyword(s):

Relative Error ◽

State Of The Art ◽

Error Reduction ◽

Reliable Information ◽

Document Images ◽

Post Processing ◽

Preliminary Step ◽

Backbone Networks ◽

Previous State ◽

Feature Pyramid

Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that builds on Cascade Mask R-CNN and incorporates a Recursive Feature Pyramid network and Switchable Atrous Convolution into the existing backbone architecture. By utilizing the comparatively lightweight ResNet-50 backbone, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), or memory-intensive deformable convolutions. We evaluate the proposed approach on five publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and achieves comparable results on ICDAR-17 POD. Compared with previous state-of-the-art results, we obtain significant relative error reductions of 56.36%, 20%, 4.5%, and 3.5% on ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-dataset evaluations to demonstrate the generalization capabilities of the proposed method.
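
The Recursive Feature Pyramid was introduced in the DetectoRS object detector. As we understand that formulation (the notation below is assumed, not taken from this paper), a standard FPN computes backbone stages $x_i = \mathbf{B}_i(x_{i-1})$ and top-down features $f_i = \mathbf{F}_i(f_{i+1}, x_i)$; RFP feeds each FPN output back into the corresponding backbone stage through a connection module $\mathbf{R}_i$ and unrolls this loop for $t = 1, \dots, T$ steps:

```latex
\begin{aligned}
x_i^{t} &= \mathbf{B}_i^{t}\!\left(x_{i-1}^{t},\, \mathbf{R}_i^{t}\!\left(f_i^{t-1}\right)\right), \\
f_i^{t} &= \mathbf{F}_i^{t}\!\left(f_{i+1}^{t},\, x_i^{t}\right),
\end{aligned}
\qquad i = 1, \dots, S,
```

where $S$ is the number of backbone stages, $x_0^{t}$ is the input image, and $f_{S+1}^{t} = 0$. With $T = 1$ and no feedback term, this reduces to the ordinary FPN.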

CasTabDetectoRS: Cascade Network for Table Detection in Document Images with Recursive Feature Pyramid and Switchable Atrous Convolution

Journal of Imaging ◽

10.3390/jimaging7100214 ◽

2021 ◽

Vol 7 (10) ◽

pp. 214

Author(s):

Khurram Hashmi ◽

Alain Pagani ◽

Marcus Liwicki ◽

Didier Stricker ◽

Muhammad Zeshan Afzal

Keyword(s):

Relative Error ◽

State Of The Art ◽

Error Reduction ◽

Reliable Information ◽

Document Images ◽

Post Processing ◽

Preliminary Step ◽

Backbone Networks ◽

Previous State ◽

Feature Pyramid

Table detection is a preliminary step in extracting reliable information from tables in scanned document images. We present CasTabDetectoRS, a novel end-to-end trainable table detection framework that builds on Cascade Mask R-CNN and incorporates a Recursive Feature Pyramid network and Switchable Atrous Convolution into the existing backbone architecture. By utilizing the comparatively lightweight ResNet-50 backbone, this paper demonstrates that superior results are attainable without relying on pre- and post-processing methods, heavier backbone networks (ResNet-101, ResNeXt-152), or memory-intensive deformable convolutions. We evaluate the proposed approach on five publicly available table detection datasets. Our CasTabDetectoRS outperforms the previous state-of-the-art results on four datasets (ICDAR-19, TableBank, UNLV, and Marmot) and achieves comparable results on ICDAR-17 POD. Compared with previous state-of-the-art results, we obtain significant relative error reductions of 56.36%, 20%, 4.5%, and 3.5% on ICDAR-19, TableBank, UNLV, and Marmot, respectively. Furthermore, this paper sets a new benchmark by performing exhaustive cross-dataset evaluations to demonstrate the generalization capabilities of the proposed method.
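
Switchable Atrous Convolution, as introduced in DetectoRS, mixes a normal convolution and an atrous (dilated) one that share weights: S(x)·Conv(x, w, 1) + (1 - S(x))·Conv(x, w + Δw, r), where r is the atrous rate and Δw a learned weight offset. The 1-D sketch below only illustrates that mixing rule; it assumes zero "same" padding, an odd kernel length, and a switch value `s` supplied per position, whereas the real layer is 2-D and computes S(x) from the input.

```python
from typing import List


def atrous_conv1d_same(x: List[float], w: List[float], rate: int) -> List[float]:
    """1-D dilated cross-correlation with zero padding so the output length equals len(x)."""
    if len(w) % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    half = (len(w) - 1) * rate // 2  # half-width of the dilated kernel
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + j * rate - half  # taps are spaced `rate` samples apart
            if 0 <= idx < len(x):
                acc += wj * x[idx]
        out.append(acc)
    return out


def switchable_atrous_1d(
    x: List[float], w: List[float], delta_w: List[float], rate: int, s: List[float]
) -> List[float]:
    """s[i] * Conv(x, w, 1) + (1 - s[i]) * Conv(x, w + delta_w, rate), position by position."""
    small = atrous_conv1d_same(x, w, 1)
    large = atrous_conv1d_same(x, [a + b for a, b in zip(w, delta_w)], rate)
    return [si * a + (1 - si) * b for si, a, b in zip(s, small, large)]
```

With a 3-tap kernel, rate 2 widens the receptive field from 3 to 5 samples at no extra parameter cost, and the switch lets each position choose how much of that wider context to use.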

Seal Extraction Based on Clustering and Local Thresholding Techniques

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.3467 ◽

2014 ◽

Vol 926-930 ◽

pp. 3467-3470

Author(s):

Cheng Yun Wang ◽

You Bin Chen

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Document Images ◽

Post Processing ◽

Color Information ◽

Nearest Neighbor Classifier ◽

Local Thresholding ◽

Processing Steps ◽

Neighbor Classifier

In this paper, we propose a method to extract seal imprints from bank document images based on color information. First, we use the K-means clustering algorithm and local thresholding techniques to distinguish seal imprints from background areas. Second, we remove printed characters and other noise using a nearest-neighbor classifier. Finally, a series of post-processing steps restores missing pixels to the seal imprints. Our experimental results show that this method is capable of extracting Chinese seal imprints in most cases.
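
The first step combines color clustering with local thresholding. The sketch below shows both ingredients in plain Python under stated assumptions: K-means runs on RGB tuples from fixed initial centers (for example white paper, red ink, black print), and the local threshold is a simple neighborhood-mean rule. The paper's color space, number of clusters, initialization, and thresholding variant are not given in the abstract, and the nearest-neighbor and post-processing steps are not shown.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]


def _sq_dist(a: Color, b: Color) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans_colors(
    pixels: List[Color], centers: List[Color], iters: int = 10
) -> Tuple[List[Color], List[int]]:
    """Lloyd's K-means on RGB pixels, starting from the given (assumed) initial centers."""
    centers = list(centers)
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest center.
        labels = [min(range(len(centers)), key=lambda c: _sq_dist(p, centers[c])) for p in pixels]
        # Update step: move each center to its members' mean (keep it if the cluster is empty).
        for c in range(len(centers)):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(m[d] for m in members) / len(members) for d in range(3))
    return centers, labels


def local_mean_threshold(gray: List[List[float]], radius: int, offset: float = 0.0) -> List[List[bool]]:
    """Mark a pixel as foreground if it is darker than its local window mean minus `offset`."""
    h, w = len(gray), len(gray[0])
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                gray[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(gray[y][x] < sum(window) / len(window) - offset)
        mask.append(row)
    return mask
```

Pixels assigned to the reddish cluster would be seal candidates; the local threshold adapts to uneven illumination better than a single global cutoff.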

Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

10.26434/chemrxiv.8266745.v1 ◽

2019 ◽

Author(s):

Wengong Jin ◽

Regina Barzilay ◽

Tommi S Jaakkola

Keyword(s):

Drug Discovery ◽

State Of The Art ◽

Molecular Graph ◽

Biochemical Properties ◽

Large Margin ◽

Previous State ◽

Translation Methods ◽

Atom Level ◽

Precursor Molecules ◽

Prior State

Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to endow them with better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that it outperforms previous state-of-the-art baselines by a large margin.

Tailoring the physical characteristics of solution blown cellulosic nonwovens by various post-treatments

Nordic Pulp & Paper Research Journal ◽

10.1515/npprj-2021-0025 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Kerstin Jedvert ◽

Linnea Viklund ◽

Mårten Alkhagen ◽

Tobias Köhnke ◽

Hans Theliander

Keyword(s):

Ionic Liquid ◽

Physical Characteristics ◽

Direct Solution ◽

Post Processing ◽

Drying Method ◽

Solution Blowing ◽

Renewable Feedstock ◽

Processing Steps

Nonwovens are in increasing demand due to their versatility, which enables their use in a broad range of applications. Most nonwovens are still produced from fossil-based resources, so there is a need to develop competitive materials from renewable feedstock. In this work, nonwovens are produced from cellulose via a direct solution blowing method. Cellulose was dissolved using the ionic liquid 1-ethyl-3-methylimidazolium acetate (EMIMAc) and regenerated into nonwovens by coagulation in water. Such nonwovens were previously rather stiff and paper-like, and the aim of this work was to improve the softness and feel of the materials through simple adjustments of the post-processing steps, i.e., washing and drying. It was shown that, primarily by changing the drying method, it was possible to create a much softer and bulkier material using the same solution blowing parameters.

Utilization of Eco-Friendly Waste Generated Nanomaterials in Water-Based Drilling Fluids; State of the Art Review

Materials ◽

10.3390/ma14154171 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4171

Author(s):

Rabia Ikram ◽

Badrul Mohamed Jan ◽

Akhmal Sidek ◽

George Kenanakis

Keyword(s):

State Of The Art ◽

Superior Performance ◽

Future Research ◽

Drilling Fluids ◽

Drill Cuttings ◽

Environmental Friendly ◽

Water Based ◽

Drilling Operations ◽

Filtration Properties

An important aspect of hydrocarbon drilling is the use of drilling fluids, which remove drill cuttings and stabilize the wellbore to provide better filtration. To this end, several additives are used in drilling fluids to provide satisfactory rheological and filtration properties. However, commonly used additives are environmentally hazardous: when drilling fluids are disposed of after drilling operations, they are discarded together with the drill cuttings and additives into water sources, causing unwanted pollution. Therefore, these additives should be replaced with environmentally friendly additives that provide superior performance. In this regard, biodegradable additives are a key direction for future research. This review investigates the role of various bio-wastes as potential additives in water-based drilling fluids. Furthermore, the performance of these waste-derived nanomaterials in rheology and lubricity tests is summarized. Finally, sufficient rheological and filtration examinations were carried out on water-based drilling fluids to evaluate the effect of wastes as additives on the performance of drilling fluids.
