Using stochastic syntactic analysis for extracting a logical structure from a document image

This chapter introduces a data mining method for discovering association rules from images of scanned paper documents. It argues that a document image is a multi-modal unit of analysis whose semantics is deduced from a combination of its textual content, its layout structure, and its logical structure. It therefore proposes a method in which three kinds of information are considered simultaneously to generate interesting patterns: the spatial information derived from a complex document image analysis process (layout analysis), the information extracted from the logical structure of the document (document image classification and understanding), and the textual information extracted by means of OCR. The proposed method is based on an inductive logic programming approach, which is argued to be the most appropriate for analyzing data available in more than one modality. The chapter illustrates a possible evolution of the unimodal knowledge discovery scheme, in which the different types of data describing the units of analysis are handled by a preprocessing technique that transforms them into a single double-entry table.

Global-local-global method for logical structure extraction of form document image

Journal of Electronic Imaging ◽

10.1117/1.482746 ◽

2000 ◽

Vol 9 (3) ◽

pp. 296 ◽

Cited By ~ 1

Author(s):

Hong Zhao

Keyword(s):

Logical Structure ◽

Document Image ◽

Global Method ◽

Structure Extraction

New method for logical structure extraction of form document image

10.1117/12.335816 ◽

1999 ◽

Cited By ~ 2

Author(s):

Bing Liu ◽

Zao Jiang ◽

Hong Zhao ◽

Tobias Ostgathe

Keyword(s):

Logical Structure ◽

Document Image ◽

New Method ◽

Structure Extraction

Evidence for syntactic analysis of visual patterns

PsycEXTRA Dataset ◽

10.1037/e665402011-303 ◽

1991 ◽

Author(s):

Richard Chechile ◽

Jane Anderson

Keyword(s):

Syntactic Analysis ◽

Visual Patterns

THE TECHNIQUE OF EXTRACTION TEXT AREAS ON SCANNED DOCUMENT IMAGE USING LINEAR FILTRATION

Applied Aspects of Information Technology ◽

10.15276/aait.03.2019.3 ◽

2019 ◽

Vol 2 (3) ◽

pp. 206-215

Author(s):

Alesya Ishchenko ◽

Alexandr Nesteryuk ◽

Marina Polyakova

Keyword(s):

Document Image ◽

Linear Filtration

Simple and Efficient Document Image Binarization Technique For Degraded Document Images

International Journal of Scientific Research ◽

10.15373/22778179/may2014/65 ◽

2012 ◽

Vol 3 (5) ◽

pp. 217-220

Author(s):

Manju Joseph ◽

Jijina K.P

Keyword(s):

Document Image ◽

Document Images ◽

Image Binarization ◽

Document Image Binarization ◽

Degraded Document

Generalized Convolutions of Functions and their Application to Syntactic Analysis of Distorted Sequences

Journal of Automation and Information Sciences ◽

10.1615/jautomatinfscien.v28.i1-2.80 ◽

1996 ◽

Vol 28 (1-2) ◽

pp. 70-84

Author(s):

M. I. Shlezinger

Keyword(s):

Syntactic Analysis

Document Image Quality Assessment with Relaying Reference to Determine Minimum Readable Resolution for Compression

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.9.iqsp-323 ◽

2020 ◽

Vol 2020 (9) ◽

pp. 323-1-323-8

Author(s):

Litao Hu ◽

Zhenhua Hu ◽

Peter Bauer ◽

Todd J. Harris ◽

Jan P. Allebach

Keyword(s):

Image Quality ◽

Quality Assessment ◽

Image Quality Assessment ◽

Research Area ◽

Input Image ◽

Quality Score ◽

Document Image ◽

Digital Cameras ◽

Active Research ◽

Traditional Approaches

Image quality assessment has been a very active research area in the field of image processing, and there have been numerous methods proposed. However, most of the existing methods focus on digital images that only or mainly contain pictures or photos taken by digital cameras. Traditional approaches evaluate an input image as a whole and try to estimate a quality score for the image, in order to give viewers an idea of how “good” the image looks. In this paper, we mainly focus on the quality evaluation of contents of symbols like texts, bar-codes, QR-codes, lines, and hand-writings in target images. Estimating a quality score for this kind of information can be based on whether or not it is readable by a human, or recognizable by a decoder. Moreover, we mainly study the viewing quality of the scanned document of a printed image. For this purpose, we propose a novel image quality assessment algorithm that is able to determine the readability of a scanned document or regions in a scanned document. Experimental results on some testing images demonstrate the effectiveness of our method.

An Adaptive Binarization Method for Cost-efficient Document Image System in Wavelet Domain

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2020.64.3.030401 ◽

2020 ◽

Vol 64 (3) ◽

pp. 30401-1-30401-14 ◽

Cited By ~ 1

Author(s):

Chih-Hsien Hsia ◽

Ting-Yu Lin ◽

Jen-Shiun Chiang

Keyword(s):

Wavelet Transform ◽

Discrete Wavelet Transform ◽

Document Image ◽

Background Information ◽

Raspberry Pi ◽

Discrete Wavelet ◽

Image System ◽

Low Frequencies ◽

Cost Efficient ◽

Binarization Method

In recent years, the preservation of handwritten historical documents and scripts as digitized images has received growing attention. However, depending on the thickness of the paper used for printing or writing, the content of the back page is likely to seep through to the front page. To solve this, a cost-efficient document image system is proposed. In this system, the authors use the Adaptive Directional Lifting-Based Discrete Wavelet Transform to transform image data from the spatial domain to the frequency domain, and process the high and low frequencies separately. For low frequencies, the authors use a local threshold to remove most background information. For high frequencies, they use a modified Least Mean Square training algorithm to produce a unique weighted mask and convolve it with the original frequency data. Afterward, the Inverse Adaptive Directional Lifting-Based Discrete Wavelet Transform is performed to reconstruct the four subband images into a resulting image of the original size. Finally, a global binarization method, Otsu’s method, is applied to transform the grayscale image into a binary image as the output. The results show that the difference in operation time between a personal computer (PC) and a Raspberry Pi is small. Therefore, the proposed cost-efficient document image system running on the Raspberry Pi embedded platform has the same performance and obtains the same results as when it is run on a PC.
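The final step named in this abstract, global binarization with Otsu’s method, can be illustrated with a minimal pure-Python sketch. It covers only that last step, not the wavelet or Least Mean Square stages, and it is not the authors' implementation. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be a flat list of 8-bit grayscale values.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    Assumes `pixels` is a flat list of integers in 0..255 (an assumption
    made for this sketch, not something stated in the abstract).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break               # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels):
    """Map each pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Synthetic example: dark ink (10) and bright paper (200).
    sample = [10] * 50 + [200] * 50
    print(otsu_threshold(sample))
    print(binarize(sample)[:3], binarize(sample)[-3:])
```

Because the threshold is chosen once for the whole image, this kind of global method works best when the background has already been flattened, which is consistent with the abstract applying it only after the frequency-domain processing.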

The Logical Structure and Meaning of Romans 1:14-17

Korean Evangelical New Testament Studies ◽

10.24229/kents.2011.10.3.005 ◽

2011 ◽

Vol 10 (3) ◽

pp. 545-570

Author(s):

김현광

Keyword(s):

Logical Structure
