Putting the horse before the cart: rapid access to data banks by the ‘SIGNPOSTS’* method

1992 ◽  
Vol 18 (1) ◽  
pp. 3-9
Author(s):  
Audrey M. Adams

Document Image Processing (DIP) provides efficient computer storage of information. Efficient retrieval of information is equally important and requires good indexing. This is a semantics problem that only human indexers can solve. Data banks, even more than books, require good indexers to provide efficient access to information. ‘SIGNPOSTS’ provides a matrix for establishing a fully articulated index for databases.

1990 ◽  
Author(s):  
Jerry P. Skelton ◽  
Anthony P. Cavallo ◽  
Julie Peternick

2021 ◽  
Author(s):  
Ushasi Chaudhuri

Rough set theory is a well-studied subject with a theoretical foundation and many applications. However, its use in image processing has been very sparse. Most of the well-known algorithms for document image processing tasks such as character recognition, character spotting, and logo retrieval resort to supervised classification, which slows the system down as the diversity of the documents increases and requires a large training dataset. Hence, with the aim of avoiding the tediousness and pitfalls of training without compromising efficiency, we introduce a rough-set-theoretic model. It is designed to perform unsupervised classification of optical characters and logos with a small subset of attributes, called the semi-reduct. The semi-reduct attributes are mostly geometric and topological in nature, each having a small range of discrete values estimated from different combinatorial characteristics of rough-set approximations. This eventually leads to quick and easy discernibility of almost all the characters and logos. In this thesis, we first explain the basics of rough set theory. Subsequently, we propose various attributes that can be easily computed from the binary representation of the images. In subsequent chapters we show how one can select an appropriate subset of such attributes, known as a semi-reduct, to perform a document processing task. We demonstrate that, using the above attributes, one can design a character recognition system that is efficient in both computation and storage. Using a different semi-reduct, we show that one can also solve the very delicate task of character spotting in ancient inscriptions. Additionally, we propose appropriate pre-processing steps to binarize old and dilapidated inscriptions. Finally, we propose a novel technique for logo retrieval using a suitably prepared semi-reduct. Comparison with other existing techniques substantiates our claim that attributes derived from rough sets are indeed good candidates for document image processing.


2008 ◽  
Author(s):  
Tijn van der Zant ◽  
Lambert Schomaker ◽  
Edwin Valentijn

2015 ◽  
Vol 15 (01) ◽  
pp. 1550005 ◽  
Author(s):  
Robert Keefer ◽  
Nikolaos Bourbakis

This paper offers a review of the state-of-the-art document image processing methods and their classification by identifying new trends for automatic document processing and understanding. Document image processing (DIP) is an important problem related to most of the challenges coming from the image processing field, with applications to digital document summarization, readers for the visually impaired, etc. Difficulties in the processing of documents can arise from lighting conditions, page curl, page rotation in 3D, and page layout segmentation. Document image processing is usually performed in the context of higher-level applications that require an undistorted document image, such as optical character recognition and document restoration/preservation. Typically, assumptions are made to constrain the processing problem in the context of a particular application. In this survey, we categorize document image processing methods on the basis of the technique, provide detailed descriptions of representative methods in each category, and examine their pros and cons. It is important to note here that the DIP field is broad; thus we try to provide a top–down/horizontal survey rather than a bottom–up one. At the same time, we target the area of document readers for the blind, and use this application to guide us in a top–down survey of DIP. Moreover, we present a comparative survey based on important aspects of a marketable system that is dependent on document image processing techniques.

