document image processing
Recently Published Documents


Total documents: 55 (five years: 5)
H-index: 6 (five years: 1)

2021 ◽  
Author(s):  
Ushasi Chaudhuri

Rough set theory is a well-studied subject with a solid theoretical foundation and many applications, yet it has seen very little use in image processing. Most well-known algorithms for document image processing tasks such as character recognition, character spotting, and logo retrieval rely on supervised classification, which slows the system down as the diversity of documents increases and requires a large training dataset. To avoid the tedium and pitfalls of training without compromising efficiency, we introduce a rough-set-theoretic model. It is designed to perform unsupervised classification of optical characters and logos using a small subset of attributes, called the semi-reduct. The semi-reduct attributes are mostly geometric and topological in nature, each taking a small range of discrete values estimated from different combinatorial characteristics of rough-set approximations. This leads to quick and easy discernibility of almost all the characters and logos. In this thesis, we first explain the basics of rough set theory. We then propose various attributes that can be easily computed from the binary representation of the images. In subsequent chapters, we show how to select an appropriate subset of such attributes, known as a semi-reduct, to perform a document processing task. We demonstrate that, using these attributes, one can design a character recognition system that is efficient in both computation and storage. Using a different semi-reduct, we show that one can also solve the very delicate task of character spotting in ancient inscriptions. Additionally, we propose appropriate pre-processing steps to binarize old and dilapidated inscriptions. Finally, we propose a novel technique for logo retrieval using a suitably prepared semi-reduct. Comparison with existing techniques substantiates our claim that attributes derived from rough sets are indeed good candidates for document image processing.
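As a rough illustration of the rough-set notions the abstract relies on, the following sketch computes lower and upper approximations of a set of glyphs and checks whether a chosen attribute subset discerns every glyph. The attribute names (`holes`, `endpoints`), their values, and the toy table are hypothetical and are not taken from the thesis; the actual semi-reduct attributes are not listed here.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object names whose values agree on every chosen attribute."""
    classes = defaultdict(set)
    for name, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough-set approximations of `target`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


def discerns_all(table, attributes):
    """True if the attribute subset separates every object from the others."""
    return all(len(b) == 1 for b in indiscernibility_classes(table, attributes))


# Hypothetical toy table: discrete topological attributes per glyph.
glyphs = {
    "O": {"holes": 1, "endpoints": 0},
    "D": {"holes": 1, "endpoints": 0},
    "B": {"holes": 2, "endpoints": 0},
    "C": {"holes": 0, "endpoints": 2},
    "T": {"holes": 0, "endpoints": 3},
}

lower, upper = approximations(glyphs, ["holes"], {"O", "B"})
print(sorted(lower), sorted(upper))
print(discerns_all(glyphs, ["holes", "endpoints"]))
```

In this toy table, "O" and "D" share the same attribute values, so the two attributes alone do not discern them; a further attribute would be needed before the subset could serve as a discerning set in the sense the abstract describes.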


2020 ◽  
Author(s):  
Álysson De Sá Soares ◽  
Ricardo Batista Das Neves Junior ◽  
Byron Leite Dantas Bezerra

The digital relationship between companies and customers takes place through online systems where consumers must upload pictures of their identification documents to prove their identities. The resulting large volume of document images encourages research into image processing systems that automate tasks usually performed by humans, such as Document Type Classification and Document Reading. The lack of public datasets of identification documents delays research in document image processing, because researchers must either seek partnerships with private or governmental institutions to obtain the data or build their own datasets. In this context, the main contributions of this work are a system to support the automatic creation of public identification document datasets and the Brazilian Identity Document Dataset (BID Dataset), the first public dataset of Brazilian identification documents. To comply with current personal data privacy law, all information in the BID Dataset comes from fake data. This work aims to accelerate research in identification document image processing, since researchers will be able to use the BID Dataset freely in their own work.


2019 ◽  
Vol 90 ◽  
pp. 12-22 ◽  
Author(s):  
Rizwan Qureshi ◽  
Muhammad Uzair ◽  
Khurram Khurshid ◽  
Hong Yan

2019 ◽  
Author(s):  
Ilia V. Safonov ◽  
Ilya V. Kurilin ◽  
Michael N. Rychagov ◽  
Ekaterina V. Tolstaya

2018 ◽  
Vol 4 (7) ◽  
pp. 84 ◽  
Author(s):  
Laurence Likforman-Sulem ◽  
Ergina Kavallieratou

Author(s):  
Yann Leydier ◽  
Jean Duong ◽  
Stephane Bres ◽  
Veronique Eglin ◽  
Frank Lebourgeois ◽  
...  
