A Document Image Preprocessing System for Keyword Spotting

Author(s):  
C. B. Jeong ◽  
S. H. Kim
2000 ◽  
Vol 80 (1) ◽  
pp. 45-55 ◽  
Author(s):  
Win-Long Lee ◽  
Kuo-Chin Fan

Author(s):  
Cem Ergün ◽  
Sajedeh Norozpour

In this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. In this regard, we define a signature for each Farsi word based on the layout of its connected components. This signature is represented as a set of boxes; then, by drawing vertical and horizontal lines, we construct a grid over each word to provide a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed texts. Finally, to evaluate the performance of our system against other methods, a database containing 19,582 printed Farsi words is examined; after applying this approach, a recall rate of 98.1% and a precision rate of 94.3% are obtained.
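The grid-signature idea above can be illustrated with a small sketch: label the connected components of a binarised word image, take their bounding boxes, and mark which cells of an overlaid grid each box covers. This is a minimal illustration under assumptions, not the authors' implementation; the grid size (`rows`, `cols`) and the pure-Python component labelling are placeholders for the example.

```python
from collections import deque

def connected_components(img):
    """Label 8-connected foreground pixels; return component bounding boxes
    as (x0, y0, x1, y1) tuples. img is a 2-D list of 0/1 values."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                q = deque([(y, x)])
                seen[y][x] = True
                y0 = y1 = y
                x0 = x1 = x
                while q:
                    cy, cx = q.popleft()
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

def grid_signature(img, rows=3, cols=8):
    """Overlay a rows x cols grid on the word image and set a cell to 1
    if any component bounding box overlaps it (the word's 'signature')."""
    h, w = len(img), len(img[0])
    sig = [[0] * cols for _ in range(rows)]
    for (x0, y0, x1, y1) in connected_components(img):
        for r in range(rows):
            for c in range(cols):
                cy0, cy1 = r * h / rows, (r + 1) * h / rows
                cx0, cx1 = c * w / cols, (c + 1) * w / cols
                if x0 < cx1 and x1 + 1 > cx0 and y0 < cy1 and y1 + 1 > cy0:
                    sig[r][c] = 1
    return sig
```

Matching two words then reduces to comparing their binary grids, which is what makes the descriptor applicable to both handwritten and machine-printed text.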


2018 ◽  
Vol 29 (1) ◽  
pp. 719-735 ◽  
Author(s):  
Samir Malakar ◽  
Manosij Ghosh ◽  
Ram Sarkar ◽  
Mita Nasipuri

Abstract Word searching, or keyword spotting, is an important research problem in the domain of document image processing. The problem is more challenging for handwritten documents than for printed ones. In this work, a two-stage word searching schema is introduced. In the first stage, all words irrelevant to a given search word are filtered out of the document page image. This is carried out using a zonal feature vector, called the pre-selection feature vector, along with a rule-based binary classification method. In the next step, a holistic word recognition paradigm is used to confirm a pre-selected word as the search word. To accomplish this, a modified histogram of oriented gradients (HOG)-based feature descriptor is combined with a topological feature vector. The method is evaluated on the QUWI English database, which is freely available through the International Conference on Document Analysis and Recognition 2015 competition entitled “Writer Identification and Gender Classification.” This technique not only provides good retrieval performance in terms of recall, precision, and F-measure scores, but also outperforms some state-of-the-art methods.
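The two-stage schema can be sketched roughly as follows: a zonal ink-density vector stands in for the pre-selection feature vector, a simple per-zone tolerance rule plays the role of the rule-based binary classifier, and cosine similarity stands in for the holistic HOG-plus-topological matching. The zone grid, tolerance value, and similarity measure here are illustrative assumptions, not the paper's actual descriptor.

```python
import math

def zonal_density(img, rows=2, cols=4):
    """Stage-1 feature sketch: foreground-pixel density per zone of a
    rows x cols partition of the binary word image."""
    h, w = len(img), len(img[0])
    feat = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            area = max(1, (y1 - y0) * (x1 - x0))
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feat.append(ink / area)
    return feat

def rule_based_preselect(query_feat, word_feat, tol=0.25):
    """Stage 1: keep a candidate word only if every zone density lies
    within a fixed tolerance of the query's density (a hypothetical rule)."""
    return all(abs(q - w) <= tol for q, w in zip(query_feat, word_feat))

def cosine_similarity(a, b):
    """Stage 2 sketch: holistic match score on the combined descriptor."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

The cheap rule-based stage prunes most of the page before the more expensive holistic comparison runs, which is the point of the two-stage design.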


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4648
Author(s):  
Subhranil Kundu ◽  
Samir Malakar ◽  
Zong Woo Geem ◽  
Yoon Young Moon ◽  
Pawan Kumar Singh ◽  
...  

Handwritten keyword spotting (KWS) is of great interest to the document image research community. In this work, we propose a learning-free keyword spotting method following the query-by-example (QBE) setting for handwritten documents. It consists of four key processes: pre-processing, vertical zone division, feature extraction, and feature matching. The pre-processing step deals with the noise found in the word images and with the skew of the handwriting caused by individuals' varied writing styles. Next, vertical zone division splits the word image into several zones, where the number of zones is guided by the number of letters in the query word image. During experimentation, this information (i.e., the number of letters in the query word) is obtained from the text encoding of the query word image, which the user provides to the system. The feature extraction process involves the use of the Hough transform. The last step is feature matching, which first compares the features extracted from the word images and then generates a similarity score. The performance of this algorithm has been tested on three publicly available datasets: IAM, QUWI, and ICDAR KWS 2015. The proposed method outperforms the state-of-the-art learning-free KWS methods considered here for comparison when evaluated on these datasets. We also evaluate the present KWS model using state-of-the-art deep features, and find that the features used in the present work perform better than deep features extracted using the InceptionV3, VGG19, and DenseNet121 models.
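The zone-division, Hough-based feature extraction, and matching steps might be sketched as below. The angle quantisation (`n_theta`), the per-angle-maximum accumulator descriptor, and the negative-Euclidean match score are assumptions for illustration, not the authors' exact formulation.

```python
import math

def split_zones(img, n_zones):
    """Vertical zone division: cut the binary word image into n_zones
    vertical strips (n_zones is guided by the query's letter count)."""
    w = len(img[0])
    bounds = [round(i * w / n_zones) for i in range(n_zones + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in img]
            for i in range(n_zones)]

def hough_angle_histogram(zone, n_theta=8):
    """Feature sketch: vote each foreground pixel into a quantised
    (theta, rho) Hough accumulator and keep the per-angle maximum vote
    as the zone descriptor."""
    h, w = len(zone), len(zone[0])
    diag = int(math.hypot(h, w)) + 1
    acc = [[0] * (2 * diag) for _ in range(n_theta)]
    for y in range(h):
        for x in range(w):
            if zone[y][x]:
                for t in range(n_theta):
                    theta = t * math.pi / n_theta
                    rho = int(x * math.cos(theta) + y * math.sin(theta)) + diag
                    acc[t][rho] += 1
    return [max(row) for row in acc]

def match_score(feat_a, feat_b):
    """Feature matching sketch: negative Euclidean distance, so larger
    (closer to zero) means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
```

A query is then compared zone by zone against each candidate word and the zone scores combined, which keeps the whole pipeline learning-free.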

