A Document Image Preprocessing System for Keyword Spotting

Author(s):  
C. B. Jeong ◽  
S. H. Kim
2000 ◽  
Vol 80 (1) ◽  
pp. 45-55 ◽  
Author(s):  
Win-Long Lee ◽  
Kuo-Chin Fan

Author(s):  
Cem Ergün ◽  
Sajedeh Norozpour

In this paper, a new representation of Farsi words is proposed to address the keyword spotting problem in Farsi document image retrieval. In this regard, we define a signature for each Farsi word based on the layout of its connected components. This signature is represented as a set of boxes; then, by drawing vertical and horizontal lines, we construct a grid over each word to provide a new descriptor. One advantage of this method is that it can be used for both handwritten and machine-printed texts. Finally, to evaluate the performance of our system against other methods, a database containing 19,582 printed Farsi words is examined; after applying this approach, a recall rate of 98.1% and a precision rate of 94.3% are obtained.
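The grid-signature idea above can be illustrated with a small sketch: label the connected components of a binarised word image, take their bounding boxes, and mark which cells of an overlaid grid each box covers. This is a minimal illustration under assumptions, not the authors' implementation; the grid size (`rows`, `cols`) and the pure-Python component labelling are placeholders for the example.

```python
from collections import deque

def connected_components(img):
    """Label 8-connected foreground pixels; return component bounding boxes
    as (x0, y0, x1, y1) tuples. img is a 2-D list of 0/1 values."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                q = deque([(y, x)])
                seen[y][x] = True
                y0 = y1 = y
                x0 = x1 = x
                while q:
                    cy, cx = q.popleft()
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

def grid_signature(img, rows=3, cols=8):
    """Overlay a rows x cols grid on the word image and set a cell to 1
    if any component bounding box overlaps it (the word's 'signature')."""
    h, w = len(img), len(img[0])
    sig = [[0] * cols for _ in range(rows)]
    for (x0, y0, x1, y1) in connected_components(img):
        for r in range(rows):
            for c in range(cols):
                cy0, cy1 = r * h / rows, (r + 1) * h / rows
                cx0, cx1 = c * w / cols, (c + 1) * w / cols
                if x0 < cx1 and x1 + 1 > cx0 and y0 < cy1 and y1 + 1 > cy0:
                    sig[r][c] = 1
    return sig
```

Matching two words then reduces to comparing their binary grids, which is what makes the descriptor applicable to both handwritten and machine-printed text.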


2018 ◽  
Vol 29 (1) ◽  
pp. 719-735 ◽  
Author(s):  
Samir Malakar ◽  
Manosij Ghosh ◽  
Ram Sarkar ◽  
Mita Nasipuri

Abstract Word searching, or keyword spotting, is an important research problem in the domain of document image processing. The problem is more challenging for handwritten documents than for printed ones. In this work, a two-stage word searching schema is introduced. In the first stage, all words irrelevant to a given search word are filtered out of the document page image. This is carried out using a zonal feature vector, called the pre-selection feature vector, along with a rule-based binary classification method. In the next step, a holistic word recognition paradigm is used to confirm a pre-selected word as the search word. To accomplish this, a modified histogram of oriented gradients (HOG)-based feature descriptor is combined with a topological feature vector. The method is evaluated on the QUWI English database, which is freely available through the International Conference on Document Analysis and Recognition 2015 competition entitled “Writer Identification and Gender Classification.” This technique not only provides good retrieval performance in terms of recall, precision, and F-measure scores, but also outperforms some state-of-the-art methods.
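The two-stage schema can be sketched roughly as follows: a zonal ink-density vector stands in for the pre-selection feature vector, a simple per-zone tolerance rule plays the role of the rule-based binary classifier, and cosine similarity stands in for the holistic HOG-plus-topological matching. The zone grid, tolerance value, and similarity measure here are illustrative assumptions, not the paper's actual descriptor.

```python
import math

def zonal_density(img, rows=2, cols=4):
    """Stage-1 feature sketch: foreground-pixel density per zone of a
    rows x cols partition of the binary word image."""
    h, w = len(img), len(img[0])
    feat = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            area = max(1, (y1 - y0) * (x1 - x0))
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feat.append(ink / area)
    return feat

def rule_based_preselect(query_feat, word_feat, tol=0.25):
    """Stage 1: keep a candidate word only if every zone density lies
    within a fixed tolerance of the query's density (a hypothetical rule)."""
    return all(abs(q - w) <= tol for q, w in zip(query_feat, word_feat))

def cosine_similarity(a, b):
    """Stage 2 sketch: holistic match score on the combined descriptor."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

The cheap rule-based stage prunes most of the page before the more expensive holistic comparison runs, which is the point of the two-stage design.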


Sensors ◽  
2021 ◽  
Vol 21 (14) ◽  
pp. 4648
Author(s):  
Subhranil Kundu ◽  
Samir Malakar ◽  
Zong Woo Geem ◽  
Yoon Young Moon ◽  
Pawan Kumar Singh ◽  
...  

Handwritten keyword spotting (KWS) is of great interest to the document image research community. In this work, we propose a learning-free keyword spotting method following the query-by-example (QBE) setting for handwritten documents. It consists of four key processes: pre-processing, vertical zone division, feature extraction, and feature matching. The pre-processing step deals with the noise found in the word images and with the skew of the handwriting caused by individuals' varied writing styles. Next, vertical zone division splits the word image into several zones, where the number of zones is guided by the number of letters in the query word image. During experimentation, this information (i.e., the number of letters in the query word) is obtained from the text encoding of the query word image, which the user provides to the system. The feature extraction process involves the use of the Hough transform. The last step is feature matching, which first compares the features extracted from the word images and then generates a similarity score. The performance of this algorithm has been tested on three publicly available datasets: IAM, QUWI, and ICDAR KWS 2015. The proposed method outperforms the state-of-the-art learning-free KWS methods considered here for comparison when evaluated on these datasets. We also evaluate the present KWS model using state-of-the-art deep features, and find that the features used in the present work perform better than deep features extracted using the InceptionV3, VGG19, and DenseNet121 models.
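The zone-division, Hough-based feature extraction, and matching steps might be sketched as below. The angle quantisation (`n_theta`), the per-angle-maximum accumulator descriptor, and the negative-Euclidean match score are assumptions for illustration, not the authors' exact formulation.

```python
import math

def split_zones(img, n_zones):
    """Vertical zone division: cut the binary word image into n_zones
    vertical strips (n_zones is guided by the query's letter count)."""
    w = len(img[0])
    bounds = [round(i * w / n_zones) for i in range(n_zones + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in img]
            for i in range(n_zones)]

def hough_angle_histogram(zone, n_theta=8):
    """Feature sketch: vote each foreground pixel into a quantised
    (theta, rho) Hough accumulator and keep the per-angle maximum vote
    as the zone descriptor."""
    h, w = len(zone), len(zone[0])
    diag = int(math.hypot(h, w)) + 1
    acc = [[0] * (2 * diag) for _ in range(n_theta)]
    for y in range(h):
        for x in range(w):
            if zone[y][x]:
                for t in range(n_theta):
                    theta = t * math.pi / n_theta
                    rho = int(x * math.cos(theta) + y * math.sin(theta)) + diag
                    acc[t][rho] += 1
    return [max(row) for row in acc]

def match_score(feat_a, feat_b):
    """Feature matching sketch: negative Euclidean distance, so larger
    (closer to zero) means more similar."""
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(feat_a, feat_b)))
```

A query is then compared zone by zone against each candidate word and the zone scores combined, which keeps the whole pipeline learning-free.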

