Holistic Word Descriptor for Lexicon Reduction in Handwritten Arabic Documents

2021 ◽  
pp. 108072
Author(s):  
Said Elaiwat
Author(s):  
SAEED MOZAFFARI ◽  
KARIM FAEZ ◽  
VOLKER MÄRGNER ◽  
HAIKAL EL ABED

Given a large number of words to recognize, a two-stage strategy that eliminates unlikely candidates before recognition is a reasonable and effective way to increase recognition speed. This paper proposes a holistic lexicon reduction technique for offline handwritten Arabic word recognition. The technique extracts dots and subwords from the cursive Arabic word image to describe its shape. In the first stage of reduction, the number of subwords in the input word is estimated. In the second stage, a word descriptor based on the dot information is applied only to the candidates selected in the first stage. Experimental results on the IFN/ENIT database, consisting of 26,459 cursive Arabic word images, show a lexicon reduction of 92.5% with an accuracy of 74%.
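The two-stage filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `LexiconEntry` fields, the dot descriptor (counts of dots above and below the baseline), the distance measure, the thresholds, and the toy lexicon are all assumptions made for illustration, and the image-processing step that would estimate these values from a word image is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    # Hypothetical precomputed shape description of a lexicon word.
    word: str
    subwords: int    # number of subwords (connected pieces) in the word
    dots_above: int  # assumed descriptor: dots above the baseline
    dots_below: int  # assumed descriptor: dots below the baseline


def stage_one(lexicon, est_subwords, tolerance=0):
    """Keep entries whose subword count is within `tolerance` of the estimate."""
    return [e for e in lexicon if abs(e.subwords - est_subwords) <= tolerance]


def stage_two(candidates, dots_above, dots_below, max_distance=0):
    """Keep stage-one candidates whose dot descriptor is close to the input's."""
    return [
        e for e in candidates
        if abs(e.dots_above - dots_above) + abs(e.dots_below - dots_below) <= max_distance
    ]


def reduce_lexicon(lexicon, est_subwords, dots_above, dots_below,
                   tolerance=0, max_distance=0):
    """Apply stage two only to the candidates that survive stage one."""
    survivors = stage_one(lexicon, est_subwords, tolerance)
    return stage_two(survivors, dots_above, dots_below, max_distance)


def reduction_rate(original_size, reduced_size):
    """Fraction of the lexicon removed by the reduction."""
    return 1.0 - reduced_size / original_size


# Toy lexicon with placeholder words; the values are invented.
toy_lexicon = [
    LexiconEntry("w1", subwords=2, dots_above=1, dots_below=0),
    LexiconEntry("w2", subwords=2, dots_above=0, dots_below=2),
    LexiconEntry("w3", subwords=3, dots_above=1, dots_below=0),
    LexiconEntry("w4", subwords=2, dots_above=1, dots_below=0),
    LexiconEntry("w5", subwords=1, dots_above=0, dots_below=0),
]

# Input word estimated to have 2 subwords, 1 dot above, 0 dots below.
reduced = reduce_lexicon(toy_lexicon, est_subwords=2, dots_above=1, dots_below=0)
print([e.word for e in reduced])
print(reduction_rate(len(toy_lexicon), len(reduced)))
```

Loosening `tolerance` or `max_distance` in this sketch keeps more candidates: the reduction rate drops, while the chance that the correct word survives the filter rises. This is the trade-off that the reported reduction and accuracy figures reflect.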


2020 ◽  
Vol 17 (3) ◽  
pp. 299-305 ◽  
Author(s):  
Riaz Ahmad ◽  
Saeeda Naz ◽  
Muhammad Afzal ◽  
Sheikh Rashid ◽  
Marcus Liwicki ◽  
...  

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. The paper makes three main contributions: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step prunes extra white space and de-skews skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Data augmentation combined with the deep learning approach improves the results, reaching 80.02% Character Recognition (CR) against a 75.08% baseline.
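The white-space pruning part of the pre-processing step can be sketched as follows. This is a minimal illustration only, not the procedure used in the paper: it assumes a binarized text-line stored as a list of rows in which `0` is background and any other value is ink, and the function name `prune_white_margins` is hypothetical. De-skewing is not covered.

```python
def prune_white_margins(image, background=0):
    """Crop a binarized text-line to the bounding box of its ink pixels.

    `image` is a list of equal-length rows; pixels equal to `background`
    are treated as white space. Returns an empty list for a blank image.
    """
    # Rows and columns that contain at least one ink pixel.
    ink_rows = [r for r, row in enumerate(image)
                if any(p != background for p in row)]
    if not ink_rows:
        return []
    ink_cols = [c for c in range(len(image[0]))
                if any(row[c] != background for row in image)]

    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x5 text-line with ink at (row 1, col 1) and (row 2, col 3).
line = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = prune_white_margins(line)
print(cropped)
```

Only the outer margins are removed in this sketch; white space between strokes inside the bounding box is kept, so the relative positions of dots and diacritics are unchanged.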

