Moment invariant-based features for Jawi character recognition

Author(s):  
Fitri Arnia ◽  
Khairun Saddami ◽  
Khairul Munadi

Ancient manuscripts written in Malay-Arabic characters, known as "Jawi" characters, are found mostly in the Malay world. Many of these manuscripts have now been digitized. Unlike for Roman letters, there is no optical character recognition (OCR) software for Jawi characters. This article proposes a new algorithm for Jawi character recognition, called the tree root (TR) algorithm, which uses Hu’s moments as invariant features. The TR algorithm allows every Jawi character to have a unique combination of moment values. Seven Hu’s moment values are calculated for all Jawi characters, which consist of 36 isolated, 27 initial, 27 middle, and 35 end characters, for a total of 125 characters. The TR algorithm was then applied to recognize these characters. To assess the TR algorithm, five characters that had been rotated by 90° and 180° and scaled by factors of 0.5 and 2 were used. Overall, the recognition rate of the TR algorithm was 90.4%: 113 out of 125 characters have a unique combination of moment values, while testing on rotated and scaled characters achieved an 82.14% recognition rate. The proposed method outperformed the Support Vector Machine and Euclidean distance classifiers.
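As a rough illustration of the invariant features mentioned above, the following sketch computes the seven Hu moment invariants of a small binary image and checks that they are unchanged by 90° and 180° rotations. The abstract does not describe how the TR algorithm combines the seven values, so that step is not shown; the `glyph` array is a made-up shape, not a Jawi character, and the function names are hypothetical.

```python
import math

def hu_moments(img):
    """Return the seven Hu moment invariants of a binary image (list of rows)."""
    pixels = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = len(pixels)                      # area of the shape
    cx = sum(x for x, _ in pixels) / m00   # centroid
    cy = sum(y for _, y in pixels) / m00

    def eta(p, q):
        # Normalised central moment: mu_pq / mu_00^(1 + (p + q) / 2)
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]

def rotate90(img):
    """Rotate by 90 degrees; this only permutes pixels, so nothing is resampled."""
    return [list(row) for row in zip(*img[::-1])]

# Hypothetical asymmetric test shape (an "L"), standing in for a character image.
glyph = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]

original = hu_moments(glyph)
rotated_90 = hu_moments(rotate90(glyph))
rotated_180 = hu_moments(rotate90(rotate90(glyph)))
```

On a pixel grid, rotations by multiples of 90° preserve the values up to floating-point rounding, whereas scaling by 0.5 or 2 requires resampling, so the moments of scaled characters typically agree only approximately.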

Author(s):  
Mohammed Erritali ◽  
Youssef Chouni ◽  
Youssef Ouadid

The main difficulty in developing a successful optical character recognition (OCR) system lies in the confusion between characters. In the case of Amazigh writing (the Tifinagh alphabet), some characters resemble one another up to rotation or scale. Most researchers have attempted to solve this problem by combining multiple descriptors and/or classifiers, which increases the recognition rate but at the expense of processing time, which becomes prohibitive. Thus, reducing both character confusion and recognition time is the major challenge for OCR systems. In this chapter, the authors present an off-line OCR system for Tifinagh characters.


Handwritten character recognition (HCR) is mainly a form of optical character recognition. However, HCR also involves formatting and segmentation of the input. HCR is still an active area of research because of the many variations in writing style, shape, and size among individuals. The main difficulty in recognizing Indian handwriting is the overlap between characters. Characters with overlapping shapes are difficult to recognize, which may lead to a low recognition rate. These factors also increase the complexity of handwritten character recognition. This paper proposes a new approach to identifying handwritten characters of the Telugu language using Deep Learning (DL). The proposed work can enhance the recognition rate of individual characters. The proposed approach achieves an overall recognition accuracy of 94%.


Author(s):  
Binod Kumar Prasad

Purpose of the study: The purpose of this work is to present an offline Optical Character Recognition system that recognises handwritten English numerals to help automate document reading. It avoids the tedious and time-consuming manual typing otherwise needed to key important information into a computer system for long-term preservation. Methodology: This work applies Curvature Features of English numeral images by encoding them in terms of distance and slope. The finer local details of the images are extracted using Zonal features. The feature vectors obtained by combining these features are fed to a KNN classifier. The whole work was executed using the MATLAB Image Processing Toolbox. Main Findings: The system produces an average recognition rate of 96.67% with K=1 and 97% with K=3, with corresponding errors of 3.33% and 3% respectively. Of the ten numerals, some, such as ‘3’ and ‘8’, show relatively lower recognition rates because of the similarity between their structures. Applications of this study: The proposed work concerns the recognition of English numerals. The model can be used widely for recognition of other patterns, such as signature verification, face recognition, and character or word recognition in another language under Natural Language Processing. Novelty/Originality of this study: The novelty of the work lies in the feature extraction process. Curves present in the structure of a numeral sample are encoded based on distance and slope, giving Distance features and Slope features. Vertical Delta Distance Coding (VDDC) and Horizontal Delta Distance Coding (HDDC) encode a curve from the vertical and horizontal directions to reveal concavity and convexity from different angles.
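To make the role of K in the KNN classifier concrete, here is a minimal sketch of k-nearest-neighbour voting in Python rather than MATLAB. The two-dimensional feature vectors and their labels are invented for illustration; they are not the curvature or zonal features of the study, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Label `query` by majority vote among its k nearest training vectors."""
    # Sort training samples by Euclidean distance to the query and keep k of them.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the label seen first (i.e. the nearer sample) wins.
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors for two easily confused numerals.
train = [
    ((0.0, 0.0), "3"),
    ((0.1, 0.2), "3"),
    ((0.2, 0.1), "3"),
    ((0.3, 0.3), "8"),   # an '8' lying close to the cluster of '3' samples
    ((1.0, 1.0), "8"),
    ((0.9, 1.1), "8"),
]

query = (0.32, 0.30)
label_k1 = knn_predict(train, query, k=1)  # follows the single closest sample
label_k3 = knn_predict(train, query, k=3)  # majority of the three closest samples
```

In this toy data the single nearest neighbour is the stray '8', while two of the three nearest neighbours are '3', so the two settings of K disagree. This shows how a larger K can smooth over isolated samples, although it does not by itself explain the rates reported in the study.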


Abstract: This paper presents camera-based assistive reading of text from product labels on objects to support visually impaired people. The camera serves as the main source of input. To identify an object, the user moves it in front of the camera, and the moving object is detected by the Background Subtraction (BGS) method. The text region is automatically localized as the Region of Interest (ROI). Text is extracted from the ROI by combining rule-based and learning-based methods. A novel rule-based text localization algorithm identifies geometric features such as pixel value, color intensity, and character size; features such as gradient magnitude, gradient width, and stroke width are learned using an SVM classifier, and a model is built to separate text from non-text regions. The system is integrated with OCR (Optical Character Recognition) to extract the text, and the extracted text is given to the user as voice output. The system is evaluated on the ICDAR-2011 dataset, which consists of 509 natural scene images with ground truth.
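The background subtraction step can be sketched as simple differencing against a reference frame. The abstract does not say which BGS variant is used, so the fixed threshold, the tiny grey-level frames, and the helper names below are all assumptions. The bounding box here encloses the moving object only and is not the paper's text localization.

```python
def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose grey level differs from the background by more than threshold."""
    return [
        [int(abs(f - b) > threshold) for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None if empty."""
    coords = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)

# Hypothetical 4x5 grey-level frames: a static background and a frame with an object.
background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = 200
frame[2][2] = 210
frame[2][3] = 190
frame[0][0] = 25   # small sensor noise, below the threshold

mask = foreground_mask(background, frame)
roi = bounding_box(mask)
```

The threshold trades sensitivity against noise: the pixel that changed by 15 grey levels is ignored, while the three object pixels are kept and enclosed by the box.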


Author(s):  
Yasir Babiker Hamdan ◽  
Sathish

Many applications of handwritten character recognition (HCR) still exist. One is reading postal addresses, which may be written in different languages across the states of a union such as India. Reading bank check amounts and verifying signatures are important applications of HCR in automated banking systems in developed countries. Optical character recognition of documents is compared with the reading of handwritten documents by a human. OCR is used to convert characters from various file types, such as image and word-processing document files. The main aim of this research article is to provide a solution for various handwriting recognition inputs, such as touch input from a mobile screen and picture files. The recognition approaches use methods chosen from artificial neural networks, statistical methods, and others, and address nonlinearly separable problems. This research article compares various approaches to recognizing handwritten characters in image documents. In addition, the paper compares a statistical support vector machine (SVM) classifier with statistical, template matching, structural pattern recognition, and graphical methods. The statistical SVM, configured with a machine learning approach, is shown to give good OCR performance. Its recognition rate is higher than that of the other methods mentioned in this research article. The proposed model was tested on a training set containing various stylized letters and digits so that it learns with higher accuracy. We obtained a test accuracy of 91% in recognizing characters from documents. Finally, we discuss several future tasks for this research.


Author(s):  
Soumya De ◽  
R. Joe Stanley ◽  
Beibei Cheng ◽  
Sameer Antani ◽  
Rodney Long ◽  
...  

Images in biomedical publications often convey important information related to an article's content. When referenced properly, these images aid in clinical decision support. Annotations such as text labels and symbols, as provided by medical experts, are used to highlight regions of interest within the images. These annotations, if extracted automatically, could be used in conjunction with either the image caption text or the image citations (mentions) in the articles to improve biomedical information retrieval. In the current study, automatic detection and recognition of text labels in biomedical publication images was investigated. This paper presents both image analysis and feature-based approaches to extract and recognize specific regions of interest (text labels) within images in biomedical publications. Experiments were performed on 6515 characters extracted from text labels present in 200 biomedical publication images. These images are part of the data set from ImageCLEF 2010. Automated character recognition experiments were conducted using geometry-, region-, exemplar-, and profile-based correlation features and Fourier descriptors extracted from the characters. Correct recognition as high as 92.67% was obtained with a support vector machine classifier, compared to a 75.90% correct recognition rate with a benchmark Optical Character Recognition technique.


Author(s):  
Ahmed M. Zeki ◽  
Mohamad S. Zakaria ◽  
Choong-Yeun Liong

The cursive nature of Arabic writing is the main challenge for developers of Arabic Optical Character Recognition. Methods to segment Arabic words into characters have been proposed. This paper provides a comprehensive review of the methods proposed by researchers to segment Arabic characters. The segmentation methods are grouped into nine categories according to the techniques used. The advantages and drawbacks of each are presented and discussed. Most researchers did not report the segmentation accuracy in their research; instead, they reported the overall recognition rate, which does not reflect the influence of each sub-stage on the final recognition rate. The training and testing data were not large enough for the results to generalize. The field of Arabic Character Recognition (ACR) needs a standard set of test documents in both image and character formats, together with the ground truth and a set of performance evaluation tools, which would enable the performance of different algorithms to be compared. As each method has its strengths, a hybrid segmentation approach is promising. The paper concludes that there is still no perfect segmentation method for ACR and that there is much opportunity for research in this area.


2010 ◽  
Vol 171-172 ◽  
pp. 73-77
Author(s):  
Ying Jie Liu ◽  
Fu Cheng You

Touching or broken characters are difficult to process in practical applications of optical character recognition. This paper puts forward a method for such characters based on the mathematical morphology of binary images. Drawing on the relevant theory of digital image processing, the overall process is introduced, including the separation of touching characters and the connection of broken characters. First, the character image is pre-processed through smoothing and threshold segmentation to generate a binary image of the characters. Then the character regions that are touching or broken are processed with different binary morphological operators using different structuring elements. In this way the touching characters are separated and the broken characters are connected. To achieve a higher recognition rate, further processing is applied to obtain normal, individual character regions.
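The idea of connecting broken strokes and separating touching characters with binary morphology can be sketched with the two standard compound operators, closing and opening. The abstract does not state which operators or structuring elements the authors chose, so the 3-pixel horizontal and vertical elements and the tiny test images below are assumptions for illustration.

```python
def dilate(img, se):
    """Binary dilation: a pixel is set if any structuring-element offset hits foreground."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]

def erode(img, se):
    """Binary erosion: a pixel stays set only if every offset hits foreground.
    Pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in se))
             for x in range(w)] for y in range(h)]

def closing(img, se):
    """Dilation then erosion: fills gaps narrower than the structuring element."""
    return erode(dilate(img, se), se)

def opening(img, se):
    """Erosion then dilation: removes parts thinner than the structuring element."""
    return dilate(erode(img, se), se)

HORIZONTAL = [(0, -1), (0, 0), (0, 1)]   # 1x3 element, offsets as (dy, dx)
VERTICAL = [(-1, 0), (0, 0), (1, 0)]     # 3x1 element

# A stroke broken by a one-pixel gap.
broken = [[0, 1, 1, 0, 1, 1, 0]]
connected = closing(broken, HORIZONTAL)

# Two blobs joined by a one-pixel-thick bridge in the middle row.
touching = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
]
separated = opening(touching, VERTICAL)
```

Closing with the horizontal element bridges the gap in the stroke, and opening with the vertical element removes the thin bridge while restoring the two blobs to their original height. The orientation and size of the structuring element decide which defects are affected.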

