Template Matching Based Probabilistic Optical Character Recognition for Urdu Nastaliq Script

This paper presents a technique for optical recognition of Urdu characters using template matching based on a probabilistic N-Gram language model. The dataset contains both printed and typed text. The model performs three types of segmentation (line, ligature, and character) using horizontal projection, connected component labeling, and corner and pointer techniques, respectively. A separate stochastic lexicon, containing the probability values of grams, is built from a collected corpus. By combining template matching with the N-Gram language model, our study predicts complete segmented words with promising results, particularly for bigrams. It outperforms three out of four existing models, with an accuracy rate of 97.33%. The results achieved on our test dataset are encouraging, but they also indicate directions for further improvement of the model.
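The abstract does not give the exact scoring formula, so the following is only a minimal Python sketch of how a bigram stochastic lexicon might be built from a tokenized corpus and used to rank the candidate words that template matching proposes for the next position. The function names, the unsmoothed maximum-likelihood estimate, and the placeholder tokens are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

def build_bigram_lexicon(sentences):
    """Maximum-likelihood bigram lexicon: maps (w1, w2) to P(w2 | w1).

    Assumption: no smoothing, so unseen pairs simply have no entry.
    """
    bigrams = Counter()
    for tokens in sentences:
        bigrams.update(zip(tokens, tokens[1:]))
    # Count each word only where it is actually followed by another word.
    context = Counter()
    for (w1, _), count in bigrams.items():
        context[w1] += count
    return {(w1, w2): count / context[w1] for (w1, w2), count in bigrams.items()}

def rank_candidates(lexicon, previous_word, candidates):
    """Order candidate words (hypothetically supplied by a template matcher)
    by P(candidate | previous_word), highest first; unseen pairs score 0."""
    return sorted(candidates,
                  key=lambda w: lexicon.get((previous_word, w), 0.0),
                  reverse=True)

# Toy corpus of pre-tokenized sentences (placeholder tokens, not real Urdu data).
corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
lexicon = build_bigram_lexicon(corpus)
print(rank_candidates(lexicon, "a", ["c", "b", "z"]))  # -> ['b', 'c', 'z']
```

Here `b` follows `a` in two of the three bigrams that start with `a`, so it is ranked first; the unseen pair `(a, z)` scores 0, which a practical system would usually soften with smoothing.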

Offline OCR System for Machine-Printed Turkish Using Template Matching

Advanced Materials Research, Vol 341-342, pp. 565-569, 2011. DOI: 10.4028/www.scientific.net/amr.341-342.565. Cited By ~2

Author(s): Ahmed Dena Rafaa, Jan Nordin

Keyword(s): Character Recognition, Template Matching, Optical Character Recognition, Noise Removal, Connected Components, Successful Implementation, Horizontal Projection, Text Documents, Image File, Result Show

Optical Character Recognition (OCR), one of the most important applications of Pattern Recognition (PR) today, is a system that converts scanned images of printed or handwritten text into a machine-readable and editable format such as text documents. The main motivation of this study is to build an OCR system for offline machine-printed Turkish characters that converts any image file into a readable and editable format. The system begins with a preprocessing step that converts the image file into a low-noise binary format ready for recognition; this step includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and word allocation, and an 8-connected neighbors scheme is used to extract characters as a set of connected components. The template matching method then matches the segmented characters against the template set stored in the OCR database to recognize the text. Unlike other approaches, template matching is fast and requires no training samples, but it cannot recognize some letters with similar shapes or combined letters; for this reason, the system combines template matching with the size feature of the segmented characters to achieve accurate results. Finally, after successful recognition, the recognized patterns are displayed in Notepad as readable and editable text. The Turkish machine-printed database consists of a list of 630 names of cities in Turkey written in Arial font at different sizes, in uppercase, in lowercase, and with the first character of each word capitalized. The results show that the accuracy of the proposed OCR system ranges from 96% to 100%.
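One plausible reading of "template matching combined with the size feature" is that templates differing mainly in scale, such as `o` and `O` in Arial, are first filtered by the glyph's height relative to its text line and then scored by pixel agreement. The sketch below assumes pre-normalized, equal-sized binary bitmaps; the function names, the relative-height feature, and the 0.15 tolerance are hypothetical choices, not the paper's.

```python
def match_score(glyph, template):
    """Fraction of pixels that agree between two equal-sized binary bitmaps."""
    total = agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += (g == t)
    return agree / total

def recognize(glyph, glyph_height, line_height, templates, tolerance=0.15):
    """Pick the best-matching template whose expected relative height is close
    to glyph_height / line_height (the hypothetical 'size feature').

    templates: list of (label, bitmap, expected_relative_height).
    Returns (label, score), or (None, -1.0) if no template passes the size check.
    """
    relative_height = glyph_height / line_height
    best_label, best_score = None, -1.0
    for label, bitmap, expected in templates:
        if abs(relative_height - expected) > tolerance:
            continue  # the size feature rules this template out
        score = match_score(glyph, bitmap)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score

ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
templates = [("o", ring, 0.5), ("O", ring, 0.7)]
print(recognize(ring, 5, 10, templates))  # -> ('o', 1.0)
print(recognize(ring, 7, 10, templates))  # -> ('O', 1.0)
```

Pixel agreement alone cannot separate `o` from `O` once both are normalized to the same bitmap size; the height check is what breaks the tie.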

A HYBRID LANGUAGE MODEL BASED ON STATISTICS AND LINGUISTIC RULES

International Journal of Pattern Recognition and Artificial Intelligence, Vol 19 (01), pp. 109-128, 2005. DOI: 10.1142/s0218001405003934. Cited By ~2

Author(s): XIAOLONG WANG, DANIEL S. YEUNG, JAMES N. K. LIU, ROBERT LUK, XUAN WANG

Keyword(s): Character Recognition, Optical Character Recognition, Large Scale, Language Model, Language Models, Long Distance, Language Understanding, Linguistic Rules, Hybrid Language, N Gram

Language modeling is an active research topic in many domains, including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction. There are two main types of language models: mathematical and linguistic. The most widely used mathematical language model is the n-gram model, which is inferred from statistics. This model has three problems: long-distance restriction, recursive nature, and partial language understanding. Language models based on linguistics present many difficulties when applied to large-scale real texts. We present here a new hybrid language model that combines the advantages of the n-gram statistical language model with those of a linguistic language model that makes use of grammatical or semantic rules. Using suitable rules, this hybrid model can solve problems such as long-distance restriction, recursive nature, and partial language understanding. The new language model has proved effective in experiments and has been incorporated into Chinese sentence input products for Windows and Macintosh OS.

Automatic Receipt Recognition System Based on Artificial Intelligence Technology

Applied Sciences, Vol 12 (2), pp. 853, 2022. DOI: 10.3390/app12020853

Author(s): Cheng-Jian Lin, Yu-Cheng Liu, Chin-Ling Lee

Keyword(s): Character Recognition, Template Matching, Recognition Accuracy, Recognition System, Character Segmentation, Small Object, Labor Costs, Accuracy Rate, Artificial Intelligence Technology, S Model

In this study, an automatic receipt recognition system (ARRS) is developed. First, a receipt is scanned and converted into a high-resolution image. Receipt characters are automatically placed into two categories according to the receipt characteristics: printed and handwritten characters. Images of receipts with these characters are preprocessed separately. For handwritten characters, template matching and the fixed features of the receipts are used for text positioning, projection is applied for character segmentation, and a convolutional neural network is used for character recognition. For printed characters, a modified You Only Look Once (version 4) model (YOLOv4-s) performs precise text positioning and character recognition. The proposed YOLOv4-s model reduces downsampling, thereby enhancing small-object recognition. Finally, the system produces recognition results in a tax declaration format, which can be uploaded to a tax declaration system. Experimental results revealed that the recognition accuracy of the proposed system was 80.93% for handwritten characters. Moreover, the YOLOv4-s model achieved a 99.39% accuracy rate for printed characters; only 33 characters were misjudged. The recognition accuracy of the YOLOv4-s model was 20.57% higher than that of the traditional YOLOv4 model. Therefore, the proposed ARRS can considerably improve the efficiency of tax declaration, reduce labor costs, and simplify operating procedures.

Performance Evaluation of Automatic Number Plate Recognition on Android Smartphone Platform

International Journal of Electrical and Computer Engineering (IJECE), Vol 7 (4), pp. 1973, 2017. DOI: 10.11591/ijece.v7i4.pp1973-1982

Author(s): Teddy Surya Gunawan, Abdul Mutholib, Mira Kartiwi

Keyword(s): Character Recognition, Template Matching, Optical Character Recognition, Processing Time, Intelligent System, Recognition Rate, The Other, Other Hand, Additional Processing, Artificial Neural Network Ann

Automatic Number Plate Recognition (ANPR) is an intelligent system that can recognize the characters on vehicle number plates. Previous studies implemented ANPR systems on personal computers (PCs) with high-resolution cameras and high computational capability. On the other hand, few studies have addressed the design and implementation of ANPR on smartphone platforms, which have limited camera resolution and processing speed. This paper describes various steps to optimize ANPR, including pre-processing, segmentation, and optical character recognition (OCR) using an artificial neural network (ANN) and template matching. The proposed ANPR algorithm is based on the Tesseract and Leptonica libraries. Template matching-based OCR is also compared with ANN-based OCR. The performance of the proposed algorithm was evaluated on a purpose-built database of Malaysian number plate images captured by a smartphone camera. Results showed that the accuracy and processing time of the proposed algorithm using template matching were 97.5% and 1.13 seconds, respectively. In contrast, the traditional algorithm using template matching obtained only an 83.7% recognition rate, with a processing time of 0.98 seconds. This shows that the proposed ANPR algorithm improves the recognition rate with negligible additional processing time.

Unconstrained Handwritten Text Line Segmentation for Kannada Language

International Journal of Innovative Technology and Exploring Engineering - Special Issue, Vol 8 (12), pp. 953-956, 2019. DOI: 10.35940/ijitee.j9624.1081219

Keyword(s): Character Recognition, Recognition System, Text Line, Connected Component, Horizontal Projection, Text Documents, Handwritten Text, Kannada Language, System Separation, Line Segmentation

Segmentation is the division of something into smaller parts, and it is one of the components of a character recognition system. During segmentation, characters, words, and lines are separated from text documents. Character recognition is a process that allows computers to recognize written or printed characters such as numbers or letters and to convert them into a form the computer can use. The accuracy of an OCR system is measured by taking the output of an OCR run on an image and comparing it to the original version of the same text. The main aim of this paper is to examine text line segmentation methods such as projection profiles and the Weighted Bucket Method. The proposed method applies the horizontal projection profile and connected component methods to handwritten Kannada text. These methods are used in experiments, and their accuracy and results are compared.
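The horizontal projection profile method mentioned above can be sketched as follows: count the ink pixels in each row of a binarized page and treat each maximal run of non-empty rows as one text line. This minimal Python version assumes a clean binary image (1 = ink) and an illustrative `min_ink` threshold; it does not handle the touching or skewed lines common in unconstrained handwriting, which is where a connected component method becomes necessary.

```python
def horizontal_projection(image):
    """Number of ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]

def segment_lines(image, min_ink=1):
    """Split a binary page into text lines.

    Returns (start_row, end_row) pairs, end exclusive, for every maximal run
    of rows whose projection value is at least min_ink.
    """
    profile = horizontal_projection(image)
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a new line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the current line ends
            start = None
    if start is not None:                  # a line touching the bottom edge
        lines.append((start, len(profile)))
    return lines

page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
]
print(segment_lines(page))  # -> [(1, 3), (4, 5)]
```

Raising `min_ink` above 1 is one simple way to ignore stray noise pixels between lines, at the cost of possibly clipping faint ascenders or descenders.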

ANDROID-BASED COLOR RECOGNITION USING THE TEMPLATE MATCHING METHOD

Compiler, Vol 7 (1), 2018. DOI: 10.28989/compiler.v7i1.278

Author(s): Indra Hading Kurniawan, Nurcahyani Dewi Retnowati

Keyword(s): Success Rate, Character Recognition, Template Matching, Optical Character Recognition, Matching Method, Color Recognition, Final Project, Optical Character, Simple Implementation, Image Object

The template matching method is a simple and widely used method for recognizing patterns. Its weakness is that the templates stored in the database for comparison are limited in shape, size, and orientation. A feature extraction algorithm addresses this limitation by mapping the characteristics of the image object to be recognized. Optical character recognition is used to translate characters in digital images into text. Its simple implementation makes the template matching method widely used. This final project discusses recognizing the colors in an image; the color recognition is not fully successful because of the influence of lightness. The application works by taking a picture, identifying the colors present, and reporting the results as text percentages; it achieved a success rate of 15% (and a failure rate of 85%) when detecting a color.

Corpus-based technique for improving Arabic OCR system

Indonesian Journal of Electrical Engineering and Computer Science, Vol 21 (1), pp. 233, 2021. DOI: 10.11591/ijeecs.v21.i1.pp233-241

Author(s): Ahmed Hussain Aliwy, Basheer Al-Sadawi

Keyword(s): Character Recognition, Optical Character Recognition, Language Model, Arabic Language, Document Images, Statistical Language Model, Text Document, Optical Character, Arabic Ocr

Optical character recognition (OCR) is the process of converting text document images into editable and searchable text. OCR poses several challenges, particularly for Arabic, where it produces a high percentage of errors. In this paper, a method to improve the output of Arabic optical character recognition (AOCR) systems is proposed, based on a statistical language model built from large available corpora. The method detects and corrects both non-word and real-word errors according to the context of the word in the sentence. The results show that the improved AOCR output reaches an accuracy of up to 98%.
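The abstract does not specify the model, so the following is only a minimal Python sketch of corpus-based non-word correction under stated assumptions: a token absent from the corpus vocabulary is treated as a non-word, candidates are vocabulary words within Levenshtein distance `max_dist`, and the candidate is chosen by its bigram count with the preceding word, with unigram frequency as a tie-breaker. The class and method names are hypothetical, the tokens are placeholders rather than Arabic data, and real-word errors (valid words used in the wrong context) are not handled.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

class CorpusCorrector:
    """Hypothetical non-word corrector backed by unigram and bigram counts."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for tokens in sentences:
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))

    def correct(self, word, previous=None, max_dist=1):
        if word in self.unigrams:
            return word  # known word: not a non-word error
        candidates = [w for w in self.unigrams
                      if edit_distance(word, w) <= max_dist]
        if not candidates:
            return word  # nothing close enough; leave the OCR output as-is
        # Prefer context (bigram count), then overall frequency (unigram count).
        return max(candidates,
                   key=lambda w: (self.bigrams[(previous, w)], self.unigrams[w]))

corpus = [["the", "cat", "sat"], ["a", "cat", "ran"], ["a", "cat", "sat"],
          ["my", "car", "stopped"], ["my", "car", "started"]]
corrector = CorpusCorrector(corpus)
print(corrector.correct("cax", previous="my"))   # -> car
print(corrector.correct("cax", previous="the"))  # -> cat
```

The same misrecognized token `cax` is corrected differently depending on the preceding word, which is the kind of context sensitivity the abstract describes; without context the more frequent `cat` wins.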

Construction of Statistical SVM based Recognition Model for Handwritten Character Recognition

Journal of Information Technology and Digital World - September 2019, Vol 3 (2), pp. 92-107, 2021. DOI: 10.36548/jitdw.2021.2.003

Author(s): Yasir Babiker Hamdan, Sathish

Keyword(s): Character Recognition, Template Matching, Optical Character Recognition, Recognition Rate, Banking System, Developed Countries, Support Vector, Handwritten Character Recognition, Research Article, Handwritten Character

Many applications of handwritten character recognition (HCR) still exist. One is reading postal addresses in countries with a union government, such as India, where different states use different languages. Reading bank check amounts and verifying signatures are important applications of HCR in the automatic banking systems of all developed countries. Optical character recognition of documents is compared with human reading of handwritten documents. OCR is used to translate characters from various types of files, such as images and Word documents. The main aim of this research article is to provide a solution for various handwriting recognition inputs, such as touch input from a mobile screen and picture files. Recognition is performed with various chosen methods, including artificial neural networks and statistical methods, and addresses non-linearly separable problems. The article describes various approaches to recognizing handwritten characters in image documents and compares them. In addition, it compares the statistical support vector machine (SVM) classifier with statistical, template matching, structural pattern recognition, and graphical methods. The results show that the statistical SVM, configured with a machine learning approach, gives good OCR performance, with a higher recognition rate than the other methods discussed in the article. The proposed model was trained on a set containing various stylized letters and digits to achieve a higher accuracy level. We obtained a test accuracy of 91% in recognizing characters from documents. Finally, several directions for future work are discussed.

The Study of Plate Number Recognition for Parking Security System

International Journal of Advanced Technology in Mechanical, Mechatronics and Materials, Vol 1 (3), pp. 100-107, 2020. DOI: 10.37869/ijatec.v1i3.34

Author(s): Shakeeb M.A.N. Abdul Samad, Fahri Heltha, M. Faliq

Keyword(s): Character Recognition, Optical Character Recognition, Moving Average, Recognition System, Binary Image, Connected Components, Horizontal Projection, Bar Code, Plate Number, Number Recognition

A car plate number recognition system is an important platform for identifying a vehicle. The recognition system is based on image processing and computer vision techniques. A webcam captures images of the car plate number from different distances, and identification is carried out in four stages: image acquisition and pre-processing, extraction, segmentation, and character recognition. The acquisition and pre-processing stage extracts the region of interest of the image: the image is captured from the webcam's live video and converted to grayscale and then to a binary image. The extraction stage extracts the plate number characters from the binary image using a connected components method. The segmentation stage applies horizontal projection together with a moving average filter. Lastly, the character recognition stage identifies the segmented characters of the plate number using optical character recognition. The proposed method works well for Malaysian private car plate numbers and can be implemented in a car park system to increase security by matching the bar code of the parking ticket with the car's plate number at the entry and exit gates.
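The extraction and segmentation stages can be illustrated with two small helpers: an 8-connected component labeler that returns one bounding box per character blob, and a centered moving average for smoothing a projection profile before it is thresholded. This is a plain-Python sketch under assumed conventions (a binary image as lists of 0/1, a default window of 3, hypothetical function names); the paper does not state its window size or labeling algorithm.

```python
from collections import deque

def connected_components_8(image):
    """Label foreground pixels (1s) using 8-connectivity (BFS flood fill).

    Returns inclusive bounding boxes (top, left, bottom, right), sorted
    left to right, i.e. roughly in reading order for a single plate line.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):          # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])

def moving_average(profile, window=3):
    """Centered moving average; the window shrinks at the profile's edges."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed

image = [[1, 0, 0, 0, 1],
         [0, 1, 0, 0, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 1, 0]]
print(connected_components_8(image))    # -> [(0, 0, 1, 1), (3, 2, 3, 3), (0, 4, 1, 4)]
print(moving_average([0, 3, 0, 3, 0]))  # -> [1.5, 1.0, 2.0, 1.0, 1.5]
```

Under 4-connectivity the two diagonal pixels in the top-left corner would form two separate components; 8-connectivity keeps them together as one stroke, which matters for thin or slanted plate characters.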

OPTICAL CHARACTER RECOGNITION USING THE TEMPLATE MATCHING CORRELATION ALGORITHM

JURNAL MASYARAKAT INFORMATIKA, Vol 5 (9), 2015. DOI: 10.14710/jmasif.5.9.1-12

Author(s): Suryo Hartanto, Aris Sugiharto, Sukmawati Nur Endah

Keyword(s): Character Recognition, Template Matching, Optical Character Recognition, Optical Character
