Word and Character Segmentation in Devanagari and Odia Script – A Comparative Analysis

Optical Character Recognition has been an active research area in computer science for several years, and several research works have been undertaken on various languages in India. In this paper, an attempt has been made to find out the percentage of accuracy in word and character segmentation of Hindi (an official language of India) and Odia, a regional language mostly spoken in Odisha and a few Eastern Indian states, and to compare the two. Ten sets each of printed Odia and Devanagari scripts with different word limits were used in this study. The documents were scanned at 300 dpi before applying the pre-processing and segmentation procedures. The results show that the percentage of accuracy in both word and character segmentation is higher for Odia than for Hindi. One of the reasons is the use of the header line in Hindi, which makes the segmentation process cumbersome. Thus, it can be concluded that the accuracy level can vary from one language to another and between word segmentation and character segmentation.
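
The effect of the header line can be illustrated with a toy example. The sketch below is not the paper's procedure: it assumes a binary word image stored as lists of 0/1, and it assumes the header line is simply the single densest row (real header lines are usually several pixels thick). The names `column_runs`, `remove_headline`, and `WORD` are hypothetical.

```python
def column_runs(image):
    """Return half-open (start, end) column ranges that contain ink."""
    width = len(image[0])
    # Vertical projection profile: ink count per column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    runs, start = [], None
    for x, count in enumerate(profile):
        if count and start is None:
            start = x
        elif not count and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, width))
    return runs


def remove_headline(image):
    """Blank the densest row (assumed here to be the header line)."""
    row_ink = [sum(row) for row in image]
    headline = row_ink.index(max(row_ink))
    return [[0] * len(row) if y == headline else list(row)
            for y, row in enumerate(image)]


# Toy word: three 2-pixel-wide characters joined by a header line on top.
WORD = [[1 if c == "#" else 0 for c in row] for row in (
    "########",
    "##.##.##",
    "#..#..#.",
    "##.##.##",
)]

print(column_runs(WORD))                   # one run: the header line bridges every gap
print(column_runs(remove_headline(WORD)))  # three runs once the header row is blanked
```

With the header row present, the vertical projection has no empty column, so the word cannot be split; blanking that row exposes the gaps between characters.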

Which OCR toolset is good and why? A comparative study

Kuwait Journal of Science ◽

10.48129/kjs.v48i2.9589 ◽

2021 ◽

Vol 48 (2) ◽

Author(s):

Pooja Jain ◽

Dr. Kavita Taneja ◽

Dr. Harmunish Taneja ◽

...

Keyword(s):

Comparative Study ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Research Area ◽

Real World Applications ◽

Banking Education ◽

Active Research ◽

Active Research Area ◽

Computational Technology

Optical Character Recognition (OCR) is a very active research area in many challenging fields like pattern recognition, natural language processing (NLP), computer vision, biomedical informatics, machine learning (ML), and artificial intelligence (AI). This computational technology extracts text in an editable format (MS Word/Excel, text files, etc.) from PDF files, scanned or handwritten documents, and images (photographs, advertisements, and the like) for further processing. It has been utilized in many real-world applications, including banking, education, insurance, finance, healthcare, and keyword-based search in documents. Many OCR toolsets are available under various categories, including open-source, proprietary, and online services. This research paper provides a comparative study of various OCR toolsets considering a variety of parameters.

Character Segmentation Of Degraded Odia Script

Asian Journal of Computer and Information Systems ◽

10.24203/ajcis.v8i2.6097 ◽

2020 ◽

Vol 8 (2) ◽

Author(s):

Ipsita Pattnaik ◽

Tushar Patnaik

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

West Bengal ◽

Input Image ◽

Character Segmentation ◽

Vertical Projection ◽

Projection Profile ◽

Regional Language ◽

Optical Character ◽

Line Segmentation

Optical Character Recognition (OCR) is a field concerned with converting printed text into a computer-understandable, editable format. Odia is a regional language used in Odisha, West Bengal and Jharkhand. It is used by over forty million people, and the number is still growing. Such large dependency on a language makes it important to preserve its script and to obtain a digital, editable version of Odia text. We propose a framework that takes a computer-printed Odia script image as input and produces a computer-readable, user-editable version of it by recognizing the characters printed in the input image. The system uses various techniques to improve the image and performs line segmentation, followed by word segmentation and finally character segmentation, using horizontal and vertical projection profiles.
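
The line, word, and character cascade described above can be sketched with projection profiles as follows. This is a minimal illustration and not the authors' framework: it assumes a clean binary image (lists of 0/1), and the function names and the `word_gap` threshold are hypothetical.

```python
def runs(profile, min_gap=1):
    """Half-open (start, end) ranges where profile > 0.

    Runs separated by fewer than min_gap empty positions are merged.
    """
    raw, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            raw.append((start, i))
            start = None
    if start is not None:
        raw.append((start, len(profile)))
    merged = []
    for s, e in raw:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment(image, word_gap=2):
    """Return [((top, bottom), [[(char_start, char_end), ...], ...]), ...]."""
    # Horizontal projection: ink per row separates text lines.
    row_profile = [sum(row) for row in image]
    result = []
    for top, bottom in runs(row_profile):
        band = image[top:bottom]
        # Vertical projection inside the line: ink per column.
        cols = [sum(row[x] for row in band) for x in range(len(band[0]))]
        words = []
        # Gaps of at least word_gap columns split words; any gap splits characters.
        for ws, we in runs(cols, min_gap=word_gap):
            words.append([(ws + s, ws + e) for s, e in runs(cols[ws:we])])
        result.append(((top, bottom), words))
    return result


PAGE = [[1 if c == "#" else 0 for c in row] for row in (
    "##.##..##",
    "##.##..##",
    ".........",
    "##..##...",
)]

for line_box, words in segment(PAGE):
    print(line_box, words)
```

A fixed gap threshold is the simplest possible word/character rule; degraded or touching characters would need something more robust than a zero-valued profile.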

A Survey on Arabic Handwritten Script Recognition Systems

International Journal of Artificial Intelligence and Machine Learning ◽

10.4018/ijaiml.20210701.oa9 ◽

2021 ◽

Vol 11 (2) ◽

pp. 1-17

Author(s):

Soumia Djaghbellou ◽

Abderraouf Bouziane ◽

Abdelouahab Attia ◽

Zahid Akhtar

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Research Field ◽

Research Directions ◽

Optical Character ◽

On Line ◽

Recognition Systems ◽

Active Research ◽

Handwritten Arabic ◽

Open Issues

The optical character recognition (OCR) system is still an active research field in pattern recognition. Such systems can electronically identify, recognize and distinguish characters and text, whether printed or handwritten. They can also transform such data into machine-processable form to facilitate the interaction between user and machine in various applications. In this paper, we present the global structure of an OCR system, with its types (on-line and off-line), categories (printed and handwritten) and main steps. We also focus on off-line handwritten Arabic character recognition and provide a list of the main publicly available datasets. This paper also presents a survey of the works that have been carried out over recent years. Finally, some open issues and potential research directions are highlighted.

Character Segmentation and Skew Correction for Handwritten Devanagari Scripts: A Friends Technique

Asian Journal of Engineering and Applied Technology ◽

10.51983/ajeat-2019.8.1.1060 ◽

2019 ◽

Vol 8 (1) ◽

pp. 50-54

Author(s):

Ashok Kumar Bathla ◽

Sunil Kumar Gupta

Keyword(s):

Human Brain ◽

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

Novel Technique ◽

Skew Correction ◽

Optical Character ◽

Handwritten Text ◽

Scripting Language ◽

The Way

Optical Character Recognition (OCR) technology allows a computer to “read” text (both typed and handwritten) the way a human brain does. Significant research effort has been put into the OCR of typewritten text in various languages; however, very little effort has been put into the segmentation and skew correction of handwritten text written in Devanagari, the script used to write Hindi. This paper proposes a novel technique for segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a given handwritten word.
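
The abstract does not describe how skew is estimated, so the following is only a generic sketch of one common approach (projection-profile sharpness), not the paper's technique. It assumes the ink pixels are given as `(x, y)` coordinates and searches whole-degree angles; all names are hypothetical.

```python
import math


def skew_score(points, angle_deg):
    """Sum of squared row counts after rotating the ink by -angle_deg.

    A well-aligned text line piles its pixels into few rows, giving a high score.
    """
    a = math.radians(angle_deg)
    rows = {}
    for x, y in points:
        r = math.floor(y * math.cos(a) - x * math.sin(a) + 0.5)
        rows[r] = rows.get(r, 0) + 1
    return sum(count * count for count in rows.values())


def estimate_skew(points, candidates=range(-10, 11)):
    """Candidate angle (degrees) whose horizontal projection is sharpest."""
    return max(candidates, key=lambda a: skew_score(points, a))


def deskew(points, angle_deg):
    """Rotate the points by -angle_deg about the origin."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c + y * s, y * c - x * s) for x, y in points]


# Synthetic baseline: a 200-pixel stroke rising at 5 degrees (assumed test data).
LINE = [(x, round(10 + x * math.tan(math.radians(5)))) for x in range(200)]

angle = estimate_skew(LINE)
flattened = deskew(LINE, angle)
print(angle)
```

The sign convention here treats a positive angle as `y` increasing with `x`; with image coordinates (y pointing down), that corresponds to text sloping downward to the right.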

Barrier Access Control Using Sensors Platform and Vehicle License Plate Characters Recognition

Sensors ◽

10.3390/s19133015 ◽

2019 ◽

Vol 19 (13) ◽

pp. 3015 ◽

Cited By ~ 6

Author(s):

Farman Ullah ◽

Hafeez Anwar ◽

Iram Shahzadi ◽

Ata Ur Rehman ◽

Shizra Mehmood ◽

...

Keyword(s):

Access Control ◽

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

License Plate ◽

License Plate Recognition ◽

Ultrasonic Sensors ◽

Optical Character

The paper proposes a sensors platform to control a barrier installed at a vehicle entrance. This platform is automated by image-based license plate recognition of the vehicle. However, in situations where standardized license plates are not used, such image-based recognition becomes non-trivial and challenging due to the variations in license plate background, fonts and deformations. The proposed method first detects the approaching vehicle via ultrasonic sensors and, at the same time, captures its image via a camera installed along with the barrier. From this image, the license plate is automatically extracted and further processed to segment the license plate characters. Finally, these characters are recognized with the help of a standard optical character recognition (OCR) pipeline. The evaluation of the proposed system shows an accuracy of 98% for license plate extraction, 96% for character segmentation and 93% for character recognition.
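
As a rough illustration only: if the three reported figures are treated as independent success rates of consecutive stages, the end-to-end rate would be their product. That independence is an assumption the abstract does not make, and the figures may be measured on different units (plates versus characters), so the result below is not a number reported by the paper.

```python
import math

# Stage accuracies as reported in the abstract.
STAGE_ACCURACY = {
    "plate extraction": 0.98,
    "character segmentation": 0.96,
    "character recognition": 0.93,
}


def chained_accuracy(stages):
    """Product of stage accuracies; meaningful only if stages fail independently (assumption)."""
    return math.prod(stages.values())


print(f"{chained_accuracy(STAGE_ACCURACY):.3f}")  # prints 0.875
```

The point of the sketch is that errors compound along a pipeline: each stage can look strong on its own while the chain as a whole is noticeably weaker.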

Optical Character Recognition based on Template Matching

Global Journal of Computer Science and Technology ◽

10.34257/gjcstcvol19is2pg31 ◽

2019 ◽

pp. 31-35

Author(s):

Md. Anwar Hossain ◽

Sadia Afrin

Keyword(s):

Character Recognition ◽

Template Matching ◽

Optical Character Recognition ◽

Research Area ◽

Text Document ◽

Optical Character ◽

Important Research Area ◽

System Prototype ◽

Text Images ◽

Image Format

This paper presents an innovative design for Optical Character Recognition (OCR) from text images using the template matching method. OCR is an important research area and one of the most successful applications of technology in the field of pattern recognition and artificial intelligence. OCR provides full alphanumeric recognition of printed and handwritten characters by scanning text images and converting them into a corresponding editable text document. The main objective of this work is to develop a prototype OCR system and to implement the template matching algorithm that drives it. We used letters (A-Z and a-z) and numbers (0-1) as grayscale images in bitmap format, and recognized each letter and number by comparing two images. In addition, we checked the accuracy for different fonts. MATLAB R2018a was used to implement the system.
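
The idea of recognizing a character by comparing two images can be sketched as below. The paper's implementation is in MATLAB and its similarity measure is not given in the abstract; this Python sketch assumes equal-sized binary glyphs and uses a simple pixel-agreement score. The templates and names are hypothetical.

```python
def similarity(a, b):
    """Fraction of pixels on which two equal-sized glyphs agree."""
    matches = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return matches / (len(a) * len(a[0]))


def recognize(glyph, templates):
    """Label of the template most similar to the glyph."""
    return max(templates, key=lambda label: similarity(glyph, templates[label]))


# Tiny 3x3 templates; '#' is ink and '.' is background (illustrative only).
TEMPLATES = {
    "I": (".#.", ".#.", ".#."),
    "L": ("#..", "#..", "###"),
    "T": ("###", ".#.", ".#."),
}

# A 'T' with one spurious ink pixel in the bottom-left corner.
NOISY_T = ("###", ".#.", "##.")

print(recognize(NOISY_T, TEMPLATES))
```

Pixel agreement only works when the glyph and templates share size and alignment, which is one reason accuracy in template matching tends to vary across fonts.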

AUTOMATICS NUMBER PLATE RECOGNITION USING CONVOLUTION NEURAL NETWORK

Azerbaijan Journal of High Performance Computing ◽

10.32010/26166127.2020.3.2.234.244 ◽

2020 ◽

Vol 3 (2) ◽

pp. 234-244

Author(s):

Siddhartha Roy ◽

Keyword(s):

Real Time ◽

Character Recognition ◽

Optical Character Recognition ◽

Rapid Development ◽

Red Light ◽

Training Needs ◽

Character Segmentation ◽

License Plate ◽

Training Set ◽

Optical Character

In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used for security, safety, and commercial purposes such as parking access control, enforcement of red light violations, highway speed detection, and stolen vehicle detection. The license plate of a vehicle contains a number of characters that the computer must recognize. Each country has its own license plate characteristics. Due to rapid development in the information systems field, the previous manual process of writing license plate numbers into a database has been replaced by special intelligent devices operating in a real-time environment. Several approaches and techniques are exploited to achieve better system accuracy and real-time execution. ANPR is the process of recognizing number plates using Optical Character Recognition (OCR) on images. This paper proposes a deep learning-based approach to detect and identify Indian number plates automatically. It is based on new computer vision algorithms for both number plate detection and character segmentation. Training needs several images to obtain greater accuracy. Initially, we developed a training set database by training on different segmented characters. Several tests were done by varying the number of epochs to observe the change in accuracy. The accuracy is more than 95%, which is satisfactory compared to related works, and the system also recognizes blurred number plates.

Segmentation of Handwritten Text Document Written in Devanagri Script for Simple character, skewed character and broken character

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v8i1.3427 ◽

2013 ◽

Vol 8 (1) ◽

pp. 686-691

Author(s):

Vneeta Rani ◽

Dr. Vijay Laxmi

Keyword(s):

Artificial Intelligence ◽

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

Research Areas ◽

Text Document ◽

Optical Character ◽

Handwritten Text ◽

Recognition Phase ◽

Simple Character

OCR (optical character recognition) is a technology commonly used for pattern recognition in artificial intelligence and computing. With the help of OCR, we can convert scanned documents into editable documents that can be further used in various research areas. In this paper, we present a character segmentation technique that can segment simple, skewed, and broken characters. Character segmentation is a very important phase in any OCR process because its output serves as input to later phases such as character recognition. If there is a problem in the character segmentation phase, recognition of the corresponding character is very difficult or nearly impossible.

Line Segmentation Challenges in Tamil Language Palm Leaf Manuscripts

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3159.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2363-2367

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

Binary Images ◽

Optical Character ◽

Colour Image ◽

Segmentation Methods ◽

Segmentation Algorithms ◽

Line Segmentation ◽

Palm Leaf

Optical Character Recognition (OCR) for ancient handwritten documents or palm leaf manuscripts is carried out in four phases: ‘line segmentation’, ‘word segmentation’, ‘character segmentation’, and ‘character recognition’. The colour images of palm leaf manuscripts are converted into binary images using various pre-processing methods. The first phase of an OCR has to overcome the hurdles of touching lines and overlapping lines. Character recognition becomes futile when the line segmentation is erroneous. In Tamil language palm leaf manuscript recognition, there are only a handful of line segmentation methods. Moreover, the available methods do not meet the required standards. This article aims to fill the lacuna in methods for line segmentation in Tamil language document analysis. The proposed method is compared in efficiency with line segmentation algorithms that work on binary images, such as Adaptive Partial Projection (APP) and A* Path Planning (A*PP). The evaluation tools and metrics are taken from the ICDAR 2013 Handwriting Segmentation Contest.

Kurdish Optical Character Recognition

UKH Journal of Science and Engineering ◽

10.25079/ukhjse.v2n1y2018.pp18-27 ◽

2018 ◽

Vol 2 (1) ◽

pp. 18-27

Author(s):

Rasty Yaseen ◽

Hossein Hassani

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

Accuracy Rate ◽

Optical Character ◽

Arabic Script

Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. The Persian/Arabic script is written from right to left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, affect the segmentation stage in developing a Kurdish OCR. In this article, we introduce an enhanced character-segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR using documents with different fonts, font sizes, and image resolutions. The results of the experiments showed that the character recognition accuracy of the proposed method was 90.82% on average.
