Word and Character Segmentation in Devnagari and Odia Script – A Comparative Analysis

Optical Character Recognition has been an active research area in computer science for several years, and several research works have been undertaken on various languages in India. This paper attempts to measure the accuracy of word and character segmentation for Hindi (national language of India) and Odia, a regional language spoken mostly in Odisha and a few eastern Indian states, and presents a comparative analysis of the two. Ten sets each of printed Odia and Devanagari script, with different word limits, were used in this study. The documents were scanned at 300 dpi before the pre-processing and segmentation procedures were applied. The results show that accuracy in both word and character segmentation is higher for Odia than for Hindi. One of the reasons is the use of the header line in Hindi, which makes the segmentation process cumbersome. Thus, it can be concluded that the accuracy level can vary from one language to another and between word segmentation and character segmentation.
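
To illustrate why a header line complicates segmentation, the following is a minimal sketch and not the authors' procedure. It assumes a binary word image stored as a list of rows (1 = ink, 0 = background) and assumes the header line is the row band with the most ink. The helper names `remove_header_line` and `blank_columns` are hypothetical.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def blank_columns(img: Image) -> List[int]:
    """Return the indices of columns that contain no ink."""
    return [c for c, col in enumerate(zip(*img)) if sum(col) == 0]


def remove_header_line(word: Image, band: int = 1) -> Image:
    """Blank `band` rows around the row with the most ink.

    Assumption: the header line is the densest horizontal band of the word.
    """
    profile = [sum(row) for row in word]      # ink count per row
    peak = profile.index(max(profile))        # densest row
    lo = max(0, peak - band // 2)
    hi = min(len(word), lo + band)
    return [
        [0] * len(row) if lo <= r < hi else list(row)
        for r, row in enumerate(word)
    ]


# A toy word: the top row is a header line joining two characters.
word = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 1, 0],
]

print(blank_columns(word))                      # no gap is visible
print(blank_columns(remove_header_line(word)))  # gaps between characters appear
```

With the header line in place, no column is blank, so a column-based split finds no character boundary. Once the band is blanked, the gaps between characters become visible.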

2021 ◽  
Vol 48 (2) ◽  
Author(s):  
Pooja Jain ◽  
◽  
Dr. Kavita Taneja ◽  
Dr. Harmunish Taneja ◽  
◽  
...  

Optical Character Recognition (OCR) is a very active research area in many challenging fields like pattern recognition, natural language processing (NLP), computer vision, biomedical informatics, machine learning (ML), and artificial intelligence (AI). This computational technology extracts text in an editable format (MS Word/Excel, text files, etc.) from PDF files, scanned or handwritten documents, and images (photographs, advertisements, and the like) for further processing, and has been utilized in many real-world applications including banking, education, insurance, finance, healthcare, and keyword-based search in documents. Many OCR toolsets are available under various categories, including open-source, proprietary, and online services. This research paper provides a comparative study of various OCR toolsets considering a variety of parameters.


Author(s):  
Ipsita Pattnaik ◽  
Tushar Patnaik

Optical Character Recognition (OCR) is a field concerned with converting printed text into a computer-understandable, editable format. Odia is a regional language used in Odisha, West Bengal and Jharkhand. It is used by over forty million people, and the number is growing. Such heavy reliance on the language makes it important to preserve its script and to obtain a digital, editable version of Odia text. We propose a framework that takes an image of computer-printed Odia script as input and produces a computer-readable, user-editable version of it by recognizing the characters printed in the input image. The system uses various techniques to improve the image, then performs line segmentation, followed by word segmentation and finally character segmentation, using horizontal and vertical projection profiles.
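
The projection-profile idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the page is a clean binary image stored as a list of rows (1 = ink), lines are separated by blank rows, and words are separated by wider blank gaps than characters. The function names and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def runs_of_ink(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def horizontal_profile(img: Image) -> List[int]:
    """Ink count per row."""
    return [sum(row) for row in img]


def vertical_profile(img: Image) -> List[int]:
    """Ink count per column."""
    return [sum(col) for col in zip(*img)]


def segment_lines(img: Image) -> List[Image]:
    """Cut the page at blank rows of the horizontal profile."""
    return [img[top:bottom] for top, bottom in runs_of_ink(horizontal_profile(img))]


def segment_columns(line: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a line at blank-column gaps at least `min_gap` wide.

    A small min_gap yields characters; a larger one yields words.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in runs_of_ink(vertical_profile(line)):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)  # gap too narrow: same segment
        else:
            merged.append((start, end))
    return merged


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
characters = segment_columns(lines[0], min_gap=1)
words = segment_columns(lines[0], min_gap=3)
```

On real scans the gap threshold would have to be estimated from the data, for example from the distribution of gap widths in each line.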


Author(s):  
Soumia Djaghbellou ◽  
Abderraouf Bouziane ◽  
Abdelouahab Attia ◽  
Zahid Akhtar

The optical character recognition (OCR) system is still an active research field in pattern recognition. Such systems can identify, recognize and distinguish electronically between characters and texts, printed or handwritten. They can also transform such data into machine-processable form to facilitate the interaction between user and machine in various applications. In this paper, we present the global structure of an OCR system, with its types (on-line and off-line), categories (printed and handwritten) and its main steps. We also focus on off-line handwritten Arabic character recognition and provide a list of the main publicly available datasets. This paper also presents a survey of the works that have been carried out over recent years. Finally, some open issues and potential research directions are highlighted.


2019 ◽  
Vol 8 (1) ◽  
pp. 50-54
Author(s):  
Ashok Kumar Bathla ◽  
Sunil Kumar Gupta

Optical Character Recognition (OCR) technology allows a computer to “read” text (both typed and handwritten) the way a human brain does. Significant research effort has gone into OCR of typewritten text in various languages; however, very little has gone into the segmentation and skew correction of handwritten text in Devanagari, the script used to write Hindi. This paper presents a novel technique for segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a handwritten word.


Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 3015 ◽  
Author(s):  
Farman Ullah ◽  
Hafeez Anwar ◽  
Iram Shahzadi ◽  
Ata Ur Rehman ◽  
Shizra Mehmood ◽  
...  

The paper proposes a sensor platform to control a barrier installed at a vehicle entrance. The platform is automated by image-based license plate recognition of the vehicle. However, in situations where standardized license plates are not used, such image-based recognition becomes non-trivial and challenging due to variations in license plate background, fonts and deformations. The proposed method first detects the approaching vehicle via ultrasonic sensors and, at the same time, captures its image via a camera installed alongside the barrier. From this image, the license plate is automatically extracted and further processed to segment the license plate characters. Finally, these characters are recognized with the help of a standard optical character recognition (OCR) pipeline. The evaluation of the proposed system shows an accuracy of 98% for license plate extraction, 96% for character segmentation and 93% for character recognition.


Author(s):  
Md. Anwar Hossain ◽  
Sadia Afrin

This paper presents an innovative design for Optical Character Recognition (OCR) from text images using the Template Matching method. OCR is an important research area and one of the most successful applications of technology in the field of pattern recognition and artificial intelligence. OCR provides full alphanumeric visualization of printed and handwritten characters by scanning text images and converting them into a corresponding editable text document. The main objective is to develop a prototype of the OCR system and to implement the Template Matching algorithm in it. We used alphabet characters (A-Z and a-z) and numbers (0-1) as grayscale images in bitmap format, and recognized them by comparing pairs of images. In addition, we checked the accuracy for different fonts of letters and numbers. The system was implemented in MATLAB R2018a.
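
Template matching by image comparison can be sketched as below. The paper's implementation is in MATLAB and its exact similarity measure is not given here, so this Python sketch assumes a Pearson correlation between equally sized grayscale images. The function names and the toy 3×3 templates are hypothetical.

```python
from math import sqrt
from typing import Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def correlation(a: Image, b: Image) -> float:
    """Pearson correlation between two equally sized images (assumed measure)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    return num / den if den else 0.0  # flat images carry no shape information


def recognize(glyph: Image, templates: Dict[str, Image]) -> str:
    """Return the label of the template most correlated with the glyph."""
    return max(templates, key=lambda label: correlation(glyph, templates[label]))


templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]  # an "I" with one spurious pixel
label = recognize(noisy_i, templates)
```

In practice each segmented glyph would first be resized to the template size, and one template set per font would be needed, which is consistent with the paper checking accuracy across fonts.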


2020 ◽  
Vol 3 (2) ◽  
pp. 234-244
Author(s):  
Siddhartha Roy ◽  

In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used for security, safety, and commercial purposes such as parking access control, enforcement of red-light violations, highway speed detection, and stolen vehicle detection. The license plate of any vehicle contains a number of numeric characters to be recognized by the computer. Each country has its own license plate characteristics. Owing to rapid development in information systems, the manual process of writing license plate numbers into a database has been replaced by special intelligent devices working in real time. Several approaches and techniques have been exploited to achieve better accuracy and real-time execution. ANPR is a process of recognizing number plates using Optical Character Recognition (OCR) on images. This paper proposes a deep learning-based approach to detect and identify Indian number plates automatically. It is based on new computer vision algorithms for both number plate detection and character segmentation. Training needs several images to obtain greater accuracy. Initially, we developed a training set database from different segmented characters. Several tests were done by varying the number of epochs to observe the change in accuracy. The accuracy is more than 95%, which is satisfactory compared to related works, and the system also recognizes blurred number plates.


2013 ◽  
Vol 8 (1) ◽  
pp. 686-691
Author(s):  
Vneeta Rani ◽  
Dr. Vijay Laxmi

OCR (optical character recognition) is a technology commonly used for pattern recognition in artificial intelligence and computing. With the help of OCR, scanned documents can be converted into editable documents that can be used further in various research areas. In this paper, we present a character segmentation technique that can segment simple characters, skewed characters, and broken characters. Character segmentation is a very important phase in any OCR process because its output serves as input to later phases such as character recognition. If the character segmentation phase fails, recognition of the corresponding character becomes very difficult or nearly impossible.


Optical Character Recognition (OCR) for ancient handwritten documents or palm leaf manuscripts is carried out in four phases: ‘line segmentation’, ‘word segmentation’, ‘character segmentation’, and ‘character recognition’. The colour images of palm leaf manuscripts are converted into binary images using various pre-processing methods. The first phase of an OCR has to overcome the hurdles of touching and overlapping lines. Character recognition becomes futile when the line segmentation is erroneous. For Tamil palm leaf manuscript recognition, there are only a handful of line segmentation methods, and the available methods do not meet the required standards. This article aims to fill this gap in line segmentation methods for Tamil document analysis. The proposed method is compared with line segmentation algorithms that work on binary images, such as Adaptive Partial Projection (APP) and A* Path Planning (A*PP). The evaluation tools and metrics are taken from the ICDAR 2013 Handwriting Segmentation Contest.
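
As a rough illustration of contest-style evaluation, the sketch below computes a detection rate (DR), a recognition accuracy (RA), and their harmonic mean (FM) from one-to-one matches between ground-truth and detected regions. It is a simplified sketch, not the contest's evaluation tool: regions are assumed to be sets of pixel coordinates, a pair counts as a match when intersection over union reaches a threshold, and the default of 0.95 is an assumption. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Region = Set[Tuple[int, int]]  # a text line as a set of (row, col) ink pixels


def match_score(gt: Region, det: Region) -> float:
    """Intersection over union of two pixel sets."""
    union = gt | det
    return len(gt & det) / len(union) if union else 0.0


def evaluate(
    ground_truth: List[Region], detected: List[Region], threshold: float = 0.95
) -> Tuple[float, float, float]:
    """Return (DR, RA, FM) from one-to-one matches at the given threshold."""
    one_to_one = 0
    used = set()  # detected regions already matched
    for gt in ground_truth:
        for j, det in enumerate(detected):
            if j not in used and match_score(gt, det) >= threshold:
                used.add(j)
                one_to_one += 1
                break
    dr = one_to_one / len(ground_truth) if ground_truth else 0.0
    ra = one_to_one / len(detected) if detected else 0.0
    fm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0
    return dr, ra, fm


# Two true lines; the segmenter finds the first but splits the second in two.
truth = [{(0, c) for c in range(20)}, {(1, c) for c in range(20)}]
found = [
    {(0, c) for c in range(20)},
    {(1, c) for c in range(10)},
    {(1, c) for c in range(10, 20)},
]
dr, ra, fm = evaluate(truth, found)
```

In this toy case one of two true lines is matched (DR = 0.5) and one of three detections is correct (RA = 1/3), so over-segmentation lowers both scores.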


2018 ◽  
Vol 2 (1) ◽  
pp. 18-27
Author(s):  
Rasty Yaseen ◽  
Hossein Hassani

Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and uses several scripts for writing. The Persian/Arabic script is widely used among these dialects. It is written from Right to Left (RTL), is cursive, and uses unique diacritics. These features, particularly the last two, affect the segmentation stage in developing a Kurdish OCR. In this article, we introduce an enhanced method based on character segmentation that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR using documents of different fonts, font sizes, and image resolutions. The experiments showed that the character recognition accuracy of the proposed method was 90.82% on average.

