Line Segmentation Challenges in Tamil Language Palm Leaf Manuscripts

Optical Character Recognition (OCR) for ancient handwritten documents such as palm leaf manuscripts proceeds in four phases: line segmentation, word segmentation, character segmentation, and character recognition. Colour images of palm leaf manuscripts are first converted into binary images using various pre-processing methods. The first phase of OCR must overcome the hurdles of touching and overlapping lines, because character recognition becomes futile when the line segmentation is erroneous. For Tamil palm leaf manuscript recognition, only a handful of line segmentation methods exist, and the available methods do not meet the required standards. This article aims to fill that gap by proposing a line segmentation method for Tamil document analysis. The proposed method is compared against line segmentation algorithms that operate on binary images, namely Adaptive Partial Projection (APP) and A* Path Planning (A*PP). The evaluation tools and criteria follow the metrics of the ICDAR 2013 Handwriting Segmentation Contest.
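The projection-based family of baselines mentioned above (such as APP) can be illustrated with a minimal sketch: sum the ink pixels of each row of the binary page and cut at low-ink valleys of that profile. The function name, the valley threshold, and the minimum line height below are illustrative assumptions, not the article's algorithm.

```python
import numpy as np

def segment_lines_by_projection(binary_img, min_height=3):
    """Split a binary page image (1 = ink, 0 = background) into text-line
    bands by cutting at low-ink valleys of the horizontal projection profile.
    Real palm leaf pages with touching or overlapping lines need adaptive or
    partial profiles; this is only the basic idea."""
    profile = binary_img.sum(axis=1)            # ink count per row
    threshold = 0.05 * profile.max()            # illustrative valley threshold
    is_text_row = profile > threshold

    lines, start = [], None
    for y, in_text in enumerate(is_text_row):
        if in_text and start is None:
            start = y                           # entering a text band
        elif not in_text and start is not None:
            if y - start >= min_height:
                lines.append((start, y))        # (top, bottom) of a text line
            start = None
    if start is not None:
        lines.append((start, len(is_text_row)))
    return lines
```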

Author(s): P. Soujanya, Vijaya Kumar Koppula, Kishore Gaddam

Segmentation of text lines is one of the important steps in an Optical Character Recognition system, and it is a pre-processing step for word and character segmentation. Text line segmentation is relatively simple for printed documents with distinct spaces between the lines, but it becomes far more complex for documents in which text lines overlap, touch, are curvilinear, or vary in inter-line spacing, as in Telugu scripts and skewed documents. The main objective of this project is to investigate different text line segmentation algorithms, such as Projection Profiles, Run Length Smearing, and Adaptive Run Length Smearing, on low-quality documents. These methods are implemented and their accuracy and results are compared.
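As a rough sketch of the Run Length Smearing idea investigated here: background runs shorter than a threshold are filled with ink so that neighbouring glyphs merge into easily separable blobs. The threshold value and function name below are assumptions for illustration only.

```python
import numpy as np

def rlsa_horizontal(binary_img, smear_threshold=20):
    """Horizontal Run Length Smearing: fill background runs (0s) shorter than
    `smear_threshold` with ink (1s) so characters on the same text line merge
    into connected blobs. The adaptive variant would vary the threshold with
    local stroke statistics instead of using a fixed value."""
    smeared = binary_img.copy()
    for row in smeared:
        run_start = None
        for x, px in enumerate(row):
            if px == 0 and run_start is None:
                run_start = x                       # start of a background run
            elif px == 1 and run_start is not None:
                if x - run_start <= smear_threshold:
                    row[run_start:x] = 1            # smear the short gap
                run_start = None
    return smeared
```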


Author(s): Ipsita Pattnaik, Tushar Patnaik

Optical Character Recognition (OCR) converts printed text into a computer-understandable format that is editable in nature. Odia is a regional language used in Odisha, West Bengal, and Jharkhand by over forty million people, and its user base is still growing. Such wide dependence on the language makes it important to preserve its script and to obtain a digital, editable version of Odia text. We propose a framework that takes an image of computer-printed Odia script as input and produces a computer-readable, user-editable version of the same text, recognizing the characters printed in the input image. The system applies various techniques to improve the image and then performs line segmentation, followed by word segmentation, and finally character segmentation using horizontal and vertical projection profiles.
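The vertical projection step of such a pipeline can be sketched as follows: within a segmented line, wide background gaps in the column-wise ink profile mark word boundaries. The gap threshold and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def segment_words_by_vertical_projection(line_img, min_gap=8):
    """Split a binary text-line image (1 = ink) into word boxes by finding
    background gaps wider than `min_gap` columns in the vertical projection
    profile. Narrower gaps are treated as inter-character spacing."""
    profile = line_img.sum(axis=0)              # ink count per column
    in_ink = profile > 0

    words, start, gap = [], None, 0
    for x, ink in enumerate(in_ink):
        if ink:
            if start is None:
                start = x                       # entering a word
            gap = 0
        elif start is not None:
            gap += 1
            if gap > min_gap:                   # wide gap => word boundary
                words.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        words.append((start, len(in_ink)))
    return words                                # list of (left, right) columns
```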


1994, Vol. 04 (01), pp. 193-207
Author(s): Vadim Biktashev, Valentin Krinsky, Hermann Haken

The possibility of using nonlinear media as a highly parallel computation tool is discussed, specifically for image classification and recognition. Some approaches of this type are known that are based on stationary dissipative structures which can "measure" scalar products of images. In this paper, we exploit the analogy between binary images and point sets and use the Hausdorff metric to compare images. It does not require a measure at all and is based only on the metric of the space whose subsets we consider. In addition to the Hausdorff distance, we suggest a new "nonlinear" version of this distance for comparing images, called the "autowave" distance. This distance can be calculated very easily and yields additional advantages for pattern recognition (e.g. noise tolerance). The method is illustrated on the problem of machine reading (Optical Character Recognition) and compared with several well-known OCR programs for the PC. On a medium-quality photocopy of a journal page, under the same conditions of learning and recognition, the autowave approach resulted in far fewer mistakes. The method can be realized using only one chip with a simple uniform connection of the elements, in which case it yields an increase in computation speed of several orders of magnitude.
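The classical Hausdorff distance underlying this comparison can be written down directly: treat each binary image as the set of its ink-pixel coordinates and take the largest nearest-neighbour distance between the two sets. The brute-force sketch below is only the baseline metric; the paper's "autowave" variant is not reproduced.

```python
import numpy as np

def hausdorff_distance(img_a, img_b):
    """Hausdorff distance between two binary images viewed as point sets of
    their ink pixels: max over both directions of the distance from a point
    in one set to its nearest neighbour in the other. O(N*M) sketch suitable
    only for small images."""
    pts_a = np.argwhere(img_a > 0).astype(float)   # (row, col) of ink pixels
    pts_b = np.argwhere(img_b > 0).astype(float)
    # pairwise Euclidean distances between the two point sets
    d = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=2)
    directed_ab = d.min(axis=1).max()              # sup_a inf_b ||a - b||
    directed_ba = d.min(axis=0).max()              # sup_b inf_a ||a - b||
    return max(directed_ab, directed_ba)
```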


Optical Character Recognition has been an active research area in computer science for several years, and research has been undertaken on various languages of India. In this paper, an attempt is made to determine the accuracy of word and character segmentation for Hindi (the national language of India) and for Odia, a regional language spoken mostly in Odisha and a few states of eastern India, and a comparative analysis is presented. Ten sets each of printed Odia and Devanagari scripts with different word limits were used in this study. The documents were scanned at 300 dpi before the pre-processing and segmentation procedures were applied. The results show that the percentage of accuracy in both word and character segmentation is higher for Odia than for Hindi. One of the reasons is the header line used in Hindi, which makes the segmentation process cumbersome. Thus, it can be concluded that the accuracy level can vary from one language to another and from word segmentation to character segmentation.
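The header-line complication mentioned above is often handled by locating and removing the densest row band of a Devanagari word image before character segmentation. The sketch below shows that standard heuristic; the band thickness and function name are assumptions, not the procedure used in this study.

```python
import numpy as np

def strip_header_line(word_img):
    """Remove the Devanagari header line from a binary word image (1 = ink)
    so the characters hanging from it separate into distinct components.
    The header is taken as the row with the maximum horizontal ink count,
    cleared over a small assumed thickness."""
    profile = word_img.sum(axis=1)
    header_row = int(profile.argmax())            # densest row = header line
    band = max(1, word_img.shape[0] // 20)        # assumed header thickness
    stripped = word_img.copy()
    stripped[max(0, header_row - band):header_row + band + 1, :] = 0
    return stripped
```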


2019, Vol. 8 (1), pp. 50-54
Author(s): Ashok Kumar Bathla, Sunil Kumar Gupta

Optical Character Recognition (OCR) technology allows a computer to "read" text (both typed and handwritten) the way a human brain does. Significant research effort has gone into the OCR of typewritten text in various languages; however, very little work addresses the segmentation and skew correction of handwritten text written in Devanagari, the script of Hindi. This paper presents a novel technique for the segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a given handwritten word.
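A common projection-based way to estimate skew, given here only as a generic sketch and not as this paper's method, is to rotate the binary image over candidate angles and keep the angle that makes the horizontal projection profile most sharply peaked. The angle range and step are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary_img, max_angle=10.0, step=0.5):
    """Estimate document skew by rotating a binary image (1 = ink) over a
    range of candidate angles and picking the angle that maximizes the
    variance of the horizontal projection profile (sharpest line peaks).
    Rotating the image by the returned angle deskews it."""
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(binary_img, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)
        score = profile.var()                 # peaky profile => aligned lines
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```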


Sensors, 2019, Vol. 19 (13), pp. 3015
Author(s): Farman Ullah, Hafeez Anwar, Iram Shahzadi, Ata Ur Rehman, Shizra Mehmood, ...

This paper proposes a sensor platform to control a barrier installed at a vehicle entrance. The platform is automated by image-based license plate recognition of the vehicle. However, in situations where standardized license plates are not used, such image-based recognition becomes non-trivial and challenging due to variations in license plate background, fonts, and deformations. The proposed method first detects the approaching vehicle via ultrasonic sensors and, at the same time, captures its image via a camera installed along with the barrier. From this image, the license plate is automatically extracted and further processed to segment the license plate characters. Finally, these characters are recognized with the help of a standard optical character recognition (OCR) pipeline. The evaluation of the proposed system shows an accuracy of 98% for license plate extraction, 96% for character segmentation, and 93% for character recognition.
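The character segmentation stage of such a pipeline is often done with binarization plus connected components on the cropped plate. The sketch below shows that generic approach; the minimum-area filter and function name are assumptions, not the authors' implementation.

```python
import cv2

def segment_plate_characters(plate_img_gray, min_area=50):
    """Segment characters from a cropped grayscale license-plate image via
    Otsu binarization and connected-component analysis, returning character
    bounding boxes ordered left to right for the OCR stage."""
    # Otsu threshold, inverted so characters become white foreground blobs
    _, binary = cv2.threshold(plate_img_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area >= min_area:                    # drop specks / noise
            boxes.append((x, y, w, h))
    return sorted(boxes, key=lambda b: b[0])    # left-to-right reading order
```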


2021, Vol. 11 (6), pp. 7968-7973
Author(s): M. Kazmi, F. Yasir, S. Habib, M. S. Hayat, S. A. Qazi

Urdu Optical Character Recognition (OCR) based on character-level recognition (the analytical approach) is less popular than ligature-level recognition (the holistic approach) due to its added complexity and the overlapping of characters and strokes. This paper presents a holistic Urdu ligature extraction technique. The proposed Photometric Ligature Extraction (PLE) technique is independent of font size and column layout and is capable of handling non-overlapping as well as all inter- and intra-overlapping ligatures. It uses a customized photometric filter along with X-shearing, padding, and connected component analysis to extract complete ligatures instead of extracting primary and secondary ligatures separately. A total of approximately 267,800 ligatures were extracted from scanned images of printed Urdu Nastaliq text with an accuracy of 99.4%. Thus, the proposed framework outperforms the existing Urdu Nastaliq text extraction and segmentation algorithms. The proposed PLE framework can also be applied to other languages that use the Nastaliq script style, such as Arabic, Persian, Pashto, and Sindhi.
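The X-shearing step referred to above can be sketched as a row-wise horizontal shift proportional to the row index, which brings the slanted Nastaliq ligature body and its diacritics closer together before connected component analysis. The shear factor and function name below are illustrative assumptions, not the values used by the PLE framework.

```python
import numpy as np

def x_shear(binary_img, shear=0.3):
    """Apply a horizontal (X) shear to a binary image: each row is shifted
    right in proportion to its distance from the top, so slanted ligatures
    and their diacritics are more likely to fall into one connected
    component. Assumes a non-negative shear factor."""
    h, w = binary_img.shape
    max_shift = int(shear * (h - 1))
    sheared = np.zeros((h, w + max_shift), dtype=binary_img.dtype)
    for y in range(h):
        shift = int(shear * y)                  # lower rows shift further
        sheared[y, shift:shift + w] = binary_img[y]
    return sheared
```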


Optimal automatic character recognition for Tamil palm leaf manuscripts can be achieved only with efficient segmentation of touching characters. In this article, touching characters are segmented into individual characters so that the recognizer in the Optical Character Recognition (OCR) system can reach an optimal solution. The proposed method offers a novel approach to touching character segmentation in Tamil palm leaf manuscripts. First, the background image and the foreground characters are separated by filtering the palm leaf images, and unwanted fragments of characters are removed using noise removal methods. A thickening process overcomes the difficulty of small breakages in the characters. The aspect ratio of a character image is used to categorize it as a single character or as multiple touching characters, and single-touching cases are further divided into horizontal and vertical touching. Finally, the proposed algorithm for horizontal and vertical character segmentation, named the HorVer method, is applied to the horizontally and vertically touching characters to segment them into independent characters. Experimental results show 91% accuracy in segmenting touching characters in Tamil palm leaf manuscript images collected from various sources and from the Tamil Heritage Foundation (THF). The proposed algorithm thus provides a novel approach to Tamil touching character segmentation.
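The aspect-ratio categorization described above can be illustrated with a tiny sketch: a width-to-height ratio inside a "single glyph" band means no split is needed, while ratios above or below the band suggest horizontally or vertically touching characters. The ratio bounds and function name are illustrative assumptions; the article derives its own thresholds for Tamil palm leaf glyphs.

```python
def classify_touching(char_img, single_ratio=(0.6, 1.6)):
    """Classify a binary character image by its width/height aspect ratio:
    within the band -> single glyph; wider -> horizontally touching pair;
    taller -> vertically touching pair."""
    h, w = char_img.shape
    ratio = w / h
    low, high = single_ratio
    if low <= ratio <= high:
        return "single"
    return "horizontal_touching" if ratio > high else "vertical_touching"
```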


1993, Vol. 5 (6), pp. 885-892
Author(s): Jeffrey N. Kidder, Daniel Seligson

We describe a hardware solution to a high-speed optical character recognition (OCR) problem. Noisy 15 × 10 binary images of machine-written digits were processed and applied as input to Intel's Electrically Trainable Analog Neural Network (ETANN). In software simulation, we trained an 80 × 54 × 10 feedforward network using a modified version of backpropagation. We then downloaded the synaptic weights of the trained network to ETANN and tweaked them to account for differences between the simulation and the chip itself. The best recognition error rate was 0.9% in hardware, with a 3.7% rejection rate, on a 1000-character test set.
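For orientation, the simulated 80 × 54 × 10 architecture amounts to a single-hidden-layer forward pass like the minimal sketch below; the weights here are random placeholders, the input encoding from 15 × 10 pixels to 80 features is not specified in the abstract, and neither the modified backpropagation training nor the analog weight download is reproduced.

```python
import numpy as np

def forward(x, w1, b1, w2, b2):
    """Forward pass of an 80-54-10 feedforward network with sigmoid units,
    the shape simulated before downloading weights to ETANN."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(x @ w1 + b1)          # 80 inputs -> 54 hidden units
    return sigmoid(hidden @ w2 + b2)       # 54 hidden -> 10 digit outputs

rng = np.random.default_rng(0)
w1, b1 = rng.normal(scale=0.1, size=(80, 54)), np.zeros(54)
w2, b2 = rng.normal(scale=0.1, size=(54, 10)), np.zeros(10)
x = rng.random(80)                         # placeholder 80-dim feature vector
digit = int(forward(x, w1, b1, w2, b2).argmax())
```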

