Kurdish Optical Character Recognition

2018 ◽  
Vol 2 (1) ◽  
pp. 18-27
Author(s):  
Rasty Yaseen ◽  
Hossein Hassani

Currently, no offline tool is available for Optical Character Recognition (OCR) in Kurdish. Kurdish is spoken in different dialects and is written in several scripts, of which the Persian/Arabic script is widely used. The Persian/Arabic script is written from right to left (RTL), it is cursive, and it uses unique diacritics. These features, particularly the last two, affect the segmentation stage in developing a Kurdish OCR. In this article, we introduce an enhanced character-segmentation-based method that addresses these characteristics. We applied the method to text-only images and tested the Kurdish OCR using documents of different fonts, font sizes, and image resolutions. The results of the experiments showed that the accuracy rate of character recognition of the proposed method was 90.82% on average.

Optical Character Recognition has been an active research area in computer science for several years. Several research works have been undertaken on various languages in India. In this paper, an attempt has been made to measure the accuracy of word and character segmentation for Hindi (National language of India) and Odia, a regional language mostly spoken in Odisha and a few eastern Indian states. A comparative analysis is presented in this article. Ten sets each of printed Odia and Devanagari script with different word limits were used in this study. The documents were scanned at 300 dpi before the pre-processing and segmentation procedures were applied. The results show that accuracy in both word and character segmentation is higher for Odia than for Hindi. One of the reasons is the use of the header line in Hindi, which makes the segmentation process cumbersome. Thus, it can be concluded that the accuracy level can vary from one language to another and between word segmentation and character segmentation.


2019 ◽  
Vol 8 (1) ◽  
pp. 50-54
Author(s):  
Ashok Kumar Bathla ◽  
Sunil Kumar Gupta

Optical Character Recognition (OCR) technology allows a computer to “read” text (both typed and handwritten) the way a human brain does. Significant research effort has gone into OCR of typewritten text in various languages; however, very little effort has gone into the segmentation and skew correction of handwritten text written in Devanagari, the script used to write Hindi. This paper presents a novel technique for segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a particular handwritten word.


Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 3015 ◽  
Author(s):  
Farman Ullah ◽  
Hafeez Anwar ◽  
Iram Shahzadi ◽  
Ata Ur Rehman ◽  
Shizra Mehmood ◽  
...  

The paper proposes a sensor platform to control a barrier installed at a vehicle entrance. This platform is automated by image-based license plate recognition of the vehicle. However, in situations where standardized license plates are not used, such image-based recognition becomes non-trivial and challenging due to the variations in license plate background, fonts and deformations. The proposed method first detects the approaching vehicle via ultrasonic sensors and, at the same time, captures its image via a camera installed along with the barrier. From this image, the license plate is automatically extracted and further processed to segment the license plate characters. Finally, these characters are recognized with the help of a standard optical character recognition (OCR) pipeline. The evaluation of the proposed system shows an accuracy of 98% for license plate extraction, 96% for character segmentation and 93% for character recognition.


2018 ◽  
Vol 9 (1) ◽  
pp. 28-44
Author(s):  
Urmila Shrawankar ◽  
Shruti Gedam

Finger spelling in the air lets a user operate a computer, making human-computer interaction easier and faster than with a keyboard or touch screen. This article presents a real-time video-based system that recognizes English letters and words written in the air using finger movements only. Optical Character Recognition (OCR) is used for recognition and is trained on more than 500 different shapes and styles of all letters. The system works under different lighting conditions and adapts automatically to changing conditions, and it gives a natural way of communicating in which no extra hardware is used other than the system camera and a brightly colored tape. The system also does not restrict the writing speed or the color of the tape. Overall, it achieves an average character recognition accuracy of 94.074% across all letters. It is concluded that this system is very useful for communication with deaf and speech-impaired people.


2019 ◽  
Vol 34 (Supplement_1) ◽  
pp. i135-i141
Author(s):  
So Miyagawa ◽  
Kirill Bulert ◽  
Marco Büchler ◽  
Heike Behlmer

Digital Humanities (DH) within Coptic Studies, an emerging field of development, will be much aided by the digitization of large quantities of typeset Coptic texts. Until recently, the only Optical Character Recognition (OCR) analysis of printed Coptic texts had been executed by Moheb S. Mekhaiel, who used the Tesseract program to create a text model for liturgical books in the Bohairic dialect of Coptic. However, this model is not suitable for the many scholarly editions of texts in the Sahidic dialect of Coptic which use noticeably different fonts. In the current study, DH and Coptological projects based in Göttingen, Germany, collaborated to develop a new Coptic OCR pipeline suitable for use with all Coptic dialects. The objective of the study was to generate a model which can facilitate digital Coptic Studies and produce Coptic corpora from existing printed texts. First, we compared the two available OCR programs that can recognize Coptic: Tesseract and Ocropy. The results indicated that the neural network model, i.e. Ocropy, performed better at recognizing the letters with supralinear strokes that characterize the published Sahidic texts. After training Ocropy for Coptic using artificial neural networks, the team achieved an accuracy rate of >91% for the OCR analysis of Coptic typeset. We subsequently compared the efficiency of Ocropy to that of manual transcribing and concluded that the use of Ocropy to extract Coptic from digital images of printed texts is highly beneficial to Coptic DH.


Author(s):  
Ipsita Pattnaik ◽  
Tushar Patnaik

Optical Character Recognition (OCR) converts printed text into a computer-readable, editable format. Odia is a regional language used in Odisha, West Bengal and Jharkhand, and it is spoken by over forty million people. Such wide use makes it important to preserve the script and to obtain a digital, editable version of Odia text. We propose a framework that takes an image of computer-printed Odia script as input, recognizes the characters printed in it, and outputs a computer-readable, user-editable version of the same text. The system uses various techniques to improve the image and performs line segmentation, followed by word segmentation and finally character segmentation, using horizontal and vertical projection profiles.
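As an illustration of projection-profile segmentation in general (not the authors' implementation), the following minimal Python sketch splits a binarized image into lines, words and characters. It assumes ink pixels are 1 and background pixels are 0, and that words are separated by wider blank gaps than characters. The names `runs`, `segment_lines`, `segment_columns` and the `min_gap` threshold are hypothetical choices made for this example.

```python
def runs(profile, min_gap=1):
    """Return (start, end) spans of non-zero profile entries, end exclusive.

    Spans separated by fewer than `min_gap` empty entries are merged, so a
    larger `min_gap` groups characters into words (hypothetical threshold).
    """
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(img):
    """Horizontal projection: ink count per row, split at empty rows."""
    return runs([sum(row) for row in img])


def segment_columns(img, top, bottom, min_gap=1):
    """Vertical projection inside one text line, split at empty columns."""
    band = img[top:bottom]
    return runs([sum(col) for col in zip(*band)], min_gap)


# Toy 5x9 binary image: two text lines, the first holding three glyphs.
img = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(img)                        # row spans of text lines
top, bottom = lines[0]
chars = segment_columns(img, top, bottom)         # every blank column splits
words = segment_columns(img, top, bottom, 2)      # only gaps >= 2 columns split
```

On this toy image the first line yields three character spans and two word spans. Real scans would first need binarization and noise removal, and touching or overlapping glyphs would not be separated by blank columns alone.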


2020 ◽  
Vol 3 (2) ◽  
pp. 234-244
Author(s):  
Siddhartha Roy ◽  

In the last few years, Automatic Number Plate Recognition (ANPR) systems have become widely used for security, safety, and commercial purposes such as parking access control, enforcement of red-light violations, highway speed detection, and stolen vehicle detection. The license plate of any vehicle contains a number of numeric characters to be recognized by the computer. Each country has its own license plate characteristics. With the rapid development of information systems, the manual entry of license plate numbers into a database has been replaced by special intelligent devices operating in real time. Several approaches and techniques have been exploited to achieve better system accuracy and real-time execution. ANPR is the process of recognizing number plates using Optical Character Recognition (OCR) on images. This paper proposes a deep learning-based approach to detect and identify Indian number plates automatically. It is based on new computer vision algorithms for both number plate detection and character segmentation. Training requires several images to obtain greater accuracy. Initially, we developed a training set database from different segmented characters. Several tests were done by varying the number of epochs to observe the change in accuracy. The accuracy is more than 95%, which is acceptable compared with related works, and the system also recognizes blurred number plates.


2020 ◽  
Vol 8 (5) ◽  
pp. 5665-5674

Optical Character Recognition has emerged as an attractive research field. A lot of work has been done on Urdu script using various approaches, and diverse methodologies have been put forward for the Nastaliq font style. Urdu is written diagonally from top to bottom, the style known as Nastaliq. This feature of Nastaliq makes Urdu highly cursive and more sensitive, leading to a difficult recognition problem. Due to the peculiarities of the Nastaliq style of writing, we have chosen the ligature as the basic unit of recognition in order to reduce the complexity of the system. The accuracy of ligature recognition in Urdu text depends on how well the ligatures are segmented. In addition to extracting connected components, the ligature segmentation takes into consideration various factors such as baseline information, height, width, and centroid. In this paper, ligature recognition is performed using a multi-SVM (Support Vector Machine) approach, which gives an accuracy of 97% when 903 text images are fed to it.
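The connected-component step mentioned above can be sketched in Python as follows. This is a generic illustration, not the paper's method: it assumes a binarized image with ink as 1, uses 8-connectivity, and computes only height, width and centroid per component. Baseline handling and the multi-SVM classifier are omitted, and the names `connected_components` and `describe` are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Group ink pixels (value 1) into 8-connected components via BFS."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def describe(pixels):
    """Bounding-box height and width plus (row, column) centroid."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    return {
        "height": max(ys) - min(ys) + 1,
        "width": max(xs) - min(xs) + 1,
        "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
    }


# Toy 3x5 image with two separate blobs.
page = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

components = connected_components(page)
features = [describe(p) for p in components]
```

In a ligature segmenter, features like these could be used to decide whether a small component (for example a dot or diacritic) belongs to a neighboring main body; the decision rule itself is not given in the abstract and is not guessed here.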


2018 ◽  
Vol 1 (1) ◽  
pp. 39-46
Author(s):  
Önder ÖZBEK

The Ottoman alphabet was used to write Turkish with Arabic letters, whereas the modern Turkish alphabet writes Turkish with Latin letters. There are numerous documents written in the Ottoman alphabet in the archives. In this study, images of words written in the Ottoman alphabet were converted into editable text using optical character recognition. The characters in these words were then made understandable by replacing them with their equivalents in the Turkish alphabet. To increase the accuracy rate, the words converted into the Turkish alphabet were compared against a table of Turkish words. An algorithm that gives a similarity value is used for the comparison process.
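The abstract does not name the similarity algorithm. As one plausible stand-in, the Python sketch below uses a normalized Levenshtein (edit) distance to pick the closest entry from a word table. The choice of Levenshtein distance, the function names, and the three-word lexicon are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def best_match(word, lexicon):
    """Return the lexicon entry most similar to the OCR output `word`."""
    return max(lexicon, key=lambda entry: similarity(word, entry))


# Hypothetical word table and a hypothetical OCR output with one wrong letter.
lexicon = ["kitap", "kalem", "defter"]
corrected = best_match("kitab", lexicon)
```

Here `corrected` is the table entry with the highest similarity to the recognized word. A practical system would likely also set a minimum similarity below which the OCR output is left unchanged.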

