A Structured Method for the Recognition of Complex Historical Tables

In principle, printed source material should be made machine-readable with Optical Character Recognition (OCR) systems rather than being typed once more. Off-the-shelf commercial OCR programs, however, tend to be inadequate for lists with a complex layout. The tax assessment lists covering most nineteenth-century farms in Norway are one example among a series of valuable sources that can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in recognizing material with a complex table structure and outlines a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.

Corpus Linguistics and Eighteenth Century Collections Online (ECCO)

Research in Corpus Linguistics ◽ 10.32714/ricl.09.01.03 ◽ 2021 ◽ Vol 9 (1) ◽ pp. 19-34

Author(s): Mikko Tolonen ◽ Eetu Mäkelä ◽ Ali Ijaz ◽ Leo Lahti

Keyword(s): Eighteenth Century ◽ Corpus Linguistics ◽ Character Recognition ◽ Optical Character Recognition ◽ Historical Source ◽ Optical Character ◽ Key Aspects ◽ Machine Readable ◽ Machine Readable Form

Eighteenth Century Collections Online (ECCO) is the most comprehensive dataset available in machine-readable form for eighteenth-century printed texts. It plays a crucial role in studies of eighteenth-century language and has vast potential for corpus linguistics. At the same time, it is an unbalanced corpus that poses a series of different problems. The aim of this paper is to offer a general overview of ECCO for corpus linguistics by analysing, for example, its publication countries and languages. We also analyse the role of the substantial number of reprints and new editions in the data, and discuss genres and estimates of Optical Character Recognition (OCR) quality. Our conclusion is that whereas ECCO provides a valuable source for corpus linguistics, scholars need to pay attention to historical source criticism. We highlight key aspects that need to be taken into account when considering its possible uses.

Optical character recognition (OCR) using partial least square (PLS) based feature reduction: an application to artificial intelligence for biometric identification

Journal of Enterprise Information Management ◽ 10.1108/jeim-02-2020-0076 ◽ 2020 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Zainab Akhtar ◽ Jong Weon Lee ◽ Muhammad Attique Khan ◽ Muhammad Sharif ◽ Sajid Ali Khan ◽ ...

Keyword(s): Artificial Intelligence ◽ Character Recognition ◽ Optical Character Recognition ◽ Feature Reduction ◽ Partial Least Square ◽ Least Square ◽ License Plate ◽ Content Type ◽ Optical Character ◽ Machine Readable

Purpose: In artificial intelligence, optical character recognition (OCR) is an active research area, driven by well-known applications such as automation and the transformation of printed documents into machine-readable text. The major purpose of OCR in academia and banks is to achieve significant performance while saving storage space. Design/methodology/approach: A novel technique is proposed for automated OCR based on multi-property feature fusion and selection. The features are fused using a serial formulation, and the output is passed to a partial least square (PLS) based selection method. The selection is based on an entropy fitness function. The final features are classified by an ensemble classifier. Findings: The presented method was extensively tested on two datasets, the authors' proposed dataset and the Chars74k benchmark, and achieved accuracies of 91.2% and 99.9%. Comparison with existing techniques shows that the proposed method gives improved performance. Originality/value: The technique presented in this work will help with license plate recognition and with converting text from printed documents into machine-readable form.
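
The fusion and selection steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: serial fusion is taken to mean concatenating feature vectors, and a simple ranking by Shannon entropy stands in for the PLS-based selection, whose details the abstract does not give. The function names, the binning scheme, and the toy descriptors are all hypothetical, and the ensemble classifier is left out.

```python
import math
from collections import Counter
from typing import List, Sequence


def fuse_serially(*feature_sets: Sequence[float]) -> List[float]:
    """Serial fusion (assumed meaning): concatenate one sample's feature vectors."""
    fused: List[float] = []
    for features in feature_sets:
        fused.extend(features)
    return fused


def column_entropy(values: Sequence[float], bins: int = 4) -> float:
    """Shannon entropy in bits of one feature column after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def select_top_k(samples: List[List[float]], k: int) -> List[int]:
    """Return the indices of the k feature columns with the highest entropy.

    This entropy ranking is a stand-in, not the PLS-based selection itself.
    """
    n_features = len(samples[0])
    scores = [column_entropy([row[j] for row in samples]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical shape and texture descriptors for four character images.
shape = [[0.1, 1.0], [0.4, 1.0], [0.7, 1.0], [0.9, 1.0]]
texture = [[5.0], [5.0], [6.0], [6.0]]
fused = [fuse_serially(s, t) for s, t in zip(shape, texture)]
selected = select_top_k(fused, k=2)  # the constant second column is dropped
```

In this toy data the second shape feature is constant across samples, so it scores zero entropy and is the one discarded.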

Optical Character Recognition System for Nastalique Urdu-Like Script Languages Using Supervised Learning

International Journal of Pattern Recognition and Artificial Intelligence ◽ 10.1142/s0218001419530045 ◽ 2019 ◽ Vol 33 (10) ◽ pp. 1953004 ◽ Cited By ~ 2

Author(s): S. S. R. Rizvi ◽ A. Sagheer ◽ K. Adnan ◽ A. Muhammad

Keyword(s): Supervised Learning ◽ Character Recognition ◽ Optical Character Recognition ◽ Complex Structure ◽ Recognition Rate ◽ Digital Text ◽ Memory Space ◽ Optical Character ◽ Digital Format ◽ Printed Text

There are two main techniques for converting written or printed text into digital format. The first is to create an image of the written or printed text, but images are large and require a great deal of memory to store, and text in image form cannot undergo further processing such as editing, searching, or copying. The second is to use an Optical Character Recognition (OCR) system. OCR systems can read documents and convert written or printed text into digital text, which can then be processed to extract knowledge. A huge amount of Urdu-language data is available in handwritten or printed form and needs to be converted into digital format for knowledge acquisition. Its highly cursive, structurally complex, bidirectional, and compound nature makes the Urdu language too complex for accurate OCR results to be obtained. In this study, a supervised learning-based OCR system is proposed for Nastalique Urdu. Evaluated under a variety of experimental settings, the proposed system achieves 98.4% on training data and 97.3% on test data, which is the highest recognition rate ever achieved by any Urdu-language OCR system. The proposed system is simple to implement, especially on the software side of an OCR system. The technique is useful for printed as well as handwritten text, and it will help in developing more accurate Urdu OCR software systems in the future.

Spatio Partitioning of Character Image for Automatic Recognition of Digits

International Journal of Recent Technology and Engineering - 2 ◽ 10.35940/ijrte.c6650.098319 ◽ 2019 ◽ Vol 8 (3) ◽ pp. 8171-8177

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ Fast Response ◽ Software Systems ◽ Decision Tree Classifier ◽ Optical Character ◽ Tree Classifier ◽ Spatially Distributed ◽ Machine Readable ◽ Machine Readable Form

In today's world, there is a growing demand for software systems that recognize characters when information is scanned from paper documents, since many newspapers and books on different subjects exist in printed form. The current capacity to translate paper documents quickly and accurately into machine-readable form using optical character recognition technology increases the opportunities for document searching and storage as well as automated document processing. A fast response when translating large collections of image-based electronic documents into structured electronic documents is still a problem. As an enhancement to optical character recognition [1] (OCR) technology, I propose a framework that recognizes printed digits in a character image using a “spatio partitioning method”. The proposed system efficiently recognizes the digits 0 to 9 in different font sizes, based on a new concept of feature extraction, and classifies them with a decision tree classifier. The efficiency and time complexity of the proposed system are also described. Partitioning is based on the pixel distribution of the character image; the pixel distribution describes the pattern of the character through its spatially distributed foreground pixels.
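
As a rough illustration of partitioning a character image by its pixel distribution, the sketch below splits a binary image into a grid of zones and uses the fraction of foreground pixels in each zone as a feature. The abstract does not specify the actual partitioning scheme, so the grid layout, the function name `zone_densities`, and the toy glyph are assumptions. The decision tree classifier that would consume these features is not shown.

```python
from typing import List


def zone_densities(image: List[List[int]], rows: int, cols: int) -> List[float]:
    """Split a binary image (1 = foreground) into rows x cols zones and return
    the fraction of foreground pixels in each zone, in row-major order."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(image[y][x] for y in range(top, bottom) for x in range(left, right))
            features.append(ink / area if area else 0.0)
    return features


# A 4x4 toy glyph resembling the digit "1": one vertical stroke in the third column.
glyph = [
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
features = zone_densities(glyph, 2, 2)  # [0.0, 0.5, 0.0, 0.5]
```

Because each feature is a ratio rather than a raw pixel count, glyphs of the same shape at different sizes yield similar vectors, which is one plausible way to handle multiple font sizes.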

Feature Extraction Techniques Based on Swarm Intelligence in OCR

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.l2480.1081219 ◽ 2019 ◽ Vol 8 (12) ◽ pp. 13-19

Keyword(s): Feature Extraction ◽ Character Recognition ◽ Optical Character Recognition ◽ Feature Vector ◽ Extraction Techniques ◽ Optical Character ◽ Readable Form ◽ Machine Readable ◽ Machine Readable Form ◽ Recognition Technique

Optical Character Recognition has been an active field within pattern recognition and machine learning over the last decade. This article identifies suitable techniques for better recognition of characters in documents and their conversion into machine-readable form. The work is related to Content-Based Image Retrieval (CBIR) systems, which solve the problem of searching for images in huge datasets. Recognition techniques for handwritten characters have not yet been developed efficiently because of variations in size, shape, style, slant, etc. in human handwriting. To overcome such problems, the focus is on feature extraction and on algorithms that handle such variation. In this paper, independent component analysis is used for feature extraction. For feature vector selection, particle swarm optimization (PSO) and firefly algorithms are applied. It is observed that, owing to the distributed neighborhood pixels of an image, PSO gives better recognition rates.

Selection Technique for Multiple Outputs of Optical Character Recognition

Eurasian Journal of Mathematical and Computer Applications ◽ 10.32523/2306-6172-2020-8-2-41-51 ◽ 2020 ◽ Vol 8 (2) ◽ pp. 41-51

Author(s): I.Q. Habeeb ◽ Z.Q. Al-Zaydi ◽ H.N. Abdulkhudhur

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ Selection Technique ◽ Multiple Outputs ◽ Optical Character

CNN-based Rain Reduction in Street View Images

London Imaging Meeting ◽ 10.2352/issn.2694-118x.2020.lim-12 ◽ 2020 ◽ Vol 2020 (1) ◽ pp. 78-81

Author(s): Simone Zini ◽ Simone Bianco ◽ Raimondo Schettini

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ State Of The Art ◽ Weather Conditions ◽ Specific Interest ◽ Optical Character ◽ Street View ◽ In The Wild ◽ Bad Weather ◽ Detection And Recognition

Rain removal from pictures taken under bad weather conditions is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually constitute the input for subsequent Computer Vision tasks such as detection and classification. In this paper, we present a Convolutional Neural Network, based on the Pix2Pix model, for removing rain streaks from images, with specific interest in evaluating the results of the processing operation with respect to the Optical Character Recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for "text detection and recognition" evaluation in bad weather conditions. Experimental results on this dataset show that our model outperforms the state of the art in terms of two commonly used image quality metrics, and that it is capable of improving the performance of an OCR model in detecting and recognising text in the wild.

Analyzing Different Algorithms and Techniques to Find Optical Character Recognition for Tamil Scripts

Journal of Mechanics of Continua and Mathematical Sciences ◽ 10.26782/jmcms.2020.02.00029 ◽ 2020 ◽ Vol 15 (2)

Author(s): Rajkumar N

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ Optical Character

Automated Toll Booth using Optical Character Recognition and RFID System

International Journal of Advanced Trends in Computer Science and Engineering ◽ 10.30534/ijatcse/2019/11842019 ◽ 2019 ◽ Vol 8 (1.4) ◽ pp. 1056-1061

Author(s): Saurabh Ravindra Kulkarn

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ Optical Character ◽ RFID System ◽ Toll Booth

Pengantar dan Survey Tentang Optical Music Recognition (An Introduction to and Survey of Optical Music Recognition)

Jurnal ULTIMATICS ◽ 10.31937/ti.v6i1.331 ◽ 2014 ◽ Vol 6 (1) ◽ pp. 36-39

Author(s): Kevin Purwito

Keyword(s): Character Recognition ◽ Optical Character Recognition ◽ Optical Music Recognition ◽ Optical Character ◽ Digital Format ◽ Music Recognition ◽ Index Terms ◽ The Many ◽ Further Development ◽ Music Symbol

This paper describes one of the many extensions of Optical Character Recognition (OCR), namely Optical Music Recognition (OMR). OMR is used to convert sheet music into a digital format such as MIDI or MusicXML. Many musical symbols are commonly used in sheet music and therefore need to be recognized by OMR, such as the staff; treble, bass, alto and tenor clefs; sharps, flats and naturals; beams, staccato, staccatissimo, dynamics, tenuto, marcato, stopped notes, harmonics and fermatas; notes; rests; ties and slurs; and mordents and turns. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of these four processes uses different methods and algorithms, and each still needs further development and research. Many applications already use OMR, but none gives a perfect result. Therefore, besides development and research on each OMR process, there is also a need for development and research on a combined recognizer that combines the results from different OMR applications to increase the accuracy of the final result. Index Terms: Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer
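
One simple way to picture a combined recognizer is a per-symbol majority vote over the outputs of several OMR applications. The sketch below assumes that each application returns a sequence of symbol labels already aligned position by position and of equal length, which real OMR output generally is not. The function name, the label strings, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def combine_outputs(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Majority vote per symbol position; ties go to the earliest recognizer."""
    combined = []
    for candidates in zip(*outputs):
        counts = Counter(candidates)
        best = max(counts.values())
        # Scan in recognizer order so that ties are resolved deterministically.
        combined.append(next(c for c in candidates if counts[c] == best))
    return combined


# Hypothetical, pre-aligned outputs of three OMR applications for the same bar.
recognizer_a = ["treble_clef", "sharp", "quarter_note", "rest"]
recognizer_b = ["treble_clef", "flat", "quarter_note", "rest"]
recognizer_c = ["bass_clef", "sharp", "half_note", "rest"]
merged = combine_outputs([recognizer_a, recognizer_b, recognizer_c])
# merged == ["treble_clef", "sharp", "quarter_note", "rest"]
```

Voting only helps when the recognizers make different mistakes; aligning their outputs in the first place is the harder problem and is outside the scope of this sketch.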