Optical character recognition of typeset Coptic text with neural networks

Digital Humanities (DH) within Coptic Studies, an emerging field, will be greatly aided by the digitization of large quantities of typeset Coptic texts. Until recently, the only Optical Character Recognition (OCR) work on printed Coptic texts had been carried out by Moheb S. Mekhaiel, who used the Tesseract program to create a text model for liturgical books in the Bohairic dialect of Coptic. However, this model is not suitable for the many scholarly editions of texts in the Sahidic dialect, which use noticeably different fonts. In the current study, DH and Coptological projects based in Göttingen, Germany, collaborated to develop a new Coptic OCR pipeline suitable for all Coptic dialects. The objective was to generate a model that can facilitate digital Coptic Studies and produce Coptic corpora from existing printed texts. First, we compared the two available OCR programs that can recognize Coptic: Tesseract and Ocropy. The results indicated that the neural network model, i.e. Ocropy, performed better at recognizing the letters with supralinear strokes that characterize published Sahidic texts. After training Ocropy for Coptic using artificial neural networks, the team achieved an accuracy rate of >91% for the OCR analysis of Coptic typeset. We subsequently compared the efficiency of Ocropy with that of manual transcription and concluded that using Ocropy to extract Coptic from digital images of printed texts is highly beneficial to Coptic DH.
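The abstract reports an accuracy rate without defining it. As a hedged sketch, assuming accuracy means 1 minus the character error rate against a manual transcription, the Python below scores OCR output so that a letter carrying a supralinear stroke counts as one unit. Encoding the stroke as a base letter followed by U+0305 COMBINING OVERLINE is an assumed transcription convention, and the function names are hypothetical; this is not the study's evaluation code.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it,
    so a letter plus its supralinear stroke is compared as one unit."""
    units: list[str] = []
    for ch in unicodedata.normalize("NFC", text):
        if units and unicodedata.combining(ch):
            units[-1] += ch  # attach the mark (e.g. U+0305) to the previous letter
        else:
            units.append(ch)
    return units


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def char_accuracy(ground_truth: str, ocr_output: str) -> float:
    """Accuracy = 1 - CER, clipped at 0, measured on letter+mark clusters."""
    gt, hyp = clusters(ground_truth), clusters(ocr_output)
    if not gt:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(gt, hyp) / len(gt))


# Coptic nu with a combining overline, then tau, o and fei.
truth = "\u2c9b\u0305\u2ca7\u2c9f\u03e5"
missed_stroke = "\u2c9b\u2ca7\u2c9f\u03e5"  # OCR dropped the supralinear stroke
print(char_accuracy(truth, truth))          # 1.0
print(char_accuracy(truth, missed_stroke))  # 0.75
```

Counting clusters rather than code points means a dropped stroke costs exactly one substitution, which matches how a reader would count the error.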

Bus Attendance System using Optical Character Recognition

International Journal of Engineering and Advanced Technology - Regular Issue · DOI: 10.35940/ijeat.d7732.049420 · 2020 · Vol 9 (4) · pp. 2133-2136

Keyword(s): Neural Network, Character Recognition, Optical Character Recognition, License Plate, License Plate Recognition, Surveillance Camera, Optical Character, The Neural Network, Character Extraction, Image Dataset

Managing attendance records for staff, students, employees, or buses is a tedious task. This project automates bus attendance through vehicle license plate recognition. Because each vehicle's license plate is unique, it can be used to mark bus attendance efficiently. RFID-based bus attendance systems are time-consuming, so we developed a system that marks attendance using number plate recognition and OCR. The system was trained with a Faster R-CNN model on a dataset of bus images. In the proposed system, a surveillance camera captures the number plate, and the captured image is passed as input to the neural network, which detects the number plate. The characters are then extracted with OCR and matched against the database, and attendance is marked for the corresponding bus.
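The final matching step can be sketched in Python. This assumes plate detection (the Faster R-CNN stage) and OCR have already produced a text string; the plate number, the registry dictionary and the function names are hypothetical, and normalizing to upper-case alphanumerics is an assumed cleanup step rather than one described in the abstract.

```python
import re
from datetime import date
from typing import Optional


def normalize_plate(raw: str) -> str:
    """Upper-case the OCR text and drop spaces, hyphens and other non-alphanumerics."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def mark_attendance(ocr_text: str,
                    registered: dict[str, str],
                    attendance: dict[date, set[str]],
                    day: date) -> Optional[str]:
    """Match the plate against the registry and record the bus as present on `day`.

    Returns the bus ID, or None when the plate is not registered."""
    bus_id = registered.get(normalize_plate(ocr_text))
    if bus_id is None:
        return None
    attendance.setdefault(day, set()).add(bus_id)  # a set, so repeat sightings count once
    return bus_id


registry = {"KA01AB1234": "Bus 7"}  # hypothetical plate -> bus ID mapping
log: dict[date, set[str]] = {}
today = date(2020, 4, 1)
print(mark_attendance("ka 01 ab-1234", registry, log, today))  # Bus 7
print(mark_attendance("XX99ZZ0000", registry, log, today))     # None
```

Normalizing before lookup keeps the registry independent of spacing or case differences in the OCR output; OCR misreads (e.g. 0 versus O) would still need separate handling.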

Image Spam Filtering using Machine Learning Techniques

International Journal of Recent Technology and Engineering - 2 · DOI: 10.35940/ijrte.b1035.0782s419 · 2019 · Vol 8 (2S4) · pp. 186-190

Keyword(s): Neural Networks, Artificial Neural Networks, Character Recognition, Optical Character Recognition, Spam Filtering, Data Filtering, Visual Data, Optical Character, Image Spam, Artificial Neural

Unsolicited visual data is undesirable in any form. Hiding malicious content in images and attaching them to e-mails has become a popular nuisance. In recent years, attackers have developed various new techniques to evade traditional spam classification systems. Text-based spam classification has been a focus for a long time, and researchers have built effective systems for identifying spam text in e-mails using Optical Character Recognition (OCR) technology. In the last decade, extensive work has been done to tackle image spam, but with unsatisfactory results. Various algorithms and data augmentation techniques are used today to develop an optimal model for image spam recognition. Many of these proposed systems come close to the ideal but do not achieve 100 percent accuracy. This paper highlights the role of three popular techniques in image spam filtering: we discuss the importance and application of Optical Character Recognition, Support Vector Machines, and Artificial Neural Networks in filtering unsolicited visual data, and we explain the algorithms behind them. We compare their accuracy, which yields useful insights for developing a robust classification system for unsolicited visual data, and we aim to clarify the feasibility of using these techniques to build such a filtering system. The most favourable results are obtained with Artificial Neural Networks.
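To make the ANN side of the comparison concrete, here is a minimal Python sketch of the simplest artificial neural network, a single perceptron, classifying text that an OCR step is assumed to have already extracted from image attachments. The toy messages, the bag-of-words features and the function names are illustrative assumptions; the surveyed systems use far richer models and features.

```python
def features(text: str) -> set[str]:
    """Bag-of-words presence features from (assumed) OCR-extracted text."""
    return set(text.lower().split())


def score(weights: dict[str, float], bias: float, feats: set[str]) -> float:
    return bias + sum(weights.get(token, 0.0) for token in feats)


def classify(weights: dict[str, float], bias: float, text: str) -> int:
    """+1 = spam, -1 = legitimate."""
    return 1 if score(weights, bias, features(text)) > 0 else -1


def train_perceptron(samples: list[tuple[str, int]], max_epochs: int = 100):
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for text, label in samples:
            if classify(weights, bias, text) != label:
                # Perceptron rule: shift the weights towards the true label.
                for token in features(text):
                    weights[token] = weights.get(token, 0.0) + label
                bias += label
                mistakes += 1
        if mistakes == 0:  # all training messages classified correctly
            break
    return weights, bias


training = [
    ("FREE prize click now", 1),
    ("winner claim free gift", 1),
    ("meeting agenda attached", -1),
    ("quarterly report draft", -1),
]
weights, bias = train_perceptron(training)
print(classify(weights, bias, "claim your free prize"))   # 1
print(classify(weights, bias, "agenda for the meeting"))  # -1
```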

Some Remarks on the Application of Artificial Neural Networks to Optical Character Recognition

Artificial Intelligence and Knowledge Engineering Applications: A Bioinspired Approach - Lecture Notes in Computer Science · DOI: 10.1007/11499305_54 · 2005 · pp. 529-537

Author(s): A. Moratilla, I. Olmeda

Keyword(s): Neural Networks, Artificial Neural Networks, Character Recognition, Optical Character Recognition, Optical Character, Artificial Neural

On Artificial Neural Networks’ Modelling of Non-Properly Prepared Teachers’ Implication on Students’ Academic Performance, Adopting Noisy Contaminated Optical Character Recognition (OCR)

DOI: 10.9734/bpi/nper/v2/13229d · 2021 · pp. 119-131

Author(s): Hassan M. H. Mustafa, Mohamed I. A. Ibrahim

Keyword(s): Neural Networks, Academic Performance, Artificial Neural Networks, Character Recognition, Optical Character Recognition, Optical Character, Artificial Neural

Development and training of neural networks for character recognition

Connectivity · DOI: 10.31673/2412-9070.2020.063338 · 2020 · Vol 148 (6)

Author(s): R. D. Bukov, I. S. Shcherbyna, O. V. Nehodenko, Ye. S. Tykhonov

Keyword(s): Neural Network, Neural Networks, Artificial Neural Networks, Character Recognition, Intelligent Systems, Recognition System, The Neural Network, Reference Images, Artificial Neural, Self Learning

This article discusses the application of neural networks to character recognition, as well as the development of methods and algorithms for synthesizing neural networks. Systems based on artificial neural networks are often used to optimize character recognition systems. However, artificial neural networks are not a tool for solving every type of problem: they are unsuitable for tasks such as payroll, but they have an advantage in character recognition tasks, which conventional personal computers perform poorly or not at all. Artificial neural networks have been shown to be usable for predictive modeling, adaptive control, and other applications in which they can be trained on a dataset. Experiential self-learning can occur in networks that draw inferences from a complex and seemingly unrelated set of information. The article shows how neural networks are applied to practical problems of character recognition and classification. Here, images can denote objects of different kinds: text symbols, pictures, or sound samples. During training, the network is shown various sample images together with the class each one belongs to. After training, the network can be presented with previously unseen images and will indicate which class each belongs to. The topology of such a network is characterized by an output layer whose number of neurons is, as a rule, equal to the number of classes, which establishes a correspondence between each output of the network and the class it represents. A training method is proposed in which the person managing the network takes a direct part in training: that person supplies the reference images of all symbols as well as distorted versions of these standards ("plagued copies").
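The training scheme described above (reference images plus distorted copies, with one output neuron per class) can be sketched in Python as a single-layer network trained with the multi-class perceptron rule. The 5x5 glyphs, the single-pixel-flip distortion and the function names are illustrative assumptions, not the authors' method.

```python
# Hypothetical 5x5 reference images ("standards") for three classes.
REFERENCES = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}


def to_vector(rows: list[str]) -> list[int]:
    """Encode ink as +1 and background as -1."""
    return [1 if ch == "#" else -1 for row in rows for ch in row]


def distorted_copies(vec: list[int]) -> list[list[int]]:
    """All copies with exactly one pixel flipped (stand-ins for 'plagued copies')."""
    copies = []
    for i in range(len(vec)):
        copy = list(vec)
        copy[i] = -copy[i]
        copies.append(copy)
    return copies


def predict(weights: dict[str, list[float]], x: list[int]) -> str:
    """One output neuron per class; the most strongly activated class wins."""
    return max(weights, key=lambda c: sum(w * xi for w, xi in zip(weights[c], x)))


def train(samples: list[tuple[list[int], str]],
          max_epochs: int = 1000) -> dict[str, list[float]]:
    classes = sorted({label for _, label in samples})
    n = len(samples[0][0])
    weights = {c: [0.0] * n for c in classes}
    for _ in range(max_epochs):
        mistakes = 0
        for x, label in samples:
            guess = predict(weights, x)
            if guess != label:
                # Pull the correct output neuron towards x, push the wrong one away.
                for i, xi in enumerate(x):
                    weights[label][i] += xi
                    weights[guess][i] -= xi
                mistakes += 1
        if mistakes == 0:  # every training image is classified correctly
            break
    return weights


samples = []
for label, rows in REFERENCES.items():
    ref = to_vector(rows)
    samples.append((ref, label))
    samples.extend((copy, label) for copy in distorted_copies(ref))

weights = train(samples)
print(predict(weights, to_vector(REFERENCES["T"])))  # T
```

Because the three glyphs differ in at least four pixels, the clean and one-pixel-distorted images are linearly separable, so the perceptron rule is guaranteed to stop making mistakes well within the epoch limit.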

Exchanging image processing and OCR components in a Setswana digitisation pipeline

South African Computer Journal · DOI: 10.18489/sacj.v32i2.707 · 2020 · Vol 32 (2)

Author(s): Gideon Jozua Kotzé, Friedel Wolff

Keyword(s): Neural Network, Neural Networks, Image Processing, Natural Language Processing, Language Processing, Character Recognition, Optical Character Recognition, Optical Character, Combined Test, A Current

As more natural language processing (NLP) applications benefit from neural-network-based approaches, it makes sense to re-evaluate existing work in NLP. A complete digitisation pipeline includes several components that process the material in sequence, and image processing after the document is scanned has been shown to be an important factor in final quality. Here we compare two approaches for visually enhancing documents before Optical Character Recognition (OCR): (1) a combination of ImageMagick and Unpaper, and (2) OCRopus. We also compare Calamari, a new line-based OCR package using neural networks, with the well-known Tesseract 3 as the OCR component. Our evaluation on a set of Setswana documents shows that combining ImageMagick/Unpaper with Calamari improves on a current baseline based on Tesseract 3 and ImageMagick/Unpaper by over 30%, achieving a mean character error rate of 1.69 across all combined test data.
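The relative-improvement figure can be unpacked with the usual definition of character error rate, CER = (S + D + I)/N for S substitutions, D deletions and I insertions against a reference of N characters. Assuming (the abstract does not say so explicitly) that "over 30%" is a relative reduction in mean CER and that both error rates are measured in the same units on the same test data, the reported 1.69 implies a lower bound on the baseline's error rate:

```latex
\mathrm{CER} = \frac{S + D + I}{N}, \qquad
\Delta_{\mathrm{rel}} = \frac{\mathrm{CER}_{\mathrm{base}} - \mathrm{CER}_{\mathrm{new}}}{\mathrm{CER}_{\mathrm{base}}} > 0.30
\;\Longrightarrow\;
\mathrm{CER}_{\mathrm{base}} > \frac{\mathrm{CER}_{\mathrm{new}}}{0.70} = \frac{1.69}{0.70} \approx 2.41
```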

The Application of optical character recognition for mobile device via artificial neural networks with negative correlation learning algorithm

2013 International Conference on Electronics, Computer and Computation (ICECCO) · DOI: 10.1109/icecco.2013.6718268 · 2013 · Cited by ~2

Author(s): Burcu Kir, Cemil Oz, Ali Gulbag

Keyword(s): Neural Networks, Artificial Neural Networks, Mobile Device, Negative Correlation, Character Recognition, Optical Character Recognition, Learning Algorithm, Negative Correlation Learning, Optical Character, Correlation Learning

Pengantar dan Survey Tentang Optical Music Recognition (An Introduction and Survey of Optical Music Recognition)

Jurnal ULTIMATICS · DOI: 10.31937/ti.v6i1.331 · 2014 · Vol 6 (1) · pp. 36-39

Author(s): Kevin Purwito

Keyword(s): Character Recognition, Optical Character Recognition, Optical Music Recognition, Optical Character, Digital Format, Music Recognition, Index Terms, The Many, Further Development, Music Symbol

This paper describes one of the many extensions of Optical Character Recognition (OCR): Optical Music Recognition (OMR). OMR converts sheet music into a digital format such as MIDI or MusicXML. Sheet music uses many symbols that OMR must recognize, such as staves; treble, bass, alto, and tenor clefs; sharps, flats, and naturals; beams, staccato, staccatissimo, dynamics, tenuto, marcato, stopped notes, harmonics, and fermatas; notes; rests; ties and slurs; and mordents and turns. OMR usually consists of four main processes: Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction, and Final Representation Construction. Each of these processes uses different methods and algorithms, and each still needs further research and development. Many applications already use OMR, but none gives perfect results. Therefore, in addition to research on each OMR process, there is a need for research on a combined recognizer, which merges the results of different OMR applications to increase the accuracy of the final result. Index Terms: music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer
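In its simplest form, the combined recognizer proposed above could be a position-wise majority vote over the symbol sequences produced by several OMR applications. The Python sketch below assumes the outputs have already been aligned to the same length (real systems would need a sequence-alignment step first); the symbol names and the `combine` function are hypothetical.

```python
from collections import Counter


def combine(outputs: list[list[str]]) -> list[str]:
    """Position-wise majority vote over aligned recognizer outputs.

    Ties go to the symbol seen first, i.e. from the earliest recognizer in the list."""
    if not outputs:
        return []
    length = len(outputs[0])
    if any(len(seq) != length for seq in outputs):
        raise ValueError("recognizer outputs must be aligned to the same length")
    return [Counter(column).most_common(1)[0][0] for column in zip(*outputs)]


recognizer_a = ["treble_clef", "sharp", "quarter_note", "rest"]
recognizer_b = ["treble_clef", "natural", "quarter_note", "rest"]
recognizer_c = ["bass_clef", "sharp", "quarter_note", "rest"]
print(combine([recognizer_a, recognizer_b, recognizer_c]))
# ['treble_clef', 'sharp', 'quarter_note', 'rest']
```

Each single-recognizer error (the natural from b, the bass clef from c) is outvoted by the other two, which is the accuracy gain a combined recognizer aims for.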

Towards a Higher Accuracy of Optical Character Recognition of Chinese Rare Books in Making Use of Text Model

Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage - DATeCH2019 · DOI: 10.1145/3322905.3322922 · 2019

Author(s): Hsiang-An Wang, Pin-Ting Liu

Keyword(s): Character Recognition, Optical Character Recognition, Optical Character, Text Model, Rare Books

Artificial neural networks applied for soil class prediction in mountainous landscape of the Serra do Mar

Revista Brasileira de Ciência do Solo · DOI: 10.1590/s0100-06832014000600003 · 2014 · Vol 38 (6) · pp. 1681-1693 · Cited by ~7

Author(s): Braz Calderano Filho, Helena Polivanov, César da Silva Chagas, Waldir de Carvalho Júnior, Emílio Velloso Barroso, ...

Keyword(s): Neural Network, Neural Networks, Artificial Neural Networks, Kappa Index, Class Prediction, Soil Class, Serra Do Mar, The Neural Network, Artificial Neural, Local Geology

Soil information is needed to manage the agricultural environment. The aim of this study was to apply artificial neural networks (ANNs) to predict soil classes, using orbital remote sensing products, terrain attributes derived from a digital elevation model, and local geology information as data sources. This approach to digital soil mapping was evaluated in an area of high lithological diversity in the Serra do Mar. The study used the JavaNNS neural network simulator with the backpropagation learning algorithm. For soil class prediction, different combinations of the selected discriminant variables were tested: elevation, declivity, aspect, curvature, plan curvature, profile curvature, topographic index, solar radiation, LS topographic factor, local geology information, and clay mineral indices, iron oxides, and the normalized difference vegetation index (NDVI) derived from a Landsat-7 Enhanced Thematic Mapper Plus (ETM+) image. Among the tested sets, the best results were obtained when all discriminant variables were combined with geological information (set 13: overall accuracy 93.2 to 95.6%, Kappa index 0.924 to 0.951). Excluding the profile curvature variable (set 12), overall accuracy ranged from 93.9 to 95.4% and the Kappa index from 0.932 to 0.948. The maps produced by the neural network classifier were consistent with, and similar to, conventional soil maps of the study area, although with more spatial detail. The results show the potential of ANNs for soil class prediction in mountainous areas with lithological diversity.
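The Kappa index reported alongside overall accuracy corrects the observed agreement for agreement expected by chance. The minimal Python sketch below computes Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), from a confusion matrix; the 2x2 matrix is hypothetical and not taken from the study, and the function names are illustrative.

```python
def overall_accuracy(matrix: list[list[int]]) -> float:
    """Fraction of samples on the diagonal (observed agreement p_o)."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohen_kappa(matrix: list[list[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    total = sum(map(sum, matrix))
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Expected agreement if predictions were independent of the reference classes.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


confusion = [[20, 5],
             [10, 15]]
print(overall_accuracy(confusion))       # 0.7
print(round(cohen_kappa(confusion), 3))  # 0.4
```

Here 70% raw agreement shrinks to a kappa of 0.4 because half of the agreement would be expected by chance alone; kappa values above 0.9, as in the study, therefore indicate agreement well beyond chance.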
