Label Transcript is Done – Now what do we do with that Data?

Natural history collection labels are being transcribed by a variety of methods, including in-house curators, commercial operations, citizen scientists, visiting researchers, linked data, optical character recognition (OCR), and handwritten text recognition (HTR). But what can a collections data manager do with this flood of data? A whole raft of questions surrounds this incoming data stream: who values it, who needs it, where it is stored, where it is displayed, and who has access to it. This talk addresses these topics with reference to the Royal Botanic Garden Edinburgh herbarium dataset.


CNN-Based Page Segmentation and Object Classification for Counting Population in Ottoman Archival Documentation

Journal of Imaging ◽

10.3390/jimaging6050032 ◽

2020 ◽

Vol 6 (5) ◽

pp. 32 ◽

Cited By ~ 1

Author(s):

Yekta Said Can ◽

M. Erdem Kabadayı

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Text Recognition ◽

Historical Documents ◽

Layout Analysis ◽

Page Segmentation ◽

Handwritten Text ◽

Handwritten Text Recognition ◽

Different Types ◽

Archival Documentation

Historical document analysis systems are gaining importance with the increasing efforts to digitize archives. Page segmentation and layout analysis are crucial steps for such systems, because errors in these steps affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods. Degradation of documents, digitization errors, and varying layout styles complicate the segmentation of historical documents. The properties of Arabic scripts, such as connected letters, ligatures, diacritics, and different writing styles, make Arabic-script historical documents even more challenging to process. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire, held between the 1840s and 1860s. We achieved promising results for classifying different types of objects, counting the individuals, and assigning them to populated places.
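The counting step described in this abstract can be illustrated with a minimal sketch. It assumes the CNN has already produced a list of detected objects in reading order; the labels `"place"` and `"individual"`, the `"unassigned"` bucket, and the function name are hypothetical and are not taken from the paper.

```python
def count_individuals(detections):
    """Count individuals per populated place.

    detections: list of (label, name) tuples in reading order.
    The labels "place" and "individual" are hypothetical stand-ins
    for the object classes a segmentation model might emit.
    """
    counts = {}
    current_place = "unassigned"  # individuals seen before any place marker
    for label, name in detections:
        if label == "place":
            current_place = name
            counts.setdefault(current_place, 0)
        elif label == "individual":
            counts[current_place] = counts.get(current_place, 0) + 1
    return counts


page = [
    ("place", "A"),
    ("individual", None),
    ("individual", None),
    ("place", "B"),
    ("individual", None),
]
per_place = count_individuals(page)
```

Each individual is attributed to the most recent place marker that precedes it, which is one plausible reading of "assigning them to populated places".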


OPTICAL CHARACTER RECOGNITION FOR ELECTRONIC INVOICES USING AWS SERVICES

International Journal of Engineering Applied Sciences and Technology ◽

10.33564/ijeast.2021.v06i05.036 ◽

2021 ◽

Vol 6 (5) ◽

Author(s):

Sameer M. Patel ◽

Sarvesh S. Pai ◽

Mittal B. Jain ◽

Vaibhav P. Vasani

Keyword(s):

Character Recognition ◽

Web Application ◽

Optical Character Recognition ◽

Credit Cards ◽

Text Recognition ◽

Service Architecture ◽

The Past ◽

Optical Character ◽

Handwritten Text

Optical Character Recognition (OCR) is the mechanical or electronic conversion of printed or handwritten text into machine-readable text. The difficulty of OCR under varying conditions remains as relevant as it has been in past years. Even in the present era of automation and innovation, keyboarding remains the most common way of entering data into computers, and it is probably the most time-consuming and labor-intensive operation in the industry. Automating the recognition of documents, credit cards, electronic invoices, and car license plates could save time in analyzing and processing data. With increased research and development in machine learning, the quality of text recognition is continuously improving. Our paper briefly explains the different stages involved in optical character recognition and, through the proposed application, aims to automate the extraction of important text from electronic invoices. The main goal of the project is to develop a real-time OCR web application with a microservice architecture that helps extract the necessary information from an invoice.


SCENE TEXT RECOGNITION BY USING EE-MSER AND OPTICAL CHARACTER RECOGNITION FOR NATURAL IMAGES

International Journal of Advance Engineering and Research Development ◽

10.21090/ijaerd.021219 ◽

2015 ◽

Vol 2 (12) ◽

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Natural Images ◽

Text Recognition ◽

Optical Character ◽

Scene Text ◽

Scene Text Recognition


Optical Character Recognition for English Handwritten Text Using Recurrent Neural Network

2020 International Conference on System, Computation, Automation and Networking (ICSCAN) ◽

10.1109/icscan49426.2020.9262379 ◽

2020 ◽

Author(s):

R. Parthiban ◽

R. Ezhilarasi ◽

D. Saravanan

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Character Recognition ◽

Optical Character Recognition ◽

Optical Character ◽

Handwritten Text


Aplikasi Kalkulator Tulisan Tangan Sederhana Menggunakan Optical Character Recognition (OCR)

Applied Technology and Computing Science Journal ◽

10.33086/atcsj.v3i2.1867 ◽

2021 ◽

Vol 3 (2) ◽

pp. 103-116

Author(s):

Supriadi Supriadi

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Text Recognition ◽

Arithmetic Operations ◽

Written Text ◽

Optical Character ◽

Calculation Results

The calculator is a calculation tool that is widely used in various specialized fields of business and commerce. A calculator makes it easier to perform arithmetic operations, but entering numbers becomes an obstacle when the values to be calculated are written on media such as paper or whiteboards. The user must first look at the text on the written medium, read and remember it, and then type it into a calculator tool or application. The drawback of this approach is that when the user forgets what was written, the user must look at the text and memorize it again, so the calculation takes longer. The method used in this study is Optical Character Recognition, which can recognize text contained in images, including handwritten images of arithmetic operations on numbers. The recognized text is then evaluated arithmetically to obtain the calculation result. In trials on 20 handwritten images of arithmetic operations, extraction accuracy was 85%, and 85% of the handwritten images were calculated correctly.
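The step from recognized text to a calculation result can be sketched as follows. This is a minimal illustration, not the application described in the paper: it assumes the OCR stage has already returned a string such as `"12 x 3 ="`, and the function name and the symbol substitutions are assumptions made for the example.

```python
import ast
import operator

# Arithmetic operators the sketch is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def evaluate_recognized(text):
    """Evaluate arithmetic text as it might come out of an OCR step."""
    # Map symbols commonly seen in handwriting to Python operators.
    cleaned = (
        text.replace("×", "*").replace("x", "*").replace("÷", "/").replace("=", "")
    ).strip()
    tree = ast.parse(cleaned, mode="eval")
    return _eval(tree.body)


result = evaluate_recognized("12 x 3 =")
```

Walking the syntax tree instead of calling `eval` keeps misrecognized characters from being executed as arbitrary code; anything other than numbers and the four basic operators raises an error.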


Transcript Anatomization with Multi-Linguistic and Speech Synthesis Features

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35371 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 1755-1758

Author(s):

Rohan Modi

Keyword(s):

Pattern Recognition ◽

Character Recognition ◽

Optical Character Recognition ◽

Speech Synthesis ◽

Handwriting Recognition ◽

Cost Effective ◽

Computer Hardware ◽

Handwritten Text ◽

Handwritten Text Recognition ◽

Audio Output

Handwriting Detection is the ability of a computer program to collect and analyze intelligible handwritten input from various types of media, such as photographs, newspapers, and paper reports. Handwritten Text Recognition is a sub-discipline of Pattern Recognition, which refers to the classification of datasets or objects into various categories or classes. Handwriting Recognition is the process of transforming handwritten text in a specific language into its digital representation as a set of symbols known as letters or characters. Speech synthesis is the artificial production of human speech using machine-learning-based software and audio-output computer hardware. While many systems convert natural language text into speech, the aim of this paper is to study Optical Character Recognition with speech synthesis technology and to develop a cost-effective, user-friendly, image-based offline text-to-speech conversion system using a CRNN neural network model and a Hidden Markov Model. The automated interpretation of handwritten text can be very useful wherever large amounts of handwritten data must be processed, such as signature verification, analysis of various types of documents, and recognition of handwritten amounts on bank cheques.


Research on Deep Learning Techniques in Breaking Text-Based Captchas and Designing Image-Based Captcha

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-900 ◽

2021 ◽

pp. 266-269

Author(s):

Janarthanan A ◽

Pandiyarajan C ◽

Sabarinathan M ◽

Sudhan M ◽

Kala R

Keyword(s):

Deep Learning ◽

Image Classification ◽

Character Recognition ◽

Optical Character Recognition ◽

Experimental Results ◽

Text Recognition ◽

Image Resizing ◽

Optical Character ◽

Learning Techniques ◽

Text Images

Optical character recognition (OCR) is the process of recognizing text in images (here, one word per image). The input images are taken from the dataset, and the collected text images are passed to pre-processing, where image resizing is applied. Image resizing is needed to increase or decrease the total number of pixels, whereas remapping occurs when zooming, which increases the number of pixels so that the zoomed image shows clear content. Segmentation follows, in which each character of the word is segmented. Feature values (the test features) are then extracted from the image. In the classification step, the text in the image is classified: image classification is performed on the images to identify which of them contain text, and a classifier is used for this purpose. The experimental results report the resulting accuracy.
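The segmentation of a word image into characters can be illustrated with a vertical projection profile, a common baseline that is swapped in here for illustration; the abstract does not say which segmentation method the authors used. The sketch assumes a binarized image given as a list of rows, with 1 for ink and 0 for background.

```python
def segment_columns(binary_image):
    """Split a binarized word image into character column spans.

    binary_image: list of equal-length rows, 1 = ink, 0 = background.
    Returns a list of (start, end) column indices, end exclusive.
    """
    if not binary_image:
        return []
    width = len(binary_image[0])
    # A column "has ink" if any row has a foreground pixel in it.
    has_ink = [any(row[c] for row in binary_image) for c in range(width)]

    spans = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c  # a character begins
        elif not ink and start is not None:
            spans.append((start, c))  # a blank column ends the character
            start = None
    if start is not None:
        spans.append((start, width))  # character touching the right edge
    return spans


word = [
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]
spans = segment_columns(word)
```

This baseline only works when characters are separated by at least one blank column; touching or cursive characters need a more elaborate method.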


Character Segmentation and Skew Correction for Handwritten Devanagari Scripts: A Friends Technique

Asian Journal of Engineering and Applied Technology ◽

10.51983/ajeat-2019.8.1.1060 ◽

2019 ◽

Vol 8 (1) ◽

pp. 50-54

Author(s):

Ashok Kumar Bathla ◽

Sunil Kumar Gupta

Keyword(s):

Human Brain ◽

Character Recognition ◽

Optical Character Recognition ◽

Character Segmentation ◽

Novel Technique ◽

Skew Correction ◽

Optical Character ◽

Handwritten Text ◽

Scripting Language ◽

The Way

Optical Character Recognition (OCR) technology allows a computer to “read” text (both typed and handwritten) the way a human brain does. Significant research effort has gone into Optical Character Recognition (OCR) of typewritten text in various languages; however, very little effort has been devoted to the segmentation and skew correction of handwritten text in Devanagari, the script used to write Hindi. This paper proposes a novel technique for segmentation and skew correction of handwritten Devanagari text. It achieves an accuracy of 91% and takes less than one second to segment a given handwritten word.


Classification of printed and handwritten text using hybrid techniques for gurumukhi script

International Journal Of Engineering And Computer Science ◽

10.18535/ijecs/v8i04.4298 ◽

2019 ◽

Vol 8 (04) ◽

pp. 24586-24602

Author(s):

Manpreet Kaur ◽

Balwinder Singh

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Classification Problem ◽

Hybrid Techniques ◽

Optical Character ◽

Handwritten Text ◽

Scanned Images ◽

Character Classification ◽

Incorrect Classification

Text classification is a crucial step in optical character recognition. The output of a scanner is non-editable, so a scanned text image cannot be changed when required; this motivates optical character recognition. Optical Character Recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text into a computer-readable format. The OCR process involves several steps after image acquisition, including pre-processing, segmentation, feature extraction, and classification. Incorrect classification leads to garbage in, garbage out. Existing methods focus only on the classification of unmixed characters in Arabic, English, Latin, Farsi, Bangla, and Devanagari scripts. The hybrid technique addresses the mixed (machine-printed and handwritten) character classification problem. Classification is carried out on different kinds of everyday forms, such as self-declaration forms, admission forms, verification forms, university forms, certificates, banking forms, dairy forms, and Punjab government forms. The proposed technique is able to classify handwritten and machine-printed text written in Gurumukhi script within mixed text. It has been tested on 150 different kinds of forms in Gurumukhi and Roman scripts, achieving 93% accuracy on mixed-character forms and 96% accuracy on unmixed-character forms. The overall accuracy of the proposed technique is 94.5%.
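The reported overall figure is consistent with an unweighted mean of the two per-category accuracies. Whether the authors computed it this way, and whether the mixed and unmixed sets are the same size, is an assumption; the short check below only shows that the arithmetic matches.

```python
# Accuracies as reported in the abstract (percent).
mixed_accuracy = 93.0
unmixed_accuracy = 96.0

# Unweighted mean; this assumes both categories carry equal weight,
# which the abstract does not state.
overall_accuracy = (mixed_accuracy + unmixed_accuracy) / 2
```

If the two categories contained different numbers of forms, a weighted mean would generally give a different value.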


SVM and HMM Classifier Combination Based Approach for Online Handwritten Indic Character Recognition

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666181127124711 ◽

2020 ◽

Vol 13 (2) ◽

pp. 200-214

Author(s):

Rajib Ghosh ◽

Prabhat Kumar

Keyword(s):

Character Recognition ◽

Text Recognition ◽

Support Vector ◽

Present System ◽

Novel Approach ◽

Handwritten Text ◽

Handwritten Text Recognition ◽

Shafer Theory ◽

Public Datasets ◽

Indic Scripts

Background: The growing use of smart hand-held devices in people's daily lives creates a need for online handwritten text recognition. Online handwritten text recognition refers to identifying handwritten text at the very moment it is written on a digitizing tablet with a pen-like stylus. Several techniques are available for online handwritten text recognition in English, Arabic, Latin, Chinese, Japanese, and Korean scripts. However, limited research is available for Indic scripts. Objective: This article presents a novel approach for online handwritten numeral and character (simple and compound) recognition in three popular Indic scripts: Devanagari, Bengali, and Tamil. Methods: The proposed work employs the Zone wise Slopes of Dominant Points (ZSDP) method for feature extraction from the individual characters. Support Vector Machine (SVM) and Hidden Markov Model (HMM) classifiers are used for the recognition process. Recognition efficiency is improved by combining the probabilistic outcomes of the SVM and HMM classifiers using Dempster-Shafer theory. The system is trained using separate as well as combined datasets of numerals and of simple and compound characters. Results: The performance of the present system is evaluated using large self-generated datasets as well as public datasets. The results demonstrate that the proposed system outperforms existing work in this regard. Conclusion: This work will be helpful for research on online recognition of handwritten characters in other Indic scripts, as well as recognition of isolated words in various Indic scripts, including the scripts used in the present work.
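The combination of two classifiers' probabilistic outputs with Dempster-Shafer theory can be sketched as below. This is a simplified illustration, not the authors' implementation: it assumes each classifier's probabilities are discounted by a hypothetical reliability value, with the remaining mass assigned to the whole set of classes (`THETA`), and then fused with Dempster's rule of combination. The class labels, probabilities, and reliabilities are invented for the example.

```python
THETA = "THETA"  # the full frame of discernment ("could be any class")


def discount(probs, reliability):
    """Turn class probabilities into masses, leaving 1 - reliability on THETA."""
    masses = {label: reliability * p for label, p in probs.items()}
    masses[THETA] = 1.0 - reliability
    return masses


def dempster_combine(m1, m2):
    """Dempster's rule for masses on single classes plus THETA."""
    labels = (set(m1) | set(m2)) - {THETA}
    t1, t2 = m1.get(THETA, 0.0), m2.get(THETA, 0.0)
    combined = {}
    for label in labels:
        a, b = m1.get(label, 0.0), m2.get(label, 0.0)
        # Agreeing evidence: both say `label`, or one says `label`
        # while the other is uncommitted (THETA).
        combined[label] = a * b + a * t2 + t1 * b
    combined[THETA] = t1 * t2
    total = sum(combined.values())  # equals 1 minus the conflicting mass
    if total == 0:
        raise ValueError("sources are in total conflict")
    return {label: mass / total for label, mass in combined.items()}


# Hypothetical classifier outputs for one character sample.
svm_masses = discount({"ka": 0.7, "kha": 0.3}, reliability=0.9)
hmm_masses = discount({"ka": 0.4, "kha": 0.6}, reliability=0.8)
fused = dempster_combine(svm_masses, hmm_masses)
predicted = max((k for k in fused if k != THETA), key=fused.get)
```

Products of masses assigned to two different classes count as conflict and are removed by the normalization, so classes on which both classifiers agree gain weight relative to the rest.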
