Using Transfer Learning to contextually Optimize Optical Character Recognition (OCR) output and perform new Feature Extraction on a digitized cultural and historical dataset

Author(s):  
Aravind Inbasekaran ◽  
Rajesh Kumar Gnanasekaran ◽  
Richard Marciano
Author(s):  
FANG-HSUAN CHENG ◽  
WEN-HSING HSU

This paper describes typical research on Chinese optical character recognition in Taiwan. Chinese characters can be represented by a set of basic line segments called strokes. Several approaches to the recognition of handwritten Chinese characters by stroke analysis are described here. A typical optical character recognition (OCR) system consists of four main parts: image preprocessing, feature extraction, radical extraction and matching. Image preprocessing converts the input into a format suitable for data processing. Feature extraction extracts stable features from the Chinese character. Radical extraction decomposes the Chinese character into radicals. Finally, matching recognizes the Chinese character. There are two reasons for using strokes as the features for Chinese character recognition. First, all Chinese characters can be represented by a combination of strokes. Second, algorithms built on the concept of strokes do not have to be modified when the number of characters increases. Therefore, the algorithms described in this paper are suitable for recognizing large sets of Chinese characters.


2020 ◽  
Vol 8 (4) ◽  
pp. 453
Author(s):  
Widya Dharma Sidi ◽  
I Gede Arta Wibawa

Abstract: This research was conducted to determine the accuracy of the Sum of Squared Difference (SSD) Template Matching method in the Learning Numbers Writing Game application. The game application was created to help young children recognize Arabic numerals, namely the digits 0 to 9. The SSD Template Matching method involves several processes: preprocessing, thinning, feature extraction, and classification (SSD template matching). Testing of the game application involved 10 respondents, who were asked to write the numbers requested by the application. Each number-writing test was repeated 3 times. The tests yielded an accuracy of 94.67%. Keywords: Template Matching, Sum of Squared Difference (SSD), Education Game, Optical Character Recognition, Mobile Learning
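
The classification step named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x3 binary grids, the `TEMPLATES` dictionary, and the function names `ssd` and `classify` are all hypothetical, and the preprocessing, thinning, and feature extraction stages are assumed to have already produced same-sized grids.

```python
# Minimal sketch of SSD template matching on tiny binary grids.
# Assumption: every image and template has already been preprocessed
# and resized to the same dimensions (here 3x3, purely for illustration).

def ssd(a, b):
    """Sum of squared differences between two equal-sized 2D grids."""
    return sum(
        (pa - pb) ** 2
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )

def classify(image, templates):
    """Return the label of the template with the smallest SSD to the image."""
    return min(templates, key=lambda label: ssd(image, templates[label]))

# Hypothetical templates; a real system would store one or more per digit 0-9.
TEMPLATES = {
    "0": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "1": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "7": [[1, 1, 1], [0, 0, 1], [0, 0, 1]],
}

# A "1" with one noisy pixel in the bottom-right corner.
noisy_one = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]

if __name__ == "__main__":
    for label, template in TEMPLATES.items():
        print(label, ssd(noisy_one, template))
    print("predicted:", classify(noisy_one, TEMPLATES))
```

A lower SSD means a closer match, so the predicted label is the template that minimizes the score. In this toy input the noisy digit differs from the "1" template in a single pixel, which keeps it closest to that template.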


Author(s):  
Sk. Md. Obaidullah ◽  
K. C. Santosh ◽  
Nibaran Das ◽  
Chayan Halder ◽  
Kaushik Roy

Script identification is crucial for automating optical character recognition (OCR) in multi-script documents since OCRs are script-dependent. In this paper, we present a comprehensive survey of the techniques developed for handwritten Indic script identification. Different pre-processing and feature extraction techniques, including classifiers used for script identification, are categorized and their merits and demerits are discussed. We also provide information about some handwritten Indic script datasets. Finally, we highlight the extensions and/or future scope of works together with challenges.


Author(s):  
Sandip Kundu ◽  
Hrishi Singh Chhabra ◽  
Sahi Summa Ara ◽  
Rishi Prakash Mishra ◽  
...  

Author(s):  
S. IMPEDOVO ◽  
L. OTTAVIANO ◽  
S. OCCHINEGRO

To highlight the open problems and current results in the state of the art of optical character recognition (OCR), this paper describes and compares preprocessing, feature extraction and postprocessing techniques for commercial reading machines. Problems related to handwritten and printed character recognition are pointed out, and the functions and operations of the major components of an OCR system are described. A brief historical background on the development of character recognition is given, and the operation of an optical scanner is explained. The specifications of several commercially available recognition systems are reported and compared.

