SVM and HMM Classifier Combination Based Approach for Online Handwritten Indic Character Recognition

2020, Vol. 13(2), pp. 200-214
Author(s): Rajib Ghosh, Prabhat Kumar

Background: The growing use of smart hand-held devices in daily life creates a need for online handwritten text recognition, which refers to identifying handwritten text at the very moment it is written on a digitizing tablet with a pen-like stylus. Several techniques are available for online handwritten text recognition in English, Arabic, Latin, Chinese, Japanese, and Korean scripts; however, limited research is available for Indic scripts. Objective: This article presents a novel approach for online handwritten numeral and character (simple and compound) recognition in three popular Indic scripts: Devanagari, Bengali, and Tamil. Methods: The proposed work employs the Zone-wise Slopes of Dominant Points (ZSDP) method for feature extraction from individual characters. Support Vector Machine (SVM) and Hidden Markov Model (HMM) classifiers are used for the recognition process, and recognition accuracy is improved by combining the probabilistic outcomes of the SVM and HMM classifiers using Dempster-Shafer theory. The system is trained on separate as well as combined datasets of numerals, simple characters, and compound characters. Results: The performance of the present system is evaluated on large self-generated datasets as well as public datasets, and the results demonstrate that the proposed system outperforms existing works. Conclusion: This work will be helpful for research on online handwritten character recognition in other Indic scripts, as well as on recognition of isolated words in various Indic scripts, including those used in the present work.
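As a rough illustration of the classifier-combination step, the sketch below applies Dempster's rule of combination to the per-class posteriors of an SVM and an HMM, treating each posterior as a basic probability assignment over singleton class hypotheses. This is a minimal sketch under that simplifying assumption; the paper's exact frame of discernment and mass assignment may differ.

```python
import numpy as np

def dempster_combine(svm_probs: np.ndarray, hmm_probs: np.ndarray) -> np.ndarray:
    """Combine two singleton mass functions with Dempster's rule.

    svm_probs, hmm_probs: 1-D arrays of per-class probabilities summing to 1.
    Returns the normalized combined belief over classes.
    """
    joint = svm_probs * hmm_probs      # agreement mass on each class
    total = joint.sum()                # equals 1 - K, where K is the conflict mass
    if np.isclose(total, 0.0):
        raise ValueError("Total conflict: the classifiers support no common class")
    return joint / total               # renormalize by (1 - K)

# Example with hypothetical 3-class posteriors from the two classifiers
svm_out = np.array([0.60, 0.30, 0.10])
hmm_out = np.array([0.50, 0.40, 0.10])
combined = dempster_combine(svm_out, hmm_out)
predicted_class = int(np.argmax(combined))
```

With singleton hypotheses only, the rule reduces to a normalized product of the two posteriors, which rewards classes on which both classifiers agree.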

2020, Vol. 6(5), pp. 32
Author(s): Yekta Said Can, M. Erdem Kabadayı

Historical document analysis systems gain importance with the increasing efforts in the digitalization of archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps will affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increase the importance of the page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles are the issues that complicate the segmentation of historical documents. The properties of Arabic scripts such as connected letters, ligatures, diacritics, and different writing styles make it even more challenging to process Arabic script historical documents. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire held between the 1840s and 1860s. We achieved promising results for classifying different types of objects and counting the individuals and assigning them to populated places.
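The sketch below is not the authors' architecture; it is a minimal, hypothetical CNN classifier for cropped register regions, with illustrative class names (individual entry, place name, other) chosen only to show how counting individuals could reduce to per-page classification of detected regions.

```python
import torch
import torch.nn as nn

class RegionClassifier(nn.Module):
    """Toy CNN that labels a cropped register region (assumed 64x64 grayscale)."""

    def __init__(self, num_classes: int = 3):  # e.g. individual entry / place name / other
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # 64x64 input -> 32x16x16 maps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = RegionClassifier()
dummy_regions = torch.randn(4, 1, 64, 64)   # four dummy region crops
logits = model(dummy_regions)               # shape (4, 3)
```

Counting would then amount to summing regions predicted as "individual entry" per page and assigning them to the nearest preceding "place name" region, which is one plausible reading of the described pipeline rather than its documented implementation.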


2020, Vol. 10(19), pp. 6904
Author(s): Chang-Min Kim, Ellen J. Hong, Kyungyong Chung, Roy C. Park

Recently, demand for handwriting recognition, for example in automated mail sorting, license plate recognition, and electronic memo pads, has increased exponentially in various industrial fields. In image recognition, methods based on convolutional neural networks, which show outstanding performance, have also been applied to handwriting recognition. However, owing to the diversity of recognition application fields, the number of dimensions in the learning and inference processes keeps increasing. Principal component analysis (PCA) is commonly used for dimensionality reduction, but it can incur accuracy loss due to data compression. Therefore, in this paper, we propose a line-segment feature analysis (LFA) algorithm for input dimensionality reduction in handwritten text recognition. The proposed algorithm extracts the line-segment information constituting the input image and assigns a unique value to each segment using 3 × 3 and 5 × 5 filters. By using these unique values to identify the line segments and accumulating their counts, a 1-D vector of size 512 is created and used as the input to machine learning classifiers. For the performance evaluation, the Extended Modified National Institute of Standards and Technology (EMNIST) database was used. In the evaluation, PCA achieved 96.6% and 93.86% accuracy with k-nearest neighbors (KNN) and support vector machine (SVM), respectively, while LFA achieved 97.5% and 98.9% accuracy with KNN and SVM, respectively.
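The following is an illustrative sketch only; it does not reproduce the published LFA filter set or encoding. It shows the general idea of mapping local 3 × 3 binary stroke patterns to unique codes (there are exactly 512 such patterns) and accumulating their counts into a fixed-length feature vector that can feed KNN or SVM.

```python
import numpy as np

def lfa_like_features(binary_img: np.ndarray) -> np.ndarray:
    """binary_img: 2-D array of 0/1 pixels; returns a 1-D count vector of size 512."""
    features = np.zeros(512, dtype=np.float32)
    weights = (2 ** np.arange(9)).reshape(3, 3)    # unique code in 0..511 per 3x3 binary pattern
    h, w = binary_img.shape
    for i in range(h - 2):
        for j in range(w - 2):
            patch = binary_img[i:i + 3, j:j + 3]
            if patch[1, 1]:                        # only encode patches centered on a stroke pixel
                code = int((patch * weights).sum())
                features[code] += 1.0              # accumulate pattern counts
    return features

# Example: a random binary "character" image; the 512-D vector feeds KNN or SVM.
img = (np.random.rand(28, 28) > 0.8).astype(np.uint8)
vec = lfa_like_features(img)
```

The fixed 512-dimensional output is what makes the representation attractive for downstream classifiers, since it is independent of the input image size.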


Author(s): Mohamed Elleuch, Monji Kherallah

In recent years, deep learning (DL) based systems have become very popular for constructing hierarchical representations from unlabeled data. Moreover, DL approaches have been shown to surpass previous state-of-the-art machine learning models in various areas, with pattern recognition being one of the most important cases. This paper applies Convolutional Deep Belief Networks (CDBN) to textual image data containing Arabic handwritten script (AHS) and evaluates them on two databases characterized by low and high dimensionality, respectively. In addition to the benefits provided by deep networks, the system is protected against over-fitting. Experimentally, the authors demonstrate that the extracted features are effective for handwritten character recognition and show very good performance, comparable to the state of the art in handwritten text recognition. Using Dropout, the proposed CDBN architectures achieved promising accuracy rates of 91.55% and 98.86% when applied to the IFN/ENIT and HACDB databases, respectively.
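The snippet below is not a CDBN: a true CDBN stacks convolutional restricted Boltzmann machines trained greedily with contrastive divergence, which is not shown here. It only sketches where Dropout would sit in a convolutional feature stack for AHS character images, with a placeholder class count and input size chosen purely for illustration.

```python
import torch
import torch.nn as nn

num_classes = 24          # placeholder; actual counts depend on the database splits

ahs_feature_stack = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Dropout2d(p=0.5),                     # dropout on feature maps to limit over-fitting
    nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
    nn.Dropout2d(p=0.5),
    nn.Flatten(),
    nn.LazyLinear(num_classes),              # lazily sized to the flattened feature length
)

logits = ahs_feature_stack(torch.randn(2, 1, 64, 64))   # two dummy grayscale character images
```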


In the Sindhi language, handwritten text feature extraction is a challenging task because different people write in different styles, and analyzing each text is therefore a complex problem. The task involves feature extraction from segmented text, classifying each character, labelling training data so that different handwritings can be recognized, and testing to analyze the features of the provided handwritten text data. In this research, a support vector machine (SVM) is used to analyze and tokenize each character or word of Sindhi-language text and transform it into suitable information with efficiency and accuracy. The research is not only useful for improving knowledge of Sindhi handwritten text recognition but can also benefit other recognition systems.
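As a minimal sketch of the classification stage, the code below trains an SVM on flattened, segmented character images with integer labels. The data here is random stand-in data and the label count is hypothetical; loading and segmenting real Sindhi handwriting samples is not shown.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: flattened, segmented character images; y: integer character labels.
X = np.random.rand(200, 28 * 28)           # random stand-in images for illustration
y = np.random.randint(0, 52, size=200)     # hypothetical label count, not a dataset fact

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```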


2018, Vol. 2, pp. e27055
Author(s): Robert Cubey, Elspeth Haston, Sally King

The transcription of natural history collection labels is occurring via a variety of different methods: in-house curators, commercial operations, citizen scientists, visiting researchers, linked data, optical character recognition (OCR), handwritten text recognition (HTR), and more. But what can a collections data manager do with this flood of data? There is a whole raft of questions around this incoming data stream: who values it, who needs it, where is it stored, where is it displayed, who has access to it, and so on. This talk plans to address these topics with reference to the Royal Botanic Garden Edinburgh herbarium dataset.

