handwritten text
Recently Published Documents


TOTAL DOCUMENTS: 539 (five years: 208)

H-INDEX: 24 (five years: 5)

Author(s): Shilpa Pandey, Gaurav Harit

In this article, we address the problem of localizing text and symbolic annotations on the scanned image of a printed document. Previous approaches have treated annotation extraction as binary classification into printed and handwritten text. In this work, we further subcategorize the annotations as underlines, encirclements, inline text, and marginal text. We have collected a new dataset of 300 documents containing all classes of annotations marked around or in between printed text. Using the dataset as a benchmark, we report the results of two saliency formulations, CRF Saliency and Discriminant Saliency, for predicting salient patches that can correspond to different types of annotations. We also compare our work with recent semantic segmentation techniques based on deep models. Our analysis shows that Discriminant Saliency can be considered the preferred approach for fast localization of patches containing different types of annotations. Although the saliency models were learned on a small dataset, they give performance comparable to the deep networks for pixel-level semantic segmentation. We show that saliency-based methods give better outcomes with limited annotated data than more sophisticated segmentation techniques, which require a large training set to learn the model.
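
To make the patch-scoring idea concrete, here is a minimal Python sketch of a discriminant-saliency-style scorer: class-conditional densities are fitted to simple patch statistics and patches are ranked by a log-likelihood ratio. The feature set, density model, and window sizes are illustrative assumptions, not the formulation used in the article.

```python
# Illustrative sketch of discriminant-saliency-style patch scoring.
# The features and densities below are assumptions for clarity; the
# article's exact saliency formulation may differ.
import numpy as np
from sklearn.mixture import GaussianMixture

def patch_features(gray_image, patch_size=32, stride=16):
    """Slide a window over the page and return simple intensity/gradient
    statistics per patch, along with the patch coordinates."""
    gy, gx = np.gradient(gray_image.astype(float))
    feats, coords = [], []
    h, w = gray_image.shape
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            p = gray_image[y:y + patch_size, x:x + patch_size]
            g = np.hypot(gy[y:y + patch_size, x:x + patch_size],
                         gx[y:y + patch_size, x:x + patch_size])
            feats.append([p.mean(), p.std(), g.mean(), g.std()])
            coords.append((y, x))
    return np.array(feats), coords

def fit_density(features, n_components=3):
    """Fit a class-conditional density on labelled training patches
    (annotation patches vs. printed-text patches)."""
    return GaussianMixture(n_components=n_components).fit(features)

def saliency_scores(test_feats, annot_density, printed_density):
    """Log-likelihood ratio: high values mark candidate annotation patches."""
    return annot_density.score_samples(test_feats) - \
           printed_density.score_samples(test_feats)
```

Patches whose score exceeds a chosen threshold would then be the candidate annotation regions passed on to further classification into underlines, encirclements, inline, or marginal text.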


Author(s): M. Keerthana, P. Hima Varshini, K. Sri Thanvi, G. Vijaya, V. Deepa

2021, Vol 4 (5), pp. 1183-1198
Author(s): Sergey S. Sidorovich

The Institute of Oriental Manuscripts of the Russian Academy of Sciences possesses a xylographed fragment in classical Mongolian script with a handwritten text on the reverse side (call mark G 110 recto), which was obtained in 1909 during P. K. Kozlov's expedition to Khara-Khoto. The printed text in classical Mongolian script, with several interlinear glosses in Chinese and a page footer (containing a transcription of the Chinese name of the chapter and the page number), was read by the Soviet Orientalist N. Ts. Munkuyev more than 50 years ago. Munkuyev dated it to the 14th century on the basis of its paleographic peculiarities. Moreover, drawing on the official history Yuan shi, he supposed that the text might be a Mongolian translation of the legislative code Da Yuan tong-zhi and suggested two possible versions of the original Chinese name of the chapter, of which, unfortunately, the incorrect one was chosen. Since Da Yuan tong-zhi was not preserved in full and the major part of the written monument, including the chapters of interest, was lost, it was impossible to locate the corresponding text, and the mistake in the reconstruction of the chapter name could not be detected either. However, in 2002 a part of the Zhi-zheng tiao-ge code was found in South Korea; it had been promulgated in 1346 and was intended to replace the outdated Da Yuan tong-zhi. In one of his previous articles, the author showed that both codes were built according to a general pattern elaborated as far back as the Tang epoch (618–907). This enabled the reconstruction of the name of the chapter mentioned in the fragment. Fortunately, the surviving part of the Zhi-zheng tiao-ge code contains the required chapters, and the Chinese glosses in the fragment allowed us to find the original Chinese text, which turned out to be a document dated 1303 that, judging by its date, was evidently included in both codes. The article also contains the Chinese text of the document and its annotated translation.


2021, Vol 3 (4), pp. 367-376
Author(s): Yasir Babiker Hamdan, A. Sathesh

Due to the complex and irregular shapes of handwritten text, spotting and recognizing handwritten words is challenging. In low-resource scripts, word retrieval is a difficult and laborious task. Deep learning and neural network models require larger numbers of samples and additional variation in the extended training datasets, and existing preprocessing strategies cannot efficiently cover all possible variations and occurrences. This paper presents a scalable and elastic methodology for warping the extracted features by introducing an adversarial feature deformation and regularization module. The module is inserted between the intermediate layers of the original deep learning framework and trained in an alternating manner. With this setup, highly informative features are learnt more efficiently than with conventional models. The proposed model is built on popular frameworks for word recognition and spotting, enhanced with the proposed module, and tested on extensive word datasets. The results are recorded while varying the training data size and compared with the conventional models. The comparison shows improvements in mAP scores and word-error rate, particularly in the low-data regime.
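
As a rough illustration of the general idea, the following PyTorch sketch inserts a small feature-deformation module between intermediate layers and alternates its updates with those of the recognizer: the deformer is pushed to make features harder, while the recognizer learns to stay robust. The module design, perturbation strength, and update schedule are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class FeatureDeformer(nn.Module):
    """Small network that perturbs intermediate feature maps. It is trained
    to *increase* the task loss (adversarial), while the recognizer is
    trained to remain robust to the perturbation."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, feats, strength=0.1):
        return feats + strength * self.net(feats)

def training_step(recognizer_head, deformer, feats, targets, criterion,
                  opt_rec, opt_def, train_deformer):
    """One alternating step: maximize the loss w.r.t. the deformer, or
    minimize it w.r.t. the recognizer (backbone omitted for brevity)."""
    deformed = deformer(feats)
    loss = criterion(recognizer_head(deformed), targets)
    if train_deformer:
        opt_def.zero_grad()
        (-loss).backward()      # deformer tries to make features harder
        opt_def.step()
    else:
        opt_rec.zero_grad()
        loss.backward()         # recognizer adapts to the deformed features
        opt_rec.step()
    return loss.item()
```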


Author(s): Anton Shayevich, Svetlana Unzhakova, Igor Spiridonov

The authors examine some problematic aspects of the practical application of the developed forensic methods and methodologies in law enforcement work. They discuss the possibilities of studying handwriting not only for identification but also for diagnostic purposes, for example, to determine how significant the information in certain parts of a handwritten text is for the writer. To demonstrate that such possibilities exist, the authors give a brief description and examples of the experimental use of a methodology that makes it possible to determine, relatively quickly, a person's attitude to relevant circumstances and facts by analyzing experimental handwriting samples obtained by copying, by hand, a specially prepared structured text.


2021, pp. 81-95
Author(s): Eduardo Xamena, Héctor Emanuel Barboza, Carlos Ismael Orozco

The task of automated recognition of handwritten texts requires several phases and technologies, both optical and language-related. This article describes an approach for performing this task in a comprehensive manner, using machine learning throughout all phases of the process. In addition to explaining the methodology employed, it describes the process of building and evaluating a handwriting recognition model for the Spanish language. The original contribution of this article is the training and evaluation of offline HTR models for Spanish-language manuscripts, as well as the evaluation of a platform that performs this task end to end. In addition, it details the work being carried out to improve the models obtained and to develop new models for more complex corpora that are harder for the HTR task.
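
For context, offline HTR models of this kind are commonly built as a CNN feature extractor followed by a recurrent sequence model trained with CTC. The PyTorch sketch below shows such a generic line-level baseline; it is an assumed illustration, not the specific models trained by the authors.

```python
import torch
import torch.nn as nn

class LineHTR(nn.Module):
    """Generic line-level HTR model: CNN feature extractor, BiLSTM sequence
    model, and a CTC output layer (one class per character plus blank)."""
    def __init__(self, n_chars, img_height=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat_height = img_height // 4
        self.rnn = nn.LSTM(64 * feat_height, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, n_chars + 1)   # +1 for the CTC blank

    def forward(self, images):                  # images: (B, 1, H, W)
        f = self.cnn(images)                    # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # time along width
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)     # (B, W/4, n_chars + 1)

# Training would use nn.CTCLoss on these per-timestep log-probabilities
# against the character-level transcription of each text line.
```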


Diacronia, 2021
Author(s): Constanța Burlacu, Achim Rabus

In this paper we discuss the application of the software platform Transkribus (transkribus.eu), an AI-assisted tool for Handwritten Text Recognition (HTR), to 16th-century Romanian manuscript and printed sources written in Cyrillic script. After an overview of the basic functionality of HTR technology and Transkribus, we discuss the Romanian and bilingual Slavonic-Romanian sources we used; give an insight into training specific, generic, and smart (i.e., transliterating from Cyrillic into Latin script) models; evaluate their performance; and discuss the implications of HTR for philological research in the Digital Age. We conclude with an outlook on future research perspectives.
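
Performance in HTR work of this kind is usually reported as the character error rate (CER): the edit distance between the predicted and reference transcriptions, normalised by the reference length. A short, tool-agnostic Python sketch:

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance between the two strings, normalised by the
    reference length (the standard CER used to evaluate HTR models)."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / max(len(reference), 1)

# e.g. character_error_rate("scrisoare", "scrisore") ≈ 0.11
```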


2021
Author(s): Gentian Gashi

Handwriting recognition is the process of automatically converting handwritten text into electronic text (letter codes) usable by a computer. The increased reliance on technology during the international COVID-19 pandemic has underscored the importance of storing and digitising information accurately and efficiently. Interpreting handwriting remains complex for both humans and computers due to varied writing styles and skewed characters. In this study, we conducted a correlational analysis of the association between filter size and a convolutional neural network's (CNN's) classification accuracy. Testing was conducted on the publicly available MNIST database of handwritten digits (LeCun and Cortes, 2010), which consists of a training set (N=60,000) and a testing set (N=10,000). Using ANOVA, our results indicate a significant relationship (p < .001, P ≤ 0.05) between filter size and classification accuracy. However, this significance is only present when increasing the filter size from 1x1 to 2x2; larger filter sizes showed no significant effect, and therefore a filter size above 2x2 cannot be recommended.
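
A minimal sketch of this kind of experiment, assuming a small Keras CNN on MNIST in which only the convolutional kernel size is varied; the architecture, training length, and statistical analysis in the study itself may differ.

```python
# Illustrative sketch: train the same small CNN on MNIST while varying only
# the convolutional kernel size, then compare test accuracy per size.
import tensorflow as tf

def build_cnn(kernel_size):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, kernel_size, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

for k in [1, 2, 3, 5]:
    model = build_cnn(k)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"kernel {k}x{k}: test accuracy {acc:.4f}")
```

Repeating each configuration several times would provide the per-size accuracy samples needed for the ANOVA comparison described above.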


2021, Vol 7 (12), pp. 260
Author(s): Lazaros Tsochatzidis, Symeon Symeonidis, Alexandros Papazoglou, Ioannis Pratikakis

Offline handwritten text recognition (HTR) for historical documents aims at effective transcription by addressing challenges that originate from the low quality of the manuscripts under study as well as from several particularities related to the historical period of writing. In this paper, the HTR challenge is addressed with the focused goal of transcribing Greek historical manuscripts that contain several such particularities. To this end, a convolutional recurrent neural network architecture is proposed that comprises octave convolutions and recurrent units with effective gated mechanisms. The proposed architecture has been evaluated on three newly created collections of Greek historical handwritten documents, which will be made publicly available for research purposes, as well as on standard datasets such as IAM and RIMES. For the evaluation, we perform a concise study which shows that, compared to state-of-the-art architectures, the proposed one deals effectively with the challenging Greek historical manuscripts.
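
As an illustration of the octave-convolution building block mentioned here, the PyTorch sketch below splits feature maps into high- and low-frequency branches processed at different spatial resolutions with cross-branch exchange. The channel split ratio and wiring follow the generic octave-convolution idea and are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctaveConv(nn.Module):
    """Single octave convolution: a high-frequency branch at full resolution
    and a low-frequency branch at half resolution, with information exchange
    between the two branches."""
    def __init__(self, in_ch, out_ch, alpha=0.5):
        super().__init__()
        in_lo, out_lo = int(alpha * in_ch), int(alpha * out_ch)
        in_hi, out_hi = in_ch - in_lo, out_ch - out_lo
        self.hh = nn.Conv2d(in_hi, out_hi, 3, padding=1)   # high -> high
        self.hl = nn.Conv2d(in_hi, out_lo, 3, padding=1)   # high -> low
        self.lh = nn.Conv2d(in_lo, out_hi, 3, padding=1)   # low  -> high
        self.ll = nn.Conv2d(in_lo, out_lo, 3, padding=1)   # low  -> low

    def forward(self, x_hi, x_lo):
        # high-frequency output: high->high plus upsampled low->high
        y_hi = self.hh(x_hi) + F.interpolate(self.lh(x_lo),
                                             scale_factor=2, mode="nearest")
        # low-frequency output: low->low plus downsampled high->low
        y_lo = self.ll(x_lo) + self.hl(F.avg_pool2d(x_hi, 2))
        return y_hi, y_lo
```

In a full CRNN, stacks of such blocks would feed gated recurrent layers trained with CTC on line images.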

