Automatic Estimation of Age Distributions from the First Ottoman Empire Population Register Series by Using Deep Learning

Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2253
Author(s):  
Yekta Said Can ◽  
M. Erdem Kabadayı

Recently, an increasing number of studies have applied deep learning algorithms to extract information from handwritten historical documents. To do so, documents must first be divided into smaller parts. Page and line segmentation are vital stages in Handwritten Text Recognition systems: they directly affect the character segmentation stage, which in turn determines recognition success. In this study, we first applied deep learning-based layout analysis techniques to detect individuals in the first Ottoman population register series, collected between the 1840s and the 1860s. Then, we applied horizontal projection profile-based line segmentation to the demographic information of the detected individuals in these registers. We further trained a CNN model to recognize the automatically detected ages of individuals and estimated the age distributions of people from these historical documents. Extracting age information from these historical registers is significant because it has enormous potential to revolutionize the historical demography of around 20 successor states of the Ottoman Empire, i.e., present-day countries. We achieved approximately 60% digit accuracy for recognizing the numbers in these registers and estimated the age distribution with a Root Mean Square Error of 23.61.
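
As an illustration of horizontal projection profile-based line segmentation, the following is a minimal sketch in plain Python. It is not the authors' implementation: the binary image representation (1 for ink, 0 for background), the function names, and the `min_ink` threshold are all assumptions made for this example.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, for each run of
    consecutive rows whose ink count is at least min_ink (assumed threshold)."""
    profile = horizontal_projection(image)
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins on this row
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the text line ended on the previous row
            start = None
    if start is not None:                  # a text line touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 6x5 "page": two text lines separated by an empty row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(segment_lines(page))  # prints [(1, 3), (4, 5)]
```

On real scans the profile is rarely zero between lines, so a higher `min_ink` or a smoothed profile would likely be needed; that choice is outside what the abstract specifies.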

Author(s):  
Sri. Yugandhar Manchala ◽  
Jayaram Kinthali ◽  
Kowshik Kotha ◽  
Kanithi Santosh Kumar ◽  
Jagilinki Jayalaxmi

Segmentation is the division of something into smaller parts and is one of the components of a character recognition system. In segmentation, characters, words, and lines are separated from text documents. Character recognition is a process that allows computers to recognize written or printed characters, such as numbers or letters, and to convert them into a form that the computer can use. The accuracy of an OCR system is measured by taking the output of an OCR run for an image and comparing it to the original version of the same text. The main aim of this paper is to survey text line segmentation methods such as projection profiles and the weighted bucket method. The proposed methods are the horizontal projection profile and connected component methods, applied to handwritten Kannada text. These methods are tested experimentally, and their accuracy and results are compared.
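
The comparison of OCR output against the original text can be sketched as a character-level accuracy score. The abstract does not name a specific metric, so the use of Levenshtein edit distance here, and the function names, are assumptions chosen for illustration.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of single-character insertions,
    deletions, and substitutions turning hypothesis into reference."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            cost = 0 if ref_ch == hyp_ch else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, hypothesis):
    """Share of reference characters recovered, clipped at 0 (assumed metric)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


print(character_accuracy("abcd", "abxd"))  # prints 0.75
```

Here `reference` stands for the ground-truth transcription and `hypothesis` for the OCR output; one substituted character out of four gives an accuracy of 0.75.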


2020 ◽  
Vol 6 (5) ◽  
pp. 32 ◽  
Author(s):  
Yekta Said Can ◽  
M. Erdem Kabadayı

Historical document analysis systems are gaining importance with the increasing efforts to digitize archives. Page segmentation and layout analysis are crucial steps for such systems. Errors in these steps affect the outcome of handwritten text recognition and Optical Character Recognition (OCR) methods, which increases the importance of page segmentation and layout analysis. Degradation of documents, digitization errors, and varying layout styles complicate the segmentation of historical documents. The properties of Arabic scripts, such as connected letters, ligatures, diacritics, and different writing styles, make it even more challenging to process Arabic-script historical documents. In this study, we developed an automatic system for counting registered individuals and assigning them to populated places by using a CNN-based architecture. To evaluate the performance of our system, we created a labeled dataset of registers obtained from the first wave of population registers of the Ottoman Empire, held between the 1840s and 1860s. We achieved promising results for classifying different types of objects, counting individuals, and assigning them to populated places.


Author(s):  
Kavitha Ananth et al.

This paper offers an alternative to traditional handwriting recognition techniques using concepts from deep learning and word beam search. It explains how an individual handwritten word is classified from handwritten text and translated into digital form. When a Recurrent Neural Network (RNN) is trained with the Connectionist Temporal Classification (CTC) loss function, its output is a matrix containing character probabilities for each time-step. A CTC decoding algorithm maps these character probabilities to the final text. With the token passing algorithm, the recognized text is constructed from a list of dictionary words. The running time of token passing is found to depend on the size of the dictionary, and it cannot decode arbitrary character strings such as numbers. To tackle these problems, this paper proposes the word beam search decoding algorithm. This method constrains words to those contained in a dictionary, allows arbitrary non-word character strings between words, and integrates a word-level language model. Its running time is found to be better than that of token passing. The proposed algorithm is compared with the vanilla beam search and token passing decoding algorithms on the IAM and Bentham datasets.
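
To make the decoding step concrete, here is a minimal sketch of CTC best-path (greedy) decoding, the simplest way to map a matrix of per-time-step character probabilities to text. It is not the word beam search algorithm proposed in the paper. The function name, the toy alphabet, and the convention that the blank label is the last column are assumptions made for this example.

```python
def ctc_best_path(probs, alphabet):
    """Greedy CTC decoding.

    probs: list of time-steps, each a list of probabilities over the
           characters of `alphabet` followed by the blank label
           (blank as last index is an assumed convention).
    """
    blank = len(alphabet)
    # 1. Pick the most probable label at every time-step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    # 2. Collapse repeated labels, then 3. drop blanks.
    chars = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Toy matrix over the alphabet "ab" plus blank (5 time-steps).
probs = [
    [0.8, 0.1, 0.1],  # a
    [0.7, 0.2, 0.1],  # a (repeat, collapsed)
    [0.1, 0.1, 0.8],  # blank (separates the two a's)
    [0.6, 0.3, 0.1],  # a
    [0.2, 0.7, 0.1],  # b
]
print(ctc_best_path(probs, "ab"))  # prints aab
```

Best-path decoding uses no dictionary or language model, which is the gap that dictionary-constrained decoders such as token passing and word beam search aim to fill.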


2019 ◽  
Vol 94 ◽  
pp. 122-134 ◽  
Author(s):  
Joan Andreu Sánchez ◽  
Verónica Romero ◽  
Alejandro H. Toselli ◽  
Mauricio Villegas ◽  
Enrique Vidal

Author(s):  
Arthur Flor de Sousa Neto ◽  
Byron Leite Dantas Bezerra ◽  
Alejandro Hector Toselli ◽  
Estanislau Baptista Lima

Babel ◽  
2020 ◽  
Vol 66 (2) ◽  
pp. 294-310
Author(s):  
Miodrag M. Vukčević

The translation of handwritten historical documents faces many challenges due to variation in writing style, local language, and inevitable language change. Even the transliteration from Cyrillic to Latin characters is standardized by the bijective transliteration standard ISO 9. This presentation introduces a number of tools offered by Transkribus for the automated processing of documents, such as Handwritten Text Recognition (HTR) and Document Understanding, which are needed for the translation of historical documents. Besides the problem of decoding handwritten documents, written for example in Kurrentschrift using ancient terminology, changed meanings and different spellings must additionally be considered when translating texts from earlier centuries. Resolution strategies applied to a case study show different methods for ensuring quality translations.


Author(s):  
Jebaveerasingh Jebadurai ◽  
Immanuel Johnraja Jebadurai ◽  
Getzi Jeba Leelipushpam Paulraj ◽  
Sushen Vallabh Vangeepuram
