HTR for Greek Historical Handwritten Documents

2021 ◽  
Vol 7 (12) ◽  
pp. 260
Author(s):  
Lazaros Tsochatzidis ◽  
Symeon Symeonidis ◽  
Alexandros Papazoglou ◽  
Ioannis Pratikakis

Offline handwritten text recognition (HTR) for historical documents aims for effective transcription by addressing challenges that originate from the low quality of the manuscripts under study as well as from several particularities related to the historical period of writing. This paper focuses on the transcription of Greek historical manuscripts, which contain several such particularities. To this end, a convolutional recurrent neural network architecture is proposed that comprises octave convolution and recurrent units which use effective gated mechanisms. The proposed architecture has been evaluated on three newly created collections of Greek historical handwritten documents that will be made publicly available for research purposes, as well as on standard datasets like IAM and RIMES. For evaluation, we perform a concise study which shows that, compared to state-of-the-art architectures, the proposed one deals effectively with the challenging Greek historical manuscripts.

2020 ◽  
Vol 10 (21) ◽  
pp. 7711
Author(s):  
Arthur Flor de Sousa Neto ◽  
Byron Leite Dantas Bezerra ◽  
Alejandro Héctor Toselli

The increasing portability of physical manuscripts to the digital environment makes it common for systems to offer automatic mechanisms for offline Handwritten Text Recognition (HTR). However, several scenarios and writing variations bring challenges in recognition accuracy, and, to minimize this problem, optical models can be used with language models to assist in decoding text. Thus, with the aim of improving results, dictionaries of characters and words are generated from the dataset and linguistic restrictions are created in the recognition process. In this way, this work proposes the use of spelling correction techniques for text post-processing to achieve better results and eliminate the linguistic dependence between the optical model and the decoding stage. In addition, an encoder–decoder neural network architecture and a training methodology are developed and presented to achieve the goal of spelling correction. To demonstrate the effectiveness of this new approach, we conducted an experiment on five datasets of text lines widely known in the field of HTR, three state-of-the-art optical models for text recognition, and eight spelling correction techniques, ranging from traditional statistical methods to current neural network approaches in the field of Natural Language Processing (NLP). Finally, our proposed spelling correction model is analyzed statistically through HTR system metrics, reaching an average sentence correction 54% higher than the state-of-the-art method of decoding in the tested datasets.
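To make the idea of spelling correction as a post-processing step concrete, the following is a minimal sketch of a traditional dictionary-based corrector applied to a recognized text line. It is not the encoder–decoder model the authors propose; the toy vocabulary, the single-edit candidate search, and the function names (`edits1`, `correct_word`, `correct_line`) are all assumptions made for illustration.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return all strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, counts):
    """Keep known words; otherwise pick the most frequent known neighbour."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word  # nothing plausible found, leave the token untouched
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (counts[w], w))


def correct_line(line, counts):
    """Correct each whitespace-separated token of a recognized text line."""
    return " ".join(correct_word(token, counts) for token in line.split())


# Hypothetical vocabulary; in practice it would be built from a text corpus.
counts = Counter("the quick brown fox jumps over the lazy dog".split())
corrected = correct_line("the quikc brwn fox", counts)
print(corrected)
```

Because the corrector only sees the text output, it can be swapped or retrained without touching the optical model, which is the decoupling the abstract describes.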


Babel ◽  
2020 ◽  
Vol 66 (2) ◽  
pp. 294-310
Author(s):  
Miodrag M. Vukčević

The translation of handwritten historical documents faces many challenges due to variation in the writing style, local language, and an inevitable language change. Even the transliteration from Cyrillic to Latin characters is standardized by the bijective transliteration standard ISO 9. This presentation introduces a number of tools offered by Transkribus for the automated processing of documents, such as Handwritten Text Recognition (HTR) and Document Understanding, which are needed for the translation of historical documents. Next to the problem of decoding handwritten documents, written for example in Kurrentschrift using ancient terminology, changed meanings and different spellings must additionally be considered during the translation of texts from earlier centuries. Resolution strategies on a case study show different methods for ensuring quality translations.
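The bijective property mentioned above means each Cyrillic letter maps to exactly one Latin character, so the original text can be recovered from the transliteration. The sketch below illustrates this with a small lowercase subset of the ISO 9 table; the subset, the variable names, and the pass-through of unmapped characters are assumptions for illustration and do not cover the full standard.

```python
# Partial, lowercase-only subset of the ISO 9 Cyrillic-to-Latin table.
# A complete implementation would cover every letter and both cases.
ISO9_SUBSET = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č", "ш": "š",
}

# Because the mapping is one-to-one, inverting it loses no entries.
LATIN_TO_CYRILLIC = {latin: cyr for cyr, latin in ISO9_SUBSET.items()}
assert len(LATIN_TO_CYRILLIC) == len(ISO9_SUBSET)


def to_latin(text):
    """Transliterate character by character; unmapped characters pass through."""
    return "".join(ISO9_SUBSET.get(ch, ch) for ch in text)


def to_cyrillic(text):
    """Reverse transliteration, possible only because the table is bijective."""
    return "".join(LATIN_TO_CYRILLIC.get(ch, ch) for ch in text)


latin = to_latin("москва")
restored = to_cyrillic(latin)
print(latin, restored)
```

Reversibility is what distinguishes this kind of transliteration from phonetic transcription schemes, where several source letters may collapse into the same Latin spelling.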


Author(s):  
Mohamed Elleuch ◽  
Monji Kherallah

In recent years, deep learning (DL) based systems have become very popular for constructing hierarchical representations from unlabeled data. Moreover, DL approaches have been shown to outperform previous state-of-the-art machine learning models in various areas, with pattern recognition being one of the more important cases. This paper applies Convolutional Deep Belief Networks (CDBN) to textual image data containing Arabic handwritten script (AHS) and evaluates them on two different databases characterized by the low/high-dimension property. In addition to the benefits provided by deep networks, the system is protected against over-fitting. Experimentally, the authors demonstrate that the extracted features are effective for handwritten character recognition and show very good performance, comparable to the state of the art in handwritten text recognition. Using Dropout, the proposed CDBN architectures achieved promising accuracy rates of 91.55% and 98.86% when applied to the IFN/ENIT and HACDB databases, respectively.


2020 ◽  
Vol 21 (4) ◽  
pp. 40-44
Author(s):  
Dawn Behrend

Sex & Sexuality, Module I: Research Collections from the Kinsey Institute Library & Special Collections, published by Adam Matthew Digital, is a collection of digitized primary sources obtained exclusively from the Kinsey Institute Library & Special Collections dedicated to the study of human sexuality throughout the twentieth century. The collection makes use of the artificial intelligence capabilities of Handwritten Text Recognition (HTR) to enable keyword searching of handwritten documents. The documents and images in the collection have been meticulously digitized by Adam Matthew Digital, making them discoverable, visually appealing, and adjustable. The proprietary interface is intuitive to navigate, and the product is compatible with a range of browsers and electronic devices. Contract provisions are standard to the product and permit use across locations and interlibrary loan sharing. As pricing is primarily determined by size and enrollment, the collection may be affordable for libraries of varying sizes. Users seeking more current research on gender and women’s studies may find ProQuest’s GenderWatch a more suitable choice, while those seeking information on sexuality from the sixteenth to mid-twentieth centuries may prefer Part III of Gale’s Archives of Sexuality & Gender, with both resources providing access to a range of sources beyond that of the Kinsey Institute.


2019 ◽  
Vol 26 (1) ◽  
pp. 73-94
Author(s):  
Arda Tezcan ◽  
Véronique Hoste ◽  
Lieve Macken

Various studies show that statistical machine translation (SMT) systems suffer from fluency errors, especially in the form of grammatical errors and errors related to idiomatic word choices. In this study, we investigate the effectiveness of using monolingual information contained in the machine-translated text to estimate word-level quality of SMT output. We propose a recurrent neural network architecture which uses morpho-syntactic features and word embeddings as word representations within surface and syntactic n-grams. We test the proposed method on two language pairs and for two tasks, namely detecting fluency errors and predicting overall post-editing effort. Our results show that this method is effective for capturing all types of fluency errors at once. Moreover, on the task of predicting post-editing effort, while solely relying on monolingual information, it achieves on-par results with the state-of-the-art quality estimation systems which use both bilingual and monolingual information.


AI ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 261-273
Author(s):  
Mario Manzo ◽  
Simone Pellino

COVID-19 has been a great challenge for humanity since the year 2020. The whole world has made a huge effort to find an effective vaccine in order to save those not yet infected. The alternative solution is early diagnosis, carried out through real-time polymerase chain reaction (RT-PCR) tests or thoracic computed tomography (CT) scan images. Deep learning algorithms, specifically convolutional neural networks, represent a methodology for image analysis. They optimize the classification design task, which is essential for an automatic approach with different types of images, including medical ones. In this paper, we adopt a pretrained deep convolutional neural network architecture in order to diagnose COVID-19 disease from CT images. Our idea is inspired by what the whole of humanity is achieving, as the set of multiple contributions is better than any single one for the fight against the pandemic. First, we adapt, and subsequently retrain for our assumption, some neural architectures that have been adopted in other application domains. Secondly, we combine the knowledge extracted from images by the neural architectures in an ensemble classification context. Our experimental phase is performed on a CT image dataset, and the results obtained show the effectiveness of the proposed approach with respect to the state-of-the-art competitors.
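One common way to combine several classifiers in an ensemble is soft voting, where the per-class probabilities of each model are averaged before picking a label. The abstract does not state which combination rule the authors use, so the sketch below is only an assumed illustration; the function name `soft_vote` and the example probabilities are hypothetical.

```python
def soft_vote(prob_lists):
    """Average per-class probabilities over models and return (label, averages).

    prob_lists holds one probability vector per model, all of equal length.
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(n_classes)
    ]
    # The predicted label is the class with the highest averaged probability.
    label = max(range(n_classes), key=lambda c: averaged[c])
    return label, averaged


# Hypothetical outputs of three retrained networks for one CT image,
# as [P(negative), P(positive)].
model_outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
label, averaged = soft_vote(model_outputs)
print(label, averaged)
```

Here the first model alone would predict class 0, but the averaged evidence of all three favours class 1, which is the sense in which a set of contributions can outperform any single one.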


2020 ◽  
Vol 22 (1) ◽  
pp. 51-55
Author(s):  
Dawn Behrend

Poverty, Philanthropy and Social Conditions in Victorian Britain, published by Adam Matthew Digital, comprises primary digital materials culled from three major archives in the UK focused on the experience of poverty in Victorian Britain and efforts involving economic, government, and social reform such as the Poor Law, workhouses, settlement houses, and philanthropic initiatives. Content is derived from the National Archives at Kew, British Library, and Senate House Library and includes pamphlets, correspondence, newspaper clippings, books, and other resources. A small portion of the collection utilizes Adam Matthew Digital’s Handwritten Text Recognition (HTR) to enable keyword searching of handwritten documents. The digitized images and documents are clear, searchable, and easy to access, save, and share. Contract provisions are standard to the product with authenticated access across institutional locations and guidelines for Interlibrary Loan sharing. Pricing is determined by institutional size and enrollment. While the product is a one-time purchase, annual hosting fees apply for ongoing access. Content is currently heavily derived from one archive, the Senate House Library, with pamphlets from this source making up nearly half of the total holdings. Users seeking access to a more extensive collection of similar material may prefer subscribing to JSTOR, which includes JSTOR 19th Century British Pamphlets with over 26,000 pamphlets along with secondary scholarly journals and eBooks on the Victorian era. While not providing the primary sources of Poverty, Philanthropy and Social Conditions in Victorian Britain or JSTOR, Historical Abstracts may be an alternative resource in providing access to notable scholarly resources on the period.

