scholarly journals Szerzőazonosítás Jacob és Wilhelm Grimm zajos, digitalizált levelezésében

Author(s):  
Greta Franzini ◽  
Mike Kestemont ◽  
Gabriela Rotari ◽  
Melina Jander ◽  
Jeremi K. Ochab ◽  
...  

Az alábbi cikk egy multidiszciplináris projekt eredményeit mutatja be, amely a különböző digitalizációs stratégiák számítógépes szöveganalízisben való használhatóságát járja körül. Pontosabban Jacob és Wilhelm Grimm szerzőségének automatizált megkülönböztetésére tettünk kísérletet, melyet egy HTR (HandwrittenText Recognition – kézzel írott szöveg felismerése) és OCR (Optical Character Recognition – optikai karakterfelismerés) által feldolgozott levelezéskorpuszban hajtottunk végre, korrekció nélkül – felmérve, hogy az így keletkezett zaj milyen hatással van a fivérek különböző írásmódjának azonosítására. Összegezve,úgy tűnik, hogy az OCR megbízható helyettesítője lehet a manuális átírásnak, legalábbis a szerzőazonosítás kérdéskörét illetően. Eredményeink továbbá abba az irányba mutatnak, miszerint még a különböző digitalizációs eljárásokból származó tanító- és tesztkorpuszok (training and test set) is használhatók a szerzőazonosítás során. A HTR-t tekintve a kutatás azt demonstrálja, hogy ez az automatizált átírás ugyan az OCR-hez képest szignifikánsan növeli a szövegek félrecsoportosításának veszélyét, ám körülbelül 20% feletti tisztaság már önmagában elegendő ahhoz, hogy a véletlennél nagyobb esélye legyen a helyes binárismegfeleltetésnek.

Author(s):  
Oyeniran Oluwashina Akinloye ◽  
Oyebode Ebenezer Olukunle

Numerous works have been proposed and implemented in computerization of various human languages, nevertheless, miniscule effort have also been made so as to put Yorùbá Handwritten Character on the map of Optical Character Recognition. This study presents a novel technique in the development of Yorùbá alphabets recognition system through the use of deep learning. The developed model was implemented on Matlab R2018a environment using the developed framework where 10,500 samples of dataset were for training and 2100 samples were used for testing. The training of the developed model was conducted using 30 Epoch, at 164 iteration per epoch while the total iteration is 4920 iterations. Also, the training period was estimated to 11296 minutes 41 seconds. The model yielded the network accuracy of 100% while the accuracy of the test set is 97.97%, with F1 score of 0.9800, Precision of 0.9803 and Recall value of 0.9797.


2014 ◽  
Vol 14 (2) ◽  
pp. 114-126 ◽  
Author(s):  
Marjan Kuchaki Rafsanjani ◽  
Masoud Pourshaban

Abstract In this paper we implement six different learning algorithms in Optical Character Recognition (OCR) problem and achieve the criteria of end-time, number of iterations, train-set performance, test-set performance, validate-set performance and overall performance of these methods and compare them. Finally, we show the advantages and disadvantages of each method.


1997 ◽  
Vol 9 (1-3) ◽  
pp. 58-77
Author(s):  
Vitaly Kliatskine ◽  
Eugene Shchepin ◽  
Gunnar Thorvaldsen ◽  
Konstantin Zingerman ◽  
Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Offthe-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth century farms in Norway, constitute one example among a series of valuable sources which can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.


2020 ◽  
Vol 2020 (1) ◽  
pp. 78-81
Author(s):  
Simone Zini ◽  
Simone Bianco ◽  
Raimondo Schettini

Rain removal from pictures taken under bad weather conditions is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually constitute the input for subsequent Computer Vision tasks such as detection and classification. In this paper, we present a Convolutional Neural Network, based on the Pix2Pix model, for rain streaks removal from images, with specific interest in evaluating the results of the processing operation with respect to the Optical Character Recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for "text detection and recognition" evaluation in bad weather conditions. Experimental results on this dataset show that our model is able to outperform the state of the art in terms of two commonly used image quality metrics, and that it is capable to improve the performances of an OCR model to detect and recognise text in the wild.


2014 ◽  
Vol 6 (1) ◽  
pp. 36-39
Author(s):  
Kevin Purwito

This paper describes about one of the many extension of Optical Character Recognition (OCR), that is Optical Music Recognition (OMR). OMR is used to recognize musical sheets into digital format, such as MIDI or MusicXML. There are many musical symbols that usually used in musical sheets and therefore needs to be recognized by OMR, such as staff; treble, bass, alto and tenor clef; sharp, flat and natural; beams, staccato, staccatissimo, dynamic, tenuto, marcato, stopped note, harmonic and fermata; notes; rests; ties and slurs; and also mordent and turn. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of those four main processes uses different methods and algorithms and each of those processes still needs further development and research. There are already many application that uses OMR to date, but none gives the perfect result. Therefore, besides the development and research for each OMR process, there is also a need to a development and research for combined recognizer, that combines the results from different OMR application to increase the final result’s accuracy. Index Terms—Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer  


Sign in / Sign up

Export Citation Format

Share Document