Experimental simulation of an optical character recognition, speech output reading machine for the blind

Author(s): Rob Savoie, Pat Erickson

According to a World Health Organization survey, around 285 million of the world's 7.4 billion people are visually impaired. Among the many difficulties these people face, reading is a major one: they cannot read text that is not written in Braille. To support them, this paper proposes a framework for the visually impaired that performs text recognition and produces voice output, helping them read any printed content and hear it as speech. A camera captures the printed text, and the captured image undergoes a series of pre-processing steps that extract the text and remove the background. Characters are identified using the Tesseract Optical Character Recognition (OCR) engine, and the recognized text is then converted into voice using an open-source text-to-speech (TTS) synthesizer. Finally, the speech output is delivered through earphones.
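A minimal sketch of such a capture-recognize-speak pipeline, assuming the open-source pytesseract and pyttsx3 libraries stand in for the OCR engine and speech synthesizer described above (the paper does not specify its exact software stack):

```python
# Hypothetical sketch of the camera -> preprocess -> OCR -> TTS pipeline.
# Assumes OpenCV, pytesseract, and pyttsx3; the paper's actual stack may differ.
import cv2
import pytesseract
import pyttsx3

def read_aloud_from_camera(device_index=0):
    # Capture a single frame from the camera
    cap = cv2.VideoCapture(device_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("Could not capture an image from the camera")

    # Pre-processing: grayscale conversion and Otsu binarization
    # suppress the background and isolate the printed text
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Character recognition with the Tesseract OCR engine
    text = pytesseract.image_to_string(binary)

    # Speech synthesis: the recognized text is spoken through the
    # default audio output (e.g. earphones)
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
    return text

if __name__ == "__main__":
    print(read_aloud_from_camera())
```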


Author(s): Zhang Yun-An, Pan Ziheng, Dui Hongyan, Bai Guanghan

Background: YOLOv3-Tesseract is widely used for intelligent form recognition because it exhibits several attractive properties, and improving the accuracy and efficiency of optical character recognition remains important. Methods: YOLOv3 offers strong classification performance for object detection, while Tesseract effectively recognizes regular characters in optical character recognition. In this study, an improved intelligent form recognition model based on YOLOv3 and Tesseract is proposed. Results: First, YOLOv3 is trained to detect the position of text in a table and to segment the corresponding text blocks. Second, Tesseract recognizes each separated text block individually, so that the combination of YOLOv3 and Tesseract achieves table character recognition. Conclusion: Experimental simulation based on Tianchi big data is used to demonstrate the proposed method; the YOLOv3-Tesseract model is trained and tested and effectively accomplishes the recognition task.
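A schematic of the detect-then-recognize split the abstract describes, assuming a generic detector wrapper (detect_text_blocks is a hypothetical stand-in for the trained YOLOv3 network, whose weights and post-processing the abstract does not specify) and pytesseract for per-block recognition:

```python
# Sketch of the two-stage pipeline: YOLOv3 locates text blocks in a
# form image, then Tesseract recognizes each cropped block separately.
# detect_text_blocks() is a hypothetical stand-in for the trained YOLOv3
# detector; it is assumed to return (x, y, w, h) boxes in pixel units.
import cv2
import pytesseract

def detect_text_blocks(image):
    """Placeholder for YOLOv3 inference; returns a list of (x, y, w, h)."""
    raise NotImplementedError("plug in the trained YOLOv3 model here")

def recognize_form(image_path):
    image = cv2.imread(image_path)
    results = []
    for (x, y, w, h) in detect_text_blocks(image):
        crop = image[y:y + h, x:x + w]            # segment one text block
        text = pytesseract.image_to_string(crop)  # recognize it in isolation
        results.append(((x, y, w, h), text.strip()))
    # Sort top-to-bottom, left-to-right to reassemble the table layout
    results.sort(key=lambda item: (item[0][1], item[0][0]))
    return results
```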


This paper describes the design of a text-to-speech system for Kannada, one of the regional languages of India. A printed Kannada document is given as input, and the system first converts it to an image format. Pre-processing is performed to stabilize the intensity of the image and remove artifacts, which boosts the precision and interpretability of the image. Optical Character Recognition (OCR) is used to extract the segmented characters from the image, and these are matched against the characters stored in the dataset. Once the matched characters are extracted, they are stored in a suitable format, and a TTS engine is deployed to convert the saved Kannada characters to speech. The resulting speech output corresponds to the characters obtained after processing the input text.
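A minimal sketch of such a Kannada pipeline under stated assumptions: Tesseract's publicly available 'kan' language pack is used for recognition and the gTTS library for speech output, whereas the paper matches characters against its own dataset and may use a different TTS engine.

```python
# Sketch: Kannada printed page -> preprocessing -> OCR -> speech file.
# Assumes Tesseract's 'kan' language pack is installed and uses gTTS for
# synthesis; the paper's dataset-matching OCR step is replaced here by
# Tesseract as an illustrative stand-in.
import cv2
import pytesseract
from gtts import gTTS

def kannada_page_to_speech(image_path, out_mp3="output.mp3"):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Intensity stabilization and artifact removal: histogram
    # equalization followed by a light median blur
    gray = cv2.equalizeHist(gray)
    gray = cv2.medianBlur(gray, 3)

    # Recognize Kannada characters with Tesseract's 'kan' model
    text = pytesseract.image_to_string(gray, lang="kan")

    # Convert the recognized Kannada text to speech ('kn' = Kannada)
    gTTS(text=text, lang="kn").save(out_mp3)
    return text
```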


1997, Vol 9 (1-3), pp. 58-77
Author(s): Vitaly Kliatskine, Eugene Shchepin, Gunnar Thorvaldsen, Konstantin Zingerman, Valery Lazarev

In principle, printed source material should be made machine-readable with systems for Optical Character Recognition, rather than being typed once more. Off-the-shelf commercial OCR programs tend, however, to be inadequate for lists with a complex layout. The tax assessment lists that assess most nineteenth-century farms in Norway constitute one example among a series of valuable sources which can only be interpreted successfully with specially designed OCR software. This paper considers the problems involved in the recognition of material with a complex table structure, outlining a new algorithmic model based on ‘linked hierarchies’. Within the scope of this model, a variety of tables and layouts can be described and recognized. The ‘linked hierarchies’ model has been implemented in the ‘CRIPT’ OCR software system, which successfully reads tables with a complex structure from several different historical sources.
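The abstract does not spell out the data structures behind ‘linked hierarchies’; as an illustrative assumption only, a description of a nested table layout might pair a hierarchy of page regions with links between related columns, roughly as follows:

```python
# Illustrative sketch only: a plausible 'linked hierarchies' layout
# description. The actual CRIPT model is not specified in the abstract,
# so all names and fields here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Region:
    """One rectangular zone of the page (a table, column, row, or cell)."""
    label: str
    children: list["Region"] = field(default_factory=list)   # hierarchy
    linked_to: list["Region"] = field(default_factory=list)  # cross-links

def link(a: Region, b: Region) -> None:
    """Link two regions whose contents must be read together,
    e.g. a farm-name column and its tax-assessment column."""
    a.linked_to.append(b)
    b.linked_to.append(a)

# Describe a simplified tax-assessment list: two linked columns
page = Region("tax_list")
farms = Region("farm_name_column")
taxes = Region("assessment_column")
page.children += [farms, taxes]
link(farms, taxes)  # rows in the two columns correspond line-by-line
```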


2020, Vol 2020 (1), pp. 78-81
Author(s): Simone Zini, Simone Bianco, Raimondo Schettini

Rain removal from pictures taken under bad weather conditions is a challenging task that aims to improve the overall quality and visibility of a scene. The enhanced images usually constitute the input for subsequent Computer Vision tasks such as detection and classification. In this paper, we present a Convolutional Neural Network, based on the Pix2Pix model, for removing rain streaks from images, with specific interest in evaluating the results of the processing with respect to the Optical Character Recognition (OCR) task. In particular, we present a way to generate a rainy version of the Street View Text Dataset (R-SVTD) for evaluating text detection and recognition in bad weather conditions. Experimental results on this dataset show that our model outperforms the state of the art in terms of two commonly used image quality metrics, and that it is capable of improving the performance of an OCR model in detecting and recognising text in the wild.
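A sketch of the kind of evaluation the abstract describes, assuming the two "commonly used image quality metrics" are PSNR and SSIM (the abstract does not name them) and using scikit-image and pytesseract as illustrative stand-ins for the authors' pipeline:

```python
# Sketch: compare a derained image against the clean ground truth with
# PSNR/SSIM, and check whether deraining helps the downstream OCR task.
# PSNR and SSIM are assumed metrics; the abstract does not name them.
import cv2
import pytesseract
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_deraining(clean_path, derained_path):
    clean = cv2.imread(clean_path)
    derained = cv2.imread(derained_path)

    # Image quality of the derained result w.r.t. the clean reference
    psnr = peak_signal_noise_ratio(clean, derained)
    ssim = structural_similarity(clean, derained, channel_axis=-1)

    # Downstream OCR task: does the derained image yield readable text?
    ocr_text = pytesseract.image_to_string(derained)
    return psnr, ssim, ocr_text
```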


2014, Vol 6 (1), pp. 36-39
Author(s): Kevin Purwito

This paper describes one of the many extensions of Optical Character Recognition (OCR): Optical Music Recognition (OMR). OMR is used to convert sheet music into a digital format such as MIDI or MusicXML. Many musical symbols commonly appear in sheet music and therefore need to be recognized by OMR, such as the staff; treble, bass, alto and tenor clefs; sharp, flat and natural; beams, staccato, staccatissimo, dynamics, tenuto, marcato, stopped note, harmonic and fermata; notes; rests; ties and slurs; and also mordent and turn. OMR usually has four main processes, namely Preprocessing, Music Symbol Recognition, Musical Notation Reconstruction and Final Representation Construction. Each of these four main processes uses different methods and algorithms, and each still needs further development and research. Many applications already use OMR, but none gives perfect results. Therefore, besides development and research on each OMR process, there is also a need for development and research on a combined recognizer that merges the results of different OMR applications to increase the accuracy of the final result. Index Terms: Music, optical character recognition, optical music recognition, musical symbol, image processing, combined recognizer
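As an illustration of the combined-recognizer idea the paper calls for, a minimal sketch of majority voting over the symbol sequences returned by several hypothetical OMR engines (the engines and their aligned output format are assumptions, not the paper's design):

```python
# Illustrative sketch of a combined OMR recognizer: each engine returns
# a sequence of symbol labels for the same sheet, and positions where
# the engines disagree are settled by majority vote. The engines and
# their aligned output format are hypothetical assumptions.
from collections import Counter

def combine_omr_results(engine_outputs):
    """engine_outputs: list of equal-length symbol sequences,
    one per OMR engine, e.g. [['treble', 'C4-quarter', ...], ...]."""
    combined = []
    for symbols_at_position in zip(*engine_outputs):
        vote = Counter(symbols_at_position).most_common(1)[0][0]
        combined.append(vote)
    return combined

# Example: three engines disagree on the second symbol
print(combine_omr_results([
    ["treble", "C4-quarter", "rest-quarter"],
    ["treble", "C4-eighth",  "rest-quarter"],
    ["treble", "C4-quarter", "rest-quarter"],
]))  # -> ['treble', 'C4-quarter', 'rest-quarter']
```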

