A Review on Optical Character Recognition and Text to Speech Conversion

2016, Vol 5 (6), pp. 1964-1970

In modern image processing, recognizing the content of an image is the process of electronically converting it into machine-encoded text. Advanced systems capable of high accuracy in multi-font recognition are now commonplace, along with support for digital content formatting. Some programs can reproduce output very close to the original page, including images, columns, and other non-text items. The proposed system recognizes text in an image and converts it into editable text, along with speech conversion. The system uses a correlation model for OCR (Optical Character Recognition) and speech synthesis for TTS (Text To Speech) conversion. Correlation is a measure of the similarity between two objects; here, predefined alphabet templates are compared against the image to recognize combinations of those characters. Speech synthesis is the artificial production of human speech. A computer program used for this purpose is called a speech computer or speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts ordinary language text into speech; some programs also render symbolic language representations, such as typed text, as speech. The system is capable of achieving a high level of accuracy with few false recognitions. The goal is to build an effective text scanner that recognizes text in an image with a low error rate. The system has been implemented in MATLAB, and various pre-processing filters have been applied for better enhancement and extraction. Handwritten text can also be recognized effectively.
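
The correlation step described in this abstract can be illustrated with a small sketch. The paper's system was implemented in MATLAB and its actual templates and preprocessing are not given here, so the Python code below is only an assumption-laden illustration: the 3x3 bitmaps, the `TEMPLATES` dictionary, and the function names are all hypothetical. It computes the Pearson correlation between a glyph and each stored template and picks the best match.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equally sized 2-D bitmaps (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("bitmaps must have the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mean_x) ** 2 for x in xs) *
               sum((y - mean_y) ** 2 for y in ys))
    # A constant bitmap has zero variance; treat it as uncorrelated.
    return num / den if den else 0.0


def recognize(glyph, templates):
    """Return (label, score) for the template that correlates best with glyph."""
    scored = ((label, correlation(glyph, tpl)) for label, tpl in templates.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 3x3 templates; a real system would use larger, normalized glyphs.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel missing
label, score = recognize(noisy_l, TEMPLATES)
print(label, round(score, 2))
```

Because the score is normalized, a glyph with a missing pixel still correlates more strongly with its own template than with the others, which is the property a template-matching recognizer relies on.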


2020, pp. 205-208
Author(s): Sowmya R, Sushma S Jagtap, Gnanamoorthy Kasthuri

Assistive technology comprises assistive, adaptive and rehabilitative devices for people with disabilities. It is estimated that about 36 million people worldwide are blind and a further 216 million live with moderate to severe visual impairment. Technology has helped visually challenged people carry out tasks on par with sighted people, particularly reading and writing. In the proposed work, an image scanning device attached to a microcontroller is designed. The device takes the form of a glove for ease of use. The glove, with a camera at the fingertip, scans the information as it is moved over lines of text and converts it into digital text with Optical Character Recognition (OCR). The converted digital text is finally read aloud using text-to-speech synthesis. The results obtained were accurate and met the standards of operability.


Author(s): Anitha D B, Jyothi T M, Pooja R, Sahana N

The objective of this paper is to present a new design of assistive smart glasses for the visually impaired. The aim is to assist in multiple daily tasks by taking advantage of the wearable format. The proposed method is camera-based assistive text reading that helps blind persons read the text on labels, printed notes and products in their own languages. It combines Optical Character Recognition (OCR), a Text to Speech Synthesizer (TTS) and a translator on a Raspberry Pi. OCR is the identification of printed characters using photoelectric devices and computer software. It converts images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document or from subtitle text superimposed on an image. In this system, text-to-speech conversion scans and reads the letters and numbers of any language in the image using OCR, translates them into the desired language, and finally produces audio output of the translated text. The audio output is heard through the Raspberry Pi's audio jack using speakers or earphones.


Author(s): Shailendra Singh

The present paper introduces an innovative and efficient technique that enables users to hear the contents of text images instead of reading through them. Digital technology is now widely used, and people have many ways to capture images. Such images may contain important textual content that the user needs to edit or store digitally. The technique merges Optical Character Recognition (OCR) and a Text to Speech Synthesizer (TTS), with recognition performed by the Tesseract OCR Engine. OCR is a branch of AI used in applications to recognize text in scanned documents or images. The recognized text can also be converted to audio to help visually impaired people hear the content they wish to know. Text-to-speech conversion here is a method that scans and reads the letters and numbers in the image using OCR and converts them into speech. The aim is to study and compare the multiple methods used for STT conversions and to identify the most efficient technique that can be adopted for the conversion process. Based on the review, the paper finds that the HMM, a statistical model, is the most suitable for TTS conversions.
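
The image-to-speech flow in this abstract can be sketched as follows. The abstract names the Tesseract OCR Engine but not a programming interface, so the use of the third-party Python packages `pytesseract`, `Pillow` and `pyttsx3` here is an assumption, and `clean_ocr_text` and `read_image_aloud` are hypothetical helper names. This is a minimal sketch of one possible pipeline, not the paper's implementation.

```python
import re


def clean_ocr_text(raw):
    """Tidy raw OCR output before it is spoken.

    Rejoins words hyphenated across line breaks and collapses whitespace.
    Simplification: a genuinely hyphenated word split at a line end
    would also lose its hyphen.
    """
    joined = re.sub(r"-\s*\n\s*", "", raw)
    return re.sub(r"\s+", " ", joined).strip()


def read_image_aloud(image_path):
    """Run OCR on an image file, speak the text, and return it."""
    # Third-party imports are local so clean_ocr_text works without them.
    # Assumed setup: Tesseract installed, plus pytesseract, Pillow, pyttsx3.
    from PIL import Image
    import pytesseract
    import pyttsx3

    raw = pytesseract.image_to_string(Image.open(image_path))
    text = clean_ocr_text(raw)
    if text:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()  # blocks until speech has finished
    return text


sample = "Optical char-\nacter   recognition\n"
print(clean_ocr_text(sample))
```

Calling `read_image_aloud("page.png")` (a placeholder file name) would run the full flow on a machine with the assumed packages installed. The cleaning step matters in practice because raw OCR output keeps the page's line breaks, which a speech engine may read as pauses.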


This paper describes the design of a system that performs text-to-speech generation for Kannada, one of the regional languages. A printed document of Kannada text is given as input, and the system converts the document to an image format. Pre-processing is done to stabilize the intensity of the images and remove artifacts, which improves the precision and interpretability of the image. Optical Character Recognition (OCR) is used to extract the segmented characters from the image, and these are matched against the characters stored in the dataset. Once the matched characters are extracted, they are stored in a suitable format, and the TTS engine then converts the saved Kannada characters to speech. The resulting speech output corresponds to the characters collected after processing the input text.
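
The pre-processing and segmentation stages mentioned in this abstract might look like the sketch below. The abstract does not specify the filters or the segmentation method, so min-max contrast stretching, a fixed threshold, and column-projection segmentation are assumptions chosen for illustration, and all function names are hypothetical. Column projection is also a simplification for Kannada, whose vowel signs and conjunct characters do not always separate cleanly into columns.

```python
def normalize(image):
    """Min-max stretch grayscale values to the range 0..255."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to stretch
        return [[0 for _ in row] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def binarize(image, threshold=128):
    """Mark dark pixels as ink (1) and light pixels as background (0)."""
    return [[1 if p < threshold else 0 for p in row] for row in image]


def segment_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    spans, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a character begins
        elif not ink and start is not None:
            spans.append((start, c))       # a character ends at a blank column
            start = None
    if start is not None:
        spans.append((start, width))       # character touches the right edge
    return spans


# A tiny low-contrast "scan": values near 120 are ink, values near 200 are paper.
page = [
    [200, 120, 200, 200, 120, 120],
    [200, 120, 200, 200, 120, 200],
]
spans = segment_columns(binarize(normalize(page)))
print(spans)
```

Stretching the intensity range first lets a single fixed threshold work on scans of differing brightness. Each resulting column span can then be cropped and compared against the stored character set.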

