Smart Reader for Visually Challenged Using Optical Character Recognition and Text-To-Speech

2020 ◽  
pp. 205-208
Author(s):  
Sowmya R ◽  
Sushma S Jagtap ◽  
Gnanamoorthy Kasthuri

Assistive technology comprises assistive, adaptive and rehabilitative devices for people with disabilities. It is estimated that there are about 36 million people with visual impairment in the world and a further 216 million who live with moderate to severe visual impairment. Technology has helped visually challenged people carry out tasks on par with sighted people, particularly reading and writing. In the proposed work, an image scanning device attached to a microcontroller is designed in the form of a glove for ease of use. The glove has a camera at the fingertip; when it is moved over lines of text, it scans the information and converts it into digital text using Optical Character Recognition (OCR). The converted digital text is then read aloud using text-to-speech synthesis. The results obtained were accurate and met the standards of operability.

2018 ◽  
Vol 7 (3.34) ◽  
pp. 65 ◽  
Author(s):  
S Thiyagarajan ◽  
Dr G.Saravana Kumar ◽  
E Praveen Kumar ◽  
G Sakana

Blind people are unable to perform visual tasks. The majority of published printed works do not include Braille or audio versions, and digital versions are still a minority. In this project, optical character recognition (OCR) technology enables the recognition of text from image data. The system consists of a Raspberry Pi, an HD camera and a Bluetooth headset. OCR has been widely used on scanned or photographed documents, converting them into electronic copies. Text-to-speech synthesis (TTS) enables text in digital format to be synthesized into a human voice and played through an audio system. The objective of TTS is the automatic conversion of unrestricted sentences into spoken discourse in a natural language, resembling the same text as spoken by a native speaker of the language.


In the modern era of image processing, recognizing content from an image is the process of electronically converting it into machine-encoded text. Advanced systems capable of high-accuracy multi-font recognition are now becoming commonplace, with support for a variety of digital formats. Some programs can reproduce formatting that is very close to the original page, including images, columns, and other non-text items. The proposed system recognizes text from an image and converts it into editable text as well as speech. The system uses a correlation model for Optical Character Recognition (OCR) and speech synthesis for Text-to-Speech (TTS) conversion. Correlation is a measure of the similarity between two objects; here, predefined alphabet templates are compared against the image to recognize combinations of those characters. Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; some programs instead render symbolic linguistic representations, such as phonetic transcriptions, into speech. The system achieves a high level of accuracy with few false recognitions. The goal is to build an effective text scanner that recognizes text from an image with a low error rate. The system has been implemented in MATLAB, and various pre-processing filters have been applied for better enhancement and extraction. Handwritten text can also be recognized effectively.
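The correlation idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' MATLAB implementation: the 5x3 binary templates, the function names and the use of the Pearson correlation coefficient are all assumptions made for illustration. Each candidate glyph is compared against every stored template and the best-scoring label is returned.

```python
from math import sqrt

# Hypothetical 5x3 binary glyph templates (1 = ink, 0 = background).
TEMPLATES = {
    "I": ["111", "010", "010", "010", "111"],
    "L": ["100", "100", "100", "100", "111"],
    "T": ["111", "010", "010", "010", "010"],
}


def flatten(glyph):
    """Turn a list of row strings into a flat list of 0/1 pixel values."""
    return [int(ch) for row in glyph for ch in row]


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length pixel vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a blank or fully inked glyph carries no shape information
    return cov / sqrt(var_a * var_b)


def recognize(glyph):
    """Return (label, score) of the template that correlates best with the glyph."""
    pixels = flatten(glyph)
    scores = {label: correlation(pixels, flatten(t)) for label, t in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# A "T" with one extra ink pixel, standing in for scanner noise.
noisy_t = ["111", "010", "010", "011", "010"]
label, score = recognize(noisy_t)
print(label, round(score, 3))
```

A real system would first segment the page into individual glyphs and scale each one to the template size; that step is omitted here.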


Author(s):  
Anitha D B ◽  
Jyothi T M ◽  
Pooja R ◽  
Sahana N

This paper presents a new design for assistive smart glasses for the visually impaired. The objective is to assist with multiple daily tasks by taking advantage of the wearable form factor. The proposed method is a camera-based assistive text reader that helps blind persons read the text on labels, printed notes and products in their own languages. It combines Optical Character Recognition (OCR), a Text-to-Speech synthesizer (TTS) and a translator on a Raspberry Pi. OCR is the identification of printed characters using photoelectric devices and computer software. It converts images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document or from subtitle text superimposed on an image. The system uses OCR to read letters and numbers in any language from the image, translates the text into the desired language, and finally produces audio output of the translated text. The audio output is heard through the Raspberry Pi's audio jack using speakers or earphones.
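The three stages named in this abstract (OCR, translation, speech output) can be wired together as in the sketch below. It shows only the control flow, under assumptions: `ReadingPipeline` and the stub functions are hypothetical names, and the stubs stand in for a real OCR engine, translator and speech synthesizer, none of which the abstract specifies in detail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReadingPipeline:
    """Hypothetical glue object: each stage is injected as a plain callable."""
    ocr: Callable[[bytes], str]            # image bytes -> recognized text
    translate: Callable[[str, str], str]   # (text, target language) -> translated text
    speak: Callable[[str], None]           # text -> audio output (side effect)

    def read_aloud(self, image: bytes, target_lang: str) -> str:
        text = self.ocr(image).strip()
        if not text:
            return ""  # nothing recognized, so nothing is translated or spoken
        translated = self.translate(text, target_lang)
        self.speak(translated)
        return translated


# Stubs for illustration only; a real device would call an OCR engine,
# a translation service and a speech synthesizer here.
def stub_ocr(image: bytes) -> str:
    return image.decode("utf-8")


def stub_translate(text: str, target_lang: str) -> str:
    glossary = {("exit", "es"): "salida"}
    return glossary.get((text.lower(), target_lang), text)


spoken = []  # records what would have been sent to the audio jack
pipeline = ReadingPipeline(stub_ocr, stub_translate, spoken.append)
result = pipeline.read_aloud(b" Exit \n", "es")
print(result, spoken)
```

Injecting the stages as callables keeps the flow testable without a camera or speaker attached.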


Author(s):  
Shailendra Singh

This paper introduces an efficient technique that enables users to hear the contents of text images instead of reading them. Digital technology is increasingly widely used, and many methods are available for capturing images. Such images may contain important textual content that the user needs to edit or store digitally. This can be done using Optical Character Recognition (OCR) with the Tesseract OCR engine. OCR is a branch of AI used in applications to recognize text from scanned documents or images. The technique merges OCR with a Text-to-Speech synthesizer (TTS): the recognized text can be converted to audio to help visually impaired people hear the content they wish to know. Text-to-speech conversion here reads the letters and numbers found in the image by OCR and converts them into voice. The aim is to study and compare the methods used for TTS conversion and to identify the most efficient technique for the conversion process. Based on the review, the hidden Markov model (HMM), a statistical model, is found to be the most suitable for TTS conversion.


Author(s):  
Minerva Sarma ◽  
Anuskha Kumar ◽  
Aditi Joshi ◽  
Suraj Kumar Nayak ◽  
Biswajeet Champaty

In this chapter, a low-cost, efficient, real-time wearable text-to-speech scanner is proposed that enables blind persons to hear the contents of text material. The device captures images of the text and converts them to speech. The hardware has been realized using a Raspberry Pi 3, a Pi camera, and an earphone. Optical character recognition (OCR) and text-to-speech synthesis (TTS) have been implemented on the Raspberry Pi 3 to accomplish the working of the device. OCR converted the captured text images to editable text, whereas TTS scanned the alphanumeric characters in the processed image and converted them to speech. The proposed technology imitates the human sensory organs and nervous system: the camera mimics the human eye, and the image processing on the Raspberry Pi 3 substitutes for the human brain. The proposed device can also help people with conditions such as dyslexia and nyctalopia (the inability to see in dim light or at night).


2020 ◽  
Vol 13 (1) ◽  
pp. 1-17
Author(s):  
Traian Rebedea ◽  
Vlad Florea

This paper proposes a deep learning solution for optical character recognition, specifically tuned to detect expiration dates printed on the packaging of food items. This method can be used to reduce food waste, can have a significant impact on the design of smart refrigerators, and can prove especially useful for persons with vision difficulties when combined with a speech synthesis engine. The main problem in designing an efficient solution for expiry date recognition is the lack of a large enough dataset to train deep neural networks. To tackle this issue, we propose to use an additional dataset composed of synthetically generated images. Both the synthetic and real image datasets are detailed in the paper, and we show that the proposed method offers a 9.4% accuracy improvement over using real images alone.


According to a World Health Organization survey, 285 million of the world's 7.4 billion people are visually challenged. These people face many problems, but the major one is reading: they cannot read text that is not written in Braille. To support them, a framework is proposed that performs text recognition and produces voice output, helping visually challenged people read any printed text by conveying it as speech. A camera captures the printed text, and the captured image goes through a series of image pre-processing steps that extract the text and remove the background. Characters are identified using Tesseract Optical Character Recognition (OCR). The identified text is then converted into voice using an open-source speech synthesizer (TTS). Finally, the speech output is heard through earphones.
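The abstract does not say which pre-processing steps are used to remove the background. One common choice before OCR is global thresholding with Otsu's method, sketched below in plain Python as an assumption rather than the authors' method. The function names are hypothetical, and the image is assumed to be 8-bit greyscale with dark text on a lighter page.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels in the dark class so far
    sum_low = 0  # sum of grey levels in the dark class so far
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no dark pixels yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is already in the dark class
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a greyscale image (list of rows) to 1 = ink, 0 = background."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


# Tiny 3x3 example: a dark diagonal stroke on a light page.
page = [
    [200, 210, 30],
    [205, 25, 35],
    [220, 215, 40],
]
mask = binarize(page)
for row in mask:
    print(row)
```

The resulting binary mask is what would be handed to the OCR stage; unevenly lit photographs usually need a local (adaptive) threshold instead of a single global one.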

