Computational Linguistics-Based Tamil Character Recognition System for Text to Speech Conversion

Author(s):  
S. Suriya ◽  
M. Balaji ◽  
T.M. Gowtham ◽  
Kumar S. Rahul
Author(s):  
Kristina Apriyanti ◽  
Triyogatama Wahyu Widodo

Abstract: Text-to-speech applications on mobile devices currently require the user to type in the words to be spoken. This study designs a word-input system for a text-to-speech application based on digital image processing: the user simply captures an image of the words to be voiced, without typing in the text input area. The method comprises image acquisition, image pre-processing, character segmentation, character recognition, and integration with the text-to-speech engine on an Android device. Images are acquired with the mobile device's camera. Character recognition uses an artificial neural network (ANN) trained with the back-propagation algorithm, and the resulting image-processing system is integrated with the Google Text to Speech engine. The ANN model achieves a character-recognition accuracy of 97.58%, and the system recognizes the Arial, Calibri, and Verdana fonts. The mean recognition accuracy on the test samples used in this study is 94.7%, with images captured at a distance of 3–8 cm and the camera perpendicular to the printed page. Keywords: Android, OCR, Back Propagation, OpenCV, Text to Speech
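As an illustration of the character-segmentation step in the pipeline above, the sketch below isolates characters by vertical projection on a toy binary matrix (the function name and the toy image are assumptions for illustration; the study's actual segmentation details are not given here, and a real pipeline would first binarize the camera frame):

```python
# Minimal sketch: segment characters by vertical projection on a binary
# image (1 = ink, 0 = background).

def segment_columns(img):
    """Return (start, end) column spans that contain ink."""
    cols = [any(row[c] for row in img) for c in range(len(img[0]))]
    spans, start = [], None
    for c, ink in enumerate(cols):
        if ink and start is None:
            start = c                      # a span of ink begins
        elif not ink and start is not None:
            spans.append((start, c))       # a blank column ends the span
            start = None
    if start is not None:
        spans.append((start, len(cols)))
    return spans

# Two "characters" separated by one blank column:
img = [[1, 1, 0, 1],
       [1, 0, 0, 1]]
print(segment_columns(img))  # [(0, 2), (3, 4)]
```

Each span can then be cropped, normalized, and fed to the recognizer.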


Author(s):  
Manish M. Kayasth ◽  
Bharat C. Patel

The entire character recognition system is logically organized into stages: Scanning, Pre-processing, Classification, Processing, and Post-processing. In the targeted system, the scanned image first passes through the pre-processing modules, then feature extraction and classification, in order to achieve a high recognition rate. This paper focuses mainly on feature extraction and classification techniques, the methodologies that play the key role in identifying offline handwritten characters, specifically in the Gujarati language. Feature extraction provides methods by which characters can be identified uniquely and with a high degree of accuracy; it captures the shape contained in the pattern. Several techniques are available for feature extraction and classification; however, the choice of a technique appropriate to its input determines the recognition accuracy achieved.
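One widely used family of features for offline handwritten-character recognition is zoning density: the character bitmap is divided into a grid and the ink density of each cell becomes one feature. The sketch below illustrates the idea only; it is not necessarily the specific feature set the authors selected for Gujarati:

```python
def zoning_features(img, zones=2):
    """Ink density in each cell of a zones x zones grid over a binary bitmap."""
    h, w = len(img), len(img[0])
    feats = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            cells = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feats.append(sum(cells) / len(cells))  # fraction of ink pixels
    return feats

# 4x4 glyph: solid top-left quadrant, half-filled bottom-right quadrant
glyph = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 1, 1]]
print(zoning_features(glyph))  # [1.0, 0.0, 0.0, 0.5]
```

Because the densities are normalized by cell area, the feature vector is largely invariant to the glyph's absolute size, which is one reason zoning is popular for handwriting.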


2018 ◽  
Author(s):  
I Wayan Agus Surya Darma

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters, where the features are generated through a feature extraction process. This research uses handwritten Balinese characters, and the feature extraction process produces semantic and direction features of each character. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters: the features of each test character image are compared with the reference features. With K=3 and 10 reference samples, the recognition system achieves a success rate of 97.53%.


2020 ◽  
Vol 17 (3) ◽  
pp. 299-305 ◽  
Author(s):  
Riaz Ahmad ◽  
Saeeda Naz ◽  
Muhammad Afzal ◽  
Sheikh Rashid ◽  
Marcus Liwicki ◽  
...  

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT), which consists of complex patterns of handwritten Arabic text-lines. The paper contributes in three main aspects: (1) pre-processing, (2) a deep-learning-based approach, and (3) data augmentation. The pre-processing step includes pruning extra white space and de-skewing the skewed text-lines. The deployed approach is based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks with Connectionist Temporal Classification (CTC). MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes, and fine inflections. Combined with data augmentation, the deep learning approach achieves a promising improvement, raising the Character Recognition (CR) rate from the 75.08% baseline to 80.02%.
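A CTC-trained network such as the MDLSTM above emits a per-frame distribution over labels plus a blank symbol; at inference, the simplest decoding collapses the best path by merging repeated labels and dropping blanks. A minimal sketch of that greedy decoding step, with toy probabilities and integer labels (the alphabet and values are illustrative, not from the paper):

```python
def ctc_greedy_decode(frame_probs, blank=0):
    """Best-path CTC decode: per-frame argmax, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for sym in best:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return out

# 4 frames over an alphabet {0: blank, 1, 2}; the repeated 1s merge
# and the blank frame separates the two output symbols.
probs = [[0.1, 0.8, 0.1],
         [0.1, 0.8, 0.1],
         [0.9, 0.05, 0.05],
         [0.1, 0.1, 0.8]]
print(ctc_greedy_decode(probs))  # [1, 2]
```

This is why CTC needs no frame-level alignment in the training labels: the blank/merge rule maps many frame paths onto the same label sequence.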

