Segmental and Supra Segmental Feature Based Speech Recognition System for Under Resourced Languages

2018
Author(s): Tanmay Bhowmik, Shyamal Kumar Das Mandal
2022, Vol. 14 (2), pp. 614
Author(s): Taniya Hasija, Virender Kadyan, Kalpna Guleria, Abdullah Alharbi, Hashem Alyami, ...

Speech recognition has been an active field of research over the last few decades because it facilitates better human-computer interaction. Automatic speech recognition (ASR) systems for native languages are still underdeveloped, and Punjabi ASR is in its infancy: most research has been conducted on adult speech, while far less work has addressed Punjabi children's speech. This research aimed to build a prosodic-feature-based automatic children's speech recognition system using discriminative modeling techniques. The corpus of Punjabi children's speech poses various runtime challenges, such as acoustic variation across speakers' ages. To overcome these issues, out-of-domain data augmentation was implemented using a Tacotron-based text-to-speech synthesizer. Prosodic features were extracted from the Punjabi children's speech corpus, and selected prosodic features were coupled with Mel-Frequency Cepstral Coefficient (MFCC) features before being fed to the ASR framework. The system modeling process investigated several approaches: Maximum Mutual Information (MMI), Boosted Maximum Mutual Information (bMMI), and feature-space Maximum Mutual Information (fMMI). Out-of-domain data augmentation was performed to enlarge the corpus; prosodic features were then also extracted from the extended corpus, and experiments were conducted on both individual and integrated prosodic acoustic features. The fMMI technique exhibited a 20-25% relative improvement in word error rate compared with the MMI and bMMI techniques. Performance was further improved using the augmented dataset and hybrid front-end features (MFCC + POV + F0 + voice quality), yielding a 13% relative improvement over the earlier baseline system.
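As a minimal sketch of the hybrid front-end idea described above, assuming the librosa library: MFCCs are concatenated frame by frame with an F0 track and a voicing-probability estimate before acoustic modeling. The function name and parameter choices are illustrative; the authors' actual pipeline (including the exact POV and voice-quality features and the fMMI training) is not reproduced here.

    # Illustrative sketch only: MFCC + prosodic (F0, voicing) feature fusion
    import numpy as np
    import librosa

    def hybrid_frontend(wav_path, sr=16000, n_mfcc=13, hop=160):
        y, sr = librosa.load(wav_path, sr=sr)

        # Spectral part: MFCCs with a 25 ms window and 10 ms frame shift
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=400, hop_length=hop)

        # Prosodic part: F0 and voicing probability via probabilistic YIN
        f0, voiced_flag, voiced_prob = librosa.pyin(
            y, fmin=80.0, fmax=600.0, sr=sr, hop_length=hop)
        f0 = np.nan_to_num(f0)                      # unvoiced frames -> 0 Hz

        # Align frame counts and stack into one feature matrix
        n = min(mfcc.shape[1], f0.shape[0])
        feats = np.vstack([mfcc[:, :n], f0[None, :n], voiced_prob[None, :n]])
        return feats.T                              # shape: (frames, n_mfcc + 2)

The fused feature matrix would then be fed to the acoustic model in place of plain MFCCs.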


2014, Vol. 2014, pp. 1-8
Author(s): Ing-Jr Ding, Yen-Ming Hsu

In the past, the kernel of automatic speech recognition (ASR) was dynamic time warping (DTW), a feature-based template-matching method belonging to the family of dynamic programming (DP) techniques. Although DTW is an early ASR technique, it remains popular in many applications and now plays an important role in Kinect-based gesture recognition. This paper proposes an intelligent speech recognition system using an improved DTW approach for multimedia and home automation services. The improved DTW presented in this work, called HMM-like DTW, brings the concept of the hidden Markov model (HMM) statistical model into the design of DTW. By transforming feature-based DTW recognition into model-based DTW recognition, HMM-like DTW can behave like an HMM recognizer and therefore supports model adaptation (also known as speaker adaptation). A series of experiments in home-automation-based multimedia access service environments demonstrated the superiority and effectiveness of the developed smart speech recognition system based on HMM-like DTW.
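For reference, a minimal Python sketch of the classical DTW recurrence that HMM-like DTW extends. This is standard feature-based template matching, not the authors' model-based variant:

    # Classical DTW alignment cost between two feature sequences
    import numpy as np

    def dtw_distance(query, template):
        """Align two feature sequences (frames x dims) and return the DTW cost."""
        n, m = len(query), len(template)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(query[i - 1] - template[j - 1])  # local distance
                # DP recurrence: best of insertion, deletion, or match
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]

    # Recognition: pick the word template with the smallest alignment cost
    # best_word = min(templates, key=lambda w: dtw_distance(query, templates[w]))

The HMM-like variant described in the paper replaces this purely template-based matching with a model-based formulation so that HMM-style adaptation becomes possible.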


Sadhana, 1998, Vol. 23 (4), pp. 313-340
Author(s): K Samudravijaya, R Ahuja, N Bondale, T Jose, S Krishnan, ...

Author(s): Lery Sakti Ramba

The purpose of this research is to design a home automation system that can be controlled with voice commands. The work was conducted by reviewing related research, consulting with competent parties, designing the system, testing it, and analyzing the test results. The voice recognition system was designed using a deep learning convolutional neural network (DL-CNN), and the CNN model was trained to recognize several kinds of voice commands. The result is a speech recognition system that can control several electronic devices connected to the system. The system achieved a 100% success rate in a room with a background noise intensity of 24 dB (silent), 67.67% at 42 dB, and only 51.67% at 52 dB (noisy). The success rate is therefore strongly influenced by the intensity of background noise in the room; for optimal results, the system is better suited to rooms with low background noise.
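As an illustration only, since the abstract does not give the network architecture, the following compact PyTorch sketch shows a CNN classifier over log-mel spectrograms for a fixed set of voice commands; the layer sizes, input shape, and command count are assumptions, not the configuration reported in the paper.

    # Hypothetical small CNN for voice-command classification
    import torch
    import torch.nn as nn

    class CommandCNN(nn.Module):
        def __init__(self, n_commands=6):          # e.g. lamp on/off, fan on/off, ...
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_commands))

        def forward(self, x):                       # x: (batch, 1, n_mels, frames)
            return self.classifier(self.features(x))

    # Usage: logits = CommandCNN()(torch.randn(1, 1, 40, 100))

A model of this kind would be trained on labeled command recordings and, at inference time, the predicted command would be mapped to a control signal for the connected device.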

