Text Dependent Speaker Identification and Speech Recognition Using Artificial Neural Network

Author(s):  
Suma Swamy ◽  
Shalini T. ◽  
Sindhu P. Nagabhushan ◽  
Sumaiah Nawaz ◽  
K. V. Ramakrishnan
2021 ◽  
pp. 83-92
Author(s):  
Shoeb Hussain ◽  
Ronaq Nazir ◽  
Urooj Javeed ◽  
Shoaib Khan ◽  
Rumaisa Sofi

Author(s):  
Anny Tandyo ◽  
Martono Martono ◽  
Adi Widyatmoko

The article discussed a speaker identification system, which is a part of speaker recognition. The system identified a subject by voice from a group of previously saved patterns. It used the discrete wavelet transformation as the feature extraction method and a back-propagation artificial neural network as the classification method. The voice input was processed by the discrete wavelet transformation to obtain the low-frequency signal coefficients of the decomposition, which retain the voice characteristics of each person. These coefficients were then classified by the back-propagation artificial neural network. A system trial was conducted by collecting 225 voice samples directly, using microphones in non-soundproof rooms, from 15 subjects (persons), each of whom provided 15 voice samples. Ten samples per subject were used for training and the remaining five for testing. The identification accuracy rate reached 84 percent. Testing was also carried out on subjects who pronounced the same words. It can be concluded that the selection of similar words by different subjects has no influence on the accuracy rate produced by the system.
Keywords: speaker identification, discrete wavelet transformation, artificial neural network, back-propagation.
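
The pipeline described above can be sketched in a few lines of Python. This is only an illustration of the general approach (discrete wavelet decomposition keeping the low-frequency approximation coefficients, followed by a back-propagation multilayer perceptron); the wavelet family ("db4"), decomposition level, network size and the synthetic stand-in data are assumptions, not details taken from the article.

import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def dwt_features(signal, wavelet="db4", level=5, n_coeffs=64):
    # Keep the low-frequency (approximation) coefficients as a fixed-length feature vector.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    approx = coeffs[0]                          # low-frequency part of the decomposition
    vec = np.zeros(n_coeffs)
    vec[:min(len(approx), n_coeffs)] = approx[:n_coeffs]
    return vec / (np.max(np.abs(vec)) + 1e-9)   # simple amplitude normalisation

# Synthetic stand-in for "15 subjects x 15 utterances" (10 for training, 5 for testing).
rng = np.random.default_rng(0)
X_train, y_train, X_test, y_test = [], [], [], []
for speaker in range(15):
    for utt in range(15):
        wave = np.sin(np.arange(8000) * 0.01 * (speaker + 1)) + rng.normal(scale=0.3, size=8000)
        feats = dwt_features(wave)
        (X_train if utt < 10 else X_test).append(feats)
        (y_train if utt < 10 else y_test).append(speaker)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("identification accuracy:", clf.score(X_test, y_test))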


Author(s):  
SAWIT KASURIYA ◽  
CHAI WUTIWIWATCHAI ◽  
VARIN ACHARIYAKULPORN ◽  
CHULARAT TANPRASERT

This paper reports a comparative study between a continuous hidden Markov model (CHMM) and an artificial neural network (ANN) on a text-dependent, closed-set speaker identification (SID) system with Thai-language recordings in office and telephone environments. Thai isolated digits "0–9" and their concatenations are used as the speaking text. Mel-frequency cepstral coefficients (MFCC) are selected as the studied features. Two well-known recognition engines, CHMM and ANN, are implemented and compared. The ANN system (a multilayer perceptron network with the backpropagation learning algorithm) uses a specially designed input-feeding method to avoid distortion from the normalization process. A general Gaussian-density HMM is developed for the CHMM system. After optimizing the system parameters through preliminary experiments, CHMM gives the best identification rate at 90.4%, slightly better than the 90.1% of ANN on digit "5" in the office environment. For the telephone environment, ANN gives the best identification rate at 88.84% on digit "0", which is higher than the 81.1% of CHMM on digit "3". When using 3-concatenated digits, the identification rates of ANN and CHMM reach 97.3% and 95.7% respectively in the office environment, and 92.1% and 96.3% respectively in the telephone environment.
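
A rough sketch of the two engines being compared on a closed-set identification task might look as follows: one Gaussian HMM is trained per speaker and identification is done by highest log-likelihood, while the MLP is trained on utterance-level MFCC statistics. The corpus here is a synthetic placeholder, and the model sizes, feature settings and input-feeding scheme are assumptions rather than the paper's exact setup.

import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM
from sklearn.neural_network import MLPClassifier

SR = 8000

def mfcc_frames(wave):
    # Frame-level MFCCs, shape (frames, 13)
    return librosa.feature.mfcc(y=wave, sr=SR, n_mfcc=13).T

# Synthetic placeholder corpus: 5 "speakers", 5 training and 2 test utterances each.
rng = np.random.default_rng(1)
speakers = range(5)
train = {s: [rng.normal(scale=s + 1, size=SR) for _ in range(5)] for s in speakers}
test = {s: [rng.normal(scale=s + 1, size=SR) for _ in range(2)] for s in speakers}

# CHMM-style engine: one Gaussian HMM per speaker, scored by log-likelihood.
hmms = {}
for s, waves in train.items():
    frames = [mfcc_frames(w) for w in waves]
    hmms[s] = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    hmms[s].fit(np.vstack(frames), lengths=[len(f) for f in frames])

def hmm_identify(wave):
    f = mfcc_frames(wave)
    return max(hmms, key=lambda s: hmms[s].score(f))

# ANN engine: MLP on mean MFCC vectors (a crude stand-in for the paper's input-feeding design).
X = [mfcc_frames(w).mean(axis=0) for s in speakers for w in train[s]]
y = [s for s in speakers for _ in train[s]]
ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)

for s in speakers:
    for w in test[s]:
        print("true:", s, " HMM:", hmm_identify(w),
              " ANN:", int(ann.predict([mfcc_frames(w).mean(axis=0)])[0]))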


2017 ◽  
Vol 7 (1) ◽  
pp. 48-57
Author(s):  
Cigdem Bakir

Currently, technological developments are accompanied by a number of associated problems, and security takes first place among them. In particular, biometric systems such as speaker authentication constitute a significant part of the security problem, because sound recordings connected with various crimes must be analysed for forensic purposes. Authentication systems require biometric data to be transmitted, processed and classified in a secure manner. The aim of this study is to implement an automatic voice and speech recognition system using the wavelet transform, taking Turkish sound forms and properties into consideration. Approximately 3740 Turkish voice samples of words and clauses of differing lengths were collected from 25 males and 25 females. The features of these voice samples were obtained using Mel-frequency cepstral coefficients (MFCCs), Mel-frequency discrete wavelet coefficients (MFDWCs) and linear prediction cepstral coefficients (LPCCs). The resulting feature vectors were trained with k-means, an artificial neural network (ANN) and a hybrid model. The hybrid model was formed by combining k-means clustering with the ANN: in the first phase, k-means partitioned the voice feature vectors into subsets; in the second phase, training and test sets were formed from these sub-clusters. Training on the more homogeneous data produced by clustering thus increased the accuracy. In the test phase, the owner of a given voice sample was identified by comparison with the trained voice samples. The results and performance of the classification algorithms are also presented in a comparative manner.
Keywords: speech recognition, hybrid model, k-means, artificial neural network (ANN).
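
One plausible reading of the hybrid model, sketched below, is a two-phase scheme: k-means first partitions the training feature vectors into sub-clusters, then a separate back-propagation network is trained on each sub-cluster and test vectors are routed to the network of their nearest cluster. The cluster count, feature dimensionality, network sizes and synthetic data are illustrative assumptions, not the study's configuration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
n_speakers, dim = 10, 20
means = rng.normal(scale=1.5, size=(n_speakers, dim))     # stand-in per-speaker templates

def make_split(per_speaker):
    X = np.vstack([m + rng.normal(scale=0.6, size=(per_speaker, dim)) for m in means])
    y = np.repeat(np.arange(n_speakers), per_speaker)
    return X, y

X_train, y_train = make_split(40)
X_test, y_test = make_split(10)

# Phase 1: k-means partitions the training feature vectors into sub-clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)

# Phase 2: train one back-propagation network per sub-cluster.
experts = {}
for c in range(km.n_clusters):
    idx = km.labels_ == c
    experts[c] = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                               random_state=0).fit(X_train[idx], y_train[idx])

# Identification: route each test vector to the network of its nearest cluster.
pred = np.array([experts[c].predict(x[None, :])[0]
                 for x, c in zip(X_test, km.predict(X_test))])
print("hybrid k-means + ANN accuracy:", np.mean(pred == y_test))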


Author(s):  
Lam D. Pham ◽  
Hieu M. Nguyen ◽  
Du N. N. T. Nguyen ◽  
Trang Hoang

The Artificial Neural Network (ANN) has become one of the major schemes applied in the pattern recognition area, and many software-based approaches have demonstrated its strong performance. However, the development of pattern recognition systems built on ANN hardware architectures has been limited not only by silicon requirements such as frequency, area, power and resources, but also by strict accuracy and real-time demands. Although a considerable number of ANN hardware architectures have been proposed, they suffer from limited functionality owing to small, fixed configurations and a lack of reconfigurability. Achieving an ANN hardware architecture that satisfies strict accuracy, large configurations and silicon-area constraints while also meeting the real-time criterion of pattern recognition systems therefore remains a challenge. To tackle these issues, this work proposes a dynamic three-layer ANN architecture that can be reconfigured to adapt to various real-time applications. Furthermore, a complete SOPC system integrating the proposed ANN hardware has been implemented for automatic Vietnamese speech recognition, confirming a recognition probability of around 95.2% on 20 Vietnamese discrete words. Moreover, experimental results for the ASIC-based architecture show a maximum frequency of 250 MHz on 130 nm technology as well as a high degree of reconfigurability.
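
The idea of a reconfigurable three-layer ANN datapath can be illustrated in software (this is not the paper's RTL or SOPC design): layer widths are runtime parameters and the arithmetic is done in fixed point, as a hardware multiply-accumulate array would. The word length (Q4.12), layer sizes and clamp activation are assumptions made for the sketch.

import numpy as np

FRAC_BITS = 12                       # assumed Q4.12 fixed-point word format
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    return np.round(np.asarray(x) * SCALE).astype(np.int64)

def fixed_layer(x_q, w_q, b_q):
    # One layer of the datapath: integer multiply-accumulate in wide registers,
    # rescale back to Q4.12, then a hardware-friendly clamp as the activation.
    acc = x_q.astype(np.int64) @ w_q.T.astype(np.int64) + (b_q << FRAC_BITS)
    y_q = acc >> FRAC_BITS
    return np.clip(y_q, 0, SCALE)    # clamped-ReLU activation in fixed point

class ReconfigurableANN:
    # Three-layer (input-hidden-output) network whose widths are runtime parameters,
    # mirroring a datapath that can be reconfigured per application.
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = to_fixed(rng.normal(scale=0.3, size=(n_hidden, n_in)))
        self.b1 = to_fixed(rng.normal(scale=0.1, size=n_hidden))
        self.w2 = to_fixed(rng.normal(scale=0.3, size=(n_out, n_hidden)))
        self.b2 = to_fixed(rng.normal(scale=0.1, size=n_out))

    def forward(self, x):
        h = fixed_layer(to_fixed(x), self.w1, self.b1)
        o = fixed_layer(h, self.w2, self.b2)
        return int(np.argmax(o))     # index of the recognised word

rng = np.random.default_rng(3)
net = ReconfigurableANN(n_in=39, n_hidden=64, n_out=20, rng=rng)   # e.g. 20 discrete words
print("predicted word index:", net.forward(rng.normal(scale=0.2, size=39)))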

