Parametric Density Estimation for the Classification of Acoustic Feature Vectors in Speech Recognition

1998 ◽  
pp. 87-118 ◽  
Author(s):  
Sankar Basu ◽  
Charles A. Micchelli
2020 ◽  
Vol 8 (5) ◽  
pp. 3978-3983

Identification of a person’s speech by his lip movement is a challenging task. Even though many software tools available for recognition of speech to text and vice versa, some of the words uttered may not be accurate as spoken and may vary from person to person because of their pronunciation. In addition, in the noisy environment speech uttered may not perceive effectively hence there lip movement for a given speech varies. Lip reading has added advantages when it augmented with speech recognition, thus increasing the perceived information. In this paper, the video file of a individual person are converted to frames and extraction of only the lip contour for vowels is done by calculating its area and other geometrical aspects. Once this is done as a part of testing it is compared with three to four people’s lip contour for vowels for first 20 frames. The parameters such as mean, centroid will remain approximately same for all people irrespective of their lip movement but there is change in major and minor axis and hence area changes considerably. In audio domain vowel detection is carried out by extracting unique features of English vowel utterance using Mel Frequency Cepstrum Coefficients (MFCC) and the feature vectors that are orthonormalized to compare the normalized vectors with standard database and results are obtained with approximation.


Sign in / Sign up

Export Citation Format

Share Document