The influence of head and body postures on the acoustic speech signal

2016 ◽  
Vol 23 (1) ◽  
pp. 141-146 ◽  
Author(s):  
Yvonne Flory
2009 ◽  
pp. 1-38 ◽  
Author(s):  
Derek J. Shiell ◽  
Louis H. Terry ◽  
Petar S. Aleksic ◽  
Aggelos K. Katsaggelos

The information embedded in the visual dynamics of speech has the potential to improve the performance of speech and speaker recognition systems. The information carried in the visual speech signal complements the information in the acoustic speech signal, which is particularly beneficial in adverse acoustic environments. Non-invasive methods using low-cost sensors can be used to obtain acoustic and visual biometric signals, such as a person’s voice and lip movement, with little user cooperation. These types of unobtrusive biometric systems are warranted to promote widespread adoption of biometric technology in today’s society. In this chapter, the authors describe the main components and theory of audio-visual and visual-only speech and speaker recognition systems. Audio-visual corpora are described and a number of speech and speaker recognition systems are reviewed. Finally, various open issues in system design and implementation are discussed, and future research and development directions in this area are presented.


1968 ◽  
Vol 44 (4) ◽  
pp. 993-1001 ◽  
Author(s):  
Michael H. L. Hecker ◽  
Kenneth N. Stevens ◽  
Gottfried von Bismarck ◽  
Carl E. Williams

Author(s):  
Dea Sifana Ramadhina ◽  
Rita Magdalena ◽  
Sofia Saidah

Voice is one of the parameters used to identify a person. From the voice, information such as gender, age, and even the identity of the speaker can be obtained. Speaker recognition is a method for reducing crimes and frauds committed by voice, minimizing the occurrence of identity faking. The Mel Frequency Cepstrum Coefficient (MFCC) method can be used in a speech recognition system: feature extraction with MFCC produces the acoustic features of the speech signal. For classification, Hidden Markov Models (HMM) are used to match an unidentified speaker’s voice against the voices in the database. In this research, the system is used to verify the speaker, namely 15 text-dependent in Indonesian. When tested on speakers that are also in the database, the highest accuracy is 99.16%.


1994 ◽  
Vol 37 (1) ◽  
pp. 53-63 ◽  
Author(s):  
Jeannette D. Hoit ◽  
Steven A. Shea ◽  
Robert B. Banzett

This investigation provides the first detailed description of speech production during mechanical ventilation. Seven adults with tracheostomies served as subjects. Recordings were made of chest wall motions, neck muscle activity, tracheal pressure, air flow at the nose and mouth, estimated blood-gas levels, and the acoustic speech signal during performance of a variety of speech tasks. Results indicated that subjects spoke for short durations that spanned all phases of the ventilator cycle, altered laryngeal opposing pressures in response to the continually changing tracheal pressure wave, and expended relatively small volumes of gas for speech production. Speech was improved by making selected ventilator adjustments. Suggestions for clinical interventions are offered.


Author(s):  
Larisa A. Glushchenko ◽  
Alexander M. Korzun ◽  
Victor I. Tupota ◽  
Vadim Ya. Krohalev

2012 ◽  
Vol 19 (3) ◽  
pp. 87-97 ◽  
Author(s):  
Susan Nittrouer

Children with a variety of language-related problems, including dyslexia, experience difficulty processing the acoustic speech signal, leading to proposals of diagnostic entities known as auditory processing deficits. Although descriptions of these deficits vary across accounts, most hinge on the idea that problems arise at the level of detecting and/or discriminating sensory inputs. In this article, the author re-examines that idea and proposes that the difficulty more likely arises in how those sensations get organized into service for auditory comprehension of language.


Author(s):  
Carol A. Fowler

A theory of speech production provides an account of the means by which a planned sequence of language forms is implemented as vocal tract activity that gives rise to an audible, intelligible acoustic speech signal. Such an account must address several issues. Two central issues are considered in this article. One issue concerns the nature of language forms that ostensibly compose plans for utterances. Because of their role in making linguistic messages public, a straightforward idea is that language forms are themselves the public behaviors in which members of a language community engage when talking. By most accounts, however, the relation of phonological segments to actions of the vocal tract is not one of identity. Rather, phonological segments are mental categories with featural attributes. Another issue concerns what, at various levels of description, the talker aims to achieve. This article focuses on speech production, and considers language forms and plans for speaking, along with speakers' goals as acoustic targets or vocal tract gestures, the DIVA theory of speech production, the task dynamic model, coarticulation, and prosody.

