The influence of head and body postures on the acoustic speech signal

The information embedded in the visual dynamics of speech has the potential to improve the performance of speech and speaker recognition systems. The visual speech signal complements the information in the acoustic speech signal, which is particularly beneficial in adverse acoustic environments. Non-invasive methods using low-cost sensors can capture acoustic and visual biometric signals, such as a person’s voice and lip movements, with little user cooperation. Such unobtrusive biometric systems are needed to promote widespread adoption of biometric technology in today’s society. In this chapter, the authors describe the main components and theory of audio-visual and visual-only speech and speaker recognition systems. Audio-visual corpora are described, and a number of speech and speaker recognition systems are reviewed. Finally, open issues in system design and implementation are discussed, and directions for future research and development in this area are presented.

Manifestations of Task‐Induced Stress in the Acoustic Speech Signal

The Journal of the Acoustical Society of America; doi:10.1121/1.1911241; 1968; Vol 44 (4); pp. 993-1001; Cited by ~52

Author(s): Michael H. L. Hecker, Kenneth N. Stevens, Gottfried von Bismarck, Carl E. Williams

Keyword(s): Speech Signal, Induced Stress, Acoustic Speech Signal

Individual Identification Through Voice Using Mel-Frequency Cepstrum Coefficient (MFCC) and Hidden Markov Models (HMM) Method

Journal of Measurements Electronics Communications and Systems; doi:10.25124/jmecs.v7i1.3553; 2020; Vol 7 (1); pp. 26

Author(s): Dea Sifana Ramadhina, Rita Magdalena, Sofia Saidah

Keyword(s): Hidden Markov Models, Speaker Recognition, Speech Signal, Markov Models, Hidden Markov, Recognition System, Individual Identification, Acoustic Speech Signal, Mel Frequency Cepstrum Coefficient, The Voice

Voice is one of the parameters used to identify a person. From a person’s voice, information such as gender, age, and even the speaker’s identity can be obtained. Speaker recognition is a method for narrowing down crimes and fraud committed by voice, and thus helps to minimize identity faking. The Mel-Frequency Cepstrum Coefficient (MFCC) method can be used for feature extraction in a speech recognition system; applied to the speech signal, it produces acoustic feature vectors. For classification, Hidden Markov Models (HMM) are used to match an unidentified speaker’s voice against the voices in the database. In this research, the system is used for text-dependent speaker verification, namely 15 text-dependent items in Indonesian. When testing with speakers who are also in the database, the highest accuracy was 99.16%.
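The MFCC front end mentioned above can be sketched in NumPy. This is a minimal sketch of a common textbook MFCC recipe (pre-emphasis, framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II), not the implementation used by Ramadhina et al. The helper names (`mel_filterbank`, `dct_ii_matrix`, `mfcc`) and the parameter values (16 kHz audio, 25 ms frames, 10 ms step, 512-point FFT, 26 filters, 13 coefficients, pre-emphasis 0.97) are illustrative assumptions, and the HMM classification stage is not shown.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising slope of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, peak of 1 at center
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, keeping only the first n_out rows."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    mat[0] /= np.sqrt(2.0)  # DC row scaled to sqrt(1/n_in)
    return mat


def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13, pre_emphasis=0.97):
    """Return an (n_frames, n_ceps) matrix of MFCCs for a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: first-order high-pass that boosts high frequencies
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    frame_size = int(round(frame_len * sample_rate))
    step = int(round(frame_step * sample_rate))
    if len(emphasized) < frame_size:  # zero-pad so at least one frame exists
        emphasized = np.pad(emphasized, (0, frame_size - len(emphasized)))
    n_frames = 1 + (len(emphasized) - frame_size) // step
    # Overlapping frames via index arithmetic, each multiplied by a Hamming window
    idx = np.arange(frame_size)[None, :] + step * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_size)
    power = (np.abs(np.fft.rfft(frames, n=n_fft)) ** 2) / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Floor at machine epsilon so empty filters do not produce log(0)
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    return log_energies @ dct_ii_matrix(n_ceps, n_filters).T


if __name__ == "__main__":
    rate = 16000
    t = np.arange(rate) / rate
    tone = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic test tone
    feats = mfcc(tone, sample_rate=rate)
    print(feats.shape)  # (98, 13): 98 frames x 13 coefficients
```

Building the DCT as an explicit orthonormal matrix keeps the sketch dependent on NumPy only; `scipy.fft.dct(x, type=2, norm="ortho")` followed by keeping the first 13 coefficients computes the same transform. In a verification system of the kind described, each utterance's MFCC matrix would then be scored against per-speaker HMMs.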

Bimodal classification of English allophones employing acoustic speech signal and facial motion capture

The Journal of the Acoustical Society of America; doi:10.1121/1.5067951; 2018; Vol 144 (3); pp. 1801-1802

Author(s): Andrzej Czyzewski, Szymon Zaporowski, Bozena Kostek

Keyword(s): Motion Capture, Speech Signal, Facial Motion, Acoustic Speech Signal

Speech Production During Mechanical Ventilation in Tracheostomized Individuals

Journal of Speech Language and Hearing Research; doi:10.1044/jshr.3701.53; 1994; Vol 37 (1); pp. 53-63; Cited by ~23

Author(s): Jeannette D. Hoit, Steven A. Shea, Robert B. Banzett

Keyword(s): Mechanical Ventilation, Speech Production, Chest Wall, Muscle Activity, Speech Signal, Pressure Wave, Neck Muscle, Blood Gas, Tracheal Pressure, Acoustic Speech Signal

This investigation provides the first detailed description of speech production during mechanical ventilation. Seven adults with tracheostomies served as subjects. Recordings were made of chest wall motions, neck muscle activity, tracheal pressure, air flow at the nose and mouth, estimated blood-gas levels, and the acoustic speech signal during performance of a variety of speech tasks. Results indicated that subjects spoke for short durations that spanned all phases of the ventilator cycle, altered laryngeal opposing pressures in response to the continually changing tracheal pressure wave, and expended relatively small volumes of gas for speech production. Speech was improved by making selected ventilator adjustments. Suggestions for clinical interventions are offered.

Possibility to extract information on an acoustic speech signal from reflected laser radiation

2015 Days on Diffraction (DD); doi:10.1109/dd.2015.7354841; 2015

Author(s): Larisa A. Glushchenko, Alexander M. Korzun, Victor I. Tupota, Vadim Ya. Krohalev

Keyword(s): Laser Radiation, Speech Signal, Acoustic Speech Signal, Extract Information

Supervector Dimension Reduction for Efficient Speaker Age Estimation Based on the Acoustic Speech Signal

IEEE Transactions on Audio Speech and Language Processing; doi:10.1109/tasl.2011.2104955; 2011; Vol 19 (7); pp. 1975-1985; Cited by ~27

Author(s): Gil Dobry, Ron M. Hecht, Mireille Avigal, Yaniv Zigel

Keyword(s): Dimension Reduction, Speech Signal, Age Estimation, Acoustic Speech Signal

A New Perspective on Developmental Language Problems: Perceptual Organization Deficits

Perspectives on Language Learning and Education; doi:10.1044/lle19.3.87; 2012; Vol 19 (3); pp. 87-97; Cited by ~3

Author(s): Susan Nittrouer

Keyword(s): Perceptual Organization, Auditory Processing, Speech Signal, Auditory Comprehension, Sensory Inputs, Language Problems, Acoustic Speech Signal, Processing Deficits, New Perspective, Experience Difficulty

Children with a variety of language-related problems, including dyslexia, experience difficulty processing the acoustic speech signal, leading to proposals of diagnostic entities known as auditory processing deficits. Although descriptions of these deficits vary across accounts, most hinge on the idea that problems arise at the level of detecting and/or discriminating sensory inputs. In this article, the author re-examines that idea and proposes that the difficulty more likely arises in how those sensations get organized into service for auditory comprehension of language.

Speech Production

The Oxford Handbook of Psycholinguistics; doi:10.1093/oxfordhb/9780198568971.013.0029; 2007; pp. 488-502

Author(s): Carol A. Fowler

Keyword(s): Dynamic Model, Speech Production, Speech Signal, Vocal Tract, Language Community, The Public, Acoustic Speech Signal

A theory of speech production provides an account of the means by which a planned sequence of language forms is implemented as vocal tract activity that gives rise to an audible, intelligible acoustic speech signal. Such an account must address several issues. Two central issues are considered in this article. One issue concerns the nature of language forms that ostensibly compose plans for utterances. Because of their role in making linguistic messages public, a straightforward idea is that language forms are themselves the public behaviors in which members of a language community engage when talking. By most accounts, however, the relation of phonological segments to actions of the vocal tract is not one of identity. Rather, phonological segments are mental categories with featural attributes. Another issue concerns what, at various levels of description, the talker aims to achieve. This article focuses on speech production, and considers language forms and plans for speaking, along with speakers' goals as acoustic targets or vocal tract gestures, the DIVA theory of speech production, the task dynamic model, coarticulation, and prosody.
