STRAIGHT-TEMPO: a universal tool to manipulate linguistic and para-linguistic speech information

Author(s):  
H. Kawahara


1997 ◽
Vol 40 (2) ◽  
pp. 432-443 ◽  
Author(s):  
Karen S. Helfer

Research has shown that speaking in a deliberately clear manner can improve the accuracy of auditory speech recognition. Allowing listeners access to visual speech cues also enhances speech understanding. Whether the nature of the information provided by speaking clearly and by using visual speech cues is redundant has not been determined. This study examined how speaking mode (clear vs. conversational) and presentation mode (auditory vs. auditory-visual) influenced the perception of words within nonsense sentences. In Experiment 1, 30 young listeners with normal hearing responded to videotaped stimuli presented audiovisually in the presence of background noise at one of three signal-to-noise ratios. In Experiment 2, 9 participants returned for an additional assessment using auditory-only presentation. Results of these experiments showed significant effects of speaking mode (clear speech was easier to understand than conversational speech) and presentation mode (auditory-visual presentation led to better performance than auditory-only presentation). The benefit of clear speech was greater for words occurring in the middle of sentences than for words at either the beginning or end of sentences for both auditory-only and auditory-visual presentation, whereas the greatest benefit from supplying visual cues was for words at the end of sentences spoken both clearly and conversationally. The total benefit from speaking clearly and supplying visual cues was equal to the sum of the two individual effects. Overall, the results suggest that speaking clearly and providing visual speech information provide complementary (rather than redundant) information.


2021 ◽  
Vol 64 (10) ◽  
pp. 4014-4029
Author(s):  
Kathy R. Vander Werff ◽  
Christopher E. Niemczak ◽  
Kenneth Morse

Purpose: Background noise has been categorized as energetic masking, due to spectrotemporal overlap of the target and masker at the auditory periphery, or informational masking, due to cognitive-level interference from relevant content such as speech. The effects of masking on cortical and sensory auditory processing can be studied objectively with the cortical auditory evoked potential (CAEP). However, whether effects on neural response morphology are due to energetic spectrotemporal differences or to informational content is not fully understood. The current multi-experiment series was designed to assess the effects of speech versus nonspeech maskers on the neural encoding of speech information in the central auditory system, specifically in terms of the effects of speech babble maskers varying in talker number. Method: CAEPs were recorded from normal-hearing young adults in response to speech syllables in the presence of energetic maskers (white or speech-shaped noise) and varying amounts of informational masking (speech babble maskers). The primary manipulation of informational masking was the number of talkers in the speech babble, and CAEP results were compared with those for nonspeech maskers with different temporal and spectral characteristics. Results: Even when the nonspeech noise maskers were spectrally shaped and temporally modulated to match the speech babble maskers, notable changes in the typical morphology of the CAEP in response to speech stimuli were identified in the presence of both the primarily energetic maskers and the speech babble maskers with varying numbers of talkers. Conclusions: While differences in CAEP outcomes did not reach significance by number of talkers, neural components were significantly affected by speech babble maskers compared with nonspeech maskers. These results suggest an informational masking influence on the neural encoding of speech information at the sensory cortical level of auditory processing, even without active participation on the part of the listener.
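The nonspeech comparison maskers described above (noise spectrally shaped and temporally modulated to match speech babble) follow a standard stimulus-construction idea, sketched minimally below. This is not the authors' stimulus code; the 16 Hz envelope cutoff, filter order, and RMS matching are illustrative assumptions.

```python
# Minimal sketch: build a nonspeech masker that is spectrally shaped and
# temporally modulated to match a speech-babble reference signal.
# Parameters are illustrative assumptions, not the authors' values.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def matched_nonspeech_masker(babble: np.ndarray, fs: int) -> np.ndarray:
    """White noise given the babble's long-term spectrum and slow envelope."""
    n = len(babble)
    # 1) Spectral shaping: keep the babble's magnitude spectrum, randomize phase.
    noise_phase = np.angle(np.fft.rfft(np.random.randn(n)))
    shaped = np.fft.irfft(np.abs(np.fft.rfft(babble)) * np.exp(1j * noise_phase), n)
    # 2) Temporal modulation: impose the babble's low-pass amplitude envelope.
    env = np.abs(hilbert(babble))
    b, a = butter(2, 16 / (fs / 2))   # ~16 Hz modulation cutoff (assumed)
    env = filtfilt(b, a, env)
    shaped *= env / (np.max(np.abs(env)) + 1e-12)
    # 3) Match overall level (RMS) to the babble.
    shaped *= np.sqrt(np.mean(babble ** 2) / (np.mean(shaped ** 2) + 1e-12))
    return shaped
```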


2021 ◽  
Vol Publish Ahead of Print ◽  
Author(s):  
Sigrid Polspoel ◽  
Sophia E. Kramer ◽  
Bas van Dijk ◽  
Cas Smits

Author(s):  
Weigao Su ◽  
Daibo Liu ◽  
Taiyuan Zhang ◽  
Hongbo Jiang

Motion sensors in modern smartphones have been exploited for audio eavesdropping in loudspeaker mode because of their sensitivity to vibrations. In this paper, we go one step further and explore the feasibility of using the built-in accelerometer to eavesdrop on the telephone conversation of a caller/callee who holds the phone against the cheek and ear, and we design our attack, Vibphone. The inspiration behind Vibphone is that speech-induced vibrations (SIV) are transmitted through the physical contact between phone and cheek to the accelerometer, carrying traces of the voice content. To this end, Vibphone faces three main challenges: i) accurately detecting SIV signals amid miscellaneous disturbances; ii) combating the impact of device diversity so that the attack works across a variety of scenarios; and iii) enhancing the feature-agnostic recognition model to generalize to newly issued devices and reduce training overhead. To address these challenges, we first conduct an in-depth investigation of SIV features to determine the root cause of device-diversity effects and identify a set of critical features that are highly relevant to the voice content retained in SIV signals and independent of specific devices. On top of these observations, we propose a combined method that integrates the extracted critical features with a deep neural network to recognize speech information from the spectrogram representation of acceleration signals. We implement the attack using commodity smartphones, and the results show it is highly effective. Our work brings to light a fundamental design vulnerability in the vast majority of currently deployed smartphones, which may put people's speech privacy at risk during phone calls. We also propose a practical and effective defense solution: we validate that audio eavesdropping can be prevented by randomly varying the sampling rate.
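The core attack pipeline summarized above (accelerometer trace, then spectrogram, then neural-network recognizer) can be illustrated with a minimal sketch. This is not the Vibphone implementation; the 500 Hz sampling rate, the spectrogram window sizes, and the toy CNN architecture are illustrative assumptions.

```python
# Minimal sketch of the general pipeline: accelerometer samples ->
# log-magnitude spectrogram -> small CNN classifier over word classes.
# All parameters here are illustrative assumptions, not Vibphone's values.
import numpy as np
from scipy.signal import spectrogram
import torch
import torch.nn as nn

FS = 500  # assumed accelerometer sampling rate in Hz

def accel_to_spectrogram(z_axis: np.ndarray) -> torch.Tensor:
    """Convert a 1-D acceleration trace into a log-magnitude spectrogram."""
    _, _, sxx = spectrogram(z_axis, fs=FS, nperseg=128, noverlap=96)
    log_sxx = np.log1p(sxx)  # compress dynamic range
    return torch.tensor(log_sxx, dtype=torch.float32).unsqueeze(0)  # (1, F, T)

class SpeechFromVibration(nn.Module):
    """Toy CNN mapping a spectrogram to word-class logits."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

# Example: classify one second of (here, synthetic) accelerometer data.
trace = np.random.randn(FS)                      # placeholder for a real SIV trace
spec = accel_to_spectrogram(trace).unsqueeze(0)  # add batch dimension
logits = SpeechFromVibration()(spec)
```

The adaptive pooling keeps the toy classifier tolerant of variable-length traces, and the log-magnitude spectrogram compresses the wide dynamic range of speech-induced vibrations; a defense that randomly varies the sampling rate, as proposed in the abstract, would presumably perturb exactly this kind of fixed-rate spectrogram analysis.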


Author(s):  
Doğu Erdener

Speech perception has long been taken for granted as an auditory-only process. However, it is now firmly established that speech perception is an auditory-visual process in which visual speech information, in the form of lip and mouth movements, is taken into account. Traditionally, foreign language (L2) instructional methods and materials have been auditory-based. This chapter presents a general framework of evidence that visual speech information can facilitate L2 instruction. The author argues that this knowledge will help bridge the gap between psycholinguistics and L2 instruction as an applied field. The chapter also describes how orthography can be used in L2 instruction. While learners from a transparent L1 orthographic background can decipher the phonology of orthographically transparent L2s, overriding the visual speech information, that is not the case for learners from orthographically opaque L1 backgrounds.

