Sequences of Intonation Units form a ~1 Hz rhythm

2019 ◽  
Author(s):  
Maya Inbar ◽  
Eitan Grossman ◽  
Ayelet N. Landau

Abstract Studies of speech processing investigate the relationship between temporal structure in speech stimuli and neural activity. Despite clear evidence that the brain tracks speech at low frequencies (~1 Hz), it is not well understood what linguistic information gives rise to this rhythm. Here, we harness linguistic theory to draw attention to Intonation Units (IUs), a fundamental prosodic unit of human language, and characterize their temporal structure as captured in the speech envelope, an acoustic representation relevant to the neural processing of speech. IUs are defined by a specific pattern of syllable delivery, together with resets in pitch and articulatory force. Linguistic studies of spontaneous speech indicate that this prosodic segmentation paces new information in language use across diverse languages. Therefore, IUs provide a universal structural cue for the cognitive dynamics of speech production and comprehension. We study the relation between IUs and periodicities in the speech envelope, applying methods from investigations of neural synchronization. Our sample includes recordings from everyday speech contexts of over 100 speakers and six languages. We find that sequences of IUs form a consistent low-frequency rhythm and constitute a significant periodic cue within the speech envelope. Our findings allow us to predict that IUs are utilized by the neural system when tracking speech, and the methods we introduce facilitate testing this prediction given physiological data.

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Maya Inbar ◽  
Eitan Grossman ◽  
Ayelet N. Landau

Abstract Studies of speech processing investigate the relationship between temporal structure in speech stimuli and neural activity. Despite clear evidence that the brain tracks speech at low frequencies (~1 Hz), it is not well understood what linguistic information gives rise to this rhythm. In this study, we harness linguistic theory to draw attention to Intonation Units (IUs), a fundamental prosodic unit of human language, and characterize their temporal structure as captured in the speech envelope, an acoustic representation relevant to the neural processing of speech. IUs are defined by a specific pattern of syllable delivery, together with resets in pitch and articulatory force. Linguistic studies of spontaneous speech indicate that this prosodic segmentation paces new information in language use across diverse languages. Therefore, IUs provide a universal structural cue for the cognitive dynamics of speech production and comprehension. We study the relation between IUs and periodicities in the speech envelope, applying methods from investigations of neural synchronization. Our sample includes recordings from everyday speech contexts of over 100 speakers and six languages. We find that sequences of IUs form a consistent low-frequency rhythm and constitute a significant periodic cue within the speech envelope. Our findings allow us to predict that IUs are utilized by the neural system when tracking speech. The methods we introduce here facilitate testing this prediction in the future (i.e., with physiological data).
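
The core acoustic analysis described above, extracting the speech envelope and testing it for low-frequency periodicity, can be sketched roughly as follows. This is a minimal illustration in Python (numpy/scipy), not the authors' pipeline; the file name, sampling rates, filter settings, and the use of a simple Welch spectrum are assumptions for the sketch.

```python
# Minimal sketch: speech-envelope extraction and low-frequency spectrum.
# Not the authors' pipeline; file name and parameters are illustrative.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, filtfilt, welch, resample_poly

fs, audio = wavfile.read("conversation.wav")       # hypothetical recording
audio = audio.astype(float)
if audio.ndim > 1:                                  # collapse to mono
    audio = audio.mean(axis=1)

# Broadband amplitude envelope via the Hilbert transform
envelope = np.abs(hilbert(audio))

# Low-pass the envelope (e.g. 10 Hz) and downsample to 100 Hz
b, a = butter(4, 10 / (fs / 2), btype="low")
envelope = filtfilt(b, a, envelope)
env_fs = 100
envelope = resample_poly(envelope, env_fs, fs)

# Power spectrum of the envelope: a peak near ~1 Hz would be consistent
# with the IU-paced rhythm reported in the abstract above.
freqs, psd = welch(envelope - envelope.mean(), fs=env_fs, nperseg=env_fs * 30)
low = freqs < 5
print(f"Envelope spectral peak at {freqs[low][np.argmax(psd[low])]:.2f} Hz")
```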


2019 ◽  
Author(s):  
Shyanthony R. Synigal ◽  
Emily S. Teoh ◽  
Edmund C. Lalor

ABSTRACT The human auditory system is adept at extracting information from speech in both single-speaker and multi-speaker situations. This involves neural processing at the rapid temporal scales seen in natural speech. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) signatures of such processing have shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such rapid processing is even more strongly reflected in the power of neural activity at high frequencies (around 70-150 Hz; known as high gamma). The aim of this study was to determine if high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Furthermore, we aimed to assess whether any such information might be complementary to that reflected in well-established low-frequency EEG indices of speech processing. We used linear regression to investigate speech envelope and attention decoding in EEG at low frequencies, in high gamma power, and in both signals combined. While low-frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in a minority of subjects. This same pattern was true for attention decoding using a separate group of subjects who undertook a cocktail-party attention experiment. For the subjects who showed speech tracking in high gamma power, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Overall, this indicates that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects, and that combining it with low-frequency EEG can improve the mapping between natural speech and the resulting neural responses.
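
A backward (stimulus-reconstruction) model of the kind described, linear regression from multichannel EEG to the speech envelope, can be sketched as follows. This is a generic time-lagged ridge-regression decoder, not the authors' code; the array shapes, lag range, and regularization value are assumptions, and low-frequency EEG and high-gamma power channels are simply stacked as extra features.

```python
# Sketch: reconstruct the speech envelope from EEG with a time-lagged
# linear (ridge) decoder. Low-frequency EEG and high-gamma power channels
# can be concatenated along the channel axis. Shapes/parameters assumed.
import numpy as np

def lagged_design(eeg, max_lag):
    """Stack time-lagged copies of each channel (samples x channels)."""
    n, c = eeg.shape
    X = np.zeros((n, c * (max_lag + 1)))
    for lag in range(max_lag + 1):
        X[lag:, lag * c:(lag + 1) * c] = eeg[:n - lag]
    return X

def train_decoder(eeg, envelope, max_lag=25, lam=1e3):
    X = lagged_design(eeg, max_lag)
    XtX = X.T @ X + lam * np.eye(X.shape[1])        # ridge regularization
    return np.linalg.solve(XtX, X.T @ envelope)

def reconstruct(eeg, weights, max_lag=25):
    return lagged_design(eeg, max_lag) @ weights

# Toy example with random placeholders (128 Hz, 64 low-freq + 64 gamma chans)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((128 * 60, 128))
env = rng.standard_normal(128 * 60)
w = train_decoder(eeg, env)
r = np.corrcoef(reconstruct(eeg, w), env)[0, 1]     # reconstruction accuracy
print(f"reconstruction r = {r:.3f}")
```

In practice the decoder would be trained and evaluated with cross-validation on held-out data, and the correlation between reconstructed and actual envelopes serves as the speech-tracking measure.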


2016 ◽  
Vol 116 (5) ◽  
pp. 2346-2355 ◽  
Author(s):  
Alessandro Presacco ◽  
Jonathan Z. Simon ◽  
Samira Anderson

Humans have a remarkable ability to track and understand speech in unfavorable conditions, such as in background noise, but speech understanding in noise does deteriorate with age. Results from several studies have shown that in younger adults, low-frequency auditory cortical activity reliably synchronizes to the speech envelope, even when the background noise is considerably louder than the speech signal. However, cortical speech processing may be limited by age-related decreases in the precision of neural synchronization in the midbrain. To better understand the neural mechanisms contributing to impaired speech perception in older adults, we investigated how aging affects midbrain and cortical encoding of speech when presented in quiet and in the presence of a single competing talker. Our results suggest that central auditory temporal processing deficits in older adults manifest in both the midbrain and the cortex. Specifically, midbrain frequency-following responses to a speech syllable are more degraded in noise in older adults than in younger adults. This suggests a failure of the midbrain auditory mechanisms needed to compensate for the presence of a competing talker. Similarly, in cortical responses, older adults show larger reductions than younger adults in their ability to encode the speech envelope when a competing talker is added. Interestingly, older adults showed an exaggerated cortical representation of speech in both quiet and noise conditions, suggesting a possible imbalance between inhibitory and excitatory processes, or diminished network connectivity that may impair their ability to encode speech efficiently.
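
One common way to quantify the midbrain frequency-following response mentioned above is to average many presentations of the syllable and measure the spectral amplitude at the stimulus fundamental frequency; degradation in noise then appears as a drop in that amplitude. The sketch below is only illustrative, with assumed trial structure, sampling rate, and F0.

```python
# Sketch: FFR strength as spectral amplitude near the stimulus F0.
# Trial matrix, sampling rate, and F0 are placeholder assumptions.
import numpy as np

def ffr_f0_amplitude(trials, fs, f0, bw=5.0):
    """trials: (n_trials, n_samples) subcortical responses to one syllable."""
    avg = trials.mean(axis=0)                       # trial-averaged response
    spec = np.abs(np.fft.rfft(avg)) / len(avg)
    freqs = np.fft.rfftfreq(len(avg), d=1 / fs)
    band = (freqs > f0 - bw) & (freqs < f0 + bw)
    return spec[band].max()                         # peak amplitude near F0

fs, f0 = 16000, 100                                 # e.g. a /da/ with 100 Hz F0
rng = np.random.default_rng(1)
t = np.arange(int(0.17 * fs)) / fs                  # 170 ms syllable
trials = np.sin(2 * np.pi * f0 * t) + 0.5 * rng.standard_normal((300, t.size))
print(f"FFR amplitude at F0: {ffr_f0_amplitude(trials, fs, f0):.3f}")
```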


2018 ◽  
Author(s):  
Yulia Oganian ◽  
Edward F. Chang

Abstract Listeners use the slow amplitude modulations of speech, known as the envelope, to segment continuous speech into syllables. However, the underlying neural computations are heavily debated. We used high-density intracranial cortical recordings while participants listened to natural and synthesized control speech stimuli to determine how the envelope is represented in the human superior temporal gyrus (STG), a critical auditory brain area for speech processing. We found that the STG does not encode the instantaneous, moment-by-moment amplitude envelope of speech. Rather, a zone of the middle STG detects discrete acoustic onset edges, defined by local maxima in the rate of change of the envelope. Acoustic analysis demonstrated that acoustic onset edges reliably cue the information-rich transition between the consonant onset and vowel nucleus of syllables. Furthermore, the steepness of the acoustic edge cued whether a syllable was stressed. Synthesized amplitude-modulated tone stimuli showed that steeper edges elicited monotonically greater cortical responses, confirming the encoding of relative but not absolute amplitude. Overall, encoding of the timing and magnitude of acoustic onset edges in STG underlies our perception of the syllabic rhythm of speech.
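
The acoustic onset edges described here are defined as local maxima in the rate of change of the envelope. A rough detection sketch in Python follows; it is not the authors' code, and the smoothing, prominence threshold, and minimum peak spacing are assumptions.

```python
# Sketch: detect acoustic onset edges as local peaks in the positive
# rate-of-change of the speech envelope. Parameters are illustrative.
import numpy as np
from scipy.signal import find_peaks

def onset_edges(envelope, env_fs, min_prominence=0.05):
    """Return sample indices and steepness of envelope-rise peaks."""
    rate = np.gradient(envelope) * env_fs           # d(envelope)/dt
    rate = np.clip(rate, 0, None)                   # keep rising portions only
    peaks, _ = find_peaks(rate, prominence=min_prominence,
                          distance=int(0.05 * env_fs))
    return peaks, rate[peaks]                       # steeper peaks ~ stressed

# Toy example: a two-"syllable" envelope sampled at 100 Hz
env_fs = 100
t = np.arange(0, 1, 1 / env_fs)
toy_env = np.exp(-((t - 0.25) ** 2) / 0.005) + 1.5 * np.exp(-((t - 0.7) ** 2) / 0.005)
idx, steepness = onset_edges(toy_env, env_fs)
print(idx / env_fs, steepness)
```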


2021 ◽  
Author(s):  
Nathaniel J Zuk ◽  
Jeremy W Murphy ◽  
Richard B Reilly ◽  
Edmund C Lalor

Abstract The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the processing of higher-order features and one’s cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, speech envelope tracking at low frequencies, below 1 Hz, was uniquely associated with increased weighting over parietal channels. Our results highlight the importance of low-frequency speech tracking and its origin from speech-specific processing in the brain.


2021 ◽  
Vol 17 (9) ◽  
pp. e1009358 ◽  
Author(s):  
Nathaniel J. Zuk ◽  
Jeremy W. Murphy ◽  
Richard B. Reilly ◽  
Edmund C. Lalor

The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one’s cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
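
The frequency-constrained reconstruction described above can be approximated by band-pass filtering both the stimulus envelope and the EEG into narrow modulation bands before fitting and evaluating a decoder within each band. The sketch below shows only that band-limiting step; the band edges, filter design, and sampling rate are assumptions, and the decoder itself could be the ridge model sketched earlier on this page.

```python
# Sketch: evaluate envelope reconstruction within narrow modulation bands
# by band-pass filtering envelope and EEG before decoding. Band edges,
# filter order, and sampling rate are assumed, not taken from the paper.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandlimit(x, fs, lo, hi, order=3):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=0)

fs = 128
bands = [(0.3, 1.0), (1.0, 4.0), (4.0, 8.0)]        # example modulation bands
rng = np.random.default_rng(2)
eeg = rng.standard_normal((fs * 120, 64))           # placeholder EEG
env = rng.standard_normal(fs * 120)                 # placeholder envelope

for lo, hi in bands:
    eeg_b = bandlimit(eeg, fs, lo, hi)
    env_b = bandlimit(env, fs, lo, hi)
    # ...fit a decoder (e.g. the ridge sketch above) on eeg_b -> env_b and
    # report cross-validated reconstruction accuracy per band.
    print(f"band {lo}-{hi} Hz: filtered shapes {eeg_b.shape}, {env_b.shape}")
```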


2021 ◽  
Vol 10 (14) ◽  
pp. 3078 ◽  
Author(s):  
Sara Akbarzadeh ◽  
Sungmin Lee ◽  
Chin-Tuan Tan

In multi-speaker environments, cochlear implant (CI) users may attend to a target sound source in a different manner from normal hearing (NH) individuals during a conversation. This study investigated how conversational sound levels affect the mechanisms of selective auditory attention adopted by CI and NH listeners, and how this influences their daily conversation. Nine CI users (five bilateral, three unilateral, and one bimodal) and eight NH listeners participated in this study. The behavioral speech recognition scores were collected using a matrix sentences test, and neural tracking of the speech envelope was recorded using electroencephalography (EEG). Speech stimuli were presented at three different levels (75, 65, and 55 dB SPL) in the presence of two maskers from three spatially separated speakers. Different combinations of assisted/impaired hearing modes were evaluated for CI users, and the outcomes were analyzed in three categories: electric hearing only, acoustic hearing only, and electric + acoustic hearing. Our results showed that increasing the conversational sound level degraded selective auditory attention in electric hearing. On the other hand, increasing the sound level improved selective auditory attention in the acoustic hearing group. In the NH listeners, however, increasing the sound level did not cause a significant change in auditory attention. Our results imply that the effect of sound level on selective auditory attention varies depending on the hearing mode, and that loudness control is necessary for CI users to attend to conversation with ease.
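
Selective-attention decoding of the kind used here typically reconstructs the envelope from EEG and then asks which of the competing streams the reconstruction correlates with best. Below is a minimal, hypothetical sketch of that decision step only; the decoder weights would come from a model such as the ridge sketch earlier on this page, and all inputs here are placeholders.

```python
# Sketch: classify the attended talker by correlating a reconstructed
# envelope with each competing stream's envelope. Inputs are placeholders.
import numpy as np

def decode_attention(reconstructed_env, candidate_envs):
    """candidate_envs: dict name -> envelope array aligned with the EEG."""
    scores = {name: np.corrcoef(reconstructed_env, env)[0, 1]
              for name, env in candidate_envs.items()}
    attended = max(scores, key=scores.get)
    return attended, scores

rng = np.random.default_rng(3)
target = rng.standard_normal(6000)
masker = rng.standard_normal(6000)
recon = target + 0.8 * rng.standard_normal(6000)    # toy "reconstruction"
label, scores = decode_attention(recon, {"target": target, "masker": masker})
print(label, {k: round(v, 3) for k, v in scores.items()})
```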


2017 ◽  
Vol 284 (1864) ◽  
pp. 20171670 ◽  
Author(s):  
Molly C. Womack ◽  
Jakob Christensen-Dalsgaard ◽  
Luis A. Coloma ◽  
Juan C. Chaparro ◽  
Kim L. Hoke

Sensory losses or reductions are frequently attributed to relaxed selection. However, anuran species have lost tympanic middle ears many times, despite anurans' use of acoustic communication and the benefit of middle ears for hearing airborne sound. Here we determine whether pre-existing alternative sensory pathways enable anurans lacking tympanic middle ears (termed earless anurans) to hear airborne sound as well as eared species or to better sense vibrations in the environment. We used auditory brainstem recordings to compare hearing and vibrational sensitivity among 10 species (six eared, four earless) within the Neotropical true toad family (Bufonidae). We found that species lacking middle ears are less sensitive to high-frequency sounds; however, low-frequency hearing and vibrational sensitivity are equivalent between eared and earless species. Furthermore, extratympanic hearing sensitivity varies among earless species, highlighting potential species differences in extratympanic hearing mechanisms. We argue that ancestral bufonids may have had sufficient extratympanic hearing and vibrational sensitivity such that earless lineages tolerated the loss of high-frequency hearing sensitivity by adopting species-specific behavioural strategies to detect conspecifics, predators and prey.


Geophysics ◽  
1992 ◽  
Vol 57 (6) ◽  
pp. 854-859 ◽  
Author(s):  
Xiao Ming Tang

A new technique for measuring elastic wave attenuation in the frequency range of 10–150 kHz consists of measuring low-frequency waveforms using two cylindrical bars of the same material but of different lengths. The attenuation is obtained in two steps. In the first, the waveform measured within the shorter bar is propagated to the length of the longer bar, and the distortion of the waveform due to the dispersion effect of the cylindrical waveguide is compensated. The second step is the inversion for the attenuation or Q of the bar material by minimizing the difference between the waveform propagated from the shorter bar and the waveform measured within the longer bar. The waveform inversion is performed in the time domain, and the waveforms can be appropriately truncated to avoid multiple reflections due to the finite size of the (shorter) sample, allowing attenuation to be measured at long wavelengths or low frequencies. The frequency range in which this technique operates fills the gap between resonant bar measurements (∼10 kHz) and ultrasonic measurements (∼100–1000 kHz). Using this technique, attenuation values were measured in PVC (a highly attenuative material) and in Sierra White granite in the frequency range of 40–140 kHz. The obtained attenuation values for the two materials are found to be reliable and consistent.
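
The two-step procedure described, propagating the short-bar waveform over the extra path length and then inverting for Q by minimizing the waveform misfit, can be sketched with a simple constant-Q frequency-domain propagator and a grid search over Q. This is an illustrative reimplementation under assumed parameters (bar lengths, velocity, sampling), not the published algorithm, and the waveguide dispersion correction used in the paper is omitted.

```python
# Sketch: time-domain Q inversion by propagating the short-bar waveform
# over the extra path length with a constant-Q attenuation operator and
# grid-searching Q to minimize misfit with the long-bar waveform.
# Velocity, lengths, and sampling are assumed; dispersion correction omitted.
import numpy as np

def propagate(waveform, dt, dist, velocity, Q):
    spec = np.fft.rfft(waveform)
    f = np.fft.rfftfreq(len(waveform), d=dt)
    w = 2 * np.pi * f
    # amplitude decay exp(-w*dist/(2*v*Q)) plus propagation phase delay
    op = np.exp(-w * dist / (2 * velocity * Q)) * np.exp(-1j * w * dist / velocity)
    return np.fft.irfft(spec * op, n=len(waveform))

def invert_Q(short_wave, long_wave, dt, extra_len, velocity,
             Q_grid=np.arange(5, 200, 1.0)):
    misfits = [np.sum((propagate(short_wave, dt, extra_len, velocity, Q)
                       - long_wave) ** 2) for Q in Q_grid]
    return Q_grid[int(np.argmin(misfits))]

# Toy example: synthesize a "long bar" record from the short one with Q = 40
dt, extra_len, velocity = 1e-6, 0.3, 2300.0          # 0.3 m extra path, PVC-like bar
t = np.arange(1024) * dt
short = np.exp(-((t - 2e-4) ** 2) / (2 * (2e-5) ** 2)) * np.sin(2 * np.pi * 8e4 * t)
long_obs = propagate(short, dt, extra_len, velocity, Q=40)
print("Estimated Q:", invert_Q(short, long_obs, dt, extra_len, velocity))
```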


2019 ◽  
Vol 219 (2) ◽  
pp. 975-994 ◽  
Author(s):  
Gabriel Gribler ◽  
T Dylan Mikesell

SUMMARY Estimating shear wave velocity with depth from Rayleigh-wave dispersion data is limited by the accuracy of fundamental and higher mode identification and characterization. In many cases, the fundamental mode signal propagates exclusively in retrograde motion, while higher modes propagate in prograde motion. It has previously been shown that differences in particle motion can be identified with multicomponent recordings and used to separate prograde from retrograde signals. Here we explore the domain of existence of prograde motion of the fundamental mode, arising from a combination of two conditions: (1) a shallow, high-impedance contrast and (2) a high Poisson ratio material. We present solutions to isolate fundamental and higher mode signals using multicomponent recordings. Previously, a time-domain polarity mute was used with limited success due to the overlap in the time domain of fundamental and higher mode signals at low frequencies. We present several new approaches to overcome this low-frequency obstacle, all of which utilize the different particle motions of retrograde and prograde signals. First, the Hilbert transform is used to phase-shift one component by 90° prior to summation or subtraction of the other component. This enhances either retrograde or prograde motion and can increase the mode amplitude. Second, we present a new time–frequency domain polarity mute to separate retrograde and prograde signals. We demonstrate these methods with synthetic and field data to highlight the improvements to dispersion images and the resulting dispersion curve extraction.
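
The Hilbert-transform step described above, phase-shifting one component by 90° before summing or subtracting the other, can be sketched as follows for a vertical/radial Rayleigh-wave recording. This is a schematic illustration under an assumed sign convention; the actual convention depends on source-receiver geometry and instrument polarity, and the toy signals are not field data.

```python
# Sketch: enhance retrograde vs. prograde Rayleigh-wave motion by phase-
# shifting the vertical component 90 degrees (Hilbert transform) and then
# summing/subtracting the radial component. Sign convention is assumed.
import numpy as np
from scipy.signal import hilbert

def separate_modes(vertical, radial):
    """Return (retrograde_enhanced, prograde_enhanced) traces."""
    v_shift = np.imag(hilbert(vertical))            # vertical shifted by 90 deg
    retrograde = v_shift + radial                   # constructive for retrograde
    prograde = v_shift - radial                     # constructive for prograde
    return retrograde, prograde

# Toy example: an elliptical arrival, labeled retrograde under this convention
fs = 500
t = np.arange(0, 2, 1 / fs)
window = np.exp(-((t - 1.0) ** 2) / 0.01)
vertical = window * np.cos(2 * np.pi * 10 * t)
radial = window * np.sin(2 * np.pi * 10 * t)        # 90 deg behind vertical
retro, pro = separate_modes(vertical, radial)
print("retrograde energy:", np.sum(retro ** 2), "prograde energy:", np.sum(pro ** 2))
```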

