temporal envelope
Recently Published Documents


TOTAL DOCUMENTS

166
(FIVE YEARS 33)

H-INDEX

25
(FIVE YEARS 3)

2022 ◽  
Author(s):  
Sarah Anne Sauvé ◽  
Jeremy Marozeau ◽  
Benjamin Zendel

Auditory stream segregation, or separating sounds into their respective sources, and tracking them over time is a fundamental auditory ability. Previous research has separately explored the impacts of aging and musicianship on the ability to separate and follow auditory streams. The current study evaluated the simultaneous effects of age and musicianship on auditory streaming induced by three physical features: intensity, spectral envelope and temporal envelope. In the first study, older and younger musicians and non-musicians with normal hearing identified deviants in a four-note melody interleaved with distractors that were more or less similar to the melody in terms of intensity, spectral envelope and temporal envelope. In the second study, older and younger musicians and non-musicians participated in a dissimilarity rating paradigm with pairs of melodies that differed along the same three features. Results suggested that auditory streaming skills are maintained in older adults but that older adults rely on intensity more than younger adults while musicianship is associated with increased sensitivity to spectral and temporal envelope, acoustic features that are typically less effective for stream segregation, particularly in older adults.


Loquens ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. e074
Author(s):  
Lei He ◽  
Yu Zhang

Lower modulation rates in the temporal envelope (ENV) of the acoustic signal are believed to be the rhythmic backbone in speech, facilitating speech comprehension in terms of neuronal entrainments at δ- and θ-rates (these rates are comparable to the foot- and syllable-rates phonetically). The jaw plays the role of a carrier articulator regulating mouth opening in a quasi-cyclical way, which correspond to the low-frequency modulations as a physical consequence. This paper describes a method to examine the joint roles of jaw oscillation and ENV in realizing speech rhythm using spectral coherence. Relative powers in the frequency bands corresponding to the δ-and θ-oscillations in the coherence (respectively notated as %δ and %θ) were quantified as one possible way of revealing the amount of concomitant foot- and syllable-level rhythmicities carried by both acoustic and articulatory domains. Two English corpora (mngu0 and MOCHA-TIMIT) were used for the proof of concept. %δ and %θ were regressed on utterance duration for an initial analysis. Results showed that the degrees of foot- and syllable-sized rhythmicities are different and are contingent upon the utterance length.


2021 ◽  
Vol 15 ◽  
Author(s):  
Zhong Zheng ◽  
Keyi Li ◽  
Gang Feng ◽  
Yang Guo ◽  
Yinan Li ◽  
...  

Objectives: Mandarin-speaking users of cochlear implants (CI) perform poorer than their English counterpart. This may be because present CI speech coding schemes are largely based on English. This study aims to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (including vowel, and consonant) and lexical tone recognition to provide information for speech coding schemes specific to Mandarin.Design: Eleven normal hearing subjects were studied using acoustic temporal E cues that were extracted from 30 continuous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions and their relative weights calculated using the least-square approach.Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43–84.82%, 76.27–95.24%, and 96.58%, respectively; for consonant recognition 35.49–63.77%, 67.75–78.87%, and 87.87%; for lexical tone recognition 60.80–97.15%, 73.16–96.87%, and 96.73%. For frequency region 1 to frequency region 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition 0.10, 0.16, 0.18, 0.23, and 0.33; in lexical tone recognition 0.38, 0.18, 0.14, 0.16, and 0.14.Conclusion: Regions that contributed most for vowel recognition was Region 2 (502–1,022 Hz) that contains first formant (F1) information; Region 5 (3,856–7,562 Hz) contributed most to consonant recognition; Region 1 (80–502 Hz) that contains fundamental frequency (F0) information contributed most to lexical tone recognition.


2021 ◽  
Vol 150 (4) ◽  
pp. A191-A191
Author(s):  
Yeonggwang Park ◽  
Erol J. Ozmeral ◽  
Supraja Anand ◽  
Rahul Shrivastav ◽  
David A. Eddins

2021 ◽  
Vol 12 ◽  
Author(s):  
Xin Wang ◽  
Yujia Wei ◽  
Lena Heng ◽  
Stephen McAdams

Timbre is one of the psychophysical cues that has a great impact on affect perception, although, it has not been the subject of much cross-cultural research. Our aim is to investigate the influence of timbre on the perception of affect conveyed by Western and Chinese classical music using a cross-cultural approach. Four listener groups (Western musicians, Western nonmusicians, Chinese musicians, and Chinese nonmusicians; 40 per group) were presented with 48 musical excerpts, which included two musical excerpts (one piece of Chinese and one piece of Western classical music) per affect quadrant from the valence-arousal space, representing angry, happy, peaceful, and sad emotions and played with six different instruments (erhu, dizi, pipa, violin, flute, and guitar). Participants reported ratings of valence, tension arousal, energy arousal, preference, and familiarity on continuous scales ranging from 1 to 9. ANOVA reveals that participants’ cultural backgrounds have a greater impact on affect perception than their musical backgrounds, and musicians more clearly distinguish between a perceived measure (valence) and a felt measure (preference) than do nonmusicians. We applied linear partial least squares regression to explore the relation between affect perception and acoustic features. The results show that the important acoustic features for valence and energy arousal are similar, which are related mostly to spectral variation, the shape of the temporal envelope, and the dynamic range. The important acoustic features for tension arousal describe the shape of the spectral envelope, noisiness, and the shape of the temporal envelope. The explanation for the similarity of perceived affect ratings between instruments is the similar acoustic features that were caused by the physical characteristics of specific instruments and performing techniques.


2021 ◽  
Vol 64 (9) ◽  
pp. 3697-3706
Author(s):  
Dhatri S. Devaraju ◽  
Amy Kemp ◽  
David A. Eddins ◽  
Rahul Shrivastav ◽  
Bharath Chandrasekaran ◽  
...  

Purpose Listeners shift their listening strategies between lower level acoustic information and higher level semantic information to prioritize maximum speech intelligibility in challenging listening conditions. Although increasing task demands via acoustic degradation modulates lexical-semantic processing, the neural mechanisms underlying different listening strategies are unclear. The current study examined the extent to which encoding of lower level acoustic cues is modulated by task demand and associations with lexical-semantic processes. Method Electroencephalography was acquired while participants listened to sentences in the presence of four-talker babble that contained either higher or lower probability final words. Task difficulty was modulated by time available to process responses. Cortical tracking of speech—neural correlates of acoustic temporal envelope processing—were estimated using temporal response functions. Results Task difficulty did not affect cortical tracking of temporal envelope of speech under challenging listening conditions. Neural indices of lexical-semantic processing (N400 amplitudes) were larger with increased task difficulty. No correlations were observed between the cortical tracking of temporal envelope of speech and lexical-semantic processes, even after controlling for the effect of individualized signal-to-noise ratios. Conclusions Cortical tracking of the temporal envelope of speech and semantic processing are differentially influenced by task difficulty. While increased task demands modulated higher level semantic processing, cortical tracking of the temporal envelope of speech may be influenced by task difficulty primarily when the demand is manipulated in terms of acoustic properties of the stimulus, consistent with an emerging perspective in speech perception.


2021 ◽  
Vol 15 ◽  
Author(s):  
Zhong Zheng ◽  
Keyi Li ◽  
Yang Guo ◽  
Xinrong Wang ◽  
Lili Xiao ◽  
...  

ObjectivesAcoustic temporal envelope (E) cues containing speech information are distributed across all frequency spectra. To provide a theoretical basis for the signal coding of hearing devices, we examined the relative weight of E cues in different frequency regions for Mandarin disyllabic word recognition in quiet.DesignE cues were extracted from 30 continuous frequency bands within the range of 80 to 7,562 Hz using Hilbert decomposition and assigned to five frequency regions from low to high. Disyllabic word recognition of 20 normal-hearing participants were obtained using the E cues available in two, three, or four frequency regions. The relative weights of the five frequency regions were calculated using least-squares approach.ResultsParticipants correctly identified 3.13–38.13%, 27.50–83.13%, or 75.00–93.13% of words when presented with two, three, or four frequency regions, respectively. Increasing the number of frequency region combinations improved recognition scores and decreased the magnitude of the differences in scores between combinations. This suggested a synergistic effect among E cues from different frequency regions. The mean weights of E cues of frequency regions 1–5 were 0.31, 0.19, 0.26, 0.22, and 0.02, respectively.ConclusionFor Mandarin disyllabic words, E cues of frequency regions 1 (80–502 Hz) and 3 (1,022–1,913 Hz) contributed more to word recognition than other regions, while frequency region 5 (3,856–7,562) contributed little.


Sign in / Sign up

Export Citation Format

Share Document