Cross-dialectal vowel mapping and glide perception

2021
Vol 6 (1)
pp. 966
Author(s):  
Abram Clear ◽  
Anya Hogoboom

Formant transitions from a high front vowel to a non-high, non-front vowel mimic the formant signature of a canonical [j], resulting in the perception of an acoustic glide (Hogoboom 2020). We ask whether listeners may still perceive a glide when canonical formant transitions are absent. We investigated the mapping of an Appalachian English (AE) monophthongal [aɪ] in hiatus sequences (monophthongal [aɪ.a]). If participants map this monophthongal [aɪ] to a high front position, they might perceive a glide that is not supported by the acoustic signal, which we call a phantom glide. Ninety-six participants (45 of whom were native AE speakers) heard 30 different English words ending in [i], [ə], or monophthongal [aɪ] (e.g. tree, coma, pie) that had been suffixed with either [-a] or [-ja]. They were asked to identify which suffixed form they heard. Participants in both dialect groups sometimes perceived a glide that was truly absent from the speech stream. In these cases, participants mapped static formants in monophthongal [aɪ.a] stimuli to a diphthongal /aɪ/ with a high front endpoint, causing the perception of the necessary F1 fall and subsequent rise of a [j]. Using recent models of speech processing, which encode both social and acoustic representations of speech (e.g. Sumner et al. 2014), we discuss the mapping of monophthongal [aɪ] to a privileged diphthongal underlying form.
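
As a purely illustrative sketch (not part of the study), the Python snippet below contrasts a synthetic F1 trajectory containing the fall-and-rise that cues an acoustic [j] with a static trajectory like the monophthongal [aɪ.a] stimuli; all frequency values and the detection threshold are hypothetical.

```python
import numpy as np

# Hypothetical F1 trajectories (Hz) across a V.a hiatus, sampled every 10 ms.
# A canonical [j] shows an F1 fall followed by a rise; the monophthongal
# [aɪ.a] stimuli keep F1 roughly static, so any perceived glide is "phantom".
t = np.linspace(0, 0.3, 31)                                # 300 ms window
f1_glide = 700 - 350 * np.exp(-((t - 0.15) / 0.04) ** 2)   # dip toward ~350 Hz
f1_static = np.full_like(t, 700.0)                         # no dip at all

def has_fall_rise(f1, threshold=100.0):
    """Crude check for the F1 fall-and-rise signature of an acoustic [j]."""
    dip = f1.min()
    return (f1[0] - dip > threshold) and (f1[-1] - dip > threshold)

print(has_fall_rise(f1_glide))   # True: glide present in the signal
print(has_fall_rise(f1_static))  # False: a perceived glide is not in the signal
```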

2021
Author(s):  
Julia Schwarz ◽  
Katrina (Kechun) Li ◽  
Jasper Hong Sim ◽  
Yixin Zhang ◽  
Elizabeth Buchanan-Worster ◽  
...  

Face masks can cause speech processing difficulties. However, it is unclear to what extent these difficulties are caused by the visual obstruction of the speaker’s mouth or by changes to the acoustic signal, and whether the effects can be found regardless of semantic context. In the present study, children and adults performed a cued shadowing task online, repeating the last word of English sentences. Target words were embedded in sentence-final position and manipulated visually, acoustically, and by semantic context (cloze probability). First results from 16 children and 16 adults suggest that processing language through face masks leads to slower responses in both groups, but visual, acoustic, and semantic cues all significantly reduce the mask effect. Although children were less proficient in predictive speech processing overall, they were still able to use semantic cues to compensate for face mask effects in a similar fashion to adults.
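
A minimal sketch of how such data might be analysed, assuming a long-format table of per-trial response times with mask condition, cloze probability, and age group (all column names and values here are invented), is a mixed-effects regression with by-participant random intercepts:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data layout for a cued-shadowing task: one row per target word,
# with response time (ms), mask condition, cloze probability, and age group.
rng = np.random.default_rng(0)
n = 640
df = pd.DataFrame({
    "subject": np.repeat(np.arange(32), 20),
    "group": np.repeat(["child", "adult"], n // 2),
    "mask": rng.choice(["mask", "no_mask"], n),
    "cloze": rng.uniform(0, 1, n),
})
df["rt"] = (900 + 60 * (df["mask"] == "mask") - 120 * df["cloze"]
            + 80 * (df["group"] == "child") + rng.normal(0, 90, n))

# Mixed-effects model: mask effect, its interaction with cloze probability,
# and age group, with by-subject random intercepts.
model = smf.mixedlm("rt ~ mask * cloze + group", df, groups=df["subject"])
print(model.fit().summary())
```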


Author(s):  
Benjamin R. Pittman-Polletta ◽  
Yangyang Wang ◽  
David A. Stanley ◽  
Charles E. Schroeder ◽  
Miles A. Whittington ◽  
...  

Current hypotheses suggest that speech segmentation (the initial division and grouping of the speech stream into candidate phrases, syllables, and phonemes for further linguistic processing) is executed by a hierarchy of oscillators in auditory cortex. Theta (~3-12 Hz) rhythms play a key role by phase-locking to recurring acoustic features marking syllable boundaries. Reliable synchronization to quasi-rhythmic inputs, whose variable frequency can dip below cortical theta frequencies (down to ~1 Hz), requires “flexible” theta oscillators whose underlying neuronal mechanisms remain unknown. Using biophysical computational models, we found that the flexibility of phase-locking in neural oscillators depended on the types of hyperpolarizing currents that paced them. Simulated cortical theta oscillators flexibly phase-locked to slow inputs when these inputs caused both (i) spiking and (ii) the subsequent buildup of outward current sufficient to delay further spiking until the next input. The greatest flexibility in phase-locking arose from a synergistic interaction between intrinsic currents that was not replicated by synaptic currents at similar timescales. Our results suggest that synaptic and intrinsic inhibition contribute to frequency-restricted and frequency-flexible phase-locking in neural oscillators, respectively. Their differential deployment may enable neural oscillators to play diverse roles, from reliable internal clocking to adaptive segmentation of quasi-regular sensory inputs like speech.

Author summary: Oscillatory activity in auditory cortex is believed to play an important role in auditory and speech processing. One suggested function of these rhythms is to divide the speech stream into candidate phonemes, syllables, words, and phrases, to be matched with learned linguistic templates. This requires brain rhythms to flexibly phase-lock to regular acoustic features of the speech stream. How neuronal circuits implement this task remains unknown. In this study, we explored the contribution of inhibitory currents to flexible phase-locking in neuronal theta oscillators, believed to perform initial syllabic segmentation. We found that a combination of specific intrinsic inhibitory currents at multiple timescales, present in a large class of cortical neurons, enabled exceptionally flexible phase-locking, suggesting that the cells exhibiting these currents are a key component in the brain’s auditory and speech processing architecture.
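
As a rough, hypothetical illustration of the mechanism described above (not the authors' biophysical model), the sketch below simulates an adaptive integrate-and-fire unit in Python: each slow input pulse triggers a spike, and the spike-triggered outward (adaptation) current keeps the unit below threshold until the next pulse, yielding one-to-one phase-locking to an input well below theta frequencies. All parameter values are invented.

```python
import numpy as np

# Caricature of the mechanism: an adaptive leaky integrate-and-fire unit
# driven by slow, 50 ms input pulses.  Each pulse triggers one spike; the
# spike-triggered outward (adaptation) current then holds the unit below
# threshold until the next pulse, giving one-to-one locking to a ~1 Hz input.
dt = 0.1e-3                        # time step: 0.1 ms
T = 3.2                            # simulate 3.2 s
tau_v, tau_w = 20e-3, 400e-3       # membrane / adaptation time constants (s)
v, w = 0.0, 0.0                    # membrane potential, adaptation current
v_th, v_reset, b = 1.0, 0.0, 2.5   # threshold, reset, adaptation increment
input_period = 0.8                 # 1.25 Hz input, well below theta rates

spike_times = []
for i in range(int(T / dt)):
    t = i * dt
    drive = 3.0 if (t % input_period) < 0.05 else 0.2   # 50 ms pulses
    v += dt / tau_v * (-v - w + drive)
    w += dt / tau_w * (-w)
    if v >= v_th:
        v = v_reset
        w += b                     # outward current builds up after each spike
        spike_times.append(t)

print(len(spike_times), "spikes for", round(T / input_period), "input pulses")
```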


Author(s):  
Jayanthiny Kangatharan ◽  
Maria Uther ◽  
Fernand Gobet

Comprehension assesses a listener’s ability to construe the meaning of an acoustic signal in order to answer questions about its contents, while intelligibility indicates the extent to which a listener can precisely retrieve the acoustic signal. Previous comprehension studies asking listeners for sentence-level or narrative-level information used native listeners as participants. This is the first study to examine whether clear speech properties (e.g. expanded vowel space) produce a clear speech benefit at the word level for L2 learners, for speech produced in naturalistic settings. The study explored whether hyperarticulated speech was more comprehensible than non-hyperarticulated speech for L1 British English speakers as well as early and late L2 British English learners, in quiet and in noise. Sixteen British English listeners, 16 native Mandarin Chinese listeners who were early L2 learners, and 16 native Mandarin Chinese listeners who were late L2 learners rated hyperarticulated versus non-hyperarticulated word samples for comprehension under four listening conditions of varying white noise level (quiet, or SNRs of +16 dB, +12 dB, or +8 dB), in a 3 × 2 × 4 mixed design. Mean ratings showed that all three groups found hyperarticulated speech easier to understand than non-hyperarticulated speech in all listening conditions. Results are discussed in terms of other findings (Uther et al., 2012) suggesting that hyperarticulation may generally improve speech processing for all language groups.
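
For reference, mixing a speech waveform with white noise at the SNR levels used here (+16, +12, +8 dB) can be sketched as follows; the sinusoidal stand-in signal is only a placeholder for real word recordings, not material from the study.

```python
import numpy as np

def add_white_noise(speech, snr_db, seed=0):
    """Mix a speech waveform with white noise at a target SNR (in dB)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(len(speech))
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Hypothetical usage with the three noise levels from the study (quiet omitted).
speech = np.sin(2 * np.pi * 220 * np.linspace(0, 1, 16000))  # stand-in signal
stimuli = {snr: add_white_noise(speech, snr) for snr in (16, 12, 8)}
```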


1984
Vol 27 (2)
pp. 311-317
Author(s):  
B. J. Guillemi ◽  
D. T. Nguyen

Durational measurements of frication, aspiration, prevoicing, and voice onset are often difficult to perform from the spectrogram, and the resolution is limited to about 5 ms. In many instances, a higher resolution can be obtained from a study of waveforms than from a study of the spectrum. We present a microprocessor-based speech acquisition and processing system which uses waveform analysis techniques to extract measurements from the acoustic signal. The system is low-cost and portable; it operates in "real time" and employs noninvasive data-capturing techniques. The usefulness of the system is demonstrated in the measurement of voice onset time (VOT) in CV clusters and in the measurement of fundamental frequency.
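
A modern, minimal sketch of the waveform-analysis idea, estimating fundamental frequency by autocorrelation rather than from a spectrogram; the frame, sampling rate, and search range below are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Estimate fundamental frequency of a voiced frame by autocorrelation."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f0_max)            # shortest plausible pitch period
    lag_max = int(fs / f0_min)            # longest plausible pitch period
    peak_lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / peak_lag

# Hypothetical check: a synthetic 120 Hz voiced frame sampled at 10 kHz.
fs = 10_000
t = np.arange(0, 0.04, 1 / fs)            # 40 ms analysis frame
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(estimate_f0(frame, fs))             # ~120 Hz
```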


2021
Author(s):  
Mahmoud Keshavarzi ◽  
Enrico Varano ◽  
Tobias Reichenbach

Understanding speech in background noise is a difficult task. The tracking of speech rhythms such as the rate of syllables and words by cortical activity has emerged as a key neural mechanism for speech-in-noise comprehension. In particular, recent investigations have used transcranial alternating current stimulation (tACS) with the envelope of a speech signal to influence cortical speech tracking, demonstrating that this type of stimulation modulates comprehension and therefore evidencing a functional role of cortical tracking in speech processing. Cortical activity has been found to track the rhythms of a background speaker as well, but the functional significance of this neural response remains unclear. Here we employ a speech-comprehension task with a target speaker in the presence of a distractor voice to show that tACS with the speech envelope of the target voice, as well as tACS with the envelope of the distractor speaker, both modulate the comprehension of the target speech. Because the envelope of the distractor speech does not carry information about the target speech stream, the modulation of speech comprehension through tACS with this envelope evidences that the cortical tracking of the background speaker affects the comprehension of the foreground speech signal. The phase dependency of the resulting modulation of speech comprehension is, however, opposite to that obtained from tACS with the envelope of the target speech signal. This suggests that the cortical tracking of the ignored speech stream and that of the attended speech stream may compete for neural resources.

Significance Statement: Loud environments such as busy pubs or restaurants can make conversation difficult. However, they also allow us to eavesdrop on other conversations that occur in the background. In particular, we often notice when somebody else mentions our name, even if we have not been listening to that person. However, the neural mechanisms by which background speech is processed remain poorly understood. Here we employ transcranial alternating current stimulation, a technique through which neural activity in the cerebral cortex can be influenced, to show that cortical responses to rhythms in the distractor speech modulate the comprehension of the target speaker. Our results evidence that the cortical tracking of background speech rhythms plays a functional role in speech processing.
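
A minimal sketch of how a broadband speech envelope of the kind used to drive envelope-tACS is commonly extracted (Hilbert magnitude followed by a low-pass filter); the cutoff, filter order, and stand-in signal are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def speech_envelope(audio, fs, cutoff_hz=10.0):
    """Broadband envelope: Hilbert magnitude, low-pass filtered to keep only
    the slow, syllable- and word-rate fluctuations."""
    env = np.abs(hilbert(audio))
    sos = butter(4, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)

# Hypothetical usage: a 2 s amplitude-modulated tone standing in for speech.
fs = 16_000
t = np.arange(0, 2, 1 / fs)
audio = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 200 * t)
envelope = speech_envelope(audio, fs)    # retains the slow ~4 Hz modulation
```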


Author(s):  
Jennifer Cole ◽  
Yoonsook Mo ◽  
Mark Hasegawa-Johnson

The perception of prosodic prominence in spontaneous speech is investigated through an online task of prosody transcription using untrained listeners. Prominence is indexed through a probabilistic prominence score assigned to each word based on the proportion of transcribers who perceived the word as prominent. Correlation and regression analyses between perceived prominence, acoustic measures and measures of a word's information status are conducted to test three hypotheses: (i) prominence perception is signal-driven, influenced by acoustic factors reflecting speakers' productions; (ii) perception is expectation-driven, influenced by the listener's prior experience of word frequency and repetition; (iii) any observed influence of word frequency on perceived prominence is mediated through the acoustic signal. Results show correlates of perceived prominence in acoustic measures, in word log-frequency and in the repetition index of a word, consistent with both signal-driven and expectation-driven hypotheses of prominence perception. But the acoustic correlates of perceived prominence differ somewhat from the correlates of word frequency, suggesting an independent effect of frequency on prominence perception. A speech processing account is offered as a model of signal-driven and expectation-driven effects on prominence perception, where prominence ratings are a function of the ease of lexical processing, as measured through the activation levels of lexical and sub-lexical units.
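
The prominence score and the regression step can be sketched as follows, with an invented transcription matrix (rows are words, columns are transcribers) and invented predictor columns standing in for the acoustic and information-status measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented transcription matrix: rows are words, columns are transcribers,
# 1 = "marked prominent".  The prominence score is the per-word proportion.
rng = np.random.default_rng(1)
n_words, n_transcribers = 200, 15
marks = rng.integers(0, 2, size=(n_words, n_transcribers))

df = pd.DataFrame({
    "p_score": marks.mean(axis=1),                # proportion of transcribers
    "intensity_db": rng.normal(65, 5, n_words),   # stand-in acoustic measure
    "log_freq": rng.normal(4, 1, n_words),        # word log-frequency
    "repetition": rng.integers(0, 4, n_words),    # repetition index of the word
})

# Regression of perceived prominence on signal-driven and expectation-driven
# predictors, mirroring the analyses described in the abstract.
fit = smf.ols("p_score ~ intensity_db + log_freq + repetition", df).fit()
print(fit.params)
```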


Speech is a form of vocal communication through which we obtain information about the speaker, the language, the message, and more. Speaking is the process of converting discrete phonemes into a continuous acoustic signal. Speech carries linguistic information associated with emotion, along with vocal information, and both can be extracted by speech processing methods. In this paper, we analyse prosodic features of Assamese speech produced in different emotional states to compare male and female speakers, whose linguistic reactions differ in the same situations. We also apply a statistical paired t-test to the prosodic features to show the significant differences between male and female speech across emotions.
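
The paired comparison can be sketched as below, assuming (purely for illustration) paired mean-F0 measurements for the same emotional utterances produced by matched male and female speakers; the numbers are invented, not data from the study.

```python
import numpy as np
from scipy.stats import ttest_rel

# Invented paired measurements: mean F0 (Hz) of the same emotional utterances
# produced by matched male and female speakers.
rng = np.random.default_rng(2)
f0_male = rng.normal(130, 15, 30)
f0_female = rng.normal(220, 25, 30)

t_stat, p_value = ttest_rel(f0_male, f0_female)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```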


Author(s):  
Aude Noiray ◽  
Khalil Iskarous ◽  
D. H. Whalen

The nature of the links between speech production and perception has been the subject of longstanding debate. The present study investigated the articulatory parameter of tongue height and the acoustic F1–F0 difference for the phonological distinction of vowel height in American English front vowels. Multiple repetitions of /i, ɪ, e, ɛ, æ/ in [(h)Vd] sequences were recorded from seven adult speakers. Articulatory (ultrasound) and acoustic data were collected simultaneously to provide a direct comparison of variability in vowel production in both domains. Results showed idiosyncratic patterns of articulation for contrasting the three front vowel pairs /i-ɪ/, /e-ɛ/, and /ɛ-æ/ across subjects, with the degree of variability in vowel articulation comparable to that observed in the acoustics for all seven participants. However, contrary to what was expected, some speakers showed reversals of tongue height for /ɪ/-/e/ that were also reflected in the acoustics, with F1 higher for /ɪ/ than for /e/. The data suggest that the phonological distinction of height is conveyed via speaker-specific articulatory-acoustic patterns that do not strictly match feature descriptions. However, the acoustic signal is faithful to the articulatory configuration that generated it, carrying the crucial information for perceptual contrast.
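
For illustration only, the F1–F0 height measure can be tabulated from per-token formant and pitch estimates like the invented values below; a smaller F1–F0 difference corresponds to a higher vowel.

```python
# Hypothetical per-token measurements (Hz); all numbers are invented.
tokens = [
    {"vowel": "i", "f0": 210, "f1": 310},
    {"vowel": "ɪ", "f0": 205, "f1": 430},
    {"vowel": "e", "f0": 200, "f1": 470},
    {"vowel": "ɛ", "f0": 195, "f1": 600},
    {"vowel": "æ", "f0": 190, "f1": 730},
]
for tok in tokens:
    # F1 - F0 is the acoustic height correlate examined in the study.
    print(tok["vowel"], tok["f1"] - tok["f0"])
```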

