Repetition enhancement to voice identities in the dog brain

2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Marianna Boros ◽  
Anna Gábor ◽  
Dóra Szabó ◽  
Anett Bozsik ◽  
Márta Gácsi ◽  
...  

Abstract: In the human speech signal, cues to speech sounds and voice identities are conflated, but they are processed separately in the human brain. The processing of speech sounds and voice identities is typically performed by non-primary auditory regions in humans and non-human primates. Additionally, these processes exhibit functional asymmetry in humans, indicating the involvement of distinct mechanisms. Behavioural studies indicate analogous side biases in dogs, but neural evidence for this functional dissociation has been missing. In two experiments using an fMRI adaptation paradigm, we presented awake dogs with natural human speech that varied in either segmental (change in speech sound) or suprasegmental (change in voice identity) content. We found a repetition enhancement effect for voice identity processing in a secondary auditory region, the caudal ectosylvian gyrus. The same region did not show repetition effects for speech sounds, nor did the primary auditory cortex exhibit sensitivity to changes in either the segmental or the suprasegmental content. Furthermore, we did not find evidence for functional asymmetry in the processing of either speech sounds or voice identities. Our results in dogs corroborate earlier human and non-human primate evidence on the role of secondary auditory regions in the processing of suprasegmental cues, suggesting similar neural sensitivity to the identity of the vocalizer across the mammalian order.
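
As a rough illustration of the logic of an fMRI adaptation analysis, the sketch below contrasts per-trial response estimates between repeat and change conditions; the arrays, values, and region are hypothetical stand-ins for illustration, not the authors' pipeline.

```python
# Minimal sketch: quantifying a repetition effect in an fMRI adaptation design.
# All variable names and data are hypothetical illustrations, not the authors' pipeline.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-trial BOLD estimates (e.g., GLM betas) from one region of
# interest, for trials where the voice identity repeats vs. changes.
betas_repeat = rng.normal(loc=0.6, scale=0.3, size=40)   # same voice repeated
betas_change = rng.normal(loc=0.4, scale=0.3, size=40)   # voice identity changes

# Repetition enhancement: responses are LARGER for repeated stimuli
# (the opposite sign, repetition suppression, is the more common finding).
enhancement = betas_repeat.mean() - betas_change.mean()
t, p = stats.ttest_ind(betas_repeat, betas_change)

print(f"repeat - change = {enhancement:.3f} (t = {t:.2f}, p = {p:.3f})")
```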

2021 ◽  
Author(s):  
Carolin Juechter ◽  
Rainer Beutelmann ◽  
Georg M. Klump

The present study establishes the Mongolian gerbil (Meriones unguiculatus) as a model for investigating the perception of human speech sounds. We report data on the discrimination of logatomes (CVCs - consonant-vowel-consonant combinations with outer consonants /b/, /d/, /s/ and /t/ and central vowels /a/, /aː/, /ɛ/, /eː/, /ɪ/, /iː/, /ɔ/, /oː/, /ʊ/ and /uː/; VCVs - vowel-consonant-vowel combinations with outer vowels /a/, /ɪ/ and /ʊ/ and central consonants /b/, /d/, /f/, /g/, /k/, /l/, /m/, /n/, /p/, /s/, /t/ and /v/) by young gerbils. Four young gerbils were trained to perform an oddball target detection paradigm in which they were required to discriminate a deviant CVC or VCV in a sequence of CVC or VCV standards, respectively. The experiments were performed with an ICRA-1 noise masker with speech-like spectral properties, and logatomes of multiple speakers were presented at various signal-to-noise ratios. Response latencies were measured to generate perceptual maps employing multidimensional scaling, which visualize the gerbils' internal representations of the sounds. The dimensions of the perceptual maps were correlated with multiple phonetic features of the speech sounds to evaluate which features of vowels and consonants are most important for the discrimination. The perceptual representation of vowels and consonants in gerbils was similar to that of humans, although gerbils needed higher signal-to-noise ratios than humans for the discrimination of speech sounds. The gerbils' discrimination of vowels depended on differences in the frequencies of the first and second formant, which are determined by tongue height and position. Consonants were discriminated based on differences in combinations of their articulatory features. The similarities in the perception of logatomes by gerbils and humans render the gerbil a suitable model for human speech sound discrimination.
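
The latency-to-perceptual-map step can be illustrated with a minimal sketch: response latencies are converted into a dissimilarity matrix and embedded with multidimensional scaling. All numbers and the latency-to-distance conversion below are invented for illustration, not the gerbil data.

```python
# Minimal sketch: building a perceptual map from discrimination latencies via
# multidimensional scaling (MDS). Data here are hypothetical, not the gerbil results.
import numpy as np
from sklearn.manifold import MDS

vowels = ["a", "ɛ", "ɪ", "ɔ", "ʊ"]
rng = np.random.default_rng(1)

# Hypothetical mean response latencies (ms): a longer latency means the pair is
# harder to discriminate, i.e., perceptually more similar. Convert to a
# symmetric dissimilarity matrix (shorter latency -> larger perceptual distance).
lat = rng.uniform(200, 600, size=(5, 5))
lat = (lat + lat.T) / 2
dissim = lat.max() - lat
np.fill_diagonal(dissim, 0.0)

# Embed in two dimensions; the axes can then be correlated with phonetic
# features (e.g., first/second formant frequency) to interpret the map.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)

for v, (x, y) in zip(vowels, coords):
    print(f"/{v}/: ({x:6.1f}, {y:6.1f})")
```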


2018 ◽  
Author(s):  
A. Lipponen ◽  
J.L.O. Kurkela ◽  
I. Kyläheiko ◽ 
S. Hölttä ◽ 
T. Ruusuvirta ◽  
...  

Abstract: The electrophysiological response termed mismatch negativity (MMN) indexes auditory change detection in humans. An analogous response, called the mismatch response (MMR), is also elicited in animals. The MMR has been widely utilized to investigate the detection of changes in human speech sounds in rats and guinea pigs, but not in mice. Since transgenic mouse models provide important advantages for further studies, we studied the processing of speech sounds in anesthetized mice. Auditory evoked potentials were recorded from the dura above the auditory cortex in response to changes in the duration of the human speech sound /a/. In an oddball stimulus condition, the MMR was elicited at 53-259 ms latency in response to the changes. The MMR was elicited by the large change in duration (from 200 ms to 110 ms) but not by the smaller changes (from 200 ms to 120-180 ms). The results suggest that mice can represent human speech sounds well enough to detect changes in their duration. The findings can be utilized in future investigations applying mouse models to speech perception.
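
A minimal sketch of how an MMR can be read out as a deviant-minus-standard difference wave is given below; the simulated waveforms, threshold, and window are hypothetical, not the recorded potentials.

```python
# Minimal sketch: extracting a mismatch response (MMR) as the deviant-minus-
# standard difference wave. Waveforms here are simulated, not the recorded data.
import numpy as np

fs = 1000                                # sampling rate (Hz)
t = np.arange(0, 0.4, 1 / fs)            # 0-400 ms epoch

rng = np.random.default_rng(2)
standard = rng.normal(0, 0.5, t.size)    # hypothetical averaged ERP to standards
# Deviant response: same waveform plus a deflection inside an assumed MMR window.
deviant = standard + np.where((t > 0.053) & (t < 0.259), 1.5, 0.0)

# Difference wave: the MMR is the deflection of deviant relative to standard.
diff = deviant - standard

# Report the latency range where the difference exceeds a simple threshold.
mask = np.abs(diff) > 1.0
if mask.any():
    print(f"MMR window: {t[mask][0]*1000:.0f}-{t[mask][-1]*1000:.0f} ms")
```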


2018 ◽  
Vol 15 (2) ◽  
pp. 104-110 ◽  
Author(s):  
Shohei Kato ◽  
Akira Homma ◽  
Takuto Sakuma

Objective: This study presents a novel approach for the early detection of cognitive impairment in the elderly. The approach incorporates speech sound analysis, multivariate statistics, and data-mining techniques. We have developed a speech prosody-based cognitive impairment rating (SPCIR) that can distinguish between cognitively normal controls and elderly people with mild Alzheimer's disease (mAD) or mild cognitive impairment (MCI) using prosodic signals extracted from elderly speech while administering a questionnaire. Two hundred and seventy-three Japanese subjects (73 males and 200 females between the ages of 65 and 96) participated in this study. The authors collected speech sounds from segments of dialogue during a revised Hasegawa's dementia scale (HDS-R) examination and from conversation about topics related to hometown, childhood, and school. The segments correspond to speech sounds from answers to questions regarding birthdate (T1), the name of the subject's elementary school (T2), time orientation (Q2), and repetition of three-digit numbers backward (Q6). As many prosodic features as possible were extracted from each of the speech sounds, including fundamental frequency, formant, and intensity features and mel-frequency cepstral coefficients. These features were refined using principal component analysis and/or feature selection, and an SPCIR was calculated using multiple linear regression analysis. Conclusion: This study also proposes a binary discrimination model of SPCIR using multivariate logistic regression and model selection with receiver operating characteristic curve analysis, and reports the sensitivity and specificity of SPCIR for diagnosis (control vs. MCI/mAD). The reported discriminative performance is good, suggesting that the proposed approach might be an effective tool for screening the elderly for mAD and MCI.
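
The discrimination-model step can be sketched generically: logistic regression on prosodic features followed by ROC analysis. The synthetic features, labels, and split below are placeholders, not the SPCIR model or its data.

```python
# Minimal sketch: a binary screening model on prosodic features with logistic
# regression and ROC analysis, in the spirit of the SPCIR approach. Features
# and labels are synthetic; this is not the authors' model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 273                                   # sample size matching the study
X = rng.normal(size=(n, 8))               # e.g., F0, formant, intensity, MFCC summaries
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

scores = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)

# Choose the ROC point closest to perfect sensitivity/specificity.
fpr, tpr, thr = roc_curve(y_te, scores)
best = np.argmin(fpr**2 + (1 - tpr) ** 2)
print(f"AUC = {auc:.2f}, sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```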


2013 ◽  
Vol 27 (6) ◽  
pp. 1105-1113 ◽  
Author(s):  
Blake Myers-Schulz ◽  
Maia Pujara ◽  
Richard C. Wolf ◽  
Michael Koenigs

1974 ◽  
Vol 17 (3) ◽  
pp. 352-366 ◽  
Author(s):  
Lorraine M. Monnin ◽  
Dorothy A. Huntington

Normal-speaking and speech-defective children were compared on a speech-sound identification task which included sounds the speech-defective subjects misarticulated and sounds they articulated correctly. The identification task included four tests: [r]-[w] contrasts, acoustically similar contrasts, acoustically dissimilar contrasts, and vowel contrasts. The speech sounds were presented on a continuum from undistorted signals to severely distorted speech signals, under conditions which have caused confusion among adults. Subjects included 15 normal-speaking kindergarten children, 15 kindergarten children with defective [r]s, and 15 preschool-age children. The procedure was designed to test each sound under study in depth and to minimize extraneous variables. The speech-sound identification deficit of the speech-defective subjects was found to be specific rather than general, indicating a positive relationship between production and identification ability.


Author(s):  
Aidan Kehoe ◽  
Flaithri Neff ◽  
Ian Pitt

There are numerous challenges to accessing user assistance information in mobile and ubiquitous computing scenarios. For example, there may be little or no display real estate on which to present information visually, the user's eyes may be busy with another task (e.g., driving), it can be difficult to read text while moving, etc. Speech, together with non-speech sounds and haptic feedback, can be used to make assistance information available to users in these situations. Non-speech sounds and haptic feedback can cue information that is about to be presented via speech, ensuring that the listener is prepared and that leading words are not missed. In this chapter, we report on two studies that examine user perception of the duration of a pause between a cue (which may be a non-speech sound, a haptic effect, or a combined non-speech sound plus haptic effect) and the subsequent delivery of assistance information using speech. Based on these user studies, we recommend cue-pause intervals in the range of 600 ms to 800 ms.
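
A minimal sketch of the recommended cue-pause-speech sequencing is shown below; play_cue and speak are hypothetical placeholders for a platform's audio, haptic, and text-to-speech APIs.

```python
# Minimal sketch: sequencing a non-speech cue, a pause in the recommended
# 600-800 ms range, and then spoken assistance. The play_cue/speak functions
# are hypothetical placeholders, not a real audio or TTS API.
import time

CUE_PAUSE_S = 0.7   # within the recommended 600-800 ms interval

def play_cue() -> None:
    print("[non-speech cue / haptic effect]")   # stand-in for real audio/haptics

def speak(text: str) -> None:
    print(f"[TTS] {text}")                      # stand-in for a real TTS call

def present_assistance(text: str) -> None:
    play_cue()                # prepare the listener...
    time.sleep(CUE_PAUSE_S)   # ...pause so leading words are not missed...
    speak(text)               # ...then deliver the assistance via speech

present_assistance("Turn left in 200 metres.")
```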


2003 ◽  
Vol 23 (37) ◽  
pp. 11516-11522 ◽  
Author(s):  
Joseph T. Devlin ◽  
Josephine Raley ◽  
Elizabeth Tunbridge ◽  
Katherine Lanary ◽  
Anna Floyer-Lea ◽  
...  

1990 ◽  
Vol 55 (4) ◽  
pp. 779-798 ◽  
Author(s):  
Ann Bosma Smit ◽  
Linda Hand ◽  
J. Joseph Freilinger ◽  
John E. Bernthal ◽  
Ann Bird

The purpose of the Iowa Articulation Norms Project and its Nebraska replication was to provide normative information about speech sound acquisition in these two states. An assessment instrument consisting of photographs and a checklist form for narrow phonetic transcription was administered by school-based speech-language pathologists to stratified samples of children in the age range 3–9 years. The resulting data were not influenced by the demographic variables of population density (rural/urban), SES (based on parental education), or state of residence (Iowa/Nebraska); however, sex of the child exerted a significant influence in some of the preschool age groups. The criteria used to determine acceptability of a production appeared to influence outcomes for some speech sounds. Acquisition curves were plotted for individual phoneme targets or groups of targets. These curves were used to develop recommended ages of acquisition for the tested speech sounds, with recommendations based generally on a 90% level of acquisition. Special considerations were required for the phonemes /n s z/.
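
The 90%-criterion step can be illustrated with a small sketch that scans per-age accuracy for the youngest age meeting the criterion; the phonemes and percentages below are invented for illustration, not the Iowa/Nebraska norms.

```python
# Minimal sketch: deriving a recommended age of acquisition from per-age
# accuracy, using a 90% criterion as in the norms project. Percentages are
# invented for illustration.
ages = [3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 7.0, 8.0, 9.0]

# Hypothetical percentage of children producing each phoneme correctly, by age.
acquisition = {
    "/m/": [92, 94, 96, 97, 98, 98, 99, 99, 99, 99],
    "/r/": [20, 31, 45, 55, 62, 70, 78, 86, 91, 95],
    "/s/": [40, 48, 60, 68, 75, 82, 88, 93, 96, 97],
}

def age_of_acquisition(pcts, criterion=90):
    """Youngest tested age at which the criterion percentage is reached."""
    for age, pct in zip(ages, pcts):
        if pct >= criterion:
            return age
    return None   # criterion not reached within the tested age range

for phoneme, pcts in acquisition.items():
    print(phoneme, "->", age_of_acquisition(pcts))
```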


2020 ◽  
Vol 34 (1) ◽  
pp. 49-68
Author(s):  
Tsunagu Ikeda ◽  
Masanao Morishita

Abstract: While stimulus complexity is known to affect the width of the temporal integration window (TIW), a quantitative evaluation using ecologically highly valid stimuli has not been conducted. We assumed that the degree of complexity is determined by how obvious the correspondence is between the auditory onset and the visual movement, and we evaluated audiovisual complexity using video clips of a piano, a shakuhachi flute, and human speech. In Experiment 1, a simultaneity judgment task was conducted using these three types of stimuli. The results showed that the TIW was wider for speech than for the shakuhachi and piano. Regression analysis revealed that the width of the TIW depended on the degree of complexity. In the second experiment, we investigated whether speech-specific factors affected temporal integration, using stimuli that contained either natural speech sounds or white noise. The results revealed that the TIW was wider for natural sentences than for white noise. Taken together, the width of the TIW might be affected by both stimulus complexity and speech specificity.
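
One common way to quantify a TIW, sketched below under assumed methods, is to fit a Gaussian to the proportion of "simultaneous" judgments across audiovisual asynchronies and take its width; the data, fit, and width measure are illustrative, not the authors' analysis.

```python
# Minimal sketch: estimating the width of a temporal integration window (TIW)
# from simultaneity-judgment data by fitting a Gaussian to the proportion of
# "simultaneous" responses across audiovisual asynchronies. Data are synthetic.
import numpy as np
from scipy.optimize import curve_fit

soa = np.linspace(-400, 400, 17)                   # audio-visual asynchrony (ms)

def gauss(x, amp, mu, sigma):
    return amp * np.exp(-((x - mu) ** 2) / (2 * sigma**2))

# Hypothetical proportion of "simultaneous" judgments at each asynchrony.
rng = np.random.default_rng(4)
p_sim = gauss(soa, 0.9, 20, 120) + rng.normal(0, 0.03, soa.size)

(amp, mu, sigma), _ = curve_fit(gauss, soa, p_sim, p0=[1.0, 0.0, 100.0])

# One common width measure: the interval where the fit exceeds half its peak
# (full width at half maximum), here taken as the TIW estimate.
fwhm = 2.355 * abs(sigma)
print(f"centre = {mu:.0f} ms, TIW width (FWHM) = {fwhm:.0f} ms")
```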

