speech prosody
Recently Published Documents


TOTAL DOCUMENTS: 255 (FIVE YEARS: 70)
H-INDEX: 29 (FIVE YEARS: 3)

2022 ◽ Vol 9 ◽ Author(s): Marita K. Everhardt, Anastasios Sarampalis, Matt Coler, Deniz Başkent, Wander Lowie

When we speak, we can vary how we use our voices. Our speech can be high or low (pitch), loud or soft (loudness), and fast or slow (duration). This variation in pitch, loudness, and duration is called speech prosody. It is a bit like making music. Varying our voices when we speak can express sarcasm or emotion and can even change the meaning of what we are saying. So, speech prosody is a crucial part of spoken language. But how do speakers produce prosody? How do listeners hear and understand these variations? Is it possible to hear and interpret prosody in other languages? And what about people whose hearing is not so good? Can they hear and understand prosodic patterns at all? Let’s find out!


2022 ◽ Author(s): Tatsuya Daikoku, Shin-Ichiro Kumagaya, Satsuki Ayaya, Yukie Nagai

How typically developed (TD) persons modulate their speech rhythm while talking to individuals with autism spectrum disorder (ASD) remains unclear. We aimed to elucidate the characteristics of phonological hierarchy in verbal communication between ASD individuals and TD persons. TD and ASD respondents were asked by a TD questioner to share their recent experiences on 12 topics. We included 87 samples of ASD-directed speech (from TD questioner to ASD respondent), 72 of TD-directed speech (from TD questioner to TD respondent), 74 of ASD speech (from ASD respondent to TD questioner), and 55 of TD speech (from TD respondent to TD questioner). We analysed the amplitude modulation structures of the speech waveforms using probabilistic amplitude demodulation based on Bayesian inference and found similarities between ASD speech and ASD-directed speech and between TD speech and TD-directed speech. Prosody and the interactions between prosodic, syllabic, and phonetic rhythms were significantly weaker in ASD-directed and ASD speech than in TD-directed and TD speech, respectively. ASD speech also showed weaker dynamic processing from higher to lower phonological bands (e.g. from prosody to syllable) than TD speech. The results indicate that TD individuals may spontaneously adapt their phonological characteristics to those of ASD speech.
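
The study's amplitude analysis relies on probabilistic amplitude demodulation; as a rough illustration of the kind of phonological hierarchy being probed, the sketch below extracts prosodic-, syllabic-, and phonetic-rate modulation bands from a speech envelope with a simple Hilbert-transform approach. The file name, envelope sampling rate, and band edges are assumptions for illustration, not the authors' actual pipeline.

```python
# Rough illustration only: the study uses probabilistic amplitude demodulation
# (Bayesian inference); this Hilbert-envelope version just shows the idea of
# nested modulation bands and a crude measure of their coupling.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, filtfilt, resample

def modulation_band(envelope, lo, hi, fs):
    """Band-pass the amplitude envelope at a given modulation rate (Hz)."""
    b, a = butter(2, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, envelope)

fs, x = wavfile.read("speech_sample.wav")      # hypothetical mono recording
x = x.astype(float) / np.max(np.abs(x))

envelope = np.abs(hilbert(x))                  # broadband amplitude envelope
env_fs = 200                                   # the slow envelope needs far less than the audio rate
envelope = resample(envelope, int(len(envelope) * env_fs / fs))

# Assumed modulation-rate bands (illustrative edges, not the paper's exact values)
bands = {
    "prosody":  modulation_band(envelope, 0.5, 2.0, env_fs),
    "syllable": modulation_band(envelope, 2.0, 8.0, env_fs),
    "phoneme":  modulation_band(envelope, 8.0, 30.0, env_fs),
}

# Crude proxy for the cross-band interactions discussed in the abstract
for hi_band, lo_band in [("prosody", "syllable"), ("syllable", "phoneme")]:
    r = np.corrcoef(np.abs(hilbert(bands[hi_band])),
                    np.abs(hilbert(bands[lo_band])))[0, 1]
    print(f"{hi_band}-{lo_band} envelope correlation: {r:.2f}")
```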


2021 ◽ Vol 39 (2) ◽ pp. 103-117 ◽ Author(s): Laurène Léard-Schneider, Yohana Lévêque

The present study aimed to examine the perception of music and prosody in patients who had undergone a severe traumatic brain injury (TBI). Our second objective was to describe the association between music and prosody impairments in individual clinical presentations. Thirty-six patients who were out of the acute phase underwent a set of music and prosody tests: two subtests of the Montreal Battery for Evaluation of Amusia evaluating melody (scale) and rhythm perception, respectively; two subtests of the Montreal Evaluation of Communication assessing prosody understanding in sentences; and two further tests evaluating prosody understanding in vowels. Forty-two percent of the patients were impaired on the melodic test, 51% on the rhythmic test, and 71% on at least one of the four prosody tests. The amusic patients performed significantly worse than the non-amusic patients on the four prosody tests. This descriptive study shows for the first time the high prevalence of music deficits after severe TBI. It also suggests associations between prosody and music impairments, as well as between linguistic and emotional prosody impairments. The causes of these impairments remain to be explored.


2021 ◽ pp. 030573562110501 ◽ Author(s): Alice Mado Proverbio, Elisabetta Piotti

Do speech and music understanding share common neural mechanisms? Here, brain bioelectrical activity was recorded in healthy participants listening to music obtained by digitally transforming speech into viola music. The sentences originally had a positive or negative affective prosody. The aim was to investigate whether the emotional content of the music was processed similarly to the affective prosody of speech. EEG was recorded from 128 electrodes in 20 healthy students. Participants had to detect rare neutral piano sounds while ignoring the viola melodies. Negative affective valence of the stimuli increased the amplitude of the frontal P300 and N400 components of the ERPs, while positive valence enhanced a late inferior frontal positivity. Similar markers were previously found for the processing of positive versus negative music, vocalizations, and speech. Source reconstruction showed that negative music activated the right superior temporal gyrus and cingulate cortex, while positive music activated the left middle and inferior temporal gyri and the inferior frontal cortex. We propose an integrated model of a possible common network for processing the emotional content of music, vocalizations, and speech, which might explain some universal and relatively innate brain reactions to music.


2021 ◽ Vol 8 (11) ◽ Author(s): Leonor Neves, Marta Martins, Ana Isabel Correia, São Luís Castro, César F. Lima

The human voice is a primary channel for emotional communication. It is often presumed that being able to recognize vocal emotions is important for everyday socio-emotional functioning, but evidence for this assumption remains scarce. Here, we examined relationships between vocal emotion recognition and socio-emotional adjustment in children. The sample included 141 6- to 8-year-old children, and the emotion tasks required them to categorize five emotions (anger, disgust, fear, happiness, sadness), plus neutrality, as conveyed by two types of vocal emotional cues: speech prosody and non-verbal vocalizations such as laughter. Socio-emotional adjustment was evaluated by the children's teachers using a multidimensional questionnaire of self-regulation and social behaviour. Based on frequentist and Bayesian analyses, we found that, for speech prosody, higher emotion recognition was related to better general socio-emotional adjustment. This association remained significant even when the children's cognitive ability, age, sex and parental education were held constant. Follow-up analyses indicated that higher emotional prosody recognition was more robustly related to the socio-emotional dimensions of prosocial behaviour and cognitive and behavioural self-regulation. For emotion recognition in non-verbal vocalizations, no associations with socio-emotional adjustment were found. A similar null result was obtained for an additional task focused on facial emotion recognition. Overall, these results support the close link between children's emotional prosody recognition skills and their everyday social behaviour.


2021 ◽ Author(s): Oliver Niebuhr, Ronald Böck, Joseph A. Allen ◽ Keyword(s):

2021 ◽ Vol 55 (1) ◽ Author(s): Maarten Renckens, Leo De Raeve, Erik Nuyts, María Pérez Mena, Ann Bessemans

Type is a wonderful tool for representing speech visually. It can therefore provide deaf individuals with information that they miss auditorily. Still, type does not represent all the information available in speech: it lacks an exact indication of prosody. Prosody is the motor of expressive speech, realized through variations in loudness, duration, and pitch. The speech of deaf readers is often less expressive because deafness impedes the perception and production of prosody. Visual cues that convey information about prosody (visual prosody) can support both the training of speech variations and expressive reading. We describe the influence of visual prosody on the reading expressiveness of deaf readers between the ages of 7 and 18 (in this study, ‘deaf readers’ means persons with any kind of hearing loss, with or without hearing devices, who have still developed intelligible speech). A total of seven cues visualize speech variations: a thicker/thinner font corresponds with a louder/quieter voice; a wider/narrower font relates to a slower/faster speed; a font raised above/lowered below the baseline suggests a higher/lower pitch; and wider spaces between words suggest longer pauses. We evaluated the seven cues with questionnaires and a read-aloud test. Deaf readers relate most cues to the intended speech variation and read most of them aloud correctly. Only the raised cue is difficult to connect to the intended speech variation at first, and a faster speed and a lower pitch prove challenging to vocalize. Despite those two difficulties, this approach to visual prosody is effective in supporting speech prosody. The applied materials can serve as an example for typographers, type designers, graphic designers, teachers, speech therapists, and researchers developing expressive reading materials.


Author(s): Suzhen Wang, Lincheng Li, Yu Ding, Changjie Fan, Xin Yu

We propose an audio-driven talking-head method to generate photo-realistic talking-head videos from a single reference image. In this work, we tackle two key challenges: (i) producing natural head motions that match speech prosody, and (ii) maintaining the appearance of a speaker under large head motions while stabilizing the non-face regions. We first design a head pose predictor by modeling rigid 6D head movements with a motion-aware recurrent neural network (RNN). In this way, the predicted head poses act as the low-frequency holistic movements of a talking head, allowing the subsequent network to focus on detailed facial movement generation. To capture the entire image motion arising from audio, we exploit a keypoint-based dense motion field representation. We then develop a motion field generator to produce dense motion fields from the input audio, head poses, and a reference image. As this keypoint-based representation models the motions of the facial regions, head, and background integrally, our method can better constrain the spatial and temporal consistency of the generated videos. Finally, an image generation network renders photo-realistic talking-head videos from the estimated keypoint-based motion fields and the input reference image. Extensive experiments demonstrate that our method produces videos with plausible head motions, synchronized facial expressions, and stable backgrounds, and outperforms the state of the art.
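
As a rough sketch of the first stage described above, the snippet below shows what an RNN head pose predictor could look like: per-frame audio features pass through a recurrent layer that outputs a rigid 6D head pose (three rotations plus three translations) per frame. The layer sizes, feature dimensions, and audio front-end are assumptions for illustration; the authors' actual architecture, including the motion field generator and image generation network, is not reproduced here.

```python
# Minimal sketch of an audio-to-head-pose predictor: a GRU maps per-frame audio
# features to 6D poses. Dimensions and layers are illustrative assumptions only.
import torch
import torch.nn as nn

class HeadPosePredictor(nn.Module):
    def __init__(self, audio_dim=80, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        self.to_pose = nn.Linear(hidden_dim, 6)   # rigid 6D pose per frame

    def forward(self, audio_feats):               # (batch, frames, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.to_pose(h)                    # (batch, frames, 6)

# Example: one clip of 100 frames with 80-dim mel features (dummy data)
poses = HeadPosePredictor()(torch.randn(1, 100, 80))
print(poses.shape)  # torch.Size([1, 100, 6])
```

Downstream, the predicted poses would be fed, together with audio features and the reference image, to a motion field generator and an image generation network, which this sketch does not attempt to cover.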


Author(s): Roza G. Kamiloğlu, George Boateng, Alisa Balabanova, Chuting Cao, Disa A. Sauter

The human voice communicates emotion through two different types of vocalizations: nonverbal vocalizations (brief non-linguistic sounds like laughs) and speech prosody (tone of voice). Research examining the recognizability of emotions from the voice has mostly focused on either nonverbal vocalizations or speech prosody, and has included few categories of positive emotions. In two preregistered experiments, we compare human listeners’ (total n = 400) recognition performance for 22 positive emotions from nonverbal vocalizations (n = 880) to that from speech prosody (n = 880). The results show that listeners were more accurate in recognizing most positive emotions from nonverbal vocalizations than from prosodic expressions. Furthermore, acoustic classification experiments with machine learning models demonstrated that positive emotions are expressed with more distinctive acoustic patterns in nonverbal vocalizations than in speech prosody. Overall, the results suggest that vocal expressions of positive emotions are communicated more successfully when expressed as nonverbal vocalizations than as speech prosody.
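
In spirit, the acoustic classification experiments mentioned above could resemble the sketch below: each recording is summarised with simple acoustic features and a classifier is cross-validated on the emotion labels, once per expression type. The folder layout, feature set, and SVM classifier are assumptions for illustration rather than the authors' actual models.

```python
# Hedged sketch of an acoustic emotion-classification experiment; not the
# authors' pipeline. Assumes wav files organised as
# stimuli/<expression_type>/<emotion>/<file>.wav (hypothetical layout).
from pathlib import Path
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def acoustic_features(path):
    """Summarise a recording with MFCC statistics and rough pitch statistics."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    f0 = librosa.yin(y, fmin=75, fmax=400, sr=sr)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)]])

X, labels = [], []
for wav in Path("stimuli/nonverbal_vocalizations").glob("*/*.wav"):
    X.append(acoustic_features(wav))
    labels.append(wav.parent.name)          # emotion label from folder name

# Higher cross-validated accuracy would indicate more distinctive acoustic patterns
scores = cross_val_score(SVC(kernel="rbf"), np.array(X), labels, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

Repeating the same procedure on a `stimuli/speech_prosody` folder and comparing the two accuracies would mirror the comparison the abstract describes.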

