prosodic features
Recently Published Documents



2022 ◽  
Vol 14 (2) ◽  
pp. 614
Author(s):  
Taniya Hasija ◽  
Virender Kadyan ◽  
Kalpna Guleria ◽  
Abdullah Alharbi ◽  
Hashem Alyami ◽  
...  

Speech recognition has been an active field of research over the last few decades because it facilitates better human–computer interaction. Automatic speech recognition (ASR) systems for native languages are still underdeveloped. Punjabi ASR systems are in their infancy: most research has been conducted on adult speech, and little work has addressed Punjabi children’s ASR. This research aimed to build a prosodic feature-based automatic speech recognition system for children using discriminative modeling techniques. The corpus of Punjabi children’s speech poses various runtime challenges, such as acoustic variation across speakers of different ages. To overcome such issues, out-domain data augmentation was implemented using a Tacotron-based text-to-speech synthesizer. Prosodic features were extracted from the Punjabi children’s speech corpus, and selected prosodic features were then combined with Mel Frequency Cepstral Coefficient (MFCC) features before being submitted to an ASR framework. The system modeling process investigated several approaches: Maximum Mutual Information (MMI), Boosted Maximum Mutual Information (bMMI), and feature-based Maximum Mutual Information (fMMI). Out-domain data augmentation was performed to enhance the corpus. Prosodic features were then also extracted from the extended corpus, and experiments were conducted on both individual and integrated prosodic-based acoustic features. The fMMI technique exhibited a 20% to 25% relative improvement in word error rate compared with the MMI and bMMI techniques. Performance was further enhanced using the augmented dataset and hybrid front-end features (MFCC + POV + F0 + voice quality), with a relative improvement of 13% compared with the earlier baseline system.
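The abstract reports its gains as relative improvements in word error rate (WER), which is easy to confuse with an absolute difference in percentage points. A minimal sketch of the usual calculation follows; the function name and the example WER values are illustrative assumptions, not figures from the paper.

```python
def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer versus baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


# Hypothetical WERs (not from the paper): a drop from 20.0% to 15.0%
# is 5 points absolute, but a 25% relative improvement.
print(relative_wer_improvement(20.0, 15.0))
```

Under this definition, a 13% relative improvement on a hypothetical 10.0% baseline would correspond to a WER of 8.7%.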


Author(s):  
Basim Jubair Kadhim ◽  
Mujtaba Mohammedali Yahya Al-Hilo

This study deals with catharsis as a cognitive stylistic device used by preachers in Husseini discourse to expel fear and anxiety in order to change the audience for the better (Hussein is a grand Shiite Muslim leader). It aims to explicate how Husseini preachers exploit catharsis and how the audience conceptualizes this phenomenon. The study adapts the emotion model developed by Kövecses (2000), which has five stages: cause of the emotion, emotion, control, loss of control, and behavioral response. Twenty Husseini sermons are analyzed according to the stages of the model. The study reaches several conclusions. Chief among them is that Husseini preachers pragmatically use prosodic features to convey catharsis. A further conclusion is that Husseini preachers use catharsis as a strategy to teach the audience all the objectives of the Husseini revolution and to connect those objectives to the present age for the sake of reform, using fear, which can modulate human behavior.


2021 ◽  
Vol 23 (1) ◽  
pp. 20-46
Author(s):  
Ling Zhang

Cantonese is a syllable-timed language: that is, the syllable is the isochronous unit of speech. However, Cantonese has a type of closed syllable with the stop codas [-p], [-t], or [-k] (i.e. syllables with the so-called “entering tones”) which sounds much shorter than other syllables. On the surface, the shorter duration of stop syllables and the general prosodic feature of syllable isochrony seem to conflict. This study conducted acoustic investigations of stop syllables in Cantonese in different contexts (i.e. in isolated form, in disyllabic words, and in disyllabic words located at the beginning, middle, and final positions of sentences). The results showed that stop syllables alone are shorter than non-stop syllables in various contexts. However, in disyllabic words or in sentences, there is a supplementary lengthening effect immediately after the stop syllables: there is more acoustic blank, and in some circumstances the initial of the following syllable is lengthened. Therefore, we propose that the phonetic realization of syllable isochrony extends beyond the syllable itself in Cantonese. The results and discussions of this study may also shed light on the problem of the disappearance of “entering tones” from various Chinese dialects.


Author(s):  
Michal Marmorstein ◽  
Nadav Matalon

Large conversational activities (e.g., storytelling) necessitate a suspension of ordinary turn-taking rules. In the resulting constellation of main speaker and recipient, minimal displays of cooperative recipiency become relevant at particular junctures. We investigate this mechanism by focusing on the Egyptian Arabic particle ʔāh ‘yeah’ when thus used. We observe that tokens of ʔāh are mobilized by main speakers via the opening of prosodic slots at local pragmatic completion points. The prosodic design of the particle at these points is sensitive to prior talk and displays recipients’ alignment at the structural, action-sequential, and relational levels. This is done through variation of three prosodic features, namely, rhythm-based timing, pitch configuration, and prominence. The measure of alignment proposed by ʔāh is implicative for the continuation of the turn. While smooth progression suggests that ʔāh is understood to be sufficiently fitted and aligned, expansions are traceable to a departure from the terms set by prior talk, which can be heard to indicate lesser alignment. We propose to view ʔāh response tokens as a subset of positionally sensitive responses to part-of-activity actions that are crucial for the co-accomplishment of a large activity.


Author(s):  
Bistra Dimitrova ◽  
Snezhina Dimitrova ◽  

The paper presents the results from a study of the interaction between intonation and information structure in SVO and OVS sentences with communicatively (un)marked alignment of information structure elements. We analyzed the prosodic features of pre-nuclear and nuclear pitch accents. The information structure elements were characterized using Steedman’s (2000) model which classifies sentence constituents as belonging to one of the following categories: theme-background, theme-focus, rheme-background and rheme-focus. Our study found that unmarked and marked alignment has no effect on the pitch range of the rheme-focus. In cases of communicatively unmarked alignment, the pitch range of the theme-background (and rheme-background) group in OVS sentences is wider than in SVO sentences. Word order has no effect on the duration of the accented syllable. Topicalized constituents belonging to the theme-background in OVS sentences with unmarked alignment form separate intermediate phrases. In cases of marked alignment, the rheme-focus ends with a phrase accent and sometimes a pause. The rheme-background and rheme-focus always take a pitch accent, whereas the theme-background is marked by a pitch accent only in cases of communicatively unmarked alignment. The theme-background is deaccented when the sentence is communicatively marked.


2021 ◽  
Vol 12 ◽  
Author(s):  
Yi Shan

This study briefly describes the prosodic and pragmatic characteristics of the discourse marker ni zhidao (“you know”) in spoken Chinese. It mainly explores the interaction between its prosody and pragmatics using instrumental methods. It is the first attempt to use acoustic and statistical analysis to examine the prosodic parameters and prosody-pragmatics interaction of a Chinese discourse marker. The corpus includes 71 interview conversations totaling more than 30 h, in which 490 discourse marker tokens of ni zhidao were found. Ni zhidao mainly fulfilled four broad pragmatic functions of initiating a topic when occurring sentence-initially, of holding the floor when appearing within clauses, of marking coherence when making its presence between clauses, and of projecting attitudes and feelings when showing up sentence-finally. Drawing on the algorithm of random forest in R, the acoustic and statistical analysis of the performance of ni zhidao in these four functions showed that its prosodic features, including duration, tempo, pre-pause, post-pause, F0, and intensity, significantly relate to and thus imply its pragmatic functions, that the interaction between its prosody and pragmatics can be modeled statistically, and that the established pragmatics classification model based on prosody can be utilized to predict the pragmatics of ni zhidao. These findings seem to strengthen the hypothesis that prosodic variables play a role in deciphering the different pragmatic functions of ni zhidao. This study uses prosodic evidence to more objectively reveal not only the part of ni zhidao in dynamically constructing and embodying specific contexts but also its communicative functions and the underlying meta-pragmatic awareness behind it. This study breaks through the limitations of traditional discourse marker research, which mainly relies on context and discourse characteristics for subjective reasoning.


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 584-585
Author(s):  
Clarissa Shaw ◽  
Caitlin Ward ◽  
Jean Gordon ◽  
Kristine Williams ◽  
Keela Herr

Rejection of care (RoC) by persons living with dementia (PLWD) has yet to be measured in the hospital setting. Elderspeak communication (i.e., baby talk or infantilization) is an established antecedent to RoC in nursing home dementia care. The purpose of this study was to determine the impact of elderspeak communication by nursing staff on RoC by hospitalized PLWD. Eighty-eight care encounters between 16 PLWD and 53 nursing staff were observed for RoC using the Resistiveness to Care scale in one Midwestern hospital. Audio recordings of the care encounters were transcribed verbatim and coded for semantic, pragmatic, and prosodic features of elderspeak. Over one-quarter (28.7%) of the duration of nursing staff speech towards PLWD constituted elderspeak, and nearly all (96.6%) of the 88 care encounters included some elderspeak. Almost half of the observations (48.9%) included RoC behaviors by PLWD. Rejection of care was modeled as present or absent using a GEE method. Characteristics of the PLWD (e.g., pain, delirium) and the observation (e.g., environmental simulation) were evaluated as potential covariates. After adjusting for pain, length of stay, and gender, a 15-percentage-point decrease in the proportion of elderspeak communication by nursing staff reduced the odds of RoC by 62% (OR=0.38, 95% CI=0.21-0.71, p=.002), and a one-unit decrease in pain reduced the odds of RoC by 63% (OR=0.37, 95% CI=0.22-0.63, p<.001). This study identified pain and elderspeak as two modifiable factors of RoC. Person-centered interventions are needed that address communication practices and approaches to pain management for hospitalized PLWD.
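The reported percentage reductions follow directly from the odds ratios: an OR below 1 corresponds to a (1 − OR) × 100 percent reduction in odds, and a 95% confidence interval that excludes 1 agrees with the significant p-values. The sketch below illustrates that arithmetic only; the function names are assumptions made for illustration, and it does not reproduce the study's GEE model.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (negative means a reduction)."""
    return (odds_ratio - 1.0) * 100.0


def log_odds_coefficient(odds_ratio: float) -> float:
    """Regression coefficient on the log-odds scale that corresponds to an odds ratio."""
    return math.log(odds_ratio)


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True if a confidence interval for an odds ratio excludes the null value 1."""
    return upper < 1.0 or lower > 1.0


# Values quoted in the abstract: (OR, CI lower bound, CI upper bound).
for label, (or_value, lo, hi) in {
    "elderspeak": (0.38, 0.21, 0.71),
    "pain": (0.37, 0.22, 0.63),
}.items():
    change = percent_change_in_odds(or_value)
    print(label, round(change), ci_excludes_one(lo, hi))
```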


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Seyedeh Zahra Asghari ◽  
Sajjad Farashi ◽  
Saeid Bashirian ◽  
Ensiyeh Jenabi

In this systematic review, we analyzed and evaluated the findings of studies on prosodic features of vocal productions of people with autism spectrum disorder (ASD) in order to identify the statistically significant, most confirmed and reliable prosodic differences distinguishing people with ASD from typically developing (TD) individuals. Using suitable keywords, three major databases (Web of Science, PubMed and Scopus) were searched. The results for prosodic features such as mean pitch, pitch range and variability, speech rate, intensity and voice duration were extracted from eligible studies. The pooled standardized mean difference (SMD) between ASD and control groups was extracted or calculated. Between-study heterogeneity was evaluated using the I² statistic and Cochran’s Q test. Furthermore, publication bias was assessed using a funnel plot, and its significance was evaluated using Egger’s and Begg’s tests. Thirty-nine eligible studies were retrieved (including 910 and 850 participants for ASD and control groups, respectively). This systematic review and meta-analysis showed that ASD group members had a significantly larger mean pitch (SMD = −0.4, 95% CI [−0.70, −0.10]), larger pitch range (SMD = −0.78, 95% CI [−1.34, −0.21]), longer voice duration (SMD = −0.43, 95% CI [−0.72, −0.15]), and larger pitch variability (SMD = −0.46, 95% CI [−0.84, −0.08]), compared with the typically developing control group. However, no significant differences in pitch standard deviation, voice intensity and speech rate were found between groups. Chronological age of participants and voice elicitation tasks were two sources of between-study heterogeneity. Furthermore, no publication bias was observed during analyses (p > 0.05). Mean pitch, pitch range, pitch variability and voice duration were recognized as the prosodic features reliably distinguishing people with ASD from TD individuals.
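For readers unfamiliar with the heterogeneity measures named in the abstract, the sketch below shows one common way to obtain Cochran's Q and the I² statistic from per-study effect sizes under fixed-effect, inverse-variance weighting. The function name and the input values are hypothetical; the review's own pooling model and software are not stated in the abstract, so this is an illustration of the statistics, not a reconstruction of the authors' analysis.

```python
from typing import Sequence, Tuple


def heterogeneity(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Return (pooled effect, Cochran's Q, I-squared in percent).

    Uses fixed-effect inverse-variance weights w_i = 1 / v_i.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of Q in excess of its degrees of freedom, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical SMDs and variances for two studies (not taken from the review).
print(heterogeneity([-1.0, 1.0], [0.25, 0.25]))
```

A high I² such as the one this toy input produces would suggest that most of the observed variation reflects real differences between studies, not sampling error alone.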


EL LE ◽  
2021 ◽  
Author(s):  
David Newbold

The recent (2018) Companion Volume to the Common European Framework overhauls many of the scales of descriptors, notably including phonology. A single, skeletal scale for ‘phonological control’ is replaced by three scales describing overall control, sound articulation, and prosodic features. In each of these, the focus has become intelligibility, not proximity to a native-speaker accent. In this article I examine the development of pronunciation teaching since the communicative revolution, and the rise of English as a lingua franca (ELF), in which intelligibility is crucial. The article concludes with a reflection on how (if at all) the revised framework could inform an ELF-aware assessment of pronunciation.

