acoustic measures
Recently Published Documents


TOTAL DOCUMENTS

322
(FIVE YEARS 92)

H-INDEX

43
(FIVE YEARS 2)

2021 ◽  
Vol 2069 (1) ◽  
pp. 012165
Author(s):  
G Minelli ◽  
G E Puglisi ◽  
A Astolfi ◽  
C Hauth ◽  
A Warzybok

Abstract Since the fundamental phases of the learning process take place in elementary classrooms, it is necessary to guarantee a proper acoustic environment for the children's listening activities. In this framework, speech intelligibility is especially important. In order to better understand and objectively quantify the effect of background noise and reverberation on speech intelligibility, various models have been developed. Here, a binaural speech intelligibility model (BSIM) is investigated for speech intelligibility predictions in a real classroom, considering the effect of talker-to-listener distance and of binaural unmasking due to the spatial separation of noise and speech sources. BSIM predictions are compared to well-established room acoustic measures such as reverberation time (T30), clarity, and definition. Objective acoustical measurements were carried out in one Italian primary school classroom before (T30 = 1.43 ± 0.03 s) and after (T30 = 0.45 ± 0.02 s) the acoustical treatment. Speech reception thresholds (SRTs), i.e. the signal-to-noise ratios yielding 80% speech intelligibility, will be obtained through BSIM simulations using the measured binaural room impulse responses (BRIRs). A focus on the effect of different speech and noise source spatial positions on the SRT values will aim to show the importance of a model able to deal with the binaural aspects of the auditory system. In particular, it will be observed how the position of the noise source influences speech intelligibility when the target speech source always lies in the same position.
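The 80% SRT referred to above is read off a psychometric function of intelligibility versus SNR. As a minimal illustration (the logistic shape and the parameter values are assumptions for the sketch, not the BSIM internals):

```python
import math

def logistic_intelligibility(snr, srt50, slope):
    """Logistic psychometric function: intelligibility (0..1) vs SNR in dB.
    `srt50` is the 50% point; `slope` controls steepness (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(snr - srt50) / slope))

def srt_at(target, srt50, slope):
    """Invert the logistic to get the SNR giving `target` intelligibility,
    e.g. target=0.8 for the 80% SRT used in this study."""
    return srt50 + slope * math.log(target / (1.0 - target))
```

By construction, `srt_at(0.5, srt50, slope)` returns `srt50`, and the 80% SRT sits `slope * ln(4)` dB above it.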


2021 ◽  
Vol 67 (6) ◽  
pp. 46-51
Author(s):  
P.M. Kovalchuk ◽  
T.A. Shydlovska

We aimed to analyse voice signals in 40 patients with chronic laryngitis elicited by exposure to chemical factors. We examined 20 people with catarrhal chronic laryngitis (group 1), 20 people with subatrophic chronic laryngitis (group 2) and 15 healthy volunteers as controls. All subjects underwent acoustic examination of the voice signal using the software Praat V 4.2.1. We studied the following acoustic measures: Jitter, Shimmer and NHR (noise-to-harmonics ratio). The analysis of the obtained data revealed statistically significant differences in the average values of the Jitter and Shimmer measures, as well as in the ratio of the nonharmonic (noise) and harmonic components in the spectrum (NHR), in patients with chronic laryngitis (groups 1 and 2) compared with controls. In group 1 (chronic catarrhal laryngitis), the average values of the acoustic measures were: Jitter 0.92 ± 0.1%, Shimmer 5.31 ± 0.5%, NHR 0.078 ± 0.04. In group 2 (subatrophic laryngitis), the average values were: Jitter 0.67 ± 0.6%, Shimmer 6.57 ± 0.7%, NHR 0.028 ± 0.003. The obtained data indicate a pronounced instability of the voice in frequency and amplitude and a significant proportion of noise in the spectrum of the voice signal in the examined patients with chronic laryngitis exposed to chemical factors. The most pronounced alterations were found in patients with catarrhal chronic laryngitis. We conclude that the quantitative values of spectral analysis of the voice signal (Jitter, Shimmer, NHR) may serve as valuable criteria of the degree of voice impairment. This may be helpful in determining the effectiveness of rehabilitation measures.
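As a rough sketch of how the reported measures are defined, Praat-style local jitter and shimmer can be computed from per-cycle period and peak-amplitude sequences (which a pitch tracker such as Praat would supply; the function names here are illustrative, not Praat's API):

```python
def local_jitter(periods):
    """Praat-style local jitter: mean absolute difference between
    consecutive pitch periods, divided by the mean period (in %)."""
    diffs = [abs(periods[i] - periods[i - 1]) for i in range(1, len(periods))]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Local shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, divided by the mean amplitude (in %)."""
    diffs = [abs(amplitudes[i] - amplitudes[i - 1]) for i in range(1, len(amplitudes))]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))
```

A perfectly steady voice yields 0% on both measures; cycle-to-cycle instability in period (jitter) or amplitude (shimmer) raises them, which is why both are elevated in the laryngitis groups.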


Author(s):  
Marziye Eshghi ◽  
Kathryn P. Connaghan ◽  
Sarah E. Gutz ◽  
James D. Berry ◽  
Yana Yunusova ◽  
...  

Purpose Hypernasality and atypical voice characteristics are common features of dysarthric speech due to amyotrophic lateral sclerosis (ALS). Existing acoustic measures have been developed to primarily target either hypernasality or voice impairment, and the effects of co-occurring hypernasality-voice problems on these measures are unknown. This report explores (a) the extent to which acoustic measures are affected by concurrent perceptually identified hypernasality and voice impairment due to ALS and (b) candidate acoustic measures of early indicators of hypernasality and voice impairment in the presence of multisystem involvement in individuals with ALS. Method Two expert listeners rated severity of hypernasality and voice impairment in sentences produced by individuals with ALS (n = 27). The samples were stratified based on perceptual ratings: voice/hypernasality asymptomatic, predominantly hypernasal, predominantly voice impairment, and mixed (co-occurring hypernasality and voice impairment). Groups were compared using established acoustic measures of hypernasality (one-third octave analysis) and voice impairment (cepstral/spectral analysis). Results The one-third octave analysis differentiated all groups; the cepstral peak prominence differentiated all groups except asymptomatic versus mixed, whereas the low-to-high spectral ratio did not differ among groups. Additionally, one-third octave analyses demonstrated promising speech diagnostic potential. Conclusions The results highlight the need to consider the validity of measures in the context of multisubsystem involvement. Our preliminary findings further suggest that the one-third octave analysis may be an optimal approach to quantify hypernasality and voice abnormalities in the presence of multisystem speech impairment. Future evaluation of the diagnostic accuracy of the one-third octave analysis is warranted.
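For reference, one-third octave analysis partitions the spectrum into bands whose center frequencies are spaced a factor of 2^(1/3) apart. A minimal sketch of the base-2 band geometry (the 1 kHz reference is the usual convention; the helper names are assumptions, not the authors' code):

```python
import math

def third_octave_band(k, ref=1000.0):
    """Base-2 one-third octave band k relative to the 1 kHz reference:
    returns (lower_edge, center, upper_edge) in Hz. Edges lie a factor
    of 2**(1/6) below and above the center."""
    center = ref * 2.0 ** (k / 3.0)
    return (center / 2.0 ** (1.0 / 6.0), center, center * 2.0 ** (1.0 / 6.0))

def band_level_db(power):
    """Band level in dB relative to an arbitrary reference power of 1."""
    return 10.0 * math.log10(power)
```

Band energies computed this way over speech frames give the per-band levels that such analyses compare across groups; elevated low-frequency band levels are a commonly cited acoustic correlate of nasalization.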


PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0258747
Author(s):  
Abigail R. Bradshaw ◽  
Carolyn McGettigan

Joint speech behaviours where speakers produce speech in unison are found in a variety of everyday settings, and have clinical relevance as a temporary fluency-enhancing technique for people who stutter. It is currently unknown whether such synchronisation of speech timing among two speakers is also accompanied by alignment in their vocal characteristics, for example in acoustic measures such as pitch. The current study investigated this by testing whether convergence in voice fundamental frequency (F0) between speakers could be demonstrated during synchronous speech. Sixty participants across two online experiments were audio recorded whilst reading a series of sentences, first on their own, and then in synchrony with another speaker (the accompanist) in a number of between-subject conditions. Experiment 1 demonstrated significant convergence in participants’ F0 to a pre-recorded accompanist voice, in the form of both upward (high F0 accompanist condition) and downward (low and extra-low F0 accompanist conditions) changes in F0. Experiment 2 demonstrated that such convergence was not seen during a visual synchronous speech condition, in which participants spoke in synchrony with silent video recordings of the accompanist. An audiovisual condition in which participants were able to both see and hear the accompanist in pre-recorded videos did not result in greater convergence in F0 compared to synchronisation with the pre-recorded voice alone. These findings suggest the need for models of speech motor control to incorporate interactions between self- and other-speech feedback during speech production, and suggest a novel hypothesis for the mechanisms underlying the fluency-enhancing effects of synchronous speech in people who stutter.
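F0 convergence of the kind measured here is typically quantified on a semitone scale after extracting F0 from each recording. The following is an illustrative sketch only (a naive autocorrelation F0 estimator plus the semitone distance, not the study's actual analysis pipeline):

```python
import math

def estimate_f0(frame, fs, fmin=75.0, fmax=400.0):
    """Naive autocorrelation F0 estimate: pick the lag in [fs/fmax, fs/fmin]
    that maximises the (unnormalised) autocorrelation of the frame."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return fs / best_lag

def semitones(f1, f2):
    """Signed distance from f1 to f2 in semitones (12 per octave)."""
    return 12.0 * math.log2(f2 / f1)
```

Convergence can then be expressed as a reduction in the semitone distance between a participant's F0 and the accompanist's F0 from the solo to the synchronous condition.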


2021 ◽  
Vol 150 (4) ◽  
pp. A150-A150
Author(s):  
Margaret Cychosz ◽  
Jan R. Edwards ◽  
Nan Bernstein Ratner ◽  
Catherine Torrington Eaton ◽  
Rochelle S. Newman

Author(s):  
Giovanna Castilho Davatz ◽  
Rosiane Yamasaki ◽  
Adriana Hachiya ◽  
Domingos Hiroshi Tsuji ◽  
Arlindo Neto Montagnoli

2021 ◽  
Vol 11 (15) ◽  
pp. 7149
Author(s):  
Ji-Yeoun Lee

This work is focused on deep learning methods, such as the feedforward neural network (FNN) and convolutional neural network (CNN), for pathological voice detection using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstrum coefficients (LPCCs), and higher-order statistics (HOSs) parameters. In total, 518 voice data samples were obtained from the publicly available Saarbruecken voice database (SVD), comprising recordings of 259 healthy and 259 pathological speakers, both women and men, producing the /a/, /i/, and /u/ vowels at normal pitch. Significant differences were observed between the normal and the pathological voice signals for normalized skewness (p = 0.000) and kurtosis (p = 0.000), except for the normalized kurtosis (p = 0.051) estimated in the /u/ samples in women. These parameters are therefore meaningful for classifying pathological voice signals. The highest accuracy, 82.69%, was achieved by the CNN classifier with the LPCCs parameter for the /u/ vowel in men. The second-best performance, 80.77%, was obtained with a combination of the FNN classifier, MFCCs, and HOSs for the /i/ vowel samples in women. There was merit in combining the acoustic measures with HOS parameters for better characterization in terms of accuracy. The combination of various parameters and deep learning methods was also useful for distinguishing normal from pathological voices.
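The HOS parameters named above, normalized skewness and kurtosis, are the standardized third and fourth central moments of the signal's amplitude distribution. A minimal sketch:

```python
def skewness(x):
    """Normalised skewness: third central moment / sigma**3.
    Zero for a symmetric amplitude distribution."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    return m3 / var ** 1.5

def kurtosis(x):
    """Normalised kurtosis: fourth central moment / sigma**4
    (3.0 for a Gaussian; subtract 3 for 'excess' kurtosis)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return m4 / var ** 2
```

Applied to voice samples, departures of these moments from their values for normal phonation are what make them usable as classification features alongside the cepstral parameters.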


2021 ◽  
Author(s):  
Riccardo Fusaroli ◽  
Ruth Grossman ◽  
Niels Bilenberg ◽  
Cathriona Cantio ◽  
Jens Richardt Moellegaard Jepsen ◽  
...  

Acoustic atypicalities in speech production are widely documented in Autism Spectrum Disorder (ASD) and argued to be both a potential factor in atypical social development and potential markers of clinical features. A recent meta-analysis highlighted shortcomings in the field, in particular small sample sizes and study heterogeneity (Fusaroli, Lambrechts, Bang, Bowler, & Gaigg, 2017). We showcase a cumulative yet self-correcting approach to prosody in ASD to overcome these issues. We analyze a cross-linguistic corpus of multiple speech productions from 77 autistic children and adolescents and 72 typically developing (TD) ones (>1000 recordings in Danish and US English). We replicate findings of a minimal cross-linguistically reliable distinctive acoustic profile for ASD (higher pitch and longer pauses) with moderate effect sizes. We identified novel generally reliable differences between the two groups for the normalized amplitude quotient, maxima dispersion quotient and creakiness. However, all these relations are small, and there is likely no one general extensive acoustic profile characterizing all autistic individuals. We identified reliable and consistent relations of acoustic features with individual differences (age, gender) and clinical features: speech rate and ADOS sub-scores (Communication, Social, Stereotyped). Besides cumulatively building our understanding of acoustic atypicalities in ASD, the study concretely shows how to use systematic reviews and meta-analyses to guide follow-up studies, both in their design and their statistical inferences. We indicate future directions: larger and more diverse cross-linguistic datasets, use of previous findings as statistical priors, understanding of the covariance between acoustic measures, reliance on machine learning procedures, and open science.
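Pause duration, one of the replicated markers above, can be extracted from a frame-level energy track by finding sufficiently long low-energy runs. A toy sketch (the threshold and minimum-length parameters are illustrative assumptions, not the study's pipeline):

```python
def find_pauses(energy, threshold, min_frames):
    """Return (start, end) frame index pairs (end exclusive) for runs of
    frames whose energy stays below `threshold` for at least `min_frames`."""
    pauses, start = [], None
    for i, e in enumerate(energy):
        if e < threshold:
            if start is None:
                start = i  # a candidate pause begins here
        else:
            if start is not None and i - start >= min_frames:
                pauses.append((start, i))
            start = None
    # close out a pause that runs to the end of the recording
    if start is not None and len(energy) - start >= min_frames:
        pauses.append((start, len(energy)))
    return pauses
```

Summaries such as mean or total pause duration per recording would then be the per-speaker features entering the group comparison.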


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Amr Gaballah ◽  
Vijay Parsa ◽  
Daryn Cushnie-Sparrow ◽  
Scott Adams

This paper investigated the performance of a number of acoustic measures, both individually and in combination, in predicting the perceived quality of sustained vowels produced by people with Parkinson’s disease (PD). Sustained vowel recordings were collected from 51 PD patients before and after the administration of the Levodopa medication. Subjective ratings of the overall vowel quality were garnered using a visual analog scale. These ratings served to benchmark the effectiveness of the acoustic measures. Acoustic predictors of the perceived vowel quality included the harmonics-to-noise ratio (HNR), smoothed cepstral peak prominence (CPP), recurrence period density entropy (RPDE), Gammatone frequency cepstral coefficients (GFCCs), linear prediction (LP) coefficients and their variants, and modulation spectrogram features. Linear regression (LR) and support vector regression (SVR) models were employed to assimilate multiple features. Different feature dimensionality reduction methods were investigated to avoid model overfitting and enhance the prediction capabilities for the test dataset. Results showed that the RPDE measure performed the best among all individual features, while a regression model incorporating a subset of features produced the best overall correlation of 0.80 between the predicted and actual vowel quality ratings. This model may therefore serve as a surrogate for auditory-perceptual assessment of Parkinsonian vowel quality. Furthermore, the model may offer the clinician a tool to predict who may benefit from Levodopa medication in terms of enhanced voice quality.
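The reported 0.80 between predicted and actual ratings is a Pearson correlation. A minimal sketch of ordinary least squares for a single predictor together with the correlation computation (illustrative only; the paper's multi-feature LR/SVR pipeline is more involved):

```python
import math

def fit_line(x, y):
    """Ordinary least squares for a single predictor: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson correlation between predicted and actual ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

With multiple acoustic predictors, the same least-squares idea generalises to the normal equations, and `pearson_r` applied to model outputs versus listener ratings gives the benchmark figure quoted above.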


2021 ◽  
pp. 1-49
Author(s):  
Rachel Hargrave ◽  
Amy Southall ◽  
Abby Walker

Two apparently contradictory observations have been made about consonantal voicing in Southern US English: compared to other US varieties, Southern speakers produce more voicing on “voiced” stops, but they also “devoice” word-final /z/ at higher rates. In this paper, regional differences in final /z/ realization within Virginia are investigated. Thirty-six students from Southwest and Northern Virginia were recorded completing tasks designed to elicit /z/-final tokens. Tokens were acoustically analyzed for duration and voicing, and automatically categorized as [z] or [s] using an HTK forced aligner. At the surface level, the two approaches yield incompatible results: the acoustic measures alone suggest Southwest Virginians produce more [z]-like /z/ tokens than Northern Virginians, while the aligner finds that Southern-identifying participants produce the most [s]-like tokens. However, both analyses converge on the importance of the following environment: Southwest Virginians’ tokens are least voiced pre-pausally and more voiced in other environments. These combined findings confirm previous work showing that Southern “voiced” consonants generally have more voicing than those of other regional US varieties, but also suggest that the dialect may exhibit greater phrase-final fortition. There are also differences within Southwest Virginian speakers based on differences in their rurality or in their orientation to the South.
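Degree of /z/ devoicing is often summarized as the proportion of the fricative during which voicing persists. A toy classifier over frame-level voicing decisions (the 50% cutoff is an assumption for illustration, not the paper's HTK-based method):

```python
def voiced_fraction(voiced_frames):
    """Proportion of frames judged voiced (booleans from e.g. a pitch tracker)."""
    return sum(voiced_frames) / len(voiced_frames)

def classify_final_fricative(voiced_frames, cutoff=0.5):
    """Label a word-final /z/ token as [z] (mostly voiced) or [s] (devoiced),
    using an assumed voiced-fraction cutoff."""
    return "z" if voiced_fraction(voiced_frames) >= cutoff else "s"
```

Tabulating such labels by following environment (pre-pausal versus pre-vocalic, for example) is one simple way to expose the positional pattern the two analyses converge on.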

