Timbre-based machine learning of clustering Chinese and Western Hip Hop music

2021 ◽  
Author(s):  
Rolf Bader ◽  
Axel Zielke ◽  
Jonas Franke

Chinese and Western Hip Hop musical pieces are clustered using timbre-based Music Information Retrieval (MIR) and machine learning (ML) algorithms. Psychoacoustically motivated timbre features such as spectral centroid, roughness, sharpness, sound pressure level (SPL), flux, etc. were extracted from 38 contemporary Chinese and 38 Western 'classical' (USA, Germany, France, Great Britain) Hip Hop pieces. All features were integrated over the pieces with respect to mean and standard deviation. A Kohonen self-organizing map, as integrated in the Computational Music and Sound Archive (COMSAR) and apollon frameworks, was used to train different combinations of feature vectors in their mean and standard deviation integrations. No mean was able to cluster the corpora. Still, SPL standard deviation perfectly separated Chinese and Western pieces. The standard deviations of spectral flux, sharpness, and spread created two sub-clusters within the Western corpus, where only Western pieces had strong values. The spectral centroid standard deviation sub-clustered the Chinese Hip Hop pieces, where again only Chinese pieces had strong values. These findings point to different production, composition, or mastering strategies. For example, the clear SPL-caused clusters point to the loudness war of contemporary mastering, which uses massive compression to achieve high perceived loudness.
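The integration step described above (collapsing a frame-wise timbre feature into one mean and one standard deviation per piece) can be sketched as follows. This is a minimal illustration, not the COMSAR or apollon implementation: the function names, the frame and hop sizes, the Hann window, and the synthetic 1 kHz test tone are all assumptions made for the example, and the spectral centroid stands in for the wider feature set.

```python
import numpy as np

def spectral_centroid_frames(signal, sample_rate, frame_size=2048, hop=1024):
    """Return one spectral centroid (Hz) per analysis frame (hypothetical helper)."""
    window = np.hanning(frame_size)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    centroids = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size] * window
        magnitude = np.abs(np.fft.rfft(frame))
        total = magnitude.sum()
        # Skip silent frames to avoid dividing by zero.
        if total > 0:
            centroids.append((freqs * magnitude).sum() / total)
    return np.array(centroids)

def integrate_feature(frame_values):
    """Collapse a frame-wise feature into (mean, standard deviation) over the piece."""
    return float(np.mean(frame_values)), float(np.std(frame_values))

# Synthetic stand-in for a piece: one second of a 1 kHz sine tone.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
centroid_mean, centroid_std = integrate_feature(
    spectral_centroid_frames(tone, sample_rate)
)
```

For a stationary tone the standard deviation stays close to zero, whereas a piece with strongly varying brightness would yield a large value. Each piece then contributes one such number per feature to the vector passed to the self-organizing map.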

2021 ◽  
Author(s):  
Rolf Bader ◽  
Michael Blaß ◽  
Jonas Franke

The music of the Kachin ethnic group of Northern Myanmar is compared to the Uyghur music of Xinjiang in western China, using timbre and pitch feature extraction and machine learning. Although separated by Tibet, the muqam tradition of Xinjiang might be found in Kachin music due to myths of Kachin origin, as well as linguistic similarities, e.g., the Kachin term 'makan' for a musical piece. Extractions were performed using the apollon and COMSAR (Computational Music and Sound Archiving) frameworks, on which the Ethnographic Sound Recordings Archive (ESRA) is based, using ethnographic recordings from ESRA alongside additional pieces. In terms of pitch, tonal systems were compared using a Kohonen self-organizing map (SOM), which clearly clusters Kachin and Uyghur musical pieces. This is mainly caused by the Xinjiang muqam music showing a just fifth and fourth, while Kachin pieces tend to have a higher fifth and fourth, next to other dissimilarities. Also, the standard deviations of the timbre features spectral centroid and spectral sharpness clearly distinguish Uyghur from Kachin pieces, with Uyghur music showing much larger deviations. Although more features, like rhythm or melody, will be compared in the future, these already strong findings might introduce an alternative methodology for comparing ethnic groups beyond traditional linguistic definitions.
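To make the reference to just intervals concrete, the sizes of the just fifth (frequency ratio 3:2) and just fourth (4:3) can be expressed in cents. The sketch below uses only the standard definition of the cent. The helper names are hypothetical, and no measured values from the study are reproduced.

```python
import math

def ratio_to_cents(ratio):
    """Convert a frequency ratio to an interval size in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

just_fifth = ratio_to_cents(3 / 2)    # just fifth, ratio 3:2
just_fourth = ratio_to_cents(4 / 3)   # just fourth, ratio 4:3

def deviation_from_just(measured_cents, just_cents):
    """Positive result: the measured interval is wider ("higher") than the just one."""
    return measured_cents - just_cents
```

The just fifth comes out at roughly 702 cents and the just fourth at roughly 498 cents. A "higher" fifth or fourth in the sense of the abstract would give a positive value from `deviation_from_just`.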


1990 ◽  
Vol 55 (3) ◽  
pp. 427-433 ◽  
Author(s):  
Tracey A. Yonick ◽  
Alan R. Reich ◽  
Fred D. Minifie ◽  
B. Raymond Fink

Certain acoustical consequences of endotracheal intubation were examined in 13 male cardiovascular-surgery patients. Each subject recorded three tokens of a sustained vowel 1 day before intubation, 1 day after, upon discharge, and during a follow-up visit. Eight acoustical measures were obtained from the audio-recorded vowels: (a) mean fundamental frequency (Fo), (b) Fo standard deviation, (c) Fo perturbation quotient, (d) mean sound pressure level (SPL), (e) SPL standard deviation, (f) SPL perturbation quotient, (g) spectral flatness of the residue signal, and (h) coefficient of excess. Mean Fo, Fo standard deviation, mean SPL, SPL standard deviation, and coefficient of excess did not differ significantly across recording sessions, although certain predictable trends were apparent. Fo perturbation quotient, SPL perturbation quotient, and spectral flatness of the residue signal varied significantly across sessions, implying that these acoustical measures may be useful in the identification and monitoring of even minor intubation-related laryngeal trauma.


2020 ◽  
pp. 026921552097626
Author(s):  
Natalia Muñoz-Vigueras ◽  
Esther Prados-Román ◽  
Marie Carmen Valenza ◽  
Maria Granados-Santiago ◽  
Irene Cabrera-Martos ◽  
...  

Objective: To assess the effect of speech and language therapy (SLT) on hypokinetic dysarthria (HD) in Parkinson’s disease. Design: Systematic review and meta-analysis of randomized controlled trials. Methods: We performed a literature search of randomized controlled trials using PubMed, Web of Science, Science Direct and Cochrane database (last search October 2020). Quality and risk of bias were assessed using the Downs and Black scale and the Cochrane tool. The data were pooled and a meta-analysis was completed for sound pressure levels, perceptual intelligibility and inflection of voice fundamental frequency. Results: We selected 15 high to moderate quality studies, which included 619 patients with Parkinson’s disease. After pooling the data, 7 studies that compared different speech language therapies to no treatment or control groups, and 3 of their variables (sound pressure level, semitone standard deviation and perceptual intelligibility), were included in the analysis. Results showed significant differences in favor of SLT for sound pressure level in sustained phonation tasks (standard mean difference = 1.79; 95% confidence interval = 0.86, 2.72; p ⩽ 0.0001). Significant results were also observed for sound pressure level and semitone standard deviation in reading tasks (standard mean difference = 1.32; 95% confidence interval = 1.03, 1.61; p ⩽ 0.0001). Additionally, sound pressure levels in monologue tasks showed similar results when SLT was compared to other treatments (standard mean difference = 0.87; 95% confidence interval = 0.46, 1.28; p ⩽ 0.0001). Conclusion: This meta-analysis suggests a beneficial effect of SLT for reducing hypokinetic dysarthria in Parkinson’s disease, improving perceptual intelligibility, sound pressure level and semitone standard deviation.


1998 ◽  
Vol 41 (5) ◽  
pp. 1003-1018 ◽  
Author(s):  
Christopher Dromey ◽  
Lorraine Olson Ramig

The purpose of the study was to compare the effects of changing sound pressure level (SPL) and rate on respiratory, phonatory, and articulatory behavior during sentence production. Ten subjects, 5 men and 5 women, repeated the sentence, "I sell a sapapple again," under 5 SPL and 5 rate conditions. From a multi-channel recording, measures were made of lung volume (LV), SPL, fundamental frequency (F0), semitone standard deviation (STSD), and upper and lower lip displacements and peak velocities. Loud speech led to increases in LV initiation, LV termination, F0, STSD, and articulatory displacements and peak velocities for both lips. Token-to-token variability in these articulatory measures generally decreased as SPL increased, whereas rate increases were associated with increased lip movement variability. LV excursion decreased as rate increased. F0 for the men and STSD for both genders increased with rate. Lower lip displacements became smaller for faster speech. The interspeaker differences in velocity change as a function of rate contrasted with the more consistent velocity performance across speakers for changes in SPL. Because SPL and rate change are targeted in therapy for dysarthria, the present data suggest directions for future research with disordered speakers.
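The abstract does not spell out how the semitone standard deviation (STSD) was computed. A common reading is the standard deviation of the fundamental frequency contour after conversion from hertz to semitones, and the sketch below follows that reading. The function names, the use of the track mean as the reference frequency, the population (rather than sample) standard deviation, and the convention that 0 Hz marks an unvoiced frame are all assumptions for illustration.

```python
import math
import statistics

def hz_to_semitones(f0_hz, reference_hz):
    """Express a frequency in semitones relative to a reference frequency."""
    return 12.0 * math.log2(f0_hz / reference_hz)

def semitone_sd(f0_track_hz):
    """Population standard deviation of the voiced F0 values, in semitones."""
    # Assumed convention: non-positive values mark unvoiced frames and are dropped.
    voiced = [f for f in f0_track_hz if f > 0]
    reference = sum(voiced) / len(voiced)
    semitones = [hz_to_semitones(f, reference) for f in voiced]
    return statistics.pstdev(semitones)

flat_contour = semitone_sd([100.0, 100.0, 100.0])   # monotone voice
octave_contour = semitone_sd([100.0, 200.0])        # two values an octave apart
```

The choice of reference frequency only shifts all semitone values by a constant, so it does not change the resulting standard deviation. A monotone contour gives 0, and two equally weighted values an octave apart give 6 semitones.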


2012 ◽  
Vol 37 (4) ◽  
pp. 561-569 ◽  
Author(s):  
María A. Navacerrada ◽  
Cesar Díaz ◽  
Antonio Pedrero

Knowledge of the measurement uncertainty of test results is important when results have to be compared with limits and specifications. In the measurement of sound insulation following standards ISO 140-4 and 140-5, the uncertainty of the final magnitude is mainly associated with the average sound pressure levels L1 and L2 measured. However, the study of sound fields in enclosed spaces is very difficult: there is a wide variety of rooms with different sound fields depending on factors such as volume, geometry and materials. A parameter that allows us to quantify the spatial variation of the sound pressure level is the standard deviation of the pressure levels measured at the different positions in the room. Based on the analysis of this parameter, some results are pointed out: we show examples of the influence of the microphone positions and the wall characteristics on the uncertainty of the final magnitudes, mainly in the low-frequency regime. Along these lines, we propose a theoretical calculation of the standard deviation as a combined uncertainty of the standard deviation already proposed in the literature, which focuses on the room geometry, and the standard deviation associated with the wall vibrational field.
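As a rough illustration of the quantities involved, the sketch below computes an energetic average and a spatial standard deviation from levels measured at several microphone positions, and combines two standard deviations. The abstract does not give the combination formula: the root-sum-of-squares rule used here is the usual way to combine independent uncertainty components and is an assumption, as are the function names and the example levels.

```python
import math
import statistics

def energetic_average_db(levels_db):
    """Average levels on an energy basis: 10*log10(mean(10^(L/10)))."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

def spatial_std_db(levels_db):
    """Sample standard deviation of the levels across microphone positions."""
    return statistics.stdev(levels_db)

def combined_std(std_geometry, std_wall):
    """Assumed root-sum-of-squares combination of two independent components."""
    return math.hypot(std_geometry, std_wall)

# Hypothetical levels (dB) at three microphone positions in one frequency band.
positions_db = [60.0, 62.0, 64.0]
average_level = energetic_average_db(positions_db)
spatial_std = spatial_std_db(positions_db)
total_std = combined_std(3.0, 4.0)
```

With these made-up inputs the spatial standard deviation is 2 dB, and component standard deviations of 3 dB and 4 dB combine to 5 dB. The energetic average lies slightly above the arithmetic mean of 62 dB because louder positions dominate on an energy basis.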


2020 ◽  
Vol 63 (4) ◽  
pp. 931-947
Author(s):  
Teresa L. D. Hardy ◽  
Carol A. Boliek ◽  
Daniel Aalto ◽  
Justin Lewicke ◽  
Kristopher Wells ◽  
...  

Purpose: The purpose of this study was twofold: (a) to identify a set of communication-based predictors (including both acoustic and gestural variables) of masculinity–femininity ratings and (b) to explore differences in ratings between audio and audiovisual presentation modes for transgender and cisgender communicators. Method: The voices and gestures of a group of cisgender men and women (n = 10 of each) and transgender women (n = 20) communicators were recorded while they recounted the story of a cartoon using acoustic and motion capture recording systems. A total of 17 acoustic and gestural variables were measured from these recordings. A group of observers (n = 20) rated each communicator's masculinity–femininity based on 30- to 45-s samples of the cartoon description presented in three modes: audio, visual, and audiovisual. Visual and audiovisual stimuli contained point light displays standardized for size. Ratings were made using a direct magnitude estimation scale without modulus. Communication-based predictors of masculinity–femininity ratings were identified using multiple regression, and analysis of variance was used to determine the effect of presentation mode on perceptual ratings. Results: Fundamental frequency, average vowel formant, and sound pressure level were identified as significant predictors of masculinity–femininity ratings for these communicators. Communicators were rated significantly more feminine in the audio than the audiovisual mode and unreliably in the visual-only mode. Conclusions: Both study purposes were met. Results support continued emphasis on fundamental frequency and vocal tract resonance in voice and communication modification training with transgender individuals and provide evidence for the potential benefit of modifying sound pressure level, especially when a masculine presentation is desired.


1986 ◽  
Vol 29 (3) ◽  
pp. 420-424 ◽  
Author(s):  
Michael Dorman ◽  
Ingrid Cedar ◽  
Maureen Hannley ◽  
Marjorie Leek ◽  
Julie Mapes Lindholm

Computer synthesized vowels of 50- and 300-ms duration were presented to normal-hearing listeners at a moderate and high sound pressure level (SPL). Presentation at the high SPL resulted in poor recognition accuracy for vowels of a duration (50 ms) shorter than the latency of the acoustic stapedial reflex. Presentation level had no effect on recognition accuracy for vowels of sufficient duration (300 ms) to elicit the reflex. The poor recognition accuracy for the brief, high intensity vowels was significantly improved when the reflex was preactivated. These results demonstrate the importance of the acoustic reflex in extending the dynamic range of the auditory system for speech recognition.


2020 ◽  
Vol 68 (2) ◽  
pp. 137-145
Author(s):  
Yang Zhouo ◽  
Ming Gao ◽  
Suoying He ◽  
Yuetao Shi ◽  
Fengzhong Sun

Based on the basic theory of water droplet impact noise, the generation mechanism and a calculation model of the water-splashing noise for natural draft wet cooling towers were established in this study. Then, by means of custom software, the water-splashing noise was studied under different water droplet diameters, water-spraying densities, and partition water distribution patterns. Compared with the water-splashing noise of the field test, the average difference between the theoretical and the measured values is 0.82 dB, which validates the accuracy of the established theoretical model. The results based on the theoretical model showed that the smaller the water droplet diameter in the cooling tower, the greater the attenuation of the total sound pressure level of the water-splashing noise. From 0 m to 8 m away from the cooling tower, the sound pressure level of the water-splashing noise of 3 mm and 6 mm water droplets decreases by 8.20 dB and 4.36 dB, respectively. Additionally, when the water-spraying density becomes twice the designed value, the sound pressure level of the water-splashing noise increases by 3.01 dB for the cooling towers of 300 MW, 600 MW and 1000 MW units alike. Finally, under the partition water distribution patterns, the change of the sound pressure level is small. For the Rs/2 and Rs/3 partition radii (Rs is the radius of the water-spraying area), when the water-spraying density ratio between the outer and inner zone increases from 1 to 3, the sound pressure level of the water-splashing noise increases by 0.7 dB and 0.3 dB, respectively.
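The reported 3.01 dB increase for a doubled water-spraying density is consistent with radiated sound power growing in direct proportion to the water-spraying density, since doubling a power-like quantity raises its level by 10·log10(2) dB. The abstract does not state that the model makes this assumption, so the short sketch below is only a plausibility check, and the function name is hypothetical.

```python
import math

def level_change_db(power_ratio):
    """Level change in dB for a given ratio of sound powers."""
    return 10 * math.log10(power_ratio)

# Assumption: sound power is proportional to water-spraying density,
# so a doubled density corresponds to a power ratio of 2.
doubling_change = level_change_db(2.0)
```

The result is about 3.01 dB regardless of the absolute power, which would also explain why the same increase is reported for the 300 MW, 600 MW and 1000 MW units.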

