spectral centroid
Recently Published Documents


TOTAL DOCUMENTS

94
(FIVE YEARS 31)

H-INDEX

14
(FIVE YEARS 1)

Author(s):  
Chieh Kao ◽  
Maria D. Sera ◽  
Yang Zhang

Purpose: The aim of this study was to investigate infants' listening preference for emotional prosodies in spoken words and identify their acoustic correlates. Method: Forty-six 3- to 12-month-old infants (M age = 7.6 months) completed a central fixation (or look-to-listen) paradigm in which four emotional prosodies (happy, sad, angry, and neutral) were presented. Infants' looking time to the string of words was recorded as a proxy of their listening attention. Five acoustic variables (mean fundamental frequency (F0), word duration, intensity variation, harmonics-to-noise ratio (HNR), and spectral centroid) were also analyzed to account for infants' attentiveness to each emotion. Results: Infants generally preferred affective over neutral prosody, with more listening attention to the happy and sad voices. Happy sounds with breathy voice quality (low HNR) and less brightness (low spectral centroid) maintained infants' attention more. Sad speech with shorter word duration (i.e., faster speech rate), less breathiness, and more brightness gained infants' attention more than happy speech did. Infants listened less to angry than to happy and sad prosodies, and none of the acoustic variables were associated with infants' listening interests in angry voices. Neutral words with a lower F0 attracted infants' attention more than those with a higher F0. Neither age nor sex effects were observed. Conclusions: This study provides evidence for infants' sensitivity to the prosodic patterns for the basic emotion categories in spoken words and how the acoustic properties of emotional speech may guide their attention. The results point to the need to study the interplay between early socioaffective and language development.


Metals ◽  
2021 ◽  
Vol 11 (12) ◽  
pp. 1951
Author(s):  
Wanwei Xu ◽  
Xue Bai ◽  
Zhonggang Sun ◽  
Xin Meng ◽  
Zhongming Guo

The presence of large microtextured clusters (MTCs) composed of small α-phase crystallites with preferred crystallographic orientations in 3D-printed near-α titanium alloys leads to poor mechanical and fatigue properties. It is therefore crucial to characterize the size of MTCs nondestructively. Ti6Al4V/B4C composite materials are manufactured using Laser Melting Deposition (LMD) technology by adding an amount of nano-sized B4C particles to the original Ti6Al4V powder. TiB and TiC reinforcements precipitating at grain boundaries promote a transition of the elongated α crystallites and coarse columnar MTCs to an equiaxed morphology, and microstructures composed of approximately equiaxed MTCs with different mean sizes of 11–50 μm are obtained. Theoretical models for scattering-induced attenuation and centroid frequency downshift of ultrasonic waves propagating in such a polycrystalline medium are presented. The models indicate that the studied composite material has an extremely narrow crystallographic orientation distribution width, i.e., a strong degree of anisotropy in MTCs. Therefore, MTCs make a dominant contribution to the total scattering-induced attenuation and spectral centroid frequency downshift, while the contribution of fine α-phase crystallites is insignificant. Laser ultrasonic inspection is performed, and the correlation between laser-generated ultrasonic wave properties and microstructural properties of the Ti6Al4V/B4C composites is analyzed. The results show that the deviation between the experimentally measured ultrasonic velocity and the theoretical result determined by the Voigt-averaged velocity in each crystallite is no more than 2.23%, which is in good agreement with the degree of macroscopic anisotropy in the composite specimens. The ultrasonic velocity seems to be insensitive to the size of MTCs, while the spectral centroid frequency downshift is approximately linear in the mean size of MTCs, with a goodness-of-fit (R²) up to 0.99. For a macroscopically untextured near-α titanium alloy with a relatively narrow crystallographic orientation distribution, the ultrasonic velocity is not correlated with the properties of MTCs. By contrast, the central frequency downshift is dominated by the size and morphology of MTCs, showing great potential for grain size evaluation.
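The idea of a centroid frequency downshift can be illustrated with a small numerical sketch. It is not the theoretical model of the paper, which the abstract does not state. It assumes a Gaussian reference spectrum and a generic power-law attenuation exp(-coeff · distance · f^exponent); the function names, the 10 MHz centre frequency, the 2 MHz width, and the attenuation values are all hypothetical choices made for illustration.

```python
import math


def centroid(freqs, amps):
    """Amplitude-weighted mean frequency of a spectrum."""
    total = sum(amps)
    return sum(f * a for f, a in zip(freqs, amps)) / total


def gaussian_spectrum(freqs, center, width):
    """Assumed reference spectrum: Gaussian around `center` with std `width`."""
    return [math.exp(-((f - center) ** 2) / (2 * width ** 2)) for f in freqs]


def attenuate(freqs, amps, coeff, distance, exponent):
    """Assumed power-law attenuation; high frequencies are damped more."""
    return [a * math.exp(-coeff * distance * f ** exponent)
            for f, a in zip(freqs, amps)]


# Hypothetical setup: frequencies in MHz on a 0..20 MHz grid.
freqs_mhz = [i * 0.01 for i in range(2001)]
reference = gaussian_spectrum(freqs_mhz, center=10.0, width=2.0)
received = attenuate(freqs_mhz, reference, coeff=0.1, distance=1.0, exponent=1.0)

# Downshift = centroid before propagation minus centroid after propagation.
downshift = centroid(freqs_mhz, reference) - centroid(freqs_mhz, received)
print(f"centroid downshift: {downshift:.3f} MHz")
```

For the special case of a Gaussian spectrum and attenuation linear in frequency (exponent = 1), the downshift equals coeff · distance · width², which is 0.4 MHz for the values above. Other exponents or spectral shapes would need to be evaluated numerically in the same way.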


2021 ◽  
Vol 11 (14) ◽  
pp. 6461
Author(s):  
Andy Pearce ◽  
Tim Brookes ◽  
Russell Mason

Brightness is one of the most common timbral descriptors used for searching audio databases, and is also the timbral attribute of recorded sound that is most affected by microphone choice, making a brightness prediction model desirable for automatic metadata generation. A model, sensitive to microphone-related as well as source-related brightness, was developed based on a novel combination of the spectral centroid and the ratio of the total magnitude of the signal above 500 Hz to that of the full signal. This model performed well on training data (r = 0.922). Validating it on new data showed a slight gradient error but good linear correlation across source types and overall (r = 0.955). On both training and validation data, the new model out-performed metrics previously used for brightness prediction.
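The two ingredients named in the abstract can be sketched in a few lines. The abstract does not say how the model combines them or what weights it uses, so the sketch below only computes the spectral centroid and the ratio of magnitude above 500 Hz to total magnitude. The function names, the naive DFT, and the test signal are illustrative assumptions, not the authors' implementation.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short demo signals)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def spectral_centroid(mags, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def high_frequency_ratio(mags, sample_rate, n, cutoff_hz=500.0):
    """Share of total magnitude that lies above `cutoff_hz`."""
    total = sum(mags)
    if total == 0:
        return 0.0
    high = sum(m for k, m in enumerate(mags) if k * sample_rate / n > cutoff_hz)
    return high / total


# Hypothetical test signal: equal-amplitude tones at 250 Hz and 1000 Hz.
sample_rate = 8000
n = 256
signal = [math.sin(2 * math.pi * 250 * i / sample_rate)
          + math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]

mags = magnitude_spectrum(signal)
centroid_hz = spectral_centroid(mags, sample_rate, n)
hf_ratio = high_frequency_ratio(mags, sample_rate, n)
print(f"centroid = {centroid_hz:.1f} Hz, ratio above 500 Hz = {hf_ratio:.2f}")
```

Both tones fall exactly on DFT bins here, so the centroid lands midway between them at 625 Hz and half of the magnitude lies above 500 Hz. Real recordings would normally be analysed frame by frame with a window function and an FFT.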


2021 ◽  
Author(s):  
Venkatesh S ◽  
Saravanakumar R ◽  
SureshKumar M ◽  
Sivakumar B ◽  
Veeramakali T

Several technologies have been developed to protect content against illegal copying. Two complementary methods are encryption and watermarking. Encryption safeguards the information during communication from the sender to the receiver. After receipt and subsequent decryption, however, the data might present a distorted image. Watermarking complements encryption by embedding data directly into the image, so the watermark always remains present in the data. A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio or image data. It is typically used to identify copyright ownership of such a signal. Watermarking is the computer-aided hiding of digitized information in a carrier. Digital watermarks may be used to verify the authenticity or integrity of a carrier signal or to identify its source. They are prominently used for tracing copyright infringements and for banknote authentication. Like traditional watermarks, digital watermarks are perceptible only under certain conditions. If a digital watermark alters a carrier so that it becomes noticeable, it is of no use. Traditional watermarks are applied to visible media (such as images or video), whereas in digital watermarking the signal may be pictures, video, audio, text, or 3D models. A signal can carry several different watermarks at the same time. In this study, image watermarking is achieved using two methods: Hidden Markov Tree–Contourlet Wavelet Transform (HMT-CWT) and Haar wavelet transform–Discrete Fourier transform (HWT-DFT). In the HWT-DFT method, a video is given as input and split into two parts (audio and image). The audio is de-watermarked through Spectral Centroid Wavelet Transform and enhanced using the Firefly procedure. The images are handled through HWT in addition to DFT. The output watermarked images and audio are then combined to form a watermarked video. The obtained video is de-watermarked to produce the original copy of the video. The process of recovering the original copy by removing the watermark from the video is called de-watermarking.


2021 ◽  
Vol 57 (2) ◽  
pp. 356-360
Author(s):  
Divya Bharathi Krishnamani ◽  
P. A. Karthick ◽  
Ramakrishnan Swaminathan ◽  
...  

Surface electromyography (sEMG) is a technique that noninvasively acquires the electrical activity of muscles and is widely used for muscle fatigue assessment. This study attempts to characterize dynamic muscle fatiguing contractions using frequency bands of sEMG signals and a geometric feature, namely the instantaneous spectral centroid (ISC). The sEMG signals are acquired from the biceps brachii muscle of fifty-eight healthy volunteers. The frequency components of the signals are divided into a low frequency band (10–45 Hz), a medium frequency band (55–95 Hz), and a high frequency band (95–400 Hz). The signals associated with these bands are subjected to a Hilbert transform, and an analytical shape representation is obtained in the complex plane. The ISC feature is extracted from the resultant shape for each of the three frequency bands. The results show that this feature can differentiate between muscle nonfatigue and fatigue conditions (p < 0.05). The ISC values are found to be lower in fatigue conditions irrespective of frequency band. The coefficient of variation of ISC is also observed to be lower in the low frequency band, which demonstrates its ability to handle inter-subject variations. Therefore, the proposed geometric feature from the low frequency band of sEMG signals could be considered for detecting muscle fatigue in various neuromuscular conditions.
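The band splitting and Hilbert-transform step described above can be sketched as follows. The abstract does not define how the ISC is computed from the complex-plane shape, so the sketch stops at the analytic signal of each band. The DFT-based band selection, the function names, the 1024 Hz sample rate, and the two-tone test signal are assumptions for illustration, not the authors' pipeline.

```python
import cmath
import math


def dft(signal):
    """Naive full DFT (adequate for short demo signals)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)) for k in range(n)]


def band_analytic_signal(spectrum, sample_rate, low_hz, high_hz):
    """Analytic signal restricted to [low_hz, high_hz).

    Keeps only positive-frequency bins inside the band, doubled, which is
    the frequency-domain form of the Hilbert-transform analytic signal.
    """
    n = len(spectrum)
    kept = {k: 2 * spectrum[k] for k in range(1, n // 2)
            if low_hz <= k * sample_rate / n < high_hz}
    return [sum(c * cmath.exp(2j * math.pi * k * i / n)
                for k, c in kept.items()) / n for i in range(n)]


# Hypothetical test signal: 20 Hz and 200 Hz tones sampled at 1024 Hz.
sample_rate = 1024
n = 256
signal = [math.sin(2 * math.pi * 20 * i / sample_rate)
          + math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(n)]

spectrum = dft(signal)
bands = {"low": (10, 45), "medium": (55, 95), "high": (95, 400)}
shapes = {name: band_analytic_signal(spectrum, sample_rate, lo, hi)
          for name, (lo, hi) in bands.items()}

# Each entry is a list of complex points, i.e. a shape in the complex plane.
for name, points in shapes.items():
    print(name, "max envelope:", round(max(abs(z) for z in points), 3))
```

With this test signal the low and high bands each trace a unit circle in the complex plane, while the medium band stays at the origin. A geometric feature such as the ISC would then be derived from these shapes according to the definition given in the paper.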


2021 ◽  
Author(s):  
Rolf Bader ◽  
Axel Zielke ◽  
Jonas Franke

Chinese and Western Hip Hop musical pieces are clustered using timbre-based Music Information Retrieval (MIR) and machine learning (ML) algorithms. Timbre features such as spectral centroid, roughness, sharpness, sound pressure level (SPL), and flux were extracted with psychoacoustically motivated algorithms from 38 contemporary Chinese and 38 Western 'classical' (USA, Germany, France, Great Britain) Hip Hop pieces. All features were integrated over the pieces with respect to mean and standard deviation. A Kohonen self-organizing map, as integrated in the Computational Music and Sound Archive (COMSAR) and apollon frameworks, was trained on different combinations of feature vectors in their mean and standard deviation integrations. No mean was able to cluster the corpora. Still, SPL standard deviation perfectly separated Chinese and Western pieces. The standard deviations of spectral flux, sharpness, and spread created two sub-clusters within the Western corpus, in which only Western pieces had strong values. The standard deviation of the spectral centroid sub-clustered the Chinese Hip Hop pieces, where again only Chinese pieces had strong values. These findings point to different production, composition, or mastering strategies. For example, the clear SPL-caused clusters point to the loudness war of contemporary mastering, which uses massive compression to achieve high perceived loudness.


2021 ◽  
Author(s):  
Rolf Bader ◽  
Michael Blaß ◽  
Jonas Franke

The music of the Kachin ethnic group of northern Myanmar is compared to Xinjiang-based Uyghur music of western China, using timbre and pitch feature extraction and machine learning. Although the two regions are separated by Tibet, the muqam tradition of Xinjiang might be found in Kachin music due to myths of Kachin origin, as well as linguistic similarities, e.g., the Kachin term 'makan' for a musical piece. Extractions were performed using the apollon and COMSAR (Computational Music and Sound Archiving) frameworks, on which the Ethnographic Sound Recordings Archive (ESRA) is based, using ethnographic recordings from ESRA alongside additional pieces. In terms of pitch, tonal systems were compared using a Kohonen self-organizing map (SOM), which clearly clusters Kachin and Uyghur musical pieces. This is mainly caused by the Xinjiang muqam music showing just fifths and fourths, while Kachin pieces tend to have higher fifths and fourths, next to other dissimilarities. Also, the standard deviations of the timbre features spectral centroid and spectral sharpness clearly tell Uyghur from Kachin pieces, with Uyghur music showing much larger deviations. Although more features, such as rhythm or melody, will be compared in the future, these already strong findings might introduce an alternative methodology for comparing ethnic groups beyond traditional linguistic definitions.

