Time-Frequency Algorithm of Audio Signal Compression

Author(s):
E.V. Rabinovich ◽
A.V. Shekhirev
2021


Author(s):
Shahrzad Esmaili
This research focuses on the application of joint time-frequency (TF) analysis for watermarking and classifying different audio signals. Time-frequency analysis, which originated in the 1930s, has often been used to model the non-stationary behaviour of speech and audio signals. By taking into consideration the human auditory system, with its many non-linear effects and masking properties, we can extract efficient features from the TF domain to watermark or classify signals. This novel audio watermarking scheme is based on spread-spectrum techniques and uses content-based analysis to detect the instantaneous mean frequency (IMF) of the input signal. The watermark is embedded in this perceptually significant region so that it resists attacks. Audio watermarking offers a solution to data piracy and helps to protect the rights of artists and copyright holders. Using the IMF, we aim to keep the watermark imperceptible while maximizing its robustness. In this case, 25 bits are embedded and recovered within a 5 s sample of an audio signal. The scheme has been shown to be robust against various signal processing attacks, including filtering, MP3 compression, additive noise, and resampling, with a bit error rate in the range of 0-13%. In addition, content-based classification is performed using TF analysis to classify sounds into 6 music groups consisting of rock, classical, folk, jazz and pop. The extracted features include entropy, centroid, centroid ratio, bandwidth, silence ratio, energy ratio, and the frequency locations of minimum and maximum energy. Using a database of 143 signals, a set of 10 time-frequency features is extracted, and a classification accuracy of around 93.0% is achieved using regular linear discriminant analysis, or 92.3% using the leave-one-out method.
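As a hedged illustration of the content-based analysis step, the sketch below estimates an instantaneous mean frequency as the power-weighted mean frequency of each STFT frame. The paper's exact IMF formulation is not reproduced above, so this spectral-centroid-style estimate, along with every function name and parameter, is an assumption.

```python
import numpy as np
from scipy.signal import stft

def instantaneous_mean_frequency(x, fs, nperseg=1024):
    """Estimate a per-frame instantaneous mean frequency (IMF) as the
    power-weighted first spectral moment of each STFT frame.
    NOTE: illustrative estimate; the paper's exact IMF definition may differ."""
    f, frames, Zxx = stft(x, fs=fs, nperseg=nperseg)
    power = np.abs(Zxx) ** 2
    total = power.sum(axis=0)
    total[total == 0] = np.finfo(float).eps   # guard against silent frames
    return frames, (f[:, None] * power).sum(axis=0) / total

# Toy usage: a 5 s linear chirp, matching the 5 s embedding window above.
fs = 44100
t = np.arange(0, 5, 1 / fs)
x = np.sin(2 * np.pi * (440 + 100 * t) * t)
frame_times, imf = instantaneous_mean_frequency(x, fs)
```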


Sensors ◽  
2019 ◽  
Vol 20 (1) ◽  
pp. 172
Author(s):  
Mariam Yiwere ◽  
Eun Joo Rhee

This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance classes—one meter, two meters, and three meters—irrespective of its orientation angle. For the purpose of training, we create a dataset by recording audio signals at the three different distances and three angles in different rooms. The CRNN is trained using time-frequency representations of the audio signals. Specifically, we transform the audio signals into log-scaled mel spectrograms, allowing the convolutional layers to extract the appropriate features required for the classification. When trained and tested with combined datasets from all rooms, the proposed model exhibits high classification accuracies; however, training and testing the model in separate rooms results in lower accuracies, indicating that further study is required to improve the method’s generalization ability. Our experimental results demonstrate that it is possible to estimate sound source distances in known environments by classification using the log-scaled mel spectrogram.
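Since the abstract describes the pipeline but not its exact architecture, the following is a minimal sketch, assuming librosa for the log-scaled mel spectrogram and PyTorch for an illustrative CRNN; the layer counts and sizes are invented for illustration, and only the three-class output (one, two, and three meters) comes from the text.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def log_mel(path, sr=16000, n_mels=64):
    """Load audio and return a log-scaled mel spectrogram (n_mels x frames)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class DistanceCRNN(nn.Module):
    """Illustrative CRNN: conv layers extract local TF features, a GRU
    models their temporal evolution, and a linear head picks one of the
    three distance classes (1 m, 2 m, 3 m)."""
    def __init__(self, n_mels=64, n_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 2)),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 2)),
        )
        self.gru = nn.GRU(input_size=32 * (n_mels // 4), hidden_size=64,
                          batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                     # x: (batch, 1, n_mels, frames)
        z = self.conv(x)                      # (batch, 32, n_mels/4, frames/4)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, frames/4, features)
        out, _ = self.gru(z)
        return self.fc(out[:, -1])            # classify from the last step
```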


Author(s):  
Hossam Mohamed Kasem ◽  
Osamu Muta ◽
Maha Elsabrouty ◽  
Hiroshi Furukawa

2018 ◽  
Vol 10 (1) ◽  
Author(s):  
Ian R. Prechtl ◽  
Steven W. Day ◽  
Jason R. Kolodziej

This paper studies the acoustic signals of left ventricular assist devices (LVADs) as they relate to machine health. Current LVAD condition monitoring requires examination by trained medical professionals and is both inefficient and only roughly prognostic. To better quantify a patient's condition, the diagnostic method must be robust, non-invasive, and simple to apply. The concept behind this work is to identify a pattern linking the specific acoustics produced by an LVAD to the overall health of the patient. Due to the cycle-to-cycle variance of heart sounds, the continuous wavelet transform (CWT) is applied to the audio signal of interest so that a high-resolution spectrum is obtained. From this, region-specific image features are developed and subsequently used in a support vector machine (SVM) algorithm to classify between health conditions. The preliminary goal is to develop an accurate and non-invasive diagnostic method for determining patient health that can be applied to any LVAD variant. This process is validated through in vitro testing using a DC motor as an LVAD proxy.
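A minimal sketch of the described CWT-plus-SVM pipeline follows. The Morlet mother wavelet, the grid-averaged scalogram energies used as region-specific features, and the library choices (PyWavelets, scikit-learn) are all assumptions; the paper's actual feature set is not specified above.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def cwt_scalogram(x, fs, scales=np.arange(1, 128)):
    """Continuous wavelet transform of one heart-sound cycle.
    The Morlet wavelet is an assumption, not the paper's stated choice."""
    coefs, freqs = pywt.cwt(x, scales, 'morl', sampling_period=1 / fs)
    return np.abs(coefs) ** 2          # scalogram (scales x samples)

def region_features(scalogram, n_bands=4, n_frames=4):
    """Split the scalogram into a coarse grid and take the mean energy of
    each cell as a simple region-specific image feature (illustrative)."""
    bands = np.array_split(scalogram, n_bands, axis=0)
    feats = [np.mean(cell)
             for band in bands
             for cell in np.array_split(band, n_frames, axis=1)]
    return np.array(feats)

# With X: (n_signals, n_features) stacked from region_features and
# y: health-condition labels, an SVM classifier would be trained as:
# clf = SVC(kernel='rbf').fit(X, y)
```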


2021 ◽  
Vol 5 ◽  
Author(s):  
Ingo Siegert ◽  
Oliver Niebuhr

Remote meetings via Zoom, Skype, or Teams limit the range and richness of nonverbal communication signals, not only because of the typically sub-optimal light, posture, and gaze conditions, but also because of the reduced speaker visibility. Consequently, the speaker's voice becomes immensely important, especially when it comes to being persuasive and conveying charismatic attributes. However, to offer a reliable service and limit the transmission bandwidth, remote meeting tools rely heavily on signal compression. How this compression affects a speaker's persuasive and overall charismatic impact has never been analyzed. Our study addresses this gap for the audio signal. A perception experiment was carried out in which listeners rated short stimulus utterances with systematically varied compression rates and techniques; the scalar ratings concerned a set of charismatic speaker attributes. Results show that the applied audio compression significantly influences the assessment of a speaker's charismatic impact and that female speakers in particular seem to be systematically disadvantaged by audio compression rates and techniques: their charismatic impact decreases across a larger range of codecs, and the decrease is also more pronounced than for male speakers. We discuss these findings with respect to two possible explanations. The first is signal-based: audio compression codecs could be optimized mainly for male speech and thus degrade female speech more, particularly in terms of charisma-associated features. The alternative lies in the ears of the listeners, who may be less forgiving of signal degradation when rating female speakers' charisma.
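For readers who want to approximate the stimulus-generation step, a hedged sketch follows. It assumes ffmpeg is installed, and the codec/bitrate grid is purely illustrative rather than the study's actual experimental conditions.

```python
import subprocess
from pathlib import Path

# Illustrative codec/bitrate grid; the study's exact codecs and rates
# are not reproduced here.
CODECS = {
    "opus": ("libopus", ["32k", "64k", "128k"]),
    "mp3":  ("libmp3lame", ["32k", "64k", "128k"]),
    "aac":  ("aac", ["32k", "64k", "128k"]),
}

def make_stimuli(wav_path, out_dir="stimuli"):
    """Encode one utterance with several codecs and bitrates via ffmpeg,
    yielding systematically varied stimuli for a rating experiment."""
    Path(out_dir).mkdir(exist_ok=True)
    for name, (encoder, rates) in CODECS.items():
        for rate in rates:
            out = Path(out_dir) / f"{Path(wav_path).stem}_{name}_{rate}.{name}"
            subprocess.run(
                ["ffmpeg", "-y", "-i", wav_path,
                 "-c:a", encoder, "-b:a", rate, str(out)],
                check=True)
```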


2016 ◽  
Vol 44 ◽  
pp. 141-150
Author(s):  
Kazi Mahmudul Hassan ◽  
Md. Ekramul Hamid ◽  
Takayoshi Nakai

This study proposes an enhanced time-frequency representation of audio signals using an EMD-2TEMD based approach. Time-frequency representation is an important aspect of analyzing non-stationary signals such as audio. For representing such signals as a time-frequency-energy distribution, the Hilbert spectrum is a recent and popular approach with several advantages over methods such as the STFT and wavelet transform. The Hilbert-Huang Transform (HHT) is a prominent method consisting of Empirical Mode Decomposition (EMD) and Hilbert Spectral Analysis (HSA). An enhanced method called Turning-Tangent Empirical Mode Decomposition (2T-EMD) was recently developed to overcome some limitations of classical EMD, such as cubic-spline problems and the sifting stopping condition. However, the 2T-EMD based Hilbert spectrum of an audio signal encounters some issues because the process generates too many IMFs where classical EMD produces fewer. To mitigate those problems, a mutual implementation of 2T-EMD and classical EMD is proposed in this paper, which enhances the Hilbert-spectrum representation and significantly improves source-separation results using Independent Subspace Analysis (ISA) based clustering on audio signals. This refinement of the Hilbert spectrum contributes not only to future work on the source separation problem but also to many other applications in audio signal processing.
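As a concrete, hedged illustration of the HHT pipeline described above, the sketch below uses the PyEMD package (an assumed tool that implements classical EMD only, not 2T-EMD or the proposed hybrid) to decompose a signal into IMFs and derive the instantaneous amplitude and frequency that make up the Hilbert spectrum.

```python
import numpy as np
from PyEMD import EMD                 # pip install EMD-signal (assumed tool)
from scipy.signal import hilbert

def hilbert_spectrum(x, fs):
    """Classical EMD followed by Hilbert spectral analysis: decompose the
    signal into IMFs, then take each IMF's instantaneous amplitude and
    frequency from its analytic signal."""
    imfs = EMD().emd(x)                          # (n_imfs, n_samples)
    analytic = hilbert(imfs)                     # analytic signal per IMF
    amp = np.abs(analytic)                       # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic), axis=1)
    inst_freq = np.diff(phase, axis=1) * fs / (2 * np.pi)
    return imfs, amp, inst_freq

# Toy usage: a two-tone signal whose components EMD should separate.
fs = 8000
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
imfs, amp, inst_freq = hilbert_spectrum(x, fs)
```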

