BLIND AUDIO SEPARATION AND CONTENT ANALYSIS IN THE TIME-SCALE DOMAIN

2007 ◽  
Vol 01 (03) ◽  
pp. 307-318 ◽  
Author(s):  
ATMAN JBARI ◽  
ABDELLAH ADIB ◽  
DRISS ABOUTAJDINE

In this paper, we address the problem of Blind Audio Separation (BAS) through content evaluation of audio signals in the time-scale domain. Most existing techniques rely on the assumption that the source signals are independent, or at least uncorrelated, and exploit mutual information or second-/higher-order statistics. Here, we present a new algorithm for instantaneous mixtures that relies only on the sources having different time-scale signatures. Our approach builds on the advantages of the wavelet transform and introduces a new representation, the Spatial Time-Scale Distribution (STSD), to characterize the energy and interference of the observed data. Separation is achieved by joint diagonalization, without any prior orthogonality constraint, of a set of selected diagonal STSD matrices. Several criteria are proposed, in the transformed time-scale space, to assess the content of the separated audio signals. We describe the separation and content-rating procedures, and an implementation on synthetic signals and real audio recordings shows the high efficiency of the proposed technique in restoring audio signal content.
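The core step above can be illustrated on the simplest case: jointly diagonalizing a *pair* of matrices (a simplification of the paper's set-wise criterion) reduces to a generalized eigenproblem, and the resulting unmixing matrix carries no orthogonality constraint. The diagonal matrices `D1`/`D2` below are hypothetical stand-ins for the selected diagonal STSD matrices; this is a sketch of the principle, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two diagonal "source" matrices stand in for the selected diagonal STSD
# matrices: each source has a different energy profile per scale band.
D1 = np.diag([4.0, 1.0])
D2 = np.diag([1.0, 3.0])

A = rng.normal(size=(2, 2))      # unknown instantaneous mixing matrix
C1 = A @ D1 @ A.T                # corresponding matrices of the observations
C2 = A @ D2 @ A.T

# Joint diagonalization of a pair reduces to the generalized eigenproblem
# C1 v = lambda C2 v; the eigenvectors give an unmixing matrix that need
# not be orthogonal.
eigvals, V = np.linalg.eig(np.linalg.inv(C2) @ C1)
W = np.real(V).T                 # rows are the unmixing directions

T1 = W @ C1 @ W.T                # both should come out (nearly) diagonal
T2 = W @ C2 @ W.T
off_diag = max(np.abs(T1[0, 1]), np.abs(T1[1, 0]),
               np.abs(T2[0, 1]), np.abs(T2[1, 0]))
print(off_diag < 1e-8)
```

With more than two matrices, as in the paper, an iterative approximate joint-diagonalization scheme is needed instead of a single eigendecomposition.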

2013 ◽  
Vol 5 (4) ◽  
pp. 55-67
Author(s):  
Saif alZahir ◽  
Md Wahedul Islam

Audio signals and applications are numerous and ubiquitous. Most of these applications, especially those on the Internet, require authentication and proof(s) of ownership, and several efficient methods in the literature address these critical concerns. In this paper, the authors present a new non-blind audio watermarking scheme for forensic audio authentication and proof of ownership. The proposed scheme is based on empirical mode decomposition and the Hilbert-Huang Transform (HHT). In this method, the audio signal is decomposed into frames of 1024 samples each. These frames are further decomposed into several mono-component signals called Intrinsic Mode Functions (IMFs), which serve as hosts for the watermark. In this research, the chosen watermark is a pseudo-random number sequence generated by Matlab-7, which is added to the highest and lowest IMFs of each frame of the decomposed signal, to withstand time-scale modification attacks and MP3 compression, respectively. Experimental results show that the watermarked audio signals maintained a fidelity of more than 20 dB, which meets the International Federation of the Phonographic Industry requirements. The results also show that the proposed scheme is robust against signal processing attacks such as MP3 compression, time-scale modification, and resizing attacks.
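The embedding step can be sketched as follows. Since the sifting that produces IMFs is a library matter, the three "IMFs" below are synthetic mono-components standing in for the real decomposition of one 1024-sample frame; the embedding strength `alpha` and the correlation detector are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
FRAME = 1024
alpha = 0.2                        # embedding strength (hypothetical value)

# Stand-in IMFs for one frame; a real implementation would obtain these
# from empirical mode decomposition of the audio frame.
t = np.arange(FRAME)
imfs = [np.sin(2 * np.pi * 0.25 * t),     # highest-frequency IMF
        np.sin(2 * np.pi * 0.05 * t),
        np.sin(2 * np.pi * 0.005 * t)]    # lowest-frequency IMF

prn = rng.choice([-1.0, 1.0], size=FRAME)  # pseudo-random watermark

# Embed in the highest and lowest IMFs, as the abstract describes, to
# survive time-scale modification and MP3 compression respectively.
imfs[0] = imfs[0] + alpha * prn
imfs[-1] = imfs[-1] + alpha * prn

watermarked = np.sum(imfs, axis=0)        # frame reconstruction

# Detection by correlating the frame with the known PRN sequence.
score = float(np.dot(watermarked, prn)) / FRAME
print(score > alpha)                      # detector fires on this frame
```

In a non-blind setting the original IMFs would also be available at the detector, improving on the plain correlation shown here.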


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Yingjun Dong ◽  
Neil G. MacLaren ◽  
Yiding Cao ◽  
Francis J. Yammarino ◽  
Shelley D. Dionne ◽  
...  

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining the left- and right-channel audio signals in a few different ways and then extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model to supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was trained for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method, which uses multichannel audio signals, achieved significantly better performance than a conventional method with mono audio signals under more complicated conditions.
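The testing-phase rule above is a maximum-likelihood decision over per-speaker models. The sketch below uses a single diagonal Gaussian per speaker (with a tied variance) as a stand-in for a full Gaussian mixture model; the d-vector dimension and the speaker means are invented toy values.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8                              # toy d-vector dimension (assumption)

# Toy per-speaker models: a single Gaussian per speaker stands in for the
# parameter-sharing GMM; the selection rule (maximum likelihood) is the same.
means = {"spk_a": np.full(DIM, -1.0), "spk_b": np.full(DIM, 1.0)}
shared_var = 0.25                    # shared (tied) variance across speakers

def log_likelihood(x, mean, var):
    """Diagonal-Gaussian log-density, up to an additive constant."""
    return -0.5 * np.sum((x - mean) ** 2) / var

# Testing phase: a new utterance embedding is assigned to the speaker
# whose model gives it maximum likelihood.
utterance = means["spk_b"] + rng.normal(scale=0.5, size=DIM)
detected = max(means, key=lambda s: log_likelihood(utterance, means[s], shared_var))
print(detected)                      # spk_b
```

Replacing `log_likelihood` with a weighted sum over mixture components turns this into the full GMM decision rule.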


2020 ◽  
Vol 23 (6) ◽  
pp. 225-240
Author(s):  
E. E. Usina ◽  
A. R. Shabanova ◽  
I. V. Lebedev

Purpose of research. The article presents the development of model-algorithmic support for determining the speech activity of a user of a socio-cyberphysical system. A topological model of a distributed audio-recording subsystem deployed in limited physical spaces (rooms) is proposed; the model makes it possible to assess the quality of perceived audio signals for a given distribution of microphones in such a room. Based on this model, a technique for determining the speech activity of a user of a socio-cyberphysical system has been developed; it maximizes the quality of perceived audio signals as the user moves around the room by determining the installation coordinates of the microphones. Methods. The mathematical tools of graph theory and set theory were used for a complete analysis and formal description of the distributed audio-recording subsystem. To determine the placement coordinates of microphones within a room, a technique was developed that involves emitting a speech signal in the room using acoustic equipment and measuring signal levels with a noise meter at the locations intended for microphone installation. Results. The dependence of the correlation coefficient between the combined signal and the initial test signal on the distance to the signal source was calculated for different numbers of microphones. The obtained dependences make it possible to determine the minimum number of spaced microphones required to ensure high-quality recording of the user's speech. The results of testing the developed technique in a particular room indicate the feasibility and high efficiency of determining the speech activity of a user of a socio-cyberphysical system. Conclusion.
Applying the proposed technique for determining the speech activity of a user of a socio-cyberphysical system will improve the recording quality of the audio signal and, as a consequence, its subsequent processing, taking into account the possible movement of the user.
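The reported dependence of the correlation coefficient on microphone count can be reproduced qualitatively with a toy capture model. The inverse-distance attenuation and the noise level below are assumptions standing in for real room acoustics; only the trend (more spaced microphones, higher correlation with the test signal) mirrors the article's result.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20_000
speech = rng.normal(size=N)              # test signal emitted in the room

def mic_capture(distance):
    """Toy capture: inverse-distance attenuation plus sensor noise
    (an assumption, not the article's acoustic model)."""
    return speech / distance + rng.normal(scale=0.3, size=N)

def combined_correlation(n_mics, distance):
    """Correlation of the n-microphone average with the test signal."""
    combined = np.mean([mic_capture(distance) for _ in range(n_mics)], axis=0)
    return np.corrcoef(combined, speech)[0, 1]

# Averaging more spaced microphones suppresses the uncorrelated noise,
# which is how the minimum required number of microphones is chosen.
r1 = combined_correlation(1, distance=4.0)
r4 = combined_correlation(4, distance=4.0)
print(r1 < r4)
```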


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Acoustic monitoring of animal activity is becoming one of the necessary tools in agriculture, including beekeeping, where it can assist in the control of beehives in remote locations; bee swarm activity can be classified directly from audio signals. A deep neural network (DNN), IoT-based acoustic swarm classification system is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project, and Mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signals. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, while MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed DNN IoT-based bee activity acoustic classification showed improved results compared to a previous hidden Markov model system.
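The MFCC front end used above follows a standard pipeline: framing, windowing, power spectrum, mel filterbank, log, DCT. The sketch below implements that textbook pipeline in plain numpy on a synthetic tone standing in for a beehive recording; frame size, hop, and filter counts are common defaults, not the paper's settings, and the DNN classifier is not shown.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filterbank (standard construction)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(signal, sr, n_fft=512, hop=256, n_filters=26, n_ceps=13):
    """Frames -> power spectrum -> mel energies -> log -> DCT-II."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    mel_energy = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II over the filterbank axis, keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return mel_energy @ dct.T

sr = 16_000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s synthetic stand-in
feats = mfcc(tone, sr)
print(feats.shape)                                     # (n_frames, 13)
```

The resulting per-frame coefficient matrix is what a DNN classifier would consume.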


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. Therefore, the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals with 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments utilizing objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
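The distinction between the stochastic and deterministic generators comes down to conditioning: the stochastic generator consumes a noise vector alongside the compressed audio, so the same input can yield multiple plausible restorations. The toy below shows only that conditioning pattern with a single random linear layer; a real GAN generator is a deep trained network, and all sizes here are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
N, Z = 256, 16        # frame length and noise-vector size (toy values)

# Toy stochastic generator: one linear layer mapping the compressed frame
# concatenated with a noise vector to a "restored" frame.
W = rng.normal(scale=0.05, size=(N, N + Z))

def generator(compressed, z):
    return W @ np.concatenate([compressed, z])

compressed = rng.normal(size=N)          # stand-in for an MP3-decoded frame
out_a = generator(compressed, rng.normal(size=Z))
out_b = generator(compressed, rng.normal(size=Z))

# Same conditioning, different noise -> different outputs, which is the
# point of a stochastic (vs deterministic) generator when no unique
# restoration of the original signal exists.
print(np.allclose(out_a, out_b))         # False
```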


2019 ◽  
Vol 9 (15) ◽  
pp. 3097 ◽  
Author(s):  
Diego Renza ◽  
Jaime Andres Arango ◽  
Dora Maria Ballesteros

This paper addresses a problem in the field of audio forensics. With the aim of providing a solution that helps Chain of Custody (CoC) processes, we propose an integrity verification system that includes capture (mobile based), hash code calculation and cloud storage. When the audio is recorded, a hash code is generated in situ by the capture module (an application), and it is sent immediately to the cloud. Later, the integrity of the audio recording given as evidence can be verified according to the information stored in the cloud. To validate the properties of the proposed scheme, we conducted several tests to evaluate if two different inputs could generate the same hash code (collision resistance), and to evaluate how much the hash code changes when small changes occur in the input (sensitivity analysis). According to the results, all selected audio signals provide different hash codes, and these values are very sensitive to small changes over the recorded audio. On the other hand, in terms of computational cost, less than 2 s per minute of recording are required to calculate the hash code. With the above results, our system is useful to verify the integrity of audio recordings that may be relied on as digital evidence.
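The two properties tested in the paper, collision resistance and sensitivity to small changes, can be demonstrated with any cryptographic hash over the raw samples. The sketch below uses SHA-256 as a generic stand-in for the paper's hash construction; the capture and cloud-storage parts are omitted.

```python
import hashlib
import numpy as np

def audio_hash(samples):
    """SHA-256 over the raw 16-bit sample bytes (a generic stand-in for
    the paper's hash-code calculation)."""
    return hashlib.sha256(np.asarray(samples, dtype=np.int16).tobytes()).hexdigest()

rng = np.random.default_rng(5)
recording = rng.integers(-2**15, 2**15, size=48_000, dtype=np.int16)

h_original = audio_hash(recording)       # computed in situ at capture time

# Sensitivity: flipping a single least-significant bit of one sample
# produces a completely different hash code.
tampered = recording.copy()
tampered[1000] ^= 1
print(audio_hash(tampered) != h_original)   # True
```

Verification against the cloud copy then reduces to comparing the stored and recomputed hex digests.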


2021 ◽  
Author(s):  
Katarina Stojadinović

In this study, we investigate efficient coding of multi-channel audio signals for transmission over packet networks. The techniques studied and developed as part of this research are based on redundancy coding and aim to achieve robustness with respect to packet losses. The resulting algorithm also addresses the needs of network clients with varying access bandwidths: it generates multi-layer encoded data streams that can range from basic mono to full multi-channel surround audio. Loss mitigation is achieved by applying a multiple description coding technique based on the priority encoding transmission packetization scheme. The hierarchy of the transmitted data is derived from a statistical analysis of the multi-channel audio signal; inter-channel correlations form the basis for estimating the multi-channel audio signal from the received descriptions at the decoder.
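The graceful degradation that multiple description coding buys can be shown on one channel with the simplest possible descriptions: a polyphase split, where description k carries every Nth sample. This is an illustrative stand-in for the priority-encoding-transmission scheme, not the thesis's codec; the decoder here just interpolates across whichever descriptions arrived.

```python
import numpy as np

N_DESC = 4
audio = np.sin(2 * np.pi * np.arange(4096) / 64.0)   # smooth test channel

# Encoder: description k carries every N_DESC-th sample (polyphase split).
descriptions = {k: audio[k::N_DESC] for k in range(N_DESC)}

def decode(received):
    """Reconstruct from the descriptions that arrived, filling the missing
    polyphase components by linear interpolation."""
    n = len(audio)
    idx = np.concatenate([np.arange(k, n, N_DESC) for k in received])
    vals = np.concatenate([descriptions[k] for k in received])
    order = np.argsort(idx)
    return np.interp(np.arange(n), idx[order], vals[order])

full = decode([0, 1, 2, 3])          # all packets received: exact
degraded = decode([0, 2])            # two descriptions lost: still usable

err_full = np.max(np.abs(full - audio))
err_degraded = np.max(np.abs(degraded - audio))
print(err_full < 1e-9 and err_degraded < 0.2)
```

A priority-encoded scheme would additionally protect the high-priority (base) layer with more redundancy than the enhancement layers, so quality falls off in controlled steps rather than uniformly.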


2021 ◽  
Author(s):  
Shahrzad Esmaili

This research focuses on the application of joint time-frequency (TF) analysis for watermarking and classifying different audio signals. Time-frequency analysis, which originated in the 1930s, has often been used to model the non-stationary behaviour of speech and audio signals. By taking into consideration the human auditory system, with its many non-linear effects and masking properties, we can extract efficient features from the TF domain to watermark or classify signals. The novel audio watermarking scheme presented here is based on spread-spectrum techniques and uses content-based analysis to detect the instantaneous mean frequency (IMF) of the input signal. The watermark is embedded in this perceptually significant region so that it resists attacks. Audio watermarking offers a solution to data privacy and helps to protect the rights of artists and copyright holders. Using the IMF, we aim to keep the watermark imperceptible while maximizing its robustness. In this case, 25 bits are embedded and recovered within a 5 s sample of an audio signal. The scheme has been shown to be robust against various signal processing attacks, including filtering, MP3 compression, additive noise, and resampling, with a bit error rate in the range of 0-13%. In addition, content-based classification is performed using TF analysis to classify sounds into six music groups, including rock, classical, folk, jazz, and pop. The extracted features include entropy, centroid, centroid ratio, bandwidth, silence ratio, energy ratio, and the frequency locations of minimum and maximum energy. Using a database of 143 signals, a set of 10 time-frequency features is extracted, and a classification accuracy of around 93.0% using regular linear discriminant analysis, or 92.3% using the leave-one-out method, is achieved.
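Three of the classification features named above (spectral centroid, bandwidth, and entropy) can be computed directly from a magnitude spectrogram. The sketch below does so in plain numpy and checks the expected behaviour on synthetic material; the FFT size and hop are illustrative defaults, not the thesis's settings.

```python
import numpy as np

def tf_features(signal, sr, n_fft=1024):
    """Per-signal averages of spectral centroid, bandwidth, and entropy,
    computed from a Hann-windowed magnitude spectrogram."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft, n_fft // 2)]
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    p = spec / (spec.sum(axis=1, keepdims=True) + 1e-12)  # spectral pmf per frame
    centroid = (p * freqs).sum(axis=1)
    bandwidth = np.sqrt((p * (freqs - centroid[:, None]) ** 2).sum(axis=1))
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=1)
    return centroid.mean(), bandwidth.mean(), entropy.mean()

sr = 8_000
t = np.arange(2 * sr) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)                   # narrowband signal
noise = np.random.default_rng(2).normal(size=2 * sr)    # broadband signal

c_tone, b_tone, e_tone = tf_features(tone, sr)
c_noise, b_noise, e_noise = tf_features(noise, sr)

# A pure tone concentrates its energy: lower bandwidth and entropy than
# noise, which is why such features separate music genres usefully.
print(b_tone < b_noise and e_tone < e_noise)
```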


2021 ◽  
Vol 12 (3-2021) ◽  
pp. 7-13
Author(s):  
A.F. Berdnik

In the course of the study, a 15-year-old female gray seal was trained to press a button after an audio signal lasting 5 seconds was presented, and to ignore similar audio signals of longer or shorter duration. The study demonstrated the ability of the experimental seal to reliably differentiate sound signals differing in duration by 3 seconds. Changes in the seal's reaction time and behavior during the presentation of sound stimuli with distinguishable and indistinguishable time ranges are described.


Author(s):  
Adarsh V Srinivasan ◽  
Mr. N. Saritakumar

In this paper, either a pre-recorded audio signal or a newly recorded one is processed and analysed using the LabVIEW software by National Instruments. Properties such as bit rate, number of channels, frequency, and sampling rate of the audio are analyzed, and the signal is improved through operations such as amplification, de-amplification, inversion, and interlacing of audio signals. In LabVIEW, a few Sub Virtual Instruments are available for reading and writing audio in the .wav format; using these and the array Sub Virtual Instruments, all the processing is done. KEYWORDS: Virtual Instrumentation (VI), LabVIEW (LV), Audio, Processing, audio array.
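The signal operations named above are simple array manipulations. The original work wires them up as LabVIEW SubVIs; the Python stand-ins below mirror only the underlying sample math (equal-length inputs are assumed for interlacing).

```python
import numpy as np

def amplify(x, gain):
    """Amplification (gain > 1) or de-amplification (gain < 1)."""
    return gain * x

def invert(x):
    """Polarity inversion of the waveform."""
    return -x

def interlace(a, b):
    """Alternate samples from two equal-length audio signals."""
    out = np.empty(a.size + b.size, dtype=a.dtype)
    out[0::2], out[1::2] = a, b
    return out

left = np.array([0.1, 0.2, 0.3])
right = np.array([-0.1, -0.2, -0.3])

print(amplify(left, 2.0))        # samples doubled
print(invert(left))              # samples negated
print(interlace(left, right))    # left/right samples alternated
```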
