Blind separation of audio signals using trigonometric transforms and Kalman filtering

2012 ◽  
Vol 16 (1) ◽  
pp. 7-17
Author(s):  
Mussa M. Ahmed ◽  
Fathi E. Abd El-Samie
2010 ◽  
Vol 13 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Hossam Hammam ◽  
Atef Abou Elazm ◽  
Mohamed E. Elhalawany ◽  
Fathi E. Abd El-Samie

Author(s):  
Sören Schulze ◽  
Emily J. King

AbstractWe propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters which characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm on various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly models inharmonicity, its presence can also still impede performance of the sparse pursuit algorithm. In general, due to its pitch-invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that the dictionary that is constructed for one recording can be applied to a different recording with similar instruments without additional training.


Signals ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 637-661
Author(s):  
Sören Schulze ◽  
Johannes Leuschner ◽  
Emily J. King

We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.


Sign in / Sign up

Export Citation Format

Share Document