Sparse pursuit and dictionary learning for blind source separation in polyphonic music recordings

Author(s):  
Sören Schulze ◽  
Emily J. King

Abstract: We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters which characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm on various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly models inharmonicity, its presence can also still impede performance of the sparse pursuit algorithm. In general, due to its pitch-invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that the dictionary that is constructed for one recording can be applied to a different recording with similar instruments without additional training.
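As a rough illustration of why a log-frequency representation is pitch-invariant even with inharmonicity, the following minimal Python sketch computes partial frequencies and their positions on a log2 axis. It assumes the commonly used stiff-string law f_h = h · f0 · sqrt(1 + b · h²); that formula, the function names, and the numeric values are illustrative assumptions and are not taken from the paper.

```python
import math


def partial_frequencies(f0, num_partials, b=0.0):
    """Frequencies (Hz) of partials h = 1..num_partials for a nominal fundamental f0.

    Assumes the stiff-string inharmonicity law f_h = h * f0 * sqrt(1 + b * h**2);
    b = 0 gives a perfectly harmonic tone.
    """
    return [h * f0 * math.sqrt(1.0 + b * h * h) for h in range(1, num_partials + 1)]


def log_offsets(f0, num_partials, b=0.0):
    """Position of each partial relative to the nominal fundamental f0, in octaves."""
    return [math.log2(f / f0) for f in partial_frequencies(f0, num_partials, b)]


# Hypothetical example: two pitches one octave apart, same inharmonicity coefficient.
offsets_low = log_offsets(440.0, 5, b=1e-3)
offsets_high = log_offsets(880.0, 5, b=1e-3)
```

Under this assumed law the two offset lists agree, so changing the pitch only shifts the whole pattern along the log-frequency axis. This is the property that lets a single dictionary entry describe an instrument at every pitch.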

Signals ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 637-661
Author(s):  
Sören Schulze ◽  
Johannes Leuschner ◽  
Emily J. King

We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.


2005 ◽  
Author(s):  
Toru Taniguchi ◽  
Akishige Adachi ◽  
Shigeki Okawa ◽  
Masaaki Honda ◽  
Katsuhiko Shirai

Author(s):  
Y Chen ◽  
C Muratov ◽  
V Matveev

ABSTRACT: We consider the stationary solution for the Ca²⁺ concentration near a point Ca²⁺ source describing a single-channel Ca²⁺ nanodomain, in the presence of a single mobile Ca²⁺ buffer with one-to-one Ca²⁺ binding. We present computationally efficient approximants that estimate stationary single-channel Ca²⁺ nanodomains with great accuracy in broad regions of parameter space. The presented approximants have a functional form that combines rational and exponential functions, which is similar to that of the well-known Excess Buffer Approximation and the linear approximation, but with parameters estimated using two novel (to our knowledge) methods. One of the methods involves interpolation between the short-range Taylor series of the buffer concentration and its long-range asymptotic series in inverse powers of distance from the channel. Although this method has already been used to find Padé (rational-function) approximants to single-channel Ca²⁺ and buffer concentration, extending this method to interpolants combining exponential and rational functions improves accuracy in a significant fraction of the relevant parameter space. A second method is based on the variational approach, and involves a global minimization of an appropriate functional with respect to parameters of the chosen approximations. Extensive parameter sensitivity analysis is presented, comparing these two methods with previously developed approximants. Apart from increased accuracy, the strength of these approximants is that they can be extended to more realistic buffers with multiple binding sites characterized by cooperative Ca²⁺ binding, such as calmodulin and calretinin.

STATEMENT OF SIGNIFICANCE: Mathematical and computational modeling plays an important role in the study of local Ca²⁺ signals underlying vesicle exocytosis, muscle contraction and other fundamental physiological processes. Closed-form approximations describing the steady-state distribution of Ca²⁺ in the vicinity of an open Ca²⁺ channel have proved particularly useful for the qualitative modeling of local Ca²⁺ signals. We present simple and efficient approximants for the Ca²⁺ concentration in the presence of a mobile Ca²⁺ buffer, which achieve great accuracy over a wide range of model parameters. Such approximations provide an efficient method for estimating Ca²⁺ and buffer concentrations without resorting to numerical simulations, and allow one to study the qualitative dependence of nanodomain Ca²⁺ distribution on the buffer’s Ca²⁺ binding properties and its diffusivity.


2019 ◽  
Vol 49 (10) ◽  
pp. 2469-2498 ◽  
Author(s):  
R. M. Samelson ◽  
D. B. Chelton ◽  
M. G. Schlax

Abstract: A statistical-equilibrium, geostrophic-turbulence regime of the stochastically forced, one-layer, reduced-gravity, quasigeostrophic model is identified in which the numerical solutions are representative of global mean, midlatitude, open-ocean mesoscale variability. Solutions are forced near the internal deformation wavenumber and damped linearly and by high-wavenumber enstrophy dissipation. The results partially rationalize a recent semiempirical stochastic field model of mesoscale variability motivated by a global eddy identification and tracking analysis of two decades of satellite altimeter sea surface height (SSH) observations. Comparisons of model results with observed SSH variance, autocorrelation, eddy, and spectral statistics place constraints on the model parameters. A nominal best fit is obtained for a dimensional SSH stochastic-forcing variance production rate of 1/4 cm² day⁻¹, an SSH damping rate of 1/62 week⁻¹, and a stochastic forcing autocorrelation time scale near or greater than 1 week. This ocean mesoscale regime is nonlinear and appears to fall near the stochastic limit, at which wave-mean interaction is just strong enough to begin to reduce the local mesoscale variance production, but is still weak relative to the overall nonlinearity. Comparison of linearly inverted wavenumber–frequency spectra shows that a strong effect of nonlinearity, the removal of energy from the resonant linear wave field, is resolved by the gridded altimeter SSH data. These inversions further suggest a possible signature in the merged altimeter SSH dataset of signal propagation characteristics from the objective analysis procedure.


Energies ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 2877
Author(s):  
Miroslaw Lewandowski ◽  
Marek Orzylowski

Accurate dynamic models of supercapacitors (SCs) are a basis for the design, control and exploitation of hybrid energy storage systems for electric vehicles. This paper concerns a fractional model of SC impedance, based on the Cole–Cole equation describing relaxation in the electric double layer. The article provides a new method of identifying the parameters of the fractional-order model of SC impedance, performed without disconnecting the SC module from the energy storage system. The test drive used for this purpose only needs to respect a few simple recommendations. The article presents the conditions of this test drive that ensure that the frequency spectra of the recorded signals lie in the bandwidth necessary for the correct identification of the model parameters. These parameters are determined by means of the Nelder–Mead simplex optimization method. The results of the identification obtained by the time-domain method coincide with those obtained in the frequency domain. The last part of the article shows that the real energy losses in these systems significantly exceed the losses determined only on the basis of the nominal capacity and equivalent series resistance (ESR), to which the SC catalogue data are usually limited. This paper also provides an auxiliary frequency criterion for the selection of SCs intended for energy storage systems of electric vehicles.
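To make the identification step more concrete, here is a minimal Python sketch of a Cole–Cole-type fractional impedance and the squared-error cost that a simplex optimizer would minimize. The impedance form Z(ω) = R_inf + (R_0 − R_inf) / (1 + (jωτ)^α), the parameter names, and the numeric values are assumptions for illustration; the exact model used in the paper may differ.

```python
def cole_cole_impedance(omega, r_inf, r_0, tau, alpha):
    """Cole-Cole-type impedance (ohms) at angular frequency omega (rad/s).

    Assumed form: Z = r_inf + (r_0 - r_inf) / (1 + (j*omega*tau)**alpha),
    where 0 < alpha <= 1 is the fractional order.
    """
    return r_inf + (r_0 - r_inf) / (1.0 + (1j * omega * tau) ** alpha)


def squared_error(params, omegas, measured):
    """Sum of squared magnitudes of the model-minus-measurement residuals."""
    r_inf, r_0, tau, alpha = params
    return sum(
        abs(cole_cole_impedance(w, r_inf, r_0, tau, alpha) - z) ** 2
        for w, z in zip(omegas, measured)
    )


# Hypothetical "measurement": synthetic data generated from known parameters.
true_params = (0.01, 0.05, 2.0, 0.8)
omegas = [0.01, 0.1, 1.0, 10.0, 100.0]
measured = [cole_cole_impedance(w, *true_params) for w in omegas]

cost_at_truth = squared_error(true_params, omegas, measured)
cost_perturbed = squared_error((0.01, 0.05, 2.0, 0.6), omegas, measured)
```

A cost function of this shape could then be passed to a Nelder–Mead routine, for example `scipy.optimize.minimize(..., method="Nelder-Mead")`, to recover the parameters from recorded data.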


2012 ◽  
Vol 229-231 ◽  
pp. 1768-1771
Author(s):  
Wen Qiang Liu ◽  
Na Han ◽  
Man Yan ◽  
Gui Li Tao

For single-channel autoregressive moving average (ARMA) signals observed by multiple sensors, with unknown model parameters and noise variances, the local estimators of the unknown model parameters and noise variances are obtained by the recursive instrumental variable (RIV) algorithm and the correlation method, and the fused estimators are obtained by averaging the local estimators. Substituting them into the optimal fusion Kalman filter yields a self-tuning fusion Kalman filter for single-channel ARMA signals. A simulation example shows its effectiveness.
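The fusion step described above, averaging the local estimates, can be sketched in a few lines of Python. The function name and the example numbers are hypothetical, and the local estimates are simply given here rather than computed by RIV or the correlation method.

```python
def fuse_by_averaging(local_estimates):
    """Component-wise average of equally long parameter vectors, one per sensor."""
    if not local_estimates:
        raise ValueError("at least one local estimate is required")
    count = len(local_estimates)
    dim = len(local_estimates[0])
    return [sum(est[i] for est in local_estimates) / count for i in range(dim)]


# Hypothetical local estimates of two ARMA coefficients from three sensors.
fused = fuse_by_averaging([[0.5, -0.25], [0.75, -0.5], [1.0, 0.0]])
```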


1993 ◽  
Vol 246 ◽  
pp. 489-502 ◽  
Author(s):  
George Kosály

Bilger, Saetran & Krishnamoorthy (1991) give measured values of the variance, cross-correlation coefficient, autospectra, coherence and phase shift of the reactant concentration fluctuations for an irreversible second-order reaction in an incompressible turbulent scalar mixing layer. The present paper approaches the interpretation of the measured data by evaluating the above quantities in the frozen (slow) and equilibrium (fast) chemistry limits. We assume that the limiting values bracket the corresponding intermediate rate data. The analysis leads to values that correspond with the measured variances and correlation coefficients. The paper offers simple procedures for experimenters to evaluate the fast chemistry limit of the spectral characteristics from the measured mixture fraction fluctuations. The investigation of the limiting spectra suggests that, in the frequency region considered in the Bilger et al. measurements, the shape of the autospectrum is quite insensitive to the chemistry rate. The cross-spectrum is much more sensitive to the chemistry than the autospectrum. The analysis predicts correctly that the coherence decreases with increasing frequency while the phase stays equal to π until the decrease of the coherence leads to indeterminate phase results.

