A recursive expectation-maximization algorithm for speaker tracking and separation

Author(s): Ofer Schwartz, Sharon Gannot

Abstract. The problem of blind and online speaker localization and separation using multiple microphones is addressed based on the recursive expectation-maximization (REM) procedure. A two-stage REM-based algorithm is proposed: (1) multi-speaker direction of arrival (DOA) estimation and (2) multi-speaker relative transfer function (RTF) estimation. The DOA estimation task uses only the time-frequency (TF) bins dominated by a single speaker, so the entire frequency range is not required to accomplish this task. In contrast, the RTF estimation task requires the entire frequency range in order to estimate the RTF for each frequency bin. Accordingly, a different statistical model is used for each task. The first REM model assumes that the speech signal is sparse in the TF domain and uses a mixture of Gaussians (MoG) model to identify the TF bins associated with a single dominant speaker; the corresponding DOAs are estimated from these bins. The second REM model assumes that the speakers are concurrently active in all TF bins and consequently applies a multichannel Wiener filter (MCWF) to separate the speakers. Because concurrent activity is assumed, a more precise TF map of the speakers' activity is obtained. The RTFs are estimated using the outputs of the MCWF beamformer (BF), which is constructed using the DOAs obtained in the previous stage. Finally, the speech signals are separated using the linearly constrained minimum variance (LCMV) BF that utilizes the estimated RTFs. The algorithm is evaluated using real-life scenarios with two speakers. Evaluation of the mean absolute error (MAE) of the estimated DOAs and of the separation capabilities demonstrates a significant improvement with respect to a baseline DOA estimation and speaker separation algorithm.
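The final separation step applies an LCMV beamformer built from the estimated RTFs. The following is a minimal sketch of that step at a single frequency bin. It assumes that the RTF matrix C (one column per speaker, with the reference microphone entry equal to 1) and a noise covariance R are already available; how the REM stages estimate them is not shown. The code evaluates the standard LCMV solution w = R^-1 C (C^H R^-1 C)^-1 g. The function name `lcmv_weights` and the random test data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lcmv_weights(R, C, g):
    """Standard LCMV weights w = R^-1 C (C^H R^-1 C)^-1 g for one frequency bin.

    R: (M, M) Hermitian positive-definite noise covariance.
    C: (M, K) constraint matrix; here one (hypothetical) RTF column per speaker.
    g: (K,) desired responses, e.g. [1, 0] keeps speaker 1 and nulls speaker 2.
    """
    Rinv_C = np.linalg.solve(R, C)            # R^-1 C without forming R^-1 explicitly
    gram = C.conj().T @ Rinv_C                # C^H R^-1 C, a K x K matrix
    return Rinv_C @ np.linalg.solve(gram, g)  # M-dimensional weight vector

rng = np.random.default_rng(0)
M, K = 4, 2
# Synthetic RTFs: first row is the reference microphone, so its entries are 1.
C = np.vstack([np.ones(K),
               rng.standard_normal((M - 1, K)) + 1j * rng.standard_normal((M - 1, K))])
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)            # synthetic Hermitian positive-definite covariance
g = np.array([1.0, 0.0])
w = lcmv_weights(R, C, g)
print(np.round(w.conj() @ C, 6))              # approximately [1, 0]: constraints satisfied
```

The separated output at that bin would be y = w^H x for the microphone vector x. Setting g = [0, 1] extracts the second speaker instead.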

Author(s): Joachim Frank

Cryo-electron microscopy combined with single-particle reconstruction techniques has allowed us to form a three-dimensional image of the Escherichia coli ribosome. In the interior, we observe strong density variations that may be attributed to the difference in scattering density between ribosomal RNA (rRNA) and protein. This identification can only be tentative, and lacks quantitation at this stage, because of the nature of image formation by bright-field phase contrast. Apart from limiting the resolution, the contrast transfer function acts as a high-pass filter that produces edge-enhancement effects, which can explain at least part of the observed variations. As a step toward a more quantitative analysis, it is necessary to correct the transfer function in the low-spatial-frequency range. Unfortunately, that is the range where Fourier components unrelated to elastic bright-field imaging are found, so a Wiener-filter type restoration would lead to incorrect results. Depending on the thickness of the ice layer, a varying contribution to the Fourier components in the low-spatial-frequency range originates from an “inelastic dark-field” image. The only prospect of obtaining quantitatively interpretable images (i.e., images that would allow discrimination between rRNA and protein by applying a density threshold set to the average RNA scattering density) may therefore lie in the use of energy-filtering microscopes.
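The high-pass behavior can be illustrated with a one-dimensional CTF in the weak-phase approximation. The sketch below pairs it with the generic Wiener-type correction H*·S / (|H|^2 + 1/SNR) that the abstract cautions against applying naively at low spatial frequencies. The values used are illustrative assumptions: an amplitude-contrast fraction of 0.07, an electron wavelength of 0.0251 Å (roughly 200 kV), and 2 µm defocus. Envelope functions and the inelastic dark-field contribution are not modeled, and sign conventions differ between software packages.

```python
import numpy as np

def ctf(k, wavelength, defocus, amp_contrast=0.07, cs=0.0):
    """1-D weak-phase contrast transfer function (one common sign convention).

    k: spatial frequency in 1/Å; wavelength, defocus and cs (spherical aberration) in Å.
    """
    chi = np.pi * wavelength * defocus * k**2 - 0.5 * np.pi * cs * wavelength**3 * k**4
    return -(np.sqrt(1.0 - amp_contrast**2) * np.sin(chi) + amp_contrast * np.cos(chi))

def wiener_correct(spectrum, h, snr):
    """Generic Wiener-type restoration: conj(H) * S / (|H|^2 + 1/SNR)."""
    return np.conj(h) * spectrum / (np.abs(h) ** 2 + 1.0 / snr)

k = np.linspace(0.0, 0.05, 6)                   # spatial frequencies in 1/Å
h = ctf(k, wavelength=0.0251, defocus=20000.0)  # illustrative 200 kV, 2 µm defocus
print(np.round(np.abs(h), 2))                   # small at k = 0, larger at higher k
```

At k = 0 the transfer is only the amplitude-contrast fraction (|H| = 0.07), so when the assumed SNR is high the correction strongly amplifies any low-frequency power. That power would then be misattributed to elastic bright-field contrast even if it came from another source.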


2018, Vol 51 (5-6), pp. 138-149
Author(s): Hüseyin Göksu

Estimation of vehicle speed by analysis of drive-by noise is a known technique. The methods used in this kind of practice generally estimate the velocity of the vehicle with respect to the microphone(s), so they rely on the relative motion of the vehicle and the microphone(s). Other methods do not rely on this principle. For example, recent research has shown a statistical correlation between vehicle speed and drive-by noise emission spectra. Because this correlation does not depend on the relative motion of the vehicle with respect to the microphone(s), it inspires us to consider predicting the velocity of the vehicle using an on-board microphone, which could lead to a new kind of speed sensor. For this purpose, we record the sound signal from a vehicle under varying speed using an on-board microphone. Vehicle sound emissions are very complex, originating from the engine, the exhaust, the air conditioner, other mechanical parts, the tires, and air resistance. These emissions carry both stationary and non-stationary information. We propose to analyze them with wavelet packet analysis rather than traditional time- or frequency-domain methods. By providing arbitrary time-frequency resolution, wavelet packet analysis can handle signals of both stationary and non-stationary nature. It has better time representation than Fourier analysis and better high-frequency resolution than wavelet analysis. The subsignals from the wavelet packet analysis are further characterized by Norm Entropy, Log Energy Entropy, and Energy. These features are evaluated by feeding them into a multilayer perceptron. Norm entropy achieves the best prediction, with 97.89% average accuracy and a mean absolute error of 1.11 km/h, which corresponds to a 2.11% relative error. Time sensitivity is ±0.453 s and is open to improvement by varying the window width.
The results indicate that, with further tests at other speed ranges, with other vehicles, and under dynamic conditions, this method can be extended to the design of a new kind of vehicle speed sensor.
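The feature-extraction step can be sketched as follows. The abstract does not name the mother wavelet, the decomposition depth, the frame length, or the norm-entropy exponent. The Haar filters, level 3, 1024-sample frame, and p = 1.5 used here are therefore illustrative assumptions. The entropy definitions follow a common convention: norm entropy is the sum of |s_i|^p, and log-energy entropy is the sum of log(s_i^2), with log 0 taken as 0. The multilayer perceptron that maps the features to speed is not shown.

```python
import numpy as np

def haar_wavelet_packet(x, level):
    """Full Haar wavelet-packet tree; returns the 2**level terminal subsignals.

    Nodes come out in natural (Paley) order, not sorted by frequency.
    len(x) must be divisible by 2**level.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        children = []
        for s in nodes:
            even, odd = s[0::2], s[1::2]
            children.append((even + odd) / np.sqrt(2))  # low-pass (approximation) branch
            children.append((even - odd) / np.sqrt(2))  # high-pass (detail) branch
        nodes = children
    return nodes

def energy(s):
    return float(np.sum(s ** 2))

def log_energy_entropy(s):
    sq = s[s != 0] ** 2                  # convention: log(0) contributes 0
    return float(np.sum(np.log(sq)))

def norm_entropy(s, p=1.5):
    # p >= 1 is a free parameter; 1.5 is an arbitrary illustrative choice.
    return float(np.sum(np.abs(s) ** p))

def features(frame, level=3):
    """Rows: subsignals; columns: [norm entropy, log-energy entropy, energy]."""
    subs = haar_wavelet_packet(frame, level)
    return np.array([[norm_entropy(s), log_energy_entropy(s), energy(s)] for s in subs])

rng = np.random.default_rng(1)
frame = rng.standard_normal(1024)        # stand-in for one window of on-board audio
F = features(frame)
print(F.shape)                           # (8, 3)
```

Because the Haar packet transform is orthonormal, the energies of the eight subsignals sum to the energy of the frame. This gives a quick sanity check on the decomposition. Flattening F gives one input vector per window for a regressor.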


2014, Vol 610, pp. 339-344
Author(s): Qiang Guo, Yun Fei An

This paper proposes a UCA-Root-MUSIC algorithm for direction-of-arrival (DOA) estimation with a uniform circular array (UCA), based on UCA-RB-MUSIC [1]. The method uses a unitary transformation matrix different from that of UCA-RB-MUSIC, and it uses the multi-stage Wiener filter (MSWF) to estimate the signal subspace and the number of sources. As a result, the new method has lower computational complexity and is better suited to real-time implementation. Computer simulation results demonstrate the improvement achieved by the proposed method.
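For orientation, the sketch below implements ordinary spectral MUSIC on a uniform circular array. It forms the sample covariance, takes the noise subspace from a full eigendecomposition, and scans an azimuth grid. It is not the proposed UCA-Root-MUSIC, which replaces the eigendecomposition with the MSWF and the grid search with polynomial rooting after a unitary transformation. The array geometry (8 sensors, kr = 2.5), source angles, and noise level are assumptions chosen for illustration, and the number of sources is taken as known.

```python
import numpy as np

def uca_steering(theta, n_sensors, kr):
    """Azimuth-only UCA steering matrix of shape (n_sensors, len(theta)).

    theta in radians; kr = 2*pi*radius/wavelength; sources assumed in the array plane.
    """
    phi = 2 * np.pi * np.arange(n_sensors) / n_sensors        # sensor angular positions
    return np.exp(1j * kr * np.cos(np.subtract.outer(np.atleast_1d(theta), phi))).T

def music_spectrum(X, n_sources, grid, kr):
    """Spectral MUSIC pseudospectrum 1 / ||E_n^H a(theta)||^2 on an azimuth grid."""
    n_sensors, n_snapshots = X.shape
    R = X @ X.conj().T / n_snapshots              # sample covariance
    _, vecs = np.linalg.eigh(R)                   # eigenvalues in ascending order
    En = vecs[:, : n_sensors - n_sources]         # noise subspace (smallest eigenvalues)
    A = uca_steering(grid, n_sensors, kr)
    return 1.0 / np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)

rng = np.random.default_rng(2)
M, kr, N = 8, 2.5, 400
true_deg = np.array([40.0, 130.0])
S = rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))
noise = 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
X = uca_steering(np.deg2rad(true_deg), M, kr) @ S + noise

grid = np.deg2rad(np.arange(0.0, 360.0, 0.5))
P = music_spectrum(X, 2, grid, kr)
is_peak = (P > np.roll(P, 1)) & (P >= np.roll(P, -1))  # local maxima on the circular grid
peaks = np.flatnonzero(is_peak)
best = peaks[np.argsort(P[peaks])[-2:]]                # two strongest peaks
est_deg = np.sort(np.rad2deg(grid[best]))
print(est_deg)
```

The eigendecomposition and the dense grid scan are the costly parts. The abstract reports that the proposed method lowers computational complexity by replacing them with the MSWF and root-finding.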


1963, Vol 7 (04), pp. 19-23
Author(s): J. Kotik

Ursell's exact expression [3] for the wave-amplitude coefficient for a swaying or rolling vertical strip is evaluated numerically over the entire frequency range. The added-mass and inertia coefficients are then obtained numerically, also over the entire frequency range, via the Kramers-Kronig relations.


1990, Vol 69 (4), pp. 1372-1379
Author(s): D. Navajas, R. Farre, J. Canet, M. Rotger, J. Sanchis

Respiratory impedance (Zrs) was measured between 0.25 and 32 Hz in seven anesthetized and paralyzed patients by applying low-amplitude forced oscillation at the inlet of the endotracheal tube. Effective respiratory resistance (Rrs; in cmH2O·l⁻¹·s) fell sharply from 6.2 ± 2.1 (SD) at 0.25 Hz to 2.3 ± 0.6 at 2 Hz. Above 2 Hz, Rrs decreased slightly with frequency, down to 1.5 ± 0.5 at 32 Hz. Respiratory reactance (Xrs; in cmH2O·l⁻¹·s) was -22.2 ± 5.9 at 0.25 Hz, reached zero at approximately 14 Hz, and was 2.3 ± 0.8 at 32 Hz. Effective respiratory elastance (Ers = -2π·f·Xrs, where f is frequency; in cmH2O/l) was 34.8 ± 9.2 at 0.25 Hz and increased markedly with frequency, up to 44.2 ± 8.6 at 2 Hz. We interpreted the Zrs data in terms of a T-network mechanical model. We represented the proximal branch by central airway resistance and inertance. The shunt pathway accounted for bronchial distensibility and alveolar gas compressibility. The distal branch included a Newtonian resistance component for tissues and peripheral airways and a viscoelastic component for tissues. When the viscoelastic component was represented by a Kelvin body as in the model of Bates et al. (J. Appl. Physiol. 61: 873-880, 1986), a good fit was obtained over the entire frequency range, and reasonable parameter values were estimated. The strong frequency dependence of Rrs and Ers observed below 2 Hz in our anesthetized, paralyzed patients could be mainly interpreted in terms of tissue viscoelasticity. Nevertheless, the high Ers we found with low volume excursions suggests that tissues also exhibit plasticlike properties.
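The elastance definition can be checked against the reported group means. At a fixed frequency, Ers is a linear function of Xrs, so the mean Xrs at 0.25 Hz should reproduce the mean Ers: 2π × 0.25 × 22.2 ≈ 34.87 cmH2O/l. This is consistent with the reported 34.8 within the rounding of the published values. A minimal sketch follows; the function name is illustrative.

```python
import math

def effective_elastance(freq_hz, xrs):
    """Ers = -2*pi*f*Xrs; with Xrs in cmH2O·l^-1·s, Ers is in cmH2O/l."""
    return -2.0 * math.pi * freq_hz * xrs

print(round(effective_elastance(0.25, -22.2), 1))  # 34.9
```

The negative sign makes Ers positive wherever the reactance is negative, that is, where elastic behavior dominates over inertance.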


Author(s): Feng Bao, Waleed H. Abdulla

In computational auditory scene analysis, accurate estimation of the binary mask or ratio mask plays a key role in noise masking. Inaccurate estimation often leads to artifacts and temporal discontinuities in the synthesized speech. To overcome this problem, we propose a new ratio mask estimation method based on Wiener filtering in each Gammatone channel. To construct the Wiener filter, we use the relationship between the speech and noise power spectra in each Gammatone channel to build the objective function for the convex optimization of speech power. To improve the accuracy of estimation, the estimated ratio mask is further modified based on its adjacent time–frequency units and then smoothed by interpolating with the estimated binary masks. Objective tests, including signal-to-noise ratio improvement, spectral distortion, and intelligibility, together with a subjective listening test, demonstrate the superiority of the proposed method over the reference methods.
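The core quantities can be sketched on a (channel × frame) grid of speech and noise power estimates, such as those obtained from a Gammatone filterbank (not implemented here). The Wiener gain Ps/(Ps + Pn) gives the ratio mask, and Ps > Pn gives a 0 dB binary mask. The 3 × 3 neighborhood average and the fixed blending weight `alpha` are simple stand-ins chosen for illustration. The paper's convex optimization of speech power, its neighbor-based modification, and its interpolation rules are not reproduced.

```python
import numpy as np

def wiener_ratio_mask(speech_pow, noise_pow, eps=1e-12):
    """Wiener gain Ps / (Ps + Pn) for every time-frequency unit."""
    return speech_pow / (speech_pow + noise_pow + eps)

def binary_mask(speech_pow, noise_pow):
    """1 where the local SNR exceeds 0 dB, else 0."""
    return (speech_pow > noise_pow).astype(float)

def smooth_with_neighbors(mask):
    """Average each unit with its 3x3 neighborhood (edges replicated)."""
    padded = np.pad(mask, 1, mode="edge")
    rows, cols = mask.shape
    out = np.zeros_like(mask, dtype=float)
    for dr in (0, 1, 2):
        for dc in (0, 1, 2):
            out += padded[dr : dr + rows, dc : dc + cols]
    return out / 9.0

def combined_mask(speech_pow, noise_pow, alpha=0.7):
    """Blend the smoothed ratio mask with the binary mask (alpha is illustrative)."""
    ratio = smooth_with_neighbors(wiener_ratio_mask(speech_pow, noise_pow))
    return alpha * ratio + (1 - alpha) * binary_mask(speech_pow, noise_pow)

rng = np.random.default_rng(3)
speech_pow = rng.exponential(1.0, size=(32, 100))  # synthetic: 32 channels x 100 frames
noise_pow = rng.exponential(1.0, size=(32, 100))
mask = combined_mask(speech_pow, noise_pow)
```

Multiplying each channel's noisy signal by its mask values before resynthesis would produce the enhanced speech. Smoothing across neighboring units is one way to reduce the frame-to-frame jumps that cause the temporal discontinuities mentioned above.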


2017, Vol 6 (1), pp. 39-51
Author(s): Peter Toose, Alexandre Roy, Frederick Solheim, Chris Derksen, Tom Watts, ...

Abstract. Radio-frequency interference (RFI) can significantly contaminate the measured radiometric signal of current spaceborne L-band passive microwave radiometers. These spaceborne radiometers operate within the protected passive remote sensing and radio-astronomy frequency allocation of 1400–1427 MHz but are nonetheless subject to frequent RFI intrusions. We present a unique surface-based and airborne hyperspectral, 385-channel, dual-polarization, L-band, Fourier-transform, RFI-detecting radiometer designed with a frequency range from 1400 to ≈1550 MHz. The extended frequency range was intended to increase the likelihood of detecting adjacent RFI-free channels, thereby increasing the signal, and therefore the thermal resolution, of the radiometer instrument. The external instrument calibration uses three targets (sky, ambient, and warm). Validation from independent stability measurements shows a mean absolute error (MAE) of 1.0 K for the ambient and warm targets and 1.5 K for the sky target. A simple but effective RFI removal method that exploits the large number of frequency channels is also described. This method separates the desired thermal emission from RFI intrusions. It was evaluated with synthetic microwave spectra generated using a Monte Carlo approach and validated with surface-based and airborne experimental measurements.
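The three-target external calibration can be illustrated with a per-channel linear fit of brightness temperature against raw radiometer output. This is a generic least-squares sketch that assumes a linear radiometer response. The target temperatures, gains, and offsets below are synthetic, because the instrument's actual calibration equations are not given in the abstract. With three targets, the two-parameter fit is overdetermined, so the residuals also indicate how linear each channel is.

```python
import numpy as np

def fit_linear_calibration(counts, target_temps):
    """Least-squares gain and offset per channel, with T = gain * counts + offset.

    counts: (n_targets, n_channels) raw output while viewing each target.
    target_temps: (n_targets,) known brightness temperatures in K.
    """
    x_mean = counts.mean(axis=0)
    t_mean = target_temps.mean()
    dx = counts - x_mean
    dt = (target_temps - t_mean)[:, None]
    gain = (dx * dt).sum(axis=0) / (dx ** 2).sum(axis=0)
    offset = t_mean - gain * x_mean
    return gain, offset

n_channels = 385
rng = np.random.default_rng(4)
true_gain = rng.uniform(0.8, 1.2, n_channels)        # synthetic gains (K per count)
true_offset = rng.uniform(-20.0, 20.0, n_channels)   # synthetic offsets (K)
temps = np.array([8.0, 295.0, 340.0])                # hypothetical sky, ambient, warm (K)
counts = (temps[:, None] - true_offset) / true_gain  # noiseless synthetic readings
gain, offset = fit_linear_calibration(counts, temps)

scene_counts = (250.0 - true_offset) / true_gain     # a synthetic 250 K scene
scene_tb = gain * scene_counts + offset              # calibrated brightness temperatures
```

With real, noisy readings, the recovered scene temperatures would scatter around the truth. Comparisons against independent measurements of that scatter are the kind of check behind the MAE values quoted above.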


2021
Author(s): Alain Beaudelaire Tchagang, Ahmed H. Tewfik, Julio J. Valdés

Abstract. Accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM↔ML) have been very effective in delivering the precision of QM at the high speed of ML. In this study, we show that by integrating well-known signal processing (SP) techniques (i.e., short-time Fourier transform, continuous wavelet analysis, and Wigner-Ville distribution) into the QM↔ML pipeline, we obtain a powerful machinery (QM↔SP↔ML) that can be used for representation, visualization, and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic, and thermodynamic properties. This is demonstrated by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 16 properties), the new QM↔SP↔ML model predicts the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e., MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, we show that the new QM↔SP↔ML model is a powerful technique for molecular forward design. All code and data generated and used in this study are available as supporting materials. The QM↔SP↔ML model is also housed at the following website: https://github.com/TABeau/QM-SP-ML.

