3D Multiple Sound Source Localization by Proposed Cuboids Nested Microphone Array in Combination with Adaptive Wavelet-Based Subband GEVD

Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 867
Author(s):  
Ali Dehghan Firoozabadi ◽  
Pablo Irarrazaval ◽  
Pablo Adasme ◽  
David Zabala-Blanco ◽  
Pablo Palacios-Játiva ◽  
...  

Sound source localization is one of the application areas of speech signal processing. The main challenge arises when multiple simultaneous sound sources must be localized from overlapped speech signals with an unknown number of speakers. Therefore, a method is required that can estimate the number of speakers and their locations with high accuracy in real-time conditions. Spatial aliasing is an undesirable effect of using microphone arrays, and it decreases the accuracy of localization algorithms in noisy and reverberant conditions. In this article, a cuboids nested microphone array (CuNMA) is first proposed to eliminate spatial aliasing. The CuNMA is designed to receive the speech signals of all speakers from different directions. In addition, the inter-microphone distances are adjusted so that each subarray has enough microphone pairs, which provides appropriate information for 3D sound source localization. Subsequently, a speech spectral estimation method is used to evaluate the speech spectrum components: suitable components are selected, and undesirable components are excluded from the localization process. Because speech information differs across frequency bands, the adaptive wavelet transform is used for subband processing in the proposed algorithm. The generalized eigenvalue decomposition (GEVD) method is implemented in subbands on all nested microphone pairs, and the probability density function (PDF) is calculated to estimate the direction of arrival (DOA) in different subbands and consecutive frames. The proper PDFs are selected by thresholding the standard deviation (SD) of the estimated DOAs, and the rest are eliminated. This process is repeated over time frames to extract the best DOAs. Finally, K-means clustering and the silhouette criterion are used to classify the DOAs, estimating the number of clusters (speakers) and the related DOAs.
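The final step above (choosing the number of speakers with K-means and the silhouette criterion) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. It assumes each estimated DOA is an (azimuth, elevation) pair in degrees, it ignores azimuth wrap-around at 360°, and it tries k = 2..k_max only, because the silhouette is undefined for a single cluster. The function names `kmeans`, `mean_silhouette`, and `estimate_speakers` are hypothetical.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    centres = [points[rng.integers(len(points))]]
    for _ in range(1, k):  # k-means++: favour points far from the existing centres
        d2 = ((points[:, None, :] - np.array(centres)[None, :, :]) ** 2).sum(-1).min(axis=1)
        centres.append(points[rng.choice(len(points), p=d2 / d2.sum())])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        labels = ((points[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment so the labels match the returned centres.
    labels = ((points[:, None, :] - centres[None]) ** 2).sum(-1).argmin(axis=1)
    return labels, centres


def mean_silhouette(points, labels):
    """Mean silhouette s_i = (b_i - a_i) / max(a_i, b_i); singleton clusters score 0."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(-1))
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = dist[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, self excluded
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


def estimate_speakers(doas, k_max=5):
    """Pick the number of clusters (k >= 2) with the highest mean silhouette."""
    best_k = max(range(2, k_max + 1), key=lambda k: mean_silhouette(doas, kmeans(doas, k)[0]))
    return best_k, kmeans(doas, best_k)[1]


# Usage: synthetic (azimuth, elevation) DOAs in degrees scattered around three speakers.
rng = np.random.default_rng(1)
truth = np.array([[30.0, 10.0], [120.0, 40.0], [250.0, -20.0]])
doas = np.vstack([c + rng.normal(0.0, 2.0, size=(40, 2)) for c in truth])
k, centres = estimate_speakers(doas)
print(k, np.round(centres, 1))
```

Choosing the k with the largest mean silhouette rewards compact, well-separated clusters. A separate test would be needed to decide whether only one speaker is present.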
All DOAs in each cluster are intersected to estimate the 3D position of each speaker, and the point closest to all DOA planes is selected as the speaker position. The proposed method is compared with the hierarchical grid (HiGRID), perpendicular cross-spectra fusion (PCSF), time-frequency wise spatial spectrum clustering (TF-wise SSC), and spectral source model-deep neural network (SSM-DNN) algorithms in terms of accuracy and computational complexity on real and simulated data in noisy and reverberant conditions. The results show the superiority of the proposed method over these previous works.
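One standard way to pick "the point closest to all DOA planes" in the intersection step is linear least squares. The abstract does not say how each plane is parameterized, so this sketch assumes every plane is given by a normal vector n_i and a point p_i on it. The helper name `closest_point_to_planes` is hypothetical. With unit normals, the sketch minimizes sum_i (n_i · x - n_i · p_i)^2, which is the sum of squared point-to-plane distances.

```python
import numpy as np


def closest_point_to_planes(normals, points):
    """Least-squares point closest to the planes n_i . x = n_i . p_i (one plane per row)."""
    n = np.asarray(normals, dtype=float)
    p = np.asarray(points, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)  # unit normals: residuals become distances
    d = np.sum(n * p, axis=1)                         # offset of plane i along its normal
    # Solves min_x sum_i (n_i . x - d_i)^2. A unique x needs >= 3 non-parallel normals;
    # otherwise lstsq returns the minimum-norm solution.
    x, *_ = np.linalg.lstsq(n, d, rcond=None)
    return x


# Usage: the three coordinate planes x = 1, y = 2, z = 3 meet at the point (1, 2, 3).
print(closest_point_to_planes(np.eye(3), [[1, 0, 0], [0, 2, 0], [0, 0, 3]]))
```

With noisy DOAs the planes no longer meet in a single point, and the same call returns the compromise point with the smallest total squared distance.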

2020 ◽  
Vol 71 (3) ◽  
pp. 150-164
Author(s):  
Ali Dehghan Firoozabadi ◽  
Pablo Irarrazaval ◽  
Pablo Adasme ◽  
David Zabala-Blanco ◽  
Cesar Azurdia-Meza

Multiple sound source localization in noisy and reverberant conditions is one of the important challenges in speech signal processing. This article addresses three-dimensional sound source localization in such adverse scenarios. Spatial aliasing is one of the destructive factors that reduce the accuracy of localization algorithms. First, a 3D quasi-spherical nested microphone array (QSNMA) is proposed to eliminate spatial aliasing. Since speech signals have the windowed-disjoint orthogonality property, speech information differs across frequency bands; therefore, a Gammatone filter bank is introduced for subband processing of speech. Next, the multiresolution steered response power (SRP) algorithm is implemented adaptively on the subbands, using the phase transform (PHAT) or maximum likelihood (ML) weighting function depending on the levels of noise and reverberation. The peaks of this multiresolution adaptive SRP (MASRP) algorithm are extracted in each subband over consecutive time frames, according to the number of speakers. Finally, the distribution of these peaks is calculated in each subband, and the distributions are merged by weighted averaging. The final 3D speaker locations are estimated by extracting the peaks of the merged distribution. The proposed QSNMA-MASRP(PHAT/ML) algorithm is evaluated on real and simulated data for 2 and 3 simultaneous speakers in noisy and reverberant conditions. It is compared with the SRP-PHAT, spectral source model-deep neural network, and spherical harmonic temporal extension of multiple response model sparse Bayesian learning algorithms over a range of signal-to-noise ratios and reverberation times. The results for the mean absolute estimation error, the averaged standard deviation of the absolute estimation error, and the computational complexity show the superiority of the proposed method.
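The SRP step above can be illustrated with basic, single-resolution SRP-PHAT. For every candidate direction, it steers each microphone pair's PHAT-weighted cross-spectrum to the TDOA expected for that direction and sums the results. This is a simplified NumPy sketch under stated assumptions: a far-field source, an azimuth-only search over a planar array, one frame, one speaker, and a speed of sound of 343 m/s. It does not reproduce the paper's Gammatone subbands, multiresolution grid, adaptive PHAT/ML switching, or QSNMA geometry. `simulate_plane_wave` and `srp_phat_azimuth` are hypothetical helpers, and the 6-microphone circular array in the usage example is an arbitrary choice.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)


def simulate_plane_wave(src, mic_xyz, az_deg, fs):
    """Hypothetical helper: far-field plane wave from azimuth az_deg, delayed per mic via FFT phase."""
    n = len(src)
    omega = 2 * np.pi * np.fft.rfftfreq(n, 1.0 / fs)
    a = np.deg2rad(az_deg)
    u = np.array([np.cos(a), np.sin(a), 0.0])       # unit vector pointing towards the source
    tau = -(mic_xyz @ u) / C                        # mics nearer the source receive earlier
    spec = np.fft.rfft(src)[None, :] * np.exp(-1j * np.outer(tau, omega))
    return np.fft.irfft(spec, n=n, axis=1)          # (mics, samples), circular delays


def srp_phat_azimuth(frames, mic_xyz, fs, grid_deg=np.arange(0.0, 360.0, 1.0)):
    """Basic far-field SRP-PHAT over an azimuth grid for one frame of shape (mics, samples)."""
    n = frames.shape[1]
    spec = np.fft.rfft(frames, axis=1)[:, 1:-1]      # drop DC (no delay info) and Nyquist bins
    omega = 2 * np.pi * np.fft.rfftfreq(n, 1.0 / fs)[1:-1]
    theta = np.deg2rad(grid_deg)
    u = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)  # (G, 3)
    tau = -(mic_xyz @ u.T) / C                       # (M, G) expected delay of each mic
    power = np.zeros(len(grid_deg))
    for i in range(len(mic_xyz)):
        for j in range(i + 1, len(mic_xyz)):
            cross = spec[i] * np.conj(spec[j])
            phat = cross / (np.abs(cross) + 1e-12)   # PHAT weighting keeps only the phase
            steer = np.exp(1j * np.outer(tau[i] - tau[j], omega))  # (G, K)
            power += (steer @ phat).real             # pair's GCC-PHAT at each steered TDOA
    return grid_deg[np.argmax(power)], power


# Usage: 6-mic uniform circular array (radius 5 cm) and a white-noise source at 60 degrees.
fs = 16000
ang = np.arange(6) * np.pi / 3
mics = np.stack([0.05 * np.cos(ang), 0.05 * np.sin(ang), np.zeros(6)], axis=1)
rng = np.random.default_rng(0)
x = simulate_plane_wave(rng.normal(size=1024), mics, 60.0, fs)
x += 0.05 * rng.normal(size=x.shape)                # mild uncorrelated sensor noise
est, _ = srp_phat_azimuth(x, mics, fs)
print(est)
```

The true direction is the only one at which every pair and every frequency bin adds in phase. This is why the broadband sum peaks there, even though individual high-frequency bins may alias.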


2013 ◽  
Vol 397-400 ◽  
pp. 2209-2214
Author(s):  
Chuan Yi Zhang ◽  
Chang Wei Mi ◽  
Pei Yang Yao

In time delay estimation, the basic cross-correlation (CC) method does not always produce an obvious peak. To solve this problem, this paper presents an improved time delay estimation method based on the generalized cross-correlation (GCC) and combines it with a microphone array structure to achieve sound source localization. Simulation results show that the method locates the sound source accurately in the presence of noise and reverberation, with a distance positioning error of less than 10 cm and a direction angle error below 3°.
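For reference, a GCC time delay estimate with the widely used PHAT weighting can be sketched as follows. The abstract does not specify the weighting or the improvement the authors apply, so this is a generic GCC-PHAT illustration rather than their method. `gcc_phat_delay` and its optional `max_tau` bound (for example, microphone spacing divided by the speed of sound) are assumptions of this sketch.

```python
import numpy as np


def gcc_phat_delay(x, y, fs, max_tau=None):
    """Delay of y relative to x in seconds (positive if y lags x), from the GCC-PHAT peak."""
    n = len(x) + len(y)                              # zero-pad: linear rather than circular correlation
    cross = np.fft.rfft(y, n=n) * np.conj(np.fft.rfft(x, n=n))
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n=n)  # PHAT: whiten, keep phase only
    max_shift = n // 2
    if max_tau is not None:                          # physically possible lags only
        max_shift = min(int(round(fs * max_tau)), max_shift)
    lags = np.arange(-max_shift, max_shift + 1)
    peak = np.argmax(np.abs(cc[lags]))               # negative lags index from the end of cc
    return lags[peak] / fs


# Usage: y is x delayed by 25 samples (about 1.56 ms at 16 kHz).
fs = 16000
s = np.random.default_rng(0).normal(size=5000)
x, y = s[200:2248], s[175:2223]
print(gcc_phat_delay(x, y, fs) * fs)
```

Whitening the cross-spectrum flattens its magnitude, so the inverse FFT approaches an impulse at the true lag. This is why PHAT usually sharpens the peak compared with plain cross-correlation.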

