Spatial Localization of Concurrent Multiple Sound Sources Using Phase Candidate Histogram

Author(s):  
Huakang Li ◽  
Jie Huang ◽  
Minyi Guo ◽  
Qunfei Zhao

Mobile robots that communicate with people would benefit from the ability to detect sound sources, helping them localize interesting events in real-life settings. We propose a spherical robot with four microphones that determines the spatial locations of multiple sound sources in ordinary rooms. Time differences of arrival are estimated from histograms of inter-microphone phase differences, and a precedence-effect model suppresses the influence of echoes in reverberant environments. To integrate the spatial cues from different microphone pairs, we map their cross-correlations onto a 3D map indexed by the azimuth and elevation of the source direction. Experimental results indicate that the proposed system recovers the distribution of sound sources clearly and precisely, even for concurrent sources in reverberant environments, when the echo-avoidance (EA) model is applied.
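To make the phase-candidate idea concrete, the sketch below estimates the time difference of arrival for one microphone pair by letting every frequency bin of the cross-spectrum vote for all delay candidates consistent with its wrapped phase, then taking the histogram peak. This is a minimal illustration under our own naming and parameter choices, not the paper's implementation; in particular it omits the precedence-effect echo suppression and the 3D azimuth-elevation mapping.

```python
import numpy as np

def tdoa_phase_histogram(x1, x2, fs, max_delay_s=1e-3, n_bins=81):
    """Estimate the TDOA between two microphone signals from a histogram
    of per-frequency phase-derived delay candidates (illustrative sketch)."""
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    dphi = np.angle(X1 * np.conj(X2))            # wrapped cross-spectrum phase
    candidates = []
    for f, p in zip(freqs[1:], dphi[1:]):        # skip DC
        # The wrapped phase is ambiguous by multiples of 2*pi, so each bin
        # contributes every candidate delay within the plausible range.
        k_max = int(np.ceil(f * max_delay_s)) + 1
        for k in range(-k_max, k_max + 1):
            tau = (p + 2 * np.pi * k) / (2 * np.pi * f)
            if abs(tau) <= max_delay_s:
                candidates.append(tau)
    hist, edges = np.histogram(candidates, bins=n_bins,
                               range=(-max_delay_s, max_delay_s))
    peak = int(np.argmax(hist))
    return 0.5 * (edges[peak] + edges[peak + 1])  # centre of the winning bin

# Synthetic check: white noise circularly delayed by 10 samples (625 us).
fs = 16000
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 10
x1, x2 = s, np.roll(s, delay)
est = tdoa_phase_histogram(x1, x2, fs)
print(f"estimated TDOA {est * 1e6:.1f} us, true {delay / fs * 1e6:.1f} us")
```

Only the true delay is consistent across all frequencies, so it dominates the histogram even though each bin also emits spurious wrapped candidates.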

2015 ◽  
Vol 114 (5) ◽  
pp. 2991-3001 ◽  
Author(s):  
Andrew D. Brown ◽  
Heath G. Jones ◽  
Alan Kan ◽  
Tanvi Thakkar ◽  
G. Christopher Stecker ◽  
...  

Normal-hearing human listeners and a variety of studied animal species localize sound sources accurately in reverberant environments by responding to the directional cues carried by the first-arriving sound rather than spurious cues carried by later-arriving reflections, which are not perceived discretely. This phenomenon is known as the precedence effect (PE) in sound localization. Despite decades of study, the biological basis of the PE remains unclear. Although the PE was once widely attributed to central processes such as synaptic inhibition in the auditory midbrain, a more recent hypothesis holds that the PE may arise essentially as a by-product of normal cochlear function. Here we evaluated the PE in a unique human patient population with demonstrated sensitivity to binaural information but without functional cochleae. Users of bilateral cochlear implants (CIs) were tested in a psychophysical task that assessed the number and location(s) of auditory images perceived for simulated source-echo (lead-lag) stimuli. A parallel experiment was conducted in a group of normal-hearing (NH) listeners. Key findings were as follows: 1) Subjects in both groups exhibited lead-lag fusion. 2) Fusion was marginally weaker in CI users than in NH listeners but could be augmented by systematically attenuating the amplitude of the lag stimulus to coarsely simulate the adaptation observed in acoustically stimulated auditory nerve fibers. 3) Dominance of the lead in localization varied substantially among both NH and CI subjects but was evident in both groups. Taken together, the data indicate that aspects of the PE can be elicited in CI users, who lack functional cochleae, and hence that neural mechanisms beyond the cochlea are sufficient to produce the PE.
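The lead-lag (source-echo) stimuli described above are simple to construct. The sketch below builds a monaural click pair in which the lag is attenuated, coarsely mimicking the auditory-nerve adaptation the authors simulated; all parameter values are illustrative, not the study's actual settings.

```python
import numpy as np

def lead_lag_stimulus(fs=44100, delay_ms=2.0, lag_atten_db=6.0, click_ms=0.1):
    """Build a lead click followed by an attenuated lag click after
    delay_ms (illustrative parameter values, not the study's)."""
    n_click = int(fs * click_ms / 1000)
    n_delay = int(fs * delay_ms / 1000)
    click = np.ones(n_click)
    out = np.zeros(n_delay + n_click)
    out[:n_click] += click                                   # lead
    out[n_delay:] += click * 10 ** (-lag_atten_db / 20)      # attenuated lag
    return out

stim = lead_lag_stimulus()   # 2 ms lead-lag delay, lag 6 dB down
```

Sweeping `delay_ms` through the fusion range and `lag_atten_db` upward is the kind of manipulation the abstract describes for probing lead-lag fusion.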


2013 ◽  
Vol 280 (1769) ◽  
pp. 20131428 ◽  
Author(s):  
Ludwig Wallmeier ◽  
Nikodemus Geßele ◽  
Lutz Wiegrebe

Several studies have shown that blind humans can gather spatial information through echolocation. However, when sound sources are localized, the precedence effect suppresses the spatial information carried by echoes and thereby conflicts with effective echolocation. This study investigates the interaction of echolocation and echo suppression in terms of discrimination suppression in virtual acoustic space. In the ‘Listening’ experiment, sighted subjects discriminated between positions of a single sound source, of the leading of two sources, or of the lagging of two sources. In the ‘Echolocation’ experiment, the sources were replaced by reflectors: the same subjects evaluated echoes generated in real time from self-produced vocalizations and thereby discriminated between positions of a single reflector, of the leading of two reflectors, or of the lagging of two reflectors. Two key results were observed. First, sighted subjects can learn to discriminate positions of reflective surfaces echo-acoustically with accuracy comparable to that of sound source discrimination. Second, in the Listening experiment the presence of the leading source affected discrimination of the lagging source much more than vice versa, whereas in the Echolocation experiment the presence of either the lead or the lag strongly affected discrimination. These data show that the classically described asymmetry in the perception of leading and lagging sounds is strongly diminished in an echolocation task. Additional control experiments showed that this effect is due both to the direct sound of the vocalization, which precedes the echoes, and to the fact that the subjects actively vocalize in the echolocation task.


2010 ◽  
Vol 103 (1) ◽  
pp. 446-457 ◽  
Author(s):  
Daniel J. Tollin ◽  
Elizabeth M. McClaine ◽  
Tom C. T. Yin

The precedence effect (PE) is an auditory spatial illusion whereby two identical sounds presented from two separate locations with a delay between them are perceived as a single fused sound source whose position depends on the value of the delay. By training cats with operant conditioning to look at sound sources, we have previously shown that cats experience the PE similarly to humans. For delays less than ±400 μs, cats exhibit summing localization, the perception of a “phantom” sound located between the sources. Consistent with localization dominance, for delays from 400 μs to ∼10 ms, cats orient toward the leading source location only, with little influence of the lagging source. Finally, echo threshold is reached for delays >10 ms, where cats first begin to orient to the lagging source. Some have hypothesized that the neural mechanisms producing facets of the PE, such as localization dominance and echo threshold, must occur at cortical levels. To test this hypothesis, we measured both pinna position, which was not under any behavioral constraint, and eye position in cats, and found that the pinna orientations to stimuli producing each of the three phases of the PE illusion were similar to the gaze responses. Although both eye and pinna movements behaved in a manner that reflected the PE, the strikingly short latencies of the pinna movements (∼30 ms) suggest a subcortical basis for the PE, with the cortex unlikely to be directly involved.
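The three delay regimes reported for cats can be summarized in a small lookup, with the abstract's approximate boundaries hard-coded (behavioral approximations, not exact psychophysical limits):

```python
def pe_phase(delay_s):
    """Map a lead-lag delay (seconds) to the perceptual regime reported
    for cats: boundaries (~400 us, ~10 ms) are the abstract's rounded
    values, not precise thresholds."""
    if abs(delay_s) < 400e-6:
        return "summing localization"    # fused phantom between the sources
    if abs(delay_s) <= 10e-3:
        return "localization dominance"  # orient to the leading source only
    return "above echo threshold"        # lag begins to be localized separately
```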


1999 ◽  
Vol 58 (3) ◽  
pp. 170-179 ◽  
Author(s):  
Barbara S. Muller ◽  
Pierre Bovet

Twelve blindfolded subjects localized two different pure tones played in random order by eight sound sources in the horizontal plane. Subjects either could or could not use the information supplied by their pinnae (external ears) and by their head movements. We found that both pinnae and head movements had a marked influence on auditory localization performance with this type of sound. The effects of pinnae and head movements appeared to be additive; the absence of either factor caused the same loss of localization accuracy and much the same error pattern. Head-movement analysis showed that subjects turn their face towards the emitting sound source, except for sources exactly in front or exactly in the rear, which are identified by turning the head to both sides. Head-movement amplitude increased smoothly as the sound source moved from the anterior to the posterior quadrant.


Sensors ◽  
2021 ◽  
Vol 21 (2) ◽  
pp. 532
Author(s):  
Henglin Pu ◽  
Chao Cai ◽  
Menglan Hu ◽  
Tianping Deng ◽  
Rong Zheng ◽  
...  

Multiple blind sound source localization is a key technology for a myriad of applications such as robotic navigation and indoor localization. However, existing solutions can only locate a few sound sources simultaneously because of the limitation imposed by the number of microphones in an array. To this end, this paper proposes a novel multiple blind sound source localization algorithm using Source seParation and BeamForming (SPBF). Our algorithm overcomes the limitations of existing solutions and can locate more blind sources than there are microphones in the array. Specifically, we propose a novel microphone layout that enables salient separation of multiple sources while still preserving their arrival-time information. We then localize each demixed source via beamforming. This design minimizes mutual interference between sound sources and thereby enables finer angle-of-arrival (AoA) estimation. To further enhance localization performance, we design a new spectral weighting function that improves the signal-to-noise ratio, allowing a relatively narrow beam and thus finer AoA estimation. Simulation experiments under typical indoor conditions demonstrate a maximum localization error of only 4∘ even with up to 14 sources.
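As a rough illustration of the beamforming stage, the sketch below scans candidate directions with a plain frequency-domain delay-and-sum beamformer on a linear array and picks the angle of maximum output power. It is a generic textbook sketch, not the SPBF pipeline: the source-separation step and the proposed spectral weighting are omitted, and the array geometry and all parameters are our own choices.

```python
import numpy as np

def delay_and_sum_aoa(signals, mic_x, fs, c=343.0):
    """Return the angle (degrees from broadside) that maximizes the output
    power of a delay-and-sum beamformer on a linear array (generic sketch)."""
    n = signals.shape[1]
    spec = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    angles = np.arange(-90.0, 90.5, 1.0)
    powers = []
    for ang in angles:
        # Per-microphone steering delays for a plane wave from this angle.
        delays = mic_x * np.sin(np.deg2rad(ang)) / c
        steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        powers.append(np.sum(np.abs(np.sum(spec * steer, axis=0)) ** 2))
    return float(angles[int(np.argmax(powers))])

# Synthetic check: a white-noise plane wave from 30 degrees, simulated by
# applying the matching phase shifts to each microphone channel.
fs, c = 16000, 343.0
mic_x = np.array([0.0, 0.05, 0.10, 0.15])   # 4 mics, 5 cm spacing
rng = np.random.default_rng(1)
S = np.fft.rfft(rng.standard_normal(2048))
freqs = np.fft.rfftfreq(2048, 1.0 / fs)
shift = np.exp(-2j * np.pi * freqs[None, :] * mic_x[:, None]
               * np.sin(np.deg2rad(30.0)) / c)
signals = np.fft.irfft(S[None, :] * shift, n=2048)
aoa = delay_and_sum_aoa(signals, mic_x, fs)
```

The paper's point is that running such a beamformer on each demixed source, rather than on the raw mixture, avoids mutual interference between sources.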


Author(s):  
Simone Spagnol ◽  
Michele Geronazzo ◽  
Davide Rocchesso ◽  
Federico Avanzini

Purpose – The purpose of this paper is to present a system for customized binaural audio delivery based on the extraction of relevant features from a 2-D representation of the listener’s pinna. Design/methodology/approach – The most significant pinna contours are extracted by means of multi-flash imaging, and they provide values for the parameters of a structural head-related transfer function (HRTF) model. The HRTF model spatializes a given sound file according to the listener’s head orientation, tracked by sensor-equipped headphones, with respect to the virtual sound source. Findings – A preliminary localization test shows that the model is able to render the elevation of a virtual sound source in static conditions better than non-individual HRTFs do. Research limitations/implications – The results encourage a deeper analysis of the psychoacoustic impact that the individualized HRTF model has on the perceived elevation of virtual sound sources. Practical implications – The model has low complexity and is suitable for implementation on mobile devices. The resulting hardware/software package should allow easy, low-cost access to custom spatial audio for any user. Originality/value – The authors show that custom binaural audio can be successfully deployed without the need for cumbersome subjective measurements.
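A structural HRTF model maps geometric features to filter parameters. As one textbook-level example of how a pinna contour could map to a spectral cue (not necessarily the mapping used in this paper), a direct sound combined with a single reflection whose path is longer by Δ cancels at the frequencies where Δ equals an odd multiple of half a wavelength:

```python
def notch_frequencies(path_diff_m, n=3, c=343.0):
    """First n destructive-interference frequencies for a direct sound plus
    one reflection with extra path length path_diff_m: cancellation when the
    extra path is an odd multiple of half a wavelength. A textbook two-path
    model offered only to illustrate geometry-to-parameter mapping."""
    return [(2 * k + 1) * c / (2 * path_diff_m) for k in range(n)]

# A 2 cm extra path puts the first notch near 8.6 kHz, in the range where
# pinna cues are known to influence perceived elevation.
notches = notch_frequencies(0.02)
```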


2021 ◽  
Vol 263 (6) ◽  
pp. 894-906
Author(s):  
Yannik Weber ◽  
Matthias Behrendt ◽  
Tobias Gohlke ◽  
Albert Albers

Preliminary work by the IPEK - Institute of Product Engineering at KIT has shown that the simulated pass-by measurement for exterior noise homologation of vehicles has significant optimization potential: the measurement can be carried out in smaller halls and with a smaller measurement setup than the norm requires, and thus at lower construction cost and effort. A prerequisite for this, however, is the scaling of the entire setup. For this scaling, in turn, the sound sources of the vehicle must be combined into a single point source - the acoustic centre. Previous approaches for conventional drives assume a static centre in the front part of the vehicle. For complex drive topologies, e.g. hybrid drives, and for unsteady driving conditions, however, this assumption no longer holds. Therefore, with the help of an acoustic camera, a method for localizing the dominant sound sources of the vehicle and a software application for combining them into an acoustic centre were developed. The method takes stationary, unsteady and sudden events into account in the calculation of the acoustic centre, which shifts as a result. Using substitute sound sources and two vehicles, the method and the measurement technology employed were examined and verified for their applicability.
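One plausible minimal reading of combining several localized sources into a single acoustic centre is an energy-weighted centroid of their positions. The sketch below implements that reading; it is our illustrative assumption, not the institute's actual algorithm, and the example positions and levels are invented.

```python
import numpy as np

def acoustic_centre(positions, levels_db):
    """Combine localized sound sources into one equivalent point source by
    energy-weighted averaging of their positions (illustrative assumption,
    not the developed method)."""
    w = 10 ** (np.asarray(levels_db) / 10)   # dB -> relative energy
    w = w / w.sum()
    return w @ np.asarray(positions, dtype=float)

# Example: a 90 dB source near the front dominates an 80 dB source at the
# rear axle, pulling the centre toward the front of the vehicle.
centre = acoustic_centre([[0.5, 0.0], [3.0, 0.0]], [90.0, 80.0])
```

Recomputing the weights per time frame would let the centre shift with unsteady or sudden events, as the abstract describes.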


2018 ◽  
Vol 120 (6) ◽  
pp. 2939-2952 ◽  
Author(s):  
Samira Anderson ◽  
Robert Ellis ◽  
Julie Mehta ◽  
Matthew J. Goupell

The effects of aging and stimulus configuration on binaural masking level differences (BMLDs) were measured behaviorally and electrophysiologically, using the frequency-following response (FFR) to target brainstem/midbrain encoding. The tests were performed in 15 younger normal-hearing (<30 yr) and 15 older normal-hearing (>60 yr) participants. The stimuli consisted of a 500-Hz target tone embedded in a narrowband (50-Hz bandwidth) or wideband (1,500-Hz bandwidth) noise masker. The interaural phase conditions included NoSo (tone and noise presented interaurally in-phase), NoSπ (noise presented interaurally in-phase and tone presented out-of-phase), and NπSo (noise presented interaurally out-of-phase and tone presented in-phase) configurations. In the behavioral experiment, aging reduced the magnitude of the BMLD. The magnitude of the BMLD was smaller for the NoSo–NπSo threshold difference compared with the NoSo–NoSπ threshold difference, and it was also smaller in narrowband compared with wideband conditions, consistent with previous measurements. In the electrophysiology experiment, older participants had reduced FFR magnitudes and smaller differences between configurations. There were significant changes in FFR magnitude between the NoSo to NoSπ configurations but not between the NoSo to NπSo configurations. The age-related reduction in FFR magnitudes suggests a temporal processing deficit, but no correlation was found between FFR magnitudes and behavioral BMLDs. Therefore, independent mechanisms may be contributing to the behavioral and neural deficits. Specifically, older participants had higher behavioral thresholds than younger participants for the NoSπ and NπSo configurations but had equivalent thresholds for the NoSo configuration. However, FFR magnitudes were reduced in older participants across all configurations. 
NEW & NOTEWORTHY Behavioral and electrophysiological testing reveal an aging effect for stimuli presented in wideband and narrowband noise conditions, such that behavioral binaural masking level differences and subcortical spectral magnitudes are reduced in older compared with younger participants. These deficits in binaural processing may limit older participants' ability to use spatial cues to understand speech in environments containing competing sound sources.
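The interaural configurations named above are straightforward to synthesize. The sketch below builds stereo NoSo, NoSπ, and NπSo stimuli with a 500-Hz tone in narrowband noise; the band-limiting method and the levels are illustrative choices, not the study's exact stimulus parameters.

```python
import numpy as np

def bmld_stimulus(config, fs=48000, dur=0.5, tone_hz=500.0, noise_bw_hz=50.0):
    """Return (left, right) channels for one interaural configuration:
    'NoSo' (noise and tone both diotic), 'NoSpi' (tone inverted in one ear),
    'NpiSo' (noise inverted in one ear). Illustrative parameters."""
    n = int(fs * dur)
    t = np.arange(n) / fs
    tone = np.sin(2 * np.pi * tone_hz * t)
    # Narrowband noise: keep only FFT bins within +/- bw/2 of the tone.
    rng = np.random.default_rng(0)
    spec = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    spec[np.abs(freqs - tone_hz) > noise_bw_hz / 2] = 0.0
    noise = np.fft.irfft(spec, n=n)
    noise /= np.max(np.abs(noise))
    tone_sign = -1.0 if config == "NoSpi" else 1.0
    noise_sign = -1.0 if config == "NpiSo" else 1.0
    return noise + tone, noise_sign * noise + tone_sign * tone

left, right = bmld_stimulus("NoSpi")   # tone out-of-phase between the ears
```

In the NoSπ condition the interaural phase disparity of the tone releases it from masking, which is what the BMLD quantifies.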

