Virtual sound source distance evaluation in acoustically and visually incongruent contexts
2021, Vol 150(4), pp. A140-A141
Author(s): Vincent Martin, Olivier Warusfel, Isabelle Viaud-Delmon
2015, Vol 16(2), pp. 255-262
Author(s): Shigeyuki Kuwada, Duck O. Kim, Kelly-Jo Koch, Kristina S. Abrams, Fabio Idrobo, ...

Sensors, 2019, Vol 20(1), pp. 172
Author(s): Mariam Yiwere, Eun Joo Rhee

This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance classes—one meter, two meters, and three meters—irrespective of its orientation angle. For the purpose of training, we create a dataset by recording audio signals at the three different distances and three angles in different rooms. The CRNN is trained using time-frequency representations of the audio signals. Specifically, we transform the audio signals into log-scaled mel spectrograms, allowing the convolutional layers to extract the appropriate features required for the classification. When trained and tested with combined datasets from all rooms, the proposed model exhibits high classification accuracies; however, training and testing the model in separate rooms results in lower accuracies, indicating that further study is required to improve the method’s generalization ability. Our experimental results demonstrate that it is possible to estimate sound source distances in known environments by classification using the log-scaled mel spectrogram.
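As a minimal, illustrative sketch of the pipeline described above (not the authors' code), the snippet below converts an audio clip to a log-scaled mel spectrogram and passes it through a small convolutional recurrent network whose output is one of the three distance classes. The sampling rate, number of mel bands, layer sizes, and the choice of librosa/PyTorch are assumptions made for the example.

```python
# Sketch of a log-mel + CRNN distance classifier (illustrative assumptions:
# 16 kHz audio, 64 mel bands, small conv/GRU layers, librosa + PyTorch).
import librosa
import numpy as np
import torch
import torch.nn as nn

def log_mel_spectrogram(path, sr=16000, n_mels=64):
    """Load an audio file and return a log-scaled mel spectrogram (n_mels x frames)."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class DistanceCRNN(nn.Module):
    """Convolutional layers extract local spectro-temporal features,
    a GRU summarizes them over time, and a linear layer outputs the
    three distance classes (1 m, 2 m, 3 m)."""
    def __init__(self, n_mels=64, n_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        self.gru = nn.GRU(input_size=32 * (n_mels // 4), hidden_size=64,
                          batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):           # x: (batch, 1, n_mels, frames)
        f = self.conv(x)            # (batch, 32, n_mels/4, frames/4)
        f = f.permute(0, 3, 1, 2)   # (batch, frames/4, 32, n_mels/4)
        f = f.flatten(start_dim=2)  # (batch, frames/4, 32 * n_mels/4)
        _, h = self.gru(f)          # h: (1, batch, 64)
        return self.fc(h[-1])       # class logits

# Usage with a hypothetical recording:
spec = log_mel_spectrogram("clip_2m.wav")
x = torch.tensor(spec, dtype=torch.float32).unsqueeze(0).unsqueeze(0)
logits = DistanceCRNN()(x)
print(["1 m", "2 m", "3 m"][logits.argmax(dim=1).item()])
```

In practice the network would be trained with a cross-entropy loss on spectrograms recorded at the three distances and three angles, mirroring the dataset described in the abstract.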


10.1038/82931
2001, Vol 4(1), pp. 78-83
Author(s): Pavel Zahorik, Frederic L. Wightman

2017
Author(s): Sol Libesman, Thomas Whitford, Damien Mannion

The level of the auditory signals at the ear depends both on the capacity of the sound source to produce acoustic energy and on the distance of the source from the listener. Loudness constancy requires that our perception of sound level, loudness, corresponds to the source level by remaining invariant to the confounding effects of distance. Here, we assessed the evidence for a potential contribution of vision, via the disambiguation of sound source distance, to loudness constancy. We presented participants with a visual environment, on a computer monitor, which contained a visible loudspeaker at a particular distance and was accompanied by the delivery, via headphones, of an anechoic sound at a particular aural level. We measured the point of subjective loudness equality for sounds associated with loudspeakers at different visually depicted distances. We report strong evidence that such loudness judgements were closely aligned with the aural level, rather than being affected by the apparent distance of the sound source conveyed visually. Similar results were obtained across variations in sound and environment characteristics. We conclude that the loudness of anechoic sounds is not necessarily affected by indications of the sound source distance as established via vision.
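To make the measurement concrete, the following is a small, hypothetical Python sketch of how a point of subjective loudness equality (PSE) can be read off a psychometric function fitted to comparison judgements. The data values, the cumulative-Gaussian model, and the SciPy fitting routine are illustrative assumptions, not the authors' analysis.

```python
# Hedged sketch: estimating a PSE by fitting a cumulative-Gaussian
# psychometric function to "comparison louder" responses (made-up data).
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Comparison level (dB relative to the standard) and the proportion of
# trials on which the comparison was judged louder (hypothetical values).
level_db = np.array([-6.0, -4.0, -2.0, 0.0, 2.0, 4.0, 6.0])
p_louder = np.array([0.05, 0.15, 0.35, 0.50, 0.70, 0.90, 0.97])

def psychometric(x, pse, spread):
    """Cumulative Gaussian: probability of a 'comparison louder' response."""
    return norm.cdf(x, loc=pse, scale=spread)

(pse, spread), _ = curve_fit(psychometric, level_db, p_louder, p0=[0.0, 2.0])
print(f"PSE = {pse:.2f} dB re standard")
# A PSE near 0 dB at every depicted distance would indicate that loudness
# judgements track the aural level rather than the visual distance cue.
```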


2012, Vol 131(4), pp. 3499-3499
Author(s): Satoshi Esaki, Takanori Nishino, Kazuya Takeda

Author(s): E. Georganti, T. May, S. van de Par, J. Mourjopoulos
