Dynamic Binaural Rendering: The Advantage of Virtual Artificial Heads over Conventional Ones for Localization with Speech Signals

2021 ◽  
Vol 11 (15) ◽  
pp. 6793
Author(s):  
Mina Fallahi ◽  
Martin Hansen ◽  
Simon Doclo ◽  
Steven van de Par ◽  
Dirk Püschel ◽  
...  

As an alternative to conventional artificial heads, a virtual artificial head (VAH), i.e., a microphone array-based filter-and-sum beamformer, can be used to create binaural renderings of spatial sound fields. In contrast to conventional artificial heads, a VAH makes it possible to individualize the binaural renderings and to incorporate head tracking. This is achieved by applying complex-valued spectral weights, calculated using individual head-related transfer functions (HRTFs) for each listener and for different head orientations, to the microphone signals of the VAH. In this study, these spectral weights were applied to room impulse responses measured in an anechoic room to synthesize individual binaural room impulse responses (BRIRs). In the first part of the paper, localization of virtual sources generated with individually synthesized BRIRs and with BRIRs measured using a conventional artificial head, for different head orientations, was assessed in comparison with real sources. Convincing localization performance with respect to azimuth and externalization was achieved for virtual sources generated with both the individually synthesized and the measured non-individual BRIRs. In the second part of the paper, localization of virtual sources was compared in two listening tests, with and without head tracking. The positive effect of head tracking on localization performance confirmed a major advantage of the VAH over conventional artificial heads.
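The core operation the abstract describes, applying complex-valued spectral weights to the microphone signals and summing, can be sketched as a frequency-domain filter-and-sum beamformer. The function name, array shapes, and the conjugation convention below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def apply_vah_weights(mic_spectra, weights_left, weights_right):
    """Filter-and-sum beamforming in the frequency domain.

    mic_spectra:  (num_mics, num_bins) complex spectra of the array signals.
    weights_*:    (num_mics, num_bins) complex spectral weights, e.g. fitted
                  to a listener's HRTFs for the current head orientation
                  (hypothetical data layout).
    Returns the left/right binaural spectra, each of shape (num_bins,).
    """
    left = np.sum(np.conj(weights_left) * mic_spectra, axis=0)
    right = np.sum(np.conj(weights_right) * mic_spectra, axis=0)
    return left, right
```

With head tracking, a different precomputed weight set is selected per head orientation before this per-frame sum.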

Author(s):  
Johannes M. Arend ◽  
Tim Lübeck ◽  
Christoph Pörschmann

Abstract High-quality rendering of spatial sound fields in real-time is becoming increasingly important with the steadily growing interest in virtual and augmented reality technologies. Typically, a spherical microphone array (SMA) is used to capture a spatial sound field. The captured sound field can be reproduced over headphones in real-time using binaural rendering, virtually placing a single listener in the sound field. Common methods for binaural rendering first spatially encode the sound field by transforming it to the spherical harmonics domain and then decode the sound field binaurally by combining it with head-related transfer functions (HRTFs). However, these rendering methods are computationally demanding, especially for high-order SMAs, and require implementing quite sophisticated real-time signal processing. This paper presents a computationally more efficient method for real-time binaural rendering of SMA signals by linear filtering. The proposed method allows representing any common rendering chain as a set of precomputed finite impulse response filters, which are then applied to the SMA signals in real-time using fast convolution to produce the binaural signals. Results of the technical evaluation show that the presented approach is equivalent to conventional rendering methods while being computationally less demanding and easier to implement using any real-time convolution system. However, the lower computational complexity goes along with lower flexibility. On the one hand, encoding and decoding are no longer decoupled, and on the other hand, sound field transformations in the SH domain can no longer be performed. Consequently, in the proposed method, a filter set must be precomputed and stored for each possible head orientation of the listener, leading to higher memory requirements than the conventional methods.
As such, the approach is particularly well suited for efficient real-time binaural rendering of SMA signals in a fixed setup where usually a limited range of head orientations is sufficient, such as live concert streaming or VR teleconferencing.
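The linear-filtering idea can be illustrated with a minimal sketch: one precomputed FIR filter per SMA channel and ear, applied by FFT-based fast convolution and summed into the ear signals. Names and shapes are assumptions; a real-time implementation would use partitioned convolution for low latency:

```python
import numpy as np

def render_binaural(sma_signals, filters_left, filters_right):
    """Binaural rendering of SMA signals by linear filtering.

    sma_signals:  (num_channels, num_samples) microphone signals.
    filters_*:    (num_channels, filter_len) precomputed FIR filters for one
                  head orientation (illustrative layout, not the paper's).
    Each channel is convolved with its filter via FFT-based fast convolution
    and the per-channel results are summed into one ear signal.
    """
    n = sma_signals.shape[1] + filters_left.shape[1] - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= full conv length
    X = np.fft.rfft(sma_signals, nfft)
    left = np.fft.irfft((X * np.fft.rfft(filters_left, nfft)).sum(axis=0), nfft)[:n]
    right = np.fft.irfft((X * np.fft.rfft(filters_right, nfft)).sum(axis=0), nfft)[:n]
    return left, right
```

For head tracking, one such filter set is stored per head orientation and swapped in when the tracker updates, which is the memory cost the abstract mentions.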


2016 ◽  
Vol 62 (4) ◽  
pp. 389-394
Author(s):  
Andrzej Borys

Abstract In the literature, Saleh’s description of the AM/AM and AM/PM conversions occurring in communication power amplifiers (PAs) is classified as a representation without memory. We show here that this view must be revised. The need for such a revision follows from the fact that Saleh’s representation is based on the quadrature mapping which, as we show here, can be expanded in a Volterra series different from the usual Taylor series. That is, the resulting Volterra series possesses nonlinear impulse responses in the form of sums of ordinary functions and multidimensional Dirac impulses multiplied by real-valued coefficients. Equivalently, this property can be expressed by saying that the nonlinear transfer functions associated with the aforementioned Volterra series are complex-valued functions. In conclusion, the above means that Saleh’s representation incorporates memory effects.
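For reference, Saleh's model expresses the AM/AM and AM/PM conversions as two-parameter rational functions of the input envelope r. A minimal sketch follows; the default parameter values are the TWT amplifier fits commonly quoted from Saleh's 1981 paper, used here only as illustrative defaults:

```python
import numpy as np

def saleh_am_am(r, alpha_a=2.1587, beta_a=1.1517):
    """Saleh AM/AM conversion: output amplitude as a function of input amplitude r."""
    return alpha_a * r / (1.0 + beta_a * r**2)

def saleh_am_pm(r, alpha_p=4.0033, beta_p=9.1040):
    """Saleh AM/PM conversion: additional phase shift (radians) as a function of r."""
    return alpha_p * r**2 / (1.0 + beta_p * r**2)

def saleh_pa(x):
    """Apply the Saleh PA model to a complex baseband signal x."""
    r = np.abs(x)
    phi = np.angle(x)
    return saleh_am_am(r) * np.exp(1j * (phi + saleh_am_pm(r)))
```

The envelope-dependent phase rotation is what makes the associated nonlinear transfer functions complex-valued, which is the property the abstract links to memory effects.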


Acta Acustica ◽  
2021 ◽  
Vol 5 ◽  
pp. 30
Author(s):  
Mina Fallahi ◽  
Martin Hansen ◽  
Simon Doclo ◽  
Steven van de Par ◽  
Dirk Püschel ◽  
...  

In order to realize binaural auralizations with head tracking, binaural room impulse responses (BRIRs) of individual listeners are needed for different head orientations. In this contribution, a filter-and-sum beamformer, referred to as a virtual artificial head (VAH), was used to synthesize the BRIRs. To this end, room impulse responses were first measured with a VAH, using a planar microphone array with 24 microphones, for one fixed orientation, in an anechoic and a reverberant room. Then, individual spectral weights for 185 orientations of the listener’s head were calculated with different parameter sets. Parameters included the number and direction of the sources considered in the calculation of the spectral weights, as well as the required minimum mean white noise gain (WNGm). For both acoustical environments, the quality of the resulting synthesized BRIRs was assessed perceptually in head-tracked auralizations, in direct comparison to real loudspeaker playback in the room. Results showed that both rooms could be auralized with the VAH for speech signals in a perceptually convincing manner by employing spectral weights calculated with 72 source directions in the horizontal plane. In addition, low resulting WNGm values should be avoided. Furthermore, in the dynamic binaural auralization with speech signals in this study, individual BRIRs seemed to offer no advantage over non-individual BRIRs, confirming previous results obtained with simulated BRIRs.
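The white noise gain mentioned above measures how strongly a set of beamformer weights amplifies uncorrelated sensor noise at one frequency; requiring it to stay above a minimum (WNGm) regularizes the weight calculation. A per-frequency sketch with assumed variable names:

```python
import numpy as np

def white_noise_gain(w, d):
    """White noise gain of a filter-and-sum beamformer at one frequency.

    w: (num_mics,) complex weight vector
    d: (num_mics,) steering/transfer vector of the desired source
    WNG = |w^H d|^2 / (w^H w). Constraining WNG >= WNGm during the weight
    optimization limits the amplification of uncorrelated microphone noise.
    """
    num = np.abs(np.vdot(w, d)) ** 2
    den = np.real(np.vdot(w, w))
    return num / den
```

For a matched delay-and-sum beamformer the WNG reaches its maximum, the number of microphones, which is why very low WNGm values signal noise-sensitive weights.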


2021 ◽  
Vol 11 (3) ◽  
pp. 1150
Author(s):  
Stephan Werner ◽  
Florian Klein ◽  
Annika Neidhardt ◽  
Ulrike Sloma ◽  
Christian Schneiderwind ◽  
...  

For spatial audio reproduction in the context of augmented reality, a position-dynamic binaural synthesis system can be used to synthesize the ear signals for a moving listener. The goal is the fusion of the auditory perception of the virtual audio objects with the real listening environment. Such a system has several components, each of which helps to enable a plausible auditory simulation. For each possible position of the listener in the room, a set of binaural room impulse responses (BRIRs) congruent with the expected auditory environment is required to avoid room divergence effects. An adequate and efficient approach is to synthesize new BRIRs from very few measurements of the listening room. The required spatial resolution of the BRIR positions can be estimated from spatial auditory perception thresholds. Retrieving and processing the tracking data of the listener’s head pose and position, as well as convolving BRIRs with an audio signal, needs to be done in real time. This contribution presents the authors’ work on several technical components of such a system in detail, showing how the individual components are shaped by psychoacoustics. Furthermore, the paper discusses the perceptual effects by means of listening tests demonstrating the appropriateness of the approaches.
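A position-dynamic system of this kind needs, at minimum, a lookup from the tracked listener position to the congruent BRIR set, followed by a convolution stage. The nearest-neighbour selection below is a deliberately crude sketch with a hypothetical data layout; real systems interpolate between measured BRIRs and use low-latency partitioned convolution:

```python
import numpy as np

def select_brir(position, brir_grid):
    """Pick the BRIR pair associated with the grid point nearest the listener.

    position:  (x, y) listener position from the tracker.
    brir_grid: dict mapping (x, y) grid positions to (left_ir, right_ir)
               pairs (hypothetical layout; spatial resolution would follow
               the perception thresholds mentioned in the abstract).
    """
    key = min(brir_grid,
              key=lambda p: (p[0] - position[0])**2 + (p[1] - position[1])**2)
    return brir_grid[key]

def binauralize(signal, brir_pair):
    """Convolve a dry source signal with the selected BRIR pair."""
    left_ir, right_ir = brir_pair
    return np.convolve(signal, left_ir), np.convolve(signal, right_ir)
```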


2013 ◽  
Author(s):  
Alba Granados ◽  
Finn Jacobsen ◽  
Efren Fernandez-Grande

2021 ◽  
Vol 263 (5) ◽  
pp. 1488-1496
Author(s):  
Yunqi Chen ◽  
Chuang Shi ◽  
Hao Mu

Earphones are commonly equipped with miniature loudspeaker units, which cannot radiate enough power at low frequencies. Moreover, there is often only one loudspeaker unit on each side of the earphone, so multi-channel spatial audio processing cannot be applied. Therefore, the combined use of virtual bass (VB) and head-related transfer functions (HRTFs) is necessary for an immersive listening experience with earphones. However, the combined effect of the VB and HRTFs has not been comprehensively reported. The VB is based on the missing fundamental effect, whereby a series of harmonics can be perceived as their fundamental frequency even if the fundamental itself is not present. HRTFs describe the transmission of a sound propagating from the sound source to the human ears; monaural audio processed by a pair of HRTFs is perceived by the listener as a sound source located in the direction associated with those HRTFs. This paper reports subjective listening tests whose results reveal that the harmonics required by the VB should be generated in the same direction as the high-frequency sound. The bass quality is rarely distorted by the presence of HRTFs, but the localization accuracy is occasionally degraded by the VB.
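The missing-fundamental idea behind the VB can be sketched as: isolate the band below a crossover, drive it through a nonlinearity to generate harmonics the miniature driver can reproduce, and mix them back with the rest of the signal. This toy version uses brick-wall FFT filters and a cubic nonlinearity; all parameters are illustrative and not those of the study:

```python
import numpy as np

def virtual_bass(signal, fs, cutoff=120.0, gain=0.5):
    """Toy virtual-bass sketch based on the missing-fundamental effect.

    The band below `cutoff` is isolated, a cubic nonlinearity generates odd
    harmonics, the harmonics are high-passed so only reproducible components
    remain, scaled, and mixed with the original high band.
    """
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    low = np.fft.irfft(np.where(freqs < cutoff, spec, 0.0), len(signal))
    high = np.fft.irfft(np.where(freqs >= cutoff, spec, 0.0), len(signal))
    harm = low ** 3  # odd nonlinearity -> odd harmonics of the low band
    harm = np.fft.irfft(np.where(freqs >= cutoff, np.fft.rfft(harm), 0.0),
                        len(signal))
    peak = np.max(np.abs(harm)) + 1e-12
    harm *= gain * np.max(np.abs(low)) / peak  # crude level matching
    return high + harm
```

In the study's setting, the key question is then which HRTF direction these synthetic harmonics should be rendered from relative to the high-frequency content.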


2021 ◽  
Vol 263 (2) ◽  
pp. 4598-4607
Author(s):  
Haruka Matsuhashi ◽  
Izumi Tsunokuni ◽  
Yusuke Ikeda

Measurements of room impulse responses (RIRs) at multiple points are used in various acoustic techniques that exploit the room acoustic characteristics. To obtain multi-point RIRs more efficiently, spatial interpolation of RIRs using the plane wave decomposition method (PWDM) and the equivalent source method (ESM) has been proposed. Recently, the estimation of RIRs from a small number of microphones using spatial and temporal sparsity has been studied. In this study, using measured RIRs, we compare the estimation accuracies of RIR interpolation methods with a small number of fixed microphones. In particular, we consider the early and late reflections separately: the direct sound and early reflection components are represented with a sparse ESM, and the late reflection component is represented with the ESM or the PWDM. We then solve two types of optimization problems: individual problems for the early and late reflections, decomposed by arrival time, and a single problem for the direct sound and all reflections. In the evaluation experiment, we measured multiple RIRs by moving a linear microphone array and compared the measured and estimated RIRs.
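As a simplified illustration of the interpolation idea, a plane wave decomposition at a single frequency can be fitted to the microphone pressures by regularized least squares and then evaluated at unmeasured positions. This 2-D sketch is not the paper's formulation, which treats early and late parts separately and exploits sparsity:

```python
import numpy as np

def pwdm_interpolate(p_mic, mic_pos, eval_pos, k, directions, reg=1e-3):
    """Interpolate a sound field at one frequency by plane wave decomposition.

    p_mic:      (M,) complex pressures at the microphones
    mic_pos:    (M, 2) microphone positions
    eval_pos:   (Q, 2) positions where the field is to be estimated
    k:          wavenumber (2*pi*f/c)
    directions: (D,) plane-wave arrival angles in radians
    A Tikhonov-regularized least-squares fit finds plane-wave amplitudes
    from the microphone pressures; the same basis then extrapolates the
    field to the evaluation positions.
    """
    prop = np.stack([np.cos(directions), np.sin(directions)])   # (2, D)
    A = np.exp(-1j * k * (mic_pos @ prop))                       # (M, D)
    coeff = np.linalg.solve(A.conj().T @ A + reg * np.eye(len(directions)),
                            A.conj().T @ p_mic)
    B = np.exp(-1j * k * (eval_pos @ prop))                      # (Q, D)
    return B @ coeff
```

Solving one such problem per frequency bin and stacking the results yields the interpolated RIRs.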


2021 ◽  
Author(s):  
Lior Madmoni ◽  
Jacob Donley ◽  
Vladimir Tourbabin ◽  
Boaz Rafaely

2018 ◽  
Vol 8 (10) ◽  
pp. 1956 ◽  
Author(s):  
Thomas McKenzie ◽  
Damian Murphy ◽  
Gavin Kearney

Ambisonics has enjoyed a recent resurgence in popularity due to virtual reality applications. Low-order Ambisonic reproduction is inherently inaccurate at high frequencies, which causes poor timbre and height localisation. Diffuse-Field Equalisation (DFE), the technique of removing the direction-independent frequency response, is applied to binaural (over-headphone) Ambisonic rendering to address high-frequency reproduction. DFE of Ambisonics is evaluated by comparing binaural Ambisonic rendering to direct convolution via head-related impulse responses (HRIRs) in three ways: spectral difference, predicted sagittal-plane localisation, and perceptual listening tests on timbre. Results show that DFE improves the frequency reproduction of binaural Ambisonic rendering for the majority of sound source locations, highlight the limitations of the technique, and set the basis for further research in the field.
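The DFE step can be sketched as follows: estimate the direction-independent response as the energetic average of the HRTF magnitudes over a direction grid, then divide it out so that only direction-dependent cues remain. This minimal sketch omits the inverse-filter design a practical pipeline would use; names and shapes are assumptions:

```python
import numpy as np

def diffuse_field_eq(hrtf_mags, weights=None, reg=1e-6):
    """Diffuse-field equalisation of a set of HRTF magnitude responses.

    hrtf_mags: (num_directions, num_bins) magnitude spectra
    weights:   optional (num_directions,) quadrature weights for the
               direction grid; uniform if omitted.
    Returns the equalised magnitudes and the diffuse-field response.
    """
    if weights is None:
        weights = np.full(hrtf_mags.shape[0], 1.0 / hrtf_mags.shape[0])
    # Energetic (RMS) average over directions = diffuse-field response.
    diffuse = np.sqrt(np.sum(weights[:, None] * hrtf_mags**2, axis=0))
    return hrtf_mags / (diffuse + reg), diffuse
```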

