Evaluation of head-tracked binaural auralizations of speech signals generated with a virtual artificial head in anechoic and classroom environments

Acta Acustica ◽  
2021 ◽  
Vol 5 ◽  
pp. 30
Author(s):  
Mina Fallahi ◽  
Martin Hansen ◽  
Simon Doclo ◽  
Steven van de Par ◽  
Dirk Püschel ◽  
...  

In order to realize binaural auralizations with head tracking, binaural room impulse responses (BRIRs) of individual listeners are needed for different head orientations. In this contribution, a filter-and-sum beamformer, referred to as a virtual artificial head (VAH), was used to synthesize the BRIRs. To this end, room impulse responses were first measured with a VAH, using a planar microphone array with 24 microphones, for one fixed orientation, in an anechoic and a reverberant room. Then, individual spectral weights for 185 orientations of the listener's head were calculated with different parameter sets. These parameters included the number and the directions of the sources considered in the calculation of the spectral weights, as well as the required minimum mean white noise gain (WNGm). For both acoustical environments, the quality of the resulting synthesized BRIRs was assessed perceptually in head-tracked auralizations, in direct comparison to real loudspeaker playback in the room. Results showed that both rooms could be auralized with the VAH for speech signals in a perceptually convincing manner by employing spectral weights calculated with 72 source directions in the horizontal plane; in addition, low resulting WNGm values should be avoided. Furthermore, in the dynamic binaural auralization with speech signals in this study, individual BRIRs seemed to offer no advantage over non-individual BRIRs, confirming previous results that were obtained with simulated BRIRs.
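As background to the filter-and-sum principle described above, the following minimal NumPy sketch shows how per-frequency spectral weights could be applied to room transfer functions measured at the array microphones to synthesize one ear's BRIR, together with a simplified white-noise-gain proxy. All function and variable names are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def synthesize_brir(rir_spectra, weights):
    """Filter-and-sum synthesis of one ear's BRIR spectrum.

    rir_spectra : (num_mics, num_bins) complex room transfer functions
                  measured at the VAH microphones.
    weights     : (num_mics, num_bins) complex spectral weights computed
                  from the listener's HRTFs for one head orientation.
    """
    # Weighted sum across microphones, independently per frequency bin.
    return np.sum(np.conj(weights) * rir_spectra, axis=0)

def white_noise_gain(weights):
    """Simplified per-bin white-noise-gain proxy, 10*log10(1 / ||w||^2),
    assuming the desired response is normalized to unity. Low values
    indicate weights that strongly amplify sensor noise and, as noted in
    the abstract, should be avoided."""
    return -10.0 * np.log10(np.sum(np.abs(weights) ** 2, axis=0) + 1e-12)

# Hypothetical usage with random placeholders for 24 mics and 512 bins.
rng = np.random.default_rng(0)
H = rng.standard_normal((24, 512)) + 1j * rng.standard_normal((24, 512))
w = rng.standard_normal((24, 512)) + 1j * rng.standard_normal((24, 512))
brir_spectrum = synthesize_brir(H, w)
wng_db = white_noise_gain(w)
```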

2021 ◽  
Vol 11 (15) ◽  
pp. 6793
Author(s):  
Mina Fallahi ◽  
Martin Hansen ◽  
Simon Doclo ◽  
Steven van de Par ◽  
Dirk Püschel ◽  
...  

As an alternative to conventional artificial heads, a virtual artificial head (VAH), i.e., a microphone array-based filter-and-sum beamformer, can be used to create binaural renderings of spatial sound fields. In contrast to conventional artificial heads, a VAH makes it possible to individualize the binaural renderings and to incorporate head tracking. This is achieved by applying complex-valued spectral weights, calculated using individual head-related transfer functions (HRTFs) for each listener and for different head orientations, to the microphone signals of the VAH. In this study, these spectral weights were applied to room impulse responses measured in an anechoic room to synthesize individual binaural room impulse responses (BRIRs). In the first part of the paper, localization of virtual sources generated with individually synthesized BRIRs and with non-individual BRIRs measured with a conventional artificial head was assessed for different head orientations, in comparison with real sources. Convincing localization performance with respect to azimuth and externalization was achieved for virtual sources generated with both the individually synthesized and the measured non-individual BRIRs. In the second part of the paper, virtual source localization was compared in two listening tests, with and without head tracking. The positive effect of head tracking on virtual source localization performance confirmed a major advantage of the VAH over conventional artificial heads.
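As a rough illustration of how such orientation-dependent spectral weights might be computed from individual HRTFs, the sketch below solves a regularized least-squares fit per frequency bin; the regularization term is one common way to keep the white noise gain above a minimum, although the exact constrained formulation used by the authors may differ. All names and shapes are illustrative.

```python
import numpy as np

def compute_spectral_weights(array_tf, target_hrtf, reg=1e-3):
    """Regularized least-squares fit of VAH weights at one frequency bin.

    array_tf    : (num_dirs, num_mics) complex array transfer functions
                  for the source directions used in the fit.
    target_hrtf : (num_dirs,) complex HRTF values of the listener for the
                  same directions and one head orientation.
    reg         : Tikhonov regularization; larger values trade synthesis
                  accuracy for a higher white noise gain.
    """
    A = array_tf
    # Minimize ||A w - target||^2 + reg * ||w||^2 in closed form.
    G = A.conj().T @ A + reg * np.eye(A.shape[1])
    return np.linalg.solve(G, A.conj().T @ target_hrtf)
```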


2021 ◽  
Vol 263 (2) ◽  
pp. 4598-4607
Author(s):  
Haruka Matsuhashi ◽  
Izumi Tsunokuni ◽  
Yusuke Ikeda

Measurements of room impulse responses (RIRs) at multiple points are used in various acoustic techniques that exploit the acoustic characteristics of a room. To obtain multi-point RIRs more efficiently, spatial interpolation of RIRs using the plane wave decomposition method (PWDM) and the equivalent source method (ESM) has been proposed. Recently, the estimation of RIRs from a small number of microphones using spatial and temporal sparsity has been studied. In this study, using measured RIRs, we compare the estimation accuracies of RIR interpolation methods with a small number of fixed microphones. In particular, we treat the early and late reflections separately: the direct sound and early reflection components are represented using a sparse ESM, and the late reflection component is represented using the ESM or the PWDM. We then solve two types of optimization problems: individual optimization problems for the early and late reflections, decomposed by arrival time, and a single optimization problem for the direct sound and all reflections. In the evaluation experiment, we measured multiple RIRs by moving a linear microphone array and compared the measured and estimated RIRs.
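To make the sparse equivalent source idea concrete, the following hedged sketch fits, at a single frequency, a sparse set of equivalent source amplitudes to the microphone pressures via iterative soft thresholding (ISTA); the fitted amplitudes can then be re-propagated to unmeasured positions with the same Green's functions. It is a simplified illustration under assumed geometry, not the authors' exact optimization.

```python
import numpy as np

def greens_matrix(src_pos, mic_pos, k):
    """Free-field Green's functions exp(-jkr)/(4*pi*r) for wavenumber k.
    src_pos: (S, 3) candidate equivalent-source positions, mic_pos: (M, 3)."""
    r = np.linalg.norm(mic_pos[:, None, :] - src_pos[None, :, :], axis=-1)
    return np.exp(-1j * k * r) / (4 * np.pi * r)

def sparse_esm(pressure, dictionary, lam=1e-2, n_iter=200):
    """Complex-valued ISTA: find sparse amplitudes q so that
    dictionary @ q approximates the measured microphone pressures."""
    q = np.zeros(dictionary.shape[1], dtype=complex)
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2
    for _ in range(n_iter):
        grad = dictionary.conj().T @ (dictionary @ q - pressure)
        z = q - step * grad
        mag = np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
        q = mag * np.exp(1j * np.angle(z))
    return q

# Interpolation at unmeasured points would then be
#   greens_matrix(src_pos, new_pos, k) @ q
# applied frequency by frequency.
```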


2021 ◽  
Author(s):  
Lior Madmoni ◽  
Jacob Donley ◽  
Vladimir Tourbabin ◽  
Boaz Rafaely

2005 ◽  
Vol 14 (5) ◽  
pp. 606-615 ◽  
Author(s):  
Katerina Mania ◽  
Andrew Robinson ◽  
Karen R. Brandt

Prior theoretical work on memory schemas, an influential concept of memory from the field of cognitive psychology, is presented for application to the fidelity of computer graphics simulations. The basic assumption is that an individual's prior experience will influence how he or she perceives, comprehends, and remembers new information in a scene. Schemas are knowledge structures; a scene could incorporate objects that fit into a specific context or schema (e.g., an academic's office), referred to as consistent objects. A scene could also include objects that are not related to the schema in place, referred to as inconsistent objects. In this paper, we describe ongoing development of a rendering framework related to scene perception based on schemas. An experiment was carried out to explore the effect of object type and rendering quality on object memory recognition in a room. The computer graphics simulation was displayed on a head-mounted display (HMD) utilizing stereo imagery and head tracking. Thirty-six participants across three conditions of varied rendering quality of the same space were exposed to the computer graphics environment and completed a memory recognition task. Results revealed that schema-consistent elements of the scene were more likely to be recognized than inconsistent information. Overall, higher confidence ratings were assigned to consistent objects than to inconsistent ones. Total object recognition was better for the mid-quality condition than for the low-quality one. The presence of shadow information, though, did not affect recognition of either consistent or inconsistent objects. Further exploration of the effect of schemas on spatial awareness in synthetic worlds could lead to identifying areas of a computer graphics scene that require better rendering quality as well as areas for which lower fidelity could be sufficient. The ultimate goal of this work is to simulate a perceptual process rather than to simulate physics.


2017 ◽  
Vol 10 (13) ◽  
pp. 382
Author(s):  
Khadar Nawas K

A review of multimodal speaker recognition (SR) is presented. Speaker recognition has been studied for many decades and still attracts the interest of many researchers. A speaker recognition system comprises two stages: system training and system testing. The robustness of the system depends on the training and testing environments as well as on the quality of the speech. Air-conducted (AC) speech is the source from which a speaker is conventionally recognized by extracting features, and the performance of the SR system depends on this AC speech. To further improve the robustness and accuracy of SR systems, various other sources (modalities), such as throat microphones, bone-conduction microphones, microphone arrays, non-audible murmur, and non-auditory information such as video, are used to complement the standard AC microphone. This paper is purely a review of SR and of these complementary modalities.
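As a concrete, classical baseline for the two stages mentioned above (system training and system testing), the hedged sketch below enrolls speakers with Gaussian mixture models on MFCC features extracted from air-conducted speech. File paths, model sizes, and helper names are hypothetical, and the paper itself addresses the broader multimodal picture rather than this specific pipeline.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(path, sr=16000, n_mfcc=20):
    """Extract MFCC frames from an air-conducted speech recording."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, coeffs)

def train_speaker_models(enroll_files):
    """System training: fit one GMM per enrolled speaker."""
    models = {}
    for speaker, files in enroll_files.items():
        feats = np.vstack([mfcc_features(f) for f in files])
        models[speaker] = GaussianMixture(n_components=16,
                                          covariance_type="diag").fit(feats)
    return models

def identify(models, test_file):
    """System testing: pick the speaker whose model gives the highest
    average log-likelihood for the test utterance."""
    feats = mfcc_features(test_file)
    return max(models, key=lambda s: models[s].score(feats))
```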


1995 ◽  
Vol 73 (6) ◽  
pp. 2293-2301 ◽  
Author(s):  
F. A. Keshner ◽  
B. W. Peterson

1. Potential mechanisms for controlling stabilization of the head and neck include voluntary movements, vestibular (VCR) and proprioceptive (CCR) neck reflexes, and system mechanics. In this study we have tested the hypothesis that the relative importance of those mechanisms in producing compensatory actions of the head-neck motor system depends on the frequency of an externally applied perturbation. Angular velocity of the head with respect to the trunk (neck) and myoelectric activity of three neck muscles were recorded in seven seated subjects during pseudorandom rotations of the trunk in the horizontal plane. Subjects were externally perturbed with a random sum-of-sines stimulus at frequencies ranging from 0.185 to 4.11 Hz. Four instructional sets were presented. Voluntary mechanisms were examined by having the subjects actively stabilize the head in the presence of visual feedback as the body was rotated (VS). Visual feedback was then removed, and the subjects attempted to stabilize the head in the dark as the body was rotated (NV). Reflex mechanisms were examined when subjects performed a mental arithmetic task during body rotations in the dark (MA). Finally, subjects performed a voluntary head tracking task while the body was kept stationary (VT). 2. Gains and phases of head velocity indicated good compensation to the stimulus in VS and NV at frequencies < 1 Hz. Gains dropped and phases advanced between 1 and 2 Hz, suggesting interference between neural and mechanical components. Above 3 Hz, the gains of head velocity increased steeply and exceeded unity, suggesting the emergence of mechanical resonance. (ABSTRACT TRUNCATED AT 250 WORDS)
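One possible way to obtain such gains and phases at the sum-of-sines stimulus frequencies is to take the ratio of the response and stimulus spectra at the corresponding FFT bins, as in the hedged sketch below. This is an illustrative analysis under assumed signal names, not necessarily the authors' exact procedure.

```python
import numpy as np

def gain_phase_at(stimulus, response, fs, freqs_hz):
    """Estimate gain and phase (degrees) of the response (head velocity)
    relative to the stimulus (trunk rotation) at selected frequencies."""
    n = len(stimulus)
    window = np.hanning(n)
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    S = np.fft.rfft(stimulus * window)
    R = np.fft.rfft(response * window)
    out = {}
    for f0 in freqs_hz:
        k = np.argmin(np.abs(f - f0))            # nearest FFT bin
        h = R[k] / S[k]                          # transfer function estimate
        out[f0] = (np.abs(h), np.degrees(np.angle(h)))
    return out
```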


Author(s):  
J Hu ◽  
C C Cheng ◽  
W H Liu

For intelligent robots to interact with people, an efficient human-robot communication interface, such as voice command, is very important. However, recognizing a voice command or speech represents only part of speech communication; the physics of speech signals carries other information, such as the direction of the speaker. A basic element of processing the speech signal is recognition at the acoustic level, and recognition performance depends greatly on the quality of the received signal: in a noisy environment, the success rate can be very poor. As a result, prior to speech recognition, it is important to process the speech signals so as to extract the needed content while rejecting other components, such as background noise. This paper presents a speech purification system for robots that improves the signal-to-noise ratio of the received signal, together with an algorithm based on a multidirection calibration beamformer.
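The paper's multidirection calibration beamformer is not reproduced here; as background only, the hedged sketch below shows a minimal delay-and-sum beamformer that steers a microphone array toward a known speaker direction to raise the signal-to-noise ratio. The array geometry, direction vector, and function names are assumptions for illustration.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_pos, doa_vec, fs, c=343.0):
    """Minimal delay-and-sum beamformer steered toward a known speaker.

    mic_signals : (num_mics, num_samples) time-domain recordings.
    mic_pos     : (num_mics, 3) microphone coordinates in metres.
    doa_vec     : unit vector pointing from the array toward the speaker.
    A plane wave from that direction reaches each microphone earlier by
    (mic_pos @ doa_vec) / c, so each channel is delayed by that amount
    (fractional delays applied in the frequency domain) and the channels
    are averaged, reinforcing the target speech against diffuse noise.
    """
    num_mics, n = mic_signals.shape
    tau = (mic_pos @ doa_vec) / c                 # per-mic lead times [s]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(mic_signals, axis=1)
    aligned = spectra * np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n)
```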

