Creation of Auditory Augmented Reality Using a Position-Dynamic Binaural Synthesis System—Technical Components, Psychoacoustic Needs, and Perceptual Evaluation

2021 ◽  
Vol 11 (3) ◽  
pp. 1150
Author(s):  
Stephan Werner ◽  
Florian Klein ◽  
Annika Neidhardt ◽  
Ulrike Sloma ◽  
Christian Schneiderwind ◽  
...  

For spatial audio reproduction in the context of augmented reality, a position-dynamic binaural synthesis system can be used to synthesize the ear signals for a moving listener. The goal is the fusion of the auditory perception of the virtual audio objects with the real listening environment. Such a system has several components, each of which helps to enable a plausible auditory simulation. For each possible position of the listener in the room, a set of binaural room impulse responses (BRIRs) congruent with the expected auditory environment is required to avoid room divergence effects. Adequate and efficient approaches synthesize new BRIRs from very few measurements of the listening room. The required spatial resolution of the BRIR positions can be estimated from spatial auditory perception thresholds. Retrieving and processing the tracking data of the listener's head pose and position, as well as convolving the BRIRs with the audio signal, must be done in real time. This contribution presents the authors' work on the individual technical components of such a system in detail and shows how each component is shaped by psychoacoustics. Furthermore, the paper discusses the perceptual effects by means of listening tests that demonstrate the appropriateness of the approaches.
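As an illustration of how position-dynamic binaural synthesis can be driven by tracking data, the sketch below selects the nearest measured BRIR set for the current listener pose and convolves it with a block of the dry source signal. This is a minimal, assumption-laden sketch rather than the authors' system: the BRIR grid layout, block size, and nearest-neighbour selection are illustrative, and a real-time renderer would add partitioned convolution with overlap-add tail handling and crossfading between BRIR sets.

```python
# Sketch only: nearest-neighbour BRIR selection driven by tracked listener
# position and head orientation, followed by block-wise convolution.
import numpy as np
from scipy.signal import fftconvolve

class PositionDynamicRenderer:
    """Nearest-neighbour BRIR selection plus block-wise convolution (sketch)."""

    def __init__(self, grid_xy, grid_yaw, brirs, block_size=1024):
        # grid_xy:  (P, 2) measured listener positions in metres (assumed layout)
        # grid_yaw: (Y,)   measured head orientations in degrees
        # brirs:    (P, Y, 2, L) left/right BRIRs per position/orientation
        self.grid_xy = np.asarray(grid_xy)
        self.grid_yaw = np.asarray(grid_yaw)
        self.brirs = np.asarray(brirs)
        self.block_size = block_size

    def render_block(self, dry_block, pos_xy, yaw_deg):
        # Select the BRIR pair measured closest to the tracked pose.
        p = np.argmin(np.linalg.norm(self.grid_xy - pos_xy, axis=1))
        y = np.argmin(np.abs((self.grid_yaw - yaw_deg + 180) % 360 - 180))
        brir = self.brirs[p, y]
        # Block convolution; overlap-add tail handling and crossfading
        # between BRIR sets are omitted for brevity.
        left = fftconvolve(dry_block, brir[0])[: self.block_size]
        right = fftconvolve(dry_block, brir[1])[: self.block_size]
        return np.stack([left, right])

# Usage with tracked data (hypothetical tracker interface):
# renderer = PositionDynamicRenderer(grid_xy, grid_yaw, brir_bank)
# out = renderer.render_block(block, tracker_position, tracker_yaw)
```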

2013 ◽  
Vol 2013 ◽  
pp. 1-16 ◽  
Author(s):  
Anastasios Alexandridis ◽  
Anthony Griffin ◽  
Athanasios Mouchtaris

This paper proposes a real-time method for capturing and reproducing spatial audio based on a circular microphone array. Following a different approach from other recently proposed array-based methods for spatial audio, the proposed method estimates the directions of arrival (DOAs) of the active sound sources on a per-time-frame basis and performs source separation with a fixed superdirective beamformer, which results in more accurate modelling and reproduction of the recorded acoustic environment. The separated source signals are downmixed into one monophonic audio signal, which, along with side information, is transmitted to the reproduction side. Reproduction is possible using either headphones or an arbitrary loudspeaker configuration. The method is compared with other recently proposed array-based spatial audio methods through a series of listening tests for both simulated and real microphone array recordings; reproduction over both loudspeakers and headphones is considered. As the results indicate, the proposed method achieves excellent spatialization and sound quality.
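To make the beamforming step concrete, the following sketch computes frequency-domain superdirective beamformer weights for a uniform circular array steered towards an estimated direction of arrival. It is an illustrative sketch under assumed parameters (array radius, microphone count, regularisation), not the authors' implementation; the DOA estimation and downmix/side-information stages are omitted.

```python
# Superdirective (MVDR under diffuse noise) beamformer weights for a
# uniform circular array, per frequency bin. Parameters are assumed values.
import numpy as np

C = 343.0      # speed of sound in m/s
RADIUS = 0.05  # array radius in m (assumed)
N_MICS = 8     # number of microphones (assumed)

mic_angles = 2 * np.pi * np.arange(N_MICS) / N_MICS
mic_xy = RADIUS * np.stack([np.cos(mic_angles), np.sin(mic_angles)], axis=1)

def steering_vector(freq, doa_rad):
    # Far-field steering vector for a plane wave arriving from doa_rad.
    direction = np.array([np.cos(doa_rad), np.sin(doa_rad)])
    delays = mic_xy @ direction / C
    return np.exp(-2j * np.pi * freq * delays)

def diffuse_coherence(freq):
    # Spherically isotropic (diffuse) noise coherence: sinc of mic spacing.
    dist = np.linalg.norm(mic_xy[:, None, :] - mic_xy[None, :, :], axis=-1)
    return np.sinc(2 * freq * dist / C)

def superdirective_weights(freq, doa_rad, mu=1e-2):
    # w = (Gamma + mu*I)^-1 d / (d^H (Gamma + mu*I)^-1 d); mu regularizes.
    d = steering_vector(freq, doa_rad)
    gamma_inv = np.linalg.inv(diffuse_coherence(freq) + mu * np.eye(N_MICS))
    num = gamma_inv @ d
    return num / (d.conj() @ num)

# Per frame: estimate the DOA, compute weights per frequency bin, and apply
# them to the STFT of the array signals to extract the separated source.
```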


2020 ◽  
Author(s):  
Lieber Po-Hung Li ◽  
Ji-Yan Han ◽  
Wei-Zhong Zheng ◽  
Ren-Jie Huang ◽  
Ying-Hui Lai

BACKGROUND
Cochlear implant technology is a well-established approach to help deaf patients hear speech again. It improves speech intelligibility in quiet conditions, but still leaves room for improvement in noisy conditions. More recently, it has been shown that deep learning-based noise reduction, such as noise classification combined with a deep denoising autoencoder (NC+DDAE), benefits the intelligibility performance of cochlear implant users more than classical noise reduction algorithms.

OBJECTIVE
Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to (1) propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE_T; (2) examine the proposed NC+DDAE_T noise reduction system using objective evaluations and subjective listening tests; and (3) investigate which layer substitution of the knowledge transfer technology in the NC+DDAE_T noise reduction system provides the best outcome.

METHODS
Knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE_T compared with the NC+DDAE. We investigated which layer should be substituted using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores, as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled ten cochlear implant users in listening tests to evaluate the benefits of the newly developed NC+DDAE_T.

RESULTS
The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain in terms of STOI and PESQ scores. Therefore, the parameters of layer three in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE_T. Both the objective and listening test results showed that the proposed NC+DDAE_T noise reduction system achieved performance similar to the previous NC+DDAE in several noisy test conditions, while requiring only a quarter of its parameters.

CONCLUSIONS
This study demonstrated that knowledge transfer technology can reduce the number of parameters of an NC+DDAE while maintaining similar performance. This suggests that the proposed NC+DDAE_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users.
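A hedged sketch of the layer-substitution idea is given below: one layer of a small denoising-autoencoder-style network is replaced by the corresponding pretrained layer and frozen, while the remaining layers stay trainable. The network sizes, layer choice, and file name are illustrative assumptions, not the paper's actual NC+DDAE_T configuration.

```python
# Illustrative layer-substitution knowledge transfer for a small DDAE-like MLP.
import torch
import torch.nn as nn

def make_ddae(in_dim=257, hidden=128):
    # A compact DDAE-like network: three hidden layers plus a linear output.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),   # "middle" hidden layer
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, in_dim),
    )

def substitute_layer(student, teacher, layer_idx):
    # Copy one layer's parameters from the pretrained (noise-independent)
    # teacher into the student and freeze it; the remaining layers are then
    # fine-tuned on noise-specific data.
    s_layer, t_layer = student[layer_idx], teacher[layer_idx]
    s_layer.weight.data.copy_(t_layer.weight.data)
    s_layer.bias.data.copy_(t_layer.bias.data)
    for p in s_layer.parameters():
        p.requires_grad_(False)
    return student

# Hypothetical usage (index 2 is the second nn.Linear; ReLUs count as modules):
# teacher = make_ddae(); teacher.load_state_dict(torch.load("ni_ddae.pt"))
# student = substitute_layer(make_ddae(), teacher, layer_idx=2)
```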


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible to human perception. At high compression rates, such codecs may introduce a variety of impairments into the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep learning techniques; however, only a few address the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator within a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases, so the present study may yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals at 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments using objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions at 16 and 32 kbit/s, and that the stochastic generators generate outputs that are closer to the original signals than those of the deterministic generators.
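The following is a minimal sketch of a stochastic conditional generator of the kind described: it concatenates a latent noise code, broadcast over time, with the compressed waveform and maps the result back to an audio excerpt. Channel counts, kernel sizes, and the overall architecture are assumptions for illustration and do not reproduce the authors' model; the deterministic variant simply omits the latent code.

```python
# Sketch of a stochastic generator conditioned on a compressed audio excerpt.
import torch
import torch.nn as nn

class StochasticGenerator(nn.Module):
    def __init__(self, latent_dim=64, channels=32):
        super().__init__()
        self.latent_dim = latent_dim
        # Small 1-D convolutional stack mapping [compressed audio + latent
        # code] back to a waveform excerpt in [-1, 1].
        self.net = nn.Sequential(
            nn.Conv1d(1 + latent_dim, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
            nn.Tanh(),
        )

    def forward(self, compressed, z=None):
        # compressed: (batch, 1, samples); z: (batch, latent_dim) or None.
        b, _, n = compressed.shape
        if z is None:
            z = torch.randn(b, self.latent_dim, device=compressed.device)
        # Broadcast the latent code along time and concatenate with the input;
        # a deterministic generator would skip this step.
        z_map = z.unsqueeze(-1).expand(b, self.latent_dim, n)
        return self.net(torch.cat([compressed, z_map], dim=1))

# g = StochasticGenerator()
# restored = g(mp3_excerpt)   # mp3_excerpt: (batch, 1, samples) in [-1, 1]
```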


Author(s):  
Luis Almeida ◽  
Paulo Menezes ◽  
Jorge Dias

Socialization among elderly people plays a key role in their mental and physical well-being, while loneliness is expected to become one of the major problems of our ageing society. This research aims to study and develop a framework to support the socialization of elderly people when they are confined to their homes for some reason. It may also be suitable for people following neurological or physical rehabilitation treatment remotely, or for monitoring behaviors in order to prevent potential diseases. This work proposes a framework that supports socialization through Augmented Reality (AR)-based telepresence. The aim is a low-cost solution that enables users to communicate and interact remotely while experiencing the benefits of a face-to-face meeting. The authors explore computer graphics, spatial audio, and artificial vision to induce the sensation of being physically in the presence of other people, and they exploit the potential activities that such frameworks enable. TVs and phones are common companion devices for the elderly and should be used alongside emerging AR technologies to enhance the feeling of remote presence and minimize loneliness. Inspired by Virtual Reality (VR) studies, one of the authors' goals is to explore whether VR presence measurement instruments are useful in the AR context by reviewing the literature in the area.


2018 ◽  
Vol 8 (10) ◽  
pp. 1956 ◽  
Author(s):  
Thomas McKenzie ◽  
Damian Murphy ◽  
Gavin Kearney

Ambisonics has enjoyed a recent resurgence in popularity due to virtual reality applications. Low-order Ambisonic reproduction is inherently inaccurate at high frequencies, which causes poor timbre and height localisation. Diffuse-Field Equalisation (DFE), the technique of removing the direction-independent frequency response, is applied to binaural (over-headphones) Ambisonic rendering to address high-frequency reproduction. DFE of Ambisonics is evaluated by comparing binaural Ambisonic rendering to direct convolution via head-related impulse responses (HRIRs) in three ways: spectral difference, predicted sagittal-plane localisation, and perceptual listening tests on timbre. Results show that DFE successfully improves the frequency reproduction of binaural Ambisonic rendering for the majority of sound source locations; they also reveal the limitations of the technique and set the basis for further research in the field.
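A hedged sketch of diffuse-field equalisation is shown below: the magnitude responses of the binaural Ambisonic renderer are energetically averaged over a set of directions, and a regularised inverse of that average is used as an equalisation filter. The FFT size, regularisation constant, and linear-phase inversion are illustrative choices rather than the paper's exact processing chain.

```python
# Sketch of diffuse-field equalisation (DFE) for a binaural renderer.
import numpy as np

def diffuse_field_response(irs, n_fft=2048):
    # irs: (n_directions, 2, length) impulse responses of the renderer,
    # sampled over a (quasi-)uniform set of source directions.
    spectra = np.abs(np.fft.rfft(irs, n_fft, axis=-1))
    # Energetic (RMS) average over directions and both ears gives the
    # direction-independent component of the frequency response.
    return np.sqrt(np.mean(spectra ** 2, axis=(0, 1)))

def dfe_filter(diffuse_mag, beta=1e-3, n_fft=2048):
    # Regularised magnitude inversion, realised here as a linear-phase FIR;
    # a minimum-phase design would usually be preferable in practice.
    inv_mag = diffuse_mag / (diffuse_mag ** 2 + beta)
    h = np.fft.irfft(inv_mag, n_fft)
    return np.roll(h, n_fft // 2)   # centre the impulse response

# eq = dfe_filter(diffuse_field_response(ambisonic_binaural_irs))
# Convolve eq into the binaural Ambisonic rendering chain to apply DFE.
```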


2020 ◽  
Vol 161 ◽  
pp. 107179
Author(s):  
Jose J. Lopez ◽  
Pablo Gutierrez-Parera ◽  
Lauri Savioja
