Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras

2021
Author(s): Hansung Kim, Luca Remaggi, Aloisio Dourado, Teofilo de Campos, Philip J. B. Jackson, ...

Abstract: As personalised immersive display systems have been intensely explored in virtual reality (VR), plausible 3D audio corresponding to the visual content is required to provide more realistic experiences to users. It is well known that spatial audio synchronised with visual information improves the sense of immersion, but limited research progress has been achieved in immersive audio-visual content production and reproduction. In this paper, we propose an end-to-end pipeline to simultaneously reconstruct the 3D geometry and acoustic properties of the environment from a pair of omnidirectional panoramic images. A semantic scene reconstruction and completion method using a deep convolutional neural network is proposed to estimate the complete semantic scene geometry in order to adapt spatial audio reproduction to the scene. Experiments provide objective and subjective evaluations of the proposed pipeline for plausible audio-visual VR reproduction of real scenes.
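To make the link between semantic reconstruction and audio reproduction concrete, the sketch below shows one way a semantically labelled geometry could drive an acoustic estimate: Sabine's formula, RT60 = 0.161·V / Σ(Sᵢ·αᵢ), applied to surfaces whose material absorption is looked up from the predicted semantic class. This is an illustrative assumption rather than the authors' pipeline; the class names, absorption coefficients, and function names are placeholders.

```python
# Illustrative sketch (not the authors' implementation): once a semantic
# reconstruction assigns materials to surfaces, a reverberation time can be
# approximated with Sabine's equation, RT60 = 0.161 * V / sum(S_i * alpha_i).
# Class names and absorption coefficients are assumed placeholder values.
from dataclasses import dataclass

# Placeholder frequency-averaged absorption coefficients per semantic class.
ABSORPTION = {"wall": 0.05, "floor": 0.10, "ceiling": 0.05,
              "window": 0.03, "sofa": 0.55, "curtain": 0.45}

@dataclass
class Surface:
    label: str      # semantic class predicted for this surface patch
    area_m2: float  # surface area recovered from the reconstructed geometry

def estimate_rt60(room_volume_m3: float, surfaces: list[Surface]) -> float:
    """Approximate RT60 (seconds) of the reconstructed room via Sabine's formula."""
    absorption = sum(s.area_m2 * ABSORPTION.get(s.label, 0.1) for s in surfaces)
    return 0.161 * room_volume_m3 / max(absorption, 1e-6)

if __name__ == "__main__":
    surfaces = [Surface("wall", 60.0), Surface("floor", 20.0),
                Surface("ceiling", 20.0), Surface("sofa", 4.0)]
    print(f"Estimated RT60: {estimate_rt60(50.0, surfaces):.2f} s")
```

A spatial audio renderer could then use such an estimate (among other parameters) to match the reverberation of synthesised sources to the reconstructed scene.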

2017
Vol 14 (2), pp. 234-252
Author(s): Emilia Christie Picelli Sanches, Claudia Mara Scudelari Macedo, Juliana Bueno

Accessibility in the education of blind people is a right that must be fulfilled. Considering that information design aims to transmit information effectively to the receiver, and that a static image needs to be adapted so that a blind student can access its visual content, a way of translating visual information into tactile form is proposed. The purpose of this paper is therefore to present a model for translating static two-dimensional images into three-dimensional tactile images. It starts with a brief literature review on blindness, tactile perception and tactile images. It then presents the translation model in three parts: (1) recommendations from the literature; (2) structure; and (3) a preliminary model for testing. Next, it describes a test of the model carried out with two designers with digital modelling skills (potential users). The tests yielded two distinct models, one using elevation and the other using textures; nevertheless, both participants successfully completed the intended task. The test results also revealed flaws in the model that need to be adjusted in the next stages of the research.
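As an aside, the elevation strategy chosen by one of the participants can be sketched in code: a grayscale image is quantised into a small number of discrete heights suitable for extrusion into a tactile model. This is a hypothetical illustration only; the level count, maximum height and function name are assumptions, not part of the authors' model.

```python
# Minimal sketch of an elevation-based translation (not the authors' tool):
# map a 2D grayscale image to a discrete heightfield that a designer could
# extrude into a tactile 3D model. Level count and heights are assumed.
import numpy as np

def image_to_heightfield(gray: np.ndarray, levels: int = 4,
                         max_height_mm: float = 3.0) -> np.ndarray:
    """Quantise pixel intensity (0-255) into a few tactile elevation levels."""
    norm = np.clip(gray, 0, 255) / 255.0
    # Darker regions are raised higher, quantised so fingertips can tell levels apart.
    step = np.floor((1.0 - norm) * levels).clip(0, levels - 1)
    return step / (levels - 1) * max_height_mm

if __name__ == "__main__":
    demo = np.tile(np.linspace(0, 255, 8), (4, 1))  # synthetic gradient "image"
    print(image_to_heightfield(demo).round(2))
```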


Author(s): Zhiyong Wang, Dagan Feng

Visual information has been used extensively in domains such as the web, education, health, and digital libraries, owing to advances in computing technologies. Meanwhile, users find it increasingly difficult to locate desired visual content such as images. Although traditional content-based retrieval (CBR) systems allow users to access visual information through query-by-example with low-level visual features (e.g. color, shape, and texture), the semantic gap is widely recognized as a hurdle to the practical adoption of CBR systems. The wealth of visual information (e.g. user-generated visual content) enables us to derive new knowledge at large scale, which will significantly facilitate visual information management. Besides semantic concept detection, semantic relationships among concepts can also be explored in the visual domain, rather than only in the traditional textual domain. Therefore, this chapter provides an overview of the state of the art in discovering semantics in the visual domain from two aspects: semantic concept detection, and knowledge discovery from visual information at the semantic level. For the first aspect, various facets of visual information annotation are discussed, including content representation, machine-learning-based annotation methodologies, and widely used datasets. For the second aspect, a novel data-driven approach is introduced to discover semantic relevance among concepts in the visual domain. Future research topics are also outlined.
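For readers unfamiliar with query-by-example, the following minimal sketch shows the textbook formulation: images are summarised by a low-level colour histogram and ranked by histogram distance to the query. It is a generic illustration of the idea, not the specific methodology surveyed in the chapter.

```python
# Hedged illustration of query-by-example retrieval with a low-level colour
# histogram descriptor and Euclidean-distance ranking (generic formulation).
import numpy as np

def colour_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """img: HxWx3 uint8 array. Returns a normalised joint RGB histogram."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def rank_by_similarity(query: np.ndarray, database: list[np.ndarray]) -> list[int]:
    """Return database indices sorted from most to least similar to the query."""
    q = colour_histogram(query)
    dists = [np.linalg.norm(q - colour_histogram(d)) for d in database]
    return [int(i) for i in np.argsort(dists)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(5)]
    print(rank_by_similarity(db[2], db))  # index 2 ranks first (distance 0 to itself)
```

The "semantic gap" mentioned above is exactly the mismatch between such low-level descriptors and the concepts users actually search for.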


Vision, 2019, Vol 3 (4), pp. 57
Author(s): Pia Hauck, Heiko Hecht

Sound by itself can be a reliable source of information about an object's size. For instance, we are able to estimate the size of objects merely on the basis of the sound they make when falling on the floor. Moreover, loudness and pitch are crossmodally linked to size. We investigated whether sound has an effect on size estimation even in the presence of visual information, that is, whether manipulating the sound produced by a falling object influences visual length estimation. Participants watched videos of wooden dowels hitting a hard floor and estimated their lengths. Sound was manipulated by (A) increasing (decreasing) the overall sound pressure level, (B) swapping sounds among the different dowel lengths, and (C) increasing (decreasing) pitch. Results showed that dowels were perceived to be longer with increased sound pressure level (SPL), but there was no effect of swapped sounds or pitch manipulation. However, in a sound-only condition, main effects of length and pitch manipulation were found. We conclude that we are able to perceive subtle differences in the acoustic properties of impact sounds and use them to infer object size when visual cues are eliminated. In contrast, when visual cues are available, only loudness is potent enough to exert a crossmodal influence on length perception.
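The loudness manipulation in (A) amounts to a simple amplitude scaling, since an SPL offset of x dB scales a waveform by 10^(x/20). The sketch below illustrates that relationship; the toy signal and dB values are placeholders, not the stimuli used in the experiment.

```python
# Sketch of an overall SPL offset: a change of offset_db decibels corresponds
# to multiplying the waveform amplitude by 10**(offset_db / 20).
import numpy as np

def apply_spl_offset(signal: np.ndarray, offset_db: float) -> np.ndarray:
    """Raise (or lower) the overall SPL of a waveform by offset_db decibels."""
    return signal * (10.0 ** (offset_db / 20.0))

if __name__ == "__main__":
    t = np.linspace(0, 0.1, 4410, endpoint=False)
    impact = np.sin(2 * np.pi * 440 * t) * np.exp(-40 * t)   # toy impact sound
    louder = apply_spl_offset(impact, +6.0)                   # +6 dB ~ doubled amplitude
    print(np.abs(louder).max() / np.abs(impact).max())        # ~2.0
```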


2016
Vol 2016, pp. 1-12
Author(s): L. Fernández, L. Payá, O. Reinoso, L. M. Jiménez, M. Ballesta

A comparative analysis of several methods to describe outdoor panoramic images is presented. The main objective is to study the performance of these methods in the localization of a mobile robot (vehicle) in an outdoor environment, when a visual map containing images acquired from different positions in the environment is available. With this aim, we make use of the database provided by Google Street View, which contains spherical panoramic images captured in urban environments together with their GPS positions. The main benefit of using these images is that they permit testing any novel localization algorithm in countless outdoor environments anywhere in the world and under realistic capture conditions. The main contribution of this work is a comparative evaluation of different image description methods for solving the localization problem in a dense outdoor map using only visual information. We have tested our algorithms using several sets of panoramic images captured in different outdoor environments. The results can be used to select an appropriate description method for visual navigation tasks in outdoor environments with the Google Street View database, taking into consideration both localization accuracy and the computational efficiency of the algorithm.
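As a rough illustration of this kind of pipeline, the sketch below builds a very simple global descriptor (block-averaged intensities of the panorama) and localizes a query image by nearest-neighbour matching against map descriptors with known GPS positions. The descriptors actually compared in the paper are more sophisticated; all names, grid sizes, and coordinates below are assumptions.

```python
# Illustrative localization sketch: coarse block-mean descriptor plus
# nearest-neighbour search over a visual map with known GPS positions.
import numpy as np

def global_descriptor(panorama: np.ndarray, grid=(4, 16)) -> np.ndarray:
    """panorama: HxW grayscale array. Returns a flattened grid of block means."""
    h, w = panorama.shape
    gh, gw = grid
    blocks = panorama[:h - h % gh, :w - w % gw].reshape(gh, h // gh, gw, w // gw)
    d = blocks.mean(axis=(1, 3)).ravel()
    return (d - d.mean()) / (d.std() + 1e-8)   # crude illumination normalisation

def localize(query: np.ndarray, map_descs: np.ndarray, map_gps: np.ndarray):
    """Return the GPS position of the most similar map image."""
    q = global_descriptor(query)
    idx = int(np.argmin(np.linalg.norm(map_descs - q, axis=1)))
    return map_gps[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    imgs = [rng.random((128, 512)) for _ in range(3)]        # stand-in panoramas
    descs = np.stack([global_descriptor(im) for im in imgs])
    gps = np.array([[39.0, -0.5], [39.1, -0.6], [39.2, -0.7]])  # placeholder coordinates
    print(localize(imgs[1], descs, gps))                      # -> [39.1, -0.6]
```

Both of the paper's evaluation axes appear even in this toy version: the descriptor determines localization accuracy, and its size and computation cost determine efficiency.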


2019
Author(s): Louisa Lok Yee Man, Karolina Krzys, Monica Castelhano

When you walk into a room, you perceive visual information that is both close to you and farther away in depth. In the current study, we investigated how visual search is affected by information across scene depth and contrasted it with the effect of semantic scene context. Across two experiments, participants searched for target objects appearing either in the foreground or background regions of scenes that were either normally configured or had semantically mismatched foreground and background contexts (Chimera scenes; Castelhano, Fernandes, & Theriault, 2018). In Experiment 1, participants had shorter latencies and fewer fixations to the target when it appeared in the foreground. This pattern was not explained by target size. In Experiment 2, a preview of the scene was added prior to search to better establish scene context. Results again showed a Foreground Bias, with faster search performance for foreground targets. Together, these studies suggest processing differences across depth in scenes, with a preference for objects closer in space.


2018
pp. 1662-1685
Author(s): Rajarshi Pal

Selective visual attention is a remarkable capability of the primate visual system to restrict focus to a few interesting objects (or portions) of a scene. Primates are thus able to attend to the relevant visual content amidst myriads of other visual information, which enables them to interact with the external environment in real time by reducing the computational load in their brains. This inspires image-processing and computer-vision scientists to derive computational models of visual attention and to use them in a variety of real-life applications, mainly to speed up processing by reducing the computational burden that often characterizes image-processing and vision tasks. This chapter discusses a wide variety of such applications of visual attention models in image processing, computer vision, and graphics.
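A minimal example of a computational attention model is a centre-surround (difference-of-Gaussians) saliency map. The sketch below is included only to make the idea concrete; it is a generic toy model, not one specifically advocated in the chapter.

```python
# Generic centre-surround saliency sketch: regions that differ strongly from
# their surroundings (fine-scale blur vs. coarse-scale blur) score as salient.
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(gray: np.ndarray, sigma_c: float = 2.0, sigma_s: float = 8.0) -> np.ndarray:
    """Centre-surround contrast of a grayscale image, rescaled to [0, 1]."""
    contrast = np.abs(gaussian_filter(gray, sigma_c) - gaussian_filter(gray, sigma_s))
    return (contrast - contrast.min()) / (contrast.max() - contrast.min() + 1e-8)

if __name__ == "__main__":
    img = np.zeros((64, 64))
    img[28:36, 28:36] = 1.0                         # a single bright patch
    s = saliency_map(img)
    print(np.unravel_index(s.argmax(), s.shape))    # peak falls inside the bright patch
```

Applications of the kind surveyed in the chapter typically use such maps to process only the most salient regions, which is where the computational savings come from.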


2020
Vol 30, pp. 33-37
Author(s): Michael McKnight

Stowaway City is an immersive audio experience that combines electroacoustic composition and storytelling with extended reality. The piece was designed to accommodate multiple listeners in a shared auditory virtual environment. Each listener, based on their tracked position and rotation in space, wirelessly receives an individual binaurally decoded sonic perspective via open-back headphones. The sounds and unfolding narrative are mapped to physical locations in the performance area, which are only revealed through exploration and physical movement. Spatial audio is simultaneously presented to all listeners via a spherical loudspeaker array that supplements the headphone audio, thus forming a hybrid listening environment. The work is presented as a conceptual and technical design paradigm for creative sonic application of the technology in this medium. The author outlines a set of strategies that were used to realize the composition and technical affordances of the system.
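The position-dependent rendering described here can be illustrated with a small sketch that converts a listener's tracked position and head rotation into the head-relative azimuth and distance-based gain of a sound object mapped to a physical location, the kind of parameters a binaural decoder consumes. This is a conceptual sketch rather than the piece's actual engine; all names and constants are assumptions.

```python
# Conceptual sketch: derive head-relative azimuth and a simple 1/r gain for a
# virtual source fixed at a physical location, given tracked listener pose.
import math

def source_parameters(listener_xy, listener_yaw_deg, source_xy, ref_dist=1.0):
    """Return (azimuth in degrees relative to the head, linear gain) for one source."""
    dx, dy = source_xy[0] - listener_xy[0], source_xy[1] - listener_xy[1]
    dist = max(math.hypot(dx, dy), ref_dist)
    bearing = math.degrees(math.atan2(dy, dx))                 # world-frame direction
    azimuth = (bearing - listener_yaw_deg + 180) % 360 - 180   # head-relative direction
    gain = ref_dist / dist                                     # simple 1/r attenuation
    return azimuth, gain

if __name__ == "__main__":
    # Listener at the origin facing +y; source 4 m straight ahead -> azimuth 0, gain 0.25.
    print(source_parameters((0.0, 0.0), 90.0, (0.0, 4.0)))
```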

