Inhabiting spatial video and audio data: Towards a scenographic turn in the analysis of social interaction

Author(s):  
Paul McIlvenny

Consumer versions of the passive 360° and stereoscopic omni-directional camera have recently come to market, generating new possibilities for qualitative video data collection. This paper discusses some of the methodological issues raised by collecting, manipulating and analysing complex video data recorded with 360° cameras and ambisonic microphones. It also reports on the development of a simple, yet powerful prototype to support focused engagement with such 360° recordings of a scene. The paper proposes that we ‘inhabit’ video through a tangible interface in virtual reality (VR) in order to explore complex spatial video and audio recordings of a single scene in which social interaction took place. The prototype is a software package called AVA360VR (‘Annotate, Visualise, Analyse 360° video in VR’). The paper is illustrated through a number of video clips, including a composite video of raw and semi-processed multi-cam recordings, a 360° video with spatial audio, a video comprising a sequence of static 360° screenshots of the AVA360VR interface, and a video comprising several screen capture clips of actual use of the tool. The paper discusses the prototype’s development and its analytical possibilities when inhabiting spatial video and audio footage as a complementary mode of re-presenting, engaging with, sharing and collaborating on interactional video data.

Author(s):  
Andreas M. Kist ◽  
Pablo Gómez ◽  
Denis Dubrovskiy ◽  
Patrick Schlegel ◽  
Melda Kunduk ◽  
...  

Purpose High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data is lacking. HSV allows vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software, Glottis Analysis Tools (GAT). With GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533
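To make the idea of threshold-based region growing and the glottal area waveform concrete, the following is a minimal Python sketch. It is not GAT's implementation (GAT is written in C#, and the abstract does not describe its algorithmic details). The function names, the 4-connected neighbourhood, the fixed seed pixel, and the rule "a pixel belongs to the glottis if its intensity is at or below the threshold" are all assumptions made for illustration.

```python
from collections import deque


def region_grow(frame, seed, threshold):
    """Grow a 4-connected region of dark pixels starting at `seed`.

    `frame` is a list of rows of grey values; a pixel joins the region
    when its value is <= `threshold` (assumption: the glottis is dark).
    Returns the set of (row, col) pixels in the region.
    """
    rows, cols = len(frame), len(frame[0])
    seed_row, seed_col = seed
    if frame[seed_row][seed_col] > threshold:
        return set()  # seed itself is too bright: empty region
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and frame[nr][nc] <= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def glottal_area_waveform(frames, seed, threshold):
    """Segmented area (pixel count) per frame, i.e. area over time."""
    return [len(region_grow(frame, seed, threshold)) for frame in frames]


# Two toy 4x4 frames: a wider dark opening, then a nearly closed one.
frame_open = [
    [200, 200, 200, 200],
    [200, 10, 20, 200],
    [200, 15, 200, 200],
    [200, 200, 200, 5],  # dark pixel not connected to the seed
]
frame_closed = [
    [200, 200, 200, 200],
    [200, 10, 200, 200],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
waveform = glottal_area_waveform([frame_open, frame_closed], seed=(1, 1), threshold=50)
```

In this toy example the isolated dark pixel in the corner is excluded because it is not connected to the seed, which is the property that distinguishes region growing from plain global thresholding.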


Author(s):  
Michael Odzer ◽  
Kristina Francke

Abstract The sound of waves breaking on shore, or against an obstruction or jetty, is an immediately recognizable sound pattern which could potentially be employed by a sensor system to identify obstructions. If frequency patterns produced by breaking waves can be reproduced and mapped in a laboratory setting, a foundational understanding of the physics behind this process could be established, which could then be employed in sensor development for navigation. This study explores whether wave-breaking frequencies correlate with the physics behind the collapsing of the wave, and whether frequencies of breaking waves recorded in a laboratory tank will follow the same pattern as frequencies produced by ocean waves breaking on a beach. An artificial “beach” was engineered to replicate breaking waves inside a laboratory wave tank. Video and audio recordings of waves breaking in the tank were obtained, and audio of ocean waves breaking on the shoreline was recorded. The audio data was analysed in frequency charts. The video data was evaluated to correlate bubble sizes to frequencies produced by the waves. The results supported the hypothesis that frequencies produced by breaking waves in the wave tank followed the same pattern as those produced by ocean waves. Analysis utilizing a solution to the Rayleigh-Plesset equation showed that the bubble sizes produced by breaking waves were inversely related to the pattern of frequencies. This pattern can be reproduced in a controlled laboratory environment and extrapolated to the development of sensors for marine navigation, for example on autonomous ocean vehicles.
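The inverse relation between bubble size and frequency can be illustrated with a short Python sketch. The abstract does not say which solution to the Rayleigh-Plesset equation the authors used; the sketch assumes the standard small-amplitude (linearised) result known as the Minnaert resonance, f = (1 / (2πR)) · sqrt(3γp₀ / ρ), and neglects surface tension and viscosity. The function name and the default values (air bubble in fresh water at atmospheric pressure) are illustrative assumptions, not figures from the study.

```python
import math


def minnaert_frequency(radius_m, ambient_pressure_pa=101_325.0,
                       density_kg_m3=1000.0, polytropic_index=1.4):
    """Resonance frequency (Hz) of a gas bubble of radius `radius_m` in liquid.

    Assumes the Minnaert approximation: small oscillations, no surface
    tension, no viscosity. Defaults describe air in water at 1 atm.
    """
    stiffness_term = 3.0 * polytropic_index * ambient_pressure_pa / density_kg_m3
    return math.sqrt(stiffness_term) / (2.0 * math.pi * radius_m)


# Frequency falls as the bubble radius grows (f is proportional to 1/R).
for radius_mm in (0.5, 1.0, 2.0):
    frequency_hz = minnaert_frequency(radius_mm / 1000.0)
    print(f"R = {radius_mm} mm  ->  f = {frequency_hz:.0f} Hz")
```

Under these assumptions a bubble of 1 mm radius resonates at roughly 3.3 kHz, and doubling the radius halves the frequency.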


2021 ◽  
Vol 19 (4) ◽  
pp. 601-618
Author(s):  
Джефф Хиггинботам ◽  
Кайла Конуэй ◽  
Антара Сатчидананд

The purpose of this article is to provide the reader with tools and recommendations for collecting data and making microanalytic transcriptions of interaction involving people using Augmentative Communication Technologies (ACTs). This is of interest to clinicians, as well as anyone else engaged in video-based microanalysis of technology-mediated interaction in other contexts. The information presented here has particular relevance to young researchers developing their own methodologies, and to experienced scientists interested in social interaction research involving ACTs as well as other digital communication technologies. Tools and methods are discussed for making unobtrusive recordings of naturally occurring or task-driven social interactions to support microanalysis, while minimizing recording-related distractions that could alter the authenticity of the interaction. Recommendations for the needed functionality of video and audio recording equipment are made, with tips for how to capture actions that are important to the research question as opposed to capturing 'generally usable' video. In addition, tips for processing and managing video data are outlined, including how to develop optimally functional naming conventions for stored videos, how and where to store video data (i.e., use of external hard drives, compressing videos for storage), and how to sync multiple videos offering different views of a single interaction (i.e., syncing footage of the overall interaction with footage of the device display). Finally, tools and strategies for transcription are discussed, including a brief description of the role transcription plays in analysis; a suggested framework for how transcription might proceed through multiple passes, each focused on a different aspect of communication; and transcription software options, along with discussion of specific features that aid transcription. In addition, special issues that arise in transcribing interactions involving ACTs are addressed.


2008 ◽  
Vol 18 (06) ◽  
pp. 481-489 ◽  
Author(s):  
COLIN FYFE ◽  
WESAM BARBAKH ◽  
WEI CHUAN OOI ◽  
HANSEOK KO

We review a new form of self-organizing map which is based on a nonlinear projection of latent points into data space, identical to that performed in the Generative Topographic Mapping (GTM) [1]. But whereas the GTM is an extension of a mixture of experts, this model is an extension of a product of experts [2]. We show visualisation and clustering results on a data set composed of video data of lips uttering 5 Korean vowels. Finally, we note that we may dispense with the probabilistic underpinnings of the product of experts and derive the same algorithm as a minimisation of mean squared error between the prototypes and the data. This leads us to suggest a new algorithm which incorporates local and global information in the clustering. Both of the new algorithms achieve better results than the standard Self-Organizing Map.


Pragmatics ◽  
2002 ◽  
Vol 12 (2) ◽  
pp. 153-182 ◽  
Author(s):  
Tomoyo Takagi

The phenomenon of “elliptical” expressions in Japanese has been extensively studied in the field of Japanese linguistics. However, this phenomenon has often been treated as a general syntactic feature of Japanese, and the question of how this feature is realized in actual use of the language has been rather neglected. The present paper analyzes how speakers of Japanese actually deal with the task of interpreting unexpressed elements that emerge in their talk-in-interaction. Using video and audio data of naturally occurring conversations in Japanese, it is shown that, in producing and understanding utterances involving unexpressed referents, conversational parties utilize not only their morphological and syntactic knowledge but also various, multilayered resources that are available to them in the immediate context of interaction.


Author(s):  
Marcel Nikmon ◽  
Roman Budjač ◽  
Daniel Kuchár ◽  
Peter Schreiber ◽  
Dagmar Janáčová

Abstract Deep learning is a kind of machine learning, and machine learning is a kind of artificial intelligence. Machine learning comprises a group of various technologies, and deep learning is one of them. Deep learning is an integral part of current data classification practice. This paper introduces the possibilities of classification using convolutional networks. Experiments focused on audio and video data show different approaches to data classification. Most experiments use the well-known pre-trained AlexNet network with various types of input data pre-processing. However, there are also comparisons with other neural network architectures, and we also show the results of training on small and larger datasets. The paper describes eight different kinds of experiments. In each experiment, several training sessions were conducted in which different aspects were monitored. The focus was on the effect of batch size on the accuracy of deep learning, along with many other parameters that affect deep learning [1].


Author(s):  
Colin Mackenzie ◽  
Yan Xiao ◽  
Peter Hu ◽  
F. Jacob Seagull ◽  
Camille Hammond ◽  
...  

Improved safety is an important goal, but there is difficulty in gathering data and identifying practices that lessen the margin of patient safety in real, dynamic, complex medical workplaces. Video clips as data are a rich source to examine safety performance. Video clips have utility for participants to review their activities and for analysts to extract quantitative data. Focusing video data collection around brief, risky but beneficial tasks, to illustrate patterns of use that occur in Trauma Centers during patients' resuscitation, can simplify participation consent, confidentiality and data analysis problems. However, such video clip acquisition (5–15 minute duration) does not appear to compromise the quality of the content, which can facilitate identification of team performance, communication, ergonomic, and systems factors affecting patient safety. Comparison of task performance under two levels of task urgency was particularly revealing of areas where patient safety performance can be improved and allowed identification of preventive strategies to minimize the effects of safety infractions.


2018 ◽  
Vol 173 ◽  
pp. 03021
Author(s):  
Yaqing Liu ◽  
Lunhui Deng

This design introduces the theoretical basis of digital audio embedding and de-embedding, and proposes a solution in which the Verilog language is used to achieve 3G-SDI audio embedding and de-embedding. SDI video and audio data are input to the FPGA, and after processing the audio signals can be embedded in the SDI line blanking. Moreover, some auxiliary information is embedded in the SDI data; when this auxiliary information is needed, it is recovered through the audio de-embedding process. The de-embedding process is the inverse of the embedding process. Practice has shown that this scheme can effectively embed digital audio in the SDI data stream, synchronize audio and video data, and de-embed the audio signal. The design is very versatile and can improve the efficiency of the design, thus effectively reducing the cost of the product.


2021 ◽  
Vol 3 (4) ◽  
pp. 119-126
Author(s):  
Hadeel EJMAIL

Death is one of the most difficult topics a person can talk about. Human beings are busy with how to continue their lives and improve their conditions. This study aims to explore what people write on the Facebook pages of the dead. The research used a qualitative approach through content analysis: 50 posts were collected from fifteen pages of deceased persons using a purposive sample. The results showed that what people write on the pages of the dead follows two directions. The first is a desire to immortalize the dead and to keep their roots alive. The second is to mourn them ("weeping over their ruins") and to show the finality of a person's death and the end of his life. Sometimes the same post includes both directions together, as in the use of the deceased’s account by his family to change the profile picture of the dead and, at the same time, to invite the deceased’s friends through his page to the memorial event. People write on the pages of the dead in order to weep over their ruins on the one hand, and to immortalize their memories on the other. Facebook as a social platform, and the interaction of people with the pages of the dead, shows the great social interaction that takes place in this space. Research in this field cannot rest on a single claim, as posts may be either temporary or permanent; therefore, I used screen capture technology to collect and retain information. The pages of the dead included references to them, memorials, expressions of longing, and so on. Facebook has become a social platform that allows those who lose a dear person to share their grief through it, and enables them to deal with death and relieve their pain.

