Lip Motion Features for Biometric Person Recognition

2009
pp. 495-532
Author(s):  
Maycel Isaac Faraj ◽  
Josef Bigun

The present chapter reports on the use of lip motion as a stand-alone biometric modality, as well as a modality integrated with audio speech, for identity recognition using digit recognition as a support. First, the authors estimate motion vectors from images of lip movements. The motion is modeled as the distribution of apparent line velocities in the movement of brightness patterns in an image. Then, they construct compact lip-motion features from the regional statistics of the local velocities. These can be used alone or merged with audio features to recognize identity or the uttered digit. The authors present person recognition results using the XM2VTS database, which contains video and audio data of 295 people. Furthermore, they present results on digit recognition when it is used in a text-prompted mode to verify the liveness of the user. Such user challenges are intended to reduce the risk of replay attacks on the audio system.
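The line-velocity estimate described above follows from the brightness-constancy constraint, under which only the flow component along the local image gradient (the "normal flow") is recoverable. Below is a minimal NumPy sketch of that idea on a synthetic frame pair; it is an illustration of the general technique, not the authors' implementation.

```python
import numpy as np

def normal_flow(prev, curr):
    """Estimate normal (line) velocities from the brightness-constancy
    constraint Ix*u + Iy*v + It = 0.  Only the component of the flow
    along the local gradient is recoverable: v_n = -It / |grad I|."""
    Ix = np.gradient(curr, axis=1)          # horizontal brightness gradient
    Iy = np.gradient(curr, axis=0)          # vertical brightness gradient
    It = curr - prev                        # temporal derivative
    mag = np.sqrt(Ix**2 + Iy**2)
    mask = mag > 1e-3                       # skip flat (gradient-free) regions
    vn = np.zeros_like(curr)
    vn[mask] = -It[mask] / mag[mask]
    return vn, mask

# Synthetic example: a bright vertical bar shifted one pixel to the right.
prev = np.zeros((8, 8)); prev[:, 3] = 1.0
curr = np.zeros((8, 8)); curr[:, 4] = 1.0
vn, mask = normal_flow(prev, curr)

# Compact regional features (in the spirit of the chapter) could then be
# simple statistics of vn over a mouth region, e.g. a histogram or moments.
features = np.array([vn[mask].mean(), vn[mask].std()])
```

In practice one would smooth the derivatives and pool velocities over lip subregions; the chapter's feature construction is more elaborate than these two toy statistics.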

2019
Author(s):  
Bria Long ◽  
Patrick Wong ◽  
Michael C. Frank ◽  
Eva Lai ◽  
Peggy Chan ◽  
...  

Play is a universal behavior that is thought to be a critical way for children to learn a wide range of motor, social, and language skills. Empirical studies of play have borne out some of the predictions of classical theories, showing that children preferentially engage with surprising stimuli, will play in order to learn, and generally show a similar progression of increasingly complex play behaviors through infancy. Past research has also characterized the types of support and guidance that parents offer during guided play with their child, as distinguished from individual free play. However, most of these studies come from Western nations, and relatively few cross-cultural comparisons have been made, despite observations of wide variability in cultural play traditions. The goal of this study is to examine the variability and consistency of play behaviors in a large sample of 1- to 2-year-old children, a critical period in the development of play behaviors, in two cultural contexts: the United States and Hong Kong. Our investigation covers both individual and guided play, with measures related to joint attention, stereotypical play behaviors, language use, and the types of support offered by caregivers during guided play. This rich, annotated corpus of video and audio data also provides an important resource for research on early play.


Author(s):  
Andreas M. Kist ◽  
Pablo Gómez ◽  
Denis Dubrovskiy ◽  
Patrick Schlegel ◽  
Melda Kunduk ◽  
...  

Purpose
High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because of the lack of dedicated software to analyze the data. HSV allows the vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine.
Method
We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation.
Results
We freely provide our software, Glottis Analysis Tools (GAT). GAT offers a general threshold-based region-growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for use by untrained personnel. GAT further evaluates video and audio data in parallel and can extract various features from the video data, among them the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software.
Conclusion
GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders.
Supplemental Material: https://doi.org/10.23641/asha.14575533
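The glottal area waveform mentioned above is conceptually simple: it is the segmented glottal area tracked frame by frame. A minimal sketch (not GAT's actual code) of how such a waveform could be computed from a stack of binary segmentation masks:

```python
import numpy as np

def glottal_area_waveform(masks):
    """Glottal area waveform (GAW): glottis pixel count per frame,
    given a stack of binary segmentation masks of shape (T, H, W)."""
    return masks.reshape(masks.shape[0], -1).sum(axis=1)

# Toy stack: a glottal gap that opens and closes over 5 frames.
T, H, W = 5, 16, 16
masks = np.zeros((T, H, W), dtype=np.uint8)
for t, half_width in enumerate([0, 2, 4, 2, 0]):
    masks[t, 4:12, 8 - half_width:8 + half_width] = 1

gaw = glottal_area_waveform(masks)   # area rises, peaks, then falls
```

Real pipelines would convert pixel counts to calibrated areas and derive further parameters (e.g. open quotient, oscillation frequency) from this waveform.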


Author(s):  
Michael Odzer ◽  
Kristina Francke

Abstract
The sound of waves breaking on shore, or against an obstruction or jetty, is an immediately recognizable sound pattern that could potentially be employed by a sensor system to identify obstructions. If the frequency patterns produced by breaking waves can be reproduced and mapped in a laboratory setting, a foundational understanding of the physics behind this process could be established and then employed in sensor development for navigation. This study explores whether wave-breaking frequencies correlate with the physics of the collapsing wave, and whether the frequencies of breaking waves recorded in a laboratory tank follow the same pattern as frequencies produced by ocean waves breaking on a beach. An artificial “beach” was engineered to replicate breaking waves inside a laboratory wave tank. Video and audio recordings of waves breaking in the tank were obtained, and audio of ocean waves breaking on the shoreline was recorded. The audio data were analysed in frequency charts, and the video data were evaluated to correlate bubble sizes with the frequencies produced by the waves. The results supported the hypothesis that frequencies produced by breaking waves in the wave tank follow the same pattern as those produced by ocean waves. Analysis utilizing a solution to the Rayleigh-Plesset equation showed that the bubble sizes produced by breaking waves were inversely related to the pattern of frequencies. This pattern can be reproduced in a controlled laboratory environment and extrapolated for use in developing navigational sensors, with potential applications in marine navigation such as autonomous ocean vehicles.
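The inverse relation between bubble size and acoustic frequency reported above is consistent with the classical Minnaert resonance, a linearization of Rayleigh-Plesset bubble dynamics in which frequency scales as 1/R. A short sketch of that textbook relation (the parameter values are standard illustrative constants for air bubbles in water, not the study's measurements):

```python
import math

def minnaert_frequency(radius_m, p0=101325.0, rho=1000.0, gamma=1.4):
    """Minnaert resonance frequency (Hz) of a gas bubble in a liquid:
        f = (1 / (2*pi*R)) * sqrt(3 * gamma * p0 / rho)
    p0: ambient pressure (Pa), rho: liquid density (kg/m^3),
    gamma: ratio of specific heats of the gas.  Frequency ~ 1/R."""
    return math.sqrt(3 * gamma * p0 / rho) / (2 * math.pi * radius_m)

# Millimetre-scale bubbles ring in the low-kHz band typical of surf noise.
f_1mm = minnaert_frequency(1e-3)   # radius 1 mm
f_5mm = minnaert_frequency(5e-3)   # radius 5 mm: 5x larger, 5x lower pitch
```

The exact 1/R scaling makes the inverse size-frequency pattern in the tank and surf recordings plausible on first principles.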


2018
Vol 7 (3)
pp. 230-247
Author(s):  
Shirley Tan ◽  
Kumi Fukaya ◽  
Shiho Nozaki

Purpose
The purpose of this paper is to develop bansho analysis as a research method to improve the observation and analysis of instruction in lesson study, which could potentially visualise pupils’ thinking processes in a lesson.
Design/methodology/approach
The paper opted for a qualitative case study method. Data are drawn from a Year 6 Japanese Language lesson at a Japanese primary school. Data collection and data analysis are informed by transcript-based lesson analysis. The process of bansho formation is also reproduced based on video and audio data.
Findings
Bansho analysis illustrates three main patterns in pupils’ thinking processes, namely, variation of ideas, connection of ideas and attention to ideas. Pupils’ opinion sharing at the beginning of the lesson led to a variety of ideas, and these were recorded as part of bansho. Pupils then proceeded to establish connections among ideas. Finally, pupils displayed attention to ideas recorded on bansho by returning to ideas that intrigued them.
Research limitations/implications
There is a need to investigate the teacher’s role in bansho formation processes in order to develop a more comprehensive bansho analysis method. Other teaching and learning materials, such as lesson plans and pupils’ notes, should also be included in the study of bansho to develop a more comprehensive bansho analysis.
Originality/value
The bansho analysis proposed in this paper allows educators and researchers to study bansho with visualisation of bansho-related data. It would serve as an invaluable source of evidence during the observation and reflection stages of the lesson study cycle.


2020
Author(s):  
Enrique Garcia-Ceja ◽  
Vajira Thambawita ◽  
Steven Hicks ◽  
Debesh Jha ◽  
Petter Jakobsen ◽  
...  

In this paper, we present HTAD: A Home Tasks Activities Dataset. The dataset contains wrist-accelerometer and audio data from people performing at-home tasks such as sweeping, brushing teeth, washing hands, or watching TV. These activities represent a subset of the activities needed to live independently. Being able to detect activities with wearable devices in real time has the potential to enable assistive technologies in domains such as elderly care and mental health monitoring. Preliminary results show that applying machine learning to the dataset leads to promising results, but also that there is still room for improvement. By making this dataset public, researchers can test different machine learning algorithms for activity recognition, especially sensor data fusion methods.
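As an illustration of the kind of sensor data fusion such a dataset enables, the sketch below concatenates simple per-window statistics from an accelerometer axis and an audio stream (feature-level, or "early", fusion). The sampling rates and features here are assumptions for the example, not HTAD's actual specification.

```python
import numpy as np

def window_features(signal, sr, win_s=1.0):
    """Per-window statistics (mean, std, RMS) as a stand-in for the
    richer features one might extract from accelerometer or audio data."""
    n = int(sr * win_s)
    wins = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    return np.array([[w.mean(), w.std(), np.sqrt((w**2).mean())] for w in wins])

def early_fusion(acc_feats, audio_feats):
    """Feature-level fusion: concatenate per-window feature vectors from
    both modalities before handing them to a single classifier."""
    k = min(len(acc_feats), len(audio_feats))   # align window counts
    return np.hstack([acc_feats[:k], audio_feats[:k]])

rng = np.random.default_rng(0)
acc = rng.normal(size=1000)      # hypothetical 100 Hz accelerometer axis, 10 s
audio = rng.normal(size=80000)   # hypothetical 8 kHz audio stream, 10 s
fused = early_fusion(window_features(acc, 100), window_features(audio, 8000))
# fused: one 6-dimensional feature vector per 1-second window
```

The fused matrix can be fed to any standard classifier; late fusion (combining per-modality classifier outputs instead of features) is the usual alternative to compare against.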


Author(s):  
Paul McIlvenny

Consumer versions of the passive 360° and stereoscopic omni-directional camera have recently come to market, generating new possibilities for qualitative video data collection. This paper discusses some of the methodological issues raised by collecting, manipulating and analysing complex video data recorded with 360° cameras and ambisonic microphones. It also reports on the development of a simple, yet powerful prototype to support focused engagement with such 360° recordings of a scene. The paper proposes that we ‘inhabit’ video through a tangible interface in virtual reality (VR) in order to explore complex spatial video and audio recordings of a single scene in which social interaction took place. The prototype is a software package called AVA360VR (‘Annotate, Visualise, Analyse 360° video in VR’). The paper is illustrated through a number of video clips, including a composite video of raw and semi-processed multi-cam recordings, a 360° video with spatial audio, a video comprising a sequence of static 360° screenshots of the AVA360VR interface, and a video comprising several screen capture clips of actual use of the tool. The paper discusses the prototype’s development and its analytical possibilities when inhabiting spatial video and audio footage as a complementary mode of re-presenting, engaging with, sharing and collaborating on interactional video data.

