visual cues
Recently Published Documents

2022 ◽  
Vol 40 (3) ◽  
pp. 1-25
Dan Li ◽  
Tong Xu ◽  
Peilun Zhou ◽  
Weidong He ◽  
Yanbin Hao ◽  

Person search has long been treated as a crucial and challenging task to support deeper insight in personalized summarization and personality discovery. Traditional methods, e.g., person re-identification and face recognition techniques, which profile video characters based on visual information, are often limited by relatively fixed poses or small variation of viewpoints and suffer from more realistic scenes with high motion complexity (e.g., movies). At the same time, long videos such as movies often have logical story lines and are composed of continuously developmental plots. In this situation, different persons usually meet on a specific occasion, in which informative social cues are performed. We notice that these social cues could semantically profile their personality and benefit person search task in two aspects. First, persons with certain relationships usually co-occur in short intervals; in case one of them is easier to be identified, the social relation cues extracted from their co-occurrences could further benefit the identification for the harder ones. Second, social relations could reveal the association between certain scenes and characters (e.g., classmate relationship may only exist among students), which could narrow down candidates into certain persons with a specific relationship. In this way, high-level social relation cues could improve the effectiveness of person search. Along this line, in this article, we propose a social context-aware framework, which fuses visual and social contexts to profile persons in more semantic perspectives and better deal with person search task in complex scenarios. Specifically, we first segment videos into several independent scene units and abstract out social contexts within these scene units. Then, we construct inner-personal links through a graph formulation operation for each scene unit, in which both visual cues and relation cues are considered. Finally, we perform a relation-aware label propagation to identify characters’ occurrences, combining low-level semantic cues (i.e., visual cues) and high-level semantic cues (i.e., relation cues) to further enhance the accuracy. Experiments on real-world datasets validate that our solution outperforms several competitive baselines.
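
The last two steps of the abstract above (building a graph from visual and relation cues, then propagating labels over it) can be illustrated with a generic sketch. This is not the authors' method: it is textbook label propagation on a small hand-made graph. The function names `fuse` and `propagate_labels`, the mixing weights `beta` and `alpha`, and all similarity values are assumptions invented for illustration.

```python
def fuse(visual, relation, beta=0.5):
    """Blend two edge-weight matrices; beta is a hypothetical mixing weight."""
    return [[beta * v + (1 - beta) * r for v, r in zip(v_row, r_row)]
            for v_row, r_row in zip(visual, relation)]


def propagate_labels(weights, seeds, num_labels, alpha=0.8, iters=50):
    """Generic label propagation: F <- alpha * W_norm * F + (1 - alpha) * Y."""
    n = len(weights)
    # Row-normalise so each node takes a weighted average of its neighbours.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total > 0 else 0.0 for w in row])
    # One-hot seed matrix Y: identified occurrences get 1.0 in their label column.
    y = [[0.0] * num_labels for _ in range(n)]
    for node, label in seeds.items():
        y[node][label] = 1.0
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(norm[i][j] * f[j][k] for j in range(n))
              + (1 - alpha) * y[i][k]
              for k in range(num_labels)]
             for i in range(n)]
    # Each node takes the label with the highest propagated score.
    return [max(range(num_labels), key=lambda k: f[i][k]) for i in range(n)]


# Toy scene with four person occurrences (all values are made up).
# Occurrence 1 looks equally similar to occurrences 0 and 3.
visual = [[0.0, 0.5, 0.1, 0.0],
          [0.5, 0.0, 0.1, 0.5],
          [0.1, 0.1, 0.0, 0.6],
          [0.0, 0.5, 0.6, 0.0]]
# Relation cues link occurrences 0-1 and 2-3.
relation = [[0.0, 1.0, 0.0, 0.0],
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
            [0.0, 0.0, 1.0, 0.0]]

# Occurrences 0 and 3 are the easy-to-identify seeds (characters 0 and 1).
labels = propagate_labels(fuse(visual, relation), seeds={0: 0, 3: 1}, num_labels=2)
print(labels)
```

In this toy graph the relation edges are what pull the visually ambiguous occurrence toward one seed rather than the other, which is the intuition the abstract describes for harder-to-identify persons.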

2022 ◽  
Vol 15 (1) ◽  
pp. 1-16
Francisca Pessanha ◽  
Almila Akdag Salah

Computational technologies have revolutionized the archival sciences field, prompting new approaches to process the extensive data in these collections. Automatic speech recognition and natural language processing create unique possibilities for analysis of oral history (OH) interviews, where otherwise the transcription and analysis of the full recording would be too time consuming. However, many oral historians note the loss of aural information when converting the speech into text, pointing out the relevance of subjective cues for a full understanding of the interviewee narrative. In this article, we explore various computational technologies for social signal processing and their potential application space in OH archives, as well as neighboring domains where qualitative study is a frequently used method. We also highlight the latest developments in key technologies for multimedia archiving practices such as natural language processing and automatic speech recognition. We discuss the analysis of both visual cues (body language and facial expressions) and non-visual cues (paralinguistics, breathing, and heart rate), stating the specific challenges introduced by the characteristics of OH collections. We argue that applying social signal processing to OH archives will have a wider influence than solely OH practices, bringing benefits for various fields from humanities to computer sciences, as well as to archival sciences. Looking at human emotions and somatic reactions on extensive interview collections would give scholars from multiple fields the opportunity to focus on feelings, mood, culture, and subjective experiences expressed in these interviews on a larger scale.

2022 ◽  
Vol 9 ◽  
Diana Rubene ◽  
Utku Urhan ◽  
Velemir Ninkovic ◽  
Anders Brodin

Ability to efficiently localize productive foraging habitat is crucial for nesting success of insectivorous birds. Some bird species can use olfaction to identify caterpillar-infested trees by detection of herbivore induced plant volatiles (HIPVs), but these cues probably need to be learned. So far, we know very little about the process of olfactory learning in birds, whether insectivorous species have a predisposition for detecting and learning HIPVs, due to the high ecological significance of these odors, and how olfaction is integrated with vision in making foraging decisions. In a standardized setup, we tested whether 35 wild-caught great tits (Parus major) show any preference for widely abundant HIPVs compared to neutral (non-induced) plant odors, how fast they learn to associate olfactory, visual and multimodal foraging cues with food, and whether the olfactory preferences and learning speed were influenced by bird sex or habitat (urban or rural). We also tested how fast birds switch to a new cue of the same modality. Great tits showed no initial preference for HIPVs compared to neutral odors, and they learned all olfactory cues at a similar pace, except for methyl salicylate (MeSA), which they learned more slowly. We also found no differences in learning speeds between visual, olfactory and multimodal foraging cues, but birds learned the second cue they were offered faster than the first one. Bird sex or habitat had no effect on learning speed or olfactory preference, but urban birds tended to learn visual cues more slowly. We conclude that insectivorous birds utilize olfactory and visual cues with similar efficiency in foraging, and that they probably don't have any special predisposition toward the tested HIPVs. These results confirm that great tits are flexible foragers with good learning abilities.

Foods ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 201
Stella Nordhagen ◽  
James Lee ◽  
Nwando Onuigbo-Chatta ◽  
Augustine Okoruwa ◽  
Eva Monterrosa ◽  

This paper uses detailed data from in-depth interviews with consumers (n = 47) and vendors (n = 37) in three traditional markets in Birnin Kebbi, Nigeria. We used observations from those markets to examine how consumers and vendors identify and avoid or manage food safety risks and whom they hold responsible and trust when it comes to ensuring food safety. At the level of the vendor, consumers mentioned seeking “clean” or “neat” vendors or stalls. Cleanliness was primarily related to the appearance of the vendor, stall, and surroundings; reliance on trusted, known vendors was also noted. Food products themselves were largely evaluated based on visual cues: insects, holes, and colors—with some reliance on smell, also. Similarly, vendors assessed safety of food from suppliers based on a visual assessment or reliance on trusted relationships. On the second research question, both consumers and vendors largely placed responsibility for ensuring food safety on government; when asked specifically, consumers also named specific steps that vendors could take to ensure food safety. Consumers and vendors also generally felt that they could limit many food safety risks through identifying the “good” products in the market or from suppliers. The paper discusses the implications of these results for behavior change interventions.

Eduardo Guimarães Santos ◽  
Lucas Camelo Depollo ◽  
Ricardo Bomfim Machado ◽  
Helga Correa Wiederhecker

2022 ◽  
Nicole E Wynne ◽  
Karthikeyan Chandrasegaran ◽  
Lauren Fryzlewicz ◽  
Clément Vinauger

The diurnal mosquitoes Aedes aegypti are vectors of several arboviruses, including dengue, yellow fever, and Zika viruses. To find a host to feed on, they rely on the sophisticated integration of olfactory, visual, thermal, and gustatory cues reluctantly emitted by the hosts. If the mosquitoes are detected by their target, the host may display defensive behaviors that mosquitoes need to be able to detect and escape. In humans, a typical response is a swat of the hand, which generates both mechanical and visual perturbations aimed at a mosquito. While the neuro-sensory mechanisms underlying the approach to the host have been the focus of numerous studies, the cues used by mosquitoes to detect and identify a potential threat remain largely understudied. In particular, the role of vision in mediating mosquitoes' ability to escape defensive hosts has yet to be analyzed. Here, we used programmable visual displays to generate expanding objects sharing characteristics with the visual component of an approaching hand and quantified the behavioral response of female mosquitoes. Results show that Ae. aegypti is capable of using visual information to decide whether to feed on an artificial host mimic. Stimulations delivered in an LED flight arena further reveal that landed Ae. aegypti females display a stereotypical escape strategy by taking off at an angle that is a function of the distance and direction of stimulus introduction. Altogether, this study demonstrates mosquitoes can use isolated visual cues to detect and avoid a potential threat.

Elke B. Lange ◽  
Jens Fünderich ◽  
Hartmut Grimm

We investigated how visual and auditory information contributes to emotion communication during singing. Classically trained singers applied two different facial expressions (expressive/suppressed) to pieces from their song and opera repertoire. Recordings of the singers were evaluated by laypersons or experts, presented to them in three different modes: auditory, visual, and audio–visual. A manipulation check confirmed that the singers succeeded in manipulating the face while keeping the sound highly expressive. Analyses focused on whether the visual difference or the auditory concordance between the two versions determined perception of the audio–visual stimuli. When evaluating expressive intensity or emotional content, a clear effect of visual dominance showed. Experts made more use of the visual cues than laypersons. Consistency measures between uni-modal and multimodal presentations did not explain the visual dominance. The evaluation of seriousness was applied as a control. The uni-modal stimuli were rated as expected, but multisensory evaluations converged without visual dominance. Our study demonstrates that long-term knowledge and task context affect multisensory integration. Even though singers’ orofacial movements are dominated by sound production, their facial expressions can communicate emotions composed into the music, and observers do not rely on audio information instead. Studies such as ours are important to understand multisensory integration in applied settings.

2022 ◽  
Vol 12 ◽  
Daphne J. Geerse ◽  
Bert Coolen ◽  
Jacobus J. van Hilten ◽  
Melvyn Roerdink

External visual cueing is a well-known means to target freezing of gait (FOG) in Parkinson's disease patients. Holocue is a wearable visual cueing application that allows the HoloLens 1 mixed-reality headset to present on-demand patient-tailored action-relevant 2D and 3D holographic visual cues in free-living environments. The aim of this study involving 24 Parkinson's disease patients with dopaminergic “ON state” FOG was two-fold. First, to explore unfamiliarity and habituation effects associated with wearing the HoloLens on FOG. Second, to evaluate the potential immediate effect of Holocue on alleviating FOG in the home environment. Three sessions were conducted to examine (1) the effect of wearing the unfamiliar HoloLens on FOG by comparing walking with and without the HoloLens, (2) habituation effects to wearing the HoloLens by comparing FOG while walking with HoloLens over sessions, and (3) the potential immediate effect of Holocue on FOG by comparing walking with HoloLens with and without Holocue. Wearing the HoloLens (without Holocue) did significantly increase the number and duration of FOG episodes, but this unfamiliarity effect disappeared with habituation over sessions. This not only emphasizes the need for sufficient habituation to unfamiliar devices, but also testifies to the need for research designs with appropriate control conditions when examining effects of unfamiliar wearable cueing devices. Holocue had overall no immediate effect on FOG, although objective and subjective benefits were observed for some individuals, most notably those with long and/or many FOG episodes. Our participants raised valuable opportunities to improve Holocue and confirmed our assumptions about current and anticipated future design choices, which supports ongoing Holocue development for and with end users.

Shrinidhi Kanchi ◽  
Alain Pagani ◽  
Hamam Mokayed ◽  
Marcus Liwicki ◽  
Didier Stricker ◽  

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches for document classification, known as image-based and multimodal approaches. The image-based document classification approaches are solely based on the inherent visual cues of the document images. In contrast, the multimodal approach co-learns the visual and textual features, and it has proved to be more effective. Nonetheless, these approaches require a huge amount of data. This paper presents a novel approach for document classification that works with a small amount of data and outperforms other approaches. The proposed approach incorporates a hierarchical attention network (HAN) for the textual stream and EfficientNet-B0 for the image stream. The hierarchical attention network in the textual stream uses dynamic word embeddings from a fine-tuned BERT. HAN incorporates both word-level and sentence-level features. While the earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small amount of data (Tobacco-3482). To this end, we trained the neural network on Tobacco-3482 from scratch. Thereby, we outperform the state of the art by obtaining an accuracy of 90.3%. This results in a relative error reduction rate of 7.9%.
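
The two closing figures of the abstract above can be related with a short calculation. The abstract does not define "relative error reduction", so the sketch below assumes the usual definition, (baseline error − new error) / baseline error. The baseline accuracy it prints is back-calculated from the two stated numbers under that assumption. It is not reported in the abstract, and the function names are illustrative.

```python
def relative_error_reduction(baseline_acc, new_acc):
    """Assumed definition: fraction of the baseline error that was removed."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc, reduction):
    """Invert the formula above to recover the baseline accuracy it implies."""
    new_err = 1.0 - new_acc
    return 1.0 - new_err / (1.0 - reduction)


# Figures stated in the abstract: 90.3% accuracy, 7.9% relative error reduction.
implied = implied_baseline_accuracy(0.903, 0.079)
print(f"implied baseline accuracy: {implied:.4f}")
print(f"round-trip reduction: {relative_error_reduction(implied, 0.903):.3f}")
```

Under this assumed definition, the stated figures correspond to a previous best accuracy of roughly 89.5%, that is, an error rate falling from about 10.5% to 9.7%.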

2022 ◽  
pp. 795-812
Robert Costello ◽  
Murray Lambert ◽  
Florian Kern

This research investigates how the accessibility of video games can be improved for deaf and hearing-impaired players. The journal is divided into several areas: first, examining the use of subtitles and closed captions in video games; second, how visual cues can be used to provide better accessibility for deaf and hearing-impaired gamers, including creating suitable atmosphere and mood through lighting and providing a varied environment that prevents players from getting bored with a game's setting; and finally, exploring current best practices within the gaming industry. The research data reveal the issues with accessibility as well as how a lack of accessibility affects deaf and hearing-impaired gamers. This investigation supports evidence from other researchers in the field that accessibility features for deaf and hearing-impaired players can be considered and implemented.
