Epistemic guidance of visual attention for robotic agents in dynamic visual scenes

2021 ◽  
Author(s):  
Arindam Bhakta

<p>Humans and many animals can selectively sample important parts of their visual surroundings to carry out daily activities such as foraging or finding prey or mates. Selective attention allows them to use the limited resources of the brain efficiently by deploying sensory apparatus to collect data believed to be pertinent to the organism's current task. Robots or other computational agents operating in dynamic environments are similarly exposed to a wide variety of stimuli, which they must process with limited sensory and computational resources. Developing computational models of visual attention has long been of interest, as such models enable artificial systems to select necessary information from complex and cluttered visual environments, reducing the data-processing burden. Biologically inspired computational saliency models have previously been used to selectively sample a visual scene, but these have limited capacity to deal with dynamic environments and no capacity to reason about uncertainty when planning their visual scene sampling strategy. These models typically treat contrast in colour, shape or orientation as salient and sample locations of a visual scene in descending order of salience. After each observation, the area around the sampled location is blocked using an inhibition-of-return mechanism to keep it from being revisited. This thesis generalises the traditional model of saliency by using an adaptive Kalman filter estimator to model an agent's understanding of the world, and uses a utility-function-based approach to describe what the agent cares about in the visual scene. This allows the agent to adopt a richer set of perceptual strategies than is possible with the classical winner-take-all mechanism of the traditional saliency model. In contrast with the traditional approach, inhibition of return is achieved without implementing an extra mechanism on top of the underlying structure.
This thesis demonstrates five utility functions, each encapsulating a perceptual state that is valued by the agent. Each utility function thereby produces a distinct perceptual behaviour that is matched to particular scenarios. The resulting visual attention distribution of the five proposed utility functions is demonstrated on five real-life videos. In most of the experiments, pixel intensity has been used as the source of the saliency map. As the proposed approach is independent of the saliency map used, it can be combined with other, more complex saliency-map-building models. Moreover, the underlying structure of the model is sufficiently general and flexible that it can serve as the basis of a new range of more sophisticated gaze control systems.</p>
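The idea that inhibition of return can emerge from uncertainty tracking, rather than from a separate blocking mechanism, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes one independent scalar Kalman filter per location with fixed (non-adaptive) noise variances, and a single hypothetical utility, the expected reduction in variance from observing a location. The function name `run_attention` and all parameter values are illustrative assumptions.

```python
def run_attention(frames, process_var=0.05, obs_var=0.01, init_var=1.0):
    """Choose one location to observe per frame.

    frames: list of frames, each a flat list of saliency values
            (e.g. pixel intensities), one value per location.
    Returns the visited locations plus the final per-location
    mean and variance estimates.
    Assumption: fixed noise variances, not an adaptive filter.
    """
    n = len(frames[0])
    mean = [0.0] * n        # estimated saliency at each location
    var = [init_var] * n    # uncertainty about each estimate
    visited = []

    for frame in frames:
        # Predict step: the scene may change, so uncertainty grows everywhere.
        var = [v + process_var for v in var]

        # Hypothetical utility: expected variance reduction if location i
        # is observed, i.e. v - v * obs_var / (v + obs_var).
        gain = [v * v / (v + obs_var) for v in var]
        target = max(range(n), key=gain.__getitem__)

        # Update step: only the observed location receives a measurement.
        k = var[target] / (var[target] + obs_var)  # Kalman gain
        mean[target] += k * (frame[target] - mean[target])
        var[target] *= 1.0 - k

        visited.append(target)

    return visited, mean, var


if __name__ == "__main__":
    static_scene = [[0.2, 0.9, 0.4, 0.1]] * 8
    visited, mean, var = run_attention(static_scene)
    print(visited)
```

Because an observation collapses the variance at the sampled location, that location offers little utility on the next step and is not revisited until its uncertainty has grown back. A different utility (for example, one that weights the gain by estimated saliency) would produce a different sampling pattern from the same filter state.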


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5178
Author(s):  
Sangbong Yoo ◽  
Seongmin Jeong ◽  
Seokyeon Kim ◽  
Yun Jang

Gaze movement and visual stimuli have been utilized to analyze human visual attention intuitively. Gaze behavior studies mainly show statistical analyses of eye movements and human visual attention. During these analyses, eye movement data and the saliency map are presented to the analysts as separate views or merged views. However, the analysts become frustrated when they need to memorize all of the separate views or when the eye movements obscure the saliency map in the merged views. It is therefore not easy to analyze how visual stimuli affect gaze movements, since existing techniques focus excessively on the eye movement data. In this paper, we propose a novel visualization technique for analyzing gaze behavior using saliency features as visual clues to express the visual attention of an observer. The visual clues that represent visual attention are analyzed to reveal which saliency features are prominent for the visual stimulus analysis. We visualize the gaze data with the saliency features to interpret the visual attention. We analyze gaze behavior with the proposed visualization to evaluate whether embedding saliency features within the visualization helps us understand the visual attention of an observer.


Author(s):  
Athanasios Drigas ◽  
Maria Karyotaki

Motivation, affect and cognition are interrelated. However, explaining the control of attentional deployment, and more specifically providing a more complete account of the interactions between the dorsal and ventral processing streams, is still a challenge. The interaction between overt and covert attention is particularly important for models concerned with visual search. Further modeling of such interactions can help to scrutinize many mechanisms, such as saccadic suppression, dynamic remapping of the saliency map and inhibition of return, covert pre-selection of targets for overt saccades and online understanding of complex visual scenes.


Author(s):  
ARON LARSSON ◽  
JIM JOHANSSON ◽  
LOVE EKENBERG ◽  
MATS DANIELSON

We present a decision tree evaluation method for analyzing multi-attribute decisions under risk, where information is numerically imprecise. The approach extends the use of additive and multiplicative utility functions for supporting evaluation of imprecise statements, relaxing requirements for precise estimates of decision parameters. Information is modeled in convex sets of utility and probability measures restricted by closed intervals. Evaluation is done relative to a set of rules, generalizing the concept of admissibility, computationally handled through optimization of aggregated utility functions. Pros and cons of two approaches, and tradeoffs in selecting a utility function, are discussed.
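To make the notion of evaluating imprecise statements concrete, the sketch below bounds the expected utility of a single chance node when probabilities and utilities are only known as closed intervals. It is an illustration under stated assumptions, not the authors' method: it covers one node rather than a full multi-attribute tree, and the function name `expected_utility_bounds` is hypothetical. The bounds come from a standard greedy solution to the underlying linear program: start every probability at its lower bound, then hand the remaining mass to the outcomes with the lowest (or highest) utility first.

```python
def expected_utility_bounds(prob_intervals, util_intervals):
    """Lower and upper expected utility for one chance node.

    prob_intervals: list of (low, high) bounds on each outcome's probability.
    util_intervals: list of (low, high) bounds on each outcome's utility.
    Assumption: the only constraints are the intervals and sum(p) == 1.
    """
    p_low = [lo for lo, _ in prob_intervals]
    p_high = [hi for _, hi in prob_intervals]
    if sum(p_low) > 1.0 + 1e-12 or sum(p_high) < 1.0 - 1e-12:
        raise ValueError("probability intervals admit no distribution")

    def extreme(utils, prefer_high):
        # Start at the lower probability bounds, then spend the remaining
        # mass on outcomes in order of utility (greedy LP solution).
        p = p_low[:]
        remaining = 1.0 - sum(p_low)
        order = sorted(range(len(utils)), key=lambda i: utils[i],
                       reverse=prefer_high)
        for i in order:
            add = min(p_high[i] - p[i], remaining)
            p[i] += add
            remaining -= add
        return sum(pi * ui for pi, ui in zip(p, utils))

    # Probabilities are non-negative, so the minimum uses the lowest
    # utilities and the maximum uses the highest ones.
    lower = extreme([lo for lo, _ in util_intervals], prefer_high=False)
    upper = extreme([hi for _, hi in util_intervals], prefer_high=True)
    return lower, upper


if __name__ == "__main__":
    probs = [(0.2, 0.5), (0.3, 0.6), (0.1, 0.4)]
    utils = [(0.0, 0.1), (0.4, 0.6), (0.9, 1.0)]
    print(expected_utility_bounds(probs, utils))
```

Two alternatives can then be compared by their intervals: if one alternative's lower bound exceeds another's upper bound, the first is preferable under every admissible assignment, while overlapping intervals leave the choice open.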


2021 ◽  
Author(s):  
Shikha Suman ◽  
Ashutosh Karna ◽  
Karina Gibert

Hierarchical clustering is one of the preferred choices for understanding the underlying structure of a dataset and defining typologies, with multiple applications in real life. Among the existing clustering algorithms, the hierarchical family is one of the most popular, as it makes it possible to understand the inner structure of the dataset and yields the number of clusters as an output, unlike popular methods such as k-means. The granularity of the final clustering can be adjusted to the goals of the analysis. The number of clusters in a hierarchical method is determined by analysing the resulting dendrogram. Experts have criteria for visually inspecting the dendrogram and determining the number of clusters. Finding automatic criteria that imitate experts in this task is still an open problem, and dependence on the expert to cut the tree is a limitation in real applications in fields such as Industry 4.0 and additive manufacturing. This paper analyses several cluster validity indexes in the context of determining a suitable number of clusters in hierarchical clustering. A new Cluster Validity Index (CVI) is proposed that captures the implicit criteria used by experts when analysing dendrograms. The proposal has been applied to a range of datasets and validated against expert ground truth, outperforming the state of the art while also significantly reducing the computational cost.


2021 ◽  
Author(s):  
Philipe M. Bujold ◽  
Simone Ferrari-Toniolo ◽  
Leo Chi U Seak ◽  
Wolfram Schultz

Decisions can be risky or riskless, depending on the outcomes of the choice. Expected Utility Theory describes risky choices as a utility maximization process: we choose the option with the highest subjective value (utility), which we compute considering both the option’s value and its associated risk. According to the random utility maximization framework, riskless choices could also be based on a utility measure. Neuronal mechanisms of utility-based choice may thus be common to both risky and riskless choices. This assumption would require the existence of a utility function that accounts for both risky and riskless decisions. Here, we investigated whether the choice behavior of macaque monkeys in riskless and risky decisions could be described by a common underlying utility function. We found that the utility functions elicited in the two choice scenarios were different from each other, even after taking into account the contribution of subjective probability weighting. Our results suggest that distinct utility representations exist for riskless and risky choices, which could reflect distinct neuronal representations of the utility quantities, or distinct brain mechanisms for risky and riskless choices. The different utility functions should be taken into account in neuronal investigations of utility-based choice.


Author(s):  
Adhi Prahara ◽  
Murinto Murinto ◽  
Dewi Pramudi Ismi

The philosophy of human visual attention is explained scientifically in cognitive psychology and neuroscience, and then modeled computationally in computer science and engineering. Visual attention models have been applied in computer vision systems for object detection, object recognition, image segmentation, image and video compression, action recognition, visual tracking, and so on. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers the biological perspective of visual attention, including the visual pathway and the theory of visual attention, through to computational models of bottom-up visual attention that generate a saliency map. The study compares some models at each stage and observes whether the stage is inspired by the biological architecture, concept, or behavior of human visual attention. The study finds that low-level features, the center-surround mechanism, sparse representation, and higher-level guidance with intrinsic cues dominate bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.


Author(s):  
Steven P. Tipper ◽  
Bruce Weaver ◽  
Loretta M. Jerreat ◽  
Arloene L. Burak

Author(s):  
Kai Essig ◽  
Oleg Strogan ◽  
Helge Ritter ◽  
Thomas Schack

Various computational models of visual attention rely on the extraction of salient points or proto-objects, i.e., discrete units of attention, computed from bottom-up image features. In recent years, different solutions integrating top-down mechanisms were implemented, as research has shown that although eye movements initially are solely influenced by bottom-up information, after some time goal driven (high-level) processes dominate the guidance of visual attention towards regions of interest (Hwang, Higgins & Pomplun, 2009). However, even these improved modeling approaches are unlikely to generalize to a broader range of application contexts, because basic principles of visual attention, such as cognitive control, learning and expertise, have thus far not sufficiently been taken into account (Tatler, Hayhoe, Land & Ballard, 2011). In some recent work, the authors showed the functional role and representational nature of long-term memory structures for human perceptual skills and motor control. Based on these findings, the chapter extends a widely applied saliency-based model of visual attention (Walther & Koch, 2006) in two ways: first, it computes the saliency map using the cognitive visual attention approach (CVA) that shows a correspondence between regions of high saliency values and regions of visual interest indicated by participants’ eye movements (Oyekoya & Stentiford, 2004). Second, it adds an expertise-based component (Schack, 2012) to represent the influence of the quality of mental representation structures in long-term memory (LTM) and the roles of learning on the visual perception of objects, events, and motor actions.


2018 ◽  
Vol 61 (5) ◽  
pp. 1157-1170 ◽  
Author(s):  
Jiali Liang ◽  
Krista Wilkinson

Purpose A striking characteristic of the social communication deficits in individuals with autism is atypical patterns of eye contact during social interactions. We used eye-tracking technology to evaluate how the number of human figures depicted and the presence of sharing activity between the human figures in still photographs influenced visual attention by individuals with autism, typical development, or Down syndrome. We sought to examine visual attention to the contents of visual scene displays, a growing form of augmentative and alternative communication support. Method Eye-tracking technology recorded point-of-gaze while participants viewed 32 photographs in which either 2 or 3 human figures were depicted. Sharing activities between these human figures were either present or absent. The sampling rate was 60 Hz; that is, the technology gathered 60 samples of gaze behavior per second, per participant. Gaze behaviors, including latency to fixate and time spent fixating, were quantified. Results The overall gaze behaviors were quite similar across groups, regardless of the social content depicted. However, individuals with autism were significantly slower than the other groups in latency to first view the human figures, especially when there were 3 people depicted in the photographs (as compared with 2 people). When participants' own viewing pace was considered, individuals with autism resembled those with Down syndrome. Conclusion The current study supports the inclusion of social content with various numbers of human figures and sharing activities between human figures into visual scene displays, regardless of the population served. Study design and reporting practices in eye-tracking literature as it relates to autism and Down syndrome are discussed. Supplemental Material https://doi.org/10.23641/asha.6066545

