Analyzing vision at the complexity level

1990 ◽  
Vol 13 (3) ◽  
pp. 423-445 ◽  
Author(s):  
John K. Tsotsos

Abstract: The general problem of visual search can be shown to be computationally intractable in a formal, complexity-theoretic sense, yet visual search is extensively involved in everyday perception, and biological systems manage to perform it remarkably well. Complexity level analysis may resolve this contradiction. Visual search can be reshaped into a tractable problem through approximations and by optimizing the resources devoted to visual processing. Architectural constraints can be derived using the minimum cost principle to rule out a large class of potential solutions. The evidence speaks strongly against bottom-up approaches to vision. In particular, the constraints suggest an attentional mechanism that exploits knowledge of the specific problem being solved. This analysis of visual search performance in terms of attentional influences on visual information processing and complexity satisfaction allows a large body of neurophysiological and psychological evidence to be tied together.
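To make the intractability claim concrete, the sketch below contrasts the worst-case cost of unbounded visual search (evaluating every subset of image locations) with a task-directed search that uses knowledge of the target to restrict the candidates. It is an illustrative toy, not Tsotsos's formal reduction; the window size and location counts are assumptions.

```python
# Illustrative toy contrasting the combinatorial cost of unbounded visual
# search (every subset of image locations is a candidate) with task-directed
# search that exploits knowledge of the target. Not Tsotsos's formal proof;
# the window size and location counts are assumed for illustration.

def unbounded_cost(n_locations: int) -> int:
    """Worst case: every subset of locations may have to be evaluated (2^P)."""
    return 2 ** n_locations

def task_directed_cost(n_locations: int, window: int) -> int:
    """With the target's extent known, only windows of that size need testing,
    which scales roughly linearly with the number of locations."""
    return n_locations * window

if __name__ == "__main__":
    for p in (8, 16, 32, 64):
        print(p, unbounded_cost(p), task_directed_cost(p, window=4))
```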

2021 ◽  
Author(s):  
Garance Merholz ◽  
Laetitia Grabot ◽  
Rufin VanRullen ◽  
Laura Dugué

Abstract: Attention has been found to sample visual information periodically, in a wide range of frequencies below 20 Hz. This periodicity may be supported by brain oscillations at corresponding frequencies. We propose that part of the discrepancy in periodic frequencies observed in the literature is due to differences in attentional demands, resulting from heterogeneity in tasks performed. To test this hypothesis, we used visual search and manipulated task complexity, i.e., target discriminability (high, medium, low) and number of distractors (set size), while electro-encephalography was simultaneously recorded. We replicated previous results showing that the phase of pre-stimulus low-frequency oscillations predicts search performance. Crucially, such effects were observed at increasing frequencies within the theta-alpha range (6-18 Hz) for decreasing target discriminability. In medium and low discriminability conditions, correct responses were further associated with higher post-stimulus phase-locking than incorrect ones, in increasing frequency and latency. Finally, the larger the set size, the later the post-stimulus effect peaked. Together, these results suggest that increased complexity (lower discriminability or larger set size) requires more attentional cycles to perform the task, partially explaining discrepancies between reports of attentional sampling. Low-frequency oscillations structure the temporal dynamics of neural activity and aid top-down, attentional control for efficient visual processing.
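The phase-locking analyses described here can be sketched generically: band-pass the single-trial EEG around a frequency of interest, extract the instantaneous phase just before stimulus onset, and compare the inter-trial phase-locking value (PLV) between correct and incorrect trials. The sketch below is an assumed, standard pipeline, not the authors' code; variable names such as eeg, fs, and stim_onset are hypothetical.

```python
# Minimal sketch of a pre-stimulus phase-locking analysis of the kind
# described above. Generic pipeline with hypothetical variable names,
# not the authors' actual analysis code.

import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def prestim_plv(eeg, fs, stim_onset, freq, bandwidth=2.0):
    """eeg: (n_trials, n_samples) single-channel data; stim_onset: sample index.
    Returns the inter-trial phase-locking value just before stimulus onset."""
    low, high = freq - bandwidth, freq + bandwidth
    b, a = butter(4, [low, high], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=-1)
    phase = np.angle(hilbert(filtered, axis=-1))[:, stim_onset - 1]
    return np.abs(np.mean(np.exp(1j * phase)))

# Hypothetical usage: compare phase consistency for correct vs. incorrect
# trials at each frequency in the theta-alpha range (6-18 Hz).
# for f in range(6, 19):
#     plv_correct = prestim_plv(eeg[correct], fs, onset, f)
#     plv_incorrect = prestim_plv(eeg[~correct], fs, onset, f)
```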


Author(s):  
Jeffrey C. Joe ◽  
Casey R. Kovesdi ◽  
Andrea Mack ◽  
Tina Miyake

This study examined the relationship between how visual information is organized and people’s visual search performance. Specifically, we systematically varied how the visual search information was organized (from well-organized to disorganized) and then asked participants to perform a visual search task that involved finding and identifying a number of visual targets within a field of visual non-targets. We hypothesized that the visual search task would be easier when the information was well-organized than when it was disorganized. We further speculated that visual search performance would be mediated by cognitive workload, and that the results could be generally described by the well-established speed-accuracy tradeoff. This paper presents the details of the study we designed and our results.
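As a rough illustration of the speed-accuracy tradeoff invoked above, a simple sequential-sampling toy shows how raising the decision threshold buys accuracy at the cost of time. The parameters (drift, noise, thresholds) are arbitrary assumptions, not values from this study.

```python
# Toy simulation of the speed-accuracy tradeoff: evidence for the correct
# response accumulates noisily, and a higher decision threshold trades
# longer response times for higher accuracy. Illustrative sketch with
# assumed parameters, not the study's model.

import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(threshold, drift=0.1, noise=1.0, max_steps=10_000):
    """Random-walk accumulator; returns (correct, n_steps)."""
    evidence = 0.0
    for step in range(1, max_steps + 1):
        evidence += drift + noise * rng.standard_normal()
        if evidence >= threshold:
            return True, step
        if evidence <= -threshold:
            return False, step
    return evidence > 0, max_steps

for threshold in (2.0, 5.0, 10.0):
    results = [simulate_trial(threshold) for _ in range(2000)]
    accuracy = np.mean([c for c, _ in results])
    mean_rt = np.mean([t for _, t in results])
    print(f"threshold={threshold}: accuracy={accuracy:.2f}, mean steps={mean_rt:.0f}")
```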


2019 ◽  
Author(s):  
Louisa Lok Yee Man ◽  
Karolina Krzys ◽  
Monica Castelhano

When you walk into a room, you perceive visual information that is both close to you and farther away in depth. In the current study, we investigated how visual search is affected by information across scene depth and contrasted it with the effect of semantic scene context. Across two experiments, participants searched for target objects appearing either in the foreground or background regions of scenes that were either normally configured or had semantically mismatched foreground and background contexts (Chimera scenes; Castelhano, Fernandes, & Theriault, 2018). In Experiment 1, participants had shorter latencies and fewer fixations to the target when it appeared in the foreground. This pattern was not explained by target size. In Experiment 2, a preview of the scene was added before the search to better establish scene context. Results again showed a Foreground Bias, with faster search performance for foreground targets. Together, these studies suggest processing differences across depth in scenes, with a preference for objects closer in space.


2018 ◽  
Vol 10 (1) ◽  
pp. 19
Author(s):  
Yasuhiro Takeshima

Previous studies have not yet sufficiently investigated the relationship between visual processing and the intensity of negative emotional valence, i.e., the degree of a dimensional component of emotional information. In Experiment 1, participants performed a visual search task with three valence levels: neutral (control) stimuli and high- and low-intensity negative emotional valence stimuli (both angry faces). Results indicated that response times were shorter for high-intensity negative emotional valence stimuli than for low-intensity ones. In Experiment 2, participants were asked to detect a target face among successively presented faces. The facial stimuli were the same as in Experiment 1. Results revealed that accuracy was higher for angry faces than for neutral faces; however, performance did not differ as a function of negative emotional valence intensity. Overall, task performance differences between negative emotional valence intensities were observed in visual search but not in the attentional blink. Therefore, negative emotional valence intensity likely contributes to efficient encoding of visual information.


2020 ◽  
Author(s):  
Han Zhang

Mind-wandering (MW) is ubiquitous and is associated with reduced performance across a wide range of tasks. Recent studies have shown that MW can be related to changes in gaze parameters. In this dissertation, I explored the link between eye movements and MW in three contexts that involve complex cognitive processing: visual search, scene perception, and reading comprehension. Study 1 examined how MW affects visual search performance, particularly the ability to suppress salient but irrelevant distractors during search. Study 2 used a scene-encoding task to examine how MW affects the way eye movements change over time and their relationship with scene content. Study 3 examined how MW affects readers’ ability to detect semantic incongruities in the text and to revise their understanding as they read jokes. All three studies showed that MW was associated with decreased task performance at the behavioral level (e.g., response time, recognition, and recall). Eye-tracking further showed that these behavioral costs can be traced to deficits in specific cognitive processes. The final chapter of the dissertation explored whether there are context-independent eye movement features of MW. MW manifests itself in different ways depending on task characteristics. In tasks that require extensive sampling of the stimuli (e.g., reading and scene viewing), MW was related to a global reduction in visual processing. This was not the case for the search task, which involved speeded, simple visual processing; there, MW was instead related to increased looking time on the target after it had already been located. MW affects the coupling between cognitive effort and task demands, but the nature of this decoupling depends on the specific features of particular tasks.


2017 ◽  
Vol 3 (1) ◽  
Author(s):  
Zhiyuan Wang ◽  
Simona Buetti ◽  
Alejandro Lleras

Previous work in our lab has demonstrated that efficient visual search with a fixed target has a reaction time by set size function that is best characterized by logarithmic curves, and that the steepness of these logarithmic curves is determined by the similarity between target and distractor items (Buetti et al., 2016). A theoretical account of these findings was proposed, namely that a parallel, unlimited-capacity, exhaustive processing architecture underlies such data. Here, we conducted two experiments to extend these findings to a set of real-world stimuli, in both homogeneous and heterogeneous search displays. We used computational simulations of this architecture to identify a way to predict RT performance in heterogeneous search using parameters estimated from homogeneous search data. Further, by examining the systematic deviations of the observed data from our predictions, we found evidence that early visual processing of individual items is not independent: items in homogeneous displays seemed to facilitate each other’s processing by a multiplicative factor. These results challenge previous accounts of heterogeneity effects in visual search and demonstrate the explanatory and predictive power of an approach that combines computational simulations and behavioral data to better understand visual search performance.
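A minimal sketch of the kind of logarithmic reaction-time model at issue here: each distractor type contributes a term proportional to log(N+1), with a slope that indexes target-distractor similarity, and slopes estimated from homogeneous displays can be used to predict heterogeneous-display RTs. The additive combination below is the simple baseline prediction against which deviations would be assessed; the slope and baseline values are made-up numbers, not estimates from this study.

```python
# Sketch of a logarithmic RT-by-set-size model of the sort described above.
# Slope values (one per distractor type, indexing target-distractor similarity)
# would be estimated from homogeneous-display data; here they are assumed
# numbers for illustration, as is the baseline RT.

import math

def predict_rt(baseline_ms, distractor_counts, slopes_ms):
    """Predict RT for a display containing several distractor types.
    distractor_counts / slopes_ms: dicts keyed by distractor type."""
    rt = baseline_ms
    for kind, n in distractor_counts.items():
        rt += slopes_ms[kind] * math.log(n + 1)
    return rt

# Hypothetical parameters "estimated" from homogeneous displays:
slopes = {"very_similar": 35.0, "somewhat_similar": 18.0, "dissimilar": 5.0}

# A heterogeneous display mixing distractor types:
display = {"very_similar": 4, "somewhat_similar": 8, "dissimilar": 12}
print(predict_rt(baseline_ms=450.0, distractor_counts=display, slopes_ms=slopes))
```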


2019 ◽  
Author(s):  
Emmanuel Daucé ◽  
Pierre Albiges ◽  
Laurent U Perrinet

Abstract: We develop a visuomotor model that implements visual search as a focal accuracy-seeking policy, with the target’s position and category drawn independently from a common generative process. Consistent with the anatomical separation between the ventral and dorsal pathways, the model is composed of two pathways that respectively infer what to see and where to look. The "What" network is a classical deep learning classifier that only processes a small region around the center of fixation, providing a "foveal" accuracy. In contrast, the "Where" network processes the full visual field in a biomimetic fashion, using a log-polar retinotopic encoding that is preserved up to the action selection level. The foveal accuracy is used to train the "Where" network. After training, the "Where" network provides an "accuracy map" that serves to guide the eye toward peripheral objects. Comparing the two networks’ accuracies amounts to either selecting a saccade or keeping the eye at the center to identify the target. We test this setup on a simple task of finding a digit in a large, cluttered image. Our simulation results demonstrate the effectiveness of this approach, increasing by one order of magnitude the radius of the visual field within which the agent can detect and recognize a target, either through a single saccade or through multiple ones. Importantly, our log-polar treatment of the visual information exploits the strong compression performed at the sensory level, providing a way to implement visual search in a sub-linear fashion, in contrast with mainstream computer vision.
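The saccade-or-identify decision described above can be sketched generically: if the "Where" network's best predicted post-saccadic accuracy exceeds the current foveal accuracy from the "What" network, move the eye to that location; otherwise identify the target at the fovea. The sketch below is an assumed illustration with hypothetical names and shapes, not the authors' implementation.

```python
# Sketch of the saccade-or-identify decision described above. The two
# networks are stand-ins (any classifier returning a foveal accuracy and
# any map of predicted post-saccadic accuracies will do); names and shapes
# are assumptions, not the authors' code.

import numpy as np

def act(foveal_accuracy: float, accuracy_map: np.ndarray):
    """accuracy_map: predicted accuracy for each candidate saccade target
    (e.g., on a log-polar grid). Returns ('saccade', index) or ('identify', None)."""
    best = int(np.argmax(accuracy_map))
    if accuracy_map.flat[best] > foveal_accuracy:
        return "saccade", best      # move the eye to the most promising location
    return "identify", None         # target already well resolved at the fovea

# Hypothetical single step on a 24-azimuth x 8-eccentricity log-polar grid:
# action, where = act(foveal_accuracy=0.35, accuracy_map=np.random.rand(24 * 8))
```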


2011 ◽  
Vol 105 (6) ◽  
pp. 2891-2896 ◽  
Author(s):  
Neil G. Muggleton ◽  
Roger Kalla ◽  
Chi-Hung Juan ◽  
V. Walsh

Imaging, lesion, and transcranial magnetic stimulation (TMS) studies have implicated a number of brain regions in searching for a target defined by a combination of attributes. The necessity of both the frontal eye fields (FEF) and the posterior parietal cortex (PPC) for task performance has been shown by applying TMS over these regions. The effects of stimulation over these two areas have, thus far, proved to be remarkably similar, with the only dissociation reported being in the timing of their involvement. We tested the hypotheses that 1) FEF contributes to performance in terms of visual target detection (possibly by modulation of activity in extrastriate areas with respect to the target), and 2) PPC is involved in translating visual information for action. We used a task in which the presence (and location) of the target was indicated by an eye movement. Task disruption was seen with FEF TMS (with reduced accuracy on the task) but not with PPC stimulation. When a search task requiring a manual response was presented, disruption with PPC TMS was seen. These results show a dissociation of FEF and PPC contributions to visual search performance and indicate that PPC involvement depends on the response required by the task, whereas FEF involvement does not. This supports the idea that FEF is involved in visual processes in a manner that may not depend on the required response, whereas PPC seems to be involved when a manual motor response to a stimulus is required.


2012 ◽  
Vol 65 (6) ◽  
pp. 1068-1085 ◽  
Author(s):  
Gary Lupyan ◽  
Daniel Swingley

People often talk to themselves, yet very little is known about the functions of this self-directed speech. We explore the effects of self-directed speech on visual processing using a visual search task. According to the label feedback hypothesis (Lupyan, 2007a), verbal labels can change ongoing perceptual processing: for example, actually hearing “chair” compared to simply thinking about a chair can temporarily make the visual system a better “chair detector”. Participants searched for common objects while sometimes being asked to speak the target's name aloud. Speaking facilitated search, particularly when there was a strong association between the name and the visual target. As the discrepancy between the name and the target increased, speaking began to impair performance. Together, these results speak to the power of words to modulate ongoing visual processing.


2012 ◽  
Vol 25 (0) ◽  
pp. 193
Author(s):  
Klemens Knöferle ◽  
Charles Spence

Searching for a particular product in a supermarket can be a challenging business. The question therefore arises as to whether cues from the shopper’s other senses can be used to facilitate, guide, or bias visual search toward a particular product or product type. Prior research suggests that characteristic sounds can facilitate visual object localization (Iordanescu et al., 2008, 2010). Extending these findings to an applied setting, we investigated whether product-related sounds would facilitate visual search for products from different categories (e.g., champagne, potato crisps, deodorant) when arranged on a virtual shelf. On each trial, participants were visually presented with the name of a target product and then located the target within a virtual shelf display containing pictures of four different products (randomly selected from a set of nine). The visual display was randomly accompanied by a target-congruent, a target-incongruent, an unrelated, or no sound. Congruent sounds were semantically related to the target (e.g., uncorking a champagne bottle), incongruent sounds were related to the product shown in the corner opposite to the target, and unrelated sounds did not correspond to any of the products shown in the display. Participants found the target product significantly faster when the sound was congruent rather than incongruent with the target. All other pairwise comparisons were non-significant. These results extend the facilitatory crossmodal effect of characteristic sounds on visual search performance described earlier to the more realistic context of a virtual shelf display, showing that characteristic sounds can crossmodally enhance the visual processing of actual products.

