Cortical Circuits for Top-down Control of Perceptual Grouping

2021 ◽  
Author(s):  
Maria Kon ◽  
Gregory Francis

A fundamental characteristic of human visual perception is the ability to group together disparate elements in a scene and treat them as a single unit. The mechanisms by which humans create such groupings remain unknown, but grouping seems to play an important role in a wide variety of visual phenomena, and a good understanding of these mechanisms might provide guidance for how to improve machine vision algorithms. Here, we build on a proposal that some groupings are the result of connections in cortical area V2 that join disparate elements, thereby allowing them to be selected and segmented together. In previous instantiations of this proposal, connection formation was based on the anatomy (e.g., extent) of receptive fields, which made connection formation obligatory whenever a stimulus activated the corresponding receptive fields. We now propose dynamic circuits that provide greater flexibility in the formation of connections and that allow for top-down control of perceptual grouping. With computer simulations we explain how the circuits work and show how they can account for a wide variety of Gestalt principles of perceptual grouping and two texture segmentation tasks. We propose that human observers use such top-down control to implement task-dependent connection strategies that encourage particular groupings of stimulus elements in order to promote performance on various kinds of visual tasks.
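As a toy illustration of the connection idea (a minimal sketch under assumed rules, not the authors' neural circuit model), grouping can be computed as connected components over stimulus elements, with a task-dependent predicate deciding which connections are allowed to form; swapping the predicate swaps the grouping, which is the role top-down control plays in the proposal.

```python
# A toy illustration (not the authors' V2 circuit): elements form a group
# when pairwise connections form, and a top-down "connection strategy"
# chooses the rule. Both rules below are assumed examples.
from itertools import combinations
from math import hypot

def group(elements, connect):
    """Groups = connected components under the task-dependent rule."""
    parent = list(range(len(elements)))
    def find(i):                       # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in combinations(range(len(elements)), 2):
        if connect(elements[i], elements[j]):
            parent[find(i)] = find(j)  # union: join the two components
    groups = {}
    for i in range(len(elements)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Elements: (x, y, colour).
elements = [(0, 0, "red"), (1, 0, "blue"), (5, 0, "red"), (6, 0, "blue")]
by_proximity = lambda a, b: hypot(a[0] - b[0], a[1] - b[1]) <= 1.5
by_colour    = lambda a, b: a[2] == b[2]
print(group(elements, by_proximity))   # [[0, 1], [2, 3]]
print(group(elements, by_colour))      # [[0, 2], [1, 3]]
```

The same four elements group by proximity under one strategy and by colour under the other, mirroring how a task-dependent connection strategy can encourage one grouping over another.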

2021 ◽  
Author(s):  
Ibrahim Mohammad Hussain Rahman

The human visual attention (HVA) system encompasses a set of interconnected neurological modules that analyze visual stimuli by attending to salient regions. Two contrasting biological mechanisms exist in the HVA system: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection.

Very few computational models of top-down attention have been proposed, mainly for three reasons. First, the top-down process involves many influential factors. Second, top-down responses vary from task to task. Finally, many biological aspects of the top-down process are not yet well understood. For these reasons, it is difficult to devise a generalized top-down model that applies to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task: target object detection. Target object detection is an essential step in analyzing images before performing more complex visual tasks, has not been investigated thoroughly in models of top-down saliency, and hence constitutes the main application domain for this thesis.

The thesis investigates methods to model top-down attention through various kinds of high-level data acquired from images, along with strategies to dynamically combine bottom-up and top-down processes so as to improve detection accuracy and the computational efficiency of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:

1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The model assigns weights to the features of a novel image by extracting a contextual descriptor of the image; the descriptor tunes the weighting of low-level features to maximize detection accuracy. Incorporating context into the feature weighting mechanism improves the quality of the assigned weights.

2. Two modules of target features combined with contextual weighting to improve detection accuracy for the target object. In this model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, detection accuracy rises sharply in images with complex backgrounds and a variety of target objects.

3. A model for combining top-down and bottom-up attention based on feature interaction. This model combines the two processes dynamically by formulating the combination as a feature selection problem. The selection exploits the interaction between features, yielding a robust feature set that maximizes both detection accuracy and the overall efficiency of the system.

4. A feature map quality score estimation model that accurately predicts the detection accuracy of any novel feature map without ground-truth data. The model extracts various local, global, geometrical, and statistical characteristics from a feature map; these characteristics guide a regression model that estimates the quality of the map.

5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime, sketched in the code below. If the estimation model can accurately predict the quality score of any novel feature map, feature maps can be integrated dynamically based on the estimated values. Two integration frameworks built on the estimation model are proposed; they achieve higher human-fixation prediction accuracy with fewer feature maps than combining all feature maps.

The work in this thesis provides new directions in modelling top-down saliency for target object detection. In addition, the dynamic approaches to combining top-down and bottom-up processes show considerable improvements over existing approaches in both efficiency and accuracy.
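As a rough sketch of the integration idea in points 4 and 5 (illustrative code, not the thesis implementation; `toy_quality` is an assumed stand-in for the trained regression model), feature maps can be scored, ranked, and combined with quality-proportional weights:

```python
import numpy as np

def normalize(m):
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def integrate_maps(feature_maps, quality_model, top_k=3):
    """Combine feature maps, weighted by their predicted quality scores."""
    scored = sorted(((quality_model(normalize(m)), normalize(m))
                     for m in feature_maps), key=lambda s: s[0], reverse=True)
    best = scored[:top_k]                    # keep only the top-scoring maps
    weights = np.array([q for q, _ in best])
    weights = weights / weights.sum()        # quality-proportional weights
    return normalize(sum(w * m for w, (_, m) in zip(weights, best)))

def toy_quality(m):
    # Assumed stand-in for the thesis's trained regression model: score a
    # map by how peaked it is (a real model would use the local, global,
    # geometrical and statistical characteristics described above).
    return float(m.max() - m.mean())

maps = [np.random.rand(64, 64) for _ in range(8)]   # stand-in feature maps
saliency = integrate_maps(maps, toy_quality)
print(saliency.shape, float(saliency.max()))
```

Selecting only the top-scoring maps is what lets the framework match or beat all-map combination while touching fewer maps.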


2021 ◽  
Vol 2091 (1) ◽  
pp. 012027
Author(s):  
V E Antsiperov ◽  
V A Kershner

The paper develops a new method for representing biomedical images based on local characteristics of their intensity profile. The proposed image-processing method is aimed at images with low recorded-radiation intensity, resolution, contrast, and signal-to-noise ratio. The method rests on the principles of machine (Bayesian) learning and on sampling representations of random photocounts. The paper presents results obtained with the method and discusses its connection with modern approaches in image processing.
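One way to read the Bayesian ingredient (a hedged sketch; the Gamma-Poisson model and every parameter here are assumptions of mine, not the authors' method) is to treat each pixel's registered photocounts as Poisson draws and estimate the underlying intensity by the conjugate posterior mean, which shrinks noisy low-count pixels toward the prior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth low-intensity image and K noisy photocount "frames" of it.
truth = 0.5 + 2.0 * rng.random((32, 32))        # mean photocounts per pixel
K = 8
frames = rng.poisson(truth, size=(K, 32, 32))   # Poisson-sampled counts

# Gamma(alpha0, beta0) prior is conjugate to the Poisson likelihood, so the
# posterior mean is (alpha0 + total counts) / (beta0 + K): noisy pixels are
# shrunk toward the prior mean alpha0 / beta0.
alpha0, beta0 = 1.0, 1.0
bayes = (alpha0 + frames.sum(axis=0)) / (beta0 + K)
mle = frames.mean(axis=0)                       # plain averaging, for contrast

print("RMSE, ML estimate   :", float(np.sqrt(((mle - truth) ** 2).mean())))
print("RMSE, Bayes estimate:", float(np.sqrt(((bayes - truth) ** 2).mean())))
```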


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Thomas SA Wallis ◽  
Christina M Funke ◽  
Alexander S Ecker ◽  
Leon A Gatys ◽  
Felix A Wichmann ◽  
...  

We subjectively perceive our visual field with high fidelity, yet peripheral distortions can go unnoticed and peripheral objects can be difficult to identify (crowding). Prior work showed that humans could not discriminate images synthesised to match the responses of a mid-level ventral visual stream model when information was averaged in receptive fields with a scaling of about half their retinal eccentricity. This result implicated ventral visual area V2, approximated ‘Bouma’s Law’ of crowding, and has subsequently been interpreted as a link between crowding zones, receptive field scaling, and our perceptual experience. However, this experiment never assessed natural images. We find that humans can easily discriminate real and model-generated images at V2 scaling, requiring scales at least as small as V1 receptive fields to generate metamers. We speculate that explaining why scenes look as they do may require incorporating segmentation and global organisational constraints in addition to local pooling.
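To make the scaling factor concrete (a worked illustration; the V2-like value of 0.5 comes from the abstract, while the smaller V1-like value of 0.25 is an assumed placeholder for contrast), pooling-region size in such models grows linearly with eccentricity:

```python
# Worked illustration: pooling diameter = scale * eccentricity. The V2-like
# scale of 0.5 ("about half their retinal eccentricity") is taken from the
# abstract; the smaller V1-like scale of 0.25 is an assumed placeholder.
def pooling_diameter(ecc_deg, scale):
    """Diameter (deg) of a pooling region at a given eccentricity (deg)."""
    return scale * ecc_deg

for ecc in (2.0, 8.0, 16.0):
    print(f"at {ecc:4.1f} deg eccentricity: "
          f"V2-like pooling {pooling_diameter(ecc, 0.5):4.1f} deg, "
          f"V1-like pooling {pooling_diameter(ecc, 0.25):4.2f} deg")
```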


i-Perception ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 204166952093840
Author(s):  
Li Zhaoping

Consider a gray field comprising pairs of vertically aligned dots; in each pair, one dot is white, the other black. When viewed in the peripheral visual field, these pairs appear horizontally aligned. By the Central-Peripheral Dichotomy, this flip tilt illusion arises because top-down feedback from higher to lower visual cortical areas is too weak or absent in the periphery to veto confounded feedforward signals from the primary visual cortex (V1). The white and black dots in each pair activate, respectively, the on and off subfields of V1 neural receptive fields. However, the subfields' orientations, and hence the preferred orientations of the most activated neurons, are orthogonal to the dot alignment. Hence, V1 reports the flip tilt to higher visual areas. Top-down feedback vetoes such misleading reports, but only in the central visual field.
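A minimal filter-response sketch (illustrative sizes and parameters, none from the paper) shows the confound: a vertically aligned white/black pair most strongly drives an odd-symmetric V1-like filter whose subfields, and hence preferred orientation, are horizontal:

```python
import numpy as np

def gabor(size, theta, wavelength=8.0, sigma=3.0):
    """Odd-symmetric Gabor; theta is the orientation of its subfields."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    perp = -x * np.sin(theta) + y * np.cos(theta)  # axis across the stripes
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.sin(2 * np.pi * perp / wavelength)

# Gray field with a white dot directly above a black dot (vertical pair).
img = np.zeros((33, 33))
img[14, 16] = +1.0   # white dot, drives ON subfields
img[18, 16] = -1.0   # black dot, drives OFF subfields

for name, theta in [("horizontal", 0.0), ("vertical", np.pi / 2)]:
    resp = abs(np.sum(img * gabor(33, theta)))   # centred filter's response
    print(f"{name:10s} filter response: {resp:.3f}")
# horizontal >> vertical: the most active V1-like unit prefers the
# orientation orthogonal to the dot alignment, the confounded feedforward
# signal described above.
```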


eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Luis Carlos Garcia del Molino ◽  
Guangyu Robert Yang ◽  
Jorge F Mejias ◽  
Xiao-Jing Wang

Pyramidal cells and interneurons expressing parvalbumin (PV), somatostatin (SST), and vasoactive intestinal peptide (VIP) show cell-type-specific connectivity patterns leading to a canonical microcircuit across cortex. Experiments recording from this circuit often report counterintuitive and seemingly contradictory findings. For example, the response of SST cells in mouse V1 to top-down behavioral modulation can change its sign when the visual input changes, a phenomenon that we call response reversal. We developed a theoretical framework to explain these seemingly contradictory effects as emerging phenomena in circuits with two key features: interactions between multiple neural populations and a nonlinear neuronal input-output relationship. Furthermore, we built a cortical circuit model which reproduces counterintuitive dynamics observed in mouse V1. Our analytical calculations pinpoint connection properties critical to response reversal, and predict additional novel types of complex dynamics that could be tested in future experiments.
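A toy version of such a circuit (illustrative weights, not the paper's fitted model) combines the two key features named above, multiple interacting populations and a supralinear input-output function, and probes how top-down drive to VIP shifts the SST steady state under weak versus strong visual input:

```python
import numpy as np

# W[i, j]: weight from population j onto population i; order E, PV, SST, VIP.
# Signs follow the canonical motif: E excites all, PV inhibits E/PV,
# SST inhibits E/PV/VIP, VIP inhibits mainly SST. Values are assumptions.
W = np.array([
    [1.0, -1.5, -1.0,  0.0],   # onto E
    [1.0, -1.0, -0.5,  0.0],   # onto PV
    [1.2,  0.0,  0.0, -1.0],   # onto SST
    [0.8,  0.0, -0.4,  0.0],   # onto VIP
])

def f(x, k=0.04, n=2.0):
    """Supralinear power-law transfer function (clipped for safety)."""
    return k * np.clip(x, 0.0, 50.0) ** n

def steady_state(visual, topdown, dt=0.001, tau=0.02, steps=20000):
    """Euler-integrate tau * dr/dt = -r + f(W r + external input)."""
    ext = np.array([visual, visual, 0.0, topdown])  # vision->E,PV; top-down->VIP
    r = np.zeros(4)
    for _ in range(steps):
        r += (dt / tau) * (-r + f(W @ r + ext))
    return r

for visual in (2.0, 20.0):   # weak vs. strong visual drive
    sst_off = steady_state(visual, topdown=0.0)[2]
    sst_on  = steady_state(visual, topdown=5.0)[2]
    print(f"visual drive {visual:5.1f}: top-down changes SST by {sst_on - sst_off:+.3f}")
```

If the printed SST change flips sign between the two visual conditions, the toy circuit exhibits the qualitative response reversal: direct VIP-to-SST inhibition dominates when gains are low, while disinhibition of E (and the resulting E-to-SST drive) dominates when gains are high.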


2015 ◽  
Vol 33 (1) ◽  
pp. 110-128 ◽  
Author(s):  
Zuzana Cenkerová ◽  
Richard Parncutt

In theories of auditory scene analysis and melodic implication/realization, melodic expectation results from an interaction between top-down processes (assumed to be learned and schema-based) and bottom-up processes (assumed to be innate and based on Gestalt principles). If principles of melodic expectation are partly acquired, it should be possible to manipulate them, that is, to condition listeners' expectations. In this study, the resistance of three bottom-up expectation principles to learning was tested experimentally. In Experiment 1, expectations for stepwise motion (pitch proximity) were manipulated by conditioning listeners to large melodic leaps; the preference for small intervals was reduced after a brief exposure. In Experiment 2, expectations for leaps to rise and steps to fall (step declination) were manipulated by exposing listeners to melodies comprising rising steps and falling leaps; this reduced preferences for descending seconds and thirds. Experiment 3 found no evidence of an expectation for small intervals to be followed by an interval in the same direction (step inertia), and hence could not alter it. The results support the theory that bottom-up principles of melodic perception are partly learned from exposure to pitch patterns in music. The long-term learning process could be reinforced by exposure to speech based on similar organizational principles.


Author(s):  
Katarzyna Kordecka ◽  
Andrzej T. Foik ◽  
Agnieszka Wierzbicka ◽  
Wioletta J. Waleszczyk

Repetitive visual stimulation is successfully used to study visual evoked potential (VEP) plasticity in the mammalian visual system. Practicing visual tasks or repeated exposure to sensory stimuli can induce neuronal network changes in cortical circuits and improve the perception of these stimuli. However, little is known about the effect of visual training at the subcortical level. In the present study, we extend this knowledge by showing positive effects of such training in the rat superior colliculus (SC). In electrophysiological experiments, we showed that a single training session lasting several hours induces a response enhancement both in the primary visual cortex (V1) and in the SC. Further, we tested whether collicular responses would be enhanced without V1 input. To this end, we inactivated V1 by applying a xylocaine solution to the cortical surface during visual training. Our results revealed that the SC response enhancement persisted without V1 input and did not differ in amplitude from the VEP enhancement observed while V1 was active. These data suggest that plasticity and facilitation can develop independently but simultaneously in different parts of the visual system.


Author(s):  
Anna C. (Kia) Nobre ◽  
M-Marsel Mesulam

Selective attention is essential for all aspects of cognition. Using the paradigmatic case of visual spatial attention, the chapter presents a theoretical account proposing flexible control of attention through coordinated activity across a large-scale network of brain areas. It reviews evidence supporting top-down control of visual spatial attention by a distributed network and describes principles that emerge from a network approach. Stepping beyond the paradigm of visual spatial attention, it considers attentional control mechanisms more broadly, suggesting that top-down biasing mechanisms originate from multiple sources and can be of several types: some carry information about receptive-field properties, such as spatial locations or features of items, while others carry information about properties that are not easily mapped onto receptive fields, such as the meanings or timings of items. The chapter also considers how selective biases can operate on multiple slates of information processing, not restricted to the immediate sensory-motor stream but also operating within internalized short-term and long-term memory representations. Selective attention appears to be a general property of information-processing systems rather than an independent domain within our cognitive make-up.


2014 ◽  
Vol 112 (6) ◽  
pp. 1421-1438 ◽  
Author(s):  
A. N. J. Pietersen ◽  
S. K. Cheong ◽  
S. G. Solomon ◽  
C. Tailby ◽  
P. R. Martin

Visual perception requires integrating signals arriving at different times from parallel visual streams. For example, signals carried on the phasic-magnocellular (MC) pathway reach the cerebral cortex some tens of milliseconds before signals traveling on the tonic-parvocellular (PC) pathway. Visual latencies of cells in the koniocellular (KC) pathway have not been specifically studied in simian primates. Here we compared MC and PC cells to “blue-on” (BON) and “blue-off” (BOF) KC cells; these cells carry visual signals originating in short-wavelength-sensitive (S) cones. We made extracellular recordings in the lateral geniculate nucleus (LGN) of anesthetized marmosets. We found that BON visual latencies are 10–20 ms longer than those of PC or MC cells. A small number of recorded BOF cells (n = 7) had latencies 10–20 ms longer than those of BON cells. Within all cell groups, latencies of foveal receptive fields (<10° eccentricity) were longer (by 3–8 ms) than latencies of peripheral receptive fields (>10°). Latencies of yellow-off inputs to BON cells lagged the blue-on inputs by up to 30 ms, but no differences in visual latency were seen when comparing marmosets expressing a dichromatic (“red-green color-blind”) or trichromatic color vision phenotype. We conclude that S-cone signals leaving the LGN on KC pathways are delayed with respect to signals traveling on PC and MC pathways. Cortical circuits serving color vision must therefore integrate across the delays between (red-green) chromatic signals carried by PC cells and (blue-yellow) signals carried by KC cells.

