Immersion in Modified Reality: Sensitivity to the Slope (α) of the Amplitude Spectrum is Dependent on the α of Recently Viewed Environments

2021 ◽  
Author(s):  
Bruno Richard ◽  
Patrick Shafto

Scenes contain many statistical regularities that, if accounted for by the visual system, could greatly benefit visual processing. One such statistic is the orientation-averaged slope (α) of the amplitude spectrum of natural scenes. Human observers are differentially sensitive to α and may utilize this statistic when processing natural scenes. Here, we explore whether discrimination sensitivity to α is associated with the recently viewed environment. Observers were immersed, using a head-mounted display, in an environment that was either unaltered or had its average α steepened or shallowed. Discrimination thresholds were affected by the average shift in α: a steeper environment decreased thresholds for very steep reference αs, while a shallower environment decreased thresholds for shallow values. We modelled these data with a Bayesian observer model and explored how different prior shapes influence the model's ability to fit observer thresholds. We compared three candidate prior shapes (unimodal, bimodal, and trimodal modified-PERT distributions) and found that the bimodal prior best captured observer thresholds in all experimental conditions. Notably, the positions of the prior modes shifted following adaptation, which suggests that a priori expectations for α are sufficiently malleable to track changes in the average α of recently viewed scenes.
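
A rough illustration of the statistic under study: the sketch below estimates α for one greyscale image by averaging the amplitude spectrum over orientation within log-spaced annuli and regressing log amplitude on log spatial frequency. The binning scheme and function names are illustrative assumptions, not the authors' code.

```python
# Hedged sketch: estimate the orientation-averaged amplitude-spectrum
# slope (alpha) of an image. Natural scenes typically give alpha near 1.
import numpy as np

def amplitude_slope(image):
    """Fit log amplitude vs. log frequency; alpha is the negative slope."""
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    amplitude = np.abs(f)
    h, w = image.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    # Average amplitude within thin annuli to collapse over orientation.
    bins = np.logspace(np.log10(1.0 / max(h, w)), np.log10(0.5), 20)
    centers, means = [], []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (radius >= lo) & (radius < hi)
        if mask.any():
            centers.append(np.sqrt(lo * hi))
            means.append(amplitude[mask].mean())
    slope, _ = np.polyfit(np.log(centers), np.log(means), 1)
    return -slope

rng = np.random.default_rng(0)
print(amplitude_slope(rng.standard_normal((256, 256))))  # ~0 for white noise
```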

2019 ◽  
Author(s):  
Jack Lindsey ◽  
Samuel A. Ocko ◽  
Surya Ganguli ◽  
Stephane Deny

The vertebrate visual system is hierarchically organized to process visual information in successive stages. Neural representations vary drastically across the first stages of visual processing: at the output of the retina, ganglion cell receptive fields (RFs) exhibit a clear antagonistic center-surround structure, whereas in the primary visual cortex (V1), typical RFs are sharply tuned to a precise orientation. There is currently no unified theory explaining these differences in representations across layers. Here, using a deep convolutional neural network trained on image recognition as a model of the visual system, we show that such differences in representation can emerge as a direct consequence of different neural resource constraints on the retinal and cortical networks, and for the first time we find a single model from which both geometries spontaneously emerge at the appropriate stages of visual processing. The key constraint is a reduced number of neurons at the retinal output, consistent with the anatomy of the optic nerve as a stringent bottleneck. Second, we find that, for simple downstream cortical networks, visual representations at the retinal output emerge as nonlinear and lossy feature detectors, whereas they emerge as linear and faithful encoders of the visual scene for more complex cortical networks. This result predicts that the retinas of small vertebrates (e.g. salamander, frog) should perform sophisticated nonlinear computations, extracting features directly relevant to behavior, whereas retinas of large animals such as primates should mostly encode the visual scene linearly and respond to a much broader range of stimuli. These predictions could reconcile the two seemingly incompatible views of the retina as either performing feature extraction or efficient coding of natural scenes, by suggesting that all vertebrates lie on a spectrum between these two objectives, depending on the degree of neural resources allocated to their visual system.
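
A minimal sketch of the paper's central manipulation, assuming a PyTorch-style setup: a shallow "retina" network whose output channel count acts as the optic-nerve-like bottleneck, followed by a "cortex" network of configurable depth. Layer sizes and names are illustrative, not the authors' exact architecture.

```python
# Hedged sketch: a retina-net with a channel bottleneck feeding a cortex-net.
import torch
import torch.nn as nn

def build_model(bottleneck_channels=1, cortical_layers=2, n_classes=10):
    retina = [
        nn.Conv2d(1, 32, kernel_size=9, padding=4), nn.ReLU(),
        # Tight bottleneck: very few channels leave the "retina".
        nn.Conv2d(32, bottleneck_channels, kernel_size=9, padding=4), nn.ReLU(),
    ]
    cortex, in_ch = [], bottleneck_channels
    for _ in range(cortical_layers):
        cortex += [nn.Conv2d(in_ch, 32, kernel_size=9, padding=4), nn.ReLU()]
        in_ch = 32
    head = [nn.Flatten(), nn.LazyLinear(n_classes)]
    return nn.Sequential(*retina, *cortex, *head)

model = build_model()
print(model(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```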


2020 ◽  
Author(s):  
Zeynep Başgöze ◽  
David N. White ◽  
Johannes Burge ◽  
Emily A. Cooper

Binocular fusion relies on matching points in the two eyes that correspond to the same physical feature in the world. However, not all world features are binocularly visible. In particular, at depth edges, parts of a scene are often visible to only one eye (so-called half occlusions). Accurate detection of these monocularly visible regions is likely to be important for stable visual perception. If monocular regions are not detected as such, the visual system may attempt to binocularly fuse non-corresponding points, which can result in unstable percepts. We investigated the hypothesis that the visual system capitalizes upon statistical regularities associated with depth edges in natural scenes to aid binocular fusion and facilitate perceptual stability. By sampling from a large set of stereoscopic natural image patches, we found evidence that monocularly visible regions near depth edges in natural scenes tend to have features more visually similar to the adjacent binocularly visible background region than to the adjacent binocularly visible foreground. The generality of these results was supported by a parametric study of three-dimensional (3D) viewing geometry in simulated environments. In two perceptual experiments, we examined whether this statistical regularity is leveraged by the visual system. The results show that perception tended to be more stable when the visual properties of the depth edge were statistically more likely. Exploiting regularities in natural environments may allow the visual system to facilitate fusion and perceptual stability of natural scenes when both binocular and monocular regions are visible.

Précis: We report an analysis of natural scenes and two perceptual studies aimed at understanding how the visual statistics of depth edges impact perceptual stability. Our results suggest that the visual system exploits natural scene regularities to aid binocular fusion and facilitate perceptual stability.
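
To make the patch statistic concrete, here is a minimal sketch under strong simplifying assumptions: given masks for the monocular, background, and foreground regions of a patch, ask which binocular region the monocular pixels resemble more. The masks and the mean-luminance similarity metric are hypothetical stand-ins for the authors' measures.

```python
# Hedged sketch: is a half-occluded region more background- or foreground-like?
import numpy as np

def similarity(a, b):
    """Negative absolute difference in mean luminance (higher = more similar)."""
    return -abs(a.mean() - b.mean())

def classify_monocular_region(patch, mono, bg, fg):
    to_bg = similarity(patch[mono], patch[bg])
    to_fg = similarity(patch[mono], patch[fg])
    return "background-like" if to_bg > to_fg else "foreground-like"

patch = np.zeros((10, 10)); patch[:, 6:] = 1.0        # bright "foreground"
mono = np.zeros((10, 10), bool); mono[:, 4:6] = True  # half-occluded strip
bg = np.zeros((10, 10), bool); bg[:, :4] = True
fg = np.zeros((10, 10), bool); fg[:, 6:] = True
print(classify_monocular_region(patch, mono, bg, fg))  # background-like
```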


Perception ◽  
1997 ◽  
Vol 26 (9) ◽  
pp. 1089-1100 ◽  
Author(s):  
Nuala Brady

In natural scenes and other broadband images, spatial variations in luminance occur at a range of scales or frequencies. It is generally agreed that the visual image is initially represented by the activity of separate frequency-tuned channels, and this notion is supported by physiological evidence for a stage of multi-resolution filtering in early visual processing. The question of whether these channels can be accessed as independent sources of information in the normal course of events is a more contentious one. In the psychophysical study of both motion and spatial vision, there are examples of tasks in which fine-scale structure dominates perception or performance and obscures information at coarser scales. It is argued here that one important factor determining the relative salience of information from different spatial scales in broadband images is the distribution of response activity across spatial channels. The special case of natural scenes, which have characteristic ‘scale-invariant’ power spectra in which image contrast is roughly constant in equal-octave frequency bands, is considered. A review is presented of evidence suggesting that the sensitivity of frequency-tuned filters in the visual system is matched to this image statistic, so that, on average, different channels respond with equal activity to natural scenes. Under these conditions, the visual system does appear to have independent access to information at different spatial scales, and spatial-scale interactions are not apparent.
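
The ‘scale-invariant’ property can be checked numerically: an image with a 1/f amplitude spectrum carries roughly constant contrast energy per octave-wide frequency band. The sketch below synthesizes such an image from white noise and measures its band energies; all parameters are illustrative.

```python
# Hedged sketch: contrast energy per octave band for a synthetic 1/f image.
import numpy as np

def octave_band_energies(image, n_octaves=5):
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.hypot(*np.meshgrid(fy, fx, indexing="ij"))
    energies, hi = [], 0.5
    for _ in range(n_octaves):
        lo = hi / 2
        band = (radius >= lo) & (radius < hi)
        energies.append((np.abs(f[band]) ** 2).sum())
        hi = lo
    return energies  # roughly equal for 1/f images

# Synthesize a 1/f-amplitude image by shaping white noise in the frequency domain.
rng = np.random.default_rng(1)
noise_f = np.fft.fft2(rng.standard_normal((256, 256)))
freq = np.hypot(*np.meshgrid(np.fft.fftfreq(256), np.fft.fftfreq(256), indexing="ij"))
freq[0, 0] = 1.0  # avoid division by zero at DC
pink = np.real(np.fft.ifft2(noise_f / freq))
print([f"{e:.3g}" for e in octave_band_energies(pink)])
```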


2021 ◽  
Vol 15 ◽  
Author(s):  
Yue Wang ◽  
Jianpu Yan ◽  
Zhongliang Yin ◽  
Shenghan Ren ◽  
Minghao Dong ◽  
...  

Visual processing refers to the process of perceiving, analyzing, synthesizing, manipulating, transforming, and thinking about visual objects. It is modulated by both stimulus-driven and goal-directed factors and manifested in neural activities that extend from visual cortex to high-level cognitive areas. An extensive body of studies has investigated the neural mechanisms of visual object processing using synthetic or curated visual stimuli. However, such images generally do not accurately reflect the semantic links between objects and their backgrounds, and previous studies have not answered the question of how the native background affects visual target detection. The current study bridged this gap by constructing a stimulus set of natural scenes with two levels of complexity and modulating participants' attention to actively or passively attend to the background contents. Behaviorally, decision times were longer when the background was complex or when participants' attention was distracted from the detection task, and object detection accuracy was lower when the background was complex. Event-related potential (ERP) analysis revealed the effects of scene complexity and attentional state on brain responses in occipital and centro-parietal areas, which we suggest reflect varied attentional cueing and sensory-evidence accumulation across experimental conditions. Our results imply that efficient visual processing of real-world objects may involve competition between context and distractors that co-exist in the native background, and that extensive attentional cues and fine-grained but semantically irrelevant scene information may be detrimental to real-world object detection.
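
For readers unfamiliar with the ERP method used here, the sketch below shows the standard epoch-and-average computation: cut continuous EEG around stimulus onsets, subtract a pre-stimulus baseline, and average within a condition. The array shapes, sampling rate, and time window are assumptions, not the study's parameters.

```python
# Hedged sketch: compute a condition's average event-related potential (ERP).
import numpy as np

def erp(eeg, onsets, sfreq=500, tmin=-0.2, tmax=0.8):
    """eeg: (channels, samples); onsets: stimulus sample indices."""
    pre, post = int(-tmin * sfreq), int(tmax * sfreq)
    epochs = np.stack([eeg[:, o - pre:o + post] for o in onsets])
    baseline = epochs[:, :, :pre].mean(axis=2, keepdims=True)  # pre-stimulus mean
    return (epochs - baseline).mean(axis=0)  # (channels, time) condition average

eeg = np.random.default_rng(2).standard_normal((32, 10_000))
print(erp(eeg, onsets=[500, 2000, 5000]).shape)  # (32, 500)
```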


2021 ◽  
pp. 096372142199033
Author(s):  
Katherine R. Storrs ◽  
Roland W. Fleming

One of the deepest insights in neuroscience is that sensory encoding should take advantage of statistical regularities. Humans’ visual experience contains many redundancies: Scenes mostly stay the same from moment to moment, and nearby image locations usually have similar colors. A visual system that knows which regularities shape natural images can exploit them to encode scenes compactly or guess what will happen next. Although these principles have been appreciated for more than 60 years, until recently it has been possible to convert them into explicit models only for the earliest stages of visual processing. But recent advances in unsupervised deep learning have changed that. Neural networks can be taught to compress images or make predictions in space or time. In the process, they learn the statistical regularities that structure images, which in turn often reflect physical objects and processes in the outside world. The astonishing accomplishments of unsupervised deep learning reaffirm the importance of learning statistical regularities for sensory coding and provide a coherent framework for how knowledge of the outside world gets into visual cortex.
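
A minimal sketch of the kind of unsupervised objective the authors discuss: a small convolutional autoencoder trained to reconstruct images through a compact code, which forces it to absorb the regularities that structure those images. The architecture is illustrative, not any specific published model.

```python
# Hedged sketch: image compression as an unsupervised learning objective.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),  # 32x32 -> 16x16
    nn.Conv2d(16, 4, 4, stride=2, padding=1), nn.ReLU(),  # 16x16 -> 8x8 code
    nn.ConvTranspose2d(4, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),    # back to 32x32
)

x = torch.rand(8, 1, 32, 32)                      # a batch of toy images
loss = nn.functional.mse_loss(autoencoder(x), x)  # reconstruction error
loss.backward()  # gradients would drive learning of image regularities
```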


1992 ◽  
Vol 4 (4) ◽  
pp. 345-351 ◽  
Author(s):  
Anna Berti ◽  
Giacomo Rizzolatti

Can visual processing be carried out without visual awareness of the presented objects? In the present study we addressed this problem in patients with severe unilateral neglect. The patients were required to respond as fast as possible to target stimuli (pictures of animals and fruits) presented to the normal field by pressing one of two keys according to the category of the target. We then studied how priming stimuli, again pictures of animals or fruits, presented to the neglected field influenced the responses to targets. By combining different pairs of primes and targets, three experimental conditions were obtained. In the first condition, "Highly congruent," the target and prime stimuli belonged to the same category and were physically identical; in the second condition, "Congruent," the stimuli represented two elements of the same category but were physically dissimilar; in the third condition, "Noncongruent," the stimuli represented one exemplar from each of the two categories. The results showed that responses were facilitated not only in the Highly congruent condition but also in the Congruent one. This finding suggests that patients with neglect are able to process stimuli presented to the neglected field to a categorical level of representation even when they deny the presence of the stimulus in the affected field. The implications of this finding for psychological and physiological theories of neglect and visual cognition are discussed.


2008 ◽  
Vol 275 (1649) ◽  
pp. 2299-2308 ◽  
Author(s):  
M. To ◽ 
P. G. Lovell ◽ 
T. Troscianko ◽ 
D. J. Tolhurst

Natural visual scenes are rich in information, and any neural system analysing them must piece together the many messages from large arrays of diverse feature detectors. It is known how threshold detection of compound visual stimuli (sinusoidal gratings) is determined by their components' thresholds. We investigate whether similar combination rules apply to the perception of the complex and suprathreshold visual elements in naturalistic visual images. Observers gave magnitude estimations (ratings) of the perceived differences between pairs of images made from photographs of natural scenes. Images in some pairs differed along one stimulus dimension such as object colour, location, size or blur. But, for other image pairs, there were composite differences along two dimensions (e.g. both colour and object-location might change). We examined whether the ratings for such composite pairs could be predicted from the two ratings for the respective pairs in which only one stimulus dimension had changed. We found a pooling relationship similar to that proposed for simple stimuli: Minkowski summation with exponent 2.84 yielded the best predictive power (r = 0.96), an exponent similar to that generally reported for compound grating detection. This suggests that theories based on detecting simple stimuli can encompass visual processing of complex, suprathreshold stimuli.
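
The reported pooling rule can be stated explicitly: the predicted rating for a composite pair is the Minkowski sum of the two single-dimension ratings r1 and r2, i.e. (r1^m + r2^m)^(1/m) with m = 2.84. A minimal sketch:

```python
# Hedged sketch: Minkowski summation of two perceived-difference ratings.
def minkowski_sum(r1, r2, m=2.84):
    return (r1 ** m + r2 ** m) ** (1.0 / m)

# E.g. a colour-only rating of 5.0 and a location-only rating of 3.0 predict:
print(minkowski_sum(5.0, 3.0))  # ~5.39, far below the linear sum of 8.0
```

With a large exponent the rule approaches winner-take-all, so the composite prediction is dominated by the larger of the two component ratings.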


1980 ◽  
Vol 89 (3_suppl) ◽  
pp. 178-184 ◽  
Author(s):  
Krishna G. Murti ◽  
Erdem I. Cantekin ◽  
Richard M. Stern ◽  
Charles D. Bluestone

New measurements of acoustical transmission through the eustachian tube (ET) have been obtained in a series of experiments directed toward the development of a clinical instrument to assess ET function behind an intact tympanic membrane (TM). Using a sound conduction method, a sound source was placed in one nostril, and the acoustical energy that was transmitted through the ET was measured by a microphone placed in the ear canal. The present study used a broadband noise as the acoustical stimulus, in contrast to the tonal stimuli employed in previous investigations. This stimulus was chosen because it is believed to reduce the variability in the data due to intersubject differences in the acoustics of the nasopharynx and ET, and to avoid any a priori assumptions concerning the specific frequencies that would be of greatest diagnostic significance. Averaged spectra of the sound transmitted to the ear canal were obtained for three experimental conditions: acoustical source present during subject swallowing, source present with no swallowing, and subject swallowing with source absent. A Bayesian classification scheme based on the statistics of these spectra was used in classifying subjects into one of two possible categories, normal and abnormal ET function. A comparison was made between sonometric classification and classification based on a tympanometric ET function test. Correlation between the two methods was 87.1%.
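
As a hedged illustration of the classification step, this sketch implements a generic two-class Gaussian (naive Bayes) decision rule over band-averaged spectral features; the feature layout, priors, and training interface are assumptions, not the authors' scheme.

```python
# Hedged sketch: Bayesian classification of ET function from spectral features.
import numpy as np

def train(spectra, labels):
    """spectra: (n_subjects, n_bands) log magnitudes; labels: 0/1 array."""
    params = {}
    for c in (0, 1):
        s = spectra[labels == c]
        params[c] = (s.mean(axis=0), s.var(axis=0) + 1e-9, len(s) / len(spectra))
    return params

def classify(spectrum, params):
    scores = {}
    for c, (mu, var, prior) in params.items():
        loglik = -0.5 * (np.log(2 * np.pi * var) + (spectrum - mu) ** 2 / var)
        scores[c] = loglik.sum() + np.log(prior)
    return max(scores, key=scores.get)  # 0 = normal, 1 = abnormal ET function
```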


2009 ◽  
Vol 26 (1) ◽  
pp. 35-49 ◽  
Author(s):  
THORSTEN HANSEN ◽  
KARL R. GEGENFURTNER

Form vision is traditionally regarded as processing primarily achromatic information. Previous investigations into the statistics of color and luminance in natural scenes have claimed that luminance and chromatic edges are not independent of each other and that any chromatic edge most likely occurs together with a luminance edge of similar strength. Here we computed the joint statistics of luminance and chromatic edges in over 700 calibrated color images from natural scenes. We found that isoluminant edges exist in natural scenes and were not rarer than pure luminance edges. Most edges combined luminance and chromatic information but to varying degrees such that luminance and chromatic edges were statistically independent of each other. Independence increased along successive stages of visual processing from cones via postreceptoral color-opponent channels to edges. The results show that chromatic edge contrast is an independent source of information that can be linearly combined with other cues for the proper segmentation of objects in natural and artificial vision systems. Color vision may have evolved in response to the natural scene statistics to gain access to this independent information.
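
A simplified sketch of the joint edge statistic: compute luminance and red-green edge maps with a common gradient operator, then measure their pixelwise correlation, where values near zero indicate the independence reported above. The colour-opponent transform here is a crude stand-in for the calibrated cone-based channels used in the study.

```python
# Hedged sketch: correlation between luminance and chromatic edge strength.
import numpy as np

def edge_strength(channel):
    gy, gx = np.gradient(channel.astype(float))
    return np.hypot(gx, gy)

def edge_correlation(rgb):
    lum = rgb.mean(axis=-1)                       # crude luminance channel
    rg = rgb[..., 0].astype(float) - rgb[..., 1]  # crude red-green channel
    e_lum = edge_strength(lum).ravel()
    e_rg = edge_strength(rg).ravel()
    return np.corrcoef(e_lum, e_rg)[0, 1]         # near 0 => independent cues
```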


2017 ◽  
Vol 117 (1) ◽  
pp. 388-402 ◽  
Author(s):  
Michael A. Cohen ◽  
George A. Alvarez ◽  
Ken Nakayama ◽  
Talia Konkle

Visual search is a ubiquitous visual behavior, and efficient search is essential for survival. Different cognitive models have explained the speed and accuracy of search based either on the dynamics of attention or on similarity of item representations. Here, we examined the extent to which performance on a visual search task can be predicted from the stable representational architecture of the visual system, independent of attentional dynamics. Participants performed a visual search task with 28 conditions reflecting different pairs of categories (e.g., searching for a face among cars, body among hammers, etc.). The time it took participants to find the target item varied as a function of category combination. In a separate group of participants, we measured the neural responses to these object categories when items were presented in isolation. Using representational similarity analysis, we then examined whether the similarity of neural responses across different subdivisions of the visual system had the requisite structure needed to predict visual search performance. Overall, we found strong brain/behavior correlations across most of the higher-level visual system, including both the ventral and dorsal pathways when considering both macroscale sectors as well as smaller mesoscale regions. These results suggest that visual search for real-world object categories is well predicted by the stable, task-independent architecture of the visual system.

NEW & NOTEWORTHY: Here, we ask which neural regions have neural response patterns that correlate with behavioral performance in a visual processing task. We found that the representational structure across all of high-level visual cortex has the requisite structure to predict behavior. Furthermore, when directly comparing different neural regions, we found that they all had highly similar category-level representational structures. These results point to a ubiquitous and uniform representational structure in high-level visual cortex underlying visual object processing.
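
A minimal sketch of the representational similarity analysis (RSA) logic, assuming SciPy: build a condensed neural dissimilarity matrix across object categories and rank-correlate it with per-pair search times. Input shapes and ordering are illustrative.

```python
# Hedged sketch: correlate a neural RDM with behavioural search times (RSA).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa_brain_behavior(neural_patterns, search_times):
    """neural_patterns: (n_categories, n_voxels) mean responses per category;
    search_times: one mean RT per category pair, in pdist's condensed order."""
    neural_rdm = pdist(neural_patterns, metric="correlation")  # 1 - r per pair
    rho, p = spearmanr(neural_rdm, search_times)
    return rho, p

rng = np.random.default_rng(3)
rho, p = rsa_brain_behavior(rng.standard_normal((8, 100)), rng.random(28))
print(rho, p)  # 8 categories -> 28 pairs, matching the study's 28 conditions
```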

