Efficient coding of natural scenes improves neural system identification

2022
Author(s): Yongrong Qiu, David A. Klindt, Klaudia P. Szatko, Dominic Gonschorek, Larissa Hoefling, et al.

Neural system identification aims to learn the response function of neurons to arbitrary stimuli from experimentally recorded data, but typically does not leverage coding principles such as efficient coding of natural environments. Visual systems, however, have evolved to efficiently process input from the natural environment. Here, we present a normative network regularization for system identification models that incorporates, as a regularizer, the efficient coding hypothesis, which states that neural response properties of sensory representations are strongly shaped by the need to preserve most of the stimulus information with limited resources. Using this approach, we explored whether a system identification model can be improved by sharing its convolutional filters with those of an autoencoder that aims to efficiently encode natural stimuli. To this end, we built a hybrid model to predict the responses of retinal neurons to noise stimuli. This approach not only yielded higher performance than the stand-alone system identification model, it also produced more biologically plausible filters. We found these results to be consistent for retinal responses to different stimuli and across model architectures. Moreover, our normatively regularized model performed particularly well in predicting responses of direction-of-motion sensitive retinal neurons. In summary, our results support the hypothesis that efficiently encoding environmental inputs can improve system identification models of early visual processing.
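
One way to picture the shared-filter idea is a minimal sketch in PyTorch: a hybrid model whose convolutional filters feed both a neural-response readout and an autoencoder decoder, trained on a weighted sum of the two losses. Layer sizes, the Poisson loss, and the weight lambda_ec are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridModel(nn.Module):
    """System identification model and autoencoder sharing one bank of
    convolutional filters (sketch; sizes are illustrative)."""
    def __init__(self, n_neurons, n_filters=16):
        super().__init__()
        # Shared convolutional filters couple the two pathways.
        self.shared_conv = nn.Conv2d(1, n_filters, kernel_size=9, padding=4)
        # System identification readout: filter activations -> firing rates.
        self.readout = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(n_neurons), nn.Softplus())
        # Autoencoder decoder: filter activations -> reconstructed image.
        self.decoder = nn.ConvTranspose2d(n_filters, 1, kernel_size=9, padding=4)

    def forward(self, stimulus, natural_image):
        responses = self.readout(F.relu(self.shared_conv(stimulus)))
        recon = self.decoder(F.relu(self.shared_conv(natural_image)))
        return responses, recon

def hybrid_loss(pred, target, recon, image, lambda_ec=0.5):
    # Response-prediction loss plus reconstruction (efficient-coding) loss,
    # weighted by the hypothetical hyperparameter lambda_ec.
    return (F.poisson_nll_loss(pred, target, log_input=False)
            + lambda_ec * F.mse_loss(recon, image))
```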

2008 · Vol. 275 (1649) · pp. 2299-2308
Author(s): M. To, P. G. Lovell, T. Troscianko, D. J. Tolhurst

Natural visual scenes are rich in information, and any neural system analysing them must piece together the many messages from large arrays of diverse feature detectors. It is known how threshold detection of compound visual stimuli (sinusoidal gratings) is determined by their components' thresholds. We investigate whether similar combination rules apply to the perception of the complex and suprathreshold visual elements in naturalistic visual images. Observers gave magnitude estimations (ratings) of the perceived differences between pairs of images made from photographs of natural scenes. Images in some pairs differed along one stimulus dimension, such as object colour, location, size or blur. For other image pairs, however, there were composite differences along two dimensions (e.g. both colour and object location might change). We examined whether the ratings for such composite pairs could be predicted from the two ratings for the respective pairs in which only one stimulus dimension had changed. We found a pooling relationship similar to that proposed for simple stimuli: Minkowski summation with exponent 2.84 yielded the best predictive power (r = 0.96), an exponent similar to that generally reported for compound grating detection. This suggests that theories based on detecting simple stimuli can encompass visual processing of complex, suprathreshold stimuli.
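
The pooling rule can be written down directly; a minimal sketch, with the exponent fixed at the best-fitting value reported above (the example ratings are made up):

```python
def minkowski_pool(rating_a, rating_b, m=2.84):
    """Predict the rating for a composite (two-dimension) change from the
    two single-dimension ratings via Minkowski summation with exponent m."""
    return (rating_a ** m + rating_b ** m) ** (1.0 / m)

# Illustrative numbers: a colour-only rating of 4.0 and a location-only
# rating of 3.0 predict a composite rating of roughly 4.5.
print(minkowski_pool(4.0, 3.0))
```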


2016
Author(s): Alexander Heitman, Nora Brackbill, Martin Greschner, Alexander Sher, Alan M. Litke, et al.

A central goal of systems neuroscience is to develop accurate quantitative models of how neural circuits process information. Prevalent models of light response in retinal ganglion cells (RGCs) usually begin with linear filtering over space and time, which reduces the high-dimensional visual stimulus to a simpler and more tractable scalar function of time that in turn determines the model output. Although these pseudo-linear models can accurately replicate RGC responses to stochastic stimuli, it is unclear whether the strong linearity assumption captures the function of the retina in the natural environment. This paper tests how accurately one pseudo-linear model, the generalized linear model (GLM), explains the responses of primate RGCs to naturalistic visual stimuli. Light responses from macaque RGCs were obtained using large-scale multi-electrode recordings, and two major cell types, ON and OFF parasol, were examined. Visual stimuli consisted of images of natural environments with simulated saccadic and fixational eye movements. The GLM accurately reproduced RGC responses to white noise stimuli, as observed previously, but did not generalize to predict RGC responses to naturalistic stimuli. It also failed to capture RGC responses when fitted and tested with naturalistic stimuli alone. Fitted scalar nonlinearities before and after the linear filtering stage were insufficient to correct the failures. These findings suggest that retinal signaling under natural conditions cannot be captured by models that begin with linear filtering, and emphasize the importance of additional spatial nonlinearities, gain control, and/or peripheral effects in the first stage of visual processing.
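
For concreteness, a sketch of one time step of a generic GLM of the kind tested above: linear spatiotemporal filtering collapses the stimulus to a scalar drive, a fixed exponential nonlinearity maps drive to rate, and spikes are drawn from a Poisson process. The spike-history term follows the standard GLM recipe; all shapes and sizes are assumptions.

```python
import numpy as np

def glm_step(stimulus_window, st_filter, spike_history, history_filter,
             rng, dt=0.001):
    """One time step of a generalized linear model of an RGC (sketch)."""
    drive = np.sum(stimulus_window * st_filter)       # linear filtering -> scalar
    drive += np.sum(spike_history * history_filter)   # post-spike feedback
    rate = np.exp(drive)                              # fixed scalar nonlinearity
    return rng.poisson(rate * dt)                     # stochastic spike count

# Usage sketch: rng = np.random.default_rng(); call glm_step per time bin.
```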


2017
Author(s): Warrick Roseboom, Zafeirios Fountas, Kyriacos Nikiforou, David Bhowmik, Murray Shanahan, et al.

Despite being a fundamental dimension of experience, how the human brain generates the perception of time remains unknown. Here, we provide a novel explanation for how human time perception might be accomplished, based on non-temporal perceptual classification processes. To demonstrate this proposal, we built an artificial neural system centred on a feed-forward image classification network, functionally similar to human visual processing. In this system, input videos of natural scenes drive changes in network activation, and the accumulation of salient changes in activation is used to estimate duration. Estimates produced by this system match human reports made about the same videos, replicating key qualitative biases, including differentiating between scenes of walking around a busy city or sitting in a cafe or office. Our approach provides a working model of duration perception from stimulus to estimation and presents a new direction for examining the foundations of this central aspect of human experience.
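
A schematic sketch of the accumulation mechanism: run each video frame through a (pretrained) classification network, count salient changes in activation layer by layer, and read the count out as duration. The Euclidean change measure, per-layer thresholds, and the final linear readout are illustrative assumptions.

```python
import numpy as np

def estimate_duration(frames, network, thresholds, seconds_per_count):
    """Accumulate salient activation changes over a video (sketch).
    network(frame) is assumed to return a list of per-layer activations."""
    count = 0
    prev = None
    for frame in frames:
        acts = network(frame)
        if prev is not None:
            for a, b, th in zip(acts, prev, thresholds):
                if np.linalg.norm(a - b) > th:   # salient change in this layer
                    count += 1
        prev = acts
    return count * seconds_per_count             # counts -> estimated seconds
```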


2018
Author(s): Samuel A. Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny

One of the most striking aspects of early visual processing in the retina is the immediate parcellation of visual information into multiple parallel pathways, formed by different retinal ganglion cell types each tiling the entire visual field. Existing theories of efficient coding have been unable to account for the functional advantages of such cell-type diversity in encoding natural scenes. Here we go beyond previous theories to analyze how a simple linear retinal encoding model with different convolutional cell types efficiently encodes naturalistic spatiotemporal movies given a fixed firing rate budget. We find that optimizing the receptive fields and cell densities of two cell types makes them match the properties of the two main cell types in the primate retina, midget and parasol cells, in terms of spatial and temporal sensitivity, cell spacing, and their relative ratio. Moreover, our theory gives a precise account of how the ratio of midget to parasol cells decreases with retinal eccentricity. We also train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types. Thus our work provides a theoretical justification, based on the efficient coding of natural movies, for the existence of the four most dominant cell types in the primate retina that together comprise 70% of all ganglion cells.
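
The optimization problem has a compact form: minimize reconstruction error of naturalistic movies subject to a firing-rate budget. A minimal sketch of such an objective, with the constraint handled as a soft penalty (the penalty formulation and weights are assumptions; the paper's exact treatment may differ):

```python
import torch

def efficient_coding_loss(movie, reconstruction, firing_rates,
                          rate_budget, penalty=10.0):
    """Reconstruction error plus a soft penalty for exceeding the
    firing-rate budget (sketch)."""
    error = torch.mean((movie - reconstruction) ** 2)
    overrun = torch.clamp(firing_rates.mean() - rate_budget, min=0.0)
    return error + penalty * overrun
```

Optimizing the encoder's receptive fields and cell densities under such an objective is what drives the emergence of the midget- and parasol-like types described above.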


2017
Author(s): David W. Hunter, Paul B. Hibbard

Visual acuity is greatest in the centre of the visual field, peaking in the fovea and degrading significantly towards the periphery. The rate of decay of visual performance with eccentricity depends strongly on the stimuli and task used in measurement. While detailed measures of this decay have been made across a broad range of tasks, a comprehensive theoretical account of this phenomenon is lacking. We demonstrate that the decay in visual performance can be attributed to the efficient encoding of binocular information in natural scenes. The efficient coding hypothesis holds that the early stages of visual processing attempt to form an efficient coding of ecologically valid stimuli. Using Independent Component Analysis to learn an efficient coding of stereoscopic images, we show that the ratio of binocular to monocular components varied with eccentricity at the same rate as human stereo acuity and Vernier acuity. Our results demonstrate that the organisation of the visual cortex is dependent on the underlying statistics of binocular scenes and, strikingly, that monocular acuity depends on the mechanisms by which the visual cortex processes binocular information. This result has important theoretical implications for understanding the encoding of visual information in the brain.
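
A sketch of the analysis pipeline using scikit-learn's FastICA: learn components from concatenated left/right image patches and classify each as binocular or monocular by comparing the energy in its two eye halves. The 0.5 cutoff and the energy-based classification rule are assumptions, not the authors' exact criterion.

```python
import numpy as np
from sklearn.decomposition import FastICA

def fraction_binocular(stereo_patches, n_components=64):
    """stereo_patches: (n_samples, 2 * patch_dim), left half then right half.
    Returns the fraction of ICA components judged binocular (sketch)."""
    ica = FastICA(n_components=n_components, max_iter=1000)
    ica.fit(stereo_patches)
    half = stereo_patches.shape[1] // 2
    left = np.sum(ica.components_[:, :half] ** 2, axis=1)   # left-eye energy
    right = np.sum(ica.components_[:, half:] ** 2, axis=1)  # right-eye energy
    dominance = np.abs(left - right) / (left + right)       # 0 = binocular
    return np.mean(dominance < 0.5)                         # assumed cutoff
```

Repeating this for patches drawn at different eccentricities would trace out the binocular-to-monocular ratio as a function of eccentricity.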


2021 · Vol. 21 (1)
Author(s): Emmanuelle Sophie Briolat, Lina María Arenas, Anna E. Hughes, Eric Liggins, Martin Stevens

Background: Crypsis by background-matching is a critical form of anti-predator defence for animals exposed to visual predators, but achieving effective camouflage in patchy and variable natural environments is not straightforward. To cope with heterogeneous backgrounds, animals could either specialise on particular microhabitat patches, appearing cryptic in some areas but mismatching others, or adopt a compromise strategy, providing partial matching across different patch types. Existing studies have tested the effectiveness of compromise strategies in only a limited set of circumstances, primarily with small targets varying in pattern, and usually in screen-based tasks. Here, we measured the detection risk associated with different background-matching strategies for relatively large targets, with human observers searching for them in natural scenes, and focusing on colour. Model prey were designed to either ‘specialise’ on the colour of common microhabitat patches, or ‘generalise’ by matching the average colour of the whole visual scene.
Results: In both the field and an equivalent online computer-based search task, targets adopting the generalist strategy were more successful in evading detection than those matching microhabitat patches. This advantage occurred because, across all possible locations in these experiments, targets were typically viewed against a patchwork of different microhabitat areas; the putatively generalist targets were thus more similar on average to their various immediate surroundings than were the specialists.
Conclusions: Demonstrating close agreement between the results of field and online search experiments provides useful validation of online citizen science methods commonly used to test principles of camouflage, at least for human observers. In finding a survival benefit to matching the average colour of the visual scene in our chosen environment, our results highlight the importance of relative scales in determining optimal camouflage strategies, and suggest how compromise coloration can succeed in nature.
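
The two strategies reduce to a simple colour computation, sketched here: a ‘generalist’ target takes the mean colour of the whole scene, a ‘specialist’ the mean colour of one patch type, and detectability is proxied by the average colour distance to the target's immediate surround. The distance metric and the proxy are assumptions, not the study's detection model.

```python
import numpy as np

def strategy_mismatch(scene_pixels, surround_means, patch_mean):
    """Mean colour distance between a target colour and its possible
    immediate surrounds (sketch). scene_pixels: (n_pixels, 3);
    surround_means: (n_locations, 3) mean colour around each location."""
    generalist = scene_pixels.mean(axis=0)        # whole-scene average colour
    def cost(target):
        return np.mean(np.linalg.norm(surround_means - target, axis=1))
    return cost(generalist), cost(patch_mean)     # lower = harder to detect
```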


2009 · Vol. 26 (1) · pp. 35-49
Author(s): Thorsten Hansen, Karl R. Gegenfurtner

Form vision is traditionally regarded as processing primarily achromatic information. Previous investigations into the statistics of color and luminance in natural scenes have claimed that luminance and chromatic edges are not independent of each other and that any chromatic edge most likely occurs together with a luminance edge of similar strength. Here we computed the joint statistics of luminance and chromatic edges in over 700 calibrated color images from natural scenes. We found that isoluminant edges exist in natural scenes and were not rarer than pure luminance edges. Most edges combined luminance and chromatic information but to varying degrees such that luminance and chromatic edges were statistically independent of each other. Independence increased along successive stages of visual processing from cones via postreceptoral color-opponent channels to edges. The results show that chromatic edge contrast is an independent source of information that can be linearly combined with other cues for the proper segmentation of objects in natural and artificial vision systems. Color vision may have evolved in response to the natural scene statistics to gain access to this independent information.
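
A sketch of the kind of joint edge statistic involved: compute luminance and chromatic edge maps with a gradient operator and measure their association across pixels. The Sobel-based edge definition and the use of a simple correlation are assumptions; the paper's analysis pipeline differs in detail.

```python
import numpy as np
from scipy import ndimage

def edge_association(luminance, chromatic):
    """Correlation between luminance and chromatic edge strength (sketch).
    luminance, chromatic: 2-D arrays (e.g. one colour-opponent channel)."""
    def gradient_magnitude(img):
        gx = ndimage.sobel(img, axis=0)
        gy = ndimage.sobel(img, axis=1)
        return np.hypot(gx, gy)
    e_lum = gradient_magnitude(luminance)
    e_chr = gradient_magnitude(chromatic)
    # Near-zero correlation across many images would indicate statistical
    # independence in the sense discussed above.
    return np.corrcoef(e_lum.ravel(), e_chr.ravel())[0, 1]
```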


2018
Author(s): Niru Maheswaranathan, Lane T. McIntosh, Hidenori Tanaka, Satchel Grant, David B. Kastner, et al.

Understanding how the visual system encodes natural scenes is a fundamental goal of sensory neuroscience. We show here that a three-layer network model predicts the retinal response to natural scenes with an accuracy nearing the fundamental limits of predictability. The model's internal structure is interpretable, in that model units are highly correlated with interneurons recorded separately and not used to fit the model. We further show the ethological relevance to natural visual processing of a diverse set of phenomena, including complex motion encoding, adaptation and predictive coding. Our analysis uncovers a fast timescale of visual processing that is inaccessible directly from experimental data, showing unexpectedly that ganglion cells signal in distinct modes by rapidly (< 0.1 s) switching their selectivity for direction of motion, orientation, location and the sign of intensity. A new approach that decomposes ganglion cell responses into the contributions of interneurons reveals how the latent effects of parallel retinal circuits generate the response to any possible stimulus. These results reveal extremely flexible and rapid dynamics of the retinal code for natural visual stimuli, explaining the need for a large set of interneuron pathways to generate the dynamic neural code for natural scenes.
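
A minimal sketch of a three-layer CNN of the kind described, in PyTorch; the filter counts, kernel sizes, and the convention of feeding temporal history as input channels are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ThreeLayerRetina(nn.Module):
    """Two convolutional stages whose hidden units can be compared with
    recorded interneurons, plus a fully connected readout (sketch)."""
    def __init__(self, n_cells, n_frames=40):
        super().__init__()
        self.conv1 = nn.Conv2d(n_frames, 8, kernel_size=15)  # history as channels
        self.conv2 = nn.Conv2d(8, 8, kernel_size=11)
        self.readout = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(n_cells), nn.Softplus())

    def forward(self, clip):                  # clip: (batch, n_frames, H, W)
        x = torch.relu(self.conv1(clip))      # units to compare with interneurons
        x = torch.relu(self.conv2(x))
        return self.readout(x)                # one predicted firing rate per cell
```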


Author(s): N. Seijdel, N. Tsakmakidis, E. H. F. De Haan, S. M. Bohte, H. S. Scholte

Feedforward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans, however, suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations (‘routines’) that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to the performance of DCNNs with increasing depth, we here explored whether, how, and when object information is differentiated from the backgrounds it appears on. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them, by adding noise, manipulating background congruence and systematically occluding parts of the image. Results indicate that with an increase in network depth, there is an increase in the distinction between object and background information. For shallower networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that, in effect, scene segmentation can be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or “binding” features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
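
One simple way to quantify the depth effect described above, sketched here: present the same object on two different backgrounds and correlate the activations layer by layer; rising correlation with depth indicates that object information has been separated from the background. This metric is an assumption, not the study's analysis.

```python
import numpy as np

def invariance_by_depth(acts_bg1, acts_bg2):
    """acts_bg1, acts_bg2: lists of per-layer activation arrays for the
    same object on two different backgrounds (sketch)."""
    scores = []
    for a, b in zip(acts_bg1, acts_bg2):
        r = np.corrcoef(a.ravel(), b.ravel())[0, 1]
        scores.append(r)                # higher = more background-invariant
    return scores                       # one score per layer, shallow to deep
```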

