Training for object recognition with increasing spatial frequency: A comparison of deep learning with human vision

2021 ◽  
Author(s):  
Lev Kiar Avberšek ◽  
Astrid Zeman ◽  
Hans P. Op de Beeck

Abstract: The ontogenetic development of human vision, and the real-time neural processing of visual input, both exhibit a striking similarity – a sensitivity towards spatial frequencies that progress in a coarse-to-fine manner. During early human development, sensitivity for higher spatial frequencies increases with age. In adulthood, when humans receive new visual input, low spatial frequencies are typically processed first before subsequently guiding the processing of higher spatial frequencies. We investigated to what extent this coarse-to-fine progression might impact visual representations in artificial vision and compared this to adult human representations. We simulated the coarse-to-fine progression of image processing in deep convolutional neural networks (CNNs) by gradually increasing spatial frequency information during training. We compared CNN performance, after standard and coarse-to-fine training, with a wide range of datasets from behavioural and neuroimaging experiments. In contrast to humans, CNNs that are trained using the standard protocol are very insensitive to low spatial frequency information, showing very poor performance in being able to classify such object images. By training CNNs using our coarse-to-fine method, we improved the classification accuracy of CNNs from 0% to 32% on low-pass filtered images taken from the ImageNet dataset. When comparing differently trained networks on images containing full spatial frequency information, we saw no representational differences. Overall, this integration of computational, neural, and behavioural findings shows the relevance of the exposure to and processing of input with a variation in spatial frequency content for some aspects of high-level object representations.
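The idea of gradually increasing spatial frequency information during training can be illustrated with a minimal sketch. The abstract does not specify the filter or the schedule, so everything here is an assumption made for illustration: a hard radial low-pass mask in the Fourier domain, a linear ramp of the cutoff over epochs (the hypothetical `cutoff_schedule`), and a random array standing in for a training image.

```python
import numpy as np

def lowpass(image, cutoff):
    """Keep only spatial frequencies at or below `cutoff` (cycles per image)."""
    h, w = image.shape
    fy = np.fft.fftfreq(h) * h  # vertical frequency, cycles per image
    fx = np.fft.fftfreq(w) * w  # horizontal frequency, cycles per image
    radius = np.hypot(fy[:, None], fx[None, :])
    mask = radius <= cutoff     # hard radial cutoff (an assumed filter shape)
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))

def cutoff_schedule(epoch, n_epochs, start=2.0, end=32.0):
    """Hypothetical linear ramp of the cutoff from `start` to `end`."""
    t = epoch / max(n_epochs - 1, 1)
    return start + t * (end - start)

# Placeholder data: a random array stands in for a greyscale training image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))

n_epochs = 5
for epoch in range(n_epochs):
    cutoff = cutoff_schedule(epoch, n_epochs)
    filtered = lowpass(image, cutoff)
    # In a real training loop, `filtered` would be fed to the network here.
    print(f"epoch {epoch}: cutoff = {cutoff:.1f} cycles/image, "
          f"variance = {filtered.var():.4f}")
```

Early epochs see only coarse structure; the retained variance grows as the cutoff rises and finer detail is admitted.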

2013 ◽  
Vol 25 (6) ◽  
pp. 862-871 ◽  
Author(s):  
Bradford Z. Mahon ◽  
Nicholas Kumar ◽  
Jorge Almeida

It is widely argued that the ability to recognize and identify manipulable objects depends on the retrieval and simulation of action-based information associated with using those objects. Evidence for that view comes from fMRI studies that have reported differential BOLD contrast in dorsal visual stream regions when participants view manipulable objects compared with a range of baseline categories. An alternative interpretation is that processes internal to the ventral visual pathway are sufficient to support the visual identification of manipulable objects and that the retrieval of object-associated use information is contingent on analysis of the visual input by the ventral stream. Here, we sought to distinguish these two perspectives by exploiting the fact that the dorsal stream is largely driven by magnocellular input, which is biased toward low spatial frequency visual information. Thus, any tool-selective responses in parietal cortex that are driven by high spatial frequencies would be indicative of inputs from the ventral visual pathway. Participants viewed images of tools and animals containing only low, or only high, spatial frequencies during fMRI. We find an internal parcellation of left parietal “tool-preferring” voxels: Inferior aspects of left parietal cortex are driven by high spatial frequency information and have privileged connectivity with ventral stream regions that show similar category preferences, whereas superior regions are driven by low spatial frequency information. Our findings suggest that the automatic activation of complex object-associated manipulation knowledge is contingent on analysis of the visual input by the ventral visual pathway.


1998 ◽  
Vol 15 (4) ◽  
pp. 585-595 ◽  
Author(s):  
CONG YU ◽  
DENNIS M. LEVI

A psychophysical analog to cortical receptive-field end-stopping has been demonstrated previously in spatial filters tuned to a wide range of spatial frequencies (Yu & Levi, 1997a). The current study investigated tuning characteristics in psychophysical spatial filter end-stopping. When a D6 (the sixth derivative of a Gaussian) target is masked by a center mask (placed in the putative spatial filter center), two end-zone masks (placed in the filter end-zones) reduce thresholds. This “end-stopping” effect (the reduction of masking induced by end-zone masks) was measured at various spatial frequencies and orientations of end-zone masks. End-stopping reached its maximal strength when the spatial frequency and/or orientation of the end-zone masks matched the spatial frequency and/or orientation of the target and center mask, showing spatial-frequency tuning and orientation tuning. The bandwidths of spatial-frequency and orientation tuning functions decreased with increasing target spatial frequency. At larger orientation differences, however, end-zone masks induced a secondary facilitation effect, which was maximal when the spatial frequency of end-zone masks equaled the target spatial frequency. This facilitation effect might be related to certain types of contour and texture perception, such as perceptual pop-out.


Perception ◽  
1973 ◽  
Vol 2 (1) ◽  
pp. 53-60 ◽  
Author(s):  
J A Movshon ◽  
C Blakemore

An adaptation method is used to determine the orientation specificity of channels sensitive to different spatial frequencies in the human visual system. Comparison between different frequencies is made possible by a data transformation in which orientational effects are expressed in terms of equivalent contrast (the contrast of a vertical grating producing the same adaptational effect as a high-contrast grating of a given orientation). It is shown that, despite great variances in the range of orientations affected by adaptation at different spatial frequencies (±10° to ±50°), the half-width at half-amplitude of the orientation channels does not vary systematically as a function of spatial frequency over the range tested (2·5 to 20 cycles deg−1). Two subjects were used and they showed significantly different orientation tuning across the range of spatial frequencies. The results are discussed with reference to previous determinations of orientation specificity, and to related psychophysical and neurophysiological phenomena.


Perception ◽  
1996 ◽  
Vol 25 (1_suppl) ◽  
pp. 162-162 ◽  
Author(s):  
T Troscianko ◽  
C A Parraga ◽  
G Brelstaff ◽  
D Carr ◽  
K Nelson

A common assumption in the study of the relationship between human vision and the visual environment is that human vision has developed in order to encode the incident information in an optimal manner. Such arguments have been used to support the 1/f dependence of scene content as a function of spatial frequency. In keeping with this assumption, we ask whether there are any important differences between the luminance and (r/g) chrominance Fourier spectra of natural scenes, the simple expectation being that the chrominance spectrum should be relatively richer in low spatial frequencies than the luminance spectrum, to correspond with the different shape of luminance and chrominance contrast sensitivity functions. We analysed a data set of 29 images of natural scenes (predominantly of vegetation at different distances) which were obtained with a hyper-spectral camera (measuring the scene through a set of 31 wavelength bands in the range 400–700 nm). The images were transformed to the three Smith-Pokorny cone fundamentals, and further transformed into ‘luminance’ (r+g) and ‘chrominance’ (r-g) images, with various assumptions being made about the relative weighting of the r and g components, and the form of the chrominance response. We then analysed the Fourier spectra of these images using logarithmic intervals in spatial frequency space. This allowed a determination of the total energy within each Fourier band for each of the luminance and chrominance representations. The results strongly indicate that, for the set of scenes studied here, there was no evidence of a predominance of low-spatial-frequency chrominance information. Two classes of explanation are possible: (a) that raw Fourier content may not be the main organising principle determining visual encoding of colour, and/or (b) that our scenes were atypical of what may have driven visual evolution. We present arguments in favour of both of these propositions.
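The band-energy analysis described above can be sketched roughly as follows. This is not the authors' procedure: the octave-wide band edges, the equal weighting of r and g, and the random arrays standing in for cone-response images are all assumptions, and `band_energy` is a hypothetical helper name.

```python
import numpy as np

def band_energy(image, n_bands=5):
    """Total Fourier energy in logarithmically spaced radial frequency bands."""
    h, w = image.shape
    # Subtract the mean so the DC term does not contribute to any band.
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(h) * h  # cycles per image
    fx = np.fft.fftfreq(w) * w
    radius = np.hypot(fy[:, None], fx[None, :])
    # Band edges from 1 cycle/image up to the Nyquist limit of the shorter side;
    # frequencies at or beyond that limit are left out of every band.
    edges = np.logspace(0, np.log2(min(h, w) / 2), n_bands + 1, base=2.0)
    energy = np.array([
        power[(radius >= lo) & (radius < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return edges, energy

# Placeholder data: random arrays stand in for r and g cone-response images.
rng = np.random.default_rng(0)
r = rng.random((64, 64))
g = rng.random((64, 64))

luminance = r + g    # equal weighting assumed
chrominance = r - g  # equal weighting assumed

for name, img in [("luminance", luminance), ("chrominance", chrominance)]:
    edges, energy = band_energy(img)
    share = energy / energy.sum()
    print(name, np.round(share, 3))
```

Comparing the per-band energy shares of the two representations shows whether one is relatively richer in low spatial frequencies than the other.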


Molecules ◽  
2019 ◽  
Vol 24 (6) ◽  
pp. 1018
Author(s):  
Tina Sabel

Holographic volume phase gratings are recorded in an epoxy-based, free-surface, volume holographic recording material. Light-induced gratings are formed by photo-triggered mass migration caused by component diffusion. The material resolution enables a wide range of pattern spacings, to record both transmission and reflection holograms with many different spatial frequencies. An optimum spatial frequency response is found between the low spatial frequency roll-off and the high spatial frequency cut-off. The influence of the energy density of exposure on the spatial frequency response is investigated. Secondary volume holographic gratings (parasitic gratings) are observed in the high frequency range. The possibility of distinguishing the regular grating from the secondary grating is discussed in the form of probe wavelength detuning.


Perception ◽  
1997 ◽  
Vol 26 (8) ◽  
pp. 1047-1058 ◽  
Author(s):  
Howard C Hughes ◽  
David M Aronchick ◽  
Michael D Nelson

It has previously been observed that low spatial frequencies (≤ 1.0 cycles deg−1) tend to dominate high spatial frequencies (≥ 5.0 cycles deg−1) in several types of visual-information-processing tasks. This earlier work employed reaction times as the primary performance measure and the present experiments address the possibility of low-frequency dominance by evaluating visually guided performance of a completely different response system: the control of slow-pursuit eye movements. Slow-pursuit gains (eye velocity/stimulus velocity) were obtained while observers attempted to track the motion of a sine-wave grating. The drifting gratings were presented on three types of background: a uniform background, a background consisting of a stationary grating, or a flickering background. Low-frequency dominance was evident over a wide range of velocities, in that a stationary high-frequency component produced little disruption in the pursuit of a drifting low spatial frequency, but a stationary low frequency interfered substantially with the tracking of a moving high spatial frequency. Pursuit was unaffected by temporal modulation of the background, suggesting that these effects are due to the spatial characteristics of the stationary grating. Similar asymmetries were observed with respect to the stability of fixation: active fixation was less stable in the presence of a drifting low frequency than in the presence of a drifting high frequency.


1994 ◽  
Vol 78 (1) ◽  
pp. 339-347
Author(s):  
Janet D. Larsen ◽  
Beth Anne Goldstein

The idea that low spatial-frequency information in the Mueller-Lyer figure accounts for a major part of the illusion was tested in a series of five studies. In Study 1, subjects were selectively adapted to high or low square-wave spatial-frequency gratings with no difference in the magnitude of illusion they experienced. Similarly, adaptation to sinusoidal grating patterns with either high or low spatial frequency had no effect on the magnitude of illusion experienced (Studies 2 to 5). The failure of adaptation to low spatial-frequency gratings to affect the magnitude of illusion experienced indicates either that the illusion cannot be accounted for by the low spatial-frequency information or that adaptation of the visual system by grating patterns cannot be used to explore any effects of the low spatial frequencies in the figure.


Perception ◽  
1996 ◽  
Vol 25 (1_suppl) ◽  
pp. 144-144
Author(s):  
A T Smith ◽  
T Ledgeway

We have measured perception of the direction of displacement of two-frame random-dot patterns (50% light/dark pixels) that have been spatially high-pass filtered. In the ‘standard’ condition, pairs of high-pass filtered images, identical apart from the displacement, were presented in succession. The displacement could be in either of two opposite directions and the task was to identify the direction. The ‘reverse’ condition was the same except that image contrast was inverted between the two frames. Various element sizes and filter cut-offs were used. Two distinct patterns of results were obtained. For small check sizes, performance alternated cyclically between veridical direction perception and incorrect direction perception (aliasing) as displacement size was increased over a wide range. The period of the cycle was close to one period of the lowest spatial frequency remaining in the image after filtering, i.e. performance was as would be expected for a grating of that spatial frequency. In the ‘reverse’ condition the cyclical psychometric functions were inverted, i.e. reversed-phi motion occurred. For large check sizes, and particularly for high filter cut-offs, there was no cyclical alternation of direction perception and reversed phi did not occur in the ‘reverse’ condition. The results suggest that two mechanisms are at play. In most circumstances, detection appears to be based on motion energy since the cyclical alternation is predicted by a consideration of the spatiotemporal energy of the stimulus but not by element-matching theories. But for large elements, particularly when most of the low spatial frequencies have been removed, element-matching takes over. Elements are matched without regard to their contrast polarity. The results are thus inconsistent with the single front-end filter mechanism which Morgan [1992 Nature (London) 355 344] has advanced to explain performance in this type of task.

