2-D Affine Transformations Cannot Account for Human 3-D Object Classification

Perception ◽  
1997 ◽  
Vol 26 (1_suppl) ◽  
pp. 241-241
Author(s):  
Z Liu

Converging evidence in object recognition has shown that the performance of human observers depends on their familiarity with the appearance of the objects. The degree of this dependence is a function of the inter-object similarity in the object set. The more similar the objects are, the stronger is this dependence, and the more dominant is two-dimensional (2-D) image information. However, the extent to which 3-D structural information is used still remains an area of strong debate. Previously, we showed that all models that allowed 2-D rotations in the image plane of independent 2-D templates were unable to account for human performance in recognising novel object views. Here we derive a closed-form Bayesian ideal observer that gives rise to probably the best possible performance when applying 2-D affine transformations (translation, rotation, scaling, stretching, and other linear transformations) to stored 2-D templates. In addition, we compare human performance with a closed-form derivation that finds the best match between a 2-D template and a 2-D image under 2-D affine transformations. We also compare human performance with a generalised radial basis functions model. This model establishes optimal performance for learned 2-D templates, and then adjusts the variance of its radial basis (Gaussian) functions to achieve best possible performance for novel views of individual objects. We demonstrate that none of these models can account for human performance in 3-D object recognition. Human statistical efficiency for novel views is higher than for learned views, which suggests that 3-D structural information is used by human observers.
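The idea of finding the best match between a 2-D template and a 2-D image under 2-D affine transformations can be illustrated with a small least-squares sketch. This is not the Bayesian ideal observer derived in the abstract. It assumes that template and image are given as lists of corresponding 2-D feature points and that correspondence is already known; the function name `fit_affine` is hypothetical.

```python
def fit_affine(template, image):
    """Least-squares affine map (A, t) taking template points onto image points.

    Assumes template[i] corresponds to image[i]. Returns (A, t, residual),
    where residual is the summed squared distance left after the best fit.
    """
    n = len(template)
    tx = sum(p[0] for p in template) / n
    ty = sum(p[1] for p in template) / n
    ix = sum(q[0] for q in image) / n
    iy = sum(q[1] for q in image) / n

    # Scatter matrix S of the centred template points.
    sxx = sum((x - tx) ** 2 for x, _ in template)
    sxy = sum((x - tx) * (y - ty) for x, y in template)
    syy = sum((y - ty) ** 2 for _, y in template)
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("template points are collinear; affine fit is not unique")

    # Cross-covariance C between centred image and centred template points.
    pairs = list(zip(template, image))
    c00 = sum((u - ix) * (x - tx) for (x, y), (u, v) in pairs)
    c01 = sum((u - ix) * (y - ty) for (x, y), (u, v) in pairs)
    c10 = sum((v - iy) * (x - tx) for (x, y), (u, v) in pairs)
    c11 = sum((v - iy) * (y - ty) for (x, y), (u, v) in pairs)

    # Closed form: A = C * S^-1, then t aligns the centroids.
    a00 = (c00 * syy - c01 * sxy) / det
    a01 = (c01 * sxx - c00 * sxy) / det
    a10 = (c10 * syy - c11 * sxy) / det
    a11 = (c11 * sxx - c10 * sxy) / det
    t0 = ix - (a00 * tx + a01 * ty)
    t1 = iy - (a10 * tx + a11 * ty)

    residual = sum(
        (a00 * x + a01 * y + t0 - u) ** 2 + (a10 * x + a11 * y + t1 - v) ** 2
        for (x, y), (u, v) in pairs
    )
    return ((a00, a01), (a10, a11)), (t0, t1), residual
```

A template-matching model of this kind would compute the residual against each stored template and choose the template with the smallest one. A view produced by rotating a 3-D object in depth generally leaves a nonzero residual, because that change is not a 2-D affine map of the stored view.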

2020 ◽  
Vol 2 (1) ◽  
pp. 37-42
Author(s):  
A. M. Kovalchuk

Images are among the most widely used kinds of information in the modern information society. Protecting them from unauthorized access and use is therefore a pressing problem. An important characteristic of an image is the presence of contours. Contour extraction requires operations on adjacent elements that respond to changes and suppress regions of constant brightness; contours are the areas where changes occur, and they become light while the rest of the image stays dark. Mathematically, an ideal contour is a discontinuity in the spatial function of brightness levels in the image plane. Contour extraction therefore means finding the sharpest changes, that is, the maxima of the gradient vector magnitude. This is one reason contours survive in an image encrypted with RSA, since the encryption is based on modular exponentiation of a natural number. At the same time, on a contour and at the pixels neighboring it, raising the brightness value to a power produces an even larger gap. Protection from unauthorized access is a more complex problem than protection from unauthorized use. Protection is based on interpreting the image as a stochastic signal, which allows signal-encryption methods to be carried over to images. An image, however, is a specific kind of signal: besides the informativeness typical of any signal, it also has visual informativeness. Encryption methods applied to images must therefore meet one more requirement: the encrypted image must be completely noise-like, so that visual image-processing methods cannot be used on it. RSA is one of the most widely used industrial standards for encrypting signals. Applied to images, it has some problems; in particular, contours are preserved in the encrypted image. The pressing problem is therefore to develop a modification of RSA that remains resistant to decryption and makes the encrypted image completely noise-like. One way to solve this problem is to use affine transformations.
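A toy sketch can show why contours survive element-wise RSA and how a position-dependent affine step changes that. It assumes each pixel is encrypted independently with a tiny textbook key (p = 61, q = 53), which is far too small for real use. The affine mask `(A_MASK * m + B_MASK * i) mod N` and all names below are hypothetical illustrations, not the modification proposed in the abstract.

```python
# Toy textbook RSA key: N = 61 * 53, E * D = 1 (mod 3120). Illustration only.
N, E, D = 3233, 17, 2753

# Hypothetical affine mask parameters; A_MASK must be coprime to N.
A_MASK, B_MASK = 7, 101


def edge_positions(row):
    """Indices i where the value changes between pixel i and pixel i + 1."""
    return [i for i in range(len(row) - 1) if row[i] != row[i + 1]]


def rsa_encrypt_row(row):
    """Encrypt every pixel independently: equal pixels give equal ciphertexts."""
    return [pow(m, E, N) for m in row]


def masked_encrypt_row(row):
    """Apply a position-dependent affine map before RSA (assumed scheme)."""
    return [pow((A_MASK * m + B_MASK * i) % N, E, N) for i, m in enumerate(row)]


def masked_decrypt_row(cipher):
    """Invert RSA, then invert the affine map using A_MASK^-1 mod N."""
    a_inv = pow(A_MASK, -1, N)  # modular inverse, Python 3.8+
    return [(a_inv * (pow(c, D, N) - B_MASK * i)) % N for i, c in enumerate(cipher)]


row = [10, 10, 10, 200, 200, 200, 10, 10]  # one image row with two step edges
plain_cipher = rsa_encrypt_row(row)
masked_cipher = masked_encrypt_row(row)
```

In this example `edge_positions(plain_cipher)` equals `edge_positions(row)`, so the contour is still visible after plain element-wise encryption. In the masked version, neighbouring ciphertexts differ everywhere along the row, and `masked_decrypt_row` still recovers the original pixels. This only demonstrates the effect on a toy row and says nothing about the cryptographic strength of such a mask.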


2015 ◽  
Vol 114 (6) ◽  
pp. 3076-3096
Author(s):  
Ryan M. Peters ◽  
Phillip Staibano ◽  
Daniel Goldreich

The ability to resolve the orientation of edges is crucial to daily tactile and sensorimotor function, yet the means by which edge perception occurs is not well understood. Primate cortical area 3b neurons have diverse receptive field (RF) spatial structures that may participate in edge orientation perception. We evaluated five candidate RF models for macaque area 3b neurons, previously recorded while an oriented bar contacted the monkey's fingertip. We used a Bayesian classifier to assign each neuron a best-fit RF structure. We generated predictions for human performance by implementing an ideal observer that optimally decoded stimulus-evoked spike counts in the model neurons. The ideal observer predicted a saturating reduction in bar orientation discrimination threshold with increasing bar length. We tested 24 humans on an automated, precision-controlled bar orientation discrimination task and observed performance consistent with that predicted. We next queried the ideal observer to discover the RF structure and number of cortical neurons that best matched each participant's performance. Human perception was matched with a median of 24 model neurons firing throughout a 1-s period. The 10 lowest-performing participants were fit with RFs lacking inhibitory sidebands, whereas 12 of the 14 higher-performing participants were fit with RFs containing inhibitory sidebands. Participants whose discrimination improved as bar length increased to 10 mm were fit with longer RFs; those who performed well on the 2-mm bar, with narrower RFs. These results suggest plausible RF features and computational strategies underlying tactile spatial perception and may have implications for perceptual learning.
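The step of optimally decoding stimulus-evoked spike counts can be sketched as a maximum-likelihood decision. The sketch assumes independent Poisson spike counts and made-up mean counts per model neuron; the abstract does not state the noise model or the tuning values, so both are assumptions, and the function names are hypothetical.

```python
import math


def poisson_log_likelihood(counts, mean_counts):
    """Log-probability of observed spike counts under independent Poisson neurons.

    mean_counts must all be positive.
    """
    return sum(
        k * math.log(mu) - mu - math.lgamma(k + 1)
        for k, mu in zip(counts, mean_counts)
    )


def ideal_observer(counts, mean_counts_by_orientation):
    """Return the orientation whose predicted mean counts best explain the data."""
    return max(
        mean_counts_by_orientation,
        key=lambda o: poisson_log_likelihood(counts, mean_counts_by_orientation[o]),
    )


# Hypothetical expected spike counts of three model neurons for two bar orientations.
tuning = {
    "0deg": [10.0, 4.0, 2.0],
    "20deg": [4.0, 10.0, 6.0],
}
decision = ideal_observer([9, 5, 1], tuning)
```

With equal prior probabilities for the two orientations, picking the larger likelihood is the Bayes-optimal choice under this assumed model. Repeating the decision over many simulated trials would give a predicted discrimination threshold to compare with human data.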


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Xiaoxuan Jia ◽  
Ha Hong ◽  
Jim DiCarlo

Temporal continuity of object identity is a feature of natural visual input, and is potentially exploited -- in an unsupervised manner -- by the ventral visual stream to build the neural representation in inferior temporal (IT) cortex. Here we investigated whether plasticity of individual IT neurons underlies human core-object-recognition behavioral changes induced with unsupervised visual experience. We built a single-neuron plasticity model combined with a previously established IT population-to-recognition-behavior linking model to predict human learning effects. We found that our model, once constrained by neurophysiological data, largely predicted the mean direction, magnitude and time course of human performance changes. We also found a previously unreported dependency of the observed human performance change on the initial task difficulty. This result adds support to the hypothesis that tolerant core object recognition in human and non-human primates is instructed -- at least in part -- by naturally occurring unsupervised temporal contiguity experience.


Author(s):  
D. Van Dyck ◽  
M. Op de Beeck

Reliable interpretation of high-resolution electron micrographs is only possible by comparison with computer simulations. However, simulation is a tedious trial-and-error technique that can only succeed if the experimental parameters are known and the number of plausible structure models is small. For instance, interpreting images of amorphous objects by simulation is hopeless. This makes the power of HREM very much dependent on the amount of prior information available from other techniques such as X-ray diffraction. HREM would be much more powerful and independent if a direct method could be established to retrieve the structural information of the object directly from the electron micrographs. We present a new “focus variation method” in which the phase is retrieved in the image plane in a deterministic way from a combination of images at closely spaced focus values, inspired by the method proposed in [1] [2]. In a sense, all the information in the 3D image area of the electron microscope is used. The method allows correction for chromatic aberration, spherical aberration and focus, and is robust against noise. In a second stage we retrieve the projected structure of the object directly from the knowledge of the wavefunction at the exit face, using the channelling theory proposed in [3]. The first results are very promising.


1995 ◽  
Vol 348 (1325) ◽  
pp. 321-340

We develop model-independent methods for characterizing the reliability of neural spike trains in response to brief stimuli. Through this approach we measure the discriminability of similar stimuli based on the real-time response of a single neuron in much the same way that modern psychophysical techniques measure the discrimination performance of the whole animal. Extending these techniques, we quantify discriminability as a function of time after stimulus presentation, so that it is possible to compare the measured reliability of the neuron to its theoretical limit predicted from signal transduction and noise levels in the sensory periphery. The methods are applied to a wide-field movement-sensitive neuron (H1) in the visual system of the blowfly Calliphora vicina, where we also record from the photoreceptor cells that provide the sensory input to H1. From an analysis of neural responses to wide-field stepwise movements of various step sizes we find the following. (1) One or two spikes are sufficient to encode just noticeable differences of approximately one-tenth the angular spacing between photoreceptors, comparable to the hyperacuity regime observed in humans. (2) Discriminability improves upon observation of successive spikes as if the interspike intervals carried independent information. Coding seems orderly and analogue in the sense that we find no indication of information being transmitted in complex combinations of spike intervals. (3) As a result of neural refractoriness the real neuron’s performance is significantly better than that of a neuron generating spikes according to a Poisson process at the same firing rate. (4) Over behaviourally relevant time intervals following the movement step, that is up to about 30-40 ms, the discrimination performance of the neuron is close to that of an ideal observer who extracts movement information from all the photoreceptor cells in the field of stimulation. Beyond this time the neuron's performance relative to the ideal observer decreases significantly.
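Finding (3), that refractoriness makes a neuron more reliable than a Poisson process at the same firing rate, can be illustrated with a small simulation. The sketch models refractoriness as a fixed dead time followed by an exponential waiting time; this is an assumed simplification, not the analysis in the abstract. The firing rate, dead time and window length are made-up values.

```python
import random
import statistics


def spike_count(rate_hz, dead_time_s, window_s, rng):
    """Count spikes in one window for a renewal process with a fixed dead time.

    The exponential part is sped up so the mean interspike interval stays
    1 / rate_hz. dead_time_s = 0 gives a plain Poisson process.
    Requires dead_time_s < 1 / rate_hz.
    """
    free_rate = 1.0 / (1.0 / rate_hz - dead_time_s)
    t, count = 0.0, 0
    while True:
        t += dead_time_s + rng.expovariate(free_rate)
        if t > window_s:
            return count
        count += 1


def fano_factor(rate_hz, dead_time_s, window_s=0.1, trials=2000, seed=0):
    """Variance-to-mean ratio of the spike count across repeated trials."""
    rng = random.Random(seed)
    counts = [spike_count(rate_hz, dead_time_s, window_s, rng) for _ in range(trials)]
    return statistics.variance(counts) / statistics.mean(counts)


poisson_fano = fano_factor(rate_hz=100.0, dead_time_s=0.0)
refractory_fano = fano_factor(rate_hz=100.0, dead_time_s=0.005)
```

A Fano factor near 1 is expected for the Poisson case and a clearly smaller value for the refractory case. Lower count variance at the same mean rate means that two stimuli evoking slightly different rates are easier to tell apart from the spike count alone.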

