SEEMORE: Combining Color, Shape, and Texture Histogramming in a Neurally Inspired Approach to Visual Object Recognition

1997 ◽  
Vol 9 (4) ◽  
pp. 777-804 ◽  
Author(s):  
Bartlett W. Mel

Severe architectural and timing constraints within the primate visual system support the conjecture that the early phase of object recognition in the brain is based on a feedforward feature-extraction hierarchy. To assess the plausibility of this conjecture in an engineering context, a difficult three-dimensional object recognition domain was developed to challenge a pure feedforward, receptive-field based recognition model called SEEMORE. SEEMORE is based on 102 viewpoint-invariant nonlinear filters that as a group are sensitive to contour, texture, and color cues. The visual domain consists of 100 real objects of many different types, including rigid (shovel), nonrigid (telephone cord), and statistical (maple leaf cluster) objects and photographs of complex scenes. Objects were individually presented in color video images under normal room lighting conditions. Based on 12 to 36 training views, SEEMORE was required to recognize unnormalized test views of objects that could vary in position, orientation in the image plane and in depth, and scale (factor of 2); for nonrigid objects, recognition was also tested under gross shape deformations. Correct classification performance on a test set consisting of 600 novel object views was 97 percent (chance was 1 percent) and was comparable for the subset of 15 nonrigid objects. Performance was also measured under a variety of image degradation conditions, including partial occlusion, limited clutter, color shift, and additive noise. Generalization behavior and classification errors illustrate the emergence of several striking natural shape categories that are not explicitly encoded in the dimensions of the feature space. It is concluded that in the light of the vast hardware resources available in the ventral stream of the primate visual system relative to those exercised here, the appealingly simple feature-space conjecture remains worthy of serious consideration as a neurobiological model.

2019 ◽  
Vol 5 (5) ◽  
pp. eaav7903 ◽  
Author(s):  
Khaled Nasr ◽  
Pooja Viswanathan ◽  
Andreas Nieder

Humans and animals have a “number sense,” an innate capability to intuitively assess the number of visual items in a set, its numerosity. This capability implies that mechanisms to extract numerosity indwell the brain’s visual system, which is primarily concerned with visual object recognition. Here, we show that network units tuned to abstract numerosity, and therefore reminiscent of real number neurons, spontaneously emerge in a biologically inspired deep neural network that was merely trained on visual object recognition. These numerosity-tuned units underlay the network’s number discrimination performance that showed all the characteristics of human and animal number discriminations as predicted by the Weber-Fechner law. These findings explain the spontaneous emergence of the number sense based on mechanisms inherent to the visual system.
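
As a rough illustration of the Weber-Fechner law mentioned above, the sketch below treats perceived magnitude as proportional to the logarithm of the number of items, so that discriminability depends on the ratio of two numerosities rather than on their absolute difference. The function names and the scaling constant `k` are hypothetical and chosen only for this example; they are not taken from the paper.

```python
import math


def perceived_magnitude(n, k=1.0):
    """Weber-Fechner: perceived magnitude grows with the log of the stimulus.

    `k` is an arbitrary scaling constant (an assumption for illustration).
    """
    return k * math.log(n)


def perceived_difference(n1, n2, k=1.0):
    """Subjective distance between two numerosities on the log scale."""
    return abs(perceived_magnitude(n1, k) - perceived_magnitude(n2, k))


if __name__ == "__main__":
    # Same ratio (1:2), so the two pairs are equally easy to tell apart.
    print(perceived_difference(4, 8), perceived_difference(8, 16))
    # Same absolute difference (1), but the larger pair is harder to tell apart.
    print(perceived_difference(4, 5), perceived_difference(8, 9))
```

Under this reading, 4 versus 8 items and 8 versus 16 items are equally discriminable, while 8 versus 9 is harder than 4 versus 5.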


1992 ◽  
Vol 4 (2) ◽  
pp. 270-286 ◽  
Author(s):  
Gary Bradski ◽  
Gail A. Carpenter ◽  
Stephen Grossberg

Working memory neural networks, called Sustained Temporal Order REcurrent (STORE) models, encode the invariant temporal order of sequential events in short-term memory (STM). Inputs to the networks may be presented with widely differing growth rates, amplitudes, durations, and interstimulus intervals without altering the stored STM representation. The STORE temporal order code is designed to enable groupings of the stored events to be stably learned and remembered in real time, even as new events perturb the system. Such invariance and stability properties are needed in neural architectures which self-organize learned codes for variable-rate speech perception, sensorimotor planning, or three-dimensional (3-D) visual object recognition. Using such a working memory, a self-organizing architecture for invariant 3-D visual object recognition is described. The new model is based on the model of Seibert and Waxman (1990a), which builds a 3-D representation of an object from a temporally ordered sequence of its two-dimensional (2-D) aspect graphs. The new model, called an ARTSTORE model, consists of the following cascade of processing modules: Invariant Preprocessor → ART 2 → STORE Model → ART 2 → Outstar Network.


2000 ◽  
Vol 12 (11) ◽  
pp. 2547-2572 ◽  
Author(s):  
Edmund T. Rolls ◽  
T. Milward

VisNet2 is a model to investigate some aspects of invariant visual object recognition in the primate visual system. It is a four-layer feedforward network with convergence to each part of a layer from a small region of the preceding layer, with competition between the neurons within a layer and with a trace learning rule to help it learn transform invariance. The trace rule is a modified Hebbian rule, which modifies synaptic weights according to both the current firing rates and the firing rates to recently seen stimuli. This enables neurons to learn to respond similarly to the gradually transforming inputs they receive, which over the short term are likely to be about the same object, given the statistics of normal visual inputs. First, we introduce for VisNet2 both single-neuron and multiple-neuron information-theoretic measures of its ability to respond to transformed stimuli. Second, using these measures, we show that quantitatively resetting the trace between stimuli is not necessary for good performance. Third, it is shown that the sigmoid activation functions used in VisNet2, which allow the sparseness of the representation to be controlled, allow good performance when using sparse distributed representations. Fourth, it is shown that VisNet2 operates well with medium-range lateral inhibition with a radius in the same order of size as the region of the preceding layer from which neurons receive inputs. Fifth, in an investigation of different learning rules for learning transform invariance, it is shown that VisNet2 operates better with a trace rule that incorporates in the trace only activity from the preceding presentations of a given stimulus, with no contribution to the trace from the current presentation, and that this is related to temporal difference learning.
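
The trace rule described above can be sketched for a single neuron as follows. This is a minimal illustration of one common form of a Hebbian trace rule, not the VisNet2 implementation: the linear activation, the weight normalization step, and the parameter names `eta` (trace decay) and `alpha` (learning rate) are assumptions made for the example.

```python
import numpy as np


def trace_rule_update(w, x, y, y_trace_prev, eta=0.8, alpha=0.1):
    """One step of a Hebbian trace rule for a single neuron (illustrative).

    w            : current weight vector
    x            : presynaptic firing rates for the current stimulus
    y            : postsynaptic firing rate for the current stimulus
    y_trace_prev : trace carried over from recently seen stimuli
    """
    # Blend the current rate with the memory of recent rates.
    y_trace = (1.0 - eta) * y + eta * y_trace_prev
    # Hebbian update driven by the trace instead of the instantaneous rate.
    w_new = w + alpha * y_trace * x
    # Keep the weight vector bounded (assumed normalization step).
    w_new = w_new / np.linalg.norm(w_new)
    return w_new, y_trace


def present_sequence(w, views, eta=0.8, alpha=0.1):
    """Present successive views of one object, carrying the trace across views."""
    y_trace = 0.0
    for x in views:
        y = float(w @ x)  # linear activation, assumed for simplicity
        w, y_trace = trace_rule_update(w, x, y, y_trace, eta, alpha)
    return w


if __name__ == "__main__":
    w0 = np.array([1.0, 0.0, 0.0, 0.0])        # neuron initially tuned to view A
    view_a = np.array([1.0, 0.0, 0.0, 0.0])
    view_b = np.array([0.0, 1.0, 0.0, 0.0])    # a transformed view of the same object
    print(present_sequence(w0, [view_a, view_b], eta=0.8))  # gains weight on view B
    print(present_sequence(w0, [view_a, view_b], eta=0.0))  # plain Hebb: no weight on view B
```

With `eta=0` the rule reduces to a plain Hebbian update, and the neuron never associates view B with view A because it does not fire to view B on its own. With a nonzero trace, activity left over from view A strengthens the inputs active during view B.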


2007 ◽  
Author(s):  
K. Suzanne Scherf ◽  
Marlene Behrmann ◽  
Kate Humphreys ◽  
Beatriz Luna
