On the necessity of recurrent processing during object recognition: it depends on the need for scene segmentation

2020
Author(s): Noor Seijdel, Jessica Loke, Ron van de Klundert, Matthew van der Meer, Eva Quispel, ...

While feed-forward activity may suffice for recognizing objects in isolation, additional visual operations that aid object recognition might be needed for real-world scenes. One such additional operation is figure-ground segmentation: extracting the relevant features and locations of the target object while ignoring irrelevant features. In this study of 60 participants, we show objects on backgrounds of increasing complexity to investigate whether recurrent computations are increasingly important for segmenting objects from more complex backgrounds. Three lines of evidence show that recurrent processing is critical for recognition of objects embedded in complex scenes. First, behavioral results indicated a greater reduction in performance after masking objects presented on more complex backgrounds, with the degree of impairment increasing with background complexity. Second, electroencephalography (EEG) measurements showed clear differences in the event-related potentials (ERPs) between conditions around 200 ms, a time point beyond feed-forward activity, and object decoding based on the EEG signal indicated later decoding onsets for objects embedded in more complex backgrounds. Third, deep convolutional neural network (DCNN) performance confirmed this interpretation: feed-forward and shallower networks showed a higher degree of impairment in recognition for objects in complex backgrounds than recurrent and deeper networks. Together, these results support the notion that recurrent computations drive figure-ground segmentation of objects in complex scenes.

Perception, 1998, Vol 27 (1), pp. 47-68
Author(s): Fiona N Newell

The effect of stimulus factors such as interobject similarity and stimulus density on the recognition of objects across changes in view was investigated in five experiments. The recognition of objects across views was found to depend on the degree of interobject similarity and on stimulus density: recognition was view dependent when both interobject similarity and stimulus density were high, irrespective of the familiarity of the target object. However, when either stimulus density or interobject similarity was low, recognition was invariant to viewpoint. Recognition was accomplished through view-dependent procedures when discriminability between objects was low. The findings are discussed in terms of an exemplar-based model in which the dimensions used for discriminating between objects are optimised to maximise the differences between the objects. This optimisation process is characterised as a perceptual ‘ruler’ which measures interobject similarity by stretching across objects in representational space. It is proposed that the ‘ruler’ optimises the feature differences between objects in such a way that recognition is view invariant, but that such a process incurs a cost in discriminating between small feature differences, which results in view-dependent recognition performance.


2018
Author(s): Iris I. A. Groen, Sara Jahfari, Noor Seijdel, Sennary Ghebreab, Victor A. F. Lamme, ...

Object recognition is thought to be mediated by rapid feed-forward activation of object-selective cortex, with limited contribution of feedback. However, disruption of visual evoked activity beyond feed-forward processing stages has been demonstrated to affect object recognition performance. Here, we unite these findings by reporting that the detection of target objects in natural scenes is selectively characterized by enhanced feedback when these objects are embedded in high complexity scenes. Human participants performed an animal target detection task on scenes with low, medium or high complexity as determined by a biologically plausible computational model of low-level contrast statistics. Three converging lines of evidence indicate that feedback was enhanced during categorization of scenes with high, but not low or medium, complexity. First, functional magnetic resonance imaging (fMRI) activity in early visual cortex (V1) was selectively enhanced for target objects in scenes with high complexity. Second, event-related potentials (ERPs) evoked by high complexity scenes were selectively enhanced from 220 ms after stimulus onset. Third, behavioral performance deteriorated for highly complex scenes when participants were pressed for time, but not when they could process the scenes fully and thereby benefit from the enhanced feedback. Formal modeling of the reaction time distributions revealed that object information accumulated more slowly for high complexity scenes (resulting in more errors, especially for fast decisions), and that this accumulation was directly related to the build-up of the feedback activity observed exclusively for high complexity scenes.
Together, these results suggest that while feed-forward activity may suffice for simple scenes, the brain employs recurrent processing more adaptively in naturalistic settings, using minimal feedback for sparse, coherent scenes and increasing feedback for complex, fragmented scenes.

Author summary: How much neural processing is required to detect objects of interest in natural scenes? The astonishing speed of object recognition suggests that fast feed-forward buildup of perceptual activity is sufficient. However, this view is contradicted by findings showing that disruption of slower neural feedback leads to decreased detection performance. Our study reconciles these discrepant findings by identifying scene complexity as a critical driver of neural feedback. We show that feedback is enhanced for complex, cluttered scenes compared to simple, well-organized scenes. Moreover, for complex scenes, more feedback is associated with better performance. These findings relate the flexibility of neural processes to perceptual decision-making by demonstrating that the brain dynamically directs neural resources based on the complexity of real-world visual inputs.


Sensors, 2021, Vol 21 (5), pp. 1919
Author(s): Shuhua Liu, Huixin Xu, Qi Li, Fei Zhang, Kun Hou

To address the problem of robot object recognition in complex scenes, this paper proposes an object recognition method based on scene text reading. The proposed method simulates human-like behavior and accurately identifies objects bearing text through careful reading. First, high-accuracy deep learning models are adopted to detect and recognize text from multiple views. Second, two datasets are generated, comprising 102,000 Chinese and English scene text images and their inverses. Training the model on these two datasets improves the F-measure of text detection by 0.4% and the recognition accuracy by 1.26%. Finally, a robot object recognition method based on scene text reading is proposed. The robot detects and recognizes text in the image and then stores the recognition results in a text file. When the user gives the robot a fetching instruction, the robot searches the text files for the corresponding keywords and obtains a confidence score for each of the multiple objects in the scene image. The object with the maximum confidence is then selected as the target. The results show that the robot can accurately distinguish objects of arbitrary shape and category, and that the method effectively addresses object recognition in home environments.


Author(s): Kohitij Kar, James J DiCarlo

Distributed neural population spiking patterns in macaque inferior temporal (IT) cortex that support core visual object recognition require additional time to develop for specific (“late-solved”) images, suggesting the necessity of recurrent processing in these computations. Which brain circuit motifs are most responsible for computing and transmitting these putative recurrent signals to IT? To test whether the ventral prefrontal cortex (vPFC) is a critical recurrent circuit node in this system, we pharmacologically inactivated parts of the vPFC and simultaneously measured IT population activity while monkeys performed object discrimination tasks. Our results show that vPFC inactivation deteriorated the quality of the late-phase (>150 ms from image onset) IT population code, along with commensurate, specific behavioral deficits for “late-solved” images. Finally, silencing vPFC caused the monkeys’ IT activity patterns and behavior to become more like those produced by feedforward artificial neural network models of the ventral stream. Together with prior work, these results argue that fast recurrent processing through the vPFC is critical to the production of behaviorally sufficient object representations in IT.


Electronics, 2020, Vol 9 (2), pp. 210
Author(s): Yi-Chun Du, Muslikhin Muslikhin, Tsung-Han Hsieh, Ming-Shyan Wang

This paper develops a hybrid algorithm combining an adaptive network-based fuzzy inference system (ANFIS) and regions with convolutional neural network features (R-CNN) for stereo vision-based object recognition and manipulation. A stereo camera in an eye-to-hand configuration first captures the image of the target object. Then, the shape, features, and centroid of the object are estimated. Similar pixels are grouped by image segmentation, and similar regions are merged through selective search. The eye-to-hand calibration is based on ANFIS to reduce the computational burden. Experiments are conducted with a six-degree-of-freedom (6-DOF) robot arm equipped with a gripper to demonstrate the effectiveness of the proposed system.


2013, Vol 2 (2), pp. 66-79
Author(s): Onsy A. Abdel Alim, Amin Shoukry, Neamat A. Elboughdadly, Gehan Abouelseoud

In this paper, a pattern recognition module that makes use of 3-D images of objects is presented. The proposed module takes advantage of both the generalization capability of neural networks and the possibility of manipulating 3-D images to generate views of the object to be recognized at different poses. This allows the construction of a robust 3-D object recognition module that can find use in various applications, including military, biomedical and mine detection applications. The paper proposes an efficient training procedure and decision-making strategy for the suggested neural network. Sample results of testing the module on 3-D images of several objects are also included, along with a discussion of the implications of the results.


Author(s): Michael S. Brickner, Amir Zvuloni

Thermal imaging (TI) systems transform the distribution of relative temperatures in a scene into a visible TV image. TI images differ significantly from regular TV images. Most TI systems allow their operators to select a preferred polarity, which determines the way in which gray shades represent different temperatures. Polarity may be set to either black hot (BH) or white hot (WH). The present experiments were designed to investigate the effects of polarity on object recognition performance in TI and to compare the object recognition performance of experts and novices. In the first experiment, twenty flight candidates were asked to recognize target objects in 60 dynamic TI recordings taken from two different TI systems. The targets included a variety of human-placed and natural objects. Each subject viewed half the targets in BH and the other half in WH polarity in a balanced experimental design. For 24 of the 60 targets, one polarity produced better performance than the other. Although the direction of the superior polarity (BH or WH) was not consistent, the preferred representation of the target object was very consistent. For example, vegetation was more readily recognized when presented as dark objects on a brighter background. The results are discussed in terms of the importance of surface determinants versus edge determinants in the recognition of TI objects. In the second experiment, the performance of 10 expert TI users was found to be significantly more accurate than, but not much faster than, that of 20 novice subjects.


2011, Vol 2 (2), pp. 207-226
Author(s): Lydia Sánchez, Manuel Campos

Puzzles concerning attitude reports are at the origin of traditional theories of content. According to most of these theories, content has to involve some sort of conceptual entities, like senses, which determine reference. Conceptual views, however, have been challenged by direct reference theories and informational perspectives on content. In this paper, we lay out the central elements of the most relevant strategies for solving cognitive puzzles. We then argue that the best solution available to those who maintain a view of content as truth conditions is to abandon the idea that content is the only element of mental attitudes that can make a difference to the truth value of attitude reports. Finally, we appeal to means of recognition of objects as one obvious element that helps explain differences in attitudes.

