Parallel processing in high-level categorization of natural images

10.1038/nn866 ◽  
2002 ◽  
Vol 5 (7) ◽  
pp. 629-630 ◽  
Author(s):  
Guillaume A. Rousselet ◽  
Michèle Fabre-Thorpe ◽  
Simon J. Thorpe

Author(s):  
Kai Zhao ◽  
Wei Shen ◽  
Shanghua Gao ◽  
Dandan Li ◽  
Ming-Ming Cheng

In natural images, the scales (thickness) of object skeletons may vary dramatically among objects and object parts, so robust skeleton detection requires powerful multi-scale feature integration. To address this issue, we present a new convolutional neural network (CNN) architecture for object skeleton detection built around a novel hierarchical feature integration mechanism, named Hi-Fi. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) captures both rich object context and high-resolution details. Experimental results show that our method significantly outperforms the state of the art in fusing features from very different scales, as evidenced by considerable performance improvements on several benchmarks.
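Although the paper's architecture is far richer, the core idea of bidirectional integration, where deep semantics guide shallow features and shallow detail refines deep features, can be sketched in plain NumPy. All shapes and the additive fusion rule below are illustrative assumptions, not the actual Hi-Fi design:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(f):
    """2x2 average pooling of a (C, H, W) feature map."""
    C, H, W = f.shape
    return f.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def bidirectional_fuse(shallow, deep):
    """Toy hierarchical feature integration: deep semantics guide the
    shallow map, shallow detail refines the deep map, then both are merged."""
    shallow_refined = shallow + upsample2x(deep)   # top-down guidance
    deep_refined = deep + downsample2x(shallow)    # bottom-up detail
    return shallow_refined + upsample2x(deep_refined)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 32, 32))  # high-resolution, low-level
deep = rng.standard_normal((8, 16, 16))     # low-resolution, high-level
fused = bidirectional_fuse(shallow, deep)
print(fused.shape)  # full resolution, with deep context mixed in
```

In a real network the additions would be learned convolutions, but the shape bookkeeping, matching resolutions before features of different levels can refine each other, is the same.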



2017 ◽  
Author(s):  
Santiago A. Cadena ◽  
George H. Denfield ◽  
Edgar Y. Walker ◽  
Leon A. Gatys ◽  
Andreas S. Tolias ◽  
...  

Abstract
Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have been applied successfully to neural data. On the one hand, transfer learning from networks trained on object recognition has worked remarkably well for predicting neural responses in higher areas of the primate ventral stream, but it has not yet been used to model spiking activity in early stages such as V1. On the other hand, data-driven models have been used to predict neural responses in the early visual system (retina and V1) of mice, but not of primates. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. Even though V1 is at an early-to-intermediate stage of the visual system, we found that the transfer-learning approach performed similarly well to the data-driven approach, and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1, and deep features learned for object recognition explain V1 computation better than all previous filter-bank theories. This finding underscores the need for V1 models that are multiple nonlinearities away from the image domain, and it supports the idea of explaining early visual cortex in terms of high-level functional goals.
Author summary
Predicting the responses of sensory neurons to arbitrary natural stimuli is of major importance for understanding their function. Arguably the most studied cortical area is primary visual cortex (V1), where many models have been developed to explain its function. However, even the most successful models, built on neurophysiologists' intuitions, still fail to account for spiking responses to natural images. Here, we model spiking activity in primary visual cortex (V1) of monkeys using deep convolutional neural networks (CNNs), which have been successful in computer vision. We both trained CNNs directly to fit the data and used CNNs trained to solve a high-level task (object categorization). With these approaches, we are able to outperform previous models and improve the state of the art in predicting the responses of early visual neurons to natural images. Our results have two important implications. First, since V1 is the result of several nonlinear stages, it should be modeled as such. Second, functional models of entire visual pathways, of which V1 is an early stage, not only account for higher areas of such pathways but also provide useful representations for V1 predictions.
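The transfer-learning idea, a frozen pre-trained feature space with a simple trained readout, can be caricatured in a few lines of NumPy. The random "features", the simulated spike counts, and the ridge readout below are illustrative stand-ins, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for pre-trained CNN features: in the transfer-learning setting
# these would come from a fixed network trained on object recognition;
# here they are random, purely for illustration.
n_images, n_features, n_neurons = 200, 64, 10
features = rng.standard_normal((n_images, n_features))

# Simulated neural responses: a linear function of the features plus noise.
true_readout = rng.standard_normal((n_features, n_neurons)) * 0.3
spikes = features @ true_readout + 0.1 * rng.standard_normal((n_images, n_neurons))

# Ridge-regularized linear readout fitted on top of the frozen feature space.
lam = 1.0
W = np.linalg.solve(features.T @ features + lam * np.eye(n_features),
                    features.T @ spikes)

pred = features @ W
r = np.corrcoef(pred[:, 0], spikes[:, 0])[0, 1]
print(round(r, 3))
```

Only `W` is fitted to neural data, which is why this approach needs far less experimental time than training a full network from scratch.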



Author(s):  
Kiyoshi Fujimoto

Human vision recognizes the direction of a person, an animal, or an object in translational motion, even when it is displayed in a fixed position on a screen, as when filmed by a panning camera with the background erased. Because there is no cue to relative motion between the object and the background, recognition relies on the object's facing direction and/or the movements of its internal parts, such as limbs. Such high-level, object-based motion representation can affect lower-level motion perception: when an ambiguous motion pattern is inserted into the screen behind the translating object, the pattern appears to move in the direction opposite to the one the object implies. This is called the backscroll illusion, and psychophysical studies were conducted to investigate its phenomenal aspects under the hypothesis that the illusion reflects a strategy the visual system adopts in everyday circumstances. The backscroll illusion convincingly demonstrates that natural images contain visual illusions.



2003 ◽  
Author(s):  
John A. Black, Jr. ◽  
Kanav Kahol ◽  
Prem Kuchi ◽  
Gamal F. Fahmy ◽  
Sethuraman Panchanathan


2020 ◽  
Author(s):  
Guy Gaziv ◽  
Roman Beliy ◽  
Niv Granot ◽  
Assaf Hoogi ◽  
Francesca Strappini ◽  
...  

Abstract
Reconstructing natural images and decoding their semantic category from fMRI brain recordings is challenging. Acquiring enough (image, fMRI) pairs to span the huge space of natural images is prohibitive. We present a novel self-supervised approach for fMRI-to-image reconstruction and classification that goes well beyond the scarce paired data. By imposing cycle consistency, we train our image-reconstruction deep neural network on abundant "unpaired" data: a plethora of natural images without fMRI recordings (from many novel categories), and fMRI recordings without images. Combining high-level perceptual objectives with self-supervision on unpaired data yields a leap improvement over the best existing methods, achieving: (i) unprecedented image reconstruction from fMRI of never-before-seen images (evaluated by image metrics and human testing); (ii) large-scale semantic classification (1,000 diverse classes) of categories never seen during network training. Such large-scale (1,000-way) semantic classification from fMRI recordings had never been demonstrated before. Finally, we provide evidence for the biological plausibility of our learned model.
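The cycle-consistency idea behind training on unpaired data can be illustrated with toy linear maps. The dimensions, losses, and single-sided update below are simplifying assumptions; the actual model is a deep network trained with additional paired and perceptual losses:

```python
import numpy as np

rng = np.random.default_rng(2)
d_img, d_fmri = 16, 8

# Toy linear "encoder" (image -> fMRI) and "decoder" (fMRI -> image).
E = rng.standard_normal((d_fmri, d_img)) * 0.1
D = rng.standard_normal((d_img, d_fmri)) * 0.1

# "Unpaired" data: images without recordings, recordings without images.
imgs = rng.standard_normal((100, d_img))
fmri = rng.standard_normal((100, d_fmri))

def cycle_losses(E, D, imgs, fmri):
    """Round-trip errors on unpaired data: no (image, fMRI) pairs needed."""
    img_cycle = imgs @ E.T @ D.T    # image -> fMRI -> image
    fmri_cycle = fmri @ D.T @ E.T   # fMRI -> image -> fMRI
    return (np.mean((img_cycle - imgs) ** 2),
            np.mean((fmri_cycle - fmri) ** 2))

loss0, _ = cycle_losses(E, D, imgs, fmri)

# A few gradient steps on the image-cycle loss alone, updating D with E
# frozen; real training alternates directions and adds paired objectives.
lr = 0.05
for _ in range(200):
    z = imgs @ E.T                                   # encode images
    grad_D = 2 * (z @ D.T - imgs).T @ z / len(imgs)  # d(loss)/dD
    D -= lr * grad_D

loss_img, _ = cycle_losses(E, D, imgs, fmri)
print(loss_img < loss0)
```

The point is that both losses are computed from one modality alone, which is what lets the method exploit images that were never shown in the scanner and recordings with no ground-truth image.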



Author(s):  
Le Dong ◽  
Ebroul Izquierdo ◽  
Shuzhi Ge

In this chapter, research on visual information classification based on biologically inspired visually selective attention with knowledge structuring is presented. The research objective is to develop visual models and corresponding algorithms that automatically extract features from selected essential areas of natural images and, finally, achieve knowledge structuring and classification within a structural description scheme. The proposed scheme consists of three main aspects: biologically inspired visually selective attention, knowledge structuring, and classification of visual information. Biologically inspired visually selective attention closely follows the mechanisms of the visual "what" and "where" pathways in the human brain. The proposed visually selective attention model uses a bottom-up approach to generate essential areas based on low-level features extracted from natural images. The model also exploits a low-level, top-down selective-attention mechanism that decides which objects are interesting through human interaction expressing preference or refusal. Knowledge structuring automatically creates a relevance map from the essential areas generated by visually selective attention, and the developed algorithms derive a set of well-structured representations from the low-level description to drive the final classification. Knowledge structuring relies on human knowledge to produce suitable links between low-level descriptions and high-level representations on a limited training set. The backbone is a distribution-mapping strategy involving two novel modules: structured low-level feature extraction using a convolutional neural network, and topology preservation based on sparse representation and an unsupervised learning algorithm. Classification is achieved by simulating high-level, top-down visual information perception and classification using an incremental Bayesian parameter estimation method.
The utility of the proposed scheme for solving relevant research problems is validated. The proposed modular architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
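As a minimal sketch of incremental Bayesian parameter estimation (the chapter's classifier is more elaborate), a Gaussian class-conditional model whose priors, means, and variances are updated one sample at a time might look like this; the two class names and the data are hypothetical:

```python
import numpy as np

class IncrementalGaussianNB:
    """Per-class Gaussian likelihoods whose means and variances are
    updated online via Welford's method, with running class priors."""

    def __init__(self):
        self.counts = {}   # class -> sample count
        self.means = {}    # class -> running mean vector
        self.m2 = {}       # class -> running sum of squared deviations

    def update(self, x, label):
        """Fold one labelled sample into the class statistics."""
        if label not in self.counts:
            self.counts[label] = 0
            self.means[label] = np.zeros_like(x, dtype=float)
            self.m2[label] = np.zeros_like(x, dtype=float)
        self.counts[label] += 1
        delta = x - self.means[label]
        self.means[label] += delta / self.counts[label]
        self.m2[label] += delta * (x - self.means[label])

    def predict(self, x):
        """Return the class with the highest log-posterior."""
        total = sum(self.counts.values())
        best, best_lp = None, -np.inf
        for c, n in self.counts.items():
            var = self.m2[c] / max(n - 1, 1) + 1e-6  # variance floor
            lp = (np.log(n / total)
                  - 0.5 * np.sum(np.log(2 * np.pi * var))
                  - 0.5 * np.sum((x - self.means[c]) ** 2 / var))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

rng = np.random.default_rng(3)
clf = IncrementalGaussianNB()
for _ in range(300):
    clf.update(rng.normal(0.0, 1.0, 4), "background")
    clf.update(rng.normal(2.0, 1.0, 4), "object")
print(clf.predict(np.full(4, 2.0)))
```

Because each update touches only running statistics, the classifier can absorb new training samples (or entirely new classes) without refitting from scratch, which is the appeal of the incremental formulation.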




