Parallel processing in high-level categorization of natural images

10.1038/nn866 ◽  
2002 ◽  
Vol 5 (7) ◽  
pp. 629-630 ◽  
Author(s):  
Guillaume A. Rousselet ◽  
Michèle Fabre-Thorpe ◽  
Simon J. Thorpe

Author(s):  
Kai Zhao ◽  
Wei Shen ◽  
Shanghua Gao ◽  
Dandan Li ◽  
Ming-Ming Cheng

In natural images, the scales (thickness) of object skeletons may vary dramatically among objects and object parts, so robust skeleton detection requires powerful multi-scale feature integration. To address this issue, we present a new convolutional neural network (CNN) architecture for object skeleton detection built around a novel hierarchical feature integration mechanism, named Hi-Fi. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers as well as low-level details from shallower layers. By hierarchically integrating different CNN feature levels with bidirectional guidance, our approach (1) enables mutual refinement across features of different levels, and (2) captures both rich object context and high-resolution details. Experimental results show that our method significantly outperforms the state of the art in fusing features from very different scales, as evidenced by considerable performance improvements on several benchmarks.
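Although the paper's architecture is far richer, the core idea of bidirectional integration, where deep semantics guide shallow features and shallow detail refines deep features, can be sketched in plain NumPy. All shapes and the additive fusion rule below are illustrative assumptions, not the actual Hi-Fi design:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(f):
    """2x2 average pooling of a (C, H, W) feature map."""
    C, H, W = f.shape
    return f.reshape(C, H // 2, 2, W // 2, 2).mean(axis=(2, 4))

def bidirectional_fuse(shallow, deep):
    """Toy hierarchical feature integration: deep semantics guide the
    shallow map, shallow detail refines the deep map, then both are merged."""
    shallow_refined = shallow + upsample2x(deep)   # top-down guidance
    deep_refined = deep + downsample2x(shallow)    # bottom-up detail
    return shallow_refined + upsample2x(deep_refined)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 32, 32))  # high-resolution, low-level
deep = rng.standard_normal((8, 16, 16))     # low-resolution, high-level
fused = bidirectional_fuse(shallow, deep)
print(fused.shape)  # full resolution, with deep context mixed in
```

In a real network the additions would be learned convolutions, but the shape bookkeeping, matching resolutions before features of different levels can refine each other, is the same.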



2017 ◽  
Author(s):  
Santiago A. Cadena ◽  
George H. Denfield ◽  
Edgar Y. Walker ◽  
Leon A. Gatys ◽  
Andreas S. Tolias ◽  
...  

Abstract
Despite great efforts over several decades, our best models of primary visual cortex (V1) still predict spiking activity quite poorly when probed with natural stimuli, highlighting our limited understanding of the nonlinear computations in V1. Recently, two approaches based on deep learning have been applied successfully to neural data. On the one hand, transfer learning from networks trained on object recognition has worked remarkably well for predicting neural responses in higher areas of the primate ventral stream, but it has not yet been used to model spiking activity in early stages such as V1. On the other hand, data-driven models have been used to predict neural responses in the early visual system (retina and V1) of mice, but not of primates. Here, we test the ability of both approaches to predict spiking activity in response to natural images in V1 of awake monkeys. Even though V1 is at an early-to-intermediate stage of the visual system, we found that the transfer-learning approach performed similarly well to the data-driven approach, and both outperformed classical linear-nonlinear and wavelet-based feature representations that build on existing theories of V1. Notably, transfer learning using a pre-trained feature space required substantially less experimental time to achieve the same performance. In conclusion, multi-layer convolutional neural networks (CNNs) set the new state of the art for predicting neural responses to natural images in primate V1, and deep features learned for object recognition explain V1 computation better than all previous filter-bank theories. This finding underscores the need for V1 models that are multiple nonlinearities away from the image domain, and it supports the idea of explaining early visual cortex in terms of high-level functional goals.
Author summary
Predicting the responses of sensory neurons to arbitrary natural stimuli is of major importance for understanding their function. Arguably the most studied cortical area is primary visual cortex (V1), where many models have been developed to explain its function. However, even the most successful models, built on neurophysiologists' intuitions, still fail to account for spiking responses to natural images. Here, we model spiking activity in primary visual cortex (V1) of monkeys using deep convolutional neural networks (CNNs), which have been successful in computer vision. We both trained CNNs directly to fit the data and used CNNs trained to solve a high-level task (object categorization). With these approaches, we are able to outperform previous models and improve the state of the art in predicting the responses of early visual neurons to natural images. Our results have two important implications. First, since V1 is the result of several nonlinear stages, it should be modeled as such. Second, functional models of entire visual pathways, of which V1 is an early stage, not only account for higher areas of such pathways but also provide useful representations for V1 predictions.
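The transfer-learning idea, a frozen pre-trained feature space with a simple trained readout, can be caricatured in a few lines of NumPy. The random "features", the simulated spike counts, and the ridge readout below are illustrative stand-ins, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for pre-trained CNN features: in the transfer-learning setting
# these would come from a fixed network trained on object recognition;
# here they are random, purely for illustration.
n_images, n_features, n_neurons = 200, 64, 10
features = rng.standard_normal((n_images, n_features))

# Simulated neural responses: a linear function of the features plus noise.
true_readout = rng.standard_normal((n_features, n_neurons)) * 0.3
spikes = features @ true_readout + 0.1 * rng.standard_normal((n_images, n_neurons))

# Ridge-regularized linear readout fitted on top of the frozen feature space.
lam = 1.0
W = np.linalg.solve(features.T @ features + lam * np.eye(n_features),
                    features.T @ spikes)

pred = features @ W
r = np.corrcoef(pred[:, 0], spikes[:, 0])[0, 1]
print(round(r, 3))
```

Only `W` is fitted to neural data, which is why this approach needs far less experimental time than training a full network from scratch.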



Author(s):  
Kiyoshi Fujimoto

Human vision recognizes the direction of a person, an animal, or an object in translational motion, even when it is displayed in a fixed position on a screen, as when filmed by a panning camera with the background erased. Because there is no cue to relative motion between the object and the background, recognition relies on the object's facing direction and/or the movements of its internal parts, such as limbs. Such high-level, object-based motion representation can affect lower-level motion perception: when an ambiguous motion pattern is inserted into the screen behind the translating object, the pattern appears to move in the direction opposite to the one the object implies. This is called the backscroll illusion, and psychophysical studies were conducted to investigate its phenomenal aspects under the hypothesis that the illusion reflects a strategy the visual system adopts in everyday circumstances. The backscroll illusion convincingly demonstrates that natural images contain visual illusions.



2003 ◽  
Author(s):  
John A. Black, Jr. ◽  
Kanav Kahol ◽  
Prem Kuchi ◽  
Gamal F. Fahmy ◽  
Sethuraman Panchanathan


2020 ◽  
Author(s):  
Guy Gaziv ◽  
Roman Beliy ◽  
Niv Granot ◽  
Assaf Hoogi ◽  
Francesca Strappini ◽  
...  

Abstract
Reconstructing natural images and decoding their semantic category from fMRI brain recordings is challenging. Acquiring enough (image, fMRI) pairs to span the huge space of natural images is prohibitive. We present a novel self-supervised approach for fMRI-to-image reconstruction and classification that goes well beyond the scarce paired data. By imposing cycle consistency, we train our image-reconstruction deep neural network on abundant "unpaired" data: a plethora of natural images without fMRI recordings (from many novel categories), and fMRI recordings without images. Combining high-level perceptual objectives with self-supervision on unpaired data yields a leap improvement over the best existing methods, achieving: (i) unprecedented image reconstruction from fMRI of never-before-seen images (evaluated by image metrics and human testing); (ii) large-scale semantic classification (1,000 diverse classes) of categories never seen during network training. Such large-scale (1,000-way) semantic classification from fMRI recordings had never been demonstrated before. Finally, we provide evidence for the biological plausibility of our learned model.
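The cycle-consistency idea behind training on unpaired data can be illustrated with toy linear maps. The dimensions, losses, and single-sided update below are simplifying assumptions; the actual model is a deep network trained with additional paired and perceptual losses:

```python
import numpy as np

rng = np.random.default_rng(2)
d_img, d_fmri = 16, 8

# Toy linear "encoder" (image -> fMRI) and "decoder" (fMRI -> image).
E = rng.standard_normal((d_fmri, d_img)) * 0.1
D = rng.standard_normal((d_img, d_fmri)) * 0.1

# "Unpaired" data: images without recordings, recordings without images.
imgs = rng.standard_normal((100, d_img))
fmri = rng.standard_normal((100, d_fmri))

def cycle_losses(E, D, imgs, fmri):
    """Round-trip errors on unpaired data: no (image, fMRI) pairs needed."""
    img_cycle = imgs @ E.T @ D.T    # image -> fMRI -> image
    fmri_cycle = fmri @ D.T @ E.T   # fMRI -> image -> fMRI
    return (np.mean((img_cycle - imgs) ** 2),
            np.mean((fmri_cycle - fmri) ** 2))

loss0, _ = cycle_losses(E, D, imgs, fmri)

# A few gradient steps on the image-cycle loss alone, updating D with E
# frozen; real training alternates directions and adds paired objectives.
lr = 0.05
for _ in range(200):
    z = imgs @ E.T                                   # encode images
    grad_D = 2 * (z @ D.T - imgs).T @ z / len(imgs)  # d(loss)/dD
    D -= lr * grad_D

loss_img, _ = cycle_losses(E, D, imgs, fmri)
print(loss_img < loss0)
```

The point is that both losses are computed from one modality alone, which is what lets the method exploit images that were never shown in the scanner and recordings with no ground-truth image.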



Author(s):  
Le Dong ◽  
Ebroul Izquierdo ◽  
Shuzhi Ge

In this chapter, research on visual information classification based on biologically inspired visually selective attention with knowledge structuring is presented. The research objective is to develop visual models and corresponding algorithms that automatically extract features from selected essential areas of natural images and, finally, achieve knowledge structuring and classification within a structural description scheme. The proposed scheme consists of three main aspects: biologically inspired visually selective attention, knowledge structuring, and classification of visual information. Biologically inspired visually selective attention closely follows the mechanisms of the visual "what" and "where" pathways in the human brain. The proposed visually selective attention model uses a bottom-up approach to generate essential areas based on low-level features extracted from natural images. The model also exploits a low-level, top-down selective-attention mechanism that decides which objects are interesting through human interaction expressing preference or refusal. Knowledge structuring automatically creates a relevance map from the essential areas generated by visually selective attention, and the developed algorithms derive a set of well-structured representations from the low-level description to drive the final classification. Knowledge structuring relies on human knowledge to produce suitable links between low-level descriptions and high-level representations on a limited training set. The backbone is a distribution-mapping strategy involving two novel modules: structured low-level feature extraction using a convolutional neural network, and topology preservation based on sparse representation and an unsupervised learning algorithm. Classification is achieved by simulating high-level, top-down visual information perception and classification using an incremental Bayesian parameter estimation method.
The utility of the proposed scheme for solving relevant research problems is validated. The proposed modular architecture offers straightforward expansion to include user relevance feedback, contextual input, and multimodal information if available.
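As a minimal sketch of incremental Bayesian parameter estimation (the chapter's classifier is more elaborate), a Gaussian class-conditional model whose priors, means, and variances are updated one sample at a time might look like this; the two class names and the data are hypothetical:

```python
import numpy as np

class IncrementalGaussianNB:
    """Per-class Gaussian likelihoods whose means and variances are
    updated online via Welford's method, with running class priors."""

    def __init__(self):
        self.counts = {}   # class -> sample count
        self.means = {}    # class -> running mean vector
        self.m2 = {}       # class -> running sum of squared deviations

    def update(self, x, label):
        """Fold one labelled sample into the class statistics."""
        if label not in self.counts:
            self.counts[label] = 0
            self.means[label] = np.zeros_like(x, dtype=float)
            self.m2[label] = np.zeros_like(x, dtype=float)
        self.counts[label] += 1
        delta = x - self.means[label]
        self.means[label] += delta / self.counts[label]
        self.m2[label] += delta * (x - self.means[label])

    def predict(self, x):
        """Return the class with the highest log-posterior."""
        total = sum(self.counts.values())
        best, best_lp = None, -np.inf
        for c, n in self.counts.items():
            var = self.m2[c] / max(n - 1, 1) + 1e-6  # variance floor
            lp = (np.log(n / total)
                  - 0.5 * np.sum(np.log(2 * np.pi * var))
                  - 0.5 * np.sum((x - self.means[c]) ** 2 / var))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

rng = np.random.default_rng(3)
clf = IncrementalGaussianNB()
for _ in range(300):
    clf.update(rng.normal(0.0, 1.0, 4), "background")
    clf.update(rng.normal(2.0, 1.0, 4), "object")
print(clf.predict(np.full(4, 2.0)))
```

Because each update touches only running statistics, the classifier can absorb new training samples (or entirely new classes) without refitting from scratch, which is the appeal of the incremental formulation.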




