Levels of Representation in a Deep Learning Model of Categorization

2019 ◽  
Author(s):  
Olivia Guest ◽  
Bradley C. Love

Abstract Deep convolutional neural networks (DCNNs) rival humans in object recognition. The layers (or levels of representation) in DCNNs have been successfully aligned with processing stages along the ventral stream for visual processing. Here, we propose a model of concept learning that uses visual representations from these networks to build memory representations of novel categories, which may rely on the medial temporal lobe (MTL) and medial prefrontal cortex (mPFC). Our approach opens up two possibilities: (a) formal investigations can involve photographic stimuli as opposed to stimuli handcrafted and coded by the experimenter; (b) model comparison can determine which level of representation within a DCNN a learner is using during categorization decisions. Pursuing the latter point, DCNNs suggest that the shape bias in children relies on representations at more advanced network layers, whereas a learner that relied on lower network layers would display a color bias. These results confirm the role of natural statistics in the shape bias (i.e., shape is predictive of category membership) while highlighting that the type of statistics matters, i.e., whether they come from lower or higher levels of representation. We use the same approach to provide evidence that pigeons performing seemingly sophisticated categorization of complex imagery may in fact be relying on representations that are very low-level (i.e., retinotopic). Although complex features, such as shape, predominate at more advanced network layers, even simple features, such as spatial frequency and orientation, are better represented at the more advanced layers, contrary to a standard hierarchical view.
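
The layer-comparison logic lends itself to a compact illustration. The sketch below assumes a pretrained torchvision ResNet-18 as a stand-in DCNN; the layer names, the prototype readout, and the 0..C-1 integer labels are illustrative assumptions, not the authors' pipeline. The same categorization model is fit to features from several depths so that the best-fitting layer can be compared against behavior.

```python
# Minimal sketch: compare which DCNN layer best supports a prototype
# model of categorization. ResNet-18 and the nearest-prototype readout
# are stand-ins for the paper's actual model and stimuli.
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

layer_names = ["layer1", "layer2", "layer4"]  # low, mid, high levels
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
extractor = create_feature_extractor(net, return_nodes=layer_names)

def layer_features(images):
    """images: (N, 3, 224, 224) tensor -> dict of flattened activations."""
    with torch.no_grad():
        acts = extractor(images)
    return {name: a.flatten(1) for name, a in acts.items()}

def prototype_accuracy(train_x, train_y, test_x, test_y):
    """Classify each test item by its nearest category prototype (mean).
    Assumes integer labels 0..C-1."""
    protos = torch.stack([train_x[train_y == c].mean(0)
                          for c in train_y.unique()])
    preds = torch.cdist(test_x, protos).argmin(1)
    return (preds == test_y).float().mean().item()

# Model comparison: the layer whose features best reproduce the learner's
# choices is taken as the level of representation the learner relies on.
```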

Agriculture ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 651
Author(s):  
Shengyi Zhao ◽  
Yun Peng ◽  
Jizhan Liu ◽  
Shuo Wu

Crop disease diagnosis is of great significance to crop yield and agricultural production. Deep learning methods have become the main research direction for diagnosing crop diseases. This paper proposes a deep convolutional neural network that integrates an attention mechanism, which can better adapt to the diagnosis of a variety of tomato leaf diseases. The network structure mainly includes residual blocks and attention extraction modules, allowing the model to accurately extract complex features of various diseases. Extensive comparative experiments show that the proposed model achieves an average identification accuracy of 96.81% on the tomato leaf disease dataset and has significant advantages in network complexity and real-time performance compared with other models. Moreover, in a model comparison experiment on the public grape leaf disease dataset, the proposed model also achieves better results, with an average identification accuracy of 99.24%. This confirms that adding the attention module extracts the complex features of a variety of diseases more accurately while using fewer parameters. The proposed model provides a high-performance solution for crop disease diagnosis in real agricultural environments.
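
A minimal sketch of the kind of attention-augmented residual block described above; the squeeze-and-excitation-style channel gate and all sizes are assumptions rather than the paper's exact module.

```python
# Sketch of a residual block with a channel-attention module, in the
# spirit of the network described above. The gating mechanism and all
# channel sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionResidualBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        # Channel attention: global pool -> bottleneck MLP -> sigmoid gate.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.body(x)
        out = out * self.attn(out)    # reweight channels by relevance
        return self.act(out + x)      # residual connection

# Example: one block over a batch of 64x64 leaf-image feature maps.
block = AttentionResidualBlock(32)
y = block(torch.randn(4, 32, 64, 64))
```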


Author(s):  
Hannah Garcia Doherty ◽  
Roberto Arnaiz Burgueño ◽  
Roeland P. Trommel ◽  
Vasileios Papanastasiou ◽  
Ronny I. A. Harmanny

Abstract Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform classification. Visualization of the inner network layers revealed the sections of the input image most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector in the channel and feature dimension, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.
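
The visualization step can be illustrated with a Grad-CAM-style class activation map, sketched below; the ResNet backbone, layer choice, and input shape are stand-ins, not the authors' exact network or method.

```python
# Sketch: Grad-CAM-style visualization of which regions of an input
# spectrogram drive the class decision. The ResNet-18 backbone and the
# chosen layer are illustrative stand-ins.
import torch
import torchvision.models as models

net = models.resnet18(weights=None, num_classes=39).eval()
acts, grads = {}, {}
layer = net.layer4
layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # μ-D spectrogram image
score = net(x)[0].max()     # score of the predicted class
score.backward()

# Weight each channel by its average gradient, then sum and rectify.
w = grads["g"].mean(dim=(2, 3), keepdim=True)
cam = torch.relu((w * acts["a"]).sum(dim=1)).squeeze()
cam = cam / cam.max()       # (7, 7) relevance map; upsample to overlay
```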


Author(s):  
N Seijdel ◽  
N Tsakmakidis ◽  
EHF De Haan ◽  
SM Bohte ◽  
HS Scholte

Abstract Feedforward deep convolutional neural networks (DCNNs) are, under specific conditions, matching and even surpassing human performance in object recognition in natural scenes. This performance suggests that the analysis of a loose collection of image features could support the recognition of natural object categories, without dedicated systems to solve specific visual subtasks. Research in humans, however, suggests that while feedforward activity may suffice for sparse scenes with isolated objects, additional visual operations (‘routines’) that aid the recognition process (e.g. segmentation or grouping) are needed for more complex scenes. Linking human visual processing to the performance of DCNNs with increasing depth, we here explored if, how, and when objects are differentiated from the backgrounds they appear on. To this end, we controlled the information in both objects and backgrounds, as well as the relationship between them, by adding noise, manipulating background congruence and systematically occluding parts of the image. Results indicate that with an increase in network depth, there is an increase in the distinction between object and background information. For shallower networks, results indicated a benefit of training on segmented objects. Overall, these results indicate that, de facto, scene segmentation can be performed by a network of sufficient depth. We conclude that the human brain could perform scene segmentation in the context of object identification without an explicit mechanism, by selecting or “binding” features that belong to the object and ignoring other features, in a manner similar to a very deep convolutional neural network.
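
One way to make the object/background analysis concrete: correlate full-scene activations with object-only activations layer by layer. The sketch below assumes a pretrained ResNet-18 and random tensors in place of the controlled stimuli; it illustrates only the measurement logic, not the study's exact analysis.

```python
# Sketch: quantify how object and background information separate with
# depth by correlating full-scene activations with object-only
# activations at each layer. Backbone and layers are stand-ins.
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

nodes = ["layer1", "layer2", "layer3", "layer4"]
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
ext = create_feature_extractor(net, return_nodes=nodes)

def flat(acts):
    return {k: v.flatten().float() for k, v in acts.items()}

def corr(a, b):
    a, b = a - a.mean(), b - b.mean()
    return (a @ b / (a.norm() * b.norm())).item()

scene = torch.randn(1, 3, 224, 224)         # object on a background
object_only = torch.randn(1, 3, 224, 224)   # segmented object, no background
with torch.no_grad():
    s, o = flat(ext(scene)), flat(ext(object_only))
for k in nodes:
    print(k, corr(s[k], o[k]))  # rises with depth if object info dominates
```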


2021 ◽  
Author(s):  
D. Merika W. Sanders ◽  
Rosemary A. Cowell

Representational theories predict that brain regions contribute to cognition according to the information they represent (e.g., simple versus complex), contradicting the traditional notion that brain regions are specialized for cognitive functions (e.g., perception versus memory). In support of representational accounts, substantial evidence now attests that the Medial Temporal Lobe (MTL) is not specialized solely for long-term declarative memory, but underpins other functions including perception and future-imagining for complex stimuli and events. However, a complementary prediction has been less well explored, namely that the cortical locus of declarative memory may fall outside the MTL if the to-be-remembered content is sufficiently simple. Specifically, the locus should coincide with the optimal neural code for the representations being retrieved. To test this prediction, we manipulated the complexity of the to-be-remembered representations in a recognition memory task. First, participants in the scanner viewed novel 3D objects and scenes, and we used multivariate analyses to identify regions in the ventral visual-MTL pathway that preferentially coded for either simple features of the stimuli, or complex conjunctions of those features. Next, in a separate scan, we tested recognition memory for these stimuli and performed neuroimaging contrasts that revealed two memory signals: feature memory and conjunction memory. Feature memory signals were found in visual cortex, while conjunction memory signals emerged in MTL. Further, the regions optimally representing features via preferential feature-coding coincided with those exhibiting feature memory signals. These findings suggest that representational content, rather than cognitive function, is the primary organizing principle in the ventral visual-MTL pathway.
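
The feature- versus conjunction-coding distinction can be illustrated with a toy decoding analysis: a region that linearly codes a single feature cannot support decoding of a conjunction that is orthogonal to that feature. The toy data and logistic-regression decoder below are illustrative assumptions, not the study's multivariate pipeline.

```python
# Toy illustration: a 'feature-coding' region supports decoding of a
# simple feature but not of an orthogonal conjunction (here, XOR);
# a 'conjunction-coding' region supports the conjunction.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, v = 400, 50
a = rng.integers(0, 2, n)            # simple feature A
b = rng.integers(0, 2, n)            # simple feature B
conj = a ^ b                         # conjunction not reducible to A or B

region_feat = rng.normal(size=(n, v)); region_feat[:, 0] += 2 * a
region_conj = rng.normal(size=(n, v)); region_conj[:, 0] += 2 * conj

def acc(X, y):
    return cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=5).mean()

print("feature region, feature A  :", acc(region_feat, a))     # high
print("feature region, conjunction:", acc(region_feat, conj))  # ~chance
print("conj region, conjunction   :", acc(region_conj, conj))  # high
```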


2020 ◽  
Author(s):  
Long Luu ◽  
Alan A. Stocker

Abstract Categorical judgments can systematically bias the perceptual interpretation of stimulus features. However, it has remained unclear whether categorical judgments directly modify working memory representations or, alternatively, generate these biases via an inference process downstream from working memory. To address this question, we ran two novel psychophysical experiments in which human subjects had to revert their categorical judgments about a stimulus feature, if incorrect based on feedback, before providing an estimate of the feature. If categorical judgments indeed directly altered sensory representations in working memory, subjects’ estimates should reflect some aspects of their initial (incorrect) categorical judgment in those trials. We found no traces of the initial categorical judgment. Rather, subjects seem to be able to flexibly switch their categorical judgment if needed and use the correct corresponding categorical prior to properly perform feature inference. A cross-validated model comparison also revealed that feedback may lead to selective memory recall such that only memory samples that are consistent with the categorical judgment are accepted for the inference process. Our results suggest that categorical judgments do not modify sensory information in working memory but rather act as a top-down expectation in the subsequent sensory recall and inference process downstream from working memory.
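
The selective-recall account admits a toy formalization: the feature estimate is the mean of memory samples consistent with the (possibly corrected) categorical judgment. All parameters below are illustrative, not the paper's fitted model.

```python
# Toy sketch of inference conditioned on a categorical judgment: only
# memory samples matching the chosen category enter the estimate.
# Noise level and sample count are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def estimate(stimulus, category_sign, noise=10.0, n_samples=5000):
    """Estimate a feature (relative to a category boundary at 0) from
    noisy memory samples, keeping only those matching the judgment."""
    samples = rng.normal(stimulus, noise, n_samples)   # noisy recall
    consistent = samples[np.sign(samples) == category_sign]
    return consistent.mean() if consistent.size else 0.0

# Estimate under the correct category (+1) vs. what an unreverted,
# incorrect judgment (-1) would predict for the same stimulus:
print(estimate(stimulus=3.0, category_sign=+1))  # biased away from boundary
print(estimate(stimulus=3.0, category_sign=-1))  # opposite-side estimate
```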


Author(s):  
D. Marmanis ◽  
J. D. Wegner ◽  
S. Galliani ◽  
K. Schindler ◽  
M. Datcu ◽  
...  

This paper describes a deep learning approach to semantic segmentation of very high resolution (aerial) images. Deep neural architectures hold the promise of end-to-end learning from raw images, making heuristic feature design obsolete. Over the last decade this idea has seen a revival, and in recent years deep convolutional neural networks (CNNs) have emerged as the method of choice for a range of image interpretation tasks like visual recognition and object detection. Still, standard CNNs do not lend themselves to per-pixel semantic segmentation, mainly because one of their fundamental principles is to gradually aggregate information over larger and larger image regions, making it hard to disentangle contributions from different pixels. Very recently, two extensions of the CNN framework have made it possible to trace the semantic information back to a precise pixel position: deconvolutional network layers undo the spatial downsampling, and Fully Convolutional Networks (FCNs) modify the fully connected classification layers of the network in such a way that the location of individual activations remains explicit. We design an FCN which takes as input intensity and range data and, with the help of aggressive deconvolution and recycling of early network layers, converts them into a pixelwise classification at full resolution. We discuss design choices and intricacies of such a network, and demonstrate that an ensemble of several networks achieves excellent results on challenging data such as the ISPRS semantic labeling benchmark, using only the raw data as input.
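
The deconvolution-plus-recycling idea reduces to a few lines. The sketch below is a toy FCN, not the paper's architecture: channel counts, depth, and the 4-channel intensity-plus-range input are simplified assumptions.

```python
# Compact sketch of the FCN idea above: downsample with convolutions,
# undo the downsampling with a transposed convolution, and recycle an
# early layer via a skip connection for full-resolution labels.
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, in_ch=4, n_classes=6):  # e.g., RGB+height, 6 classes
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1),
                                  nn.ReLU())
        self.pool = nn.MaxPool2d(2)
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1),
                                  nn.ReLU())
        # Deconvolution (transposed conv) restores spatial resolution.
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.head = nn.Conv2d(64, n_classes, 1)  # 1x1 conv: per-pixel scores

    def forward(self, x):
        e1 = self.enc1(x)              # full resolution, early features
        e2 = self.enc2(self.pool(e1))  # half resolution, deeper features
        u = self.up(e2)                # back to full resolution
        return self.head(torch.cat([u, e1], dim=1))  # recycle early layer

logits = TinyFCN()(torch.randn(1, 4, 256, 256))  # -> (1, 6, 256, 256)
```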


2019 ◽  
Author(s):  
David Stawarczyk ◽  
Christopher N. Wahlheim ◽  
Joset A. Etzel ◽  
Abraham Z. Snyder ◽  
Jeffrey M. Zacks

Abstract When encountering unexpected event changes, memories of relevant past experiences must be updated to form new representations. Current models of memory updating propose that people must first generate memory-based predictions to detect and register that features of the environment have changed, then encode the new event features and integrate them with relevant memories of past experiences to form configural memory representations. Each of these steps may be impaired in older adults. Using functional MRI, we investigated these mechanisms in healthy young and older adults. In the scanner, participants first watched a movie depicting everyday activities in a day of an actor’s life. They next watched a second nearly identical movie in which some scenes ended differently. Crucially, before watching the last part of each activity, the second movie stopped, and participants were asked to mentally replay how the activity previously ended. Three days later, participants were asked to recall the activities. Neural activity pattern reinstatement in medial temporal lobe (MTL) during the replay phase of the second movie was associated with detecting changes and with better memory for the original activity features. Reinstatements in posterior medial cortex (PMC) additionally predicted better memory for changed features. Compared to young adults, older adults showed a reduced ability to detect and remember changes, and weaker associations between reinstatement and memory performance. These findings suggest that PMC and MTL contribute to change processing by reinstating previous event features, and that older adults are less able to use reinstatement to update memory for changed features.
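
The reinstatement measure itself is simple: correlate, event by event, the voxel pattern evoked during mental replay with the pattern evoked at first viewing. The sketch below uses toy arrays in place of ROI data; dimensions are illustrative.

```python
# Sketch of the reinstatement measure: per-event correlation between
# encoding-phase and replay-phase voxel patterns. Toy arrays stand in
# for MTL/PMC region-of-interest data.
import numpy as np

rng = np.random.default_rng(2)
encoding = rng.normal(size=(40, 300))   # 40 events x 300 voxels
replay = encoding + rng.normal(scale=1.5, size=encoding.shape)

def reinstatement(enc, rep):
    """Per-event Pearson correlation between encoding and replay."""
    enc = (enc - enc.mean(1, keepdims=True)) / enc.std(1, keepdims=True)
    rep = (rep - rep.mean(1, keepdims=True)) / rep.std(1, keepdims=True)
    return (enc * rep).mean(1)

scores = reinstatement(encoding, replay)  # one score per event; these
# scores would then be related to change detection and later recall.
print(scores.round(2)[:5])
```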


2020 ◽  
Author(s):  
Simon W. Davis ◽  
Benjamin R. Geib ◽  
Erik A. Wing ◽  
Wei-Chun Wang ◽  
Mariam Hovhannisyan ◽  
...  

Abstract It is generally assumed that the encoding of a single event generates multiple memory representations, which contribute differently to subsequent episodic memory. We used fMRI and representational similarity analysis (RSA) to examine how visual and semantic representations predicted subsequent memory for single item encoding (e.g., seeing an orange). Three levels of visual representations corresponding to early, middle, and late visual processing stages were based on a deep neural network. Three levels of semantic representations were based on normative Observed (“is round”), Taxonomic (“is a fruit”), and Encyclopedic features (“is sweet”). We identified brain regions where each representation type predicted later Perceptual Memory, Conceptual Memory, or both (General Memory). Participants encoded objects during fMRI, and then completed both a word-based conceptual and picture-based perceptual memory test. Visual representations predicted subsequent Perceptual Memory in visual cortices, but also facilitated Conceptual and General Memory in more anterior regions. Semantic representations, in turn, predicted Perceptual Memory in visual cortex, Conceptual Memory in the perirhinal and inferior prefrontal cortex, and General Memory in the angular gyrus. These results suggest that the contribution of visual and semantic representations to subsequent memory effects depends on a complex interaction between representation, test type, and storage location.
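
The RSA logic can be sketched briefly: build model dissimilarity matrices (RDMs) from DNN-layer features and from semantic feature norms, then test which correlates with a region's neural RDM. Everything below is toy data illustrating the comparison, not the study's analysis.

```python
# Sketch of the RSA comparison: visual (DNN-layer) and semantic
# (feature-norm) model RDMs are each correlated with a neural RDM.
# All arrays are toy stand-ins with illustrative dimensions.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n_items = 60
visual_feats = rng.normal(size=(n_items, 512))       # e.g., late DNN layer
semantic_feats = rng.integers(0, 2, (n_items, 100))  # binary feature norms
neural = rng.normal(size=(n_items, 200))             # ROI patterns per item

rdm = lambda X: pdist(X, metric="correlation")       # condensed pairwise RDM
for name, feats in [("visual", visual_feats), ("semantic", semantic_feats)]:
    rho, _ = spearmanr(rdm(feats), rdm(neural))
    print(name, round(rho, 3))
# In the study, such model fits were further related to whether each item
# was later remembered on the perceptual vs. conceptual test.
```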


2022 ◽  
Author(s):  
Akshay Vivek Jagadeesh ◽  
Justin Gardner

The human visual ability to recognize objects and scenes is widely thought to rely on representations in category-selective regions of visual cortex. These representations could support object vision by specifically representing objects, or, more simply, by representing complex visual features regardless of the particular spatial arrangement needed to constitute real world objects. That is, by representing visual textures. To discriminate between these hypotheses, we leveraged an image synthesis approach that, unlike previous methods, provides independent control over the complexity and spatial arrangement of visual features. We found that human observers could easily detect a natural object among synthetic images with similar complex features that were spatially scrambled. However, observer models built from BOLD responses from category-selective regions, as well as a model of macaque inferotemporal cortex and ImageNet-trained deep convolutional neural networks, were all unable to identify the real object. This inability was not due to insufficient signal-to-noise, as all of these observer models could predict human performance in image categorization tasks. How then might these texture-like representations in category-selective regions support object perception? An image-specific readout from category-selective cortex yielded a representation that was more selective for natural feature arrangement, showing that the information necessary for object discrimination is available. Thus, our results suggest that the role of human category-selective visual cortex is not to explicitly encode objects but rather to provide a basis set of texture-like features that can be infinitely reconfigured to flexibly learn and identify new object categories.
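
The texture-like representation at issue can be made concrete with a Gram-matrix feature in the style of Gatys et al.: it records which features co-occur while discarding their spatial arrangement, so intact and scrambled feature arrangements are indistinguishable to it. The VGG backbone and layer below are illustrative assumptions, not the paper's synthesis method.

```python
# Sketch: a Gram matrix of CNN feature maps is invariant to spatial
# rearrangement of the features, which is what makes it texture-like.
# Backbone and layer choice are illustrative assumptions.
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

net = models.vgg16(weights=models.VGG16_Weights.DEFAULT).eval()
ext = create_feature_extractor(net, return_nodes=["features.15"])

def gram(f):
    """f: (C, H*W) feature maps flattened over space -> (C, C) Gram."""
    return f @ f.T / f.shape[1]

img = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    f = ext(img)["features.15"].squeeze(0).flatten(1)   # (C, H*W)

# Scrambling the spatial arrangement leaves the Gram matrix unchanged:
perm = f[:, torch.randperm(f.shape[1])]
print(torch.allclose(gram(f), gram(perm), atol=1e-4))   # True
```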


2004 ◽  
Vol 92 (1) ◽  
pp. 660-664 ◽  
Author(s):  
Florian Ostendorf ◽  
Carsten Finke ◽  
Christoph J. Ploner

Voluntary behavior critically depends on attentional selection and short-term maintenance of perceptual information. Recent research suggests a tight coupling of both cognitive functions with visual processing being selectively enhanced by working memory representations. Here, we combined a memory-guided saccade paradigm (6-s delay) with a visual discrimination task, performed either 1,500, 2,500, or 3,500 ms after presentation of the memory cue. Contrary to what can be expected from previous studies, our results show that memory of spatial cues can transiently delay speeded discrimination of stimuli presented at remembered locations. This effect was not observed in a control experiment without memory requirements. Furthermore, delayed discrimination was dependent on the strength of actual memory representations as reflected by accuracy of memory-guided saccades. We propose an active inhibitory mechanism that counteracts facilitating effects of spatial working memory, promoting flexible orienting to novel information during maintenance of spatial memoranda for intended actions. Inhibitory delay-period activity in prefrontal cortex is a likely source for this mechanism, which may be mediated by prefronto-tectal projections.

