Computational reconstruction of mental representations using human behavior

2022
Author(s): Laurent Caplette, Nicholas Turk-Browne

Revealing the contents of mental representations is a longstanding goal of cognitive science. However, there is currently no general framework for providing direct access to representations of high-level visual concepts. We asked participants to indicate what they perceived in images synthesized from random visual features in a deep neural network. We then inferred a mapping between the semantic features of their responses and the visual features of the images. This allowed us to reconstruct the mental representation of virtually any common visual concept, both those reported and others extrapolated from the same semantic space. We successfully validated 270 of these reconstructions as containing the target concept in a separate group of participants. The visual-semantic mapping uncovered with our method further generalized to new stimuli, participants, and tasks. Finally, it allowed us to reveal how the representations of individual observers differ from each other and from those of neural networks.
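A minimal sketch of the kind of visual-semantic mapping described above, not the authors' actual pipeline: regress word-embedding features of participants' responses onto the DNN features of the synthesized images, then reconstruct an unseen concept's visual representation from its semantic embedding alone. All array shapes and data here are placeholders.

```python
# Hedged sketch (assumed shapes and random data, not the study's code).
import numpy as np
from sklearn.linear_model import Ridge

n_trials, n_visual, n_semantic = 5000, 512, 300

# Hypothetical inputs: DNN feature vectors of the synthesized images and
# word-embedding vectors of the labels participants reported for each image.
visual_feats = np.random.randn(n_trials, n_visual)
semantic_feats = np.random.randn(n_trials, n_semantic)

# Map semantic features of responses onto visual features of the images.
mapping = Ridge(alpha=1.0).fit(semantic_feats, visual_feats)

# Reconstruct the visual representation of a concept that was never reported,
# by pushing its semantic embedding through the learned mapping.
novel_concept_embedding = np.random.randn(1, n_semantic)  # e.g. "lighthouse"
reconstructed_visual = mapping.predict(novel_concept_embedding)
print(reconstructed_visual.shape)  # (1, n_visual)
```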

2019
Author(s): Michael B. Bone, Fahad Ahmad, Bradley R. Buchsbaum

When recalling an experience of the past, many of the component features of the original episode may be, to a greater or lesser extent, reconstructed in the mind's eye. There is strong evidence that the pattern of neural activity that occurred during an initial perceptual experience is recreated during episodic recall (neural reactivation), and that the degree of reactivation is correlated with the subjective vividness of the memory. However, while we know that reactivation occurs during episodic recall, we have lacked a way of precisely characterizing the contents of a reactivated memory in terms of its featural constituents. Here we present a novel approach, feature-specific informational connectivity (FSIC), that leverages hierarchical representations of image stimuli derived from a deep convolutional neural network to decode neural reactivation in fMRI data collected while participants performed an episodic recall task. We show that neural reactivation associated with low-level visual features (e.g. edges), high-level visual features (e.g. facial features), and semantic features (e.g. "terrier") occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level visual features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the relative contributions of low- and high-level features to the vividness of visual memories, clarify the role of the frontal cortex during episodic recall, and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.
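FSIC itself measures feature-specific informational connectivity between regions; the sketch below only illustrates the simpler underlying intuition of feature-specific reactivation: fit an encoding model from DNN layer features to voxel patterns during perception, then ask how well the feature-predicted patterns match patterns measured during recall. Every array here is a random placeholder, and the layer names are assumptions.

```python
# Illustrative sketch only, not the FSIC implementation.
import numpy as np
from sklearn.linear_model import Ridge

n_stim, n_voxels = 100, 2000
low_level = np.random.randn(n_stim, 256)    # e.g. early conv-layer features
high_level = np.random.randn(n_stim, 256)   # e.g. late fc-layer features
perception = np.random.randn(n_stim, n_voxels)
recall = np.random.randn(n_stim, n_voxels)

def reactivation_score(features, perceived, recalled):
    """Fit a feature-to-voxel encoding model on perception, score it on recall."""
    model = Ridge(alpha=10.0).fit(features, perceived)
    predicted = model.predict(features)
    # Mean correlation between predicted and recalled patterns per stimulus.
    corrs = [np.corrcoef(p, r)[0, 1] for p, r in zip(predicted, recalled)]
    return float(np.mean(corrs))

print("low-level reactivation:", reactivation_score(low_level, perception, recall))
print("high-level reactivation:", reactivation_score(high_level, perception, recall))
```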


Author(s): Xinxun Xu, Muli Yang, Yanhua Yang, Hao Wang

Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a specific cross-modal retrieval task for searching natural images given free-hand sketches under the zero-shot scenario. Most existing methods solve this problem by simultaneously projecting visual features and semantic supervision into a low-dimensional common space for efficient retrieval. However, such low-dimensional projection destroys the completeness of semantic knowledge in the original semantic space, so useful knowledge cannot be transferred well when learning semantic features from different modalities. Moreover, domain information and semantic information are entangled in the visual features, which is not conducive to cross-modal matching because it hinders reduction of the domain gap between sketches and images. In this paper, we propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR. Specifically, with the supervision of the original semantic knowledge, PDFD decomposes visual features into domain features and semantic features, and the semantic features are then projected into the common space as retrieval features for ZS-SBIR. This progressive projection strategy maintains strong semantic supervision. Besides, to guarantee that the retrieval features capture clean and complete semantic information, a cross-reconstruction loss is introduced to encourage any combination of retrieval features and domain features to reconstruct the visual features. Extensive experiments demonstrate the superiority of our PDFD over state-of-the-art competitors.
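A hedged sketch of a cross-reconstruction objective in the spirit described above; the encoder and decoder modules, dimensions, and pairings below are assumptions for illustration, not the PDFD architecture. The idea is that semantic (retrieval) features and domain features, recombined in any pairing, should reconstruct the corresponding visual features.

```python
# Minimal sketch of a cross-reconstruction loss (assumed modules, not PDFD's code).
import torch
import torch.nn as nn

dim_visual, dim_sem, dim_dom = 512, 256, 256

sem_enc = nn.Linear(dim_visual, dim_sem)   # semantic / retrieval branch
dom_enc = nn.Linear(dim_visual, dim_dom)   # domain branch (sketch vs. image)
decoder = nn.Linear(dim_sem + dim_dom, dim_visual)

def cross_reconstruction_loss(v_sketch, v_image):
    s_sk, d_sk = sem_enc(v_sketch), dom_enc(v_sketch)
    s_im, d_im = sem_enc(v_image), dom_enc(v_image)
    mse = nn.functional.mse_loss
    # Same-domain and cross-domain recombinations must all reconstruct
    # the visual features of the matching domain.
    return (mse(decoder(torch.cat([s_sk, d_sk], -1)), v_sketch) +
            mse(decoder(torch.cat([s_im, d_im], -1)), v_image) +
            mse(decoder(torch.cat([s_im, d_sk], -1)), v_sketch) +
            mse(decoder(torch.cat([s_sk, d_im], -1)), v_image))

loss = cross_reconstruction_loss(torch.randn(8, dim_visual),
                                 torch.randn(8, dim_visual))
print(loss.item())
```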


Sensors, 2020, Vol. 21 (1), pp. 95
Author(s): Qiang Yu, Xinyu Xiao, Chunxia Zhang, Lifei Song, Chunhong Pan

Recently, image attributes containing high-level semantic information have been widely used in computer vision tasks, including visual recognition and image captioning. Existing attribute extraction methods map visual concepts to the probabilities of frequently-used words by directly using Convolutional Neural Networks (CNNs). Typically, two main problems exist in those methods. First, words of different parts of speech (POSs) are handled in the same way, but non-nominal words can hardly be mapped to visual regions through CNNs alone. Second, synonymous nominal words are treated as independent and different words, so that their similarities are ignored. In this paper, a novel Refined Universal Detection (RUDet) method is proposed to solve these two problems. Specifically, a Refinement (RF) module is designed to extract refined attributes of non-nominal words based on the attributes of nominal words and visual features. In addition, a Word Tree (WT) module is constructed to integrate synonymous nouns, which ensures that similar words hold similar and more accurate probabilities. Moreover, a Feature Enhancement (FE) module is adopted to enhance the ability to mine different visual concepts at different scales. Experiments conducted on the large-scale Microsoft (MS) COCO dataset illustrate the effectiveness of our proposed method.
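A minimal sketch of the synonym-integration idea behind a word-tree module; the vocabulary, the synonym grouping, and the max-pooling rule below are illustrative assumptions, not the RUDet implementation, which builds its tree from lexical resources and may aggregate differently.

```python
# Hedged sketch: pool predicted probabilities over synonym groups so that
# synonymous nouns share a single, more reliable attribute score.
import numpy as np

vocab = ["dog", "puppy", "canine", "car", "automobile", "clock"]
# Hypothetical synonym grouping for illustration only.
synonym_groups = {"dog": ["dog", "puppy", "canine"],
                  "car": ["car", "automobile"],
                  "clock": ["clock"]}

word_probs = dict(zip(vocab, np.random.rand(len(vocab))))

def merged_attribute_scores(word_probs, synonym_groups):
    """Give each group the max probability of its members, shared by all of them."""
    merged = {}
    for head, members in synonym_groups.items():
        score = max(word_probs[w] for w in members)
        for w in members:
            merged[w] = score
    return merged

print(merged_attribute_scores(word_probs, synonym_groups))
```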


2021
Author(s): Bria Long, Judith Fan, Renata Chai, Michael C. Frank

To what extent do visual concepts of dogs, cars, and clocks change across childhood? We hypothesized that as children progressively learn which features best distinguish visual concepts from one another, they also improve their ability to connect this knowledge with external representations. To examine this possibility, we investigated developmental changes in children's ability to produce and recognize drawings of common object categories. First, we recruited children aged 2-10 years to produce drawings of 48 categories via a free-standing kiosk in a children's museum, and we measured how recognizable these >37K drawings were using a deep convolutional neural network model of object recognition. Second, we recruited other children across the same age range to identify the drawn category in a subset of these drawings via "guessing games" at the same kiosk. We found consistent developmental gains both in children's ability to include diagnostic visual features in their drawings and in their ability to use these features when recognizing other children's drawings. Our results suggest that children's ability to connect internal and external representations of visual concepts improves gradually across childhood and imply that developmental trajectories of visual concept learning may be more protracted than previously thought.
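An illustrative sketch of scoring drawing recognizability with a pretrained object-recognition CNN; the choice of ResNet-50 with ImageNet weights, the file path, and the class index are assumptions, not the study's actual model or category set.

```python
# Hedged sketch: probability a pretrained CNN assigns to a drawing's intended category.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

def recognizability(image_path: str, target_class: int) -> float:
    """Return the softmax probability of the intended category for one drawing."""
    img = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(img).unsqueeze(0))
    return torch.softmax(logits, dim=1)[0, target_class].item()

# Example usage with placeholder path and class index:
# score = recognizability("drawings/dog_age4.png", target_class=207)
```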


Author(s): Rhong Zhao, William I. Grosky

The emergence of multimedia technology and the rapidly expanding image and video collections on the Internet have attracted significant research efforts in providing tools for effective retrieval and management of visual data. Image retrieval is based on the availability of a representation scheme of image content. Image content descriptors may be visual features such as color, texture, shape, and spatial relationships, or semantic primitives. Conventional information retrieval was based solely on text, and those approaches to textual information retrieval have been transplanted into image retrieval in a variety of ways. However, "a picture is worth a thousand words." Image content is much more versatile than text, and the amount of visual data is already enormous and still expanding very rapidly. To cope with these special characteristics of visual data, content-based image retrieval methods have been introduced. It has been widely recognized that the family of image retrieval techniques should become an integration of both low-level visual features, addressing the more detailed perceptual aspects, and high-level semantic features, underlying the more general conceptual aspects of visual data. Neither type of feature alone is sufficient to retrieve or manage visual data effectively or efficiently (Smeulders et al., 2000). Although efforts have been devoted to combining these two aspects of visual data, the gap between them remains a major barrier for researchers. Intuitive and heuristic approaches do not provide satisfactory performance. Therefore, there is an urgent need to find the latent correlation between low-level features and high-level concepts and to merge them from a different perspective. How to find this new perspective and bridge the gap between visual features and semantic features has been a major challenge in this research field. Our chapter addresses these issues.
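A minimal sketch of the low-level side of such a system, under the stated assumption that a global color histogram is the only descriptor (real CBIR systems combine color with texture, shape, and semantic features): rank database images by histogram similarity to a query.

```python
# Hedged sketch of color-histogram CBIR; paths below are placeholders.
import numpy as np
from PIL import Image

def color_histogram(path: str, bins: int = 8) -> np.ndarray:
    """Normalized joint RGB histogram as a global color descriptor."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return (hist / hist.sum()).ravel()

def rank_by_similarity(query_path, database_paths):
    q = color_histogram(query_path)
    # Histogram intersection: larger means more similar.
    sims = [np.minimum(q, color_histogram(p)).sum() for p in database_paths]
    return sorted(zip(database_paths, sims), key=lambda x: -x[1])

# results = rank_by_similarity("query.jpg", ["img1.jpg", "img2.jpg"])
```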


2011, Vol. 268-270, pp. 1427-1432
Author(s): Chang Yong Ri, Min Yao

This paper presents the key problems in shortening the "semantic gap" between low-level visual features and high-level semantic features in order to achieve high-level semantic image retrieval. It first introduces ontology-based semantic image description and machine-learning-based semantic extraction methods. It then illustrates image grammar for high-level semantic image understanding and retrieval, including and-or graph and context-based methods for semantic images. Finally, it discusses development directions and research emphases in this field.


2020
Author(s): Joshua S. Rule, Maximilian Riesenhuber

Humans quickly learn new visual concepts from sparse data, sometimes just a single example. Decades of prior work have established the hierarchical organization of the ventral visual stream as key to this ability. Computational work has shown that networks which hierarchically pool afferents across scales and positions can achieve human-like object recognition performance and predict human neural activity. Prior computational work has also reused previously acquired features to efficiently learn novel recognition tasks. These approaches, however, require orders of magnitude more examples than human learners and only reuse intermediate features at the object level or below. None has attempted to reuse extremely high-level visual features capturing entire visual concepts. We used a benchmark deep learning model of object recognition to show that leveraging prior learning at the concept level leads to vastly improved abilities to learn from few examples. These results suggest computational techniques for learning even more efficiently as well as neuroscientific experiments to better understand how the brain learns from sparse data. Most importantly, however, the model architecture provides a biologically plausible way to learn new visual concepts from a small number of examples, and makes several novel predictions regarding the neural bases of concept representations in the brain.

Author summary: We are motivated by the observation that people regularly learn new visual concepts from as little as one or two examples, far better than, e.g., current machine vision architectures. To understand the human visual system's superior visual concept learning abilities, we used an approach inspired by computational models of object recognition which: 1) use deep neural networks to achieve human-like performance and predict human brain activity; and 2) reuse previous learning to efficiently master new visual concepts. These models, however, require many times more examples than human learners and, critically, reuse only low-level and intermediate information. None has attempted to reuse extremely high-level visual features (i.e., entire visual concepts). We used a neural network model of object recognition to show that reusing concept-level features leads to vastly improved abilities to learn from few examples. Our findings suggest techniques for future software models that could learn even more efficiently, as well as neuroscience experiments to better understand how people learn so quickly. Most importantly, however, our model provides a biologically plausible way to learn new visual concepts from a small number of examples.
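A hedged sketch of the concept-level reuse idea, not the paper's model: represent each image by the activations of previously learned concept detectors and train a classifier for a novel concept from only a few such examples. The dimensions, the synthetic data, and the logistic-regression readout are all assumptions for illustration.

```python
# Minimal few-shot sketch built on assumed concept-level activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

n_old_concepts = 1000   # activations of previously learned concept units
few_shot_n = 5          # examples of the novel concept

# Hypothetical concept-level activations for a handful of positive and
# negative examples of the new concept.
pos = np.random.randn(few_shot_n, n_old_concepts) + 0.5
neg = np.random.randn(few_shot_n, n_old_concepts)

X = np.vstack([pos, neg])
y = np.array([1] * few_shot_n + [0] * few_shot_n)

# Learn the new concept as a readout over existing concept representations.
new_concept_clf = LogisticRegression(max_iter=1000).fit(X, y)
print(new_concept_clf.predict(np.random.randn(3, n_old_concepts) + 0.5))
```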


Author(s): Silvester Tena, Rudy Hartanto, Igi Ardiyanto

In recent years, a great deal of research has been conducted in the area of fabric image retrieval, especially the identification and classification of visual features. One of the challenges associated with the domain of content-based image retrieval (CBIR) is the semantic gap between low-level visual features and high-level human perceptions. Generally, CBIR includes two main components, namely feature extraction and similarity measurement. Therefore, this research examines content-based image retrieval for fabric using feature extraction techniques grouped into traditional methods and convolutional neural networks (CNNs). Traditional descriptors deal with low-level features, while CNNs address high-level, so-called semantic features. Traditional descriptors have the advantage of shorter computation time and reduced system requirements. Meanwhile, CNN descriptors, which handle high-level features tailored to human perception, deal with large amounts of data and require a great deal of computation time. In general, the features of a CNN's fully connected layers are used for matching query and database images. In several studies, the extracted features of the CNN's convolutional layers were used for image retrieval. At the end of the CNN layers, hash codes are added to reduce search time.
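A hedged sketch of the retrieval scheme mentioned at the end of this abstract: match query and database images on CNN fully connected features and use binary hash codes to cut search time. The random features, the random-projection (LSH-style) hashing, and all dimensions are illustrative assumptions rather than any of the surveyed systems.

```python
# Minimal sketch: hash CNN features and rank by Hamming distance.
import numpy as np

dim_fc, n_bits, n_db = 4096, 64, 10000

# Hypothetical fully connected features; in practice these come from a CNN.
db_feats = np.random.randn(n_db, dim_fc)
query_feat = np.random.randn(dim_fc)

projection = np.random.randn(dim_fc, n_bits)  # LSH-style random hyperplanes

def hash_code(feats):
    """Binarize features by the sign of their random projections."""
    return (feats @ projection > 0).astype(np.uint8)

db_codes = hash_code(db_feats)
query_code = hash_code(query_feat[None, :])

# Rank database images by Hamming distance to the query's hash code.
hamming = (db_codes != query_code).sum(axis=1)
top10 = np.argsort(hamming)[:10]
print(top10)
```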

