visual concepts
Recently Published Documents


TOTAL DOCUMENTS: 146 (FIVE YEARS: 49)
H-INDEX: 14 (FIVE YEARS: 1)

2022 ◽  
Author(s):  
Laurent Caplette ◽  
Nicholas Turk-Browne

Revealing the contents of mental representations is a longstanding goal of cognitive science. However, there is currently no general framework for providing direct access to representations of high-level visual concepts. We asked participants to indicate what they perceived in images synthesized from random visual features in a deep neural network. We then inferred a mapping between the semantic features of their responses and the visual features of the images. This allowed us to reconstruct the mental representation of virtually any common visual concept, both those reported and others extrapolated from the same semantic space. We successfully validated 270 of these reconstructions as containing the target concept in a separate group of participants. The visual-semantic mapping uncovered with our method further generalized to new stimuli, participants, and tasks. Finally, it allowed us to reveal how the representations of individual observers differ from each other and from those of neural networks.
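
The mapping step described above can be illustrated with a minimal sketch, assuming participants' reports are embedded as semantic vectors and each synthesized image is summarized by a DNN feature vector; a simple ridge regression then links the two spaces. All names, shapes, and the choice of ridge regression below are hypothetical illustrations, not the authors' exact method.

```python
# Minimal sketch of a visual-semantic mapping, assuming:
#  - image_features: DNN features of the randomly synthesized images
#  - semantic_embeddings: embeddings of what participants reported seeing
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
image_features = rng.normal(size=(500, 256))       # (n_trials, n_visual_dims), placeholder
semantic_embeddings = rng.normal(size=(500, 300))   # (n_trials, n_semantic_dims), placeholder

# Learn a linear mapping from semantic space to visual-feature space.
mapping = Ridge(alpha=1.0).fit(semantic_embeddings, image_features)

def reconstruct_visual_features(concept_embedding):
    """Predict the visual features associated with any concept embedding,
    including concepts never reported (extrapolated in the same semantic space)."""
    return mapping.predict(concept_embedding[None, :])[0]

# e.g. reconstruct a concept from a (placeholder) embedding vector
dog_features = reconstruct_visual_features(rng.normal(size=300))
```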


Author(s):  
Mohammad Mohaiminul Islam ◽  
Zahid Hassan Tushar

A convolutional neural network (CNN) is sometimes understood as a black box in the sense that, while it can approximate any function, studying its structure gives us no insight into the nature of the function being approximated. In other words, the discriminative ability does not reveal much about the latent representation of the network. This research aims to establish a framework for interpreting CNNs by profiling them in terms of interpretable visual concepts and verifying the profiles by means of Integrated Gradients. We also ask whether different input classes are related or unrelated. For instance, could there be an overlapping set of highly active neurons that identify different classes? Could a set of neurons be useful for one input class but misleading for another? Intuition answers these questions positively, implying the existence of a structured set of neurons inclined toward a particular class. Knowing this structure has significant value; it provides a principled way of identifying redundancies across classes. Here, the interpretability profiling is done by evaluating the correspondence between individual hidden neurons and a set of human-understandable visual semantic concepts. We also propose an integrated-gradients-based, class-specific relevance mapping approach that takes into account the spatial position of the region of interest in the input image. Our relevance scores verify the interpretability scores in terms of neurons tuned to a particular concept or class. Further, we perform network ablation and measure the performance of the network based on our approach.
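
As a rough illustration of the verification step, the sketch below computes integrated gradients for one input and one target class. The model, the zero baseline, and the step count are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of integrated gradients (Riemann approximation of the path integral):
# IG(x) = (x - baseline) * mean of dF/dx along the straight path from baseline to x.
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=50):
    """x is a batched input, e.g. shape (1, 3, H, W); returns per-pixel attributions."""
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Point on the straight path between baseline and the input.
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[0, target_class]          # scalar logit for the target class
        grad, = torch.autograd.grad(score, point)      # gradient of that logit w.r.t. the point
        total_grads += grad
    return (x - baseline) * total_grads / steps

# usage (hypothetical model and image):
# attributions = integrated_gradients(model, image.unsqueeze(0), target_class=3)
```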


2021 ◽  
Author(s):  
Doris Voina ◽  
Eric Shea-Brown ◽  
Stefan Mihalas

Humans and other animals navigate different landscapes and environments with ease, a feat that requires the brain to rapidly and accurately adapt to different visual domains, generalizing across contexts and backgrounds. Despite recent progress in deep learning applied to classification and detection in the presence of multiple confounds, including contextual ones, important challenges remain regarding how networks can perform context-dependent computations and how contextually invariant visual concepts are formed. For instance, recent studies have shown that artificial networks repeatedly misclassify familiar objects set on new backgrounds, e.g. incorrectly labeling known animals when they appear in a different setting. Here, we show how a bio-inspired network motif can explicitly address this issue. We do so using a novel dataset that can serve as a benchmark for future studies probing invariance to backgrounds. The dataset consists of MNIST digits of varying transparency, set on one of two backgrounds with different statistics: Gaussian noise or a more naturalistic background drawn from the CIFAR-10 dataset. We use this dataset to learn digit classification when contexts are shown sequentially, and find that both shallow and deep networks show sharply decreased performance when returning to the first background after learning the second, the catastrophic forgetting phenomenon of continual learning. To overcome this, we propose an architecture with additional "switching" units that are activated in the presence of a new background. We find that the switching network can learn the new context even with very few switching units, while maintaining performance in the previous context, but only if the switching units are recurrently connected to the network layers. When the task is difficult due to high transparency, the switching network trained on both contexts outperforms networks without switching trained on only one context. The switching mechanism leads to sparser activation patterns, and we provide intuition for why this helps to solve the task. We compare our architecture with other prominent learning methods and find that elastic weight consolidation is not successful in our setting, while progressive nets are more complex but less effective. Our study therefore shows how a bio-inspired architectural motif can contribute to task generalization across contexts.
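
As a sketch of how such a benchmark could be composited, the snippet below alpha-blends a digit onto either Gaussian noise or another image. The blending rule, value ranges, and array shapes are assumptions for illustration, not the dataset's exact recipe.

```python
# Minimal sketch: MNIST-style digits of varying transparency on different backgrounds.
import numpy as np

def composite(digit, background, alpha):
    """digit, background: arrays in [0, 1] of the same shape; alpha in [0, 1].
    Lower alpha makes the digit more transparent, i.e. the task harder."""
    return alpha * digit + (1.0 - alpha) * background

def gaussian_background(shape, rng):
    """Background with Gaussian-noise statistics, clipped to the image range."""
    return np.clip(rng.normal(loc=0.5, scale=0.2, size=shape), 0.0, 1.0)

rng = np.random.default_rng(0)
digit = rng.random((28, 28))                 # stand-in for an MNIST digit
noise_bg = gaussian_background((28, 28), rng)
example = composite(digit, noise_bg, alpha=0.4)
# A CIFAR-10 image resized to the same shape could replace noise_bg for the
# naturalistic-background context.
```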


2021 ◽  
Vol 8 (10) ◽  
pp. 545-555
Author(s):  
Margaret Ajiginni ◽  
Bakare Olayinka Olumide

Bruce Onobrakpeya's invented Ibiebe alphabet and ideograms (a writing system) have not been fully explored or redesigned as recurring motifs for embellishing contemporary fabric. They are codified graphical images that visually translate the myths, legends, ideal concepts, and philosophies of the Urhobo cultural heritage of Delta State, and they have mostly been explored in paintings and sculptural pieces for aesthetic and refinement purposes. Yet it is pertinent to encourage the integration of the creative potential of indigenous culture, as visual concepts, into contemporary works, since art is a potent medium for cultural dialogue. This paper therefore seeks to redesign the versatile and ingenious forms of Onobrakpeya's invention as recurring motifs for fabric embellishment, essentially to provoke creativity, the development of knowledge and skills, in-studio experimentation and exploration, and the creation of new design possibilities with diverse visual relationships. The aesthetic theory propounded by Alexander Gottlieb Baumgarten (1714-1762) and the modern creativity theory of Kanematsu and Barry (2016) were adopted. The approach is exploratory and descriptive, relying on information from the literature. The outcome will serve as an encyclopedia of redesigned motifs that cuts across visual history.


Author(s):  
Max Losch ◽  
Mario Fritz ◽  
Bernt Schiele

Today's deep learning systems deliver high performance based on end-to-end training but are notoriously hard to inspect. We argue that there are at least two reasons making inspectability challenging: (i) representations are distributed across hundreds of channels and (ii) a unifying metric quantifying inspectability is lacking. In this paper, we address both issues by proposing Semantic Bottlenecks (SB), which can be integrated into pretrained networks to align channel outputs with individual visual concepts, and by introducing the model-agnostic Area Under inspectability Curve (AUiC) metric to measure that alignment. We present a case study on semantic segmentation to demonstrate that SBs improve the AUiC up to six-fold over regular network outputs. We explore two types of SB-layers in this work: first, concept-supervised SB-layers (SSB), which offer inspectability with respect to predefined concepts that the model is required to rely on; and second, unsupervised SBs (USB), which offer equally strong AUiC improvements by restricting the distributedness of representations across channels. Importantly, for both SB types we can recover state-of-the-art segmentation performance across two different models despite a drastic dimensionality reduction from thousands of non-aligned channels to tens of semantics-aligned channels on which all downstream results are based.
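
A minimal sketch of the bottleneck idea is shown below: a 1x1 convolution maps many distributed channels down to a handful of channels meant to align with concepts, and a second 1x1 convolution maps back up so the pretrained downstream layers can be reused. The channel counts and the module structure are assumptions for illustration, not the paper's exact SB-layer.

```python
# Minimal sketch of a semantic-bottleneck-style layer inserted into a pretrained network.
import torch
import torch.nn as nn

class SemanticBottleneck(nn.Module):
    def __init__(self, in_channels=1024, n_concepts=20):
        super().__init__()
        # Project distributed representations down to a small set of channels,
        # each intended to align with one human-understandable concept.
        self.to_concepts = nn.Conv2d(in_channels, n_concepts, kernel_size=1)
        # Project back up so the pretrained downstream layers can be reused.
        self.from_concepts = nn.Conv2d(n_concepts, in_channels, kernel_size=1)

    def forward(self, x):
        concepts = self.to_concepts(x)           # low-dimensional, inspectable maps
        return self.from_concepts(concepts), concepts

sb = SemanticBottleneck()
features = torch.randn(1, 1024, 32, 32)          # stand-in for backbone features
restored, concept_maps = sb(features)            # concept_maps could be supervised (SSB-style)
```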


2021 ◽  
Vol 16 (1) ◽  
pp. 1-19
Author(s):  
Fenglin Liu ◽  
Xian Wu ◽  
Shen Ge ◽  
Xuancheng Ren ◽  
Wei Fan ◽  
...  

Vision-and-language (V-L) tasks require the system to understand both visual content and natural language, so learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models have been proposed to learn V-L representations and achieve improved results on many tasks. However, the mainstream models process both vision and language inputs with the same set of attention matrices; as a result, the generated V-L representations are entangled in one common latent space. To tackle this problem, we propose DiMBERT (short for Disentangled Multimodal-Attention BERT), a novel framework that applies separate attention spaces to vision and language, so that the representations of the two modalities can be disentangled explicitly. To enhance the correlation between vision and language in the disentangled spaces, we introduce visual concepts into DiMBERT, which represent visual information in textual form. In this manner, visual concepts help to bridge the gap between the two modalities. We pre-train DiMBERT on a large number of image–sentence pairs on two tasks: bidirectional language modeling and sequence-to-sequence language modeling. After pre-training, DiMBERT is further fine-tuned for the downstream tasks. Experiments show that DiMBERT sets new state-of-the-art performance on three tasks (over four datasets), including both generation tasks (image captioning and visual storytelling) and classification tasks (referring expressions). The proposed DiM (short for Disentangled Multimodal-Attention) module can be easily incorporated into existing pre-trained V-L models to boost their performance, with up to a 5% increase on the representative task. Finally, we conduct a systematic analysis and demonstrate the effectiveness of our DiM module and the introduced visual concepts.
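
The core idea of separate attention spaces can be sketched as two independent multi-head attention modules, one per modality. The dimensions, the use of plain self-attention, and the simple concatenation of the two streams below are assumptions for illustration, not DiMBERT's actual architecture.

```python
# Minimal sketch: each modality attends within its own attention space instead of
# sharing one set of attention matrices.
import torch
import torch.nn as nn

class DisentangledMultimodalAttention(nn.Module):
    def __init__(self, dim=768, n_heads=12):
        super().__init__()
        self.vision_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.language_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, vision_tokens, language_tokens):
        v, _ = self.vision_attn(vision_tokens, vision_tokens, vision_tokens)
        l, _ = self.language_attn(language_tokens, language_tokens, language_tokens)
        return torch.cat([v, l], dim=1)          # joint sequence of both modalities

dim_attn = DisentangledMultimodalAttention()
vision = torch.randn(2, 36, 768)                 # e.g. region features
language = torch.randn(2, 20, 768)               # e.g. token embeddings
joint = dim_attn(vision, language)
```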


Author(s):  
Mandana Hamidi-Haines ◽  
Zhongang Qi ◽  
Alan Paul Fern ◽  
Li Fuxin ◽  
Prasad Tadepalli

We study a user-guided approach for producing global explanations of deep networks for image recognition. The global explanations are produced with respect to a test data set and give the overall frequency of different “recognition reasons” across the data. Each reason corresponds to a small number of the most significant human-recognizable visual concepts used by the network. The key challenge is that the visual concepts cannot be predetermined and those concepts will often not correspond to existing vocabulary or have labelled data sets. We address this issue via an interactive-naming interface, which allows users to freely cluster significant image regions in the data into visually similar concepts. Our main contribution is a user study on two visual recognition tasks. The results show that the participants were able to produce a small number of visual concepts sufficient for explanation and that there was significant agreement among the concepts, and hence global explanations, produced by different participants.


2021 ◽  
Author(s):  
Soravit Changpinyo ◽  
Piyush Sharma ◽  
Nan Ding ◽  
Radu Soricut
Keyword(s):  

2021 ◽  
Author(s):  
Yunhao Ge ◽  
Yao Xiao ◽  
Zhi Xu ◽  
Meng Zheng ◽  
Srikrishna Karanam ◽  
...  
