visual concepts
Recently Published Documents


TOTAL DOCUMENTS: 146 (FIVE YEARS: 49)
H-INDEX: 14 (FIVE YEARS: 1)

2022 ◽  
Author(s):  
Laurent Caplette ◽  
Nicholas Turk-Browne

Revealing the contents of mental representations is a longstanding goal of cognitive science. However, there is currently no general framework for providing direct access to representations of high-level visual concepts. We asked participants to indicate what they perceived in images synthesized from random visual features in a deep neural network. We then inferred a mapping between the semantic features of their responses and the visual features of the images. This allowed us to reconstruct the mental representation of virtually any common visual concept, both those reported and others extrapolated from the same semantic space. We successfully validated 270 of these reconstructions as containing the target concept in a separate group of participants. The visual-semantic mapping uncovered with our method further generalized to new stimuli, participants, and tasks. Finally, it allowed us to reveal how the representations of individual observers differ from each other and from those of neural networks.
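
The mapping step described above can be illustrated with a minimal sketch, assuming participants' reports are embedded as semantic vectors and each synthesized image is summarized by a DNN feature vector; a simple ridge regression then links the two spaces. All names, shapes, and the choice of ridge regression below are hypothetical illustrations, not the authors' exact method.

```python
# Minimal sketch of a visual-semantic mapping, assuming:
#  - image_features: DNN features of the randomly synthesized images
#  - semantic_embeddings: embeddings of what participants reported seeing
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
image_features = rng.normal(size=(500, 256))       # (n_trials, n_visual_dims), placeholder
semantic_embeddings = rng.normal(size=(500, 300))   # (n_trials, n_semantic_dims), placeholder

# Learn a linear mapping from semantic space to visual-feature space.
mapping = Ridge(alpha=1.0).fit(semantic_embeddings, image_features)

def reconstruct_visual_features(concept_embedding):
    """Predict the visual features associated with any concept embedding,
    including concepts never reported (extrapolated in the same semantic space)."""
    return mapping.predict(concept_embedding[None, :])[0]

# e.g. reconstruct a concept from a (placeholder) embedding vector
dog_features = reconstruct_visual_features(rng.normal(size=300))
```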


Author(s):  
Mohammad Mohaiminul Islam ◽  
Zahid Hassan Tushar

A convolutional neural network (CNN) is sometimes understood as a black box in the sense that, while it can approximate any function, studying its structure gives us no insight into the nature of the function being approximated. In other words, the discriminative ability does not reveal much about the latent representation of the network. This research aims to establish a framework for interpreting CNNs by profiling them in terms of interpretable visual concepts and verifying the profiles by means of Integrated Gradients. We also ask whether different input classes are related or unrelated. For instance, could there be an overlapping set of highly active neurons that identify different classes? Could a set of neurons be useful for one input class but misleading for another? Intuition answers these questions positively, implying the existence of a structured set of neurons inclined toward a particular class. Knowing this structure has significant value; it provides a principled way of identifying redundancies across classes. Here, the interpretability profiling is done by evaluating the correspondence between individual hidden neurons and a set of human-understandable visual semantic concepts. We also propose an integrated-gradients-based, class-specific relevance mapping approach that takes into account the spatial position of the region of interest in the input image. Our relevance scores verify the interpretability scores in terms of neurons tuned to a particular concept or class. Further, we perform network ablation and measure the performance of the network based on our approach.
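
As a rough illustration of the verification step, the sketch below computes integrated gradients for one input and one target class. The model, the zero baseline, and the step count are assumptions for illustration, not the authors' exact implementation.

```python
# Minimal sketch of integrated gradients (Riemann approximation of the path integral):
# IG(x) = (x - baseline) * mean of dF/dx along the straight path from baseline to x.
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=50):
    """x is a batched input, e.g. shape (1, 3, H, W); returns per-pixel attributions."""
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grads = torch.zeros_like(x)
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Point on the straight path between baseline and the input.
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        score = model(point)[0, target_class]          # scalar logit for the target class
        grad, = torch.autograd.grad(score, point)      # gradient of that logit w.r.t. the point
        total_grads += grad
    return (x - baseline) * total_grads / steps

# usage (hypothetical model and image):
# attributions = integrated_gradients(model, image.unsqueeze(0), target_class=3)
```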


2021 ◽  
Author(s):  
Doris Voina ◽  
Eric Shea-Brown ◽  
Stefan Mihalas

Humans and other animals navigate different landscapes and environments with ease, a feat that requires the brain to rapidly and accurately adapt to different visual domains, generalizing across contexts and backgrounds. Despite recent progress in deep learning applied to classification and detection in the presence of multiple confounds, including contextual ones, important challenges remain regarding how networks can perform context-dependent computations and how contextually invariant visual concepts are formed. For instance, recent studies have shown that artificial networks repeatedly misclassify familiar objects set on new backgrounds, e.g. incorrectly labeling known animals when they appear in a different setting. Here, we show how a bio-inspired network motif can explicitly address this issue. We do so using a novel dataset that can serve as a benchmark for future studies probing invariance to backgrounds. The dataset consists of MNIST digits of varying transparency, set on one of two backgrounds with different statistics: Gaussian noise or a more naturalistic background drawn from the CIFAR-10 dataset. We use this dataset to learn digit classification when contexts are shown sequentially, and find that both shallow and deep networks show sharply decreased performance when returning to the first background after learning the second, the catastrophic forgetting phenomenon of continual learning. To overcome this, we propose an architecture with additional "switching" units that are activated in the presence of a new background. We find that the switching network can learn the new context even with very few switching units, while maintaining performance in the previous context, but only if the switching units are recurrently connected to the network layers. When the task is difficult due to high transparency, the switching network trained on both contexts outperforms networks without switching trained on only one context. The switching mechanism leads to sparser activation patterns, and we provide intuition for why this helps to solve the task. We compare our architecture with other prominent learning methods and find that elastic weight consolidation is not successful in our setting, while progressive nets are more complex but less effective. Our study therefore shows how a bio-inspired architectural motif can contribute to task generalization across contexts.
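
As a sketch of how such a benchmark could be composited, the snippet below alpha-blends a digit onto either Gaussian noise or another image. The blending rule, value ranges, and array shapes are assumptions for illustration, not the dataset's exact recipe.

```python
# Minimal sketch: MNIST-style digits of varying transparency on different backgrounds.
import numpy as np

def composite(digit, background, alpha):
    """digit, background: arrays in [0, 1] of the same shape; alpha in [0, 1].
    Lower alpha makes the digit more transparent, i.e. the task harder."""
    return alpha * digit + (1.0 - alpha) * background

def gaussian_background(shape, rng):
    """Background with Gaussian-noise statistics, clipped to the image range."""
    return np.clip(rng.normal(loc=0.5, scale=0.2, size=shape), 0.0, 1.0)

rng = np.random.default_rng(0)
digit = rng.random((28, 28))                 # stand-in for an MNIST digit
noise_bg = gaussian_background((28, 28), rng)
example = composite(digit, noise_bg, alpha=0.4)
# A CIFAR-10 image resized to the same shape could replace noise_bg for the
# naturalistic-background context.
```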


2021 ◽  
Vol 8 (10) ◽  
pp. 545-555
Author(s):  
Margaret Ajiginni ◽  
Bakare Olayinka Olumide

Bruce Onobrakpeya's invented Ibiebe alphabet and ideograms (a writing system) have not been fully explored or redesigned as recurring motifs for embellishing contemporary fabric. They are codified graphical images that visually translate the myths, legends, ideal concepts, and philosophies of the Urhobo cultural heritage of Delta State, and they have mostly been explored in paintings and sculptural pieces for aesthetic and refinement purposes. Yet it is pertinent to encourage the integration of the creative potential of indigenous culture, as visual concepts, into contemporary works, since art is a potent medium for cultural dialogue. This paper therefore seeks to redesign the versatile and ingenious forms of Onobrakpeya's invention as recurring motifs for fabric embellishment, essentially to provoke creativity, the development of knowledge and skills, in-studio experimentation and exploration, and the creation of new design possibilities with diverse visual relationships. The aesthetic theory propounded by Alexander Gottlieb Baumgarten (1714-1762) and the modern creativity theory of Kanematsu and Barry (2016) were adopted. The approach is exploratory and descriptive, relying on information from the literature. The outcome will serve as an encyclopedia of redesigned motifs that cuts across visual history.


Author(s):  
Max Losch ◽  
Mario Fritz ◽  
Bernt Schiele

Today's deep learning systems deliver high performance based on end-to-end training but are notoriously hard to inspect. We argue that there are at least two reasons making inspectability challenging: (i) representations are distributed across hundreds of channels and (ii) a unifying metric quantifying inspectability is lacking. In this paper, we address both issues by proposing Semantic Bottlenecks (SB), which can be integrated into pretrained networks to align channel outputs with individual visual concepts, and by introducing the model-agnostic Area Under inspectability Curve (AUiC) metric to measure that alignment. We present a case study on semantic segmentation to demonstrate that SBs improve the AUiC up to six-fold over regular network outputs. We explore two types of SB-layers in this work: first, concept-supervised SB-layers (SSB), which offer inspectability with respect to predefined concepts that the model is required to rely on; and second, unsupervised SBs (USB), which offer equally strong AUiC improvements by restricting the distributedness of representations across channels. Importantly, for both SB types we can recover state-of-the-art segmentation performance across two different models despite a drastic dimensionality reduction from thousands of non-aligned channels to tens of semantics-aligned channels on which all downstream results are based.
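
A minimal sketch of the bottleneck idea is shown below: a 1x1 convolution maps many distributed channels down to a handful of channels meant to align with concepts, and a second 1x1 convolution maps back up so the pretrained downstream layers can be reused. The channel counts and the module structure are assumptions for illustration, not the paper's exact SB-layer.

```python
# Minimal sketch of a semantic-bottleneck-style layer inserted into a pretrained network.
import torch
import torch.nn as nn

class SemanticBottleneck(nn.Module):
    def __init__(self, in_channels=1024, n_concepts=20):
        super().__init__()
        # Project distributed representations down to a small set of channels,
        # each intended to align with one human-understandable concept.
        self.to_concepts = nn.Conv2d(in_channels, n_concepts, kernel_size=1)
        # Project back up so the pretrained downstream layers can be reused.
        self.from_concepts = nn.Conv2d(n_concepts, in_channels, kernel_size=1)

    def forward(self, x):
        concepts = self.to_concepts(x)           # low-dimensional, inspectable maps
        return self.from_concepts(concepts), concepts

sb = SemanticBottleneck()
features = torch.randn(1, 1024, 32, 32)          # stand-in for backbone features
restored, concept_maps = sb(features)            # concept_maps could be supervised (SSB-style)
```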


2021 ◽  
Vol 16 (1) ◽  
pp. 1-19
Author(s):  
Fenglin Liu ◽  
Xian Wu ◽  
Shen Ge ◽  
Xuancheng Ren ◽  
Wei Fan ◽  
...  

Vision-and-language (V-L) tasks require the system to understand both visual content and natural language, so learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models have been proposed to learn V-L representations and achieve improved results on many tasks. However, the mainstream models process both vision and language inputs with the same set of attention matrices; as a result, the generated V-L representations are entangled in one common latent space. To tackle this problem, we propose DiMBERT (short for Disentangled Multimodal-Attention BERT), a novel framework that applies separate attention spaces to vision and language, so that the representations of the two modalities can be disentangled explicitly. To enhance the correlation between vision and language in the disentangled spaces, we introduce visual concepts into DiMBERT, which represent visual information in textual form. In this manner, visual concepts help to bridge the gap between the two modalities. We pre-train DiMBERT on a large number of image–sentence pairs on two tasks: bidirectional language modeling and sequence-to-sequence language modeling. After pre-training, DiMBERT is further fine-tuned for the downstream tasks. Experiments show that DiMBERT sets new state-of-the-art performance on three tasks (over four datasets), including both generation tasks (image captioning and visual storytelling) and classification tasks (referring expressions). The proposed DiM (short for Disentangled Multimodal-Attention) module can be easily incorporated into existing pre-trained V-L models to boost their performance, with up to a 5% increase on the representative task. Finally, we conduct a systematic analysis and demonstrate the effectiveness of our DiM module and the introduced visual concepts.
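
The core idea of separate attention spaces can be sketched as two independent multi-head attention modules, one per modality. The dimensions, the use of plain self-attention, and the simple concatenation of the two streams below are assumptions for illustration, not DiMBERT's actual architecture.

```python
# Minimal sketch: each modality attends within its own attention space instead of
# sharing one set of attention matrices.
import torch
import torch.nn as nn

class DisentangledMultimodalAttention(nn.Module):
    def __init__(self, dim=768, n_heads=12):
        super().__init__()
        self.vision_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.language_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, vision_tokens, language_tokens):
        v, _ = self.vision_attn(vision_tokens, vision_tokens, vision_tokens)
        l, _ = self.language_attn(language_tokens, language_tokens, language_tokens)
        return torch.cat([v, l], dim=1)          # joint sequence of both modalities

dim_attn = DisentangledMultimodalAttention()
vision = torch.randn(2, 36, 768)                 # e.g. region features
language = torch.randn(2, 20, 768)               # e.g. token embeddings
joint = dim_attn(vision, language)
```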


Author(s):  
Mandana Hamidi-Haines ◽  
Zhongang Qi ◽  
Alan Paul Fern ◽  
Li Fuxin ◽  
Prasad Tadepalli

We study a user-guided approach for producing global explanations of deep networks for image recognition. The global explanations are produced with respect to a test data set and give the overall frequency of different “recognition reasons” across the data. Each reason corresponds to a small number of the most significant human-recognizable visual concepts used by the network. The key challenge is that the visual concepts cannot be predetermined and those concepts will often not correspond to existing vocabulary or have labelled data sets. We address this issue via an interactive-naming interface, which allows users to freely cluster significant image regions in the data into visually similar concepts. Our main contribution is a user study on two visual recognition tasks. The results show that the participants were able to produce a small number of visual concepts sufficient for explanation and that there was significant agreement among the concepts, and hence global explanations, produced by different participants.


2021 ◽  
Author(s):  
Soravit Changpinyo ◽  
Piyush Sharma ◽  
Nan Ding ◽  
Radu Soricut
Keyword(s):  

2021 ◽  
Author(s):  
Yunhao Ge ◽  
Yao Xiao ◽  
Zhi Xu ◽  
Meng Zheng ◽  
Srikrishna Karanam ◽  
...  
