Robust Image Labeling Using Conditional Random Fields

2021 ◽  
Author(s):  
Maryam Nematollahi Arani

Object recognition has become a central topic in computer vision applications such as image search, robotics, and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems, and new content analysis models need to be developed to bridge it. In this thesis, algorithms based on conditional random fields (CRF), a class of probabilistic graphical models, are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a specific semantic category from a predefined set of object classes to each pixel in the image. By capturing the spatial interactions of visual concepts well, CRF modeling has proved to be a successful tool for image labeling. This thesis proposes novel approaches to strengthening CRF modeling for robust image labeling. Our primary contributions are twofold. To better represent the feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMM) are designed and their efficacy is investigated. Thanks to its shape parameter, a GGMM can properly fit the multi-modal and skewed distributions of data in natural images. The new model proves more successful than Gaussian and Laplacian mixture models, and it also outperforms a deep neural network model on the Corel image set by 1% in accuracy. Further in this thesis, we apply scene-level contextual information to integrate the global visual semantics of the image with the pixel-wise dense inference of a fully-connected CRF, both to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications by the unary classifier.
The proposed inference algorithm factorizes the joint probability of the labeling configuration and the image scene type to obtain prediction update equations for labeling individual image pixels as well as the overall scene type of the image. The proposed context-based dense CRF model outperforms the conventional dense CRF model in labeling accuracy by about 2% on the MSRC image set and by 4% on the SIFT Flow image set. The proposed model also obtains the highest scene classification rate, 86%, on the MSRC dataset.
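For readers unfamiliar with generalized Gaussian mixtures, the density family the thesis builds its feature functions on can be sketched as follows. This is an illustrative one-dimensional sketch, not the thesis code; parameter names are my own. The shape parameter beta recovers the Gaussian at beta = 2 and the Laplacian at beta = 1, which is exactly why the GGMM can subsume both baselines mentioned above.

```python
import numpy as np
from scipy.special import gamma

def ggd_pdf(x, mu, alpha, beta):
    """Generalized Gaussian density with location mu, scale alpha,
    and shape beta (beta=2 -> Gaussian, beta=1 -> Laplacian)."""
    coef = beta / (2.0 * alpha * gamma(1.0 / beta))
    return coef * np.exp(-(np.abs(x - mu) / alpha) ** beta)

def ggmm_pdf(x, weights, mus, alphas, betas):
    """Mixture of generalized Gaussians: a weighted sum of components,
    each free to be peaky (small beta) or flat (large beta)."""
    return sum(w * ggd_pdf(x, m, a, b)
               for w, m, a, b in zip(weights, mus, alphas, betas))
```

In a CRF, the negative log of such a mixture evaluated on a pixel's feature vector would serve as (part of) the unary potential; fitting the mixture parameters is typically done with EM, which the abstract does not detail.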


Author(s):  
Xin-Jing Wang ◽  
Mo Yu ◽  
Lei Zhang ◽  
Wei-Ying Ma

In this chapter, we introduce the Argo system, which provides intelligent advertising based on user-generated photos. Building on the intuition that user-generated photos imply user interests, which are key for profitable targeted ads, Argo attempts to learn a user's profile from his shared photos and suggests relevant ads accordingly. To learn user interests, in an offline step, a hierarchical and efficient topic space is constructed based on the ODP ontology; it is used later on for bridging the vocabulary gap between ads and photos as well as reducing the effect of noisy photo tags. In the online stage, Argo proceeds in three steps: 1) understanding the content and semantics of a user's photos and auto-tagging each photo to supplement user-submitted tags (such tags may not be available); 2) learning the user interest from a set of photos based on the learnt hierarchical topic space; and 3) representing ads in the topic space and matching their topic distributions with the target user interest; the top-ranked ads are output as the suggested ads. Two key challenges are tackled in the process: 1) the semantic gap between low-level image visual features and high-level user semantics; and 2) the vocabulary impedance between photos and ads. We conducted a series of experiments with real Flickr users and Amazon.com products (as candidate ads), which show the effectiveness of the proposed approach.
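Step 3 above, matching ads to a user interest via topic distributions, can be sketched minimally. The chapter's actual similarity measure is not stated in this summary; cosine similarity is a common stand-in assumption, and the function and variable names here are illustrative.

```python
import numpy as np

def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def rank_ads(user_topics, ad_topics):
    """Rank candidate ads by how closely their topic distributions
    match the user's interest distribution (most similar first)."""
    scores = {ad: cosine(user_topics, dist) for ad, dist in ad_topics.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With a photography-heavy user profile, an ad whose topic mass sits in the same region of the topic space would rank above an unrelated one; the hierarchical structure of the ODP-based space is what lets photos and ads with disjoint vocabularies still land near each other.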


2016 ◽  
Vol 2 (1) ◽  
pp. 475-478
Author(s):  
Nico Hoffmann ◽  
Edmund Koch ◽  
Uwe Petersohn ◽  
Matthias Kirsch ◽  
Gerald Steiner

Abstract
Intraoperative thermal neuroimaging is a novel intraoperative imaging technique for characterizing perfusion disorders, neural activity, and other pathological changes of the brain. It is based on the correlation of (sub-)cortical metabolism and perfusion with the heat emitted by the cortical surface. To minimize the required computational resources and prevent unwanted artefacts in subsequent data analysis workflows, foreground detection is an important preprocessing technique for differentiating pixels representing the cerebral cortex from background objects. We propose an efficient classification framework that integrates characteristic dynamic thermal behaviour into this classification task to provide additional discriminative features. The first stage of our framework learns a representation of characteristic thermal time-frequency behaviour. This representation models latent interconnections in the time-frequency domain that cover specific, yet a priori unknown, thermal properties of the cortex. In a second stage, these features are used to classify each pixel's state with conditional random fields. We quantitatively evaluate several approaches to learning high-level features and their impact on overall prediction accuracy. The introduction of high-level features leads to a significant accuracy improvement compared to a baseline classifier.


2019 ◽  
Vol 168 ◽  
pp. 100-108 ◽  
Author(s):  
Jose-Raul Ruiz-Sarmiento ◽  
Cipriano Galindo ◽  
Javier Monroy ◽  
Francisco-Angel Moreno ◽  
Javier Gonzalez-Jimenez

2019 ◽  
Author(s):  
Michael B. Bone ◽  
Fahad Ahmad ◽  
Bradley R. Buchsbaum

Abstract
When recalling an experience of the past, many of the component features of the original episode may be, to a greater or lesser extent, reconstructed in the mind's eye. There is strong evidence that the pattern of neural activity that occurred during an initial perceptual experience is recreated during episodic recall (neural reactivation), and that the degree of reactivation is correlated with the subjective vividness of the memory. However, while we know that reactivation occurs during episodic recall, we have lacked a way of precisely characterizing the contents—in terms of their featural constituents—of a reactivated memory. Here we present a novel approach, feature-specific informational connectivity (FSIC), that leverages hierarchical representations of image stimuli derived from a deep convolutional neural network to decode neural reactivation in fMRI data collected while participants performed an episodic recall task. We show that neural reactivation associated with low-level visual features (e.g. edges), high-level visual features (e.g. facial features), and semantic features (e.g. "terrier") occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level visual features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the relative contributions of low- and high-level features to the vividness of visual memories, clarify the role of the frontal cortex during episodic recall, and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.
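The core informational-connectivity idea behind FSIC can be illustrated in miniature: correlate the trial-by-trial decoding evidence for a given feature between two brain regions, so that shared fluctuations indicate shared feature-specific reactivation. This sketch only conveys the general principle; the full FSIC method, with its DNN-derived feature hierarchy and fMRI specifics, is not reproduced here, and the function name is the abstract's term applied to my simplified version.

```python
import numpy as np

def informational_connectivity(evidence_roi1, evidence_roi2):
    """Pearson correlation of trial-wise feature-decoding evidence
    between two regions of interest; a high value suggests the two
    regions reactivate the same feature content together."""
    return float(np.corrcoef(evidence_roi1, evidence_roi2)[0, 1])
```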


2019 ◽  
Author(s):  
Remington Mallett ◽  
Anurima Mummaneni ◽  
Jarrod Lewis-Peacock

Working memory persists in the face of distraction, yet not without consequence. Previous research has shown that memory for low-level visual features is systematically influenced by the maintenance or presentation of a similar distractor stimulus. Responses are frequently biased in stimulus space towards a perceptual distractor, though this has yet to be determined for high-level stimuli. We investigated whether these influences are shared for complex visual stimuli such as faces. To quantify response accuracies for these stimuli, we used a delayed-estimation task with a computer-generated “face space” consisting of eighty faces that varied continuously as a function of age and sex. In a set of three experiments, we found that responses for a target face held in working memory were biased towards a distractor face presented during the maintenance period. The amount of response bias did not vary as a function of distance between target and distractor. Our data suggest that, similar to low-level visual features, high-level face representations in working memory are biased by the processing of related but task-irrelevant information.
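The response-bias measure described above, how far responses are pulled toward the distractor in stimulus space, can be quantified with a simple projection. This is a generic sketch of such a metric, not the authors' analysis code; the 2D coordinates stand in for the age/sex axes of the face space, and all names are illustrative.

```python
import numpy as np

def distractor_bias(targets, distractors, responses):
    """Mean signed response error projected onto the unit vector from
    each target to its distractor, in stimulus-space coordinates.
    Positive values mean responses are biased toward the distractor."""
    t, d, r = (np.asarray(a, float) for a in (targets, distractors, responses))
    direction = d - t
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    return float(np.mean(np.sum((r - t) * direction, axis=1)))
```

A significantly positive value over many trials would correspond to the attractive bias the experiments report; the finding that the bias does not scale with target-distractor distance is a separate analysis not captured by this single summary number.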


2019 ◽  
Author(s):  
Kathryn E Schertz ◽  
Omid Kardan ◽  
Marc Berman

It has recently been shown that the perception of visual features of the environment can influence thought content. Both low-level (e.g., fractalness) and high-level (e.g., presence of water) visual features of the environment can influence thought content, in real-world and experimental settings where these features can make people more reflective and contemplative in their thoughts. It remains to be seen, however, whether these visual features retain their influence on thoughts in the absence of overt semantic content, which would indicate a more fundamental mechanism for this effect. In this study, we removed this limitation by creating scrambled-edge versions of images, which maintain the edge content of the original images but prevent scene identification. Non-straight edge density is one visual feature that has been shown to influence many judgements about objects and landscapes, and it has also been associated with thoughts of spirituality. We extend previous findings by showing that non-straight edges retain their influence on the selection of a "Spiritual & Life Journey" topic after scene identification is removed. These results strengthen the case for a causal role of low-level visual feature perception in higher-order cognitive function, by demonstrating that in the absence of overt semantic content, low-level features such as edges influence cognitive processes.

