Argo

Author(s):  
Xin-Jing Wang ◽  
Mo Yu ◽  
Lei Zhang ◽  
Wei-Ying Ma

In this chapter, we introduce the Argo system, which provides intelligent advertising driven by user-generated photos. Based on the intuition that user-generated photos reflect user interests, which are key to profitable targeted ads, Argo attempts to learn a user's profile from his or her shared photos and to suggest relevant ads accordingly. To learn user interests, an offline step constructs a hierarchical and efficient topic space based on the ODP ontology; this space is later used to bridge the vocabulary gap between ads and photos and to reduce the effect of noisy photo tags. In the online stage, Argo proceeds in three steps: 1) understanding the content and semantics of a user's photos and auto-tagging each photo to supplement user-submitted tags (which may not be available); 2) learning the user's interests from a set of photos based on the learned hierarchical topic space; and 3) representing ads in the topic space and matching their topic distributions against the target user's interests, outputting the top-ranked ads as suggestions. Two key challenges are tackled in the process: 1) the semantic gap between low-level image visual features and high-level user semantics; and 2) the vocabulary impedance between photos and ads. We conducted a series of experiments on real Flickr users and Amazon.com products (as candidate ads), which show the effectiveness of the proposed approach.
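As an illustration of the final matching step, the sketch below ranks candidate ads by the cosine similarity between each ad's topic distribution and the user-interest distribution, assuming the distributions over the hierarchical topic space have already been computed. The function names and the choice of cosine similarity are illustrative, not taken from the chapter.

```python
import numpy as np

def rank_ads(user_topics, ad_topics, top_k=5):
    """Rank candidate ads by cosine similarity between each ad's topic
    distribution and the user-interest distribution.

    user_topics: (T,) user interest over T hierarchical topics
    ad_topics:   (N, T) one topic distribution per candidate ad
    Returns (indices, scores) of the top_k best-matching ads.
    """
    user = user_topics / np.linalg.norm(user_topics)
    ads = ad_topics / np.linalg.norm(ad_topics, axis=1, keepdims=True)
    scores = ads @ user                       # cosine similarity per ad
    order = np.argsort(scores)[::-1][:top_k]  # highest scores first
    return order, scores[order]
```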

2021 ◽  
Author(s):  
Maryam Nematollahi Arani

Object recognition has become a central topic in computer vision applications such as image search, robotics, and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems, and new content analysis models are needed to bridge it. In this thesis, algorithms based on conditional random fields (CRFs), a class of probabilistic graphical models, are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a semantic category from a predefined set of object classes to each pixel in the image. By capturing spatial interactions among visual concepts, CRF modeling has proved a successful tool for image labeling. This thesis proposes novel approaches to strengthening CRF modeling for robust image labeling; our primary contributions are twofold. First, to better represent the feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMMs) are designed and their efficacy is investigated. Thanks to its shape parameter, a GGMM can properly fit the multi-modal and skewed feature distributions found in natural images. The new model proves more successful than Gaussian and Laplacian mixture models, and it also outperforms a deep neural network model on the Corel image set by 1% in accuracy. Second, we apply scene-level contextual information to integrate the global visual semantics of the image with the pixel-wise dense inference of a fully connected CRF, both to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications by the unary classifier. The proposed inference algorithm factorizes the joint probability of the labeling configuration and the image scene type to obtain prediction-update equations for labeling individual pixels and for inferring the overall scene type. The proposed context-based dense CRF model outperforms the conventional dense CRF model by about 2% in labeling accuracy on the MSRC image set and by 4% on the SIFT Flow image set, and it obtains the highest scene classification rate of 86% on the MSRC dataset.
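For concreteness, a minimal sketch of the generalized Gaussian density underlying the GGMM potentials is given below, using the standard parameterization with scale α and shape β (β = 1 gives a Laplacian, β = 2 a Gaussian up to a rescaling of the scale). The function names are illustrative and the thesis's exact parameterization may differ.

```python
import numpy as np
from scipy.special import gamma

def gen_gaussian_pdf(x, mu, alpha, beta):
    """Generalized Gaussian density with location mu, scale alpha > 0 and
    shape beta > 0: beta = 1 gives a Laplacian, beta = 2 a Gaussian;
    other shapes fit the skewed, heavy-tailed statistics of natural images."""
    coef = beta / (2.0 * alpha * gamma(1.0 / beta))
    return coef * np.exp(-(np.abs(x - mu) / alpha) ** beta)

def ggmm_pdf(x, weights, mus, alphas, betas):
    """Mixture of generalized Gaussians: sum_k w_k * GG(x; mu_k, alpha_k, beta_k)."""
    return sum(w * gen_gaussian_pdf(x, m, a, b)
               for w, m, a, b in zip(weights, mus, alphas, betas))
```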


2013 ◽  
Vol 380-384 ◽  
pp. 1959-1962
Author(s):  
Dong Liu ◽  
Quan Yuan Wu

Nowadays, more and more people use microblogs to share information, so mining the behavioral features of microblog users is valuable. In this paper, we propose a user-interest mining framework. After data pre-processing, the vector space model (VSM) is used to generate a feature vector for each tweet set. Then k-bit binary codes, called interest hash-values and continuous interest hash-values, are generated using the SimHash algorithm. User interests and their change patterns can be mined by analyzing the sequence of Hamming distances between adjacent hash-values. Using the Sina microblog service as the data source, a series of experiments demonstrates the effectiveness of the algorithms.
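A minimal sketch of the SimHash fingerprinting and Hamming-distance comparison described above, assuming tweets have already been reduced to weighted token vectors (e.g., by the VSM step); the choice of hash function and bit width here is illustrative.

```python
import hashlib

def simhash(weighted_tokens, bits=64):
    """SimHash fingerprint: every weighted token votes on each bit; the
    sign of the accumulated vote fixes that bit of the k-bit code."""
    votes = [0] * bits
    for token, weight in weighted_tokens.items():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

# Adjacent time windows with similar tweet content yield small Hamming
# distances; a spike in the distance sequence signals an interest shift.
h1 = simhash({"camera": 2.0, "lens": 1.5, "photo": 3.0})
h2 = simhash({"camera": 2.0, "lens": 1.0, "travel": 2.5})
print(hamming(h1, h2))
```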


2019 ◽  
Author(s):  
Michael B. Bone ◽  
Fahad Ahmad ◽  
Bradley R. Buchsbaum

When recalling an experience of the past, many of the component features of the original episode may be, to a greater or lesser extent, reconstructed in the mind's eye. There is strong evidence that the pattern of neural activity that occurred during an initial perceptual experience is recreated during episodic recall (neural reactivation), and that the degree of reactivation is correlated with the subjective vividness of the memory. However, while we know that reactivation occurs during episodic recall, we have lacked a way of precisely characterizing the contents of a reactivated memory in terms of its featural constituents. Here we present a novel approach, feature-specific informational connectivity (FSIC), which leverages hierarchical representations of image stimuli derived from a deep convolutional neural network to decode neural reactivation in fMRI data collected while participants performed an episodic recall task. We show that neural reactivation associated with low-level visual features (e.g., edges), high-level visual features (e.g., facial features), and semantic features (e.g., "terrier") occurs throughout the dorsal and ventral visual streams and extends into the frontal cortex. Moreover, we show that reactivation of both low- and high-level visual features correlates with the vividness of the memory, whereas only reactivation of low-level features correlates with recognition accuracy when the lure and target images are semantically similar. In addition to demonstrating the utility of FSIC for mapping feature-specific reactivation, these findings resolve the relative contributions of low- and high-level features to the vividness of visual memories, clarify the role of the frontal cortex during episodic recall, and challenge a strict interpretation of the posterior-to-anterior visual hierarchy.
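FSIC itself is more elaborate than can be shown here, but the sketch below illustrates the general encoding-model logic of feature-specific reactivation decoding: learn a voxel-to-CNN-feature mapping during perception, predict features from recall-phase fMRI, and correlate the predictions with the recalled image's true features. The names and the choice of ridge regression are assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def zscore_rows(a):
    """Standardize each row (trial) across features."""
    return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)

def reactivation_scores(X_perc, F_perc, X_recall, F_targets, alpha=1.0):
    """Per-trial reactivation evidence for one feature level:
    1) learn a voxel -> CNN-feature mapping from perception trials,
    2) predict features from recall-phase fMRI patterns,
    3) correlate predictions with the recalled image's true features.

    X_perc:    (n_perception_trials, n_voxels) fMRI patterns
    F_perc:    (n_perception_trials, n_features) CNN-layer features
    X_recall:  (n_recall_trials, n_voxels)
    F_targets: (n_recall_trials, n_features)
    Returns one Pearson correlation per recall trial.
    """
    F_hat = Ridge(alpha=alpha).fit(X_perc, F_perc).predict(X_recall)
    return (zscore_rows(F_hat) * zscore_rows(F_targets)).mean(axis=1)
```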


2019 ◽  
Author(s):  
Remington Mallett ◽  
Anurima Mummaneni ◽  
Jarrod Lewis-Peacock

Working memory persists in the face of distraction, yet not without consequence. Previous research has shown that memory for low-level visual features is systematically influenced by the maintenance or presentation of a similar distractor stimulus. Responses are frequently biased in stimulus space towards a perceptual distractor, though this has yet to be determined for high-level stimuli. We investigated whether these influences are shared for complex visual stimuli such as faces. To quantify response accuracies for these stimuli, we used a delayed-estimation task with a computer-generated “face space” consisting of eighty faces that varied continuously as a function of age and sex. In a set of three experiments, we found that responses for a target face held in working memory were biased towards a distractor face presented during the maintenance period. The amount of response bias did not vary as a function of distance between target and distractor. Our data suggest that, similar to low-level visual features, high-level face representations in working memory are biased by the processing of related but task-irrelevant information.
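One simple way to quantify the reported pull toward the distractor in a two-dimensional face space (age × sex) is to project the response error onto the target-to-distractor axis; the sketch below is an illustrative measure, not necessarily the authors' exact analysis.

```python
import numpy as np

def bias_toward_distractor(target, response, distractor):
    """Signed projection of the response error onto the unit vector from
    target to distractor in (age, sex) face space; positive values mean
    the response was pulled toward the distractor."""
    target, response, distractor = map(np.asarray, (target, response, distractor))
    axis = distractor - target
    axis = axis / np.linalg.norm(axis)
    return float(np.dot(response - target, axis))

# Example: target at (30, 0.2), distractor older and more masculine.
print(bias_toward_distractor((30.0, 0.2), (33.0, 0.3), (50.0, 0.8)))  # > 0
```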


PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0258832
Author(s):  
Jonathan C. Flavell ◽  
Harriet Over ◽  
Tim Vestner ◽  
Richard Cook ◽  
Steven P. Tipper

Using visual search displays of interacting and non-interacting pairs, it has been demonstrated that the detection of social interactions is facilitated: for example, two people facing each other are found faster than two people with their backs turned, an effect that may reflect social binding. However, recent work has shown the same effects with non-social arrow stimuli, where toward-facing arrows are detected faster than away-facing arrows, suggesting that a primary mechanism is an attention-orienting process driven by basic low-level direction cues. Evidence for lower-level attentional processes does not, however, preclude an additional role for higher-level social processes. In this series of experiments we therefore test this idea further by directly comparing basic visual features that orient attention with representations of socially interacting individuals. The results confirm the potency of attention orienting via low-level visual features in the detection of interacting objects; in contrast, there is little evidence that the representation of social interactions influences initial search performance.


2019 ◽  
Author(s):  
Kathryn E Schertz ◽  
Omid Kardan ◽  
Marc Berman

It has recently been shown that the perception of visual features of the environment can influence thought content. Both low-level (e.g., fractalness) and high-level (e.g., the presence of water) visual features can influence thought content in real-world and experimental settings, where these features can make people more reflective and contemplative in their thoughts. It remains to be seen, however, whether these visual features retain their influence on thoughts in the absence of overt semantic content, which would indicate a more fundamental mechanism for this effect. In this study we removed this limitation by creating scrambled-edge versions of images, which retain the edge content of the original images but prevent scene identification. Non-straight edge density is one visual feature that has been shown to influence many judgements about objects and landscapes, and it has also been associated with thoughts of spirituality. We extend previous findings by showing that non-straight edges retain their influence on the selection of a "Spiritual & Life Journey" topic after scene identification is removed. These results strengthen the case for a causal role of low-level visual feature perception in higher-order cognitive function by demonstrating that, in the absence of overt semantic content, low-level features such as edges influence cognitive processes.
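As a rough illustration of the non-straight edge density feature, the sketch below counts edge pixels that do not lie on detected straight segments; the Canny and Hough parameters are arbitrary placeholders, and the original study's feature extraction may differ.

```python
import cv2
import numpy as np

def non_straight_edge_density(path, canny_lo=100, canny_hi=200):
    """Crude estimate: fraction of image pixels that are edge pixels
    not lying on any straight segment found by a Hough transform."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, canny_lo, canny_hi)
    straight = np.zeros_like(edges)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                            minLineLength=30, maxLineGap=3)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(straight, (x1, y1), (x2, y2), 255, 1)
    non_straight = (edges > 0) & (straight == 0)
    return non_straight.sum() / edges.size
```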


2021 ◽  
Author(s):  
Rui Zhang

This thesis focuses on combining information at different levels of a statistical pattern classification framework for image annotation and retrieval. Previous work on image annotation and retrieval has established that low-level visual features, such as color and texture, and high-level features, such as textual descriptions and context, are distinct yet complementary in their distributions and in their discriminative power for machine-based recognition and retrieval tasks. Effective feature combination is therefore a promising route to further bridging the semantic gap. Motivated by this, we tackle both the combination of the visual and context modalities and the combination of different features within the visual domain, developing two statistical pattern classification approaches; the rationale is that features within the visual modality and features across modalities exhibit different degrees of heterogeneity and should be treated differently. For cross-modality combination, a Bayesian framework is proposed to integrate visual content and context, and it is applied to various image annotation and retrieval frameworks. For combining low-level features within the visual domain, a novel method combines texture and color features via a mixture model of their joint distribution. The proposed frameworks are evaluated on several datasets: the COREL database for image retrieval, and MSRC, LabelMe, PASCAL VOC2009, and an animal image database collected by ourselves for image annotation. Under various evaluation criteria, the first framework proves more effective than methods based purely on low-level features or on high-level context alone. For the second, the experimental results demonstrate not only superior performance over other feature combination methods but also the ability to discover visual clusters using texture and color simultaneously. Moreover, a demo search engine based on the Bayesian framework has been implemented and is available online.
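One common instantiation of such a Bayesian cross-modality integration is naive-Bayes-style fusion under the assumption that the visual and context evidence are conditionally independent given the class; the sketch below shows that form, which may differ from the thesis's exact framework.

```python
import numpy as np

def fuse_posterior(p_visual, p_context, prior):
    """P(c | v, t) ∝ P(v | c) * P(t | c) * P(c), assuming visual (v) and
    context (t) evidence are conditionally independent given class c.
    Inputs are per-class likelihoods (or calibrated scores) and a prior."""
    joint = np.asarray(p_visual) * np.asarray(p_context) * np.asarray(prior)
    return joint / joint.sum()

# Example: visual evidence favors class 0, context favors class 1;
# the fused posterior trades the two sources off against each other.
print(fuse_posterior([0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [1/3, 1/3, 1/3]))
```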

