Simple object recognition based on spatial relations and visual features represented using irregular pyramids

2011 ◽  
Vol 63 (3) ◽  
pp. 875-897 ◽  
Author(s):  
Annette Morales-González ◽  
Edel B. García-Reyes

2021 ◽  
Author(s):  
Maryam Nematollahi Arani

Object recognition has become a central topic in computer vision applications such as image search, robotics, and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems, and new content analysis models need to be developed to bridge it. In this thesis, algorithms based on conditional random fields (CRFs), from the class of probabilistic graphical models, are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a specific semantic category from a predefined set of object classes to each pixel in the image. By effectively capturing spatial interactions of visual concepts, CRF modeling has proved to be a successful tool for image labeling. This thesis proposes novel approaches to strengthening CRF modeling for robust image labeling. Our primary contributions are twofold. First, to better represent the feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMMs) are designed and their efficacy is investigated. Thanks to its shape parameter, a GGMM can properly fit the multi-modal and skewed distributions of data found in natural images. The new model proves more successful than Gaussian and Laplacian mixture models, and it also outperforms a deep neural network model on the Corel image set by 1% in accuracy. Second, we apply scene-level contextual information to integrate the global visual semantics of the image with the pixel-wise dense inference of a fully connected CRF, both to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications by the unary classifier.
The proposed inference algorithm factorizes the joint probability of the labeling configuration and the image scene type to obtain prediction update equations for labeling individual image pixels as well as the overall scene type of the image. The proposed context-based dense CRF model outperforms the conventional dense CRF model by about 2% in labeling accuracy on the MSRC image set and by 4% on the SIFT Flow image set. The proposed model also obtains the highest scene classification rate, 86%, on the MSRC dataset.
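The generalized Gaussian density underlying the GGMM feature functions can be written down directly: its shape parameter interpolates between Laplacian (β = 1) and Gaussian (β = 2) behavior, which is what lets the mixture fit heavy-tailed and skewed feature distributions. The sketch below is illustrative only — the parameter names and the mixture form are standard textbook definitions, not the thesis's actual CRF potentials:

```python
import numpy as np
from math import gamma

def gen_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Generalized Gaussian density.

    beta = 2 gives a Gaussian shape, beta = 1 a Laplacian shape;
    alpha is the scale, mu the location.
    """
    coef = beta / (2.0 * alpha * gamma(1.0 / beta))
    return coef * np.exp(-(np.abs(x - mu) / alpha) ** beta)

def ggmm_pdf(x, weights, mus, alphas, betas):
    """Mixture density: sum_k w_k * GG(x; mu_k, alpha_k, beta_k)."""
    return sum(w * gen_gaussian_pdf(x, m, a, b)
               for w, m, a, b in zip(weights, mus, alphas, betas))
```

With β fixed at 2 and α = √2·σ this reduces exactly to the normal density, which is a convenient sanity check when fitting the shape parameter.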
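The idea of letting a scene-type posterior modulate per-pixel class predictions can be sketched in a few lines. Note this is only a loose illustration of scene-context weighting under assumed array shapes — the function and variable names are hypothetical and the thesis's factorized update equations are not reproduced here:

```python
import numpy as np

def context_weighted_unaries(pixel_probs, scene_probs, class_given_scene):
    """Reweight per-pixel class posteriors by scene-level context.

    pixel_probs:       (N, C) per-pixel class posteriors from the unary classifier
    scene_probs:       (S,)   posterior over scene types for the whole image
    class_given_scene: (S, C) compatibility of each class with each scene type
    Returns (N, C) renormalized class posteriors.
    """
    # Marginal class prior implied by the scene posterior: (C,)
    context = scene_probs @ class_given_scene
    # Modulate each pixel's distribution by the scene-level prior
    weighted = pixel_probs * context[None, :]
    return weighted / weighted.sum(axis=1, keepdims=True)
```

Weighting of this kind is one simple way a confident scene estimate can rescue pixels whose unary classifier output is initially wrong, which is the robustness effect the abstract describes.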


2018 ◽  
Vol 18 (10) ◽  
pp. 414 ◽  
Author(s):  
Drew Linsley ◽  
Dan Shiebler ◽  
Sven Eberhardt ◽  
Andreas Karagounis ◽  
Thomas Serre

2016 ◽  
Vol 205 ◽  
pp. 382-392 ◽  
Author(s):  
Saeed Reza Kheradpisheh ◽  
Mohammad Ganjtabesh ◽  
Timothée Masquelier

Perception ◽  
1993 ◽  
Vol 22 (11) ◽  
pp. 1261-1270 ◽  
Author(s):  
John Duncan

Performance often suffers when two visual discriminations must be made concurrently (‘divided attention’). In the modular primate visual system, different cortical areas analyse different kinds of visual information. Especially important is a distinction between an occipitoparietal ‘where?’ system, analysing spatial relations, and an occipitotemporal ‘what?’ system responsible for object recognition. Though such visual subsystems are anatomically parallel, their functional relationship when ‘what?’ and ‘where?’ discriminations are made concurrently is unknown. In the present experiments, human subjects made concurrent discriminations concerning a brief visual display. Discriminations were either similar (two ‘what?’ or two ‘where?’ discriminations) or dissimilar (one of each), and concerned the same or different objects. When discriminations concerned different objects, there was strong interference between them. This was equally severe whether discriminations were similar—and therefore dependent on the same cortical system—or dissimilar. When concurrent ‘what?’ and ‘where?’ discriminations concerned the same object, however, all interference disappeared. Such results suggest that ‘what?’ and ‘where?’ systems are coordinated in visual attention: their separate outputs can be used simultaneously without cost, but only when they concern one object.

