Toward an Effective Combination of Multiple Visual Features for Semantic Image Annotation

2015 ◽  
Vol 15 (3) ◽  
pp. 533
Author(s):  
B. Minaoui ◽  
M. Oujaoura ◽  
M. Fakir ◽  
M. Sajieddine

In this paper we study the problem of combining low-level visual features for semantic image annotation. The problem is tackled with two different approaches that combine texture, color and shape features via a Bayesian network classifier. In the first approach, vector concatenation is applied to combine the three low-level visual features: all three descriptors are normalized and merged into a single vector used with a single classifier. In the second approach, the three types of visual features are combined in a parallel scheme via three classifiers, with each type of descriptor used separately with its own classifier. The experimental results show that semantic image annotation accuracy is higher when the second approach is used.
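The two combination schemes described in this abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions: a small Gaussian naive Bayes class (`TinyGaussianNB`, hypothetical) stands in for the paper's Bayesian network classifier, the three feature blocks are synthetic, and averaging the per-classifier posteriors is an assumed combination rule, since the abstract does not say how the three classifier outputs are merged.

```python
import numpy as np


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes; a stand-in, not the paper's Bayesian network."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-6
        self.log_prior_ = np.log(np.array([(y == c).mean() for c in self.classes_]))
        return self

    def predict_proba(self, X):
        # Per-class Gaussian log-likelihood summed over dimensions, plus log prior.
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None]).sum(axis=2)
        joint = log_lik + self.log_prior_
        return np.exp(joint - np.logaddexp.reduce(joint, axis=1, keepdims=True))


def zscore(train, test):
    """Normalize both splits with statistics estimated on the training split."""
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-9
    return (train - mu) / sd, (test - mu) / sd


def early_fusion_predict(train_feats, y_train, test_feats):
    """First approach: normalize each descriptor, concatenate, use one classifier."""
    pairs = [zscore(tr, te) for tr, te in zip(train_feats, test_feats)]
    X_train = np.hstack([p[0] for p in pairs])
    X_test = np.hstack([p[1] for p in pairs])
    clf = TinyGaussianNB().fit(X_train, y_train)
    return clf.classes_[clf.predict_proba(X_test).argmax(axis=1)]


def late_fusion_predict(train_feats, y_train, test_feats):
    """Second approach: one classifier per descriptor, posteriors averaged (assumed rule)."""
    probas = []
    for tr, te in zip(train_feats, test_feats):
        tr_n, te_n = zscore(tr, te)
        probas.append(TinyGaussianNB().fit(tr_n, y_train).predict_proba(te_n))
    return np.unique(y_train)[np.mean(probas, axis=0).argmax(axis=1)]


# Synthetic stand-ins for texture, color and shape descriptors (not real image data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 40)
feats = [rng.normal(size=(y.size, d)) + s * y[:, None] for d, s in [(4, 1.5), (3, 1.0), (5, 2.0)]]
idx = rng.permutation(y.size)
tr, te = idx[:90], idx[90:]

early_pred = early_fusion_predict([f[tr] for f in feats], y[tr], [f[te] for f in feats])
late_pred = late_fusion_predict([f[tr] for f in feats], y[tr], [f[te] for f in feats])
early_acc = float((early_pred == y[te]).mean())
late_acc = float((late_pred == y[te]).mean())
```

On toy data like this the two schemes behave almost identically; the sketch shows only the structural difference between them and says nothing about the accuracy gap reported in the paper.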

2010 ◽  
Vol 22 (8) ◽  
pp. 1412-1420 ◽  
Author(s):  
Zhixin Li ◽  
Zhiping Shi ◽  
Xi Liu ◽  
Zhongzhi Shi

2021 ◽  
Author(s):  
Rui Zhang

This thesis is primarily focused on information combination at different levels of a statistical pattern classification framework for image annotation and retrieval. Based on previous studies in the fields of image annotation and retrieval, it is well recognized that low-level visual features, such as color and texture, and high-level features, such as textual description and context, are distinct yet complementary in terms of their distributions and their discriminative power in machine-based recognition and retrieval tasks. Therefore, effective feature combination for image annotation and retrieval has become a desirable and promising perspective from which the semantic gap can be further bridged. Motivated by this fact, the combination of the visual and context modalities and that of different features in the visual domain are tackled by developing two statistical pattern classification approaches, considering that the features of the visual modality and those across different modalities exhibit different degrees of heterogeneity and thus should be treated differently. Regarding the cross-modality feature combination, a Bayesian framework is proposed to integrate visual content and context, which has been applied to various image annotation and retrieval frameworks. In terms of the combination of different low-level features in the visual domain, the problem is tackled with a novel method that combines texture and color features via a mixture model of their joint distribution. To evaluate the proposed frameworks, many different datasets are employed in the experiments, including the COREL database for image retrieval and the MSRC, LabelMe, PASCAL VOC2009, and an animal image database collected by ourselves for image annotation. Using various evaluation criteria, the first framework is shown to be more effective than methods based purely on the low-level features or on high-level context. As for the second, the experimental results demonstrate not only its superior performance to other feature combination methods but also its ability to discover visual clusters using texture and color simultaneously. Moreover, a demo search engine based on the Bayesian framework is implemented and available online.
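One common way to integrate visual content and context in a Bayesian setting is to multiply a per-label visual likelihood, a per-label context likelihood and a prior, then normalize. The sketch below illustrates that generic rule only. It assumes the two modalities are conditionally independent given the label, which the abstract does not state, and the function name, labels and probability values are all hypothetical.

```python
import numpy as np


def fuse_content_and_context(log_lik_visual, log_lik_context, log_prior):
    """Posterior over labels, assuming visual and context evidence are
    conditionally independent given the label (an assumption of this sketch)."""
    log_post = log_lik_visual + log_lik_context + log_prior
    log_post = log_post - np.logaddexp.reduce(log_post)  # normalize in log space
    return np.exp(log_post)


# Hypothetical labels and likelihood values, for illustration only.
labels = ["cow", "sheep", "car"]
visual = np.log([0.5, 0.4, 0.1])    # visual evidence slightly favors "cow"
context = np.log([0.2, 0.7, 0.1])   # context evidence favors "sheep"
prior = np.log([1 / 3, 1 / 3, 1 / 3])

posterior = fuse_content_and_context(visual, context, prior)
best = labels[int(posterior.argmax())]
```

Here the context evidence overrides the weak visual preference, which is the kind of complementarity between modalities that the abstract describes.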


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Jing Zhang ◽  
Da Li ◽  
Weiwei Hu ◽  
Zhihua Chen ◽  
Yubo Yuan

Due to the semantic gap between visual features and semantic concepts, automatic image annotation has recently become a difficult issue in computer vision. In this paper we propose a new multilabel image annotation method based on double-layer probabilistic latent semantic analysis (PLSA). The new double-layer PLSA model is constructed to bridge the low-level visual features and high-level semantic concepts of images for effective image understanding. The low-level features of images are represented as visual words by the Bag-of-Words model; latent semantic topics are obtained by the first-layer PLSA from two aspects, visual and texture, respectively. Furthermore, we adopt the second-layer PLSA to fuse the visual and texture latent semantic topics and obtain a top-layer latent semantic topic. Through the double-layer PLSA, the relationships between visual features and semantic concepts of images are established, and we can predict the labels of new images from their low-level features. Experimental results demonstrate that our automatic image annotation model based on double-layer PLSA achieves promising labeling performance and outperforms previous methods on the standard Corel dataset.
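The layered idea in this abstract can be illustrated with a small NumPy sketch of PLSA fitted by EM. The way the two layers are connected here (stacking the first-layer topic proportions and treating them as non-negative pseudo-counts for a second PLSA) is an assumption made for illustration, not necessarily the authors' construction. The function `plsa`, the count matrices and the topic counts are all hypothetical.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (documents x words) non-negative matrix.
    Returns P(w|z) with shape (topics, words) and P(z|d) with shape (docs, topics)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) P(w|z); shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both distributions from count-weighted responsibilities.
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d


# Synthetic bag-of-words counts for two feature types (not real image data).
rng = np.random.default_rng(1)
visual_counts = rng.integers(1, 10, size=(20, 30)).astype(float)
texture_counts = rng.integers(1, 10, size=(20, 25)).astype(float)

# First layer: one PLSA per feature type.
_, visual_topics = plsa(visual_counts, n_topics=4)
_, texture_topics = plsa(texture_counts, n_topics=4)

# Second layer (assumed fusion): stack first-layer topic proportions and fit PLSA again.
stacked = np.hstack([visual_topics, texture_topics])
_, top_topics = plsa(stacked, n_topics=3)
```

Each row of `top_topics` is a distribution over top-layer topics for one image. Mapping those topics to annotation labels is outside the scope of this sketch.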


2011 ◽  
Vol 32 (3) ◽  
pp. 516-523 ◽  
Author(s):  
Zhixin Li ◽  
Zhiping Shi ◽  
Xi Liu ◽  
Zhongzhi Shi

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yunjun Nam ◽  
Takayuki Sato ◽  
Go Uchida ◽  
Ekaterina Malakhova ◽  
Shimon Ullman ◽  
...  

Humans recognize individual faces regardless of variation in the facial view. The view-tuned face neurons in the inferior temporal (IT) cortex are regarded as the neural substrate for view-invariant face recognition. This study approximated visual features encoded by these neurons as combinations of local orientations and colors, originating from natural image fragments. The resultant features reproduced the preference of these neurons for particular facial views. We also found that faces of one identity were separable from the faces of other identities in a space where each axis represented one of these features. These results suggested that view-invariant face representation was established by combining view-sensitive visual features. The face representation with these features suggested that, with respect to view-invariant face representation, the seemingly complex and deeply layered ventral visual pathway can be approximated via a shallow network, comprising layers of low-level processing for local orientations and colors (V1/V2-level) and layers which detect particular sets of low-level elements derived from natural image fragments (IT-level).


2021 ◽  
Author(s):  
Maryam Nematollahi Arani

Object recognition has become a central topic in computer vision applications such as image search, robotics and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems. New content analysis models need to be developed to bridge the semantic gap. In this thesis, algorithms based on conditional random fields (CRF), from the class of probabilistic graphical models, are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a specific semantic category from a predefined set of object classes to each pixel in the image. By capturing spatial interactions of visual concepts well, CRF modeling has proved to be a successful tool for image labeling. This thesis proposes novel approaches to empowering CRF modeling for robust image labeling. Our primary contributions are twofold. To better represent feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMM) are designed and their efficacy is investigated. Due to its shape parameter, GGMM can provide a proper fit to the multi-modal and skewed distributions of data in natural images. The new model proves more successful than Gaussian and Laplacian mixture models. It also outperforms a deep neural network model on the Corel imageset by 1% accuracy. Further in this thesis, we apply scene-level contextual information to integrate global visual semantics of the image with pixel-wise dense inference of a fully-connected CRF, to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications of the unary classifier. The proposed inference algorithm factorizes the joint probability of the labeling configuration and image scene type to obtain prediction update equations for labeling individual image pixels and also the overall scene type of the image. The proposed context-based dense CRF model outperforms the conventional dense CRF model by about 2% in terms of labeling accuracy on the MSRC imageset and by 4% on the SIFT Flow imageset. Also, the proposed model obtains the highest scene classification rate of 86% on the MSRC dataset.

