Bag-of-Words Representation in Image Annotation: A Review

2012 ◽  
Vol 2012 ◽  
pp. 1-19 ◽  
Author(s):  
Chih-Fong Tsai

Content-based image retrieval (CBIR) systems require users to query images by their low-level visual content; this not only makes it hard for users to formulate queries but can also lead to unsatisfactory retrieval results. Image annotation was proposed to address this problem. Its aim is to automatically assign keywords to images, so that image retrieval users can query images by keyword. Image annotation can be regarded as an image classification problem: images are represented by low-level features, and supervised learning techniques are used to learn the mapping between the low-level features and high-level concepts (i.e., class labels). One of the most widely used feature representation methods is bag-of-words (BoW). This paper reviews related work on improving and/or applying BoW for image annotation. In addition, recent works (from 2006 to 2012) are compared in terms of BoW feature generation methodology and experimental design, several issues in using BoW are examined, and important directions for future research are identified.
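
To make the surveyed pipeline concrete, here is a minimal sketch of BoW feature generation, assuming OpenCV's SIFT descriptors and scikit-learn's k-means; the function names, vocabulary size, and descriptor choice are illustrative, not prescribed by the review.

```python
# BoW sketch: local descriptors -> k-means vocabulary -> word histogram.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(image_paths, k=500):
    """Cluster local SIFT descriptors from training images into k visual words."""
    sift = cv2.SIFT_create()
    descriptors = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descriptors))

def bow_histogram(image_path, vocabulary):
    """Represent one image as an L1-normalized histogram over visual words."""
    sift = cv2.SIFT_create()
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    if desc is None:                                   # no keypoints found
        return np.zeros(vocabulary.n_clusters)
    words = vocabulary.predict(desc)                   # nearest word per descriptor
    hist, _ = np.histogram(words, bins=np.arange(vocabulary.n_clusters + 1))
    return hist / max(hist.sum(), 1)
```

The resulting fixed-length vector can then be fed to any supervised classifier to learn the keyword mapping described above.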

2021 ◽  
Author(s):  
Rui Zhang

This thesis focuses primarily on combining information at different levels of a statistical pattern classification framework for image annotation and retrieval. Based on previous studies in these fields, it is well recognized that low-level visual features, such as color and texture, and high-level features, such as textual description and context, are distinct yet complementary in terms of their distributions and their discriminative power for machine-based recognition and retrieval tasks. Effective feature combination for image annotation and retrieval has therefore become a promising route by which the semantic gap can be further bridged. Motivated by this, the combination of the visual and context modalities, and that of different features within the visual domain, are tackled with two statistical pattern classification approaches, reflecting the observation that features within the visual modality and features across modalities exhibit different degrees of heterogeneity and should therefore be treated differently. For cross-modality feature combination, a Bayesian framework is proposed to integrate visual content and context, and it is applied to various image annotation and retrieval frameworks. For the combination of different low-level features in the visual domain, a novel method combines texture and color features via a mixture model of their joint distribution. To evaluate the proposed frameworks, several datasets are employed, including the COREL database for image retrieval and the MSRC, LabelMe, PASCAL VOC2009, and a self-collected animal image database for image annotation. Using various evaluation criteria, the first framework is shown to be more effective than methods based purely on low-level features or high-level context. As for the second, the experimental results demonstrate not only its superior performance over other feature combination methods but also its ability to discover visual clusters using texture and color simultaneously. Moreover, a demo search engine based on the Bayesian framework is implemented and available online.
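
As a simplified illustration of the second approach, the sketch below fits a Gaussian mixture to a joint color-texture feature space so that each component acts as a visual cluster over both cues simultaneously; this is a stand-in using scikit-learn, not the thesis's exact model, and the feature definitions are assumptions.

```python
# Joint color-texture mixture sketch: each GMM component is a visual cluster.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_mixture(color_feats, texture_feats, n_components=16):
    """Fit a mixture model over the joint color+texture distribution.

    color_feats  : (N, Dc) array, e.g., mean Lab values per region (assumed)
    texture_feats: (N, Dt) array, e.g., Gabor filter-bank energies (assumed)
    """
    joint = np.hstack([color_feats, texture_feats])    # joint feature space
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(joint)

# Soft cluster assignment for a new region's color c and texture t:
#   gmm.predict_proba(np.hstack([c, t]).reshape(1, -1))
```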


Author(s):  
Kalaivani Anbarasan ◽  
Chitrakala S.

A content-based image retrieval system retrieves relevant images based on image features, and its performance is limited by the semantic gap. Image annotation is a solution that bridges the semantic gap between low-level content features and high-level semantic concepts. Image annotation is defined as tagging images with one or more keywords based on low-level image features. The major issue in building an effective annotation framework is integrating both low-level visual features and high-level textual information into a single annotation model. This chapter focuses on a new statistical image annotation model for semantic-based image retrieval. A multi-label image annotation system with multi-level tagging is introduced to annotate image regions with class labels and to extract color, location, and topological tags for segmented image regions. The proposed method produced encouraging results, and in the experiments it outperformed state-of-the-art methods.
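
The sketch below suggests what multi-level tag extraction for a segmented region could look like; the grid size, color palette, and tag names are our assumptions for illustration (topological tags are omitted), not the chapter's actual design.

```python
# Illustrative location and color tags for a segmented region.
import numpy as np

def location_tag(mask):
    """Map a region's centroid to a coarse 3x3 spatial tag such as 'top-left'."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    row = min(int(3 * ys.mean() / h), 2)
    col = min(int(3 * xs.mean() / w), 2)
    return f"{['top', 'middle', 'bottom'][row]}-{['left', 'center', 'right'][col]}"

def color_tag(image, mask, palette):
    """Assign the nearest named color to the region's mean RGB value."""
    mean_rgb = image[mask.astype(bool)].mean(axis=0)
    names, values = zip(*palette.items())
    dists = np.linalg.norm(np.asarray(values, dtype=float) - mean_rgb, axis=1)
    return names[int(np.argmin(dists))]

# A hypothetical palette; a real system would use a larger color vocabulary.
palette = {"red": (200, 40, 40), "green": (40, 160, 60), "sky-blue": (120, 180, 230)}
```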


2019 ◽  
Vol 128 (2) ◽  
pp. 261-318 ◽  
Author(s):  
Li Liu ◽  
Wanli Ouyang ◽  
Xiaogang Wang ◽  
Paul Fieguth ◽  
Jie Chen ◽  
...  

Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and have led to remarkable breakthroughs in the field of generic object detection. Given this period of rapid evolution, the goal of this paper is to provide a comprehensive survey of the recent achievements in this field brought about by deep learning techniques. More than 300 research contributions are included in this survey, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics. We finish the survey by identifying promising directions for future research.
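
For readers new to the area, the snippet below runs inference with one detection framework the survey covers (Faster R-CNN), using torchvision's pretrained model; the image path and the 0.5 score threshold are placeholders.

```python
# Pretrained Faster R-CNN inference sketch with torchvision (>= 0.13 assumed).
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = convert_image_dtype(read_image("image.jpg"), torch.float)  # CHW in [0, 1]
with torch.no_grad():
    pred = model([img])[0]            # dict with boxes, labels, scores

keep = pred["scores"] > 0.5           # drop low-confidence detections
print(pred["boxes"][keep], pred["labels"][keep])
```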


Author(s):  
Xinge Zhu ◽  
Liang Li ◽  
Weigang Zhang ◽  
Tianrong Rao ◽  
Min Xu ◽  
...  

Visual emotion recognition aims to associate images with appropriate emotions. Different visual stimuli can affect human emotion, from low-level to high-level: color, texture, parts, objects, and so on. However, most existing methods treat different levels of features as independent entities, without an effective method for fusing them. In this paper, we propose a unified CNN-RNN model that predicts emotion from features fused across levels by exploiting the dependencies among them. Our architecture leverages a convolutional neural network (CNN) with multiple layers to extract different levels of features within a multi-task learning framework, in which two related loss functions are introduced to learn the feature representation. Considering the dependencies between the low-level and high-level features, a new bidirectional recurrent neural network (RNN) is proposed to integrate the learned features from different layers of the CNN model. Extensive experiments on both Internet image and art photo datasets demonstrate that our method outperforms the state-of-the-art methods by at least 7%.
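
A simplified sketch of this idea follows: pooled features from successive ResNet stages are treated as a short sequence and fused by a bidirectional LSTM. The backbone, dimensions, and eight-class output are assumptions, not the authors' exact architecture or loss functions.

```python
# Multi-level CNN features fused by a bidirectional LSTM (illustrative).
import torch
import torch.nn as nn
import torchvision

class CnnBiRnnEmotion(nn.Module):
    def __init__(self, hidden=256, n_emotions=8):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        # Keep the stem, then expose the four residual stages so we can
        # read off low- to high-level features.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2,
                                     backbone.layer3, backbone.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Project each stage's pooled output (64/128/256/512-d) to a common size.
        self.proj = nn.ModuleList([nn.Linear(c, hidden)
                                   for c in (64, 128, 256, 512)])
        self.rnn = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, x):
        x = self.stem(x)
        levels = []
        for stage, proj in zip(self.stages, self.proj):
            x = stage(x)
            levels.append(proj(self.pool(x).flatten(1)))   # (B, hidden)
        seq = torch.stack(levels, dim=1)   # levels as a 4-step sequence
        out, _ = self.rnn(seq)             # bidirectional pass over levels
        return self.head(out[:, -1])       # emotion logits
```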


Author(s):  
Rui Zhang ◽  
Ling Guan

After nearly twenty years of intensive study, content-based image retrieval and annotation remain difficult. By and large, the essential challenge lies in the limitation of using low-level visual features to characterize the semantic information of images, commonly known as the semantic gap. To bridge this gap, various approaches have been proposed that incorporate human knowledge and textual information, as well as learning techniques that exploit information from different modalities. At the same time, contextual information, which represents the relationships between different real-world or conceptual entities, has shown its significance for recognition tasks, both in everyday experience and in scientific studies. In this chapter, the authors first review the state of the art in image annotation and retrieval. They then elaborate a general Bayesian framework that integrates content and contextual information, together with its application to both image annotation and retrieval. The contextual information is modeled as the statistical relationship between different images for retrieval, and between different semantic concepts for annotation. The framework has efficient learning and classification procedures, and its effectiveness is evaluated in experimental studies that demonstrate its advantage over both purely content-based and purely context-based approaches.
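
As a rough illustration of such content-context fusion, the sketch below combines a content-based likelihood with a prior derived from concept co-occurrence; the naive multiplicative fusion and Laplace smoothing are our simplifications, not the chapter's exact formulation.

```python
# Bayesian-style fusion of content likelihood and context prior (illustrative).
import numpy as np

def annotate(content_scores, cooccur, assigned, alpha=1.0):
    """Rank candidate concepts for an image.

    content_scores: (C,) P(image features | concept), from any visual model
    cooccur       : (C, C) concept co-occurrence counts from training labels
    assigned      : indices of concepts already attached to the image
    """
    counts = cooccur[:, assigned].sum(axis=1) + alpha   # Laplace-smoothed
    context_prior = counts / counts.sum()               # P(concept | context)
    posterior = content_scores * context_prior          # Bayes-style fusion
    return posterior / posterior.sum()
```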


2011 ◽  
Vol 271-273 ◽  
pp. 1090-1095
Author(s):  
Yu Tang Guo ◽  
Chang Gang Han

Due to the existence of the semantic gap, images with the same or similar low-level features may differ at the semantic level. Finding the underlying relationship between high-level semantics and low-level features is one of the difficult problems in image annotation. In this paper, a new image annotation method based on graph spectral clustering with semantic consistency is proposed, with a detailed analysis of the advantages and disadvantages of existing image annotation methods. The proposed method first clusters images into several semantic classes using a semantic similarity measure in the semantic subspace. Within each semantic class, images are re-clustered using the visual features of regions. Then, the joint probability distribution of blobs and words is modeled using the Multiple-Bernoulli Relevance Model, and an unannotated image can be annotated using this joint distribution. Experimental results show the effectiveness of the proposed approach in terms of annotation quality; the consistency of high-level semantics and low-level features is efficiently achieved.
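
A rough sketch of the first step, spectral clustering on a precomputed semantic similarity matrix, is given below using scikit-learn; the similarity definition and cluster count are placeholders, and the MBRM stage is omitted.

```python
# Graph spectral clustering of images by semantic similarity (illustrative).
import numpy as np
from sklearn.cluster import SpectralClustering

def semantic_clusters(similarity, n_clusters=10):
    """Cluster images from a precomputed semantic similarity matrix.

    similarity: (N, N) symmetric matrix, e.g., cosine similarity between
                the keyword (annotation) vectors of training images.
    """
    sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return sc.fit_predict(similarity)   # semantic class index per image

# Within each semantic class, images are then re-clustered on regional visual
# features before fitting the Multiple-Bernoulli Relevance Model.
```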


Author(s):  
Er Aman ◽  
Amit Rawat ◽  
Ashwin Giri ◽  
Hardik Gothwal

Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts over many years, this remains one of the most challenging open problems, and it significantly hinders the success of real-world CBIR systems. The key difficulty is the well-known semantic gap between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible direction for bridging the semantic gap in the long term. Motivated by the recent success of deep learning in computer vision and other applications, this paper addresses an open problem: whether deep learning is a hope for bridging the semantic gap in CBIR, and how much improvement on CBIR tasks can be achieved by exploring state-of-the-art deep learning techniques for learning feature representations and similarity measures. Specifically, we investigate a deep learning framework applied to CBIR through an extensive set of empirical studies, examining a state-of-the-art deep learning technique (convolutional neural networks) for CBIR tasks in various settings. From our empirical studies, we report encouraging results and summarize important insights for future research.
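
A minimal sketch of the idea investigated here follows: CNN activations serve as the image representation and Euclidean distance as the similarity measure; the backbone choice and feature layer are our assumptions.

```python
# CNN-feature retrieval sketch: ResNet-50 embeddings + nearest neighbors.
import torch
import torchvision

weights = torchvision.models.ResNet50_Weights.DEFAULT
backbone = torchvision.models.resnet50(weights=weights)
backbone.fc = torch.nn.Identity()     # keep the 2048-d pooled feature
backbone.eval()
preprocess = weights.transforms()     # the weights' matching preprocessing

@torch.no_grad()
def embed(pil_image):
    """Map one PIL image to its CNN feature vector."""
    return backbone(preprocess(pil_image).unsqueeze(0)).squeeze(0)

def retrieve(query_vec, gallery_vecs, top_k=5):
    """Rank gallery images by Euclidean distance to the query feature."""
    dists = torch.cdist(query_vec.unsqueeze(0), gallery_vecs).squeeze(0)
    return torch.topk(dists, top_k, largest=False).indices
```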

