Combining visual features and contextual information for image retrieval and annotation

Mapping Intimacies ◽

10.32920/ryerson.14649465.v1 ◽

2021 ◽

Author(s):

Rui Zhang

Keyword(s):

Image Retrieval ◽

Image Annotation ◽

Bayesian Framework ◽

Superior Performance ◽

Visual Features ◽

Feature Combination ◽

Low Level ◽

Combination Methods ◽

High Level ◽

Visual Domain

This thesis focuses on information combination at different levels of a statistical pattern classification framework for image annotation and retrieval. Previous studies in image annotation and retrieval have established that low-level visual features, such as color and texture, and high-level features, such as textual description and context, are distinct yet complementary in their distributions and in their discriminative power for machine-based recognition and retrieval tasks. Effective feature combination has therefore become a promising way to further bridge the semantic gap. Motivated by this, the thesis tackles two problems, the combination of the visual and context modalities and the combination of different features within the visual domain, by developing two statistical pattern classification approaches. Two approaches are used because features within the visual modality and features across modalities exhibit different degrees of heterogeneity and should therefore be treated differently. For cross-modality feature combination, a Bayesian framework is proposed to integrate visual content and context, and it has been applied to various image annotation and retrieval frameworks. For the combination of different low-level features in the visual domain, a novel method combines texture and color features via a mixture model of their joint distribution. The proposed frameworks are evaluated on several datasets, including the COREL database for image retrieval and, for image annotation, the MSRC, LabelMe, and PASCAL VOC2009 datasets and an animal image database we collected ourselves. Under various evaluation criteria, the first framework is shown to be more effective than methods based purely on low-level features or high-level context. As for the second, the experimental results demonstrate not only its superior performance over other feature combination methods but also its ability to discover visual clusters using texture and color simultaneously. Moreover, a demo search engine based on the Bayesian framework has been implemented and is available online.
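
The idea of integrating visual content with context in a Bayesian way can be illustrated with a minimal sketch. This is not the thesis's actual model: it assumes that visual evidence and context are conditionally independent given the class, and the function name `fuse_posteriors`, the class labels, and all probabilities are hypothetical values chosen for illustration.

```python
def fuse_posteriors(visual_likelihoods, context_priors):
    """Combine per-class visual likelihoods p(x | c) with context-derived
    priors p(c | context) using Bayes' rule, then normalize.

    Assumption (not from the thesis): visual features and context are
    conditionally independent given the class.
    """
    unnormalized = {
        label: visual_likelihoods[label] * context_priors.get(label, 0.0)
        for label in visual_likelihoods
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no class has non-zero support")
    return {label: value / total for label, value in unnormalized.items()}


# Hypothetical numbers: visual evidence alone favors "car",
# but the context (e.g. surrounding text) makes "car" unlikely.
visual = {"cat": 0.30, "dog": 0.25, "car": 0.45}
context = {"cat": 0.50, "dog": 0.45, "car": 0.05}

posterior = fuse_posteriors(visual, context)
best_label = max(posterior, key=posterior.get)
```

In this toy setting the fused decision differs from the purely visual one, which is the kind of effect a content-plus-context combination is meant to capture.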

Combining visual features and contextual information for image retrieval and annotation

10.32920/ryerson.14649465 ◽

2021 ◽

Author(s):

Rui Zhang

Keyword(s):

Image Retrieval ◽

Image Annotation ◽

Bayesian Framework ◽

Superior Performance ◽

Visual Features ◽

Feature Combination ◽

Low Level ◽

Combination Methods ◽

High Level ◽

Visual Domain

Bag-of-Words Representation in Image Annotation: A Review

ISRN Artificial Intelligence ◽

10.5402/2012/376804 ◽

2012 ◽

Vol 2012 ◽

pp. 1-19 ◽

Cited By ~ 75

Author(s):

Chih-Fong Tsai

Keyword(s):

Image Retrieval ◽

Image Annotation ◽

Classification Problem ◽

Feature Representation ◽

Future Research ◽

Bag Of Words ◽

Low Level ◽

Learning Techniques ◽

Class Labels ◽

High Level

Content-based image retrieval (CBIR) systems require users to query images by their low-level visual content; this not only makes it hard for users to formulate queries, but can also lead to unsatisfactory retrieval results. To address this, image annotation was proposed. The aim of image annotation is to automatically assign keywords to images, so that image retrieval users are able to query images by keywords. Image annotation can be regarded as an image classification problem: images are represented by low-level features, and supervised learning techniques are used to learn the mapping between low-level features and high-level concepts (i.e., class labels). One of the most widely used feature representation methods is bag-of-words (BoW). This paper reviews related works on improving and/or applying BoW for image annotation. Moreover, many recent works (from 2006 to 2012) are compared in terms of the methodology of BoW feature generation and experimental design. In addition, several different issues in using BoW are discussed, and some important issues for future research are identified.
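
As a rough illustration of the bag-of-words representation, the sketch below quantizes local descriptors against a fixed codebook and counts the assignments. It assumes the codebook already exists (in practice it is often learned by clustering, e.g. k-means); the function names, the 2-D descriptors, and the codebook values are hypothetical and not taken from the reviewed paper.

```python
def nearest_codeword(descriptor, codebook):
    """Return the index of the codeword closest to the descriptor
    under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )


def bow_histogram(descriptors, codebook):
    """Build a normalized bag-of-words histogram for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts


# Hypothetical 2-D "visual words" and local descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.3)]

histogram = bow_histogram(descriptors, codebook)
```

The resulting fixed-length histogram is what a supervised classifier would take as input when learning the mapping from low-level features to class labels.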

A Multi-Label Image Annotation With Multi-Level Tagging System

Developments and Trends in Intelligent Technologies and Smart Systems - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-3686-4.ch003 ◽

2018 ◽

pp. 28-47

Author(s):

Kalaivani Anbarasan ◽

Chitrakala S.

Keyword(s):

Image Retrieval ◽

Retrieval System ◽

Image Annotation ◽

Image Features ◽

Semantic Gap ◽

Low Level ◽

Image Retrieval System ◽

Multi Level ◽

High Level ◽

Tagging System

A content-based image retrieval system retrieves relevant images based on image features. The limited performance of content-based image retrieval systems is due to the semantic gap. Image annotation is a solution for bridging the semantic gap between low-level content features and high-level semantic concepts. Image annotation is defined as tagging images with one or more keywords based on low-level image features. The major issue in building an effective annotation framework is the integration of both low-level visual features and high-level textual information into an annotation model. This chapter focuses on a new statistical image annotation model for a semantic-based image retrieval system. A multi-label image annotation with multi-level tagging system is introduced to annotate image regions with class labels and to extract color, location, and topological tags of segmented image regions. The proposed method produced encouraging results, and the experimental results outperformed state-of-the-art methods.

Bridging the Semantic Gap in Image Retrieval

Distributed Multimedia Databases ◽

10.4018/978-1-930708-29-7.ch002 ◽

2002 ◽

pp. 14-36 ◽

Cited By ~ 31

Author(s):

Rhong Zhao ◽

William I. Grosky

Keyword(s):

Information Retrieval ◽

Image Retrieval ◽

Research Field ◽

Visual Features ◽

Semantic Features ◽

Visual Data ◽

Image Content ◽

Low Level ◽

Significant Research ◽

High Level

The emergence of multimedia technology and the rapidly expanding image and video collections on the Internet have attracted significant research efforts in providing tools for effective retrieval and management of visual data. Image retrieval is based on the availability of a representation scheme of image content. Image content descriptors may be visual features such as color, texture, shape, and spatial relationships, or semantic primitives. Conventional information retrieval was based solely on text, and those approaches to textual information retrieval have been transplanted into image retrieval in a variety of ways. However, “a picture is worth a thousand words.” Image content is much more versatile than text, and the amount of visual data is already enormous and still expanding very rapidly. To cope with these special characteristics of visual data, content-based image retrieval methods have been introduced. It has been widely recognized that the family of image retrieval techniques should become an integration of both low-level visual features addressing the more detailed perceptual aspects and high-level semantic features underlying the more general conceptual aspects of visual data. Neither of these two types of features is sufficient to retrieve or manage visual data in an effective or efficient way (Smeulders, et al., 2000). Although efforts have been devoted to combining these two aspects of visual data, the gap between them is still a huge barrier for researchers. Intuitive and heuristic approaches do not provide satisfactory performance. Therefore, there is an urgent need to find the latent correlation between low-level features and high-level concepts and to merge them from a different perspective. How to find this new perspective and bridge the gap between visual features and semantic features has been a major challenge in this research field. Our chapter addresses these issues.

Based on the Semantics of the Low-Level Visual Features Image Retrieval

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.482-484.512 ◽

2012 ◽

Vol 482-484 ◽

pp. 512-517

Author(s):

Xian Wen Zeng ◽

Xue Dong Shen

Keyword(s):

Image Retrieval ◽

Input Parameter ◽

Paper Analysis ◽

Image Features ◽

Visual Features ◽

Color Feature ◽

Semantic Image Retrieval ◽

Low Level ◽

High Level

This paper analyzes the reasons why traditional CBIR cannot support semantic-based image retrieval, and presents a method that uses an SVM to address the problem. Through learning and classification, with the HSV color feature as the input parameter, the method establishes the connection and mapping between high-level semantics and low-level image features. Retrieval with this method is shown to achieve higher accuracy.

Content-based image retrieval for fabric images: A survey

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v23.i3.pp1861-1872 ◽

2021 ◽

Vol 23 (3) ◽

pp. 1861

Author(s):

Silvester Tena ◽

Rudy Hartanto ◽

Igi Ardiyanto

Keyword(s):

Feature Extraction ◽

Image Retrieval ◽

Search Time ◽

Computation Time ◽

Content Based Image Retrieval ◽

Visual Features ◽

Semantic Features ◽

Low Level ◽

Human Perceptions ◽

High Level

In recent years, a great deal of research has been conducted in the area of fabric image retrieval, especially the identification and classification of visual features. One of the challenges associated with the domain of content-based image retrieval (CBIR) is the semantic gap between low-level visual features and high-level human perceptions. Generally, CBIR includes two main components, namely feature extraction and similarity measurement. Therefore, this research aims to determine the content-based image retrieval for fabric using feature extraction techniques grouped into traditional methods and convolutional neural networks (CNN). Traditional descriptors deal with low-level features, while CNN addresses the high-level, called semantic features. Traditional descriptors have the advantage of shorter computation time and reduced system requirements. Meanwhile, CNN descriptors, which handle high-level features tailored to human perceptions, deal with large amounts of data and require a great deal of computation time. In general, the features of a CNN's fully connected layers are used for matching query and database images. In several studies, the extracted features of the CNN's convolutional layer were used for image retrieval. At the end of the CNN layer, hash codes are added to reduce search time.
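
The two CBIR components named in the abstract, feature extraction and similarity measurement, can be sketched as follows. The sketch assumes the feature vectors have already been extracted (by a traditional descriptor or a CNN layer) and only shows the matching step; the image names, feature values, and the 0.5 binarization threshold are hypothetical. The hash-code part is a simple thresholding illustration, not the scheme used in any surveyed work.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_database(query_features, database):
    """Rank database images by similarity to the query, best first."""
    scored = [
        (name, cosine_similarity(query_features, features))
        for name, features in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def binarize(features, threshold=0.5):
    """Turn a real-valued vector into a binary hash code (assumed scheme)."""
    return tuple(1 if value > threshold else 0 for value in features)


def hamming_distance(code_a, code_b):
    """Number of positions where two binary codes differ."""
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


# Hypothetical pre-extracted features for three fabric images.
database = {
    "fabric_a": [0.9, 0.1, 0.3],
    "fabric_b": [0.1, 0.8, 0.7],
    "fabric_c": [0.8, 0.2, 0.4],
}
query = [0.45, 0.05, 0.15]  # same direction as fabric_a, different scale

ranking = rank_database(query, database)
query_code = binarize([0.85, 0.15, 0.35])
```

Comparing short binary codes with Hamming distance is cheaper than comparing full real-valued vectors, which is the general reason hash codes can reduce search time.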

Bimodal fusion of low-level visual features and high-level semantic features for near-duplicate video clip detection

Signal Processing Image Communication ◽

10.1016/j.image.2011.04.001 ◽

2011 ◽

Vol 26 (10) ◽

pp. 612-627 ◽

Cited By ~ 2

Author(s):

Hyun-seok Min ◽

Jae Young Choi ◽

Wesley De Neve ◽

Yong Man Ro

Keyword(s):

Video Clip ◽

Visual Features ◽

Semantic Features ◽

Low Level ◽

High Level ◽

Duplicate Video

Robust Image Labeling Using Conditional Random Fields

10.32920/ryerson.14651541 ◽

2021 ◽

Author(s):

Maryam Nematollahi Arani

Keyword(s):

Object Recognition ◽

Mixture Models ◽

Random Fields ◽

Conditional Random Fields ◽

Semantic Gap ◽

Visual Features ◽

Image Labeling ◽

Low Level ◽

Robust Image ◽

High Level

Object recognition has become a central topic in computer vision applications such as image search, robotics and vehicle safety systems. However, it is a challenging task due to the limited discriminative power of low-level visual features in describing the considerably diverse range of high-level visual semantics of objects. The semantic gap between low-level visual features and high-level concepts is a bottleneck in most systems. New content analysis models need to be developed to bridge the semantic gap. In this thesis, algorithms based on conditional random fields (CRF) from the class of probabilistic graphical models are developed to tackle the problem of multiclass image labeling for object recognition. Image labeling assigns a specific semantic category from a predefined set of object classes to each pixel in the image. By capturing the spatial interactions of visual concepts well, CRF modeling has proved to be a successful tool for image labeling. This thesis proposes novel approaches to strengthening CRF modeling for robust image labeling. Our primary contributions are twofold. First, to better represent feature distributions of CRF potentials, new feature functions based on generalized Gaussian mixture models (GGMM) are designed and their efficacy is investigated. Due to its shape parameter, GGMM can provide a proper fit to the multi-modal and skewed distributions of data in natural images. The new model proves more successful than Gaussian and Laplacian mixture models. It also outperforms a deep neural network model on the Corel image set by 1% in accuracy. Second, we apply scene-level contextual information to integrate the global visual semantics of the image with the pixel-wise dense inference of a fully connected CRF, in order to preserve small objects of foreground classes and to make dense inference robust to initial misclassifications of the unary classifier. The proposed inference algorithm factorizes the joint probability of the labeling configuration and the image scene type to obtain prediction update equations for labeling individual image pixels and also the overall scene type of the image. The proposed context-based dense CRF model outperforms the conventional dense CRF model by about 2% in labeling accuracy on the MSRC image set and by 4% on the SIFT Flow image set. The proposed model also obtains the highest scene classification rate, 86%, on the MSRC dataset.

Fuzzy Model for Human Color Perception and Its Application in E-Commerce

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488516400109 ◽

2016 ◽

Vol 24 (Suppl. 2) ◽

pp. 47-70 ◽

Cited By ~ 4

Author(s):

Pakizar Shamoi ◽

Atsushi Inoue ◽

Hiroharu Kawanaka

Keyword(s):

Image Retrieval ◽

Color Perception ◽

Fuzzy Model ◽

Visual Features ◽

Perceptual Categorization ◽

Purchasing Behavior ◽

User Query ◽

Hsi Space ◽

Underlying Mechanisms ◽

High Level

Although image retrieval for e-commerce has huge commercial potential, e-commerce-oriented content-based image retrieval is still immature. Modern online shopping systems have certain limitations. In particular, they use conventional tag-based retrieval and make little use of visual content. The paper presents a methodology to retrieve images of shopping items based on fuzzy dominant colors. People regard color as an aesthetic issue, especially when it comes to choosing the colors of their clothing, apartment design and other objects around them. No doubt, color influences purchasing behavior; to a certain extent, it is a reflection of a person's likes and dislikes. The fuzzy color model that we are proposing represents a collection of fuzzy sets, providing a conceptual quantization of the crisp HSI space with soft boundaries. The proposed method has two parts: assigning a fuzzy colorimetric profile to the image and processing the user query. We also use underlying mechanisms of attention from a theory of visual attention, such as perceptual categorization. The subjectivity and sensitivity of humans in color perception and the semantic gap between low-level color visual features and high-level concepts are the major issues that we plan to tackle in this research.
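
A fuzzy quantization of a crisp color axis with soft boundaries can be sketched with triangular membership functions over hue. The sketch below is only an illustration of that general idea: the color names, the hue breakpoints in `HUE_SETS`, and the triangular shape are assumptions, not the paper's actual fuzzy sets, and hue wrap-around at 360 degrees is ignored for brevity.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at the peak,
    linear in between."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets on the hue axis (degrees): (left, peak, right).
HUE_SETS = {
    "orange": (0, 30, 60),
    "yellow": (30, 60, 90),
    "green": (60, 120, 180),
}


def hue_memberships(hue_degrees):
    """Degree to which a crisp hue value belongs to each fuzzy color."""
    return {
        name: triangular(hue_degrees, *bounds)
        for name, bounds in HUE_SETS.items()
    }


# A hue halfway between the orange and yellow peaks belongs partly to both.
memberships = hue_memberships(45)
```

Because neighboring sets overlap, a single pixel can contribute to more than one color category, which is what distinguishes this from a hard binning of the hue axis.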

Semantic text-based image retrieval with multi-modality ontology and DBpedia

The Electronic Library ◽

10.1108/el-06-2016-0127 ◽

2017 ◽

Vol 35 (6) ◽

pp. 1191-1214 ◽

Cited By ~ 2

Author(s):

Yanti Idaya Aspura M.K. ◽

Shahrul Azman Mohd Noah

Keyword(s):

Image Retrieval ◽

Semantic Distance ◽

Image Features ◽

Superior Performance ◽

Semantic Retrieval ◽

Content Type ◽

Modality Approach ◽

High Level ◽

Image Collection ◽

Practical Implications

Purpose: The purpose of this study is to reduce the semantic distance by proposing a model for integrating indexes of textual and visual features via a multi-modality ontology and the use of DBpedia to improve the comprehensiveness of the ontology to enhance semantic retrieval.

Design/methodology/approach: A multi-modality ontology-based approach was developed to integrate high-level concepts and low-level features, as well as integrate the ontology base with DBpedia to enrich the knowledge resource. A complete ontology model was also developed to represent the domain of sport news, with image caption keywords and image features. Precision and recall were used as metrics to evaluate the effectiveness of the multi-modality approach, and the outputs were compared with those obtained using a single-modality approach (i.e. textual ontology and visual ontology).

Findings: The results based on ten queries show a superior performance of the multi-modality ontology-based IMR system integrated with DBpedia in retrieving correct images in accordance with user queries. The system achieved 100 per cent precision for six of the queries and greater than 80 per cent precision for the other four queries. The text-based system only achieved 100 per cent precision for one query; all other queries yielded precision rates less than 0.500.

Research limitations/implications: This study only focused on BBC Sport News collection in the year 2009.

Practical implications: The paper includes implications for the development of ontology-based retrieval on image collection.

Originality/value: This study demonstrates the strength of using a multi-modality ontology integrated with DBpedia for image retrieval to overcome the deficiencies of text-based and ontology-based systems. The result validates semantic text-based with multi-modality ontology and DBpedia as a useful model to reduce the semantic distance.
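
For readers unfamiliar with the precision and recall metrics used in the evaluation above, a minimal sketch of how they are computed for a single query follows. The function name and the image identifiers are hypothetical and are not data from the study.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / all retrieved;
    recall = relevant retrieved / all relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 5 images returned, 4 of them among 6 relevant ones.
retrieved = ["img1", "img2", "img3", "img4", "img5"]
relevant = ["img1", "img2", "img3", "img4", "img9", "img10"]

precision, recall = precision_recall(retrieved, relevant)
```

A system can reach 100 per cent precision while still missing relevant images, which is why the two metrics are usually reported together.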
