A document expansion framework for tag-based image retrieval

2018 ◽  
Vol 70 (1) ◽  
pp. 47-65 ◽  
Author(s):  
Wei Lu ◽  
Heng Ding ◽  
Jiepu Jiang

Purpose: The purpose of this paper is to utilize document expansion techniques to improve image representation and retrieval. The paper proposes a concise framework for tag-based image retrieval (TBIR).

Design/methodology/approach: The proposed approach includes three core components: a strategy for selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is used to select expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merged or separate).

Findings: The results show that the proposed method yields significant improvements in effectiveness; it performs better at the top of the ranked list and greatly improves some topics that score zero under the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based strategy; using image features to select expansion images is better than using text features in most cases; and the separate method for calculating the augmented probability P(q|RD) is able to mitigate the negative influence of erroneous images in RD.

Research limitations/implications: Although these methods improve only the top of the ranked list rather than the entire list, TBIR on mobile platforms can still benefit from this approach.

Originality/value: Unlike former studies, which address sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the proposed approach addresses all of these issues within a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.
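
To make the nearest neighbor-based expansion and the "separate" matching model concrete, here is a minimal Python sketch. The Dirichlet smoothing, the interpolation weight alpha, and all function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def nearest_neighbor_expansion(doc_vecs, target_idx, k=5):
    """Select the k most similar images (by cosine similarity of their
    tag/feature vectors) as expansion documents for the target image."""
    target = doc_vecs[target_idx]
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(target)
    sims = doc_vecs @ target / np.maximum(norms, 1e-12)
    sims[target_idx] = -np.inf          # exclude the image itself
    return np.argsort(sims)[::-1][:k]

def query_likelihood(query_ids, tag_counts, collection_probs, mu=100.0):
    """Dirichlet-smoothed log P(q|D) over a tag-count vector."""
    doc_len = tag_counts.sum()
    probs = (tag_counts[query_ids] + mu * collection_probs[query_ids]) / (doc_len + mu)
    return np.log(probs).sum()

def separate_expansion_score(query_ids, doc_counts, expansion_counts,
                             collection_probs, alpha=0.6):
    """'Separate' scoring: the original document and each expansion document
    are scored independently, then interpolated, so a single erroneous
    expansion image cannot dominate a merged representation."""
    orig = query_likelihood(query_ids, doc_counts, collection_probs)
    exp = np.mean([query_likelihood(query_ids, c, collection_probs)
                   for c in expansion_counts])
    return alpha * orig + (1 - alpha) * exp
```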

2015 ◽  
Vol 6 (2) ◽  
pp. 25-40
Author(s):  
S. Sathiya Devi

In this paper, a simple image retrieval method incorporating relevance feedback, based on the multiresolution enhanced orthogonal polynomials model, is proposed. In the proposed method, low-level image features such as texture, shape and color are extracted from the reordered orthogonal polynomials model coefficients and linearly combined to form a multifeature set. The dimensionality of the multifeature set is then reduced using a multi-objective Genetic Algorithm (GA) and a multiclass binary Support Vector Machine (SVM). The resulting optimized multifeature set is used for image retrieval. To improve retrieval accuracy and bridge the semantic gap, a correlation-based k-Nearest Neighbor (k-NN) method for relevance feedback is also proposed. In this method, an appropriate relevance score is computed for each image in the database, based on the relevant and non-relevant sets chosen by the user, using the correlation-based k-NN method. Experiments are carried out on Corel and Caltech database images and the retrieval rates are computed. The proposed method with correlation-based k-NN relevance feedback gives an average retrieval rate of 94.67%.
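
A minimal sketch of how such a correlation-based k-NN relevance score might be computed from user feedback is given below; the exact scoring rule is an assumption, not the paper's formula.

```python
import numpy as np

def correlation_knn_score(feat, relevant, nonrelevant, k=5):
    """Relevance score for one database image: mean Pearson correlation
    with its k most correlated relevant examples minus the same quantity
    over the non-relevant examples. (Scoring rule is an assumption.)"""
    def top_corr(refs):
        if len(refs) == 0:
            return 0.0
        corrs = np.array([np.corrcoef(feat, r)[0, 1] for r in refs])
        return np.sort(corrs)[::-1][:min(k, len(corrs))].mean()
    return top_corr(relevant) - top_corr(nonrelevant)

def rerank(db_feats, relevant, nonrelevant, k=5):
    # Re-rank the whole database by feedback-derived relevance scores.
    scores = np.array([correlation_knn_score(f, relevant, nonrelevant, k)
                       for f in db_feats])
    return np.argsort(scores)[::-1]   # most relevant first
```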


2017 ◽  
Vol 35 (6) ◽  
pp. 1191-1214 ◽  
Author(s):  
Yanti Idaya Aspura M.K. ◽  
Shahrul Azman Mohd Noah

Purpose: The purpose of this study is to reduce the semantic distance by proposing a model that integrates indexes of textual and visual features via a multi-modality ontology, and by using DBpedia to improve the comprehensiveness of the ontology and thereby enhance semantic retrieval.

Design/methodology/approach: A multi-modality ontology-based approach was developed to integrate high-level concepts and low-level features, and the ontology base was integrated with DBpedia to enrich the knowledge resource. A complete ontology model was also developed to represent the domain of sport news, with image caption keywords and image features. Precision and recall were used as metrics to evaluate the effectiveness of the multi-modality approach, and the outputs were compared with those obtained using a single-modality approach (i.e. textual ontology and visual ontology).

Findings: The results, based on ten queries, show the superior performance of the multi-modality ontology-based image retrieval system integrated with DBpedia in retrieving correct images in accordance with user queries. The system achieved 100 per cent precision for six of the queries and greater than 80 per cent precision for the other four. The text-based system achieved 100 per cent precision for only one query; all other queries yielded precision below 0.500.

Research limitations/implications: This study focused only on the BBC Sport News collection from 2009.

Practical implications: The paper has implications for the development of ontology-based retrieval over image collections.

Originality/value: This study demonstrates the strength of a multi-modality ontology integrated with DBpedia for image retrieval in overcoming the deficiencies of text-based and ontology-based systems. The result validates semantic text-based retrieval with a multi-modality ontology and DBpedia as a useful model for reducing the semantic distance.
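
As an illustration of the DBpedia enrichment step, the sketch below pulls triples for a concept from DBpedia's public SPARQL endpoint using the SPARQLWrapper library. The example resource name and the query shape are assumptions; the paper does not specify its exact enrichment queries.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def enrich_concept(resource="Lionel_Messi"):
    # Hypothetical example resource: the system maps sport-news concepts
    # to DBpedia resources and merges the returned facts into the ontology.
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT ?p ?o WHERE {{
            <http://dbpedia.org/resource/{resource}> ?p ?o .
        }} LIMIT 50
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(r["p"]["value"], r["o"]["value"]) for r in rows]
```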


2019 ◽  
Vol 12 (3) ◽  
pp. 162-170 ◽  
Author(s):  
Thiriveedhi Yellamanda Srinivasa Rao ◽  
Pakanati Chenna Reddy

Background: This paper presents work on image classification and retrieval in the area of content-based image retrieval, which has been very active and successful in the past few years.

Objective: Features are extracted based on the bag of visual words (BOW) model, using the Scale Invariant Feature Transform (SIFT) and an improved K-Means clustering method.

Methods: Texture is extracted with a multi-texton method developed in our study. The retrieval process consists of two stages: classification and retrieval. Images are first classified on the basis of their features using the k-Nearest Neighbor (kNN) algorithm, which separates them into classes in order to improve the precision and recall rates.

Results: After classification, similar images are retrieved from the relevant class according to the given query image.
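
The BOW pipeline described above (SIFT descriptors, a K-Means vocabulary, kNN classification before retrieval) can be sketched as follows; the vocabulary size and neighbour count are illustrative, and standard K-Means stands in for the paper's improved variant.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def bow_histograms(image_paths, n_words=200):
    """Build bag-of-visual-words histograms: SIFT descriptors from all
    images are clustered with K-Means; each image becomes a normalized
    histogram of its descriptors' nearest cluster centres (visual words)."""
    sift = cv2.SIFT_create()
    per_image = []
    for p in image_paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        if desc is None:                       # no keypoints found
            desc = np.zeros((1, 128), np.float32)
        per_image.append(desc)
    kmeans = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(per_image))
    hists = []
    for desc in per_image:
        words = kmeans.predict(desc)
        h = np.bincount(words, minlength=n_words).astype(float)
        hists.append(h / max(h.sum(), 1.0))
    return np.array(hists), kmeans

# Stage 1: classify the query into a class, then retrieve within that class.
# hists, vocab = bow_histograms(paths)
# clf = KNeighborsClassifier(n_neighbors=5).fit(hists, labels)
```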


Author(s):  
PAUL W. KWAN ◽  
KEISUKE KAMEYAMA ◽  
JUNBIN GAO ◽  
KAZUO TORAICHI

Content-based Image Retrieval (CBIR) has been an active area of research for retrieving similar images from large repositories without the prerequisite of manual labeling. Most current CBIR algorithms can faithfully return a list of images that matches the visual perspective of their inventors, who might decide to use a certain combination of image features, such as edges, colors and textures of regions as well as their spatial distribution, during processing. In practice, however, the retrieved images rarely correspond exactly to the results expected by the users, a problem that has come to be known as the semantic gap. In this paper, we propose a novel and extensible multidimensional approach, called the matrix of visual perspectives, as a solution for addressing this semantic gap. Our approach exploits the dynamic cross-interaction (in other words, mix-and-match) of image features and similarity metrics to produce results that attempt to mimic the mental visual picture of the user. Experimental results from a prototype system retrieving similar Japanese cultural heritage symbols called kamons confirm that the user's visual perspectives can be effectively captured and reflected. The benefits of this approach are broader: it is equally applicable to the development of CBIR systems for other types of images, whether cultural or not, by adapting to different sets of application-specific image features.
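
A minimal sketch of the matrix-of-visual-perspectives idea: each database image is scored under every pairing of image feature and similarity metric, and a user-specific weighting over the resulting matrix yields the final score. The feature names, metrics and weighting scheme here are assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def neg_l2(a, b):
    return -float(np.linalg.norm(a - b))

def perspective_matrix(query_feats, db_feats, metrics):
    """Rows: image features (e.g. 'edge', 'color', 'texture' vectors);
    columns: similarity metrics. Each cell scores the database image under
    one (feature, metric) pairing -- one 'visual perspective'."""
    M = np.zeros((len(query_feats), len(metrics)))
    for i, name in enumerate(query_feats):
        for j, metric in enumerate(metrics):
            M[i, j] = metric(query_feats[name], db_feats[name])
    return M

# A user's mental picture is approximated by weighting the cells:
# score = (user_weights * perspective_matrix(q, d, [cosine, neg_l2])).sum()
```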


2020 ◽  
Vol 10 (3) ◽  
pp. 223-237
Author(s):  
Rafał Grycuk ◽  
Adam Wojciechowski ◽  
Wei Wei ◽  
Agnieszka Siwocha

Content-based image retrieval methods are developing rapidly as image repositories grow in scale. They are usually based on comparing and indexing image features. We developed a new algorithm for finding objects in images by traversing their edges, and we describe the objects by histograms of local features and angles. We use this description to retrieve similar images quickly. Extensive experiments on three established image datasets demonstrate the effectiveness of the proposed method.
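
One plausible reading of the edge-traversal descriptor is sketched below: contours are followed and the angles of chords along them are histogrammed. This is an illustrative reconstruction, not the authors' exact algorithm, which also incorporates histograms of local features.

```python
import cv2
import numpy as np

def edge_angle_histogram(image_path, n_bins=36, step=5):
    """Describe an object by the distribution of its edge directions."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    angles = []
    for c in contours:
        pts = c[:, 0, :].astype(float)       # points along one traversed edge
        if len(pts) <= step:
            continue
        d = pts[step:] - pts[:-step]         # chords taken `step` points apart
        angles.extend(np.arctan2(d[:, 1], d[:, 0]))
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)         # normalized angle histogram
```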


2016 ◽  
Vol 2016 ◽  
pp. 1-12 ◽  
Author(s):  
Zahid Mehmood ◽  
Syed Muhammad Anwar ◽  
Nouman Ali ◽  
Hafiz Adnan Habib ◽  
Muhammad Rashid

Content-based image retrieval (CBIR) provides a sustainable solution for retrieving similar images from an image archive. In the last few years, the Bag-of-Visual-Words (BoVW) model has gained attention and significantly improved the performance of image retrieval. In the standard BoVW model, an image is represented as an orderless global histogram of visual words, ignoring the spatial layout. Yet the spatial layout of an image carries significant information that can enhance the performance of CBIR. In this paper, we present a novel image representation based on a combination of local and global histograms of visual words. The global histogram of visual words is constructed over the whole image, while the local histogram is constructed over a local rectangular region of the image and captures spatial information about the salient objects. Extensive experiments and comparisons on the Corel-A, Caltech-256, and Ground Truth image datasets demonstrate that the proposed representation improves the performance of image retrieval.
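
The combined representation can be sketched as follows, assuming each image has already been quantized into a dense map of visual-word ids; the central-rectangle placement is an assumption (the paper evaluates particular rectangular regions).

```python
import numpy as np

def local_global_representation(word_map, n_words, region=(0.25, 0.75)):
    """Concatenate a global BoVW histogram over the whole image with a
    local histogram over a rectangular sub-region. `word_map` is an HxW
    array of visual-word ids, one per dense sampling cell."""
    h, w = word_map.shape
    glob = np.bincount(word_map.ravel(), minlength=n_words).astype(float)
    r0, r1 = int(region[0] * h), int(region[1] * h)
    c0, c1 = int(region[0] * w), int(region[1] * w)
    loc = np.bincount(word_map[r0:r1, c0:c1].ravel(),
                      minlength=n_words).astype(float)
    glob /= max(glob.sum(), 1.0)
    loc /= max(loc.sum(), 1.0)
    return np.concatenate([glob, loc])   # local histogram adds spatial cues
```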


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Chanattra Ammatmanee ◽  
Lu Gan

Purpose: Due to the worldwide growth of digital image sharing and the maturity of the tourism industry, vast and growing collections of digital images have become a challenge for those who use and/or manage image data across tourism settings. To reduce the labour cost of image indexing and the human error of image retrieval, the content-based image retrieval (CBIR) technique has been investigated for the tourism domain in particular. This paper aims to review the relevant literature in the field to understand previous work and identify research gaps for future directions.

Design/methodology/approach: A systematic and comprehensive review of CBIR studies in tourism from 2010 to 2019, focussing on journal articles and conference proceedings in reputable online databases, is conducted, taking a comparative approach to critically analyse the trends in each fundamental element of these research experiments.

Findings: Based on the review of the literature, the trend in CBIR studies in tourism is to improve image representation and retrieval by advancing existing feature extraction techniques, contributing novel techniques to the feature extraction process through fine-tuned fusion features, and improving the image query of CBIR systems. Co-authorship, the tourist attraction sector and fusion image features have been the focus; studies in other tourism sectors and additional image databases could be further explored.

Originality/value: The absence of any existing academic review of CBIR studies in tourism makes this paper a novel contribution.


2021 ◽  
Vol 13 (24) ◽  
pp. 4965
Author(s):  
Qimin Cheng ◽  
Haiyan Huang ◽  
Lan Ye ◽  
Peng Fu ◽  
Deqiao Gan ◽  
...  

Conventional remote sensing image retrieval (RSIR) systems perform single-label retrieval, with one label representing the most dominant semantic content of an image. Improved spatial resolution dramatically increases the scene complexity of remote sensing images, which typically contain multiple categories of surface features. In this case, a single label cannot comprehensively describe the semantic content of a complex remote sensing scene, resulting in poor retrieval performance in practical applications. Researchers have therefore begun to pay attention to multi-label image retrieval. However, in the era of massive remote sensing data, how to increase retrieval efficiency and reduce feature storage while preserving semantic information remains unsolved. Considering the power of hashing learning to overcome the curse of dimensionality caused by high-dimensional image representations in Approximate Nearest Neighbor (ANN) search, we propose a new semantic-preserving deep hashing model for multi-label remote sensing image retrieval. Our model consists of three main components: (1) a convolutional neural network to extract image features; (2) a hash layer to generate binary codes; and (3) a new loss function that better preserves the multi-label semantic information of the remote sensing scene during hash learning. As far as we know, this is the first attempt to apply deep hashing to multi-label remote sensing image retrieval. Experimental results indicate the effectiveness and promise of introducing hashing methods into multi-label remote sensing image retrieval.
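
A minimal PyTorch sketch of the three components follows. The ResNet-18 backbone, 48-bit code length and pairwise loss are stand-ins assumed for illustration; the paper's exact backbone and loss differ in detail.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiLabelHashNet(nn.Module):
    """CNN feature extractor plus a hash layer producing K-bit codes."""
    def __init__(self, n_bits=48):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.hash_layer = nn.Linear(512, n_bits)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return torch.tanh(self.hash_layer(f))   # relaxed codes in (-1, 1)

def pairwise_semantic_loss(h1, h2, label_sim):
    """Push code similarity toward multi-label similarity; label_sim in
    [0, 1] could be e.g. the Jaccard overlap of the two label sets."""
    cos = (h1 * h2).sum(1) / h1.size(1)          # scaled inner product
    return ((cos - (2.0 * label_sim - 1.0)) ** 2).mean()

# Retrieval: binary codes = torch.sign(model(images)); Hamming-distance search.
```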


2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have high detection speed, their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their different aspect ratios can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features at different scales by allocating learnable weighting parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into state-of-the-art backbones such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that detection precision is greatly improved while a fast detection speed is maintained.
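
A sketch of what a MAC block could look like in PyTorch follows; the branch kernel sizes (1x3/3x1, 1x5/5x1, 1x7/7x1) and softmax fusion are assumptions, since the abstract specifies only that three asymmetric branches are merged with learnable weights.

```python
import torch
import torch.nn as nn

class MACBlock(nn.Module):
    """Three parallel asymmetric-convolution branches with different
    aspect ratios, fused by learnable scalar weights."""
    def __init__(self, channels):
        super().__init__()
        def branch(k):                     # a 1xk followed by a kx1 convolution
            p = k // 2
            return nn.Sequential(
                nn.Conv2d(channels, channels, (1, k), padding=(0, p)),
                nn.Conv2d(channels, channels, (k, 1), padding=(p, 0)),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
        self.branches = nn.ModuleList([branch(k) for k in (3, 5, 7)])
        self.weights = nn.Parameter(torch.ones(3))   # learnable fusion weights

    def forward(self, x):
        w = torch.softmax(self.weights, dim=0)        # adaptive merging
        return sum(wi * b(x) for wi, b in zip(w, self.branches))

# e.g. insert MACBlock(256) after a ResNet-50 stage with 256 channels.
```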

