Revisiting Low-Resolution Images Retrieval with Attention Mechanism and Contrastive Learning

2021 ◽  
Vol 11 (15) ◽  
pp. 6783
Author(s):  
Thanh-Vu Dang ◽  
Gwang-Hyun Yu ◽  
Jin-Young Kim

Recent empirical work shows that visual representations learned by deep neural networks can be used successfully as descriptors for image retrieval. A common technique is to leverage pre-trained models and learn visual descriptors by fine-tuning on labeled data with ranking losses. However, the performance of retrieval systems decreases significantly when the query images have a lower resolution than the training images. This study considers a contrastive learning framework, fine-tuned on features extracted from a pre-trained neural network encoder equipped with an attention mechanism, to address low-resolution image retrieval. Our method is simple yet effective, since the contrastive learning framework drives similar samples close to each other in feature space by manipulating variants of their augmentations. To benchmark the proposed framework, we conducted quantitative and qualitative analyses on the CARS196 (mAP = 0.8804), CUB200-2011 (mAP = 0.9379), and Stanford Online Products (mAP = 0.9141) datasets.
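
The idea of driving similar samples together in feature space can be illustrated with a small sketch. The abstract does not name the loss, so the InfoNCE-style formulation, the temperature value, the function names, and the toy 3-D embeddings below are all assumptions made for illustration and are not the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed form, not from the paper).

    The loss is small when the anchor is more similar to its positive
    (another augmentation of the same image) than to the negatives.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp over all candidates, shifted by the max for stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit

# Hypothetical embeddings: a high-resolution view, its low-resolution
# augmentation, and two unrelated images.
high_res = [1.0, 0.2, 0.0]
low_res = [0.9, 0.3, 0.1]
others = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

aligned = contrastive_loss(high_res, low_res, others)
misaligned = contrastive_loss(high_res, others[0], [low_res, others[1]])
print(aligned < misaligned)  # the correctly paired views give the lower loss
```

Minimizing such a loss during fine-tuning would pull the low-resolution variant toward its high-resolution counterpart, which is the behavior the abstract describes.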


2021 ◽  
Vol 12 (2) ◽  
Author(s):  
João V. O. Novaes ◽  
Lúcio F. D. Santos ◽  
Luiz Olmes Carvalho ◽  
Daniel De Oliveira ◽  
Marcos V. N. Bedo ◽  
...  

Similarity searches can be modeled by means of distances following the Metric Spaces Theory and constitute a fast and explainable query mechanism behind content-based image retrieval (CBIR) tasks. However, classical distance-based queries, e.g., Range and k-Nearest Neighbors, may be unsuitable for exploring large datasets because the retrieved elements are often similar among themselves. Although similarity searching is enriched with the imposition of rules to foster result diversification, the fine-tuning of the diversity query is still an open issue, which is usually carried out with a non-optimal and computationally expensive inspection. This paper introduces J-EDA, a practical workbench implemented in Java that supports the tuning of similarity and diversity search parameters by enabling the automatic and parallel exploration of multiple search settings regarding a user-posed content-based image retrieval task. J-EDA implements a wide variety of classical and diversity-driven search queries, as well as many CBIR settings such as feature extractors for images, distance functions, and relevance feedback techniques. Accordingly, users can define multiple query settings and inspect their performances to spot the most suitable parameterization for the content-based image retrieval problem at hand. The workbench reports the experimental performances with several internal and external evaluation metrics such as P × R and Mean Average Precision (mAP), which are calculated for either incremental or batch procedures performed with or without human interaction.
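
As a reminder of what the mAP metric measures, here is a minimal sketch of the usual definition. It is written in Python rather than Java, it is not J-EDA code, and the function names and toy relevance lists are hypothetical. It also assumes that every relevant item appears somewhere in the ranked list, so the average is taken over the retrieved relevant items.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance[i] is True if result i is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of the per-query AP values."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two hypothetical queries with four ranked results each.
query_a = [True, False, True, False]   # AP = (1/1 + 2/3) / 2
query_b = [False, True, False, False]  # AP = 1/2
score = mean_average_precision([query_a, query_b])
print(score)
```

Comparing this score across several query settings is one way a user could decide which parameterization ranks relevant images highest.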



2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Chanattra Ammatmanee ◽  
Lu Gan

Purpose: Due to the worldwide growth of digital image sharing and the maturity of the tourism industry, the vast and growing collections of digital images have become a challenge for those who use and/or manage these image data across tourism settings. To handle the image indexing task at lower labour cost and to improve the image retrieval task with fewer human errors, the content-based image retrieval (CBIR) technique has been investigated for the tourism domain in particular. This paper aims to review the relevant literature in the field to understand these previous works and identify research gaps for future directions.
Design/methodology/approach: A systematic and comprehensive review of CBIR studies in tourism from the year 2010 to 2019, focussing on journal articles and conference proceedings in reputable online databases, is conducted by taking a comparative approach to critically analyse and address the trends of each fundamental element in these research experiments.
Findings: Based on the review of the literature, the trend of CBIR studies in tourism is to improve image representation and retrieval by advancing existing feature extraction techniques, contributing novel techniques in the feature extraction process through fine-tuning fusion features, and improving the image query of CBIR systems. Co-authorship, the tourist attraction sector and fusion image features have been in focus. Nonetheless, the number of studies in other tourism sectors and the available image databases could be further explored.
Originality/value: The fact that there is no existing academic review of CBIR studies in tourism makes this paper a novel contribution.



2021 ◽  
Vol 11 (3) ◽  
pp. 1242
Author(s):  
So-Mi Cha ◽  
Seung-Seok Lee ◽  
Bonggyun Ko

Pneumonia is a form of acute respiratory infection commonly caused by germs, viruses, and fungi, and can prove fatal at any age. Chest X-ray imaging is the most common technique for diagnosing pneumonia. There have been several attempts to apply transfer learning based on a Convolutional Neural Network to build a stable model for computer-aided diagnosis. Recently, with the appearance of attention mechanisms that automatically focus on the part of the image that is crucial for diagnosing the disease, it has become possible to increase the performance of previous models. The goal of this study is to improve the accuracy of a computer-aided diagnostic approach that medical professionals can easily use as an auxiliary tool. In this paper, we propose an attention-based transfer learning framework for efficient pneumonia detection in chest X-ray images. We collected features from three pre-trained models (ResNet152, DenseNet121, and ResNet18) used as feature extractors. We redefined the classifier for the new task and applied the attention mechanism as a feature selector. As a result, the proposed approach achieved accuracy, F-score, Area Under the Curve (AUC), precision and recall of 96.63%, 0.973, 96.03%, 96.23% and 98.46%, respectively.
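
The reported figures can be cross-checked with a short calculation. The sketch below assumes that the F-score in the abstract is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the variable names are illustrative.

```python
def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, as fractions.
reported_precision = 0.9623
reported_recall = 0.9846

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.973
```

Under that assumption, the precision and recall values are consistent with the reported F-score of 0.973.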



2020 ◽  
Vol 34 (07) ◽  
pp. 12589-12596 ◽  
Author(s):  
Fan Yang ◽  
Zheng Wang ◽  
Jing Xiao ◽  
Shin'ichi Satoh

Most recent approaches to zero-shot cross-modal image retrieval map images from different modalities into a uniform feature space to exploit their relevance by using a pre-trained model. Based on the observation that manifolds of zero-shot images are usually deformed and incomplete, we argue that the manifolds of unseen classes are inevitably distorted during the training of a two-stream model that simply maps images from different modalities into a uniform space. This issue directly leads to poor cross-modal retrieval performance. We propose a bi-directional random walk scheme to mine more reliable relationships between images by traversing heterogeneous manifolds in the feature space of each modality. Our proposed method benefits from intra-modal distributions to alleviate the interference caused by noisy similarities in the cross-modal feature space. As a result, we achieved a great improvement in the performance of the thermal vs. visible image retrieval task. The code for this paper is available at: https://github.com/fyang93/cross-modal-retrieval



Author(s):  
Andrea Tagarelli ◽  
Andrea Simeri

Modeling law search and retrieval as prediction problems has recently emerged as a predominant approach in law intelligence. Focusing on the law article retrieval task, we present a deep learning framework named LamBERTa, which is designed for civil-law codes, and specifically trained on the Italian civil code. To our knowledge, this is the first study proposing an advanced approach to law article prediction for the Italian legal system based on a BERT (Bidirectional Encoder Representations from Transformers) learning framework, which has recently attracted increased attention among deep learning approaches, showing outstanding effectiveness in several natural language processing and learning tasks. We define LamBERTa models by fine-tuning an Italian pre-trained BERT on the Italian civil code or its portions, for law article retrieval as a classification task. One key aspect of our LamBERTa framework is that we conceived it to address an extreme classification scenario, which is characterized by a high number of classes, the few-shot learning problem, and the lack of test query benchmarks for Italian legal prediction tasks. To solve such issues, we define different methods for the unsupervised labeling of the law articles, which can in principle be applied to any law article code system. We provide insights into the explainability and interpretability of our LamBERTa models, and we present an extensive experimental analysis over query sets of different types, for single-label as well as multi-label evaluation tasks. Empirical evidence has shown the effectiveness of LamBERTa, and also its superiority over widely used deep-learning text classifiers and a few-shot learner conceived for an attribute-aware prediction task.



2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data, and it uses features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is based only on a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers, which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the maximum-margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.









1998 ◽  
Author(s):  
Aki Kobayashi ◽  
Toshiyuki Yoshida ◽  
Yoshinori Sakai

