Connecting Images through Sources: Exploring Low-Data, Heterogeneous Instance Retrieval

2021 ◽  
Vol 13 (16) ◽  
pp. 3080
Author(s):  
Dimitri Gominski ◽  
Valérie Gouet-Brunet ◽  
Liming Chen

Along with a new volume of images containing valuable information about our past, the digitization of historical territorial imagery has brought the challenge of understanding and interconnecting collections with unique or rare representation characteristics and sparse metadata. Content-based image retrieval offers a promising solution in this context, by building links in the data without relying on human supervision. However, while the latest deep learning approaches have shown impressive results in applications linked to feature learning, they often rely on the hypothesis that a training dataset matching the use case exists. Increasing generalization and robustness to variations remains an open challenge, poorly understood in the context of real-world applications. Introducing the alegoria benchmark, which contains multi-date vertical and oblique digitized aerial photography mixed with more modern street-level pictures, we formulate the problem of low-data, heterogeneous image retrieval and propose associated evaluation setups and measures. We review ideas and methods to tackle this problem, extensively compare state-of-the-art descriptors, and propose a new multi-descriptor diffusion method to exploit their comparative strengths. Our experiments highlight the benefits of combining descriptors and the compromise between absolute and cross-domain performance.
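For a concrete picture of descriptor fusion by diffusion, the following is a minimal Python sketch: it averages per-descriptor affinity matrices and runs a random walk with restart from the query. It is not the alegoria method itself; the naive averaging fusion, the Gaussian affinity, and all parameter values are illustrative assumptions.

```python
# Minimal sketch of ranking-by-diffusion over multiple descriptors.
# NOT the authors' method: the fusion rule (averaging per-descriptor
# affinity matrices) and all parameters are assumptions for illustration.
import numpy as np

def affinity(features, sigma=0.5):
    """Gaussian affinity matrix from L2-normalized descriptors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    d2 = np.maximum(2.0 - 2.0 * f @ f.T, 0.0)            # squared Euclidean distance
    return np.exp(-d2 / (2.0 * sigma ** 2))

def diffuse(descriptor_sets, query_idx, alpha=0.9, iters=20):
    """Fuse several descriptors and diffuse similarities from a query."""
    W = np.mean([affinity(f) for f in descriptor_sets], axis=0)  # naive fusion
    np.fill_diagonal(W, 0.0)
    S = W / W.sum(axis=1, keepdims=True)                  # row-stochastic transition matrix
    y = np.zeros(W.shape[0]); y[query_idx] = 1.0          # one-hot query vector
    r = y.copy()
    for _ in range(iters):                                 # random walk with restart
        r = alpha * S.T @ r + (1 - alpha) * y
    return np.argsort(-r)                                  # indices ranked by diffused score

# Toy usage: two descriptor spaces over 6 images, query is image 0.
rng = np.random.default_rng(0)
ranked = diffuse([rng.normal(size=(6, 32)), rng.normal(size=(6, 16))], query_idx=0)
print(ranked)
```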

2020 ◽  
Vol 34 (10) ◽  
pp. 13714-13715
Author(s):  
Subhajit Chaudhury

Neural networks have contributed to tremendous progress in the domains of computer vision, speech processing, and other real-world applications. However, recent studies have shown that these state-of-the-art models can be easily compromised by adding small imperceptible perturbations. My thesis summary frames the problem of adversarial robustness as the equivalent problem of learning suitable features that lead to good generalization in neural networks. This is motivated by learning in humans, who are not trivially fooled by such perturbations because they learn robust features that generalize well out of sample.
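As a minimal illustration of the "small imperceptible perturbations" mentioned above, the sketch below applies an FGSM-style step to a linear classifier; the model, data, and step size are invented for illustration and are not part of the thesis being summarized.

```python
# FGSM-style sketch on a linear (logistic) classifier: move the input a small
# step in the direction that increases the loss. All values here are assumed.
import numpy as np

def fgsm(x, y, w, eps=0.1):
    """Perturb x in the direction that increases the logistic loss."""
    margin = y * (x @ w)
    grad_x = -y * (1.0 / (1.0 + np.exp(margin))) * w   # d(loss)/dx for logistic loss
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])            # a "trained" linear model (assumed)
x, y = np.array([0.3, -0.1, 0.8]), 1.0    # a correctly classified example
x_adv = fgsm(x, y, w, eps=0.3)
print("clean score:", x @ w, "adversarial score:", x_adv @ w)  # score drops / flips sign
```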


Author(s):  
Tong Man ◽  
Huawei Shen ◽  
Xiaolong Jin ◽  
Xueqi Cheng

Data sparsity is one of the most challenging problems for recommender systems. One promising solution to this problem is cross-domain recommendation, i.e., leveraging feedback or ratings from multiple domains to improve recommendation performance in a collective manner. In this paper, we propose an Embedding and Mapping framework for Cross-Domain Recommendation, called EMCDR. The proposed EMCDR framework distinguishes itself from existing cross-domain recommendation models in two aspects. First, a multi-layer perceptron is used to capture the nonlinear mapping function across domains, which offers high flexibility for learning domain-specific features of entities in each domain. Second, only the entities with sufficient data are used to learn the mapping function, guaranteeing its robustness to noise caused by data sparsity in a single domain. Extensive experiments on two cross-domain recommendation scenarios demonstrate that EMCDR significantly outperforms state-of-the-art cross-domain recommendation methods.
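A rough sketch of the embedding-and-mapping idea follows: given latent factors learned separately in each domain, an MLP is fit on overlapping entities to map source embeddings to target embeddings, and is then applied to cold-start entities. The factorization step is omitted, and the embedding sizes, network shape, and data are assumptions rather than the paper's settings.

```python
# Sketch of the embedding-and-mapping idea: fit an MLP on entities that appear
# in both domains, then map a source-only (cold-start) entity into the target space.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_overlap, d_src, d_tgt = 200, 16, 16
U_src = rng.normal(size=(n_overlap, d_src))              # source-domain embeddings (e.g., from MF)
W_true = rng.normal(size=(d_src, d_tgt))
U_tgt = np.tanh(U_src @ W_true) + 0.05 * rng.normal(size=(n_overlap, d_tgt))  # target embeddings

# Fit the cross-domain mapping only on entities with data in both domains.
mapper = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
mapper.fit(U_src, U_tgt)

# A cold-start entity known only in the source domain gets a predicted target embedding.
u_cold = rng.normal(size=(1, d_src))
u_cold_in_target = mapper.predict(u_cold)
print(u_cold_in_target.shape)   # (1, 16): ready for dot-product scoring against target items
```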


Author(s):  
Wenjie Wang ◽  
Yufeng Shi ◽  
Shiming Chen ◽  
Qinmu Peng ◽  
Feng Zheng ◽  
...  

Zero-shot sketch-based image retrieval (ZS-SBIR), which aims to retrieve photos with sketches under the zero-shot scenario, has shown great potential in real-world applications. Most existing methods leverage language models to generate class prototypes and use them to arrange the locations of all categories in the common space for photos and sketches. Although great progress has been made, few of these methods consider whether such pre-defined prototypes are necessary for ZS-SBIR, where the locations of unseen-class samples in the embedding space are actually determined by visual appearance, and a purely visual embedding performs better. To this end, we propose a novel Norm-guided Adaptive Visual Embedding (NAVE) model that adaptively builds the common space based on visual similarity instead of language-based pre-defined prototypes. To further enhance the representation quality of unseen classes for both the photo and sketch modalities, a modality norm discrepancy measure and a noisy-label regularizer are jointly employed to measure and repair the modality bias of the learned common embedding. Experiments on two challenging datasets demonstrate the superiority of NAVE over state-of-the-art competitors.
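One plausible form of a modality norm discrepancy penalty is sketched below: it measures the gap between the average embedding norms of a photo batch and a sketch batch, which could then be added to a training loss. This is an assumption about the general idea, not the exact NAVE formulation.

```python
# Illustrative "modality norm discrepancy" style penalty: push the average
# embedding norms of the two modalities toward each other. Assumed form only.
import numpy as np

def norm_discrepancy(photo_emb, sketch_emb):
    """Absolute gap between mean L2 norms of the two modalities' embeddings."""
    return abs(np.linalg.norm(photo_emb, axis=1).mean()
               - np.linalg.norm(sketch_emb, axis=1).mean())

rng = np.random.default_rng(0)
photos = 3.0 * rng.normal(size=(8, 128))     # photo features tend to have larger norms
sketches = rng.normal(size=(8, 128))
print("norm gap:", norm_discrepancy(photos, sketches))  # large gap -> add as a loss term
```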


Author(s):  
Xiabing Zhou ◽  
Xingxing Xing ◽  
Lei Han ◽  
Haikun Hong ◽  
Kaigui Bian ◽  
...  

Learning with incomplete data remains challenging in many real-world applications, especially when the data is high-dimensional and dynamic. Many imputation-based algorithms have been proposed to handle incomplete data; these algorithms use statistics of the historical information to remedy the missing parts. However, such methods rarely exploit the structural information present in the data, which is very helpful for sharing between the complete entries and the missing ones. For example, in traffic systems, group information and temporal smoothness exist in the data structure. In this paper, we propose to incorporate this structural information and develop a structural feature learning method for learning with incomplete data (SFLIC). The SFLIC model adopts a fused-Lasso-based regularizer and a group-Lasso-style regularizer to enlarge data sharing along both the temporal smoothness level and the feature group level, filling the gap where data entries are missing. The SFLIC objective is nonsmooth with respect to the model parameters, so we adopt the smoothing proximal gradient (SPG) method to obtain an efficient solution. We evaluate our model on both synthetic and real-world highway traffic datasets. Experimental results show that our method outperforms state-of-the-art methods.
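To make the two structural penalties concrete, the sketch below evaluates a fused Lasso (temporal smoothness) term and a group Lasso (feature group) term on a parameter matrix; how they are weighted and combined inside the actual SFLIC objective is not reproduced here.

```python
# Minimal sketch of the two structural penalties, evaluated on a parameter
# matrix W (features x time steps). Weighting and the full objective are assumed away.
import numpy as np

def fused_lasso_penalty(W):
    """Temporal smoothness: sum of |W[:, t] - W[:, t-1]| over consecutive time steps."""
    return np.abs(np.diff(W, axis=1)).sum()

def group_lasso_penalty(W, groups):
    """Group sparsity: sum of L2 norms over predefined feature groups (rows of W)."""
    return sum(np.linalg.norm(W[g, :]) for g in groups)

rng = np.random.default_rng(0)
W = rng.normal(size=(6, 10))                       # 6 features, 10 time steps
groups = [[0, 1, 2], [3, 4, 5]]                    # e.g., sensors grouped by road segment
print(fused_lasso_penalty(W), group_lasso_penalty(W, groups))
```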


2020 ◽  
Author(s):  
Lucas Pascotti Valem ◽  
Daniel Carlos Guimarães Pedronette

CBIR (content-based image retrieval) systems are one of the main solutions for image retrieval tasks. These systems are mainly supported by the use of different visual features and machine learning methods. As distinct features produce complementary ranking results with different effectiveness, a promising solution consists in combining them. However, deciding which visual features to combine is a very challenging task, especially when no training data is available. This work proposes three novel methods for selecting and combining ranked lists by estimating their effectiveness in an unsupervised way. The approaches were evaluated on five different image collections with several descriptors, achieving results comparable or superior to the state of the art in most scenarios.
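The sketch below illustrates the general idea of unsupervised selection and fusion of ranked lists: each list receives an effectiveness weight estimated from the density of its top results (a simple stand-in for the paper's estimation measures), and the lists are fused with a weighted Borda count. Both the estimator and the fusion rule are assumptions for illustration.

```python
# Unsupervised rank-list fusion sketch: weight each ranked list by a simple
# top-k density score, then combine with weighted Borda count. Assumed scheme.
import numpy as np

def effectiveness_estimate(sims, ranked, k=3):
    """Mean similarity of each query's top-k results (higher = tighter ranking)."""
    return np.mean([sims[q, ranked[q, 1:k + 1]].mean() for q in range(len(ranked))])

def weighted_borda(ranked_lists, weights):
    """Fuse ranked lists: position i contributes (n - i) * list weight."""
    n = ranked_lists[0].shape[1]
    scores = np.zeros_like(ranked_lists[0], dtype=float)
    for ranking, w in zip(ranked_lists, weights):
        for q in range(ranking.shape[0]):
            scores[q, ranking[q]] += w * np.arange(n, 0, -1)
    return np.argsort(-scores, axis=1)

rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, 32)), rng.normal(size=(5, 16))]      # two descriptors, 5 images
sims = [f @ f.T for f in feats]
ranked = [np.argsort(-s, axis=1) for s in sims]
weights = [max(effectiveness_estimate(s, r), 1e-6) for s, r in zip(sims, ranked)]
print(weighted_borda(ranked, weights)[:, :3])                     # fused top-3 per query
```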


Author(s):  
Xiao Song ◽  
Guorun Yang ◽  
Xinge Zhu ◽  
Hui Zhou ◽  
Yuexin Ma ◽  
...  

Recently, records on stereo matching benchmarks have been broken repeatedly by end-to-end disparity networks. However, the domain adaptation ability of these deep models is quite limited. To address this problem, we present a novel domain-adaptive approach called AdaStereo that aims to align multi-level representations for deep stereo matching networks. Compared to previous methods, AdaStereo realizes a more standard, complete, and effective domain adaptation pipeline. Firstly, we propose a non-adversarial progressive color transfer algorithm for input image-level alignment. Secondly, we design an efficient parameter-free cost normalization layer for internal feature-level alignment. Lastly, a highly related auxiliary task, self-supervised occlusion-aware reconstruction, is presented to narrow the gaps in the output space. We perform intensive ablation studies and break-down comparisons to validate the effectiveness of each proposed module. With no extra inference overhead and only a slight increase in training complexity, our AdaStereo models achieve state-of-the-art cross-domain performance on multiple benchmarks, including KITTI, Middlebury, ETH3D, and DrivingStereo, even outperforming some state-of-the-art disparity networks fine-tuned with target-domain ground truths. Moreover, based on two additional evaluation metrics, the superiority of our domain-adaptive stereo matching pipeline is further demonstrated from more perspectives. Finally, we show that our method is robust to various domain adaptation settings and can be easily integrated into quick adaptation scenarios and real-world deployments.
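As an illustration of parameter-free, feature-level normalization in the spirit of the cost normalization layer described above, the sketch below L2-normalizes each location's channel vector before correlation so that matching costs are bounded; the exact AdaStereo operator may differ.

```python
# Sketch of a parameter-free normalization before cost computation: L2-normalize
# channel vectors in both feature maps, then correlate per disparity. Assumed form.
import numpy as np

def channel_l2_normalize(fmap, eps=1e-8):
    """fmap: (C, H, W) feature map; normalize each (H, W) location over channels."""
    norms = np.sqrt((fmap ** 2).sum(axis=0, keepdims=True)) + eps
    return fmap / norms

def correlation_cost(left, right, max_disp=4):
    """Cosine-like matching cost for each candidate disparity after normalization."""
    left, right = channel_l2_normalize(left), channel_l2_normalize(right)
    C, H, W = left.shape
    cost = np.zeros((max_disp, H, W))
    for d in range(max_disp):
        cost[d, :, d:] = (left[:, :, d:] * right[:, :, :W - d]).sum(axis=0)
    return cost

rng = np.random.default_rng(0)
cost = correlation_cost(rng.normal(size=(16, 8, 8)), rng.normal(size=(16, 8, 8)))
print(cost.shape, cost.min(), cost.max())   # cost values stay bounded in [-1, 1]
```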


2021 ◽  
Vol 13 (5) ◽  
pp. 869
Author(s):  
Zheng Zhuo ◽  
Zhong Zhou

In recent years, the amount of remote sensing imagery data has increased exponentially. The ability to quickly and effectively find the required images in massive remote sensing archives is key to the organization, management, and sharing of remote sensing image information. This paper proposes a high-resolution remote sensing image retrieval method based on a Gabor-CA-ResNet network and a split-based deep feature transform network. The main contributions are twofold. (1) To handle the complex textures, diverse scales, and special viewing angles of remote sensing images, a Gabor-CA-ResNet network is proposed that takes ResNet as the backbone, uses Gabor filters to represent the spatial-frequency structure of images, and applies a channel attention (CA) mechanism to obtain more representative and discriminative deep features. (2) A split-based deep feature transform network is designed to divide the features extracted by the Gabor-CA-ResNet network into several segments and transform them separately, significantly reducing the dimensionality and storage space of the deep features. The experimental results on the UCM, WHU-RS, RSSCN7, and AID datasets show that, compared with state-of-the-art methods, our method achieves competitive performance, especially for remote sensing images with rare targets and complex textures.
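The split-based transform can be pictured as follows: the deep feature is cut into segments and each segment is reduced independently before concatenation. In this sketch, PCA stands in for the learned per-segment transform, and the feature size, segment count, and output dimensions are assumptions.

```python
# Split-based dimensionality reduction sketch: split the deep feature, reduce each
# segment separately (PCA here as a stand-in for the learned transform), concatenate.
import numpy as np
from sklearn.decomposition import PCA

def split_transform(features, n_segments=4, out_dim_per_segment=8):
    """features: (N, D). Returns (N, n_segments * out_dim_per_segment)."""
    segments = np.array_split(features, n_segments, axis=1)
    reduced = [PCA(n_components=out_dim_per_segment).fit_transform(s) for s in segments]
    return np.concatenate(reduced, axis=1)

rng = np.random.default_rng(0)
deep_feats = rng.normal(size=(100, 2048))      # e.g., backbone outputs (assumed size)
compact = split_transform(deep_feats)
print(compact.shape)                            # (100, 32): far smaller storage per image
```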


2021 ◽  
Vol 54 (6) ◽  
pp. 1-35
Author(s):  
Ninareh Mehrabi ◽  
Fred Morstatter ◽  
Nripsuta Saxena ◽  
Kristina Lerman ◽  
Aram Galstyan

With the widespread use of artificial intelligence (AI) systems and applications in our everyday lives, accounting for fairness has gained significant importance in the design and engineering of such systems. AI systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that these decisions do not reflect discriminatory behavior toward certain groups or populations. More recently, work in traditional machine learning and deep learning has begun to address such challenges in different subdomains. With the commercialization of these systems, researchers are becoming more aware of the biases that these applications can contain and are attempting to address them. In this survey, we investigated different real-world applications that have shown biases in various ways, and we listed different sources of bias that can affect AI applications. We then created a taxonomy of the fairness definitions that machine learning researchers have proposed to avoid the existing bias in AI systems. In addition, we examined different domains and subdomains in AI, showing what researchers have observed with regard to unfair outcomes in state-of-the-art methods and how they have tried to address them. Many future directions and solutions remain open for mitigating the problem of bias in AI systems. We hope that this survey will motivate researchers to tackle these issues in the near future by building on existing work in their respective fields.


2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Automatic detection of negated content is often a prerequisite in information extraction systems in various domains. In the biomedical domain especially, this task is critical because negation plays an important role there. In this work, two main contributions are proposed. First, we work with languages which have been poorly addressed up to now: Brazilian Portuguese and French. We developed new corpora for these two languages which have been manually annotated to mark up negation cues and their scope. Second, we propose automatic methods based on supervised machine learning for detecting negation cues and their scopes. The methods prove to be robust in both languages (Brazilian Portuguese and French) and in cross-domain (general and biomedical language) contexts. The approach is also validated on English data from the state of the art: it yields very good results and outperforms other existing approaches. Moreover, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an application accessible online, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.
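A toy sketch of the supervised setup, with negation detection cast as token-level tagging using simple lexical features and a linear classifier, is given below; the actual systems rely on the annotated corpora and richer models described above, and all data here is invented.

```python
# Toy negation cue/scope tagging as token classification with lexical features.
# Invented data and a deliberately tiny feature set, for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i):
    return {"word": tokens[i].lower(),
            "prev": tokens[i - 1].lower() if i else "<s>",
            "is_cue": tokens[i].lower() in {"no", "not", "without", "denies"}}

train = [(["patient", "denies", "chest", "pain"], ["O", "CUE", "I", "I"]),
         (["no", "fever", "was", "observed"],     ["CUE", "I", "O", "O"])]

X, y = [], []
for tokens, labels in train:
    X += [token_features(tokens, i) for i in range(len(tokens))]
    y += labels

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = ["patient", "reports", "no", "cough"]
print(clf.predict(vec.transform([token_features(test, i) for i in range(len(test))])))
```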


2018 ◽  
Vol 10 (8) ◽  
pp. 1243 ◽  
Author(s):  
Xu Tang ◽  
Xiangrong Zhang ◽  
Fang Liu ◽  
Licheng Jiao

Due to the specific characteristics and complicated contents of remote sensing (RS) images, remote sensing image retrieval (RSIR) remains an open and challenging research topic in the RS community. There are two basic building blocks in RSIR: feature learning and similarity matching. In this paper, we focus on developing an effective feature learning method for RSIR. With the help of deep learning techniques, the proposed feature learning method is designed under the bag-of-words (BOW) paradigm; we therefore name the obtained feature deep BOW (DBOW). The learning process consists of two parts: image descriptor learning and feature construction. First, to explore the complex contents within an RS image, we extract image descriptors at the image patch level rather than for the whole image. In addition, instead of using handcrafted features to describe the patches, we propose a deep convolutional auto-encoder (DCAE) model to learn discriminative descriptors for the RS image. Second, the k-means algorithm is used to generate a codebook from the obtained deep descriptors. The final histogram-based DBOW features are then acquired by counting the frequency of each code word. Once the DBOW features are obtained, the similarities between RS images are measured using the L1-norm distance, and the retrieval results are produced according to the similarity order. Encouraging experimental results on four public RS image archives demonstrate that our DBOW feature is effective for the RSIR task. Compared with existing RS image features, DBOW achieves improved performance on RSIR.
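The BOW retrieval stage can be sketched as follows: patch descriptors are clustered with k-means into a codebook, each image is represented by a histogram of code-word assignments, and images are ranked by L1 distance. Random patch features replace the DCAE descriptors here purely for illustration.

```python
# BOW retrieval sketch: k-means codebook over patch descriptors, per-image
# code-word histograms, L1-distance ranking. Random features stand in for DCAE.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_images, patches_per_image, desc_dim, n_words = 20, 50, 64, 16
patch_descs = rng.normal(size=(n_images, patches_per_image, desc_dim))

codebook = KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(
    patch_descs.reshape(-1, desc_dim))

def dbow_histogram(image_patches):
    """Normalized histogram of code-word assignments for one image's patches."""
    words = codebook.predict(image_patches)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()

db = np.stack([dbow_histogram(patch_descs[i]) for i in range(n_images)])
query = dbow_histogram(patch_descs[0])
l1 = np.abs(db - query).sum(axis=1)            # L1-norm distance to every database image
print(np.argsort(l1)[:5])                       # top-5 most similar images (query itself first)
```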

