Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval

Multi-modal retrieval is a challenge due to heterogeneous gap and a complex semantic relationship between different modal data. Typical research map different modalities into a common subspace with a one-to-one correspondence or similarity/dissimilarity relationship of inter-modal data, in which the distances of heterogeneous data can be compared directly; thus, inter-modal retrieval can be achieved by the nearest neighboring search. However, most of them ignore intra-modal relations and complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation to deal with the retrieval tasks between image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate semantic relationship with multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations constructed on multi-scale semantic similarity are incorporated to train the deep model in an end-to-end way. Experiments validate the effectiveness of our proposed method on multi-modal retrieval tasks, and our method outperforms state-of-the-art methods on NUS-WIDE, MIR Flickr, and Wikipedia datasets.

Download Full-text

Non-Local and Multi-Scale Mechanisms for Image Inpainting

Sensors ◽

10.3390/s21093281 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3281

Author(s):

Xu He ◽

Yong Yin

Keyword(s):

Markov Random Fields ◽

Receptive Fields ◽

Image Inpainting ◽

Long Distance ◽

Visual Appearance ◽

Fine Grained ◽

Multi Scale ◽

Non Local ◽

Relationship Of

Recently, deep learning-based techniques have shown great power in image inpainting especially dealing with squared holes. However, they fail to generate plausible results inside the missing regions for irregular and large holes as there is a lack of understanding between missing regions and existing counterparts. To overcome this limitation, we combine two non-local mechanisms including a contextual attention module (CAM) and an implicit diversified Markov random fields (ID-MRF) loss with a multi-scale architecture which uses several dense fusion blocks (DFB) based on the dense combination of dilated convolution to guide the generative network to restore discontinuous and continuous large masked areas. To prevent color discrepancies and grid-like artifacts, we apply the ID-MRF loss to improve the visual appearance by comparing similarities of long-distance feature patches. To further capture the long-term relationship of different regions in large missing regions, we introduce the CAM. Although CAM has the ability to create plausible results via reconstructing refined features, it depends on initial predicted results. Hence, we employ the DFB to obtain larger and more effective receptive fields, which benefits to predict more precise and fine-grained information for CAM. Extensive experiments on two widely-used datasets demonstrate that our proposed framework significantly outperforms the state-of-the-art approaches both in quantity and quality.

Download Full-text

Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001420500342 ◽

2020 ◽

Vol 34 (13) ◽

pp. 2050034

Author(s):

Ali Salim Rasheed ◽

Davood Zabihzadeh ◽

Sumia Abdulhussien Razooqi Al-Obaidi

Keyword(s):

Information Retrieval ◽

Image Classification ◽

Large Scale ◽

Learning Algorithm ◽

State Of The Art ◽

Metric Learning ◽

Random Projection ◽

Linear Projection ◽

Modal Data ◽

Related Data

Metric learning algorithms aim to make the conceptually related data items closer and keep dissimilar ones at a distance. The most common approach for metric learning on the Mahalanobis method. Despite its success, this method is limited to find a linear projection and also suffer from scalability respecting both the dimensionality and the size of input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for any feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm which results in a higher convergence rate compared to the state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted in this paper. The present method is evaluated on some challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that the proposed method significantly surpasses other state-of-the-art metric learning methods in most of these datasets in terms of both accuracy and efficiency.

Download Full-text

Behavioral Genetics: Concepts for Research and Practice in Language Development and Disorders

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3805.1126 ◽

1995 ◽

Vol 38 (5) ◽

pp. 1126-1142 ◽

Cited By ~ 14

Author(s):

Jeffrey W. Gilger

Keyword(s):

Language Development ◽

Behavioral Genetics ◽

State Of The Art ◽

Genetic Research ◽

Great Promise ◽

Behavioral Genetic ◽

Fine Grained ◽

Future Goals ◽

Current State ◽

Research Designs

This paper is an introduction to behavioral genetics for researchers and practioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done—including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).

Download Full-text

A multi-scale and multi-physics simulation methodology with the state-of-the-art tools for safety analysis in light water reactors applied to a turbine trip scenario (PART I)

Nuclear Engineering and Design ◽

10.1016/j.nucengdes.2019.05.008 ◽

2019 ◽

Vol 350 ◽

pp. 195-204 ◽

Cited By ~ 1

Author(s):

Patricio Hidalga ◽

Agustín Abarca ◽

Rafael Miró ◽

Abdelkrim Sekrhi ◽

Gumersindo Verdú

Keyword(s):

State Of The Art ◽

Safety Analysis ◽

The State ◽

Light Water Reactors ◽

Light Water ◽

Simulation Methodology ◽

Multi Scale ◽

Physics Simulation ◽

Water Reactors

Download Full-text

Domain randomization-enhanced deep learning models for bird detection

Scientific Reports ◽

10.1038/s41598-020-80101-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xin Mao ◽

Jun Kang Chow ◽

Pin Siang Tan ◽

Kuan-fu Liu ◽

Jimmy Wu ◽

...

Keyword(s):

Deep Learning ◽

Continuous Monitoring ◽

Bird Species ◽

Training Data ◽

Learning Models ◽

Fine Grained ◽

Bird Detection ◽

Relationship Of ◽

The Relationship

AbstractAutomatic bird detection in ornithological analyses is limited by the accuracy of existing models, due to the lack of training data and the difficulties in extracting the fine-grained features required to distinguish bird species. Here we apply the domain randomization strategy to enhance the accuracy of the deep learning models in bird detection. Trained with virtual birds of sufficient variations in different environments, the model tends to focus on the fine-grained features of birds and achieves higher accuracies. Based on the 100 terabytes of 2-month continuous monitoring data of egrets, our results cover the findings using conventional manual observations, e.g., vertical stratification of egrets according to body size, and also open up opportunities of long-term bird surveys requiring intensive monitoring that is impractical using conventional methods, e.g., the weather influences on egrets, and the relationship of the migration schedules between the great egrets and little egrets.

Download Full-text

Ensemble-Based Out-of-Distribution Detection

Electronics ◽

10.3390/electronics10050567 ◽

2021 ◽

Vol 10 (5) ◽

pp. 567

Author(s):

Donghun Yang ◽

Kien Mai Mai Ngoc ◽

Iksoo Shin ◽

Kyong-Ha Lee ◽

Myunggwon Hwang

Keyword(s):

Detection Method ◽

State Of The Art ◽

Metric Learning ◽

Feature Space ◽

Confidence Score ◽

Distance Metric Learning ◽

Current State ◽

Overall Performance ◽

Deep Learning Model

To design an efficient deep learning model that can be used in the real-world, it is important to detect out-of-distribution (OOD) data well. Various studies have been conducted to solve the OOD problem. The current state-of-the-art approach uses a confidence score based on the Mahalanobis distance in a feature space. Although it outperformed the previous approaches, the results were sensitive to the quality of the trained model and the dataset complexity. Herein, we propose a novel OOD detection method that can train more efficient feature space for OOD detection. The proposed method uses an ensemble of the features trained using the softmax-based classifier and the network based on distance metric learning (DML). Through the complementary interaction of these two networks, the trained feature space has a more clumped distribution and can fit well on the Gaussian distribution by class. Therefore, OOD data can be efficiently detected by setting a threshold in the trained feature space. To evaluate the proposed method, we applied our method to various combinations of image datasets. The results show that the overall performance of the proposed approach is superior to those of other methods, including the state-of-the-art approach, on any combination of datasets.

Download Full-text

PCAN—Part-Based Context Attention Network for Thermal Power Plant Detection in Remote Sensing Imagery

Remote Sensing ◽

10.3390/rs13071243 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1243

Author(s):

Wenxin Yin ◽

Wenhui Diao ◽

Peijin Wang ◽

Xin Gao ◽

Ya Li ◽

...

Keyword(s):

Remote Sensing ◽

Power Plants ◽

State Of The Art ◽

Thermal Power ◽

Image Interpretation ◽

Remote Sensing Image ◽

Thermal Power Plants ◽

Average Precision ◽

Deep Convolutional Neural Networks ◽

Multi Scale

The detection of Thermal Power Plants (TPPs) is a meaningful task for remote sensing image interpretation. It is a challenging task, because as facility objects TPPs are composed of various distinctive and irregular components. In this paper, we propose a novel end-to-end detection framework for TPPs based on deep convolutional neural networks. Specifically, based on the RetinaNet one-stage detector, a context attention multi-scale feature extraction network is proposed to fuse global spatial attention to strengthen the ability in representing irregular objects. In addition, we design a part-based attention module to adapt to TPPs containing distinctive components. Experiments show that the proposed method outperforms the state-of-the-art methods and can achieve 68.15% mean average precision.

Download Full-text

Semantic relationship of contents using tensor factorization for self-growth social broadcasting

2016 International Conference on Information Networking (ICOIN) ◽

10.1109/icoin.2016.7427162 ◽

2016 ◽

Author(s):

Svetlana Kim ◽

Yong-Ik Yoon

Keyword(s):

Tensor Factorization ◽

Semantic Relationship ◽

Relationship Of

Download Full-text

Metric Learning Based Fine-Grained Classification for PolSAR Imagery

IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss39084.2020.9323087 ◽

2020 ◽

Author(s):

Jun Ni ◽

Yunzhe Jia ◽

Qiang Yin ◽

Yongsheng Zhou ◽

Fan Zhang

Keyword(s):

Metric Learning ◽

Fine Grained

Download Full-text

Representation Learning for Fine-Grained Change Detection

Sensors ◽

10.3390/s21134486 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4486

Author(s):

Niall O’Mahony ◽

Sean Campbell ◽

Lenka Krpalkova ◽

Anderson Carvalho ◽

Joseph Walsh ◽

...

Keyword(s):

Deep Learning ◽

Change Detection ◽

Model Calibration ◽

State Of The Art ◽

Representation Learning ◽

Machine Intelligence ◽

The State ◽

Sensor Data ◽

Fine Grained ◽

Learning Techniques

Fine-grained change detection in sensor data is very challenging for artificial intelligence though it is critically important in practice. It is the process of identifying differences in the state of an object or phenomenon where the differences are class-specific and are difficult to generalise. As a result, many recent technologies that leverage big data and deep learning struggle with this task. This review focuses on the state-of-the-art methods, applications, and challenges of representation learning for fine-grained change detection. Our research focuses on methods of harnessing the latent metric space of representation learning techniques as an interim output for hybrid human-machine intelligence. We review methods for transforming and projecting embedding space such that significant changes can be communicated more effectively and a more comprehensive interpretation of underlying relationships in sensor data is facilitated. We conduct this research in our work towards developing a method for aligning the axes of latent embedding space with meaningful real-world metrics so that the reasoning behind the detection of change in relation to past observations may be revealed and adjusted. This is an important topic in many fields concerned with producing more meaningful and explainable outputs from deep learning and also for providing means for knowledge injection and model calibration in order to maintain user confidence.

Download Full-text