Large-Scale Multi-modal Distance Metric Learning with Application to Content-Based Information Retrieval and Image Classification

Author(s):  
Ali Salim Rasheed ◽  
Davood Zabihzadeh ◽  
Sumia Abdulhussien Razooqi Al-Obaidi

Metric learning algorithms aim to bring conceptually related data items closer together while keeping dissimilar ones apart. The most common approach to metric learning is the Mahalanobis method. Despite its success, this method is limited to learning a linear projection and also suffers from poor scalability with respect to both the dimensionality and the size of the input data. To address these problems, this paper presents a new scalable metric learning algorithm for multi-modal data. Our method learns an optimal metric for each feature set of the multi-modal data in an online fashion. We also combine the learned metrics with a novel Passive/Aggressive (PA)-based algorithm, which yields a higher convergence rate than state-of-the-art methods. To address scalability with respect to dimensionality, Dual Random Projection (DRP) is adopted. The proposed method is evaluated on several challenging machine vision datasets for image classification and Content-Based Information Retrieval (CBIR) tasks. The experimental results confirm that it significantly surpasses other state-of-the-art metric learning methods on most of these datasets in terms of both accuracy and efficiency.
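
As a rough illustration of the PA-based online update idea described above, the following sketch performs one Passive/Aggressive step on a Mahalanobis matrix for a labeled pair; the unit margin, the step-size cap, and the PSD projection are illustrative assumptions, not the paper's exact rule.

import numpy as np

def pa_metric_update(M, x1, x2, y, C=1.0):
    """One Passive/Aggressive-style update of a Mahalanobis matrix M.

    y = +1 for a similar pair, -1 for a dissimilar pair.
    Illustrative sketch only, not the authors' exact update rule.
    """
    z = x1 - x2
    d = z @ M @ z                                     # squared Mahalanobis distance
    loss = max(0.0, y * (d - 1.0) + 1.0)              # hinge loss around a unit margin
    if loss == 0.0:
        return M                                      # passive step: constraint satisfied
    tau = min(C, loss / (np.dot(z, z) ** 2 + 1e-12))  # PA-I step size
    M = M - tau * y * np.outer(z, z)                  # aggressive step along the (sub)gradient
    # project back onto the PSD cone so M stays a valid metric
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T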

2021 ◽  
Vol 15 (3) ◽  
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, and it has become prevalent and important in the field of network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This results in significant limitations for applications of SBMs to large-scale networks, because of the substantial computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of the SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with Poisson distribution and a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation on both artificial and real-world data shows that our proposed method significantly outperforms state-of-the-art methods in terms of a reasonable trade-off between accuracy and scalability.
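
For intuition on what a Poisson-based SBM optimises, here is a minimal sketch of its log-likelihood given a block assignment; the paper's block-wise learning algorithm optimises an equivalent objective far more efficiently, and the formulation below is an illustrative assumption rather than the authors' code.

import numpy as np

def poisson_sbm_loglik(A, z, K):
    """Log-likelihood of a Poisson SBM: A[i, j] ~ Poisson(lam[z[i], z[j]]).

    A is a (possibly weighted) adjacency matrix, z the block assignment
    vector, K the number of blocks. Illustrative sketch only.
    """
    Z = np.eye(K)[z]                        # n x K one-hot block assignments
    edges = Z.T @ A @ Z                     # observed edge mass between blocks
    pairs = np.outer(Z.sum(0), Z.sum(0))    # number of node pairs between blocks
    lam = edges / np.maximum(pairs, 1)      # maximum-likelihood block rates
    # Poisson log-likelihood up to a constant (ignoring the log A_ij! terms)
    return np.sum(edges * np.log(np.maximum(lam, 1e-12)) - pairs * lam)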


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4975
Author(s):  
Fangyu Shi ◽  
Zhaodi Wang ◽  
Menghan Hu ◽  
Guangtao Zhai

Relying on large-scale labeled datasets, deep learning has achieved good performance in image classification tasks. In agricultural and biological engineering, however, image annotation is time-consuming and expensive, and it requires annotators with technical skills in specific areas, which makes obtaining ground truth difficult and costly. In addition, images in these areas are usually stored as multichannel images, such as computed tomography (CT) images, magnetic resonance images (MRI), and hyperspectral images (HSI). In this paper, we present a framework combining active learning and deep learning for multichannel image classification. We use three active learning algorithms, namely least confidence, margin sampling, and entropy, as the selection criteria. Based on this framework, we further introduce an "image pool" to take full advantage of the images generated by data augmentation. To demonstrate the applicability of the proposed framework, we present a case study on agricultural hyperspectral image classification. The results show that the proposed framework achieves better performance than the plain deep learning model. Manual annotation of the entire training set achieves an encouraging accuracy; in comparison, the entropy-based active learning criterion together with the image pool achieves similar accuracy with only part of the training set manually annotated. In practical applications, the proposed framework can remarkably reduce the labeling effort during model development and updating, and can be applied to multichannel image classification in agricultural and biological engineering.
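
The three selection criteria named above have standard formulations; the sketch below computes each of them from softmax outputs, with the function name and interface being our own illustrative choices rather than the paper's code.

import numpy as np

def acquisition_scores(probs, strategy="entropy"):
    """Rank unlabeled samples by predictive uncertainty.

    probs: (n_samples, n_classes) softmax outputs of the current model.
    Returns one score per sample; higher means "query this sample next".
    """
    if strategy == "least_confidence":
        return 1.0 - probs.max(axis=1)                 # low top-class confidence
    if strategy == "margin":
        top2 = np.sort(probs, axis=1)[:, -2:]
        return -(top2[:, 1] - top2[:, 0])              # small gap between top two classes
    if strategy == "entropy":
        return -(probs * np.log(probs + 1e-12)).sum(axis=1)
    raise ValueError(f"unknown strategy: {strategy}")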


2013 ◽  
Vol 21 (1) ◽  
pp. 3-47 ◽  
Author(s):  
IDAN SZPEKTOR ◽  
HRISTO TANEV ◽  
IDO DAGAN ◽  
BONAVENTURA COPPOLA ◽  
MILEN KOUYLEKOV

Entailment recognition is a primary generic task in natural language inference, whose focus is to detect whether the meaning of one expression can be inferred from the meaning of another. Accordingly, many NLP applications would benefit from high-coverage knowledgebases of paraphrases and entailment rules. To this end, learning such knowledgebases from the Web is especially appealing due to its huge size as well as its highly heterogeneous content, allowing for more scalable rule extraction across various domains. However, the scalability of state-of-the-art entailment rule acquisition approaches from the Web is still limited. We present a fully unsupervised learning algorithm for Web-based extraction of entailment relations. We focus on increased scalability and generality with respect to prior work, with the potential of building a large-scale Web-based knowledgebase. Our algorithm takes as its input a lexical–syntactic template and searches the Web for syntactic templates that participate in an entailment relation with the input template. Experiments show promising results, achieving performance similar to a state-of-the-art unsupervised algorithm operating over an offline corpus, but with the benefit of learning rules for different domains with no additional effort.
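
A rough sketch of the anchor-based harvesting idea described above might look as follows; web_search, extract_pairs, and extract_paths are hypothetical placeholders standing in for a Web search API, an argument extractor, and a dependency-path extractor, and the frequency threshold is an arbitrary illustrative choice.

from collections import Counter

def harvest_candidate_templates(input_template, web_search, extract_pairs, extract_paths):
    """Toy sketch of Web-based entailment-rule harvesting for a template
    such as "X acquire Y". Not the paper's actual algorithm or code.
    """
    # 1. find anchor pairs (X, Y) that instantiate the input template on the Web
    anchors = set()
    for sentence in web_search(input_template):
        anchors.update(extract_pairs(sentence, input_template))
    # 2. look for other syntactic templates connecting the same anchor pairs
    counts = Counter()
    for x, y in anchors:
        for sentence in web_search(f'"{x}" "{y}"'):
            counts.update(extract_paths(sentence, x, y))
    # 3. templates that co-occur with many anchor pairs become candidate rules
    return [template for template, c in counts.most_common() if c >= 3]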


Author(s):  
Zhizheng Zhang ◽  
Cuiling Lan ◽  
Wenjun Zeng ◽  
Zhibo Chen ◽  
Shih-Fu Chang

Few-shot image classification learns to recognize new categories from limited labelled data. Metric learning based approaches have been widely investigated, where a query sample is classified by finding the nearest prototype in the support set based on feature similarities. A neural network has different degrees of uncertainty in the similarities it computes for different pairs. Understanding and modeling this uncertainty on the similarity could promote the exploitation of limited samples in few-shot optimization. In this work, we propose an Uncertainty-Aware Few-Shot framework for image classification that models the uncertainty of the similarities of query-support pairs and performs uncertainty-aware optimization. In particular, we exploit this uncertainty by converting observed similarities into probabilistic representations and incorporating them into the loss for more effective optimization. To jointly consider the similarities between a query and the prototypes in a support set, a graph-based model is utilized to estimate the uncertainty of the pairs. Extensive experiments show that our proposed method brings significant improvements on top of a strong baseline and achieves state-of-the-art performance.
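
One plausible (though not necessarily the authors') way to turn similarity uncertainty into an uncertainty-aware loss is to predict a log-variance for each query-prototype pair and down-weight uncertain pairs, as in the hedged sketch below.

import torch
import torch.nn.functional as F

def uncertainty_weighted_loss(similarities, log_vars, targets):
    """Sketch of an uncertainty-aware few-shot classification loss.

    similarities: (n_query, n_prototypes) query-prototype similarities.
    log_vars:     (n_query, n_prototypes) predicted log-variance of each
                  similarity (e.g. produced by a graph module over the episode).
    targets:      (n_query,) index of the correct prototype per query.
    Mirrors the idea in the abstract, not the authors' exact formulation.
    """
    precision = torch.exp(-log_vars)        # confidence of each pair
    logits = precision * similarities       # uncertain pairs contribute less
    ce = F.cross_entropy(logits, targets)
    return ce + 0.5 * log_vars.mean()       # discourage claiming high uncertainty everywhere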


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 466 ◽  
Author(s):  
Yan Hua ◽  
Yingyun Yang ◽  
Jianhe Du

Multi-modal retrieval is challenging due to the heterogeneous gap and the complex semantic relationships between data of different modalities. Typical approaches map different modalities into a common subspace using a one-to-one correspondence or a similarity/dissimilarity relationship between inter-modal data, in which the distances between heterogeneous data can be compared directly; inter-modal retrieval can then be achieved by nearest neighbor search. However, most of them ignore intra-modal relations and the complicated semantics between multi-modal data. In this paper, we propose a deep multi-modal metric learning method with multi-scale semantic correlation for retrieval tasks between the image and text modalities. A deep model with two branches is designed to nonlinearly map raw heterogeneous data into comparable representations. In contrast to binary similarity, we formulate the semantic relationship as a multi-scale similarity to learn fine-grained multi-modal distances. Inter-modal and intra-modal correlations constructed on the multi-scale semantic similarity are incorporated to train the deep model in an end-to-end manner. Experiments validate the effectiveness of the proposed method on multi-modal retrieval tasks, and it outperforms state-of-the-art methods on the NUS-WIDE, MIR Flickr, and Wikipedia datasets.
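
To make the two-branch, multi-scale idea concrete, the following sketch maps image and text features into a shared space and regresses their cosine similarities onto a graded (label-overlap) target instead of a binary one; the layer sizes, the Jaccard-style target, and the squared-error loss are illustrative assumptions rather than the paper's exact design.

import torch
import torch.nn as nn

class TwoBranchEmbedding(nn.Module):
    """Minimal two-branch model for image-text metric learning (sketch only)."""
    def __init__(self, img_dim=4096, txt_dim=300, emb_dim=256):
        super().__init__()
        self.img_net = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU(), nn.Linear(1024, emb_dim))
        self.txt_net = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU(), nn.Linear(1024, emb_dim))

    def forward(self, img_feat, txt_feat):
        return self.img_net(img_feat), self.txt_net(txt_feat)

def multiscale_similarity(labels_a, labels_b):
    """Graded semantic similarity between two multi-label sets,
    approximated here by label overlap (Jaccard) instead of a 0/1 value."""
    inter = (labels_a.unsqueeze(1) * labels_b.unsqueeze(0)).sum(-1)
    union = ((labels_a.unsqueeze(1) + labels_b.unsqueeze(0)) > 0).float().sum(-1)
    return inter / union.clamp(min=1)

def inter_modal_loss(img_emb, txt_emb, target_sim):
    """Regress cosine similarities of the embeddings onto the graded targets."""
    img_n = nn.functional.normalize(img_emb, dim=1)
    txt_n = nn.functional.normalize(txt_emb, dim=1)
    return ((img_n @ txt_n.t() - target_sim) ** 2).mean()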


1999 ◽  
Vol 09 (06) ◽  
pp. 1041-1074 ◽  
Author(s):  
TAO YANG ◽  
LEON O. CHUA

In a programmable (multistage) cellular neural network (CNN) structure, the CPU is a CNN universal chip that supports massively parallel computation on patterns and images, including videos. In this paper, we decompose the structure of a class of simultaneous recurrent networks (SRN) into a CNN program and run it on a von Neumann-like stored-program CNN structure. To train the SRN, we map the back-propagation-through-time (BTT) learning algorithm into a sequence of CNN subroutines to achieve real-time performance via a CNN universal chip. By computing in parallel, the CNN universal chip can be programmed to implement in real time the BTT learning algorithm, which has a very high time complexity. An estimate of the time complexity of the BTT learning algorithm based on the CNN universal chip is presented. For small-scale problems, our simulation results show that a CNN implementation of the BTT learning algorithm for a two-dimensional SRN is at least 10,000 times faster than one based on state-of-the-art sequential workstations. For the few large-scale problems we have simulated so far, the CNN-implemented BTT learning algorithm maintained virtually the same time complexity, with a learning time of a few seconds, while implementations on state-of-the-art sequential workstations dramatically increased their running time, often requiring several days. Several examples are presented to demonstrate how efficiently a CNN universal chip can speed up the learning algorithm for both off-line and on-line applications.
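
For reference, the sequential algorithm being parallelised is ordinary back-propagation-through-time; the sketch below shows one training step for a tiny recurrent network with tanh units and squared error, purely as a generic illustration of BTT rather than the SRN or its CNN-chip mapping.

import numpy as np

def btt_step(x_seq, targets, W_in, W_rec, W_out, lr=0.01):
    """One back-propagation-through-time step for a small recurrent net.
    Shapes, nonlinearity, and loss are illustrative assumptions.
    """
    T = len(x_seq)
    h = [np.zeros(W_rec.shape[0])]
    # forward pass through time
    for t in range(T):
        h.append(np.tanh(W_in @ x_seq[t] + W_rec @ h[-1]))
    y = [W_out @ h[t + 1] for t in range(T)]
    # backward pass through time, accumulating gradients
    dW_in = np.zeros_like(W_in)
    dW_rec = np.zeros_like(W_rec)
    dW_out = np.zeros_like(W_out)
    dh_next = np.zeros(W_rec.shape[0])
    for t in reversed(range(T)):
        dy = y[t] - targets[t]                 # squared-error gradient
        dW_out += np.outer(dy, h[t + 1])
        dh = W_out.T @ dy + dh_next
        dz = dh * (1.0 - h[t + 1] ** 2)        # tanh derivative
        dW_in += np.outer(dz, x_seq[t])
        dW_rec += np.outer(dz, h[t])
        dh_next = W_rec.T @ dz
    # gradient-descent update of all weight matrices
    return W_in - lr * dW_in, W_rec - lr * dW_rec, W_out - lr * dW_out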


2019 ◽  
Vol 53 (2) ◽  
pp. 104-105
Author(s):  
Hamed Zamani

Recent developments in machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress on information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval [9]. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or weakly supervised solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, in our weak supervision approach, neural models are trained on a large set of noisy and biased training data obtained from external resources, existing models, or heuristics. We first introduce relevance-based embedding models [3] that learn distributed representations for words and queries. We show that the learned representations can be effectively employed for a set of IR tasks, including query expansion, pseudo-relevance feedback, and query classification [1, 2]. We further propose a standalone learning-to-rank model based on deep neural networks [5, 8]. Our model learns a sparse representation for queries and documents. This enables us to perform efficient retrieval by constructing an inverted index in the learned semantic space. Our model outperforms state-of-the-art retrieval models, while performing as efficiently as term matching retrieval models. We additionally propose a neural network framework for predicting the performance of a retrieval model for a given query [7]. Inspired by existing query performance prediction models, our framework integrates several information sources, such as the retrieval score distribution and the term distribution in the top retrieved documents. This leads to state-of-the-art results for the performance prediction task on various standard collections. We finally bridge the gap between retrieval and recommendation models, as the two key components of most information systems. Search and recommendation often share the same goal: helping people get the information they need at the right time. Therefore, joint modeling and optimization of search engines and recommender systems could potentially benefit both systems [4]. In more detail, we introduce a retrieval model that is trained using user-item interactions (e.g., recommendation data), with no need for query-document relevance information during training [6]. Our solutions and findings in this dissertation smooth the path towards learning efficient and effective models for various information retrieval and related tasks, especially when large-scale training data is not available.
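
The sparse-representation retrieval idea in [5, 8] can be pictured as building an inverted index over the non-zero latent dimensions of the learned representations; the sketch below is a conceptual illustration with plain dictionaries standing in for the actual data structures, not the dissertation's implementation.

from collections import defaultdict

def build_inverted_index(doc_sparse_vecs):
    """Index documents by the non-zero dimensions of their learned sparse
    representations, so retrieval works like term matching in a latent space.
    doc_sparse_vecs: {doc_id: {latent_dim: weight, ...}, ...}
    """
    index = defaultdict(list)               # latent dim -> [(doc_id, weight), ...]
    for doc_id, vec in doc_sparse_vecs.items():
        for dim, weight in vec.items():
            if weight > 0:
                index[dim].append((doc_id, weight))
    return index

def retrieve(query_sparse_vec, index, k=10):
    """Score documents by the sparse dot product, touching only the postings
    for the query's active latent dimensions."""
    scores = defaultdict(float)
    for dim, q_w in query_sparse_vec.items():
        for doc_id, d_w in index.get(dim, []):
            scores[doc_id] += q_w * d_w
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]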


Author(s):  
Emanuele Fumeo ◽  
Luca Oneto ◽  
Giorgio Clerico ◽  
Renzo Canepa ◽  
Federico Papa ◽  
...  

Current Train Delay Prediction Systems (TDPSs) do not take advantage of state-of-the-art tools and techniques for extracting useful insights from the large amounts of historical data collected by railway information systems. Instead, these systems rely on static rules, based on classical univariate statistics, built by experts of the railway infrastructure. The purpose of this book chapter is to build a data-driven TDPS for large-scale railway networks that exploits the most recent big data technologies, learning algorithms, and statistical tools. In particular, we propose a fast learning algorithm for Shallow and Deep Extreme Learning Machines that fully exploits recent in-memory large-scale data processing technologies for predicting train delays. The proposal has been compared with current state-of-the-art TDPSs. Results on real-world data from the Italian railway network show that our proposal is able to improve over the current state-of-the-art TDPSs.
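
As a reminder of the building block being scaled up, a single-hidden-layer Extreme Learning Machine fixes random hidden weights and solves a ridge regression for the output weights; the sketch below is a minimal illustration, with the hidden size and regularisation chosen arbitrarily and no relation to the chapter's in-memory implementation.

import numpy as np

def train_elm(X, y, n_hidden=512, reg=1e-2, seed=0):
    """Train a basic Extreme Learning Machine: random fixed hidden layer,
    closed-form ridge-regression solve for the output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                              # random hidden features
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta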

