Retrieving Images Using Cross-Language Text and Image Features

We present a multi-modal genre recognition framework that covers the audio, text, and image modalities through features extracted from the audio signals, lyrics, and album cover images of music tracks. In contrast to related work, which relies purely on features learned by a neural network, handcrafted features designed for each modality are also integrated, allowing for higher interpretability of the resulting models and further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre, based on combinations of elementary features. Features are combined with a two-level technique that pairs aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually; to improve readability, the data are reduced by multi-objective analysis and restricted to non-dominated points. Feature- and classifier-related hypotheses are formulated based on the data, and their statistical significance is formally analyzed. The statistical analysis shows that combining two modalities almost always leads to a significant increase in performance, and that combining three modalities does so in several cases.
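
The restriction to non-dominated data mentioned above can be illustrated with a small Pareto filter. This is only a sketch under stated assumptions: the abstract does not name the objectives, so the two used here (classification error and number of features, both minimized) and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (classification error, number of features) pairs.
results = [(0.20, 50), (0.18, 120), (0.25, 30), (0.22, 60), (0.18, 200)]
front = non_dominated(results)
print(front)
```

Here `(0.22, 60)` is dropped because `(0.20, 50)` is better on both objectives, and `(0.18, 200)` is dropped because `(0.18, 120)` reaches the same error with fewer features.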

CROSS-LANGUAGE TEXT CLASSIFICATION WITH CONVOLUTIONAL NEURAL NETWORKS FROM SCRATCH

EUREKA Physics and Engineering ◽ 10.21303/2461-4262.2017.00304 ◽ 2017 ◽ Vol 2 ◽ pp. 24-33 ◽ Cited By ~ 1

Author(s): Musbah Zaid Enweiji ◽ Taras Lehinevych ◽ Andrey Glybovets

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Current Method ◽ Classification Model ◽ The Novel ◽ Novel Approach ◽ Multilingual Learning ◽ Cross Language ◽ Language Text ◽ Language Classification

Cross-language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training a classification model for each individual language. This article proposes a novel approach that uses Convolutional Neural Networks for multilingual language classification. It learns a representation of the knowledge gained from the training languages. Moreover, the method also works for a new language that was not used in training. The results of an empirical study on a large dataset of 21 languages demonstrate the robustness and competitiveness of the presented approach.
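
To make the idea of one convolutional classifier shared across languages concrete, here is a minimal forward-pass sketch. It is not the architecture from the article, which the abstract does not describe. The byte-level input (so that every language maps onto the same 256-symbol alphabet), the layer sizes, and the random untrained weights are all assumptions chosen for illustration.

```python
import math
import random
from typing import List

random.seed(0)  # fixed seed so the untrained weights are reproducible

# Illustrative sizes, not taken from the article.
EMBED_DIM, NUM_FILTERS, KERNEL_WIDTH, NUM_CLASSES = 8, 4, 3, 2


def rand_matrix(rows: int, cols: int) -> List[List[float]]:
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


embedding = rand_matrix(256, EMBED_DIM)  # one row per possible byte value
filters = [rand_matrix(KERNEL_WIDTH, EMBED_DIM) for _ in range(NUM_FILTERS)]
output_weights = rand_matrix(NUM_CLASSES, NUM_FILTERS)


def classify(text: str) -> List[float]:
    """Bytes -> embeddings -> 1D convolution -> ReLU -> max-over-time pooling -> softmax."""
    data = text.encode("utf-8")
    # Pad short inputs with zero bytes so at least one convolution window exists.
    data = data + b"\x00" * max(0, KERNEL_WIDTH - len(data))
    x = [embedding[b] for b in data]

    pooled = []
    for f in filters:
        activations = []
        for start in range(len(x) - KERNEL_WIDTH + 1):
            s = sum(f[k][d] * x[start + k][d]
                    for k in range(KERNEL_WIDTH) for d in range(EMBED_DIM))
            activations.append(max(0.0, s))  # ReLU
        pooled.append(max(activations))      # max over all window positions

    logits = [sum(w * p for w, p in zip(row, pooled)) for row in output_weights]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


probs_en = classify("The weather is nice today.")
probs_uk = classify("Сьогодні гарна погода.")
print(probs_en, probs_uk)
```

Because the weights are random, the printed probabilities carry no meaning; the point is only that the same parameters process text in any script, which is what allows a trained model of this kind to be applied to a language it has not seen.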

An Automatic Measure of Cross-Language Text Structures

Technology Knowledge and Learning ◽ 10.1007/s10758-017-9320-5 ◽ 2017 ◽ Vol 23 (2) ◽ pp. 301-314 ◽ Cited By ~ 4

Author(s): Kyung Kim

Keyword(s): Text Structures ◽ Cross Language ◽ Language Text

Text-Image Retrieval With Salient Features

Journal of Database Management ◽ 10.4018/jdm.2021100101 ◽ 2021 ◽ Vol 32 (4) ◽ pp. 1-13

Author(s): Xia Feng ◽ Zhiyi Hu ◽ Caihua Liu ◽ W. H. Ip ◽ Huiying Chen

Keyword(s): Image Retrieval ◽ Recall Rate ◽ Image Features ◽ Feature Representation ◽ Image Feature ◽ Text And Image ◽ Retrieval Task ◽ Retrieval Method ◽ Salient Features ◽ Object Level

In recent years, deep learning has achieved remarkable results in the text-image retrieval task. However, existing methods consider only global image features and ignore vital local information, so images are matched poorly to the text. Considering that object-level image features can help the matching between text and image, this article proposes a text-image retrieval method that fuses a salient image feature representation. Fusing salient features at the object level can improve the understanding of image semantics and thus the performance of text-image retrieval. The experimental results show that the proposed method is comparable to the latest methods, and that its recall rate on some retrieval results is better than that of current work.
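
The recall rate referred to above is commonly reported as Recall@K: the fraction of text queries whose matching image appears among the top K retrieved images. The sketch below assumes that text and image embeddings already exist, that similarity is measured by cosine, and that text `i` matches image `i`. The function names and the toy 2-D vectors are hypothetical and are not taken from the article.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recall_at_k(text_vecs: Sequence[Sequence[float]],
                image_vecs: Sequence[Sequence[float]],
                k: int) -> float:
    """Fraction of text queries whose paired image (same index) is ranked in the top k."""
    hits = 0
    for i, t in enumerate(text_vecs):
        # Rank all image indices by descending similarity to the query text.
        ranked = sorted(range(len(image_vecs)),
                        key=lambda j: cosine(t, image_vecs[j]),
                        reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_vecs)


# Hypothetical 2-D embeddings; text i is assumed to describe image i.
texts = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
images = [(0.9, 0.1), (0.6, 0.8), (0.2, 0.9)]

r1 = recall_at_k(texts, images, 1)
r2 = recall_at_k(texts, images, 2)
print(r1, r2)
```

In this toy setup only the first query retrieves its own image at rank 1, while all three do so within the top 2, so Recall@1 is one third and Recall@2 is 1.0.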
