Retrieving Images Using Cross-Language Text and Image Features

Author(s):  
Mirna Adriani ◽  
Framadhan Arnely
Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1502
Author(s):  
Ben Wilkes ◽  
Igor Vatolkin ◽  
Heinrich Müller

We present a multi-modal genre recognition framework that considers the modalities audio, text, and image by features extracted from audio signals, album cover images, and lyrics of music tracks. In contrast to pure learning of features by a neural network as done in the related work, handcrafted features designed for a respective modality are also integrated, allowing for higher interpretability of created models and further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre based on combinations of elementary features. For feature combination a two-level technique is used, which combines aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually, with data reduction for improved perceptibility achieved by multi-objective analysis and restriction to non-dominated data. Feature- and classifier-related hypotheses are formulated based on the data, and their statistical significance is formally analyzed. The statistical analysis shows that the combination of two modalities almost always leads to a significant increase of performance and the combination of three modalities in several cases.
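The restriction to non-dominated data mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the objectives used, so the two shown here (classification error and number of features, both minimized) and the sample values are assumptions made purely for illustration, as are the function names.

```python
from typing import List, Tuple

# Hypothetical objective pair: (classification error, number of features).
# Both objectives are assumed to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up values for illustration only, not results from the paper.
results = [(0.30, 10), (0.25, 40), (0.25, 60), (0.20, 120), (0.35, 15)]
front = non_dominated(results)
print(front)
```

A feature combination stays in the reduced data set only if no other combination is at least as good in both objectives and strictly better in one, which is what keeps the visual presentation manageable.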


2017 ◽  
Vol 2 ◽  
pp. 24-33 ◽  
Author(s):  
Musbah Zaid Enweiji ◽  
Taras Lehinevych ◽  
Andrey Glybovets

Cross-language classification is an important task in multilingual learning, where documents in different languages often share the same set of categories. The main goal is to reduce the labeling cost of training a classification model for each individual language. This article proposes a novel approach that uses Convolutional Neural Networks for multilingual classification. It learns a representation of the knowledge gained from the training languages. Moreover, the method also works for a new language that was not used in training. The results of an empirical study on a large dataset of 21 languages demonstrate the robustness and competitiveness of the presented approach.


2021 ◽  
Vol 32 (4) ◽  
pp. 1-13
Author(s):  
Xia Feng ◽  
Zhiyi Hu ◽  
Caihua Liu ◽  
W. H. Ip ◽  
Huiying Chen

In recent years, deep learning has achieved remarkable results in the text-image retrieval task. However, existing methods consider only global image features and ignore vital local information, which results in a failure to match the text well. Considering that object-level image features can help the matching between text and image, this article proposes a text-image retrieval method that fuses a salient image feature representation. Fusion of salient features at the object level can improve the understanding of image semantics and thus improve the performance of text-image retrieval. The experimental results show that the proposed method is comparable to the latest methods, and that its recall rate on some retrieval results is better than that of current work.
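As a rough sketch of how fused image features and a recall rate might fit together, the example below combines a global and a salient feature vector by a weighted sum and computes recall at K for text-to-image retrieval with cosine similarity. The fusion rule, the weight alpha, the function names, and the toy vectors are all assumptions for illustration; the abstract does not specify how the method fuses features or which similarity measure it uses.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse(global_feat: Vector, salient_feat: Vector, alpha: float = 0.5) -> List[float]:
    """Assumed fusion rule: element-wise weighted sum of two same-length vectors."""
    return [alpha * g + (1.0 - alpha) * s for g, s in zip(global_feat, salient_feat)]


def recall_at_k(text_vecs: List[Vector], image_vecs: List[Vector], k: int) -> float:
    """Fraction of text queries whose paired image (same index) ranks in the top k."""
    hits = 0
    for i, text in enumerate(text_vecs):
        ranked = sorted(
            range(len(image_vecs)),
            key=lambda j: cosine(text, image_vecs[j]),
            reverse=True,
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_vecs)


# Toy vectors for illustration only, not data from the paper.
text_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
global_feats = [[1.0, 0.2], [0.9, 0.1], [0.5, 0.5]]
salient_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused_feats = [fuse(g, s) for g, s in zip(global_feats, salient_feats)]

print("global only:", recall_at_k(text_vecs, global_feats, 1))
print("fused:", recall_at_k(text_vecs, fused_feats, 1))
```

In this toy setup the global vectors of the first two images are nearly identical, so the object-level vectors are what separate them; real systems would obtain both kinds of features from trained encoders.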

