multimedia retrieval
Recently Published Documents


TOTAL DOCUMENTS

372
(FIVE YEARS 56)

H-INDEX

22
(FIVE YEARS 6)

2022 ◽  
Vol 40 (2) ◽  
pp. 1-36
Author(s):  
Lei Zhu ◽  
Chaoqun Zheng ◽  
Xu Lu ◽  
Zhiyong Cheng ◽  
Liqiang Nie ◽  
...  

Multi-modal hashing supports efficient multimedia retrieval. However, existing methods still suffer from two problems: (1) Fixed multi-modal fusion. They combine the multi-modal features with fixed weights for hash learning, which cannot adaptively capture the variations of online streaming multimedia content. (2) Binary optimization challenge. To generate binary hash codes, existing methods adopt either two-step relaxed optimization, which causes significant quantization errors, or direct discrete optimization, which incurs considerable computation and storage costs. To address these problems, we first propose a Supervised Multi-modal Hashing with Online Query-adaption method. A self-weighted fusion strategy is designed to adaptively preserve the multi-modal features in hash codes by exploiting their complementarity. In addition, the hash codes are learned efficiently under the supervision of pair-wise semantic labels to enhance their discriminative capability while avoiding the challenging symmetric similarity matrix factorization. Further, we propose an efficient Unsupervised Multi-modal Hashing with Online Query-adaption method with an adaptive multi-modal quantization strategy. The hash codes are learned directly, without relying on specific objective formulations. Finally, in both methods, we design a parameter-free online hashing module to adaptively capture query variations at the online retrieval stage. Experiments validate the superiority of our proposed methods.
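
Illustrative sketch (not the authors' method): the abstract above relies on the general idea that weighted multi-modal features can be mapped to binary codes and compared by Hamming distance. The minimal Python example below assumes sign-of-projection hashing with caller-supplied projection rows and caller-supplied modality weights; the paper instead learns the codes and adapts the weights per query. All function names here are hypothetical.

```python
def fuse(features, weights):
    """Concatenate per-modality feature vectors, each scaled by its weight.

    Assumption: weights are supplied by the caller; the paper adapts them.
    """
    fused = []
    for vec, w in zip(features, weights):
        fused.extend(w * x for x in vec)
    return fused


def hash_code(vector, projections):
    """Pack the signs of linear projections into an integer, one bit per row.

    Assumption: projections are given (e.g. random); the paper learns them.
    """
    code = 0
    for row in projections:
        dot = sum(p * x for p, x in zip(row, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")


def rank(query_code, db_codes):
    """Database indices ordered by increasing Hamming distance to the query."""
    return sorted(range(len(db_codes)),
                  key=lambda i: hamming(query_code, db_codes[i]))
```

Because each comparison is an XOR followed by a bit count, ranking a query against a large database of codes is cheap in both time and memory, which is the efficiency argument the abstract makes for hashing.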


Information ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 28
Author(s):  
Saïd Mahmoudi ◽  
Mohammed Amin Belarbi

Multimedia applications deal, in most cases, with an extremely high volume of multimedia data (2D and 3D images, sounds, videos). Efficient algorithms are therefore needed to analyze and process these large datasets. Multimedia management, in turn, relies on an efficient representation of knowledge that allows efficient data processing and retrieval. The main challenge today is to provide fast and easy access to these huge datasets within a reasonable time. In this context, large-scale image retrieval is a fundamental task. Many methods have been developed in the literature to achieve fast and efficient navigation in large databases by combining content-based image retrieval (CBIR) methods with techniques that reduce computing time, such as dimensionality reduction and hashing. More recently, methods based on convolutional neural networks (CNNs) for feature extraction and image classification have become widely used. In this paper, we present a comprehensive review of recent multimedia retrieval methods and algorithms applied to large datasets of 2D/3D images and videos. This editorial paper discusses the main challenges of multimedia retrieval in the context of large databases.


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Zhongke Wang

This paper briefly introduces the characteristics of content-based multimedia retrieval in the context of informatization, analyzes how these technologies are implemented in a multimedia archives retrieval system covering the video and image information of digital archives, and points out that content-based multimedia retrieval technology is bound to be combined with traditional text retrieval methods. Earlier information retrieval technologies can only meet specific customer requirements. Because they are general-purpose, they can hardly satisfy the demands of different environments, purposes, and times simultaneously. Researchers have put forward personalized retrieval of multimedia files based on BP neural network computing. In this way, the interest model of customers can be analyzed based on the characteristics of the different classification areas of users. Subsequently, the corresponding calculations are carried out, and the model is updated accordingly. Experiments verify that the probability model put forward in this paper is the optimal solution for expressing the interests of customers and how they change.


2021 ◽  
Vol 11 (22) ◽  
pp. 10803
Author(s):  
Jiagang Song ◽  
Yunwu Lin ◽  
Jiayu Song ◽  
Weiren Yu ◽  
Leyuan Zhang

Massive multimedia data with geographical information (geo-multimedia) are collected and stored on the Internet owing to the wide application of location-based services (LBS). Finding the high-level semantic relationships among geo-multimedia data and constructing an efficient index are crucial for large-scale geo-multimedia retrieval. To address this challenge, the paper proposes a deep cross-modal hashing framework for geo-multimedia retrieval, termed Triplet-based Deep Cross-Modal Retrieval (TDCMR), which uses a deep neural network and an enhanced triplet constraint to capture high-level semantics. In addition, a novel hybrid index, called TH-Quadtree, is developed by combining cross-modal binary hash codes and a quadtree to support high-performance search. Extensive experiments are conducted on three commonly used benchmarks, and the results show the superior performance of the proposed method.
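
Illustrative sketch (not from the paper): the abstract does not specify its "enhanced" triplet constraint, so the Python example below shows only the standard margin-based triplet loss that such constraints usually build on. The function names, the squared Euclidean distance, and the default margin are assumptions made for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's variant).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is, in squared distance.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)
```

In a cross-modal setting the anchor could be an image embedding while the positive and negative are text embeddings of matching and non-matching items, so that minimizing the loss pulls semantically related items from different modalities together.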


2021 ◽  
Vol 465 ◽  
pp. 1-14
Author(s):  
Xize Wu ◽  
Lei Zhu ◽  
Liang Xie ◽  
Zheng Zhang ◽  
Huaxiang Zhang

2021 ◽  
Author(s):  
Xu Lu ◽  
Lei Zhu ◽  
Li Liu ◽  
Liqiang Nie ◽  
Huaxiang Zhang

Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2499
Author(s):  
Farhat Abbas ◽  
Mussarat Yasmin ◽  
Muhammad Fayyaz ◽  
Mohamed Abd Elaziz ◽  
Songfeng Lu ◽  
...  

Pedestrian gender classification is one of the key tasks of pedestrian study, and it finds practical applications in content-based image retrieval, population statistics, human–computer interaction, health care, multimedia retrieval systems, demographic collection, and visual surveillance. In this research work, gender classification was carried out using a deep learning approach. A new 64-layer architecture named 4-BSMAB, derived from deep AlexNet, is proposed. The proposed model was trained on the CIFAR-100 dataset using a SoftMax classifier. Features were then extracted from the applied datasets with this pre-trained model. The resulting feature set was optimized with the ant colony system (ACS) optimization technique. Several SVM and KNN classifiers were used to perform gender classification on the optimized feature set. Comprehensive experiments were performed on gender classification datasets, and the proposed model produced better results than the existing methods. The proposed model attained its highest accuracy of 85.4% with 92% AUC on the MIT dataset, and its best classification results, 93% accuracy and 96% AUC, on the PKU-Reid dataset. The outcomes of extensive experiments on existing standard pedestrian datasets demonstrate that the proposed framework outperformed existing pedestrian gender classification methods, and the results indicate that the proposed model is robust.


2021 ◽  
Vol 11 (18) ◽  
pp. 8769
Author(s):  
Jun Long ◽  
Longzhi Sun ◽  
Liujie Hua ◽  
Zhan Yang

Cross-modal hashing is a key technology for real-time retrieval of large-scale multimedia data in real-world applications. Although existing cross-modal hashing methods have achieved impressive results, they still have limitations: (1) some methods do not fully consider the rich semantic information and the noise in labels, resulting in a large semantic gap, and (2) some methods adopt a relaxation-based or discrete cyclic coordinate descent algorithm to solve the discrete constraint problem, resulting in a large quantization error or high time consumption. To address these limitations, we propose a novel method named Discrete Semantics-Guided Asymmetric Hashing (DSAH). Specifically, DSAH leverages both label information and a similarity matrix to enhance the semantic information of the learned hash codes, and the ℓ2,1 norm is used to increase the sparsity of the matrix, mitigating the inevitable noise and subjective factors in labels. Meanwhile, an asymmetric hash learning scheme is proposed to perform hash learning efficiently. In addition, a discrete optimization algorithm is proposed to solve for the hash codes quickly, directly, and discretely. During optimization, hash code learning and hash function learning interact: the learned hash codes guide the learning of the hash function, and the hash function in turn guides hash code generation. Extensive experiments on two benchmark datasets highlight the superiority of DSAH over several state-of-the-art methods.
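
Illustrative note (not from the paper): for readers unfamiliar with the ℓ2,1 norm mentioned above, a common definition for a matrix M with n rows and d columns is given below. The row-wise convention and the symbols M, n, and d are assumptions; some authors sum over columns instead, and the abstract does not say which matrix the norm is applied to.

```latex
\|M\|_{2,1} \;=\; \sum_{i=1}^{n} \bigl\| M_{i,:} \bigr\|_{2}
          \;=\; \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{d} M_{ij}^{2}}
```

Because the outer sum acts like an ℓ1 penalty on the row norms, minimizing it tends to drive entire rows to zero, which is why it is often used to promote row sparsity and robustness to noisy entries.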


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Zhichao Zhang ◽  
Hui Chen ◽  
Xiaoqing Yin ◽  
Jinsheng Deng

Image deblurring is a classic and important problem in industrial fields, such as aviation photo restoration, object recognition in robotics, and autonomous vehicles. Blurry images in real-world scenarios contain mixed blur types, such as natural motion blur caused by camera shake. For fast deblurring, deblurring the entire image is not the best option. Considering the computational costs, it is also better to use alternative kernels to deblur different objects at a high semantic level. To achieve better image restoration quality, it is also beneficial to combine the location of each blur category with important structural information about specific artifacts and the degree of blurring. The goal of blind image deblurring is to restore the sharpness of an image whose blurring kernel is unknown. Recent deblurring methods tend to reconstruct prior knowledge, neglecting the influence of blur estimation and visual fidelity on image details and structure. Generative adversarial networks (GANs) have recently attracted considerable attention from both academia and industry because they can generate new data with the same statistics as the training set. Therefore, this study proposes a generative neural architecture and an edge attention algorithm developed to restore vivid multimedia patches. Joint edge generation and image restoration techniques are designed to address low-level multimedia retrieval. This multipath refinement fusion network (MRFNet) can not only deblur images directly but also deblur individual frames from videos separately. Ablation experiments validate that our generative adversarial network MRFNet performs better with joint training than in the multi-model setting. Compared with other GAN methods, our two-phase method exhibited state-of-the-art performance in terms of speed and accuracy, as well as a significant visual improvement.

