The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval

2021, Vol 7 (5), pp. 76
Author(s): Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Franca Debole, Fabrizio Falchi, ...

This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined to express complex queries and meet users' needs. The distinctive aspect of our approach is that we encode all information extracted from the keyframes, such as visual deep features, tags, and color and object locations, using a convenient textual encoding that is indexed in a single text retrieval engine. This offers great flexibility when results corresponding to the various parts of a query (visual, text, and locations) need to be merged. In addition, we report an extensive analysis of the retrieval performance of the system, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This analysis allowed us to fine-tune the system by choosing the optimal parameters and strategies from those we tested.
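The abstract does not detail the textual encoding, but the general "surrogate text" idea it refers to can be sketched: quantize each dimension of a non-negative deep feature vector into a repetition count of a synthetic term, so that an off-the-shelf text engine's term-frequency scoring approximates vector similarity. Below is a minimal Python sketch of that idea; the function name, the scale parameter, and the term scheme are illustrative assumptions, not VISIONE's actual pipeline.

```python
import numpy as np

def surrogate_text(feature: np.ndarray, scale: int = 30) -> str:
    """Encode a non-negative deep feature vector as synthetic text.

    Each dimension i becomes a term 'f<i>' repeated proportionally to its
    magnitude, so a standard text engine's term-frequency scoring
    approximates the dot product between vectors. (Illustrative sketch.)
    """
    feature = np.maximum(feature, 0)            # keep only non-negative activations
    feature = feature / (np.linalg.norm(feature) + 1e-9)
    terms = []
    for i, v in enumerate(feature):
        reps = int(round(v * scale))            # quantize magnitude to a repetition count
        terms.extend([f"f{i}"] * reps)
    return " ".join(terms)

# Toy usage: the resulting string can be indexed in any full-text engine
# (e.g., Lucene or Elasticsearch); a query image is handled by encoding
# its feature the same way and issuing it as a free-text query.
vec = np.random.rand(8)
print(surrogate_text(vec))
```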

Author(s): K. Zhukova, S. Lyasheva, Mikhail Shleymovich, ...

2020, Vol 2020, pp. 1-8
Author(s): Chen Zhang, Bin Hu, Yucong Suo, Zhiqiang Zou, Yimu Ji

In this paper, we study the challenge of image-to-video retrieval, in which a query image is used to search for relevant frames in a large collection of videos. A novel framework based on convolutional neural networks (CNNs) is proposed to perform large-scale video retrieval with low storage cost and high search efficiency. Our framework consists of a key-frame extraction algorithm and a feature aggregation strategy. Specifically, the key-frame extraction algorithm uses clustering to remove redundant information from the video data, greatly reducing storage cost. The feature aggregation strategy adopts average pooling to encode deep local convolutional features, followed by coarse-to-fine retrieval, which allows rapid search in a large-scale video database. Results from extensive experiments on two publicly available datasets demonstrate that the proposed method achieves superior efficiency as well as accuracy over other state-of-the-art visual search methods.
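As a rough illustration of the two components, the sketch below clusters precomputed frame-level CNN features to select key-frames, and average-pools local convolutional features into a single global descriptor. The use of k-means, the function names, and all parameters are assumptions for illustration; the abstract does not specify the exact clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(frame_feats: np.ndarray, n_keyframes: int) -> np.ndarray:
    """Pick one representative frame per cluster of frame features."""
    km = KMeans(n_clusters=n_keyframes, n_init=10).fit(frame_feats)
    keyframe_ids = []
    for c in range(n_keyframes):
        members = np.where(km.labels_ == c)[0]
        # the member closest to the cluster centre represents that cluster
        d = np.linalg.norm(frame_feats[members] - km.cluster_centers_[c], axis=1)
        keyframe_ids.append(members[np.argmin(d)])
    return np.array(sorted(keyframe_ids))

def aggregate(local_feats: np.ndarray) -> np.ndarray:
    """Average-pool local convolutional features (H*W x C) into one descriptor."""
    v = local_feats.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-9)   # L2-normalize for cosine comparison

# Toy usage: 120 frames with 256-D features, keep 8 key-frames.
feats = np.random.rand(120, 256).astype(np.float32)
print(extract_keyframes(feats, n_keyframes=8))
```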


Author(s): Loris Sauter, Mahnaz Amiri Parian, Ralph Gasser, Silvan Heller, Luca Rossetto, ...

2021
Author(s): D. Chandra Mouli, G. Varun Kumar, S. V. Kiran, Sanjeev Kumar

2021
Author(s): Chen Jiang, Kaiming Huang, Sifeng He, Xudong Yang, Wei Zhang, ...

Author(s): Kimiaki Shirahama, Kuniaki Uehara

This paper examines video retrieval based on the Query-By-Example (QBE) approach, in which shots relevant to a query are retrieved from large-scale video data based on their similarity to example shots. This involves two crucial problems: first, similarity in features does not necessarily imply similarity in semantic content; second, computing the similarity of a huge number of shots to the example shots is computationally expensive. The authors have developed a method that can filter out a large number of shots irrelevant to a query, based on a video ontology, that is, a knowledge base of concepts displayed in shots. The method utilizes various concept relationships (e.g., generalization/specialization, sibling, part-of, and co-occurrence) defined in the video ontology. Although the video ontology assumes that shots are accurately annotated with concepts, accurate annotation is difficult due to the diversity of forms and appearances of the concepts. Dempster-Shafer theory is therefore used to account for the uncertainty in determining the relevance of a shot from its possibly inaccurate annotation. Experimental results on TRECVID 2009 video data validate the effectiveness of the method.
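The abstract does not describe how mass functions are constructed from annotations, but Dempster's rule of combination itself is standard. The following minimal sketch fuses two hypothetical evidence sources about which concept a shot displays; the hypothesis sets and masses are invented for illustration.

```python
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two mass functions.

    Masses are dicts mapping frozenset hypotheses to belief mass.
    Mass assigned to contradictory pairs (empty intersection) is
    discarded and the rest is renormalized.
    """
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y          # evidence pointing at incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Toy example: two sources giving evidence that a shot shows a 'car'.
m_detector = {frozenset({"car"}): 0.7, frozenset({"car", "truck"}): 0.3}
m_context  = {frozenset({"car"}): 0.6, frozenset({"car", "truck"}): 0.4}
print(combine(m_detector, m_context))
```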


2020, Vol 14 (5)
Author(s): Ling Shen, Richang Hong, Yanbin Hao

2015, Vol 3 (3), pp. 1-13
Author(s): Hiroki Nomiya, Atsushi Morikuni, Teruhisa Hochin

A lifelog video retrieval framework is proposed to make better use of large amounts of lifelog video data. The proposed method retrieves emotional scenes, such as scenes in which a person in the video is smiling, on the assumption that important events are likely to occur during emotional scenes. Emotional scenes are detected on the basis of facial expression recognition using a wide variety of facial features. The authors adopt an unsupervised learning approach called ensemble clustering to recognize the facial expressions, because supervised learning approaches require sufficient labeled training data, which is quite troublesome to obtain for large-scale video databases. The retrieval performance of the proposed method is evaluated through an emotional scene detection experiment, from the viewpoints of accuracy and efficiency. In addition, a prototype retrieval system is implemented based on the proposed emotional scene detection method.
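The abstract names ensemble clustering without fixing a variant. A common formulation, sketched below under that assumption, builds a co-association matrix from repeated k-means runs and then cuts a hierarchical clustering of the resulting pairwise similarities; all names and parameters here are illustrative, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def ensemble_cluster(X: np.ndarray, n_clusters: int, n_runs: int = 20) -> np.ndarray:
    """Ensemble clustering via a co-association matrix.

    Run k-means several times with varying k, count how often each pair
    of samples lands in the same cluster, then cut a hierarchical
    clustering of the resulting similarity matrix.
    """
    n = len(X)
    coassoc = np.zeros((n, n))
    rng = np.random.default_rng(0)
    for _ in range(n_runs):
        k = int(rng.integers(n_clusters, 2 * n_clusters + 1))  # vary k for diversity
        labels = KMeans(n_clusters=k, n_init=5).fit_predict(X)
        coassoc += (labels[:, None] == labels[None, :])
    dist = 1.0 - coassoc / n_runs          # co-association -> pairwise distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy usage: group 100 facial-feature vectors into 3 expression clusters.
X = np.random.rand(100, 32)
print(ensemble_cluster(X, n_clusters=3))
```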

