The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval

2021, Vol 7 (5), pp. 76
Author(s): Giuseppe Amato, Paolo Bolettieri, Fabio Carrara, Franca Debole, Fabrizio Falchi, ...

This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined to express complex queries and meet users' needs. The distinctive aspect of our approach is that we encode all information extracted from the keyframes, such as visual deep features, tags, and color and object locations, using a convenient textual encoding that is indexed in a single text retrieval engine. This offers great flexibility when results corresponding to the various parts of a query (visual, text, and locations) need to be merged. In addition, we report an extensive analysis of the retrieval performance of the system, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This analysis allowed us to fine-tune the system by choosing the optimal parameters and strategies from those we tested.
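The abstract does not detail the textual encoding, but the general "surrogate text" idea it refers to can be sketched: quantize each dimension of a non-negative deep feature vector into a repetition count of a synthetic term, so that an off-the-shelf text engine's term-frequency scoring approximates vector similarity. Below is a minimal Python sketch of that idea; the function name, the scale parameter, and the term scheme are illustrative assumptions, not VISIONE's actual pipeline.

```python
import numpy as np

def surrogate_text(feature: np.ndarray, scale: int = 30) -> str:
    """Encode a non-negative deep feature vector as synthetic text.

    Each dimension i becomes a term 'f<i>' repeated proportionally to its
    magnitude, so a standard text engine's term-frequency scoring
    approximates the dot product between vectors. (Illustrative sketch.)
    """
    feature = np.maximum(feature, 0)            # keep only non-negative activations
    feature = feature / (np.linalg.norm(feature) + 1e-9)
    terms = []
    for i, v in enumerate(feature):
        reps = int(round(v * scale))            # quantize magnitude to a repetition count
        terms.extend([f"f{i}"] * reps)
    return " ".join(terms)

# Toy usage: the resulting string can be indexed in any full-text engine
# (e.g., Lucene or Elasticsearch); a query image is handled by encoding
# its feature the same way and issuing it as a free-text query.
vec = np.random.rand(8)
print(surrogate_text(vec))
```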

Author(s): K. Zhukova, S. Lyasheva, Mikhail Shleymovich, ...

2020, Vol 2020, pp. 1-8
Author(s): Chen Zhang, Bin Hu, Yucong Suo, Zhiqiang Zou, Yimu Ji

In this paper, we study the challenge of image-to-video retrieval, in which a query image is used to search for relevant frames in a large collection of videos. A novel framework based on convolutional neural networks (CNNs) is proposed to perform large-scale video retrieval with low storage cost and high search efficiency. Our framework consists of a key-frame extraction algorithm and a feature aggregation strategy. Specifically, the key-frame extraction algorithm uses clustering to remove redundant information from the video data, greatly reducing storage cost. The feature aggregation strategy adopts average pooling to encode deep local convolutional features, followed by coarse-to-fine retrieval, which allows rapid search in a large-scale video database. Results from extensive experiments on two publicly available datasets demonstrate that the proposed method achieves superior efficiency as well as accuracy over other state-of-the-art visual search methods.
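As a rough illustration of the two components, the sketch below clusters precomputed frame-level CNN features to select key-frames, and average-pools local convolutional features into a single global descriptor. The use of k-means, the function names, and all parameters are assumptions for illustration; the abstract does not specify the exact clustering algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(frame_feats: np.ndarray, n_keyframes: int) -> np.ndarray:
    """Pick one representative frame per cluster of frame features."""
    km = KMeans(n_clusters=n_keyframes, n_init=10).fit(frame_feats)
    keyframe_ids = []
    for c in range(n_keyframes):
        members = np.where(km.labels_ == c)[0]
        # the member closest to the cluster centre represents that cluster
        d = np.linalg.norm(frame_feats[members] - km.cluster_centers_[c], axis=1)
        keyframe_ids.append(members[np.argmin(d)])
    return np.array(sorted(keyframe_ids))

def aggregate(local_feats: np.ndarray) -> np.ndarray:
    """Average-pool local convolutional features (H*W x C) into one descriptor."""
    v = local_feats.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-9)   # L2-normalize for cosine comparison

# Toy usage: 120 frames with 256-D features, keep 8 key-frames.
feats = np.random.rand(120, 256).astype(np.float32)
print(extract_keyframes(feats, n_keyframes=8))
```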


Author(s): Loris Sauter, Mahnaz Amiri Parian, Ralph Gasser, Silvan Heller, Luca Rossetto, ...

2021
Author(s): D. Chandra Mouli, G. Varun Kumar, S. V. Kiran, Sanjeev Kumar

2021
Author(s): Chen Jiang, Kaiming Huang, Sifeng He, Xudong Yang, Wei Zhang, ...

Author(s): Kimiaki Shirahama, Kuniaki Uehara

This paper examines video retrieval based on the Query-By-Example (QBE) approach, in which shots relevant to a query are retrieved from large-scale video data based on their similarity to example shots. This involves two crucial problems: first, similarity in features does not necessarily imply similarity in semantic content; second, computing the similarity of a huge number of shots to the example shots is computationally expensive. The authors have developed a method that can filter out a large number of shots irrelevant to a query, based on a video ontology, that is, a knowledge base of concepts displayed in shots. The method utilizes various concept relationships (e.g., generalization/specialization, sibling, part-of, and co-occurrence) defined in the video ontology. Although the video ontology assumes that shots are accurately annotated with concepts, accurate annotation is difficult due to the diversity of forms and appearances of the concepts. Dempster-Shafer theory is therefore used to account for the uncertainty in determining the relevance of a shot from its possibly inaccurate annotation. Experimental results on TRECVID 2009 video data validate the effectiveness of the method.
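The abstract does not describe how mass functions are constructed from annotations, but Dempster's rule of combination itself is standard. The following minimal sketch fuses two hypothetical evidence sources about which concept a shot displays; the hypothesis sets and masses are invented for illustration.

```python
from itertools import product

def combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule of combination for two mass functions.

    Masses are dicts mapping frozenset hypotheses to belief mass.
    Mass assigned to contradictory pairs (empty intersection) is
    discarded and the rest is renormalized.
    """
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y          # evidence pointing at incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}

# Toy example: two sources giving evidence that a shot shows a 'car'.
m_detector = {frozenset({"car"}): 0.7, frozenset({"car", "truck"}): 0.3}
m_context  = {frozenset({"car"}): 0.6, frozenset({"car", "truck"}): 0.4}
print(combine(m_detector, m_context))
```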


2020, Vol 14 (5)
Author(s): Ling Shen, Richang Hong, Yanbin Hao

2015, Vol 3 (3), pp. 1-13
Author(s): Hiroki Nomiya, Atsushi Morikuni, Teruhisa Hochin

A lifelog video retrieval framework is proposed to make better use of large amounts of lifelog video data. The proposed method retrieves emotional scenes, such as scenes in which a person in the video is smiling, on the assumption that important events are likely to occur during emotional scenes. Emotional scenes are detected on the basis of facial expression recognition using a wide variety of facial features. The authors adopt an unsupervised learning approach called ensemble clustering to recognize the facial expressions, because supervised learning approaches require sufficient labeled training data, which is quite troublesome to obtain for large-scale video databases. The retrieval performance of the proposed method is evaluated through an emotional scene detection experiment, from the viewpoints of accuracy and efficiency. In addition, a prototype retrieval system is implemented based on the proposed emotional scene detection method.
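The abstract names ensemble clustering without fixing a variant. A common formulation, sketched below under that assumption, builds a co-association matrix from repeated k-means runs and then cuts a hierarchical clustering of the resulting pairwise similarities; all names and parameters here are illustrative, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def ensemble_cluster(X: np.ndarray, n_clusters: int, n_runs: int = 20) -> np.ndarray:
    """Ensemble clustering via a co-association matrix.

    Run k-means several times with varying k, count how often each pair
    of samples lands in the same cluster, then cut a hierarchical
    clustering of the resulting similarity matrix.
    """
    n = len(X)
    coassoc = np.zeros((n, n))
    rng = np.random.default_rng(0)
    for _ in range(n_runs):
        k = int(rng.integers(n_clusters, 2 * n_clusters + 1))  # vary k for diversity
        labels = KMeans(n_clusters=k, n_init=5).fit_predict(X)
        coassoc += (labels[:, None] == labels[None, :])
    dist = 1.0 - coassoc / n_runs          # co-association -> pairwise distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy usage: group 100 facial-feature vectors into 3 expression clusters.
X = np.random.rand(100, 32)
print(ensemble_cluster(X, n_clusters=3))
```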

