Object Priors for Classifying and Localizing Unseen Actions

Author(s):  
Pascal Mettes ◽  
William Thong ◽  
Cees G. M. Snoek

This work strives for the classification and localization of human actions in videos, without the need for any labeled video training examples. Where existing work relies on transferring global attribute or object information from seen to unseen action videos, we seek to classify and spatio-temporally localize unseen actions in videos from image-based object information only. We propose three spatial object priors, which encode local person and object detectors along with their spatial relations. On top we introduce three semantic object priors, which extend semantic matching through word embeddings with three simple functions that tackle semantic ambiguity, object discrimination, and object naming. A video embedding combines the spatial and semantic object priors. It enables us to introduce a new video retrieval task that retrieves action tubes in video collections based on user-specified objects, spatial relations, and object size. Experimental evaluation on five action datasets shows the importance of spatial and semantic object priors for unseen actions. We find that persons and objects have preferred spatial relations that benefit unseen action localization, while using multiple languages and simple object filtering directly improves semantic matching, leading to state-of-the-art results for both unseen action classification and localization.
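As a rough sketch of how semantic object priors might score an unseen action from image-based object evidence (an illustration only, not the authors' exact formulation): each object's detection confidence is weighted by the word-embedding similarity between the object name and the action name, and only the most related objects are kept as a simple stand-in for object filtering. The function names, the top-k rule, and the weighted sum below are assumptions.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def semantic_action_score(action_vec, object_vecs, object_scores, top_k=5):
    """Score one video for an unseen action using only object evidence.

    action_vec   : word embedding of the action name
    object_vecs  : {object_name: word embedding of the object name}
    object_scores: {object_name: mean detection confidence in the video}
    top_k        : keep only the objects semantically closest to the action
                   (a crude form of object discrimination/filtering)
    """
    sims = {name: cosine(action_vec, vec) for name, vec in object_vecs.items()}
    kept = sorted(sims, key=sims.get, reverse=True)[:top_k]
    # Aggregate: similarity-weighted sum of detection confidences.
    return sum(sims[name] * object_scores.get(name, 0.0) for name in kept)
```

Classifying an unseen video then amounts to computing this score for every candidate action name and picking the highest-scoring one.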

Author(s):  
Wenzhe Wang ◽  
Mengdan Zhang ◽  
Runnan Chen ◽  
Guanyu Cai ◽  
Penghao Zhou ◽  
...  

Multi-modal cues presented in videos are usually beneficial for the challenging video-text retrieval task on internet-scale datasets. Recent video retrieval methods exploit these cues by aggregating them into holistic high-level semantics and matching them with text representations in a global view. In contrast to this global alignment, the local alignment between the detailed semantics encoded in multi-modal cues and individual phrases remains poorly explored. In this paper, we therefore leverage hierarchical video-text alignment to fully exploit the detailed, diverse characteristics of multi-modal cues for fine-grained alignment with local phrase-level semantics, while also capturing high-level semantic correspondence. Specifically, multi-step attention is learned for progressively comprehensive local alignment, and a holistic transformer is used to summarize multi-modal cues for global alignment. With hierarchical alignment, our model outperforms state-of-the-art methods on three public video retrieval datasets.
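A minimal sketch of the two alignment levels described above, under assumed feature shapes and with a single attention step (the authors' multi-step attention and holistic transformer are not reproduced here): global alignment compares mean-pooled video and text features, while local alignment lets each phrase attend over the multi-modal cue features before comparison. The weighting `alpha` is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def global_score(cue_feats, phrase_feats):
    # Global alignment: cosine similarity between mean-pooled representations.
    v = F.normalize(cue_feats.mean(dim=0), dim=-1)      # (d,)
    t = F.normalize(phrase_feats.mean(dim=0), dim=-1)   # (d,)
    return (v * t).sum()

def local_score(cue_feats, phrase_feats):
    # Local alignment: each phrase attends over the multi-modal cue features,
    # then is compared with its attended (aligned) cue summary.
    d = cue_feats.size(-1)
    attn = torch.softmax(phrase_feats @ cue_feats.t() / d ** 0.5, dim=-1)
    aligned = attn @ cue_feats                           # (n_phrases, d)
    return F.cosine_similarity(aligned, phrase_feats, dim=-1).mean()

def hierarchical_score(cue_feats, phrase_feats, alpha=0.5):
    # cue_feats:    (n_cues, d) features from appearance, motion, audio, ...
    # phrase_feats: (n_phrases, d) features of the parsed text phrases
    return alpha * global_score(cue_feats, phrase_feats) + \
           (1 - alpha) * local_score(cue_feats, phrase_feats)
```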


2020 ◽  
Vol 34 (07) ◽  
pp. 12524-12531
Author(s):  
Ruicong Xu ◽  
Li Niu ◽  
Jianfu Zhang ◽  
Liqing Zhang

The activity image-to-video retrieval task aims to retrieve videos containing an activity similar to that of the query image, which is challenging because videos generally contain many background segments irrelevant to the activity. In this paper, we use the R-C3D model to represent a video as a bag of activity proposals, which filters out background segments to some extent. However, noisy proposals remain in each bag. We therefore propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into a cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with a graph convolutional layer, and integrate this module with classification loss, adversarial loss, and triplet loss in our cross-modal retrieval framework. Moreover, we propose a geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely used datasets verify the effectiveness of our approach.
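The geometry-aware triplet loss can be sketched as below, assuming "point-to-subspace distance" means the distance from a query feature to its orthogonal projection onto the subspace spanned by a bag of proposal features; the QR-based projection and the margin value are illustrative choices, and the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def point_to_subspace_distance(query, proposals):
    """Distance from a query feature (d,) to the subspace spanned by a bag
    of activity-proposal features (k, d)."""
    q, _ = torch.linalg.qr(proposals.t())   # (d, k) orthonormal basis of the bag
    projection = q @ (q.t() @ query)        # project the query onto the subspace
    return torch.norm(query - projection)

def geometry_aware_triplet_loss(query, pos_bag, neg_bag, margin=0.2):
    # Pull the query toward the proposal subspace of the matching video
    # and push it away from the proposal subspace of a non-matching video.
    d_pos = point_to_subspace_distance(query, pos_bag)
    d_neg = point_to_subspace_distance(query, neg_bag)
    return F.relu(d_pos - d_neg + margin)
```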


Author(s):  
Zein Al Abidin Ibrahim ◽  
Siba Haidar ◽  
Ihab Sbeity

The production of video content has increased and expanded dramatically, creating a need for accurate video classification. In our work, we use deep learning as a means to accelerate the video retrieval task by classifying videos into categories. We classify a video based on the text extracted from it. We trained our model using fastText, a library for efficient text classification and representation learning, and tested it on 15,000 videos. Experimental results show that our approach is efficient and performs well. Our technique scales to very large datasets and produces a model that can classify any video into a specific category very quickly.
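Since the classifier is trained with fastText's supervised mode, a minimal usage sketch looks like the following; the file names, hyperparameters, and example category are hypothetical, not the authors' settings.

```python
import fasttext

# Training file: one example per line, "__label__<category> <text extracted from the video>"
# e.g. "__label__cooking the chef chops the onions and stirs the pan"
model = fasttext.train_supervised(
    input="video_text_train.txt",   # hypothetical path to the extracted-text corpus
    epoch=25,                       # illustrative hyperparameters
    lr=0.5,
    wordNgrams=2,
)

labels, probs = model.predict("a goalkeeper dives to save the penalty kick")
print(labels[0], probs[0])          # predicted category and its confidence
model.save_model("video_classifier.bin")
```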


2009 ◽  
Vol 34 (10) ◽  
pp. 1243-1249
Author(s):  
Hua-Bei LI ◽  
Wei-Ming HU ◽  
Guan LUO

Author(s):  
G. M. Cohen ◽  
J. S. Grasso ◽  
M. L. Domeier ◽  
P. T. Mangonon

Any explanation of vestibular micromechanics must include the roles of the otolithic and cupular membranes. However, micromechanical models of vestibular function have been hampered by unresolved questions about the microarchitectures of these membranes and their connections to stereocilia and supporting cells. Otolithic membranes are notoriously difficult to preserve because of severe shrinkage and loss of soluble components. We have empirically developed fixation procedures that reduce shrinkage artifacts and more accurately depict the spatial relations between the otolithic membranes and the ciliary bundles and supporting cells.

We used White Leghorn chicks, ranging in age from newly hatched to one week. The inner ears were fixed for 3-24 h in 1.5-1.75% glutaraldehyde in 150 mM KCl, buffered with potassium phosphate, pH 7.3; when postfixation was performed, it was for 30 min in 1% OsO4, alone or mixed with 1% K4Fe(CN)6. The otolithic organs (saccule, utricle, lagenar macula) were embedded in Araldite 502. Semithin sections (1 μm) were stained with toluidine blue.


Author(s):  
Raksha Anand ◽  
John Hart ◽  
Patricia S. Moore ◽  
Sandra B. Chapman

Purpose: Frontotemporal lobar degeneration (FTLD) encompasses a group of neurodegenerative disorders characterized by gradual and progressive decline in behavior and/or language. Identifying the subtypes of FTLD can be challenging with traditional assessment tools. Growing empirical evidence suggests that language measures might be useful in differentiating FTLD subtypes. Method: In this paper, we examined the performance of five individuals with FTLD (two with frontotemporal dementia, two with semantic dementia, and one with progressive nonfluent aphasia) and 10 cognitively normal older adults on measures of semantic binding (Semantic Object Retrieval Test and semantic problem solving) and abstracted meaning (generation of interpretive statement and proverb interpretation). Results and Conclusion: A differential profile of impairment was observed in the three FTLD subtypes on these four measures. Further examination of these measures in larger groups will establish their clinical utility in differentiating the FTLD subtypes.

