Managing Multimedia Semantics
Latest Publications


TOTAL DOCUMENTS

16
(FIVE YEARS 0)

H-INDEX

1
(FIVE YEARS 0)

Published By IGI Global

9781591405696, 9781591405436

Author(s):  
Anne H.H. Ngu ◽  
Jialie Shen ◽  
John Shepherd

The optimized distance-based access methods currently available for multimedia databases rest on two major assumptions: a suitable distance function is known a priori, and the dimensionality of image features is low. The standard approach to building image databases is to represent images as vectors of low-level visual features and to perform retrieval over these vectors. However, due to the large gap between semantic notions and low-level visual content, it is extremely difficult to define a distance function that accurately captures the similarity of images as perceived by humans. Furthermore, popular dimension reduction methods suffer either from an inability to capture the nonlinear correlations among raw data or from very expensive training. To address these problems, this chapter introduces a new indexing technique called Combining Multiple Visual Features (CMVF), which integrates multiple visual features to achieve better query effectiveness. Our approach produces low-dimensional image feature vectors that incorporate not only low-level visual properties but also high-level semantic properties. The hybrid architecture yields feature vectors that capture the salient properties of images yet are small enough to allow existing high-dimensional indexing methods to provide efficient and effective retrieval.
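The fusion step described above can be pictured in a few lines. The sketch below is a minimal pure-Python illustration only: CMVF actually learns its projection with a hybrid architecture, whereas the projection matrix here is a fixed, hypothetical stand-in.

```python
# Illustrative sketch of CMVF-style feature fusion (not the actual system).
# The real method learns the low-dimensional projection; the matrix W below
# is a hypothetical stand-in for such a learned projection.

def fuse_features(*feature_vectors):
    """Concatenate several visual feature vectors into one raw vector."""
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused

def project(vector, weights):
    """Linearly project a high-dimensional vector into a low-dimensional space.
    weights is a list of rows, one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

color = [0.2, 0.5, 0.3]      # e.g. a tiny color histogram
texture = [0.1, 0.9]         # e.g. two texture responses
raw = fuse_features(color, texture)        # 5-D raw fused vector
W = [[1, 0, 0, 0, 0],                      # hypothetical learned projection
     [0, 0, 0, 1, 0]]
compact = project(raw, W)                  # 2-D vector suitable for indexing
```

The compact vector, not the raw concatenation, is what a conventional high-dimensional index would store.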


2011 ◽  
pp. 351-362
Author(s):  
Viranga Ratnaike ◽  
Bala Srinivasan ◽  
Surya Nepal

The semantic gap is recognized as one of the major problems in managing multimedia semantics. It is the gap between sensory data and semantic models. Often the sensory data and its associated context compose situations that system architects have not anticipated. Emergence is a phenomenon that can be employed to deal with such unanticipated situations. In the past, researchers and practitioners paid little attention to applying the concepts of emergence to multimedia information retrieval. Recently, there have been attempts to use emergent semantics as a way of bridging the semantic gap. This chapter aims to provide an overview of the field as it applies to multimedia. We begin with the concepts behind emergence, cover the requirements of emergent systems, and survey the existing body of research.


2011 ◽  
pp. 333-350
Author(s):  
Isabel F. Cruz ◽  
Olga Sayenko

Semantics can play an important role in multimedia content retrieval and presentation. Although a complete semantic description of a multimedia object may be difficult to generate, we show that even a limited description can be exploited to provide significant added functionality in the retrieval and presentation of multimedia. In this chapter we describe DelaunayView, a system that supports distributed and heterogeneous multimedia sources and provides a flexible, semantically driven approach to the selection and display of multimedia content.


2011 ◽  
pp. 223-245
Author(s):  
Brett Adams ◽  
Svetha Venkatesh

This chapter examines the task of creating multimedia authoring tools for the amateur media creator, and the problems unique to that undertaking. It argues that a deep understanding of the media creation process, together with insight into the precise nature of the relative strengths of computers and users in the domain of application, is needed before this gap can be bridged by software technology. These issues are further demonstrated within the context of a novel media collection environment, including a real-world example of an occasion filmed in order to automatically create two movies of distinctly different styles. The authors hope that such tools will enable amateur videographers to produce technically polished and aesthetically effective media, regardless of their level of expertise.


2011 ◽  
pp. 135-159
Author(s):  
Uma Srinivasan ◽  
Surya Nepal

In order to manage large collections of video content, we need appropriate video content models that can facilitate interaction with the content. A key issue for video applications is to accommodate the different ways in which a video sequence can function semantically. This requires that the content be described at several levels of abstraction. In this chapter we propose a video metamodel called VIMET and describe an approach to modeling video content such that content descriptions can be developed incrementally, depending on the application and video genre. We further define a data model to represent video objects and their relationships at several levels of abstraction. With the help of an example, we then illustrate the process of developing a specific application model that builds incremental descriptions of video semantics using VIMET.
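The idea of layered, incrementally enriched descriptions can be made concrete with a small data-model sketch. This is not the actual VIMET schema; the class, level names, and example objects below are hypothetical illustrations of objects and relationships at several abstraction levels.

```python
# Hypothetical sketch of a layered video data model (not the VIMET schema):
# each object carries an abstraction level, and relationships link objects
# across levels so descriptions can be enriched incrementally.
from dataclasses import dataclass, field

@dataclass
class VideoObject:
    name: str
    level: str                  # e.g. "feature", "object", "event"
    start_frame: int
    end_frame: int
    relations: list = field(default_factory=list)  # (relation_name, VideoObject)

    def relate(self, relation, other):
        self.relations.append((relation, other))

# Start with a low-level shot, then incrementally add higher-level semantics:
shot = VideoObject("shot_12", "feature", 300, 450)
player = VideoObject("player", "object", 310, 440)
goal = VideoObject("goal_event", "event", 400, 430)
goal.relate("involves", player)
goal.relate("occurs_in", shot)
```

An application model would add further levels (e.g. genre-specific events) without disturbing the descriptions already in place.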


Author(s):  
Qi Tian ◽  
Ying Wu ◽  
Jie Yu ◽  
Thomas S. Huang

For learning-based tasks such as image classification and object recognition, the feature dimension is usually very high. Learning is afflicted by the curse of dimensionality, as the search space grows exponentially with the dimension. Discriminant expectation maximization (DEM) provides a framework for applying self-supervised learning in a discriminating subspace. This chapter extends the linear DEM to a nonlinear kernel algorithm, Kernel DEM (KDEM), and evaluates KDEM extensively on benchmark image databases and synthetic data. Comparisons with other state-of-the-art learning techniques are investigated for several tasks: image classification, hand posture recognition, and fingertip tracking. Extensive results show the effectiveness of our approach.
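The nonlinear extension rests on the kernel trick: geometry in an implicit high-dimensional feature space is computed from kernel evaluations alone. The fragment below is an illustration of that trick, not the chapter's KDEM implementation.

```python
# Illustration of the kernel trick underlying kernel methods such as KDEM:
# distances between mapped points phi(x), phi(y) are obtained purely from
# kernel evaluations, without ever forming phi explicitly.
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def feature_space_distance_sq(x, y, kernel):
    """||phi(x) - phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y)."""
    return kernel(x, x) + kernel(y, y) - 2 * kernel(x, y)
```

KDEM applies this idea to the discriminant analysis step, so the discriminating subspace can separate classes that are not linearly separable in the raw feature space.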


2011 ◽  
pp. 193-222
Author(s):  
Qi Tian ◽  
Baback Moghaddam ◽  
Neal Lesh ◽  
Chia Shen ◽  
Thomas S. Huang

Recent advances in technology have made it possible to easily amass large collections of digital media. These media offer new opportunities and create great demand for new digital content user-interface and management systems that can help people construct, organize, navigate, and share digital collections in an interactive, face-to-face social setting. In this chapter, we develop a user-centric algorithm for visualization and layout for content-based image retrieval (CBIR) in large photo libraries. Optimized layouts reflect mutual similarities as displayed on a two-dimensional (2D) screen, providing a perceptually intuitive visualization compared to traditional sequential one-dimensional (1D) content-based image retrieval systems. A framework for user modeling also allows our system to learn and adapt to a user's preferences. The resulting retrieval, browsing, and visualization can adapt to the user's (time-varying) notions of content, context, and preferences in style and interactive navigation.
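Layouts of this kind are typically judged by how closely 2D screen distances match the desired image-to-image dissimilarities, which an MDS-style stress function measures. The sketch below is illustrative only, not the chapter's actual algorithm; the image names and target distances are hypothetical.

```python
# Illustrative MDS-style stress function (not the chapter's algorithm):
# a layout is good when on-screen distances match target dissimilarities,
# so layout optimization minimizes this quantity.

def stress(positions, target_dist):
    """Sum of squared differences between screen distances and targets.
    positions: {image_id: (x, y)}; target_dist: {(id_a, id_b): distance}."""
    total = 0.0
    for (a, b), d_target in target_dist.items():
        (xa, ya), (xb, yb) = positions[a], positions[b]
        d_screen = ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
        total += (d_screen - d_target) ** 2
    return total

# Similar images should land close together, dissimilar ones far apart:
pos = {"img1": (0.0, 0.0), "img2": (1.0, 0.0), "img3": (0.0, 1.0)}
targets = {("img1", "img2"): 1.0, ("img1", "img3"): 1.0, ("img2", "img3"): 1.4}
```

A user-modeling layer can then reweight the target distances as the user's notion of similarity changes over time.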


2011 ◽  
pp. 160-181 ◽  
Author(s):  
Silvia Pfeifer ◽  
Conrad Parker ◽  
André Pang

The Continuous Media Web project has developed a technology to extend the Web to time-continuously sampled data, enabling seamless searching and surfing with existing Web tools. This chapter discusses requirements for such an extension of the Web, contrasts existing technologies, and presents the Annodex technology, which enables the creation of Webs of audio and video documents. To encourage uptake, the specifications of the Annodex technology have been submitted to the IETF for standardisation, and open source software is made available freely. The Annodex technology permits an integrated means of searching, surfing, and managing a World Wide Web of textual and media resources.
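Surfing into the middle of a continuous-media document relies on temporal addressing in URIs. As a hedged illustration (the exact URI syntax standardised by this line of work differs in detail), a parser for an npt-style temporal fragment might look like the following; the function and fragment strings are hypothetical.

```python
# Hypothetical sketch: parsing an npt-style (normal play time) temporal
# fragment such as "t=npt:30.5", so a player could seek to that offset in
# a continuous-media document. Not the exact Annodex URI syntax.

def parse_temporal_fragment(fragment):
    """Return the start offset in seconds from a fragment like 't=npt:30.5'."""
    if not fragment.startswith("t="):
        raise ValueError("not a temporal fragment")
    value = fragment[2:]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    return float(value)
```

Given such addressing, an ordinary hyperlink can point not just at a media document but at a moment within it.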


2011 ◽  
pp. 99-134
Author(s):  
Changsheng Xu ◽  
Xi Shao ◽  
Namunu C. Maddage ◽  
Jesse S. Jin ◽  
Qi Tian

This chapter aims to provide a comprehensive survey of the technical achievements in the area of content-based music summarization and classification and to present our recent results. To give a full picture of the current status, the chapter covers music summarization in both the compressed and uncompressed domains, music video summarization, music genre classification, and semantic region detection in acoustical music signals. By reviewing the current technologies and the demands of practical applications in music summarization and classification, the chapter identifies directions for future research.


Author(s):  
Ankush Mittal ◽  
Cheong Loong Fah ◽  
Ashraf Kassim ◽  
Krishnan V. Pagalthivarthi

Most video retrieval systems work with a single shot without considering the temporal context in which the shot appears. However, the meaning of a shot depends on the context in which it is situated, and a change in the order of the shots within a scene changes the meaning of the shot. Recently, it has been shown that to find higher-level interpretations of a collection of shots (i.e., a sequence), intershot analysis is at least as important as intrashot analysis; several such interpretations would be impossible without context. Contextual characterization of video data involves extracting patterns in the temporal behavior of video features and mapping these patterns to a high-level interpretation. We design a Dynamic Bayesian Network (DBN) framework in which the temporal context of a video segment is considered at different granularities, depending on the desired application. The novel applications of the system include classifying a group of shots called a sequence and parsing a video program into individual segments by building a model of the video program.
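A DBN generalises the hidden Markov model, and a forward pass over shot observations shows how belief about the current shot's class accumulates temporal context from preceding shots. The sketch below is a minimal HMM-style stand-in for the chapter's framework; the states, probabilities, and observation labels are hypothetical.

```python
# Minimal HMM-style forward pass as a stand-in for DBN inference over shots:
# the belief about the current shot's class depends on the shots before it,
# which is exactly the temporal context single-shot systems discard.

def forward(prior, transition, emission, observations):
    """prior: {state: p}; transition: {s1: {s2: p}}; emission: {s: {obs: p}}."""
    belief = {s: prior[s] * emission[s][observations[0]] for s in prior}
    for obs in observations[1:]:
        belief = {
            s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
                * emission[s2][obs]
            for s2 in prior
        }
        total = sum(belief.values())      # normalise at each step
        belief = {s: p / total for s, p in belief.items()}
    return belief

prior = {"dialogue": 0.5, "action": 0.5}
trans = {"dialogue": {"dialogue": 0.8, "action": 0.2},
         "action":   {"dialogue": 0.3, "action": 0.7}}
emit = {"dialogue": {"static": 0.9, "fast": 0.1},
        "action":   {"static": 0.2, "fast": 0.8}}
# Two static shots followed by a fast-motion shot:
belief = forward(prior, trans, emit, ["static", "static", "fast"])
```

A full DBN would add richer state structure and multiple observation streams, but the principle of propagating belief through time is the same.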

