Multimedia Storage and Retrieval Innovations for Digital Library Systems
Latest Publications

Total documents: 17 (five years: 0)
H-index: 1 (five years: 0)
Published by: IGI Global
ISBN: 9781466609006, 9781466609013

Author(s): Chris Armbruster, Laurent Romary

After two decades of repository development, some conclusions may be drawn as to which type of repository and what kind of service best supports digital scholarly communication. In this regard, four types of publication repository may be distinguished: the subject-based repository, the research repository, the national repository system, and the institutional repository. Two important shifts in the role of repositories may be noted. With regard to content, a well-defined, high-quality corpus is essential; this implies that repository services are likely to be most successful when constructed with the user and reader in mind. With regard to service, high value to specific scholarly communities is essential; this implies that repositories are likely to be most useful to scholars when they offer dedicated services supporting the production of new knowledge. Along these lines, challenges and barriers to repository development may be identified in three key dimensions: identification and deposit of content, access and use of services, and preservation of content and sustainability of service. An indicative comparison of challenges and barriers in some major world regions is offered.


Author(s): Dmitry Kinoshenko, Vladimir Mashtalir, Vladislav Shlyakhov, Elena Yegorova

In this paper, a metric on partitions of arbitrary measurable sets, together with its special properties for content-based image retrieval based on the 'spatial' semantics of images, is proposed. The approach represents images as nested partitions produced by arbitrary segmentations, which express degrees of information refinement or coarsening. This not only supports rational content control but also enables specific search algorithms (e.g., ones invariant to image background) and yields hierarchical models of image search that reduce the number of matching operations between query and database elements.
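
The paper's metric itself is not reproduced in the abstract; as a hedged illustration of a distance between image partitions, the sketch below computes variation of information, a standard metric on partitions, between two segmentation label maps. The function names and toy data are our own, not the authors'.

```python
# Illustrative sketch only: the paper's exact partition metric is not
# reproduced here. Variation of information (VI), a standard metric on
# partitions, stands in to show how a distance between two segmentations
# (label maps) of the same image can be computed.
import numpy as np

def variation_of_information(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """VI(A, B) = H(A|B) + H(B|A), a metric on partitions of the pixel set."""
    a, b = seg_a.ravel(), seg_b.ravel()
    n = a.size
    # Joint counts over (region in A, region in B).
    joint = {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    pa, pb = {}, {}
    for (x, y), c in joint.items():
        pa[x] = pa.get(x, 0) + c
        pb[y] = pb.get(y, 0) + c
    vi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * [log p(x)/p(x,y) + log p(y)/p(x,y)]
        vi += (c / n) * (np.log(pa[x] / c) + np.log(pb[y] / c))
    return vi

# Two 4x4 toy segmentations; a refinement stays close to its parent.
coarse = np.array([[0, 0, 1, 1]] * 4)
fine   = np.array([[0, 2, 1, 1]] * 4)
print(variation_of_information(coarse, fine))   # ~0.347 nats
```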


Author(s): Stefano Berretti, Alberto Del Bimbo, Pietro Pala

In this paper, an original hybrid 2D-3D face recognition approach is proposed that uses two orthogonal face images, frontal and side views, to reconstruct the complete 3D geometry of the face. Reconstruction relies on a model-based solution in which a 3D template face model is morphed according to the correspondence of a limited set of control points identified on the frontal and side images as well as on the model. Control point identification is driven by an Active Shape Model applied to the frontal image, whereas manual assistance is subsequently required to localize control points on the side view. The reconstructed 3D model is finally matched, using the iso-geodesic regions approach, against a gallery of 3D face scans for the purpose of face recognition. Preliminary experimental results on a small database show the viability of the approach.
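
The abstract does not detail the morphing procedure, so the following is only a hedged sketch of one standard ingredient: a least-squares similarity (Procrustes) alignment of template control points to points recovered from the two views. The function names and toy points are hypothetical.

```python
# Minimal sketch, not the authors' morphing algorithm: a similarity
# transform (scale + rotation + translation) fitted in the least-squares
# sense from template control points to observed control points.
import numpy as np

def fit_similarity(src: np.ndarray, dst: np.ndarray):
    """Find s, R, t minimizing sum ||s*R*src_i + t - dst_i||^2 (Umeyama)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    s = (S * np.diag(D)).sum() / (A ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Hypothetical control points: template vs. points triangulated from the
# frontal (x, y) and side (z, y) views; here a synthetic known transform.
template_pts = np.random.rand(10, 3)
observed_pts = 1.2 * template_pts + np.array([0.1, 0.0, -0.05])
s, R, t = fit_similarity(template_pts, observed_pts)
morphed = s * template_pts @ R.T + t   # coarse global alignment of the model
```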


Author(s): Bogdan Ionescu, Alexandru Marin, Patrick Lambert, Didier Coquin, Constantin Vertan

This article discusses content-based access to video information in large video databases, and particularly the retrieval of animated movies. The authors examine temporal segmentation and propose cut, fade, and dissolve detection methods adapted to the constraints of this domain. They then present a fuzzy linguistic approach for deriving automatic symbolic/semantic content annotation in terms of color technique and action content. The proposed content descriptions are used with several data mining techniques (SVM, k-means) to automatically identify the animation genre and to classify animated movies according to their color techniques. Finally, the authors integrate all these techniques into a prototype client-server architecture providing a 3D virtual environment for interactive video retrieval.
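
As a rough illustration of temporal segmentation (not the authors' detectors), the sketch below flags hard cuts where the frame-to-frame histogram difference rises well above an adaptive threshold, a common baseline on which fade and dissolve detectors build.

```python
# Hedged baseline, not the paper's method: hard-cut detection by
# thresholding consecutive-frame histogram differences.
import numpy as np

def detect_cuts(frames, bins=32, k=3.0):
    """frames: iterable of 2-D grayscale arrays with values in [0, 255]."""
    hists = [np.histogram(f, bins=bins, range=(0, 255), density=True)[0]
             for f in frames]
    # L1 distance between consecutive histograms.
    d = np.array([np.abs(hists[i + 1] - hists[i]).sum()
                  for i in range(len(hists) - 1)])
    thresh = d.mean() + k * d.std()   # global adaptive threshold
    return [i + 1 for i in np.nonzero(d > thresh)[0]]

# Synthetic clip: 20 dark frames, then a sudden switch to bright frames.
clip = [np.full((64, 64), 40.0)] * 20 + [np.full((64, 64), 200.0)] * 20
print(detect_cuts(clip))   # -> [20]
```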


Author(s): Shaoqun Wu, Ian H. Witten

We use digital library technology to help language learners express themselves by capitalizing on the human-generated text available on the Web. From a massive collection of n-grams and their occurrence frequencies, we extract sequences that begin with the word "I", sequences that begin a question, and sequences containing statistically significant collocations. These are preprocessed, filtered, and organized as a digital library collection using the Greenstone software. Users can search the collection to see how particular words are typically used and can browse by syntactic class. The digital library is richly interconnected with other resources: it links to external vocabularies and thesauri so that users can retrieve words related to any term of interest, and it connects the collection to the Web by locating sample sentences containing these patterns and presenting them to the user. We have evaluated how useful the system is in helping students and the impact it has on their writing. Finally, language activities generated from the digital library content have been designed to help learners master important emotion-related vocabulary and expressions. We predict that applying digital library technology to assist language students will revolutionize second language learning.
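
For a concrete flavour of the extraction step (a hedged sketch, not the Greenstone build), the code below filters a hypothetical n-gram/frequency table for "I ..." patterns and question openers, and scores a collocation with pointwise mutual information; the table and corpus size are invented.

```python
# Sketch of the kind of filtering/scoring described; the (n-gram, count)
# table and corpus size below are hypothetical.
import math

ngrams = {
    "I would like to": 9000, "I am interested in": 4200,
    "How do I": 3100, "strong coffee": 800,
    "strong": 90000, "coffee": 40000,
}
total = 10_000_000   # assumed corpus size

i_patterns = {g: c for g, c in ngrams.items() if g.startswith("I ")}
questions  = {g: c for g, c in ngrams.items()
              if g.split()[0] in {"How", "What", "Why", "Where", "When"}}

def pmi(bigram: str) -> float:
    """Pointwise mutual information of a two-word collocation."""
    w1, w2 = bigram.split()
    p_xy = ngrams[bigram] / total
    p_x, p_y = ngrams[w1] / total, ngrams[w2] / total
    return math.log2(p_xy / (p_x * p_y))

print(sorted(i_patterns, key=i_patterns.get, reverse=True)[:2])
print(f"PMI('strong coffee') = {pmi('strong coffee'):.2f}")
```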


Author(s): Laurent Romary, Chris Armbruster

The current system of so-called institutional repositories, even if it was a sensible response at an earlier stage, may not answer the needs of the scholarly community, scientific communication, and associated stakeholders in a sustainable way. A robust repository infrastructure is nevertheless essential to academic work, and current institutional solutions, even when networked within a country or across Europe, have largely failed to deliver it. Consequently, a new path towards a more robust infrastructure and larger repositories is explored in order to create superior services that support the academy. A future organisation of publication repositories is advocated that is based on macroscopic academic settings providing a critical mass of interest as well as organisational coherence. Such a macro-unit may be geographical (a coherent national scheme), institutional (a large research organisation or a consortium thereof), or thematic (a specific research field organising itself in the domain of publication repositories).


Author(s): Wei-Yen Day, Chun-Yi Chi, Ruey-Cheng Chen, Pu-Jen Cheng

Data acquisition is a major concern in text classification. The excessive human effort that conventional methods require to build a quality training collection may not always be available to researchers. In this paper, the authors investigate the possibility of automatically collecting training data by sampling the Web with a set of given class names. The basic idea is to generate appropriate keywords and submit them as queries to search engines to acquire training data. The first of the two methods presented samples the concepts common to all classes; the second samples the concepts that discriminate each class. A series of experiments carried out independently on two different datasets shows that the proposed methods significantly improve classifier performance even without manually labeled training data. The authors' strategy for retrieving Web samples substantially helps conventional document classification in terms of both accuracy and efficiency.
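
A hedged sketch of the query-and-collect idea follows; `search_web` is a hypothetical stand-in for any search-engine API (here returning canned snippets so the example runs), and the trivial `expand` function takes the place of the paper's common-concept and discriminative-concept sampling.

```python
# Sketch under stated assumptions: pseudo-labeled snippets gathered per
# class name train an ordinary text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def search_web(query: str, n: int = 5) -> list[str]:
    """Hypothetical search API; canned snippets keep the sketch runnable."""
    canned = {
        "sports": ["the match ended in a draw", "the striker scored twice"],
        "finance": ["shares fell as markets opened", "the bank raised rates"],
    }
    return canned.get(query, [])[:n]

def build_training_set(class_names, expand):
    docs, labels = [], []
    for cls in class_names:
        for query in expand(cls):          # populate keywords per class
            for snippet in search_web(query):
                docs.append(snippet)       # snippet becomes a training doc
                labels.append(cls)         # labeled by the originating class
    return docs, labels

docs, labels = build_training_set(["sports", "finance"], expand=lambda c: [c])
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(docs, labels)
print(clf.predict(["markets opened lower as shares fell"]))
```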


Author(s): Paul Clough, Irene Eleta

Digital libraries remove physical barriers to accessing information, but the language barrier remains, owing to multilingual collections and the linguistic diversity of users. This study aims to understand the effect of users' language skills and field of knowledge on their language preferences when searching for information online, and to provide new insights into access to multilingual digital libraries. Both quantitative and qualitative data were gathered using a questionnaire; the results show that language skills and field of knowledge affect the choice of language for searching online. These factors also determine interest in cross-language information retrieval: language-related fields constitute the best potential group of users, followed by the Arts and Humanities and the Social Sciences.


Author(s): Minh-Thang Luong, Thuy Dung Nguyen, Min-Yen Kan

Scholarly digital libraries increasingly provide analytics over the information within documents themselves, including information about logical document structure that is useful to downstream components such as search, navigation, and summarization. In this paper, the authors describe SectLabel, a module that extends existing software to detect the logical structure of a document from PDF files using the formalism of conditional random fields. While previous work has assumed access only to the raw text representation of the document, a key aspect of this work is the use of a richer document representation that includes features from optical character recognition (OCR), such as font size and text position. Experiments reveal that using such rich features improves logical structure detection by a significant 9 F1 points over a suitable baseline, motivating the use of richer document representations in other digital library applications.
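
The sketch below is not SectLabel itself but shows the same idea with the sklearn-crfsuite package: each line of a document becomes a token whose features mix text cues with assumed OCR layout cues (font size, page position), and a linear-chain CRF labels the sequence. The labels and feature names are illustrative, not the paper's.

```python
# Illustrative linear-chain CRF over document lines; feature keys
# ('font_size', 'y_pos') stand in for the OCR-derived features the
# paper describes.
import sklearn_crfsuite

def line_features(line):
    """line: dict with illustrative keys 'text', 'font_size', 'y_pos'."""
    text = line["text"]
    return {
        "lower": text.lower()[:20],
        "is_upper": text.isupper(),
        "starts_with_digit": text[:1].isdigit(),
        "font_size": str(line["font_size"]),     # OCR-derived feature
        "y_band": str(int(line["y_pos"] * 10)),  # coarse page position
    }

# One toy document: lines with gold logical-structure labels.
doc = [
    {"text": "A STUDY OF X", "font_size": 18, "y_pos": 0.05},
    {"text": "1. Introduction", "font_size": 14, "y_pos": 0.15},
    {"text": "We present ...", "font_size": 10, "y_pos": 0.20},
]
X = [[line_features(l) for l in doc]]
y = [["title", "sectionHeader", "bodyText"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```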


Author(s): Shyamosree Pal, Partha Bhowmick, Arindam Biswas, Bhargab B. Bhattacharya

This paper introduces how Gestalt properties can be used to identify various components in a document image. The observation that human vision takes a holistic rather than a piecemeal approach is shown to be useful for document analysis. Since the major constituent components (textual or non-textual) of a document page are arranged in a rectilinear fashion, a rectilinear/isothetic decomposition of the components is performed on the page. After representing the page as a feature set of polygonal covers corresponding to the distinct regions of interest, each polygon is iteratively decomposed into sub-polygons tightly enclosing the corresponding sub-components, capturing the overall information as well as the necessary detail to the desired level of precision. These components and sub-components are then analyzed using Gestalt laws/properties, which are explained in detail in the context of this work. Text regions, tabular structures, and various graphic objects readily admit some of the Gestalt properties. We have tested our algorithm on several benchmark datasets, and relevant results are presented to demonstrate the effectiveness and elegance of the proposed method.
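
The authors' decomposition is not spelled out in the abstract; as a hedged stand-in, the classic recursive XY-cut below conveys the flavour of isothetic (axis-parallel) decomposition by splitting a binary page mask along empty rows into rectilinear blocks.

```python
# Not the authors' algorithm: a recursive XY-cut sketch that splits a
# binary page mask (1 = ink) along empty rows into isothetic boxes.
import numpy as np

def xy_cut(mask, top=0, left=0, min_gap=1):
    """Return (row0, col0, row1, col1) boxes tightly enclosing components."""
    rows = mask.any(axis=1)
    if not rows.any():
        return []
    # Trim empty margins first.
    r0, r1 = np.flatnonzero(rows)[[0, -1]]
    c0, c1 = np.flatnonzero(mask.any(axis=0))[[0, -1]]
    sub = mask[r0:r1 + 1, c0:c1 + 1]
    # Split on the first run of empty rows, if any.
    empty = np.flatnonzero(~sub.any(axis=1))
    if empty.size >= min_gap:
        cut = empty[0]
        return (xy_cut(sub[:cut], top + r0, left + c0, min_gap)
                + xy_cut(sub[cut:], top + r0 + cut, left + c0, min_gap))
    return [(top + r0, left + c0, top + r1, left + c1)]

page = np.zeros((10, 8), dtype=bool)
page[1:3, 1:7] = True    # a "title" block
page[5:9, 1:4] = True    # a "paragraph" block
print(xy_cut(page))      # -> [(1, 1, 2, 6), (5, 1, 8, 3)]
```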

