Structure- and Content-Based Retrieval for XML Documents


A Multimedia Document Retrieval System Supporting Structure- and Content-Based Retrieval

Design and Management of Multimedia Information Systems ◽

10.4018/978-1-930708-00-6.ch008 ◽

2011 ◽

pp. 152-164

Author(s):

Du-Seok Jin ◽

Jae-Woo Chang

Keyword(s):

Retrieval System ◽

Storage System ◽

Document Retrieval ◽

Index Structure ◽

Multimedia Document ◽

Content Based Retrieval ◽

Web Documents ◽

Storage Overhead ◽

And Storage ◽

System Effectiveness

Users now commonly acquire a variety of multimedia documents through the World Wide Web. As the number of Web documents increases dramatically, we need a multimedia document retrieval system that supports both structure-based and content-based retrieval. To support structure-based retrieval, we design efficient index structures (i.e., keyword, structure, element, and attribute) and implement them using the o2store storage system. For content-based retrieval, we implement a high-dimensional index structure, based on the X-tree, for color and shape features. Finally, we evaluate the performance of our multimedia document retrieval system in terms of system efficiency (retrieval time, insertion time, and storage overhead) and system effectiveness (recall and precision).
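The effectiveness measures named in this abstract, recall and precision, can be illustrated with a minimal sketch. The function name and the sample document identifiers below are hypothetical and are not taken from the paper; the sketch only shows the standard set-based definitions for a single query.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 documents returned, 3 documents truly relevant.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found, so the two measures differ even for the same result list.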


Structure- and Content-Based Retrieval for XML Documents

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch473 ◽

2005 ◽

pp. 2662-2664

Author(s):

Jae-Woo Chang

Keyword(s):

Expressive Power ◽

Basic Element ◽

Multimedia Data ◽

Index Structure ◽

Markup Language ◽

Content Based Retrieval ◽

Document Structure ◽

Xml Documents ◽

Xml Document ◽

Extensible Markup

XML was proposed in 1996 as a standard markup language for Web documents (Extensible Markup Language, 2000). It is as expressive as SGML and as easy to use as HTML. Recently, it has become common for users to acquire through the Web a variety of multimedia documents written in XML. Meanwhile, because the number of XML documents is increasing dramatically, it is difficult to find the specific XML document a user needs. Moreover, an XML document not only has a logical, hierarchical structure but also contains multimedia data, such as images and video. Thus, it is necessary to retrieve XML documents based on both document structure and image content. Supporting structure-based retrieval requires four efficient index structures (keyword, structure, element, and attribute indexes) built by indexing XML documents with the element as the basic unit. Supporting content-based retrieval requires a high-dimensional index structure that stores and retrieves both color and shape feature vectors efficiently.
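The four structure-based indexes described in this abstract can be sketched with in-memory dictionaries. This is only an illustration under stated assumptions: the abstract does not give the actual index layout, so the function `build_indexes`, the element identifiers, and the sample document are all hypothetical, and the structure index is reduced to a simple child-to-parent map.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_indexes(doc_id, xml_text):
    """Index one XML document, using the element as the basic unit."""
    root = ET.fromstring(xml_text)
    keyword = defaultdict(set)    # word -> ids of elements whose text contains it
    element = defaultdict(set)    # tag name -> ids of elements with that tag
    attribute = defaultdict(set)  # (attribute name, value) -> element ids
    structure = {}                # element id -> parent element id (None for root)
    counter = 0

    def visit(node, parent_id):
        nonlocal counter
        eid = (doc_id, counter)   # hypothetical id: document id plus preorder number
        counter += 1
        structure[eid] = parent_id
        element[node.tag].add(eid)
        for name, value in node.attrib.items():
            attribute[(name, value)].add(eid)
        # Only the element's leading text is indexed; tail text is ignored here.
        for word in (node.text or "").lower().split():
            keyword[word].add(eid)
        for child in node:
            visit(child, eid)

    visit(root, None)
    return keyword, structure, element, attribute


# Hypothetical sample document.
SAMPLE = (
    '<exhibit id="e1"><title>Celadon vase</title>'
    '<image format="jpeg">vase photo</image></exhibit>'
)
keyword, structure, element, attribute = build_indexes("doc1", SAMPLE)

# Structure-based query: <title> elements containing "vase" directly under an <exhibit>.
hits = [
    eid
    for eid in keyword["vase"] & element["title"]
    if structure[eid] in element["exhibit"]
]
```

Keeping the indexes separate lets a query combine a keyword condition, a tag condition, and a parent-child condition through set intersections, which is the general idea behind indexing by element unit.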


Developing an XML Document Retrieval System for a Digital Museum

Computational Science and Its Applications – ICCSA 2005 - Lecture Notes in Computer Science ◽

10.1007/11424758_9 ◽

2005 ◽

pp. 77-86

Author(s):

Jae-Woo Chang

Keyword(s):

Retrieval System ◽

Document Retrieval ◽

Digital Museum ◽

Xml Document


XML Document Retrieval System Based on Document Structure and Image Content for Digital Museum

Lecture Notes in Computer Science - Advanced Web and Network Technologies, and Applications ◽

10.1007/11610496_12 ◽

2006 ◽

pp. 107-111 ◽

Cited By ~ 1

Author(s):

Jae-Woo Chang ◽

Yeon-Jung Kim

Keyword(s):

Retrieval System ◽

Document Retrieval ◽

Image Content ◽

Digital Museum ◽

Document Structure ◽

Xml Document


XML Document Retrieval System Supporting Multimedia Web Service for Digital Museum

IEEE International Conference on Web Services (ICWS 2007) ◽

10.1109/icws.2007.198 ◽

2007 ◽

Author(s):

Jae-Woo Chang ◽

Young-jin Kim

Keyword(s):

Web Service ◽

Retrieval System ◽

Document Retrieval ◽

Digital Museum ◽

Xml Document


An XML Document Retrieval System Supporting Structure- and Content-Based Queries

Conceptual Modeling for New Information Systems Technologies - Lecture Notes in Computer Science ◽

10.1007/3-540-46140-x_25 ◽

2002 ◽

pp. 320-333

Author(s):

Jae-Woo Chang

Keyword(s):

Retrieval System ◽

Document Retrieval ◽

Supporting Structure ◽

Xml Document


Efficient Compression and Storage of XML OLAP Cubes

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2015070101 ◽

2015 ◽

Vol 11 (3) ◽

pp. 1-25

Author(s):

Doulkifli Boukraa ◽

Mohammed Amin Bouchoukh ◽

Omar Boussaid

Keyword(s):

The Other ◽

Compression Technique ◽

Xml Documents ◽

The Third ◽

Query Response Time ◽

Xml Document ◽

Before And After ◽

The One ◽

And Storage ◽

Basic Configuration

In this paper, the authors present an approach to efficiently compress XML OLAP cubes. They propose a multidimensional snowflake schema of the cube as the basic physical configuration. The cube is then composed of one XML fact document and as many XML documents as there are dimension hierarchy members. The basic configuration is reorganized in two ways by deliberately adding data redundancy, in order to achieve a better compression ratio on the one hand and to improve query response time on the other. In the second configuration, all the documents of the cube are merged into a single XML document. In the third configuration, each reference between the fact and the dimensions, or between the members of a dimension hierarchy, is replaced by the whole referenced XML fragment. To the three physical configurations of the cube, the authors apply a new compression technique named XCC. They demonstrate the efficiency of the third configuration before and after compression, and they also show the efficiency of their compression technique when applied to XML OLAP cubes.


Improving Document Retrieval with Spelling Correction for Weak and Fabricated Indonesian-Translated Hadith

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1913 ◽

2020 ◽

Vol 4 (3) ◽

pp. 551-557

Author(s):

Muhammad Zaky Ramadhan ◽

Kemas Muslim Lhaksmana

Keyword(s):

Retrieval System ◽

Islamic Law ◽

Vector Space Model ◽

Document Retrieval ◽

The Internet ◽

Average Precision ◽

Spelling Correction ◽

Space Model ◽

The Mean ◽

The Web

Hadith has several levels of authenticity, among which are weak (dhaif) and fabricated (maudhu) hadith that may not originate from the prophet Muhammad PBUH and thus should not be considered in concluding an Islamic law (sharia). However, many such hadiths are commonly mistaken for authentic hadiths among ordinary Muslims. To distinguish such hadiths easily, this paper proposes a method that checks the authenticity of a hadith by comparing it with a collection of fabricated hadiths in Indonesian. The proposed method applies the vector space model and also performs spelling correction using SymSpell, to check whether spelling correction can improve the accuracy of hadith retrieval; this has not been done in previous work, and typos are common in Indonesian-translated hadiths in raw text from the Web and social media. The experimental results show that spell checking raises the mean average precision and recall to 81% (from 73%) and 89% (from 80%), respectively. The accuracy gained from spelling correction therefore makes the hadith retrieval system more feasible, and the approach is encouraged for future work because it can correct the typos that are common in raw text on the Internet.
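The pipeline in this abstract (correct the query's spelling, then rank documents with the vector space model) can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it substitutes Python's standard `difflib.get_close_matches` for SymSpell, uses a simple TF-IDF weighting with cosine similarity, and runs on a hypothetical three-document English corpus rather than the Indonesian collection used in the paper.

```python
import math
from collections import Counter
from difflib import get_close_matches

# Hypothetical toy corpus; the paper's collection is not reproduced here.
DOCS = {
    "d1": "knowledge travels across distant lands",
    "d2": "cleanliness keeps the household healthy",
    "d3": "patience brings relief after hardship",
}


def tokenize(text):
    return text.lower().split()


DOC_TOKENS = {d: tokenize(t) for d, t in DOCS.items()}
VOCAB = sorted({w for toks in DOC_TOKENS.values() for w in toks})
DF = Counter(w for toks in DOC_TOKENS.values() for w in set(toks))
# Smoothed inverse document frequency (one common variant, chosen for the sketch).
IDF = {w: math.log(len(DOCS) / DF[w]) + 1.0 for w in VOCAB}


def vectorize(tokens):
    """TF-IDF vector as a sparse dict; out-of-vocabulary tokens are dropped."""
    tf = Counter(t for t in tokens if t in IDF)
    return {w: c * IDF[w] for w, c in tf.items()}


def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def correct(tokens):
    """Replace unknown tokens with the closest vocabulary word, if one is close enough."""
    fixed = []
    for t in tokens:
        if t in IDF:
            fixed.append(t)
        else:
            match = get_close_matches(t, VOCAB, n=1, cutoff=0.8)
            fixed.append(match[0] if match else t)
    return fixed


def search(query, use_correction=True):
    tokens = tokenize(query)
    if use_correction:
        tokens = correct(tokens)
    q = vectorize(tokens)
    scores = {d: cosine(q, vectorize(toks)) for d, toks in DOC_TOKENS.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# A query with two typos, searched with and without correction.
ranked, scores = search("cleanlines houshold")
_, raw_scores = search("cleanlines houshold", use_correction=False)
```

Without correction, the misspelled query terms match nothing in the vocabulary and every document scores zero; with correction, the query maps onto vocabulary words and the matching document ranks first. That is the effect the abstract measures at scale through mean average precision and recall.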


Document Retrieval System Tolerant of Segmentation Errors of Document Images

Ninth International Workshop on Frontiers in Handwriting Recognition ◽

10.1109/iwfhr.2004.36 ◽

2004 ◽

Cited By ~ 3

Author(s):

T. Nagasaki ◽

T. Takahashi ◽

K. Marukawa

Keyword(s):

Retrieval System ◽

Document Retrieval ◽

Document Images
