An Approach to Extracting Knowledge From Legacy Documents

Author(s):  
Richard Crowder ◽  
Yee-Wie Sim

Organisations are increasingly information intensive; hence, providing access to data that is trapped in various proprietary forms, including catalogues, databases, human resource systems and internally generated documents, is becoming a significant and challenging task. The authors have undertaken research into approaches to capture relevant knowledge from legacy documents. This is achieved by converting the legacy documents to eXtensible Markup Language (XML) documents in which the output is semantically tagged. Once in XML form, the data can be easily transformed. This paper describes the development of tools to automate the process of converting legacy documents to XML documents. The purpose of this work is to improve the efficiency and reliability of an Expertise Finder suitable for use within an engineering design environment. We also show that querying the resultant XML versions of legacy documents provides better results than a basic text search over the identical documents when used within an Expertise Finder.
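The contrast between a basic text search and a query over semantically tagged XML can be sketched as follows. This is a minimal illustration, not the authors' tooling: the element names (`report`, `author`, `topic`, `body`), the sample document, and both helper functions are hypothetical, and only Python's standard library is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantically tagged output for one legacy report.
# The element names and the content are illustrative assumptions only.
TAGGED = """
<report>
  <author>J. Smith</author>
  <topic>turbine blade fatigue</topic>
  <body>Reviewed by A. Jones, who has no background in fatigue analysis.</body>
</report>
""".strip()


def text_search(raw, term):
    """Basic text search: the term matches anywhere in the document."""
    return term.lower() in raw.lower()


def authors_for_topic(xml_text, term):
    """Structured query: return authors only if the term occurs in a <topic>."""
    root = ET.fromstring(xml_text)
    topics = [t.text or "" for t in root.iter("topic")]
    if any(term.lower() in t.lower() for t in topics):
        return [a.text for a in root.iter("author") if a.text]
    return []


if __name__ == "__main__":
    # The text search cannot tell an author from a name mentioned in passing.
    print(text_search(TAGGED, "Jones"))
    # The structured query ties the topic to the tagged author.
    print(authors_for_topic(TAGGED, "fatigue"))
    print(authors_for_topic(TAGGED, "Jones"))
```

In this sketch the text search reports a hit for a name that merely appears in the body, whereas the structured query returns only the tagged author for a matching topic, which is the kind of difference that matters when looking for expertise.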

Author(s):  
Badya Al-Hamadani ◽  
Joan Lu

The eXtensible Markup Language (XML) is a World Wide Web Consortium (W3C) recommendation that has been widely used in both commerce and research. As the importance of XML documents increases, so does the need to process them. This chapter illustrates the methodology used throughout the research, discussing each of its parts and how they were adopted in the research.


Author(s):  
Mohammad Moradi ◽  
MohammadReza Keyvanpour

Since its introduction, the eXtensible Markup Language (XML), owing to its expressive capabilities and flexibility, has become the de facto standard for representing, storing, and interchanging data on the Web. Such features have made XML one of the building blocks of the Semantic Web. Moreover, since XML documents can be considered from content, structural, and semantic aspects, leveraging their semantics is useful and applicable in different domains. However, XML does not by itself provide any built-in mechanisms for governing semantics. For this reason, many studies have been conducted on the representation of semantics within and from XML documents. This paper gives an overview of the different aspects of this topic, with an emphasis on the state of semantics in XML and the methods for representing it.


2012 ◽  
Author(s):  
Ren Hui Gong ◽  
Ziv Yaniv

The Insight Segmentation and Registration Toolkit (ITK) previously provided a framework for parsing Extensible Markup Language (XML) documents using the Simple API for XML (SAX) framework. While this programming model is memory efficient, it places most of the implementation burden on the user. We provide an implementation of the Document Object Model (DOM) framework for parsing XML documents. Using this model, user code is greatly simplified, shifting most of the implementation burden from the user to the framework. The provided implementation consists of two tiers. The lower level tier provides functionality for parsing XML documents and loading the tree structure into memory. It then allows the user to query and retrieve specific entries. The upper tier uses this functionality to provide an interface for mimicking a serialization and de-serialization mechanism for ITK objects. The implementation described in this document was incorporated into ITK as part of release 4.2.
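The difference between the two programming models can be sketched with Python's standard library. This is not ITK's API (ITK is a C++ toolkit); the sample document, the element names (`params`, `spacing`, `iterations`), and the helper names are hypothetical and serve only to contrast the SAX and DOM styles.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical parameter file; the element names are illustrative only.
DOC = "<params><spacing>0.5</spacing><iterations>200</iterations></params>"


class SpacingHandler(xml.sax.ContentHandler):
    """SAX style: user code must track parser state across callbacks."""

    def __init__(self):
        super().__init__()
        self._in_spacing = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "spacing":
            self._in_spacing = True

    def characters(self, content):
        # Text may arrive in several pieces, so collect the chunks.
        if self._in_spacing:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "spacing":
            self._in_spacing = False

    @property
    def spacing(self):
        return float("".join(self._chunks))


def spacing_sax(text):
    """Stream the document through the handler and read the collected value."""
    handler = SpacingHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.spacing


def spacing_dom(text):
    """DOM style: load the whole tree into memory, then query it directly."""
    dom = minidom.parseString(text)
    node = dom.getElementsByTagName("spacing")[0]
    return float(node.firstChild.data)


if __name__ == "__main__":
    print(spacing_sax(DOC), spacing_dom(DOC))
```

Both functions return the same value, but the SAX version needs a stateful handler class while the DOM version is a short query over the in-memory tree, at the cost of holding the whole tree in memory.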


2000 ◽  
Vol 39 (01) ◽  
pp. 50-55 ◽  
Author(s):  
S. Yamazaki ◽  
Y. Satomura

Abstract: A Template Definition Language (TDL) was developed to share knowledge of how to construct an electronic patient record (EPR) template. Based on the Extensible Markup Language (XML), TDL has been designed to be independent of EPR platforms and databases. Our research on TDL was conducted by evaluating how various templates are described in currently available EPRs and by comparing it with some electronic clinical guidelines. We conclude that TDL is sufficient for the objective but that its algorithm for describing dynamic changes still needs improvement.

