Identifying Logical Structure and Content Structure in Loosely-Structured Documents

Four-Valued Knowledge Augmentation for Structured Document Retrieval

2003 ◽

Vol 11 (01) ◽

pp. 67-86 ◽

Cited By ~ 2

Author(s):

M. LALMAS ◽

T. ROLLEKE

Keyword(s):

Formal Model ◽

Logical Structure ◽

Document Retrieval ◽

Content Based Retrieval ◽

Structured Document ◽

Structured Documents ◽

Structured Document Retrieval

Structured documents are composed of objects with a content and a logical structure. Retrieving structured documents effectively requires models that support content-based retrieval of objects while taking their logical structure into account, so that the relevance of an object is based not solely on its content but also on the logical structure among objects. This paper proposes a formal model for representing structured documents in which the content of an object is viewed as the knowledge contained in that object, and the logical structure among objects is captured by a process of knowledge augmentation: the knowledge contained in an object is augmented with that of its structurally related objects. The knowledge augmentation process takes into account the fact that knowledge can be incomplete and can become inconsistent.

Semi-Structured Document Classification

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch271 ◽

2011 ◽

pp. 1779-1786

Author(s):

Ludovic Denoyer

Keyword(s):

Machine Learning ◽

Information Sources ◽

Major Change ◽

Logical Structure ◽

Document Classification ◽

Document Collections ◽

Heterogeneous Information ◽

Structured Document ◽

Structured Documents ◽

Different Content

Document classification has developed over the last ten years, using techniques originating from the pattern recognition and machine learning communities. All these methods operate on flat text representations in which word occurrences are considered independent. The recent paper (Sebastiani, 2002) gives a very good survey of textual document classification. With the development of structured textual and multimedia documents, and with the increasing importance of structured document formats like XML, the nature of documents is changing. Structured documents usually have a much richer representation than flat ones. They have a logical structure. They are often composed of heterogeneous information sources (e.g. text, image, video, metadata, etc.). Another major change with structured documents is the possibility of accessing document elements or fragments. The development of classifiers for structured content is a new challenge for the machine learning and IR communities. A classifier for structured documents should be able to make use of the different content information sources present in an XML document and to classify both full documents and document parts. It should adapt easily to a variety of different sources (e.g. to different Document Type Definitions). It should be able to scale to large document collections.

Semi-Structured Document Classification

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch191 ◽

2011 ◽

pp. 1015-1021

Author(s):

Ludovic Denoyer ◽

Patrick Gallinari

Keyword(s):

Machine Learning ◽

Information Sources ◽

Major Change ◽

Logical Structure ◽

Document Classification ◽

Document Collections ◽

Heterogeneous Information ◽

Structured Document ◽

Structured Documents ◽

Different Content

Document classification has developed over the last 10 years, using techniques originating from the pattern recognition and machine-learning communities. All these methods operate on flat text representations, in which word occurrences are considered independent. The recent paper by Sebastiani (2002) gives a very good survey of textual document classification. With the development of structured textual and multimedia documents and with the increasing importance of structured document formats like XML, the nature of documents is changing. Structured documents usually have a much richer representation than flat ones. They have a logical structure. They are often composed of heterogeneous information sources (e.g., text, image, video, metadata, etc.). Another major change with structured documents is the possibility of accessing document elements or fragments. The development of classifiers for structured content is a new challenge for the machine-learning and IR communities. A classifier for structured documents should be able to make use of the different content information sources present in an XML document and to classify both full documents and document parts. It should adapt easily to a variety of different sources (e.g., different document type definitions). It should be able to scale to large document collections.

An indexing model for structured documents to support queries on content, structure and attributes

Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98- ◽

10.1109/adl.1998.670383 ◽

2002 ◽

Author(s):

Tuong Dao

Keyword(s):

Content Structure ◽

Indexing Model ◽

Structured Documents

Logical structure analysis and generation for structured documents: A syntactic approach

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2003.1232278 ◽

2003 ◽

Vol 15 (5) ◽

pp. 1277-1294 ◽

Cited By ~ 7

Author(s):

Kyong-Ho Lee ◽

Yoon-Chul Choy ◽

Sung-Bae Cho

Keyword(s):

Structure Analysis ◽

Logical Structure ◽

Syntactic Approach ◽

Structured Documents

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser

10.18653/v1/2021.nllp-1.15 ◽

2021 ◽

Author(s):

Yuta Koreeda ◽

Christopher Manning

Keyword(s):

Logical Structure ◽

Structured Documents

Logical structure based semantic relationship extraction from semi-structured documents

Proceedings of the 15th international conference on World Wide Web - WWW '06 ◽

10.1145/1135777.1136016 ◽

2006 ◽

Cited By ~ 1

Author(s):

Zhang Kuo ◽

Wu Gang ◽

Li JuanZi

Keyword(s):

Logical Structure ◽

Semantic Relationship ◽

Relationship Extraction ◽

Structured Documents

Marking Up is Not Enough

Methods of Information in Medicine ◽

10.1055/s-0038-1634939 ◽

1993 ◽

Vol 32 (04) ◽

pp. 272-273 ◽

Cited By ~ 3

Author(s):

A. L. Rector

Keyword(s):

Health Care ◽

Intelligent Processing ◽

Structured Documents ◽

Electronic Health ◽

Health Care Records

Response to: Essin DJ. Intelligent processing of loosely structured documents as a strategy for organizing electronic health care records. Meth Inform Med 1993; 32: 265.

Intelligent Processing of Loosely Structured Documents as a Strategy for Organizing Electronic Health Care Records

Methods of Information in Medicine ◽

10.1055/s-0038-1634938 ◽

1993 ◽

Vol 32 (04) ◽

pp. 265-268 ◽

Cited By ~ 11

Author(s):

D. J. Essin

Keyword(s):

Health Care ◽

Health Care Organization ◽

Database Systems ◽

Relevant Information ◽

Full Potential ◽

Care Organization ◽

Intelligent Processing ◽

Data Interchange ◽

Structured Documents ◽

Health Care Records

Loosely structured documents can capture more relevant information about medical events than is possible using today’s popular databases. In order to realize the full potential of this increased information content, techniques will be required that go beyond the static mapping of stored data into a single, rigid data model. Through intelligent processing, loosely structured documents can become a rich source of detailed data about actual events that can support the wide variety of applications needed to run a health-care organization, document medical care or conduct research. Abstraction and indirection are the means by which dynamic data models and intelligent processing are introduced into database systems. A system designed around loosely structured documents can evolve gracefully while preserving the integrity of the stored data. The ability to identify and locate the information contained within documents offers new opportunities to exchange data that can replace more rigid standards of data interchange.

Library Blogs of Selected Kendriya Vidyalayas of Kerala Region : a State-of-the-Art Study of School Library Blogs in Terms of Theme, Content, Structure and Web 2.0 Tools Used

International Journal of Scientific Research ◽

10.15373/22778179/feb2014/77 ◽

2012 ◽

Vol 3 (2) ◽

pp. 231-234

Author(s):

Ramasamy, K ◽

Padma, P

Keyword(s):

Web 2.0 ◽

State Of The Art ◽

School Library ◽

Web 2.0 Tools ◽

Content Structure

FOUR-VALUED KNOWLEDGE AUGMENTATION FOR STRUCTURED DOCUMENT RETRIEVAL