HEURISTIC CLASSIFICATION OF OFFICE DOCUMENTS

1994 ◽  
Vol 03 (02) ◽  
pp. 233-265 ◽  
Author(s):  
XIAOLONG HAO ◽  
JASON T.L. WANG ◽  
MICHAEL P. BIEBER ◽  
PETER A. NG

Document Processing Systems (DPSs) help office workers manage information, and document classification is one of their major functions. In this paper we present a sample-based approach to document classification that analyzes a document’s layout and conceptual structures. We represent the layout structure as an ordered labeled tree, obtained through a procedure known as nested segmentation, and the conceptual structure as a set of attribute type pairs. Layout similarities between the document to be classified and the sample documents are determined by a previously developed approximate tree matching toolkit. Conceptual similarities are determined by analyzing the documents’ contents and calculating their degree of conceptual closeness. The document type is identified by computing both the layout and conceptual similarities between the document to be classified and the samples in the document sample base. Experimental results demonstrate the effectiveness of the proposed techniques.
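The sample-based idea in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not give the tree matching algorithm, the closeness measure, or how the two similarities are combined. Here a comparison of preorder label sequences stands in for approximate tree matching, Jaccard overlap stands in for conceptual closeness, attribute type pairs are read as (attribute, type) tuples, and the two scores are combined by a hypothetical weighted sum. The sample documents are invented.

```python
from difflib import SequenceMatcher


def preorder(tree):
    """Flatten an ordered labeled tree (label, [children]) into a label list."""
    label, children = tree
    labels = [label]
    for child in children:
        labels.extend(preorder(child))
    return labels


def layout_similarity(a, b):
    # Stand-in for approximate tree matching (an assumption, not the
    # authors' toolkit): compare the preorder label sequences of both trees.
    return SequenceMatcher(None, preorder(a), preorder(b)).ratio()


def conceptual_similarity(a, b):
    # Assumed closeness measure: Jaccard overlap of (attribute, type) pairs.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(doc, samples, weight=0.5):
    """Return (type, score) of the sample with the highest combined score."""
    best = None
    for doc_type, sample in samples:
        score = (weight * layout_similarity(doc["layout"], sample["layout"])
                 + (1 - weight) * conceptual_similarity(doc["concepts"],
                                                        sample["concepts"]))
        if best is None or score > best[1]:
            best = (doc_type, score)
    return best


# Hypothetical sample base with two document types.
memo = {
    "layout": ("page", [("header", []),
                        ("body", [("para", []), ("para", [])])]),
    "concepts": {("sender", "name"), ("date", "date"), ("subject", "text")},
}
letter = {
    "layout": ("page", [("letterhead", []), ("address", []),
                        ("body", [("para", [])]), ("signature", [])]),
    "concepts": {("sender", "name"), ("recipient", "name"), ("date", "date")},
}
samples = [("memo", memo), ("letter", letter)]

# Hypothetical document to classify.
doc = {
    "layout": ("page", [("header", []), ("body", [("para", [])])]),
    "concepts": {("sender", "name"), ("subject", "text")},
}

doc_type, score = classify(doc, samples)
print(doc_type)
```

In this toy sample base the document is closer to the memo sample on both measures, so it is labeled as a memo. A real system would replace both similarity functions with the measures the paper describes.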

2019 ◽  
Vol 39 (5) ◽  
pp. 767-781 ◽  
Author(s):  
Liang Wei ◽  
Chonggang Xu ◽  
Steven Jansen ◽  
Hang Zhou ◽  
Bradley O Christoffersen ◽  
...  

1995 ◽  
Vol 18 (1) ◽  
pp. 202-203
Author(s):  
James Steele

Understanding how conceptual structures inform stone tool production and use would help us resolve the issue of a pongid-hominid dichotomy in brain organisation and cognitive ability. Evidence from ideational apraxia suggests that the planning of linguistic and manipulative behaviours is not colocalized in homologous circuits. An alternative account in terms of the evolutionary expansion of the whole prefrontal-premotor area may be more plausible.


2020 ◽  
Author(s):  
Fatimah Alshamari ◽  
Abdou Youssef

Document classification is a fundamental task for many applications, including document annotation, document understanding, and knowledge discovery. This is especially true in STEM fields, where the growth rate of scientific publications is exponential and where document processing and understanding are essential to technological advancement. Classifying a new publication into a specific domain based on its content is expensive in both cost and time, so there is high demand for a reliable document classification system. In this paper, we focus on the classification of mathematics documents, which consist of English text and mathematical formulas and symbols. The paper addresses two key questions. The first is whether math-document classification performance is affected by math expressions and symbols, either alone or in conjunction with the text of the documents. Our investigations show that Text-Only embedding produces better classification results. The second is the optimization of a deep learning (DL) model, an LSTM combined with a one-dimensional CNN, for math document classification. We examine the model with several input representations, key design parameters, and decision choices, and identify the best input representation for math document classification.


Author(s):  
Elena Poltavskaya

The paper substantiates the need for structural systematization to reveal and compare the conceptual framework of library forms separated into the theoretical type reflected in the ideal construct of “Stolyarov’s library”. The structure of a library form is determined indirectly, through conceptual schemes. The concepts that correspond to the respective library forms are represented both as logical systems (as if the library were being established in reality) and through the schemes. Groupings of the four elements of the library type reflect the conceptual schemes: libraries as a social institution (corresponding to public libraries) and personal libraries (used by individuals and families). Using conceptual schemes for systematization makes it possible to divide all libraries, according to their structure, into two groups that differ significantly in their social mission (serving communities or society, versus serving individuals or individual families). Differentiating existing libraries by their conceptual structure would further make it possible to design a general and consistent hierarchical library classification. Structural systematization is an essential intermediate stage in developing a natural classification.


Author(s):  
Collin F. Baker ◽  
Josef Ruppenhofer

The classification of verbs in Levin's (1993) English Verb Classes and Alternations: A Preliminary Investigation, which is based on both intuitive semantic grouping and participation in valence alternations, is often used by the NLP community as evidence of the semantic similarity of verbs (Jing & McKeown 1998; Lapata & Brew 1999; Kohl et al. 1998). In this paper, we compare the Levin classification with the work of the FrameNet project (Fillmore & Baker 2001), where words (not just verbs) are grouped according to the conceptual structures (frames) that underlie them, and their combinatorial patterns are inductively derived from corpus evidence. This means that verbs grouped together in FrameNet (FN) might be semantically similar but have different (or no) alternations, and that verbs which share the same alternation might be represented in two different semantic frames.
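The kind of mismatch described here, where two groupings of the same words disagree about which words belong together, can be made concrete with a small sketch. The class names, frame names, and verb labels below (`class_A`, `frame_X`, `v1`, and so on) are placeholders invented for illustration. They are not taken from Levin (1993) or from FrameNet, and the pairwise comparison is only one possible way to contrast two classifications, not the authors' method.

```python
from itertools import combinations


def co_grouped_pairs(grouping):
    """Return the set of unordered item pairs that share at least one group."""
    pairs = set()
    for members in grouping.values():
        pairs.update(frozenset(p) for p in combinations(sorted(members), 2))
    return pairs


# Placeholder groupings (hypothetical, not real Levin classes or FN frames).
alternation_classes = {
    "class_A": {"v1", "v2", "v3"},
    "class_B": {"v4", "v5"},
}
frames = {
    "frame_X": {"v1", "v2"},
    "frame_Y": {"v3", "v4"},
    "frame_Z": {"v5"},
}

by_class = co_grouped_pairs(alternation_classes)
by_frame = co_grouped_pairs(frames)

# Pairs that share an alternation class but fall into different frames.
same_class_different_frame = by_class - by_frame
# Pairs that share a frame but belong to different alternation classes.
same_frame_different_class = by_frame - by_class
# Pairs on which both groupings agree.
agreed = by_class & by_frame

print(sorted(sorted(p) for p in same_class_different_frame))
print(sorted(sorted(p) for p in same_frame_different_class))
```

Each of the two difference sets corresponds to one of the cases named in the abstract: verbs that share an alternation but are split across frames, and verbs that share a frame but differ in their alternations.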


2016 ◽  
Vol 8 (2) ◽  
pp. 141-165 ◽  
Author(s):  
Karen Sullivan

Conceptual Metaphor Theory (CMT) aims to represent the conceptual structure of metaphors rather than the structure of metaphoric language. The theory does not explain, for example, which aspects of metaphoric language evoke which conceptual structures. However, other theories within cognitive linguistics may be better suited to this task. These theories, once integrated, should make it possible to build a unified model of both the conceptual and linguistic aspects of metaphor. First, constructional approaches to syntax explain how particular constructional slots are associated with different functions in evoking metaphor. Cognitive Grammar is especially effective in this regard. Second, Frame Semantics helps explain how the words or phrases that fill the relevant constructional slots evoke the source and target domains of metaphor. Though these theories do not yet integrate seamlessly, their combination already offers explanatory benefits, such as allowing generalizations across metaphoric and non-metaphoric language and identifying the words that play a role in evoking metaphors.

