ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

2009 ◽  
Vol 18 (05) ◽  
pp. 673-695 ◽  
Author(s):  
ERMELINDA ORO ◽  
MASSIMO RUFFOLO ◽  
DOMENICO SACCÀ

Information extraction (IE) is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured form. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because their completely unstructured nature poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.

2020 ◽  
Vol 34 (04) ◽  
pp. 5331-5338
Author(s):  
Urvashi Oswal ◽  
Aniruddha Bhargava ◽  
Robert Nowak

This paper explores a new form of the linear bandit problem in which the algorithm receives the usual stochastic rewards as well as stochastic feedback about which features are relevant to the rewards, the latter feedback being the novel aspect. The focus of this paper is the development of new theory and algorithms for linear bandits with feature feedback that can achieve regret over time horizon T that scales like k√T, without prior knowledge of which features are relevant or of the number k of relevant features. In comparison, the regret of traditional linear bandits scales like d√T, where d is the total number of (relevant and irrelevant) features, so the improvement can be dramatic if k ≪ d. The computational complexity of the algorithm is proportional to k rather than d, making it much more suitable for real-world applications than traditional linear bandits. We demonstrate the performance of the algorithm with synthetic and real human-labeled data.
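
To make the k√T versus d√T comparison concrete, the following minimal sketch evaluates both leading-order scales for one choice of parameters. The values of d, k, and T and the helper name `regret_scale` are hypothetical, chosen only for illustration and not taken from the paper; constants and logarithmic factors that a full regret bound would normally carry are ignored.

```python
import math


def regret_scale(n_features: int, horizon: int) -> float:
    """Leading-order regret scale n * sqrt(T), ignoring constants and log factors."""
    return n_features * math.sqrt(horizon)


# Hypothetical values for illustration only (not from the paper).
d = 1000      # total number of features
k = 10        # number of relevant features
T = 10_000    # time horizon

traditional = regret_scale(d, T)     # d * sqrt(T)
with_feedback = regret_scale(k, T)   # k * sqrt(T)
ratio = traditional / with_feedback  # reduces to d / k, independent of T

print(traditional, with_feedback, ratio)
```

Under these assumed values the two scales differ by a factor of d/k, whatever the horizon T, which is why the gain is largest when k ≪ d.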


Crystals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 256
Author(s):  
Christian Rodenbücher ◽  
Kristof Szot

Transition metal oxides with ABO3 or BO2 structures have become one of the major research fields in solid state science, as they exhibit an impressive variety of unusual and exotic phenomena with potential for their exploitation in real-world applications [...]

