Machine learning for information extraction from topographic maps

Author(s):  
Donato Malerba ◽  
Floriana Esposito ◽  
Antonietta Lanza ◽  
Francesca A. Lisi

Author(s):  
Neil Ireson ◽  
Fabio Ciravegna ◽  
Mary Elaine Califf ◽  
Dayne Freitag ◽  
Nicholas Kushmerick ◽  
...  

2020 ◽  
pp. 1-21 ◽  
Author(s):  
Clément Dalloux ◽  
Vincent Claveau ◽  
Natalia Grabar ◽  
Lucas Emanuel Silva Oliveira ◽  
Claudia Maria Cabral Moro ◽  
...  

Abstract: Automatic detection of negated content is often a prerequisite in information extraction systems across domains, and it is especially important in the biomedical domain, where negation is pervasive. This work makes two main contributions. First, we address two languages that have received little attention so far, Brazilian Portuguese and French: we built new corpora for both, manually annotated with negation cues and their scopes. Second, we propose supervised machine learning methods for the automatic detection of negation cues and their scopes. The methods prove robust across both languages and across domains (general and biomedical text). The approach is also validated on English benchmark data from the state of the art, where it yields very good results and outperforms existing approaches. In addition, the application is accessible and usable online. We expect that these contributions (new annotated corpora, an online application, and cross-domain robustness) will improve the reproducibility of the results and the robustness of NLP applications.
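The abstract does not include code; as a minimal, hedged sketch of the kind of supervised approach it describes, negation cue and scope detection can be cast as BIO sequence labelling, here with the sklearn-crfsuite library. The toy sentence, cue lexicon and feature set are illustrative assumptions, not the authors' corpora or features.

```python
# Sketch: negation cue/scope detection as BIO sequence labelling with a CRF.
# Toy data and features are assumptions; the paper's own setup may differ.
import sklearn_crfsuite

def token_features(sent, i):
    word = sent[i]
    return {
        "lower": word.lower(),
        # Toy cue lexicon (English/French/Portuguese negation words).
        "is_cue": word.lower() in {"no", "not", "ne", "pas", "sans", "não"},
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# One toy training sentence with BIO labels for the cue and its scope.
train_sents = [["The", "patient", "shows", "no", "sign", "of", "infection", "."]]
train_labels = [["O", "O", "O", "B-CUE", "B-SCOPE", "I-SCOPE", "I-SCOPE", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_labels)
print(crf.predict(X))  # predicted BIO tags per token
```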


Author(s):  
SANDA M. HARABAGIU

This paper presents a novel methodology for disambiguating prepositional phrase attachments. We create attachment patterns by classifying a collection of prepositional relations derived from Treebank parses. As a by-product, the arguments of every prepositional relation are semantically disambiguated. Attachment decisions are generated as the result of a learning process that builds upon some of the most popular current statistical and machine learning techniques. We have tested this methodology on (1) Wall Street Journal articles, (2) textual definitions of concepts from a dictionary and (3) an ad hoc corpus of Web documents used for conceptual indexing and information extraction.
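As a hedged illustration of the statistical side of such a methodology, the sketch below casts PP attachment as supervised classification over (verb, noun1, preposition, noun2) quadruples, a common formulation in the literature. The toy examples and the scikit-learn pipeline are assumptions, not the paper's actual learning process.

```python
# Sketch: PP-attachment disambiguation as binary classification
# (V = attach to verb, N = attach to noun) over quadruple features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = [
    ({"v": "ate", "n1": "pizza", "p": "with", "n2": "fork"}, "V"),       # instrument -> verb
    ({"v": "ate", "n1": "pizza", "p": "with", "n2": "anchovies"}, "N"),  # modifier -> noun
    ({"v": "saw", "n1": "man", "p": "with", "n2": "telescope"}, "V"),
    ({"v": "bought", "n1": "shirt", "p": "with", "n2": "pockets"}, "N"),
]
X, y = zip(*train)
clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(list(X), list(y))
print(clf.predict([{"v": "ate", "n1": "salad", "p": "with", "n2": "spoon"}]))
```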


Author(s):  
Felicitas Löffler ◽  
Birgitta König-Ries

Semantic annotations of datasets are very useful for supporting quality assurance, discovery, interpretability, linking and integration of datasets. However, providing such annotations manually is often a time-consuming task. If the process is to be at least partially automated and still yield good semantic annotations, precise information extraction is needed. The recognition of entity names (e.g., person, organization, location) in textual resources is the first step before linking the identified term or phrase to other semantic resources such as concepts in ontologies. A multitude of tools and techniques has been developed for information extraction. One of the big players is the text mining framework GATE (Cunningham et al. 2013), which supports annotation rules, semantic techniques and machine learning approaches. We will run GATE's default ANNIE pipeline on collection datasets to automatically detect persons, locations and time expressions. We will also present extensions to extract organisms (Naderi et al. 2011), environmental terms, data parameters and biological processes, and show how to link them to ontologies and LOD resources, e.g., DBPedia (Sateli and Witte 2015). We would like to discuss the results with the conference participants and welcome comments and feedback on the current solution. The audience is also welcome to provide their own datasets in preparation for this session.
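GATE and its ANNIE pipeline are Java-based; as a minimal analogous sketch in Python, the following uses spaCy's pretrained pipeline to detect the same basic entity types (persons, locations, time expressions). The model name and example sentence are assumptions, not part of the abstract, and the pretrained model must be installed beforehand.

```python
# Sketch: NER over a collection record, analogous to ANNIE's person/
# location/date detection. Requires: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Charles Darwin collected finches on the Galápagos Islands in 1835.")

for ent in doc.ents:
    # Typical labels: PERSON, GPE/LOC (locations), DATE (time expressions).
    # A later step could link each mention to an ontology concept or DBPedia URI.
    print(ent.text, ent.label_)
```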


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Philipp Maximilian Müller ◽  
Philipp Päuser ◽  
Björn-Martin Kurzrock

Purpose
This research provides fundamentals for generating (partially) automated, standardized due diligence reports. Based on original digital building documents from (institutional) investors, the potential for automated information extraction through machine learning algorithms is demonstrated. Preferred sources for the key information of technical due diligence reports are presented. The paper concludes with challenges on the way towards automated information extraction in due diligence processes.

Design/methodology/approach
Comprehensive building documentation comprising n = 8,339 digital documents of 14 properties, together with 21 technical due diligence reports, serves as the basis for identifying key information. To structure the documents for due diligence, 410 document classes are derived and the documents are checked for machine readability. General rules are developed for prioritized document classes according to the relevance and machine readability of the documents.

Findings
The analysis reveals that a substantial part of all relevant digital building documents is poorly suited for automated information extraction. The availability and content of documents vary greatly from owner to owner and between document classes. Prioritizing document classes according to machine readability reveals the potential for using artificial intelligence in due diligence processes.

Practical implications
The paper includes recommendations for improving the machine readability of documents and indicates the potential for (partially) automating due diligence processes. To this end, document classes are derived, reviewed and prioritized. Transaction risks can be countered by an automated completeness check of the relevant documents.

Originality/value
This paper is the first published (empirical) research to specifically assess the automated digital processing of due diligence reports. The findings are helpful for improving due diligence processes and, more generally, for promoting the use of machine learning in the property sector.
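The paper describes checking documents for machine readability rather than publishing code; a minimal sketch of such a check might look as follows, on the assumption that a PDF page yielding little extractable text is likely a scan and therefore poorly suited for automated extraction. The pypdf library, the character threshold and the document-class mapping are illustrative assumptions.

```python
# Sketch: flag building documents as machine-readable or scan-like by
# measuring how much text a PDF actually yields. Threshold is an assumption.
from pypdf import PdfReader

def readability_score(path: str) -> float:
    """Average number of extractable characters per page."""
    reader = PdfReader(path)
    chars = sum(len(page.extract_text() or "") for page in reader.pages)
    return chars / max(len(reader.pages), 1)

def is_machine_readable(path: str, min_chars_per_page: int = 200) -> bool:
    return readability_score(path) >= min_chars_per_page

# Hypothetical usage: keep only readable documents of prioritized classes.
# docs = {"energy_certificate.pdf": "certificate", "floor_plan_scan.pdf": "plan"}
# readable = {p: c for p, c in docs.items() if is_machine_readable(p)}
```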

