Semantic Annotation of Text Using Open Semantic Resources

Author(s):  
Stefano Pacifico ◽  
Janez Starc ◽  
Janez Brank ◽  
Luka Bradesko ◽  
Marko Grobelnik
Author(s):  
Felicitas Löffler ◽  
Birgitta König-Ries

Semantic annotations of datasets are very useful to support quality assurance, discovery, interpretability, linking and integration of datasets. However, providing such annotations manually is often a time-consuming task. If the process is to be at least partially automated and still provide good semantic annotations, precise information extraction is needed. The recognition of entity names (e.g., person, organization, location) from textual resources is the first step before linking the identified term or phrase to other semantic resources such as concepts in ontologies. A multitude of tools and techniques have been developed for information extraction. One of the big players is the text mining framework GATE (Cunningham et al. 2013), which supports annotation rules, semantic techniques and machine learning approaches. We will run GATE's default ANNIE pipeline on collection datasets to automatically detect persons, locations and time. We will also present extensions to extract organisms (Naderi et al. 2011), environmental terms, data parameters and biological processes, and show how to link them to ontologies and LOD resources, e.g., DBpedia (Sateli and Witte 2015). We would like to discuss the results with the conference participants and welcome comments and feedback on the current solution. The audience is also welcome to provide their own datasets in preparation for this session.


Author(s):  
Dominik Röpert ◽  
Fabian Reimeier ◽  
Jörg Holetschek ◽  
Anton Güntsch

Herbarium specimens have been digitized at the Botanical Garden and Botanical Museum, Berlin (BGBM) since the year 2000. As part of the digitization process, specimen data have been recorded manually for specific basic data elements. Additional elements were usually added later based on the digital images. During the last twenty years, data were transcribed exactly as they were written on the labels, a widely used procedure in European herbaria. This approach led to a large number of orthographic variations, especially with regard to person and place names. To improve interoperability between records within our own collection database and across collection databases provided by the community, we have started to enrich our metadata with Linked Open Data (LOD)-based links to semantic resources, starting with collectors and geographic entities. Preferred resources for semantic enrichment (e.g., Wikidata, GeoNames) have been agreed on by members of the Consortium of European Taxonomic Facilities (CETAF) in order to exploit the potential of semantically enriched collection data in the best possible way. To be able to annotate many collection records in a relatively short time, priority was given to concepts (e.g., specific collector names) that occur on many specimen labels and that have an existing and easy-to-find semantic representation in an external resource. With this approach, we were able to annotate 52,000 specimen records in just a few weeks of a student assistant's working time. The integration of our semantic annotation workflows with other data integration, cleaning, and import processes at the BGBM is carried out using an OpenRefine-based platform with specific extensions for services and functions related to label transcription activities (Kirchhoff et al. 2018).
Our semantically enriched collection data will contribute to a “Botany Pilot,” which is presently being developed by member organizations of CETAF to demonstrate the potential of Linked Open Collection Data and their integration with existing semantic resources.


2015 ◽  
Author(s):  
Yudai Kamioka ◽  
Kazuya Narita ◽  
Junta Mizuno ◽  
Miwa Kanno ◽  
Kentaro Inui

2014 ◽  
Vol 24 (10) ◽  
pp. 2405-2418 ◽  
Author(s):  
Feng TIAN ◽  
Xu-Kun SHEN

Author(s):  
Stephen Dill ◽  
Nadav Eiron ◽  
David Gibson ◽  
Daniel Gruhl ◽  
R. Guha ◽  
...  
