Semantic Annotation of Text Using Open Semantic Resources

Semantic annotations of datasets are very useful for quality assurance, discovery, interpretability, linking and integration of datasets. However, providing such annotations manually is often time-consuming. If the process is to be at least partially automated while still producing good semantic annotations, precise information extraction is needed. Recognizing entity names (e.g., persons, organizations, locations) in textual resources is the first step before the identified term or phrase can be linked to other semantic resources, such as concepts in ontologies. A multitude of tools and techniques have been developed for information extraction. One of the best known is the text mining framework GATE (Cunningham et al. 2013), which supports annotation rules, semantic techniques and machine learning approaches. We will run GATE's default ANNIE pipeline on collection datasets to automatically detect persons, locations and time expressions. We will also present extensions that extract organisms (Naderi et al. 2011), environmental terms, data parameters and biological processes, and show how to link them to ontologies and LOD resources such as DBpedia (Sateli and Witte 2015). We would like to discuss the results with the conference participants and welcome comments and feedback on the current solution. Participants are also welcome to provide their own datasets ahead of this session.
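
The two steps described above (recognizing entity mentions, then linking them to an LOD resource) can be illustrated with a minimal, self-contained sketch. It does not use GATE or ANNIE. It assumes a tiny hand-made gazetteer (the entries and the names `GAZETTEER`, `Annotation` and `annotate` are hypothetical) and builds DBpedia resource URIs directly from the matched entries, which is only a naive stand-in for real entity disambiguation. The label text is made up.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-gazetteer: surface form -> (entity type, DBpedia resource name).
# Real pipelines such as GATE's ANNIE use large gazetteer lists plus rule grammars.
GAZETTEER = {
    "Berlin": ("Location", "Berlin"),
    "Venezuela": ("Location", "Venezuela"),
    "Alexander von Humboldt": ("Person", "Alexander_von_Humboldt"),
    "Humboldt": ("Person", "Alexander_von_Humboldt"),
}

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


@dataclass
class Annotation:
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    text: str    # matched surface form
    etype: str   # entity type taken from the gazetteer
    uri: str     # linked DBpedia resource URI


def annotate(text: str) -> list[Annotation]:
    """Tag gazetteer entries in text (longest entries first) and link each to DBpedia."""
    annotations = []
    taken = [False] * len(text)  # characters already covered by an accepted match
    for surface in sorted(GAZETTEER, key=len, reverse=True):
        # \b restricts matches to word boundaries
        for m in re.finditer(r"\b" + re.escape(surface) + r"\b", text):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match that was already accepted
            for i in range(m.start(), m.end()):
                taken[i] = True
            etype, resource = GAZETTEER[surface]
            annotations.append(
                Annotation(m.start(), m.end(), surface, etype, DBPEDIA_PREFIX + resource)
            )
    return sorted(annotations, key=lambda a: a.start)


if __name__ == "__main__":
    label = "Herbarium sheet from Venezuela, collector: Alexander von Humboldt; held in Berlin."
    for a in annotate(label):
        print(a.start, a.end, a.etype, a.uri)
```

Matching the longest entries first keeps "Humboldt" from being tagged a second time inside the full name. A production setup would instead rely on GATE's gazetteer and JAPE grammars and on a proper disambiguation service.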

Semantic Annotation of Text Using Open Semantic Resources

Encyclopedia of Machine Learning and Data Mining, 2016, pp. 1-6
DOI: 10.1007/978-1-4899-7502-7_903-1
Author(s): Stefano Pacifico, Janez Starc, Janez Brank, Luka Bradesko, Marko Grobelnik
Keyword(s): Semantic Annotation, Semantic Resources

Semantic Annotation of Botanical Collection Data

Biodiversity Information Science and Standards, Vol. 3, 2019
DOI: 10.3897/biss.3.36187
Author(s): Dominik Röpert, Fabian Reimeier, Jörg Holetschek, Anton Güntsch
Keyword(s): Semantic Representation, Semantic Annotation, Open Data, Botanical Garden, Herbarium Specimens, Botanical Collection, Collection Data, Data Elements, Semantic Resources, Year 2000

Herbarium specimens have been digitized at the Botanical Garden and Botanical Museum, Berlin (BGBM) since the year 2000. As part of the digitization process, specimen data were recorded manually for a specific set of basic data elements; additional elements were usually added later based on the digital images. Over the last twenty years, data were transcribed exactly as written on the labels, a procedure widely used in European herbaria. This approach has led to a large number of orthographic variants, especially in person and place names. To improve interoperability between records within our own collection database and across collection databases provided by the community, we have started to enrich our metadata with Linked Open Data (LOD)-based links to semantic resources, starting with collectors and geographic entities. Preferred resources for semantic enrichment (e.g., Wikidata, GeoNames) have been agreed on by members of the Consortium of European Taxonomic Facilities (CETAF) in order to exploit the potential of semantically enriched collection data in the best possible way. To annotate many collection records in a relatively short time, priority was given to concepts (e.g., specific collector names) that occur on many specimen labels and that have an existing, easy-to-find semantic representation in an external resource. With this approach, we were able to annotate 52,000 specimen records in just a few weeks of a student assistant's working time. Our semantic annotation workflows are integrated with other data integration, cleaning and import processes at the BGBM through an OpenRefine-based platform with specific extensions for services and functions related to label transcription (Kirchhoff et al. 2018).
Our semantically enriched collection data will contribute to a “Botany Pilot,” which is presently being developed by member organizations of CETAF to demonstrate the potential of Linked Open Collection Data and their integration with existing semantic resources.
