Using Wikibase as a Platform to Develop a Semantic TDWG Standard

Author(s):  
David Fichtmüller ◽  
Fabian Reimeier ◽  
Anton Güntsch

In the ABCD 3.0 Project the ABCD (Access to Biological Collection Data) Standard (Access to Biological Collections Data task group 2007) was transformed from a classic XML Schema into an OWL (Web Ontology Language) ontology (alongside an updated semantic-aware XML version). While it was initially planned to use the established TDWG Terms wiki as the editing and development platform for the ABCD ontology, the rise of Wikidata and its underlying platform Wikibase caused us to reconsider this decision and switch to a Wikibase installation instead. This proved to be a crucial decision, as Wikibase turned out to be a well-suited platform for collaboratively importing, developing and exporting this complex semantic standard. This experience is potentially of interest to maintainers of other Biodiversity Information Standards (TDWG) standards and to the Technical Architecture Group. In this presentation we will explain our technical setup and how we used Wikibase, alongside its related tools, to model the ABCD Ontology. We will introduce the tools we used for importing existing concepts from previous ABCD versions, running maintenance queries (e.g. for checking the ontology for consistency or missing information about concepts), and exporting the ontology into the OWL/XML format. Finally, we will discuss the lessons we learned and how our setup can be improved for future uses.

Author(s):  
Matt Woodburn ◽  
Deborah L Paul ◽  
Wouter Addink ◽  
Steven J Baskauf ◽  
Stanley Blum ◽  
...  

Digitisation and publication of museum specimen data is happening worldwide, but is far from complete. Museums can start by sharing what they know about their holdings at a higher level, long before each object has its own record. Information about what is held in collections worldwide is needed by many stakeholders including collections managers, funders, researchers, policy-makers, industry, and educators. To aggregate this information from collections, the data need to be standardised (Johnston and Robinson 2002). So, the Biodiversity Information Standards (TDWG) Collection Descriptions (CD) Task Group is developing a data standard for describing collections, which provides the ability to deliver automated metrics, using standardised collection descriptions and/or data derived from specimen datasets (e.g., counts of specimens), and a global registry of physical collections (i.e., digitised or non-digitised). Outputs will include a data model to underpin the new standard, and guidance and reference implementations for the practical use of the standard in institutional and collaborative data infrastructures. The Task Group employs a community-driven approach to standard development. With international participation, workshops at the Natural History Museum (London 2019) and the MOBILISE workshop (Warsaw 2020) allowed over 50 people to contribute to this work. Our group organized online "barbecues" (BBQs) so that many more could contribute to standard definitions and address data model design challenges. Cloud-based tools (e.g., GitHub, Google Sheets) are used to organise and publish the group's work and make it easy to participate. A Wikibase instance is also used to test and demonstrate the model using real data. 
There are a range of global, regional, and national initiatives interested in the standard (see Task Group charter). Some, like GRSciColl (now at the Global Biodiversity Information Facility (GBIF)), Index Herbariorum (IH), and the iDigBio US Collections List are existing catalogues. Others, including the Consortium of European Taxonomic Facilities (CETAF) and the Distributed System of Scientific Collections (DiSSCo), include collection descriptions as a key part of their near-term development plans. As part of the EU-funded SYNTHESYS+ project, GBIF organized a virtual workshop: Advancing the Catalogue of the World's Natural History Collections to get international input for such a resource that would use this CD standard. Some major complexities present themselves in designing a standardised approach to represent collection descriptions data. It is not the first time that the natural science collections community has tried to address them (see the TDWG Natural Collections Description standard). Beyond natural sciences, the library community in particular gave thought to this (Heaney 2001, Johnston and Robinson 2002), noting significant difficulties. One hurdle is that collections may be broken down into different degrees of granularity according to different criteria, and may also overlap so that a single object can be represented in more than one collection description. Managing statistics such as numbers of objects is complex due to data gaps and variable degrees of certainty about collection contents. It also takes considerable effort from collections staff to generate structured data about their undigitised holdings. We need to support simple, high-level collection summaries as well as detailed quantitative data, and to be able to update as needed. We need a simple approach, but one that can also handle the complexities of data, scope, and social needs, for digitised and undigitised collections. 
The data standard itself is a defined set of classes and properties that can be used to represent groups of collection objects and their associated information. These incorporate common characteristics ('dimensions') by which we want to describe, group and break down our collections, metrics for quantifying those collections, and properties such as persistent identifiers for tracking collections and managing their digital counterparts. Existing terms from other standards (e.g. Darwin Core, ABCD) are re-used if possible. The data model (Fig. 1) underpinning the standard defines the relationships between those different classes, and ensures that the structure as well as the content are comparable across different datasets. It centres around the core concept of an 'object group', representing a set of physical objects that is defined by one or more dimensions (e.g., taxonomy and geographic origin), and linked to other entities such as the holding institution. To the object group, quantitative data about its contents are attached (e.g. counts of objects or taxa), along with more qualitative information describing the contents of the group as a whole. In this presentation, we will describe the draft standard and data model with examples of early adoption for real-world and example data. We will also discuss the vision of how the new standard may be adopted and its potential impact on collection discoverability across the collections community.


2011 ◽  
pp. 782-808
Author(s):  
Paavo Kotinurmi ◽  
Armin Haller ◽  
Eyal Oren

RosettaNet is an industry-driven e-business process standard that defines common inter-company public processes and their associated business documents. RosettaNet is based on the Service-oriented architecture (SOA) paradigm and all business documents are expressed in DTD or XML Schema. Our “ontologically-enhanced RosettaNet” effort translates RosettaNet business documents into a Web ontology language, allowing business reasoning based on RosettaNet message exchanges. This chapter describes our extension to RosettaNet and shows how it can be used in business integrations for better interoperability. The usage of a Web ontology language in RosettaNet collaborations can help accommodate partner heterogeneity in the setup phase and can ease back-end integration, enabling, for example, more competition in the purchasing processes. It also provides a building block to adopt a semantic SOA with richer discovery, selection and composition capabilities.




Author(s):  
Matt Woodburn ◽  
Deborah L Paul ◽  
William Ulate ◽  
Niels Raes

Aggregating content of museum and scientific collections worldwide offers us the opportunity to realize a virtual museum of our planet and the life upon it through space and time. By mapping specimen-level data records to standards and publishing this information, an increasing number of collections contribute to a digitally accessible wealth of knowledge. Visualizing these digital records by parameters such as collection type and geographic origin helps collections and institutions to better understand their digital holdings and compare them to other such collections, as well as enabling researchers to find specimens and specimen data quickly (Singer et al. 2018). At the higher level of collections, related people and their activities, and especially the great majority of material that is yet to be digitised, we know much less. Many collections hold material not yet digitally discoverable in any form. For those that do publish collection-level data, it is commonly text-based data without the Globally Unique Identifiers (GUIDs) or the controlled vocabularies that would support quantitative collection metrics and aid discovery of related expertise and publications. To best understand and plan for our world’s bio- and geodiversity represented in collections, we need standardised, quantitative collections-level metadata. Various groups planet-wide are actively developing tools to capture this much-needed metadata, including information about the backlog, and more detailed information about institutions and their activities (e.g. staffing, space, species-level inventories, geographic and taxonomic expertise, and related publications) (Smith et al. 2018). 
The Biodiversity Information Standards organization (TDWG) Collection Descriptions (CD) Data Standard Task Group aims to provide a data standard for describing natural scientific collections, which enables the ability to provide automated metrics, using standardised collection descriptions and/or data derived from specimen datasets (e.g., counts of specimens), and a global registry of physical collections (either digitised or non-digitised). The group will also produce a data model to underpin the new standard, and provide guidance and reference implementations for the practical use of the standard in institutional and collaborative data infrastructures. Our task group includes members from a myriad of groups with a stake in mobilizing such data at local, regional, domain-specific and global levels. With such a standard adopted, it will be possible to effectively share data across different community resources. So far, we have carried out landscape analyses of existing collection description frameworks, and amassed a portfolio of use cases from the group as well as from a range of other sources, including the Collection Descriptions Dashboard working group of ICEDIG ("Innovation and consolidation for large scale digitisation of natural heritage"), iDigBio (Integrated Digitized Biocollections), the Smithsonian, Index Herbariorum, the Field Museum, GBIF (Global Biodiversity Information Facility), GRBio (Global Registry of Biodiversity Repositories) and fishfindR.net. These were used to develop a draft data model, and together inform the first iteration of the draft CD data standard. A variety of challenges present themselves in developing this standard. 
Some relate to the standard development process itself, such as identifying (often learning) effective tools and methods for collaborative working and communication across globally distributed volunteers. Others concern defining the scope and gaining consensus from stakeholders across a wide range of disciplines, while maintaining achievable goals. Further challenges arise from the requirement to develop a data model and standard that support such a variety of use cases and priorities, while retaining interoperability and manageability of the data. We will present some of these challenges and methods for addressing them, and summarise the progress and draft outputs of the group so far. We will also discuss the vision of how the new standard may be adopted and its potential impact on collections discoverability across the natural science collections community.


Author(s):  
V. Milea ◽  
F. Frasincar ◽  
U. Kaymak

2014 ◽  
Vol 548-549 ◽  
pp. 1504-1509 ◽  
Author(s):  
Jun Ji ◽  
Wei Yan Chai ◽  
Gan Xin Xue ◽  
Ge Yang Li

Based on the business characteristics and technical requirements of digital collaborative development for heavy military vehicles, this paper proposes a dynamic federated collaborative development method. Realizing the method hinges on constructing a three-tier dynamic federated collaborative development environment, so the paper focuses on the business, functional and technical architecture of that environment. By applying a Web-Service-based cross-system collaborative flow control method and developing integrated middleware, the three-tier federated collaborative environment achieves tight integration. Finally, an application demonstration of inter-enterprise collaborative development of a product type, based on the developed digital collaborative development platform, verifies the effectiveness and feasibility of the method.


Author(s):  
Katharine Barker ◽  
Jonas Astrin ◽  
Gabriele Droege ◽  
Jonathan Coddington ◽  
Ole Seberg

Most successful research programs depend on easily accessible and standardized research infrastructures. Until recently, access to tissue or DNA samples with standardized metadata and of a sufficiently high quality has been a major bottleneck for genomic research. The Global Genome Biodiversity Network (GGBN) fills this critical gap by offering standardized, legal access to samples. Presently, GGBN’s core activity is enabling access to searchable DNA and tissue collections across natural history museums and botanic gardens. Activities are gradually being expanded to encompass all kinds of biodiversity biobanks such as culture collections, zoological gardens, aquaria, arboreta, and environmental biobanks. Broadly speaking, these collections all provide long-term storage and standardized public access to samples useful for molecular research. GGBN facilitates sample search and discovery for its distributed member collections through a single entry point. It stores standardized information on mostly geo-referenced, vouchered samples, their physical location, availability, quality, and the necessary legal information on over 50,000 species of Earth’s biodiversity, from unicellular to multicellular organisms. The GGBN Data Portal and the GGBN Data Standard are complementary to existing infrastructures such as the Global Biodiversity Information Facility (GBIF) and the International Nucleotide Sequence Database Collaboration (INSDC). Today, many well-known open-source collection management databases such as Arctos, Specify, and Symbiota are implementing the GGBN data standard. GGBN continues to grow its collections strategically, based on the needs of the research community, adding over 1.3 million online records in 2018 alone; today two million sample records are available through GGBN. 
Together with the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC), Biodiversity Information Standards (TDWG), and the Synthesis of Systematic Resources (SYNTHESYS+), GGBN provides best practices for biorepositories on meeting the requirements of the Nagoya Protocol on Access and Benefit Sharing (ABS). Through collaboration with the Biodiversity Heritage Library (BHL), GGBN is exploring options for tagging publications that reference GGBN collections and associated specimens, made searchable through GGBN’s document library. Through its collaborative efforts, standards, and best practices, GGBN aims to facilitate trust and transparency in the use of genetic resources.

