Data Publication: Recently Published Documents

Total documents: 202 (last five years: 70)
H-index: 21 (last five years: 5)

2022 ◽ Vol. 27(2) ◽ pp. 244-256
Author(s): Kainan Zhang, Zhi Tian, Zhipeng Cai, Daehee Seo

2022 ◽ Vol. 11(1) ◽ p. 51
Author(s): Alexandra Rowland, Erwin Folmer, Wouter Beek, Rob Wenneker

Kadaster, the Dutch National Land Registry and Mapping Agency, has been actively publishing its base registries as linked (open) spatial data for several years. To date, a number of these base registers, as well as several external datasets, have been successfully published as linked data and are publicly available. Increasing demand for linked data products and the availability of new linked data technologies have highlighted the need for a new, innovative approach to linked data publication within the organisation, in the interest of reducing the time and costs associated with that publication. The new approach is novel in its dataset modelling, its transformation process, and its publication architecture. In modelling whole datasets, a clear distinction is made between the Information Model and the Knowledge Model, both to capture organisation-specific requirements and to support external community standards in the publication process. The publication architecture consists of several steps in which instance data are loaded from their source as GML, transformed using an Enhancer, and published to the triple store. Both the modelling approach and the publication architecture form part of Kadaster's larger vision for the development of the Kadaster Knowledge Graph through the integration of the various linked datasets.
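As an illustration of the publication flow described in this abstract, the minimal Python sketch below loads a hypothetical GML feature, maps it to RDF in the role of the Enhancer, and pushes the result to a triple store via SPARQL Update. The example namespace, endpoint URL, and GML fields are assumptions made for illustration only; they do not reflect Kadaster's actual schema or infrastructure.

    # Sketch only: hypothetical parcel feature, namespace, and SPARQL endpoint.
    import xml.etree.ElementTree as ET

    import requests
    from rdflib import Graph, Literal, Namespace

    GEO = Namespace("http://www.opengis.net/ont/geosparql#")
    EX = Namespace("https://example.org/kadaster/")  # hypothetical Knowledge Model namespace

    GML_SNIPPET = """
    <Parcel xmlns:gml="http://www.opengis.net/gml">
      <identifier>NL.KAD.Parcel.12345</identifier>
      <gml:Polygon><gml:exterior><gml:LinearRing>
        <gml:posList>5.10 52.00 5.20 52.00 5.20 52.10 5.10 52.00</gml:posList>
      </gml:LinearRing></gml:exterior></gml:Polygon>
    </Parcel>
    """

    def enhance(gml: str) -> Graph:
        """The 'Enhancer' role: map a GML feature to RDF triples."""
        root = ET.fromstring(gml)
        ident = root.findtext("identifier")
        pos = root.find(".//{http://www.opengis.net/gml}posList").text.split()
        ring = ", ".join(f"{x} {y}" for x, y in zip(pos[0::2], pos[1::2]))
        g = Graph()
        g.add((EX[ident], GEO.hasGeometry, EX[ident + "/geometry"]))
        g.add((EX[ident + "/geometry"], GEO.asWKT,
               Literal(f"POLYGON(({ring}))", datatype=GEO.wktLiteral)))
        return g

    def publish(g: Graph, endpoint: str = "https://example.org/sparql") -> None:
        """Publish the enhanced triples to a triple store via SPARQL Update."""
        update = "INSERT DATA { %s }" % g.serialize(format="nt")
        requests.post(endpoint, data={"update": update}, timeout=30).raise_for_status()

    publish(enhance(GML_SNIPPET))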


2021
Author(s): Zitao Li, Trung Dang, Tianhao Wang, Ninghui Li

2021 ◽ Vol. 14(7)
Author(s): Kristen Overstreet

Author(s): Nelson Rios, Sharif Islam, James Macklin, Andrew Bentley

Technological innovations over the past two decades have made more than 150 million specimen and species-lot records from biological collections around the world available online through large-scale biodiversity data-aggregator networks. In the present landscape of biodiversity informatics, collections data are captured and managed locally in a wide variety of databases and collection management systems and then shared online as point-in-time Darwin Core Archive snapshots. Data providers may publish periodic revisions to these data files, which are retrieved, processed and re-indexed by data aggregators. This workflow has resulted in data latencies of months to years for some providers. The Darwin Core Standard (Wieczorek et al. 2012) provides guidelines for representing biodiversity information digitally, yet varying institutional practices and a lack of interoperability between collection management systems continue to limit semantic uniformity, particularly with regard to the actual content of each field.
Although some initiatives have begun to link data elements, our ability to comprehensively link all of the extended data associated with a specimen, or with related specimens, remains limited by the low uptake and usage of persistent identifiers. The concept now under consideration is a Digital Extended Specimen (DES): the cumulative digital representation of all data, derivatives and products associated with a physical specimen, adhering to the Findable, Accessible, Interoperable and Reusable (FAIR) principles of data management and stewardship, with each element individually distinguished and linked by persistent identifiers on the Internet to create a web of knowledge.
Biodiversity data aggregators that mobilize data across multiple institutions routinely transform those data in an attempt to provide a clean and consistent interpretation. These aggregators are typically unable to interact directly with institutional data repositories, limiting potentially fruitful opportunities for annotation, versioning and repatriation. The ability to track such data transactions and to satisfy the accompanying legal obligations (e.g., the Nagoya Protocol) is becoming a necessary component of data publication that existing standards do not adequately address. Furthermore, no mechanisms exist to assess the "trustworthiness" of data, which is critical to scientific integrity and reproducibility, or to provide attribution metrics with which collections can demonstrate their contribution to and effectiveness in supporting such research. Since the introduction of Darwin Core Archives (Wieczorek et al. 2012), little has changed in the underlying mechanisms for publishing natural science collections data, and we are now at a point where new innovations are required to meet current demand for continued digitization, access, research and management. One solution may be to change the biodiversity data publication paradigm to one based on the atomized transactions relevant to each individual data record. These transactions, when summed over time, allow us to realize the most recently accepted revision as well as historical and alternative perspectives.
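To make the transactional idea just described concrete, the short Python sketch below (an illustration of the general concept, not an existing standard or service) treats each change to a specimen record as its own timestamped transaction and reconstructs either the current record or a historical view by replaying the log. The identifier is a hypothetical ARK; the field names are Darwin Core terms used as examples.

    # Sketch only: atomized transactions replayed to yield current and historical states.
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass(frozen=True)
    class Transaction:
        record_id: str       # persistent identifier of the specimen record (hypothetical ARK)
        timestamp: datetime
        actor: str           # who asserted the change: provider, aggregator, annotator, ...
        field: str           # e.g. a Darwin Core term
        value: str

    def replay(log, record_id, as_of=None):
        """Reconstruct a record by summing its transactions, optionally up to a point in time."""
        state = {}
        for t in sorted(log, key=lambda t: t.timestamp):
            if t.record_id == record_id and (as_of is None or t.timestamp <= as_of):
                state[t.field] = t.value
        return state

    log = [
        Transaction("ark:/12345/spec-001", datetime(2019, 3, 1, tzinfo=timezone.utc),
                    "provider", "dwc:scientificName", "Puma concolor"),
        Transaction("ark:/12345/spec-001", datetime(2021, 6, 9, tzinfo=timezone.utc),
                    "annotator", "dwc:georeferenceRemarks", "coordinates refined"),
    ]
    print(replay(log, "ark:/12345/spec-001"))   # most recently accepted revision
    print(replay(log, "ark:/12345/spec-001",    # historical perspective
                 as_of=datetime(2020, 1, 1, tzinfo=timezone.utc)))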
To realize the Digital Extended Specimen ideals and the linking of data elements, this transactional model, combined with open and FAIR data protocols, application programming interfaces (APIs), repositories and workflow engines, can provide the building blocks for the next generation of natural science collections and biodiversity data infrastructures and services. These and other related topics have been the focus of phase 2 of the global consultation on converging Digital Specimens and Extended Specimens. Based on these discussions, this presentation will explore a conceptual solution that leverages elements of distributed version control, cryptographic ledgers and shared redundant storage to overcome many of the shortcomings of contemporary approaches.
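As a hedged illustration of the cryptographic-ledger element mentioned above (a sketch of the general technique under assumed data shapes, not the design the presentation proposes), the following Python snippet chains each transaction to the hash of its predecessor so that any tampering with a record's history becomes detectable.

    # Sketch only: a hash-chained transaction log with integrity verification.
    import hashlib
    import json

    def entry_hash(prev, payload):
        """Content hash over the canonical JSON form of a ledger entry."""
        return hashlib.sha256(json.dumps({"prev": prev, "payload": payload},
                                         sort_keys=True).encode()).hexdigest()

    def append(ledger, payload):
        """Append a transaction, chaining it to the hash of the previous entry."""
        prev = ledger[-1]["hash"] if ledger else None
        ledger.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})

    def verify(ledger):
        """Recompute every hash and confirm the chain is unbroken."""
        prev = None
        for entry in ledger:
            if entry["prev"] != prev or entry["hash"] != entry_hash(entry["prev"], entry["payload"]):
                return False
            prev = entry["hash"]
        return True

    ledger = []
    append(ledger, {"record_id": "ark:/12345/spec-001",
                    "field": "dwc:scientificName", "value": "Puma concolor"})
    append(ledger, {"record_id": "ark:/12345/spec-001",
                    "field": "dwc:georeferenceRemarks", "value": "coordinates refined"})
    assert verify(ledger)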

