Technical Note: The Linked Paleo Data framework – a common tongue for paleoclimatology

2015 · Vol. 11 (5) · pp. 4309–4327
Author(s): N. P. McKay, J. Emile-Geay

Abstract. Paleoclimatology is a highly collaborative scientific endeavor, increasingly reliant on online databases for data sharing. Yet, there is currently no universal way to describe, store and share paleoclimate data: in other words, no standard. Data standards are often regarded by scientists as mere technicalities, though they underlie much scientific and technological innovation, as well as facilitating collaborations between research groups. In this article, we propose a preliminary data standard for paleoclimate data, general enough to accommodate all the proxy and measurement types encountered in a large international collaboration (PAGES2K). We also introduce a vehicle for such structured data (Linked Paleo Data, or LiPD), leveraging recent advances in knowledge representations (Linked Open Data). The LiPD framework enables quick querying and extraction, and we expect that it will facilitate the writing of open-source, community codes to access, analyze, model and visualize paleoclimate observations. We welcome community feedback on this standard, and encourage paleoclimatologists to experiment with the format for their own purposes.

2016 · Vol. 12 (4) · pp. 1093–1100
Author(s): Nicholas P. McKay, Julien Emile-Geay

Abstract. Paleoclimatology is a highly collaborative scientific endeavor, increasingly reliant on online databases for data sharing. Yet there is currently no universal way to describe, store and share paleoclimate data: in other words, no standard. Data standards are often regarded by scientists as mere technicalities, though they underlie much scientific and technological innovation, as well as facilitating collaborations between research groups. In this article, we propose a preliminary data standard for paleoclimate data, general enough to accommodate all the archive and measurement types encountered in a large international collaboration (PAGES 2k). We also introduce a vehicle for such structured data (Linked Paleo Data, or LiPD), leveraging recent advances in knowledge representation (Linked Open Data). The LiPD framework enables quick querying and extraction, and we expect that it will facilitate the writing of open-source community codes to access, analyze, model and visualize paleoclimate observations. We welcome community feedback on this standard, and encourage paleoclimatologists to experiment with the format for their own purposes.
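As an illustration of what such structured, linked paleo data can look like in practice, here is a minimal Python sketch of a LiPD-style dataset record. LiPD pairs JSON metadata with CSV data tables in a single container; the field names below follow that general shape but should be read as illustrative assumptions rather than the normative LiPD schema.

```python
# Minimal sketch of a LiPD-style dataset record. Field names are
# illustrative assumptions, not the normative LiPD schema.
import json

dataset = {
    "dataSetName": "Example.Lake.2015",        # hypothetical dataset
    "archiveType": "lake sediment",
    "geo": {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-111.6, 35.2]},
    },
    "paleoData": [{
        "measurementTable": {
            "filename": "paleo0measurement0.csv",  # tabular data lives in CSV
            "columns": [
                {"variableName": "depth", "units": "cm"},
                {"variableName": "d18O", "units": "permil"},
            ],
        },
    }],
}

# Because the metadata are structured, extraction is a simple traversal:
for entry in dataset["paleoData"]:
    for col in entry["measurementTable"]["columns"]:
        print(col["variableName"], col["units"])

# ...and sharing is a matter of serializing to JSON.
print(json.dumps(dataset, indent=2))
```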


Author(s): Thomas Weber, Johann Mitlöhner, Sebastian Neumaier, Axel Polleres

Author(s): Lyubomir Penev, Teodor Georgiev, Viktor Senderov, Mariya Dimitrova, Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:

1. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
2. Data deposited in trusted repositories and/or supplementary files are described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
3. Integrated narrative and data publishing is realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
4. Data are published in structured, semantically enriched full-text XML, so that several data elements can thereafter easily be harvested by machines.
5. Linked Open Data (LOD) are extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.
The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as the Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5). These approaches represent different aspects of the prospective scholarly publishing of biodiversity data which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank and OpenBiodiv, and to various end users.
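As a hedged illustration of workflow 5, the Python sketch below uses rdflib to mint RDF triples for a taxonomic treatment; the class and property names in the OBO namespace are placeholders assumed for illustration, not verified terms of the OpenBiodiv-O ontology.

```python
# Sketch of workflow 5: converting an extracted taxonomic treatment
# into RDF triples. OBO/RES terms are illustrative placeholders, not
# verified OpenBiodiv-O vocabulary. Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, RDF

OBO = Namespace("http://openbiodiv.net/ontology#")   # assumed base IRI
RES = Namespace("http://openbiodiv.net/resource/")   # assumed base IRI

g = Graph()
g.bind("openbiodiv", OBO)

treatment = RES["treatment-0001"]
name = RES["Heser-stoevi"]
g.add((treatment, RDF.type, OBO.Treatment))          # placeholder class
g.add((treatment, OBO.mentionsTaxonomicName, name))  # placeholder property
g.add((name, RDF.type, OBO.TaxonomicName))
g.add((name, OBO.nameString, Literal("Heser stoevi")))

# Serialize to Turtle, ready for loading into a triple store
print(g.serialize(format="turtle"))
```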


2020 · Vol. 6
Author(s): Christoph Steinbeck, Oliver Koepler, Felix Bach, Sonja Herres-Pawlis, Nicole Jung, ...

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem, serving the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is the development and maintenance of a national research data infrastructure for the research domain of chemistry in Germany, enabling innovative, easy-to-use services and novel scientific approaches based on the re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia, and aims to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions, including data for their experimental and theoretical characterisation. This overarching goal is pursued through a number of key objectives:

Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.

Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata, as well as open data standards, in key areas of chemistry. Identify and recommend open data standards that support the FAIR principles for research data and, where standards are lacking, develop them.

Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research, and promote subsequent RDM at all levels of academia, beginning in undergraduate curricula.

Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to create awareness of, and foster the adoption of, FAIR data management. Initiate processes to integrate RDM and data science into curricula, and offer a wide range of training opportunities for researchers.

Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.

Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.
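To make Key Objective 2 more concrete, the sketch below shows what a minimal, machine-readable metadata record for a single compound characterisation might look like. Every field name here is an assumption for illustration only; the actual MI standards are precisely the outcome of the community process described above.

```python
# Hypothetical minimal-information (MI) record for one compound
# characterisation, sketched as machine-readable metadata. All field
# names are assumptions for illustration, not an NFDI4Chem standard.
import json

record = {
    "identifier": "doi:10.0000/example",   # placeholder dataset DOI
    "molecule": {
        "name": "ethanol",
        # InChI gives the molecule a machine-resolvable identity
        "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    },
    "measurement": {
        "technique": "NMR",
        "nucleus": "1H",
        "solvent": "CDCl3",
        "frequency_mhz": 400,
    },
    "rawDataFile": "spectrum.jdx",         # open JCAMP-DX format
    "license": "CC-BY-4.0",                # supports the "R" in FAIR
}

print(json.dumps(record, indent=2))
```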


Author(s): Axel Polleres, Simon Steyskal

The World Wide Web Consortium (W3C), as the main standardization body for Web standards, has set a particular focus on publishing and integrating Open Data. In this chapter, the authors explain various standards from the W3C's Semantic Web activity and the potential role they play in the context of Open Data: RDF, as a standard data format for publishing and consuming structured information on the Web; the Linked Data principles for interlinking RDF data published across the Web and leveraging a Web of Data; and RDFS and OWL for describing vocabularies used in RDF and mappings between such vocabularies. The authors conclude with a review of current deployments of these standards on the Web, particularly within public Open Data initiatives, and discuss potential risks and challenges.
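A brief, hedged sketch of those standards in action: RDF triples describing an open dataset, plus RDFS/OWL statements mapping a local vocabulary onto external ones. The EX and CAT vocabularies are invented for illustration; DCAT and DCTERMS are real W3C/DCMI vocabularies.

```python
# RDF triples for an open dataset, plus RDFS/OWL axioms interlinking
# vocabularies. EX/CAT are invented for illustration.
# Requires: pip install rdflib
from rdflib import Graph, Literal, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/vocab#")        # invented vocabulary
CAT = Namespace("http://example.org/catalog#")     # invented dataset IRIs
DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCTERMS = Namespace("http://purl.org/dc/terms/")

g = Graph()

# RDF: structured facts as subject-predicate-object triples
g.add((CAT["budget-2020"], RDF.type, EX.OpenDataset))
g.add((CAT["budget-2020"], EX.title, Literal("City budget 2020")))

# RDFS/OWL: describe vocabularies and map between them
g.add((EX.OpenDataset, OWL.equivalentClass, DCAT.Dataset))
g.add((EX.title, RDFS.subPropertyOf, DCTERMS.title))

print(g.serialize(format="turtle"))
```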


Database · 2020 · Vol. 2020
Author(s): Xueqin Guo, Fengzhen Chen, Fei Gao, Ling Li, Ke Liu, ...

Abstract With the application and development of high-throughput sequencing technology in the life and health sciences, massive multi-omics data raise the problem of efficient management and utilization. Database development and biocuration are the prerequisites for the reuse of these big data. Here, relying on the China National GeneBank (CNGB), we present the CNGB Sequence Archive (CNSA) for archiving omics data, including raw sequencing data and its downstream analysis results, currently organized into six objects: Project, Sample, Experiment, Run, Assembly and Variation. Moreover, for some projects CNSA has created a correlation model of living samples, sample information and analytical data. Both living samples and analytical data are directly correlated with the sample information; from any one of the three, the information or data of the other two can be obtained, so that all data can be traced throughout the life cycle, from living sample to sample information to analytical data. Complying with the data standards commonly used in the life sciences, CNSA is committed to building a comprehensive and curated data repository for storing, managing and sharing omics data. We will continue to improve the data standards and provide free access to open-data resources for worldwide scientific communities to support academic research and the bio-industry. Database URL: https://db.cngb.org/cnsa/.
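A hedged sketch of the six-object hierarchy and the living-sample correlation model the abstract describes; the class names, fields and accession formats below are illustrative assumptions, not CNSA's actual schema.

```python
# Sketch of CNSA's six archival objects and the correlation model
# linking living samples, sample information and analytical data.
# Names and accession formats are assumptions, not CNSA's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Run:                      # raw sequencing data files
    accession: str
    files: List[str] = field(default_factory=list)

@dataclass
class Experiment:
    accession: str
    runs: List[Run] = field(default_factory=list)

@dataclass
class Sample:                   # sample information: the hub of the model
    accession: str
    living_sample_id: str       # correlated biobank specimen
    experiments: List[Experiment] = field(default_factory=list)
    assemblies: List[str] = field(default_factory=list)  # analytical data
    variations: List[str] = field(default_factory=list)

@dataclass
class Project:
    accession: str
    samples: List[Sample] = field(default_factory=list)

# From the sample information, both the living sample and the analytical
# data are reachable, so any record can be traced across the life cycle.
sample = Sample("CNS0000001", living_sample_id="BIOBANK-42")
sample.experiments.append(
    Experiment("CNX0000001", [Run("CNR0000001", ["r1.fq.gz"])]))
project = Project("CNP0000001", samples=[sample])
print(project.samples[0].living_sample_id)   # -> BIOBANK-42
```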


Author(s): Khayra Bencherif, Mimoun Malki, Djamel Amar Bensaber

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. To achieve high effectiveness in the link discovery task, a suitable link configuration is required to specify the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and difficult for users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach is based on a neural network model that combines a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against existing link specification approaches and is shown to outperform them in most experiments.
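The core idea, learning a compound similarity measure from several base metrics, can be sketched in a few lines of Python. The one-neuron combiner and the base metric below are chosen for illustration and are not the authors' exact model.

```python
# Sketch: a tiny neural combiner that maps several base similarity
# metrics to one compound link-specification score. Illustrative only.
import math
from difflib import SequenceMatcher

def string_sim(a: str, b: str) -> float:
    """A simple base metric (stand-in for Levenshtein, Jaccard, ...)."""
    return SequenceMatcher(None, a, b).ratio()

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def compound_score(sims, weights, bias):
    """One-neuron combiner: weighted sum of base metrics -> [0, 1]."""
    return sigmoid(sum(w * s for w, s in zip(weights, sims)) + bias)

# Per-property similarities between two candidate entities
e1 = {"label": "Leipzig", "country": "Germany"}
e2 = {"label": "Leipzig, Saxony", "country": "DE"}
sims = [string_sim(e1["label"], e2["label"]),
        string_sim(e1["country"], e2["country"])]

# Weights/bias would be learned from labelled link examples; fixed here.
score = compound_score(sims, weights=[3.0, 1.5], bias=-2.0)
print(f"link score = {score:.2f}")  # accept the link above a threshold
```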


Author(s): Giuseppe Psaila, Maurizio Toccu, Gloria Bordogna, Luca Frigerio, Alfredo Cuzzocrea


2016 · Vol. 24 (1) · pp. 47–47
Author(s): Julien Emile-Geay, Nicholas P. McKay
