Towards Interlinked FAIR Biodiversity Knowledge: The BiCIKL perspective

Author(s):  
Lyubomir Penev ◽  
Dimitrios Koureas ◽  
Quentin Groom ◽  
Jerry Lanfear ◽  
Donat Agosti ◽  
...  

The Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL) (started 1 May 2021, duration 3 years) will build a new European community of key research infrastructures, researchers, citizen scientists and other stakeholders in biodiversity and life sciences. Together, the 14 BiCIKL partners will solidify open science practices by providing access to data, tools and services at each stage of, and along, the entire biodiversity research and data life cycle (specimens, sequences, taxon names, analytics, publications, biodiversity knowledge graph) (Fig. 1, see also the BiCIKL kick-off presentation through Suppl. material 1), in compliance with the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. The existing services provided by the participating infrastructures will expand through the development and adoption of shared, common or interoperable domain standards, resulting in liberated and enhanced flows of data and knowledge across these domains. BiCIKL puts a special focus on the biodiversity literature. Over the span of the project, BiCIKL will develop new methods and workflows for semantic publishing and for integrated harvesting, liberation, linking and re-use of sub-article-level data extracted from literature (i.e., specimens, material citations, sequences, taxonomic names, taxonomic treatments, figures, tables). Data linkages may be realised with different technologies (e.g., data warehousing, linking between FAIR Data Objects, Linked Open Data) and can be bi-lateral (between two data infrastructures) or multi-lateral (among multiple data infrastructures). The main challenge of BiCIKL is to design, develop and implement a FAIR Data Place (FDP), a central tool for search, discovery and management of interlinked FAIR data across different domains.
The key final output of BiCIKL will be the Biodiversity Knowledge Hub (BKH), a one-stop portal providing access to the BiCIKL services, tools and workflows beyond the lifetime of the project.

2019 ◽  
Vol 2 ◽  
Author(s):  
Lyubomir Penev

"Data ownership" is actually an oxymoron: there can be no copyright (ownership) of facts or ideas, hence no data ownership rights or laws exist. The term refers to various kinds of data protection instruments: Intellectual Property Rights (IPR) (mostly copyright) asserted to indicate some kind of data ownership, confidentiality clauses/rules, database right protection (in the European Union only), or personal data protection (GDPR) (Scassa 2018). Data protection is often realised via different mechanisms of "data hoarding", that is, withholding access to data for various reasons (Sieber 1989). Data hoarding, however, does not put the data into someone's ownership. Nonetheless, access to and re-use of data, and biodiversity data in particular, is hampered by technical, economic, sociological, legal and other factors, although no formal legal provisions related to copyright should prevent anyone who needs the data from using them (Egloff et al. 2014, Egloff et al. 2017, see also the Bouchout Declaration). One of the best ways to provide access to data is to publish them so that the data creators and holders are credited for their efforts. As one of the pioneers in biodiversity data publishing, Pensoft has adopted a multiple-approach data publishing model, resulting in the ARPHA-BioDiv toolbox and in extensive Strategies and Guidelines for Publishing of Biodiversity Data (Penev et al. 2017a, Penev et al. 2017b). ARPHA-BioDiv consists of several data publishing workflows:

1. Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article, which are then linked and/or cited in-text; supplementary files are published under their own DOIs to increase citability.
2. Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the system allows data papers to be submitted either as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata.
3. Import of structured data into the article text from tables or via web services, and their subsequent download/distribution from the published article, as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal.
4. Publication of data in structured, semantically enriched full-text XML, where data elements are machine-readable and easy to harvest.
5. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.

In combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, these approaches show different angles to the future of biodiversity data publishing and lay the foundations of an entire data publishing ecosystem in the field, while also supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as the Global Biodiversity Information Facility (GBIF), the Biodiversity Literature Repository (BLR), Plazi TreatmentBank and OpenBiodiv, as well as to various end users.
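The RDF conversion step described above can be sketched in a few lines. This is a minimal illustration only: the URIs, the treatment identifier and the class/property names are hypothetical placeholders standing in for the actual OpenBiodiv-O terms (Senderov et al. 2018), not the production workflow.

```python
# Sketch: serialising a taxonomic treatment extracted from an article as
# RDF in N-Triples syntax, in the spirit of the OpenBiodiv workflow.
# All URIs below are illustrative placeholders, not exact OpenBiodiv-O terms.

def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one RDF triple in N-Triples syntax (URI terms only)."""
    return f"<{subject}> <{predicate}> <{obj}> ."

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TREATMENT_CLASS = "http://openbiodiv.net/Treatment"         # assumed class URI
MENTIONS = "http://openbiodiv.net/mentionsTaxonomicName"    # assumed property URI

treatment = "http://openbiodiv.net/treatment/123e4567"      # hypothetical ID
taxon_name = "http://openbiodiv.net/taxonomic-name/abc123"  # hypothetical ID

triples = [
    ntriple(treatment, RDF_TYPE, TREATMENT_CLASS),
    ntriple(treatment, MENTIONS, taxon_name),
]

for t in triples:
    print(t)
```

In practice a triple store such as the one behind the OpenBiodiv Biodiversity Knowledge Graph ingests such statements in bulk; the point here is only the shape of the data being exchanged.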


2020 ◽  
Author(s):  
Mohan Ramamurthy

<p>The geoscience disciplines are either gathering or generating data in ever-increasing volumes. To ensure that the science community and society reap the utmost benefits in research and societal applications from such rich and diverse data resources, there is a growing interest in broad-scale, open data sharing to foster myriad scientific endeavors. However, open access to data is not sufficient; research outputs must be reusable and reproducible to accelerate scientific discovery and catalyze innovation.</p><p>As part of its mission, Unidata, a geoscience cyberinfrastructure facility, has been developing and deploying data infrastructure and data-proximate scientific workflows and analysis tools using cloud computing technologies for accessing, analyzing, and visualizing geoscience data.</p><p>Specifically, Unidata has developed techniques that combine robust access to well-documented datasets with easy-to-use tools, using workflow technologies. In addition to fostering the adoption of technologies like pre-configured virtual machines through Docker containers and Jupyter notebooks, other computational and analytic methods are enabled via “Software as a Service” and “Data as a Service” techniques with the deployment of the Cloud IDV, AWIPS Servers, and the THREDDS Data Server in the cloud. The collective impact of these services and tools is to enable scientists to use the Unidata Science Gateway capabilities to not only conduct their research but also share and collaborate with other researchers, advancing the intertwined goals of Reproducibility of Science and Open Science and, in the process, truly enabling “Science as a Service”.</p><p>Unidata has implemented the aforementioned services on the Unidata Science Gateway (http://science-gateway.unidata.ucar.edu), which is hosted on the Jetstream cloud, a cloud-computing facility that is funded by the U.S. National Science Foundation. The aim is to give geoscientists an ecosystem that includes data, tools, models, workflows, and workspaces for collaboration and sharing of resources.</p><p>In this presentation, we will discuss our work to date in developing the Unidata Science Gateway and the hosted services therein, as well as our future directions toward meeting the increasing expectations of funders and scientific communities that research outputs will be Open and FAIR (Findable, Accessible, Interoperable, Reusable). In particular, we will discuss how Unidata is advancing data and software transparency, open science, and reproducible research. We will share our experiences in how the geoscience and information science communities are using the data, tools and services provided through the Unidata Science Gateway to advance research and education in the geosciences.</p>


Impact ◽  
2020 ◽  
Vol 2020 (8) ◽  
pp. 46-47
Author(s):  
Lucy Annette

The Social Sciences & Humanities Open Cloud (SSHOC) is a 40-month-long project under the umbrella of the European Open Science Cloud (EOSC) and funded by Horizon 2020. This project unites 20 partner organisations as well as their 27 associates. SSHOC seeks to create interconnected data infrastructures focused on an integrated, cloud-based network structure.


Publications ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 38 ◽  
Author(s):  
Lyubomir Penev ◽  
Mariya Dimitrova ◽  
Viktor Senderov ◽  
Georgi Zhelezov ◽  
Teodor Georgiev ◽  
...  

Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. It is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.


2019 ◽  
Author(s):  
Rachael Gallagher ◽  
Daniel Stein Falster ◽  
Brian Maitner ◽  
Rob Salguero-Gomez ◽  
Vigdis Vandvik ◽  
...  

Synthesising trait observations and knowledge across the Tree of Life remains a grand challenge for biodiversity science. Despite the well-recognised importance of traits for addressing ecological and evolutionary questions, trait-based approaches still struggle with several basic data requirements to deliver openly accessible, reproducible, and transparent science. Here, we introduce the Open Traits Network (OTN) – a decentralised alliance of international researchers and institutions focused on collaborative integration and standardisation of the exponentially increasing availability of trait data across all organisms. The OTN embraces the use of Open Science principles in trait research, particularly open data, open source, and open methodology protocols and workflows, to accelerate the synthesis of trait data across the Tree of Life. Increased efforts at all levels – from individual scientists, research networks, scientific societies, funding agencies, to publishers – are necessary to fully exploit the opportunities offered by Open Science in trait research. Democratising access to data, tools and resources will facilitate rapid advances in the biological sciences and our ability to address pressing environmental and societal demands.


2021 ◽  
Vol 50 (1) ◽  
pp. 15
Author(s):  
Matthias Reiter-Pázmándy

Open science and open access to research data are important aspects of research policy in Austria. In the last years, the social sciences have seen the building of research infrastructures that generate data and archives that store data. Data standards have been established, several working groups exist and a number of activities aim to further develop various aspects of open science, open data and access to data. However, some barriers and challenges still exist in the practice of sharing research data. One aspect that should be emphasised and incentivised is the re-use of research data.


2015 ◽  
Vol 10 (1) ◽  
pp. 111-122 ◽  
Author(s):  
Liz Lyon ◽  
Aaron Brenner

This paper examines the role, functions and value of the “iSchool” as an agent of change in the data informatics and data curation arena. A brief background to the iSchool movement is given followed by a brief review of the data decade, which highlights key data trends from the iSchool perspective: open data and open science, big data and disciplinary data diversity. The growing emphasis on the shortage of data talent is noted and a family of data science roles identified. The paper moves on to describe three primary functions of iSchools: education, research intelligence and professional practice, which form the foundations of a new Capability Ramp Model. The model is illustrated by mini-case studies from the School of Information Sciences, University of Pittsburgh: the immersive (laboratory-based) component of two new Research Data Management and Research Data Infrastructures graduate courses, a new practice partnership with the University Library System centred on RDM, and the mapping of disciplinary data practice using the Community Capability Model Profile Tool. The paper closes with a look to the future and, based on the assertion that data is mission-critical for iSchools, some steps are proposed for the next data decade: moving data education programs into the mainstream core curriculum, adopting a translational data science perspective and strengthening engagement with the Research Data Alliance.


2021 ◽  
Vol 3 (1) ◽  
pp. 79-87
Author(s):  
Atif Latif ◽  
Fidan Limani ◽  
Klaus Tochtermann

Federated Research Data Infrastructures aim to provide seamless access to research data along with services to facilitate the researchers in performing their data management tasks. During our research on Open Science (OS), we have built cross-disciplinary federated infrastructures for different types of (open) digital resources: Open Data (OD), Open Educational Resources (OER), and open access documents. In each case, our approach targeted only the resource “metadata”. Based on this experience, we identified some challenges that we had to overcome again and again: lack of (i) harvesters, (ii) common metadata models and (iii) metadata mapping tools. In this paper, we report on the challenges we faced in the federated infrastructure projects we were involved with. We structure the report based on the three challenges listed above.
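The "common metadata models and metadata mapping tools" challenge named above can be made concrete with a small sketch: records harvested from different repositories arrive with different field names, and a per-source crosswalk maps them onto one shared model. The field names and crosswalks here are purely illustrative, not the actual models used in the projects described.

```python
# Sketch of the metadata-mapping step in a federated infrastructure.
# COMMON_FIELDS is an assumed, Dublin Core-like common model; the
# per-repository crosswalks are hypothetical examples.

COMMON_FIELDS = ("title", "creator", "date", "identifier")

CROSSWALKS = {
    "repo_a": {"dc:title": "title", "dc:creator": "creator",
               "dc:date": "date", "dc:identifier": "identifier"},
    "repo_b": {"name": "title", "author": "creator",
               "published": "date", "doi": "identifier"},
}

def map_record(source: str, record: dict) -> dict:
    """Map a harvested record onto the common metadata model."""
    crosswalk = CROSSWALKS[source]
    mapped = {common: record[src]
              for src, common in crosswalk.items() if src in record}
    # Fill missing common fields with None so downstream code
    # always sees the same fixed schema.
    return {field: mapped.get(field) for field in COMMON_FIELDS}

record_b = {"name": "Open Data Report", "author": "Doe, J.",
            "doi": "10.5555/example"}
print(map_record("repo_b", record_b))
```

A reusable mapping tool essentially generalises this crosswalk idea; the repeated effort the authors report comes from rebuilding such crosswalks for every new source.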


2020 ◽  
Author(s):  
Kyle Copas

<p>GBIF—the Global Biodiversity Information Facility—and its network of more than 1,500 institutions maintain the world's largest index of biodiversity data (https://www.gbif.org), containing nearly 1.4 billion species occurrence records. This infrastructure offers a model of best practices, both technological and cultural, that other domains may wish to adapt or emulate to ensure that its users have free, FAIR and open access to data.</p><p>The availability of community-supported data and metadata standards in the biodiversity informatics community, combined with the adoption (in 2014) of open Creative Commons licensing for data shared with GBIF, established the necessary preconditions for the network's recent growth.</p><p>But GBIF's development of a data citation system based on the uses of DOIs—Digital Object Identifiers—has established an approach for using unique identifiers to establish direct links between scientific research and the underlying data on which it depends. The resulting state-of-the-art system tracks uses and reuses of data in research and credits data citations back to individual datasets and publishers, helping to ensure the transparency of biodiversity-related scientific analyses.</p><p>In 2015, GBIF began issuing a unique Digital Object Identifier (DOI) for every data download. This system resolves each download to a landing page containing 1) the taxonomic, geographic, temporal and other search parameters used to generate the download; 2) a quantitative map of the underlying datasets that contributed to the download; and 3) a simple citation to be included in works that rely on the data.</p><p>When authors cite these download DOIs, they in effect assert direct links between scientific papers and underlying data. Crossref registers these links through Event Data, enabling GBIF to track citation counts automatically for each download, dataset and publisher. 
These counts expand to display a bibliography of all research reuses of the data. This system improves the incentives for institutions to share open data by providing quantifiable measures demonstrating the value and impact of sharing data for others' research.</p><p>GBIF is a mature infrastructure supporting a wide pool of researchers who, on average, publish two peer-reviewed journal articles relying on these data every day. That said, the citation-tracking and -crediting system has room for improvement. At present, 21% of papers using GBIF-mediated data provide DOI citations—which represents a 30% increase over 2018. Through outreach to authors and collaboration with journals, GBIF aims to continue this trend.</p><p>In addition, members of the GBIF network are seeking to extend citation credits to individuals through tools like Bloodhound Tracker (https://www.bloodhound-tracker.net), using persistent identifiers from ORCID and Wikidata IDs. This approach provides a compelling model for the scientific and scholarly benefits of treating individual data records from specimens as micro- or nanopublications—first-class research objects that advance both FAIR data and open science.</p>
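The download-DOI mechanism described above can be sketched as follows. The DOI value below is a placeholder, not a real download, and the citation string is illustrative only; GBIF's download landing pages provide the authoritative citation text.

```python
# Sketch: turning a GBIF occurrence-download DOI into a resolvable link
# and a simple citation string. The DOI is a placeholder and the citation
# format is illustrative, not GBIF's exact template.

def doi_url(doi: str) -> str:
    """Resolvable URL for a DOI via the doi.org resolver."""
    return f"https://doi.org/{doi}"

def download_citation(doi: str, accessed: str) -> str:
    """Illustrative citation for a GBIF occurrence download."""
    return f"GBIF.org ({accessed}) GBIF Occurrence Download {doi_url(doi)}"

example_doi = "10.15468/dl.example"  # placeholder, not a real download DOI
print(download_citation(example_doi, "1 June 2020"))
```

Because every paper that includes such a string cites one specific download DOI, services like Crossref Event Data can link the paper back to the exact datasets that contributed records to that download.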


2019 ◽  
Author(s):  
Lígia Ribeiro ◽  
Maria Manuel Borges ◽  
Diana Silva

On March 24, 2016, through the Resolution of the Council of Ministers nº 21/2016, the Government of Portugal, through the Ministry of Science, Technology and Higher Education (MCTES), announced its commitment to the principles and practices of Open Science. The same resolution mandates MCTES to create an Interministerial Working Group with the mission of presenting a proposal for a Strategic Plan for the implementation of a National Open Science Policy. The Working Group was organised into four subgroups: Open Access and Open Data, Infrastructures and Digital Preservation, Scientific Evaluation, and Scientific Social Responsibility. The objective of this work is to present the set of recommendations that the Scientific Evaluation subgroup, in articulation with the others, considers fundamental for the implementation of Open Science practices, targeting political agents as well as the entities that produce, evaluate and finance science.

