Data Standards for the Phenology of Plant Specimens

Author(s):  
Katelin Pearson ◽  
Libby Ellwood ◽  
Edward Gilbert ◽  
Rob Guralnick ◽  
James Macklin ◽  
...  

Phenological data (i.e., data on growth and reproductive events of organisms) are increasingly being used to study the effects of climate change, and biodiversity specimens have emerged as important sources of phenological data. However, phenological data are not expressly treated by the Darwin Core standard (Wieczorek et al. 2012), and specimen-based phenological data have been codified and stored in various Darwin Core fields using different vocabularies, making phenological data difficult to access, aggregate, and therefore analyze at scale across data sources. The California Phenology Network, an herbarium digitization collaboration launched in 2018, has harvested phenological data from over 1.4 million angiosperm specimens from California herbaria (Yost et al. 2020). We developed interim standards by which to score and store these data, but further development is needed for adoption of ideal phenological data standards into the Darwin Core. To this end, we are forming a Plant Specimen Phenology Task Group to develop a phenology extension for the Darwin Core standard. We will create fields into which phenological data can be entered and recommend a standardized vocabulary for use in these fields using the Plant Phenology Ontology (Stucky et al. 2018, Brenskelle et al. 2019). We invite all interested parties to become part of this Task Group and thereby contribute to the accessibility and use of these valuable data. In this talk, we will describe the need for plant phenological data standards, discuss current challenges to developing such standards, and outline the next steps of the Task Group toward providing this valuable resource to the data user community.
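To make the interim approach concrete, the following minimal sketch (Python, purely illustrative) contrasts how a phenological score is commonly stored today in the existing Darwin Core term reproductiveCondition with how a dedicated extension row might look; the extension field names and the ontology IRI shown are placeholders, not terms or identifiers proposed by the Task Group.

# Illustrative sketch only. dwc:reproductiveCondition is an existing Darwin Core
# term often used as an interim home for phenological scores; the extension
# field names and the PPO IRI below are placeholders, not ratified terms.

interim_record = {
    "occurrenceID": "urn:example:specimen:123456",    # example identifier
    "scientificName": "Eschscholzia californica Cham.",
    "reproductiveCondition": "flowering|fruiting",    # interim, pipe-delimited vocabulary
}

# Hypothetical extension row, one per scored phenological trait, with values
# intended to be drawn from the Plant Phenology Ontology (PPO).
phenology_extension_row = {
    "occurrenceID": "urn:example:specimen:123456",
    "phenologicalTrait": "open flower present",                       # assumed field name
    "traitOntologyIRI": "http://purl.obolibrary.org/obo/PPO_0000000", # placeholder IRI
    "traitPresence": True,                                            # assumed field name
}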

2018 ◽  
Vol 2 ◽  
pp. e25608 ◽  
Author(s):  
Lee Belbin ◽  
Arthur Chapman ◽  
John Wieczorek ◽  
Paula Zermoglio ◽  
Alex Thompson ◽  
...  

Task Group 2 of the TDWG Data Quality Interest Group aims to provide a standard suite of tests and resulting assertions that can assist with filtering occurrence records for as many applications as possible. Currently ‘data aggregators’ such as the Global Biodiversity Information Facility (GBIF), the Atlas of Living Australia (ALA) and iDigBio run their own suite of tests over records received and report the results of these tests (the assertions): there is, however, no standard reporting mechanism. We reasoned that the availability of an internationally agreed set of tests would encourage implementations by the aggregators and at the data sources (museums, herbaria and others), so that issues could be detected and corrected early in the process. All the tests are limited to Darwin Core terms. The ~95 tests, refined from over 250 in use around the world, were classified into four output types: validations, notifications, amendments and measures. Validations test one or more Darwin Core terms, for example, that dwc:decimalLatitude is in a valid range (i.e. between -90 and +90 inclusive). Notifications report a status that a user of the record should know about, for example, if there is a user-annotation associated with the record. Amendments are made to one or more Darwin Core terms when the information across the record can be improved, for example, if there is no value for dwc:scientificName, it can be filled in from a valid dwc:taxonID. Measures report values that may be useful for assessing the overall quality of a record, for example, the number of validation tests passed. Evaluation of the tests was complex and time-consuming, but the important parameters of each test have been consistently documented. Each test has a globally unique identifier, a label, an output type, a resource type, the Darwin Core terms used, a description, a dimension (from the Framework on Data Quality from TG1), an example, references, implementations (if any), test-prerequisites and notes. For each test, generic code is being written that should be easy for institutions to implement – be they aggregators or data custodians. A valuable product of the work of TG2 has been a set of general principles. One example is “Darwin Core terms are either: literal verbatim (e.g., dwc:verbatimLocality) and cannot be assumed capable of validation, open-ended (e.g., dwc:behavior) and cannot be assumed capable of validation, or bounded by an agreed vocabulary or extents, and therefore capable of validation (e.g., dwc:countryCode)”. Another is “criteria for including tests are that they are informative, relatively simple to implement, mandatory for amendments and have power in that they will not likely result in 0% or 100% of all record hits.” A third: “Do not ascribe precision where it is unknown.” GBIF, the ALA and iDigBio have committed to implementing the tests once they have been finalized. We are confident that many museums and herbaria will also implement the tests over time. We anticipate that demonstration code and a test dataset that will validate the code will be available on project completion.
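The sketch below (Python, not the Task Group's reference implementation) illustrates how one validation and one amendment of the kinds described above might be expressed; the function names, the result structure and the status labels are assumptions made for this example.

# Minimal sketch, not the Task Group's reference implementation. It shows one
# validation and one amendment over Darwin Core terms; the response structure
# and status labels are illustrative.

def validation_decimallatitude_inrange(record):
    """Validation: dwc:decimalLatitude must lie between -90 and +90 inclusive."""
    value = record.get("decimalLatitude")
    if value in (None, ""):                    # 'empty' handling per an agreed definition
        return {"status": "PREREQUISITES_NOT_MET", "result": None}
    try:
        lat = float(value)
    except ValueError:
        return {"status": "HAS_RESULT", "result": "NOT_COMPLIANT"}
    return {"status": "HAS_RESULT",
            "result": "COMPLIANT" if -90.0 <= lat <= 90.0 else "NOT_COMPLIANT"}

def amendment_scientificname_from_taxonid(record, name_lookup):
    """Amendment: propose dwc:scientificName from a valid dwc:taxonID."""
    if record.get("scientificName") or not record.get("taxonID"):
        return {"status": "NOT_AMENDED", "proposed": {}}
    name = name_lookup.get(record["taxonID"])  # name_lookup stands in for a name service
    if name is None:
        return {"status": "NOT_AMENDED", "proposed": {}}
    return {"status": "AMENDED", "proposed": {"scientificName": name}}

record = {"decimalLatitude": "95.2", "scientificName": "", "taxonID": "urn:example:taxon:42"}
print(validation_decimallatitude_inrange(record))                     # NOT_COMPLIANT
print(amendment_scientificname_from_taxonid(record, {"urn:example:taxon:42": "Puma concolor"}))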


2017 ◽  
Vol 39 (3) ◽  
Author(s):  
Ian Bruno ◽  
Jeremy G. Frey

The new millennium, now almost 20 years old, has been characterised by a recognition within the research community of the importance of the free flow of research data; not simply in the ability to access the data, but also in the understanding that this valuable resource needs to be reused and built upon. We believe there have been at least two main drivers for this. First, those who pay for the research want to know it is leading to useful outcomes with impact: the transparency and accountability agenda. Second is an appreciation that the major global concerns (food, health, climate, economy) are extraordinarily complex (‘wicked’) problems, […]


Author(s):  
Yanina Sica ◽  
Paula Zermoglio

Biodiversity inventories, i.e., recording multiple species at a specific place and time, are routinely performed and offer high-quality data for characterizing biodiversity and its change. Digitization, sharing and reuse of incidental point records (i.e., records that are not readily associated with systematic sampling or monitoring, typically museum specimens and many observations from citizen science projects) have been the focus for many years in the biodiversity data community. Only more recently has attention been directed towards mobilizing data from both new and longstanding inventories and monitoring efforts. These kinds of studies provide very rich data that can enable inferences about species absence, but their reliability depends on the methodology implemented, the survey effort and completeness. The information about these elements has often been regarded as metadata and captured in an unstructured manner, thus making its full use very challenging. Unlocking and integrating inventory data requires data standards that can facilitate capture and sharing of data with the appropriate depth. The Darwin Core standard (Wieczorek et al. 2012) currently enables reporting some of the information contained in inventories, particularly using Darwin Core Event terms such as samplingProtocol, sampleSizeValue, sampleSizeUnit, and samplingEffort. However, it is limited in its ability to accommodate spatial, temporal, and taxonomic scopes, and other key aspects of the inventory sampling process, such as direct or inferred measures of sampling effort and completeness. The lack of a standardized way to share inventory data has hindered their mobilization, integration, and broad reuse. In an effort to overcome these limitations, a framework was developed to standardize inventory data reporting: Humboldt Core (Guralnick et al. 2018). Humboldt Core identified three types of inventories (single, elementary, and summary inventories) and proposed a series of terms to report their content. These terms were organized in six categories: dataset and identification; geospatial and habitat scope; temporal scope; taxonomic scope; methodology description; and completeness and effort. Humboldt Core was originally planned as a new TDWG standard and is currently implemented in Map of Life (https://mol.org/humboldtcore/), but ratification was not pursued at the time, which limited broader community adoption. In 2021 the TDWG Humboldt Core Task Group was established to review how best to integrate the terms proposed in the original publication with existing standards and implementation schemas. The first goal of the task group was to determine whether a new, separate standard was needed or if an extension to Darwin Core could accommodate the terms necessary to describe the relevant information elements. Since the different types of inventories can be thought of as Events with different nesting levels (events within events, e.g., plots within sites), and after an initial mapping to existing Darwin Core terms, it was deemed appropriate to start from a Darwin Core Event Core and build an extension to include Humboldt Core terms. The task group members are currently revising all original Humboldt Core terms, reformulating definitions, comments, and examples, and discarding or adding new terms where needed. We are also gathering real datasets to test the use of the extension once an initial list of revised terms is ready, before the extension undergoes a public review period as established by the TDWG process.
Through the ratification of Humboldt Core as a TDWG extension, we expect to provide the community with a solution for sharing and using inventory data that improves biodiversity data discoverability, interoperability and reuse while lowering the reporting burden at different levels (data collection, integration and sharing).
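As a rough illustration of the nested-Event approach described above, the following sketch (Python, illustrative only) models a site-level inventory with a nested plot-level event using existing Darwin Core terms; no Humboldt Core extension terms are shown because the revised term list is still in preparation.

# Minimal sketch using existing Darwin Core Event terms only; the revised
# Humboldt extension terms (scope, effort, completeness) are intentionally
# not invented here.

site_event = {
    "eventID": "inventory-site-001",
    "eventDate": "2022-03-10/2022-03-14",
    "samplingProtocol": "timed area search",
    "samplingEffort": "40 observer-hours",
}

plot_event = {
    "eventID": "inventory-site-001-plot-A",
    "parentEventID": "inventory-site-001",     # nesting: plot within site
    "eventDate": "2022-03-11",
    "sampleSizeValue": "0.1",
    "sampleSizeUnit": "hectare",
}

# Occurrences recorded during the plot-level event; reporting the full species
# list per event is what supports later inferences about absence.
occurrences = [
    {"occurrenceID": "occ-0001", "eventID": "inventory-site-001-plot-A",
     "scientificName": "Turdus migratorius", "occurrenceStatus": "present"},
]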


Metabolomics ◽  
2015 ◽  
Vol 11 (4) ◽  
pp. 782-783 ◽  
Author(s):  
Reza M. Salek ◽  
Masanori Arita ◽  
Saravanan Dayalan ◽  
Timothy Ebbels ◽  
Andrew R. Jones ◽  
...  

Author(s):  
Martha Elena Núñez ◽  
Miguel X. Rodríguez-Paz

Real-time remote courses offer students the valuable resources of cultural diversity and access to professors who are leaders in their fields. They also encourage the practice of several skills, with technology as a learning tool. We have studied how courses taught by internationally experienced leaders, relying on communication technologies, affect students and can be a valuable complement to traditional classes by offering a global perspective in the students' formation. This paper focuses on validating the experience by examining student satisfaction with a teaching methodology in a real-time remote model. It also takes into account the recommendations of the professors. This information can provide valuable data for further promoting this scheme of classes and successfully continuing to teach through real-time remote classes.


Author(s):  
Erica Krimmel ◽  
Talia Karim ◽  
Holly Little ◽  
Lindsay Walker ◽  
Roger Burkhalter ◽  
...  

The Paleo Data Working Group was launched in May 2020 as a driving force for broader conversations about paleontological data standards. Here, we present an overview of the “community of practice” model used by this group to evaluate and implement data standards such as those stewarded by Biodiversity Information Standards (TDWG). A community of practice is defined by regular and ongoing interaction among individual members, who find enough value in participating that the group achieves a self-sustaining level of activity (Wenger 1998, Wenger and Snyder 2000, Wenger et al. 2002). Communities of practice are not a new phenomenon in biodiversity science, and were recommended by the recent United States National Academies report on biological collections (National Academies of Sciences, Engineering, and Medicine 2020) as a way to support workforce training, data-driven discoveries, and transdisciplinary collaboration. Our collective aim to digitize specimens and mobilize the data presents new opportunities to foster communities of practice that are circumscribed not by research agendas but rather by the need for better data management practices to facilitate research. Paleontology collections professionals in the United States have been meeting semi-regularly to discuss digitization, in both virtual and in-person spaces, for nearly a decade, largely thanks to support from the iDigBio Paleo Digitization Working Group. The need for a community of practice within this group focused on data management in paleo collections became apparent at the biodiversity_next Conference in October 2019, where we realized that work being done in the biodiversity standards community was not being informed by or filtering back to digitization and data mobilization efforts occurring in the paleo collections community. A virtual workshop focused on georeferencing for paleo in April 2020 was conceived as an initial pathway to bridge these two communities and provided a concrete example of how useful it can be to interweave practical digitization experience with conceptual data standards. In May 2020, the Paleo Data Working Group began meeting biweekly on Zoom, with discussion topics collaboratively developed, presented, and discussed by members and supplemented with invited speakers when appropriate. Topics centered on implementation of data standards (e.g., Darwin Core) by collections staff, and how standards can evolve to better represent data. An associated Slack channel facilitated continuing conversations asynchronously. Engaging domain experts (e.g., paleo collections staff) in the conceptualization of information throughout the data lifecycle helped to pinpoint issues and gaps within the existing standards and revealed opportunities for increasing accessibility. Additionally, when domain experts gained a better understanding of the information science framework underlying the data standards, they were better able to apply those standards to their own data. This critical step of standards implementation at the collections level has often been slow to follow standards development, except in the few collections that have the funds and/or expertise to do so. Overall, we found the Paleo Data Working Group model of knowledge sharing to be mutually beneficial for standards developers and collections professionals, and it has led to a community of practice where informatics and paleo domain expertise intersect with a low barrier to entry for new members of both groups.
Serving as a loosely organized voice for the needs of the paleo collections community, the Paleo Data Working Group has contributed to several initiatives in the broader biodiversity community. For example, during the 2021 public review of Darwin Core maintenance proposals, the Paleo Data Working Group shared the workload of evaluating and commenting on issues among its members. Not only was this efficient for us, but it was also effective for the TDWG review process, which sought to engage a broad audience while also reaching consensus. The Paleo Data Working Group has also served as a coordinated point of contact for adjacent and intersecting activities related to both data standards (e.g., those led by the TDWG Earth Sciences and Paleobiology Interest Group and the TDWG Collections Description Interest Group) and paleontological research (e.g., those led by the Paleobiology Database and the Integrative Paleobotany Portal project). Sustaining activities, like those of the Paleo Data Working Group, require consideration and regular attention. Support staff at iDigBio and collections staff focusing on digitization or data projects at their own institutions, as well as a consistent pool of drop-in and occasional participants, have been instrumental in maintaining momentum for the community of practice. Socializing can also help build the personal relationships necessary for maintaining momentum. To this end, the Paleo Data Working Group Slack encourages friendly banter (e.g., the #pets-of-paleo channel), more general collections-related conversations (e.g., the #physical-space channel), and space for those with sub-interests to connect (e.g., the #morphology channel). While the focus of the group is on data, on an individual level our group members find it useful to network on a wide variety of topics, and this usefulness is critical to sustaining the community of practice. As we look forward to Digital Extended Specimen concepts and exciting developments in cyberinfrastructure for biodiversity data, communities of practice like that exemplified by the Paleo Data Working Group are essential for success. Creating FAIR (Findable, Accessible, Interoperable and Reusable) data requires buy-in from data providers, such as those in the paleo collections community. Even beyond FAIR, considering CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) data means embracing participation from a broad spectrum of perspectives, including those without informatics experience. Here, we provide insight into one model for creating such buy-in and participation.


Author(s):  
Lee Belbin ◽  
Arthur Chapman ◽  
John Wieczorek ◽  
Paula Zermoglio ◽  
Paul Morris

‘Data Quality Test and Assertions’ Task Group 2 (https://www.tdwg.org/community/bdq/tg-2/) has taken another year to clarify the 102 tests (https://github.com/tdwg/bdq/issues?q=is%3Aissue+is%3Aopen+label%3ATest). The original mandate to develop a core suite of tests that could be widely applied from data collection to user evaluation of aggregated data seemed straightforward. Two years down the track, we have proven that to be incorrect. Among the final tests are complexities that none of the core group anticipated, for example, the need for a definition of ‘empty’ or for specifying the ‘Expected response’ of each test under various scenarios. The record-based tests apply to Darwin Core terms (https://dwc.tdwg.org/terms/) and have been classified as of type validation (66), amendment (29), notification (3) or measure (5). Validations test one or more Darwin Core terms against known characteristics, for example, VALIDATION_MONTH_NOTSTANDARD. Amendments may be applied to Darwin Core terms where we can unambiguously offer an improvement to the record, for example, AMENDMENT_MONTH_STANDARDIZED. Notifications are made where we believe a flag will help alert users to an issue that needs evaluation, for example, NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY. Measures are summaries of test outcomes at the record level, for example, MEASURE_AMENDMENTS_PROPOSED. We note that 41 tests require some parameters to be established at the time of test implementation, 20 tests require access to a currently accepted vocabulary and 3 tests rely on ISO/DCMI standards. The dependency on vocabularies to circumscribe permissible values for Darwin Core terms led to the establishment by Paula Zermoglio of DQ Task Group 4 (https://github.com/tdwg/bdq/tree/master/Vocabularies). A vocabulary of 154 terms associated with the tests and assertions has been developed. At the time of writing this abstract, test data and a demonstration code implementation of each test are yet to be completed. We hope these will be finalized by the time of this presentation.
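For illustration, a possible shape of the two month-related tests named above is sketched below in Python; the function names echo the test labels, but the response structure and the parameterized vocabulary are assumptions for this example rather than the published test specifications.

# Illustrative sketch of the two month-related tests named above; not the
# Task Group's specification or reference implementation.

MONTH_VOCABULARY = {str(m) for m in range(1, 13)}   # parameterizable vocabulary of valid months

def validation_month_notstandard(record, vocabulary=MONTH_VOCABULARY):
    """Flag a dwc:month value that is not in the agreed vocabulary."""
    month = record.get("month")
    if month in (None, ""):
        return {"status": "PREREQUISITES_NOT_MET", "result": None,
                "comment": "dwc:month is empty"}
    compliant = str(month).strip() in vocabulary
    return {"status": "HAS_RESULT",
            "result": "COMPLIANT" if compliant else "NOT_COMPLIANT"}

def amendment_month_standardized(record):
    """Propose an unambiguous numeric month from common verbatim forms."""
    verbatim = str(record.get("month", "")).strip().lower()
    names = {"january": "1", "february": "2", "march": "3", "april": "4",
             "may": "5", "june": "6", "july": "7", "august": "8",
             "september": "9", "october": "10", "november": "11", "december": "12"}
    if verbatim in names:
        return {"status": "AMENDED", "proposed": {"month": names[verbatim]}}
    return {"status": "NOT_AMENDED", "proposed": {}}

print(validation_month_notstandard({"month": "13"}))        # NOT_COMPLIANT
print(amendment_month_standardized({"month": "March"}))     # proposes month = "3"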


2018 ◽  
Vol 2 ◽  
pp. e25929
Author(s):  
Christina Byrd

The Darwin Core data standard has rapidly become the go-to standard for biological and paleontological specimens. In order to accommodate all of the timescale data for paleontology specimens, standards for geologic age were developed and incorporated into Darwin Core. At the Sternberg Museum of Natural History (FHSM), digitization of the paleontology collection has been a primary objective. The adoption of the Darwin Core standard for FHSM’s paleontology data spurred the idea to use Darwin Core for the geology collection as well. There are currently no widely accepted data standards for geology specimens, but some organizations have published their data management standards online. Even though Darwin Core was developed for the dissemination of biological information, many of the data fields are applicable to geology. FHSM is working to adopt and adapt Darwin Core standards for its geology collection. FHSM currently has 84 fields to record geology data. Approximately sixty percent of these data fields directly correspond with Darwin Core terms and have been adopted with the corresponding data format. Seven percent of the fields correspond with Darwin Core terms but require adaptation by adding new shared language within the term. These fields include the classification of rocks and minerals and the addition of “geologicSpecimen” as a value for the Darwin Core term basisOfRecord. Fortunately, minerals have a classification system that loosely resembles animal taxonomy. For example, quartz is a mineral species that is part of a group called Tectosilicates, which in turn belongs to the Silicates. One quarter of the FHSM fields are specific to geology and do not fit within the current set of Darwin Core terms. When determining terminology for these fields, FHSM staff utilized the terms and standards set by the Open Geospatial Consortium (OGC), an international organization that develops open standards for the global geospatial community. The terms adopted from the OGC come from a category called “EarthMaterial.” The remaining fields are specific to FHSM recordkeeping. In order to share these terms with others and hopefully start a larger conversation about data standards for this area of natural history, the terms and definitions will be made available on the FHSM website in the geology section. Using the same terms, formats, and overall standard across the disciplines at FHSM increases usability and uniformity of the different data sets, increases workflow efficiency, and simplifies development of the relational database for paleontological and geological specimens at FHSM.
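A hypothetical record for a quartz specimen, sketched below in Python, illustrates the kind of mapping described; the Darwin Core terms and the proposed “geologicSpecimen” value come from the abstract, while the mineral-classification mapping and the geology-specific field names are illustrative placeholders rather than FHSM's published fields.

# Hypothetical mapping sketch for a quartz specimen. The Darwin Core terms
# (institutionCode, catalogNumber, locality, basisOfRecord) are real; the use
# of class/order/scientificName for mineral classification and the
# geology-specific fields at the end are illustrative placeholders, not
# published FHSM or OGC term names.

geology_record = {
    # Fields adopted directly from Darwin Core
    "institutionCode": "FHSM",
    "catalogNumber": "G-0001",              # example identifier, not a real record
    "locality": "Ellis County, Kansas",
    # Darwin Core term adapted with new shared language
    "basisOfRecord": "geologicSpecimen",
    # Mineral classification loosely mirroring biological taxonomy (assumed mapping)
    "class": "Silicates",
    "order": "Tectosilicates",
    "scientificName": "Quartz",
    # Geology-specific fields (placeholder names inspired by OGC "EarthMaterial")
    "earthMaterialType": "mineral",
    "crystalSystem": "trigonal",
}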


Author(s):  
Fhatani Ranwashe ◽  
Marianne Le Roux

The e-Flora of South Africa project was initiated in 2013 by the South African National Biodiversity Institute (SANBI) in support of the Global Strategy for Plant Conservation (GSPC, 2011-2020). South Africa's flora consists of ca. 21,000 taxa, of which more than half are endemic. South Africa will contribute a national Flora towards Target 1 of the GSPC ("To create an online flora of all known plants by 2020"). South Africa's contribution is ca. 6% of the world’s flora, of which ca. 3% are endemic and therefore unique. South Africa’s electronic Flora comprises previously published descriptions. South Africa’s e-Flora data form part of the Botanical Dataset of Southern Africa (BODATSA), which is currently managed through the Botanical Research And Herbarium Management System (BRAHMS). To date, South Africa’s e-Flora data (http://ipt.sanbi.org.za/iptsanbi/resource?r=flora_descriptions) represent 19,539 indigenous taxa, 79,139 descriptions of distribution, morphological, habitat and diagnostic data, and 27,799 bibliographic records. The e-Flora data were recently published online using the Integrated Publishing Toolkit and subsequently harvested by the World Flora Online (WFO) into its portal. A series of challenges was encountered while preparing descriptive data from BRAHMS for ingestion by the WFO portal, ranging from taxonomic issues to data quality issues, including compliance with data standards. To contribute to the WFO portal, the taxa in BODATSA have to be matched with the taxa in the WFO taxonomic backbone. Once there is a match, a unique WFO taxon identifier is assigned to the matched taxon in BODATSA. This process presented various challenges because the WFO taxonomic backbone and the taxonomic classification system used by South Africa (the South African National Plant Checklist) do not fully correlate. The schemas used to store taxonomic data in BRAHMS and WFO also differ, and this had to be addressed. To ensure consistency in the future, a detailed guideline document was created providing all the steps and actions that should be taken when publishing an e-Flora, managed in BRAHMS, to the WFO portal. The presentation will focus on matching taxonomic classifications between BRAHMS and WFO; dealing with character encoding issues; and manipulating data to meet Darwin Core standards.
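To illustrate the matching step, the sketch below (Python, illustrative only) matches BODATSA names against a locally downloaded copy of the WFO taxonomic backbone using a deliberately simple exact-match rule; real reconciliation must also handle synonyms, orthographic variants and encoding differences, and the identifiers shown are placeholders.

# Illustrative sketch only: annotating BODATSA records with a WFO taxon
# identifier where an exact match on the normalized name exists. Identifiers
# below are placeholders; this is not SANBI's or WFO's matching workflow.

def normalize(name: str) -> str:
    """Crude normalization: collapse whitespace and lower-case."""
    return " ".join(name.split()).lower()

def match_to_wfo(bodatsa_taxa, wfo_backbone):
    """Split BODATSA records into matched (with wfoTaxonID) and unmatched."""
    index = {normalize(t["scientificName"]): t["taxonID"] for t in wfo_backbone}
    matched, unmatched = [], []
    for taxon in bodatsa_taxa:
        wfo_id = index.get(normalize(taxon["scientificName"]))
        if wfo_id:
            matched.append({**taxon, "wfoTaxonID": wfo_id})
        else:
            unmatched.append(taxon)      # requires manual reconciliation
    return matched, unmatched

wfo_backbone = [{"taxonID": "wfo-0000000001", "scientificName": "Protea cynaroides (L.) L."}]
bodatsa = [{"scientificName": "Protea cynaroides (L.) L."},
           {"scientificName": "Encephalartos woodii Sander"}]
print(match_to_wfo(bodatsa, wfo_backbone))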


Author(s):  
Christian Köhler

Automated observations of natural occurrences play a key role in monitoring biodiversity worldwide. With the development of affordable hardware like the AudioMoth acoustic logger (Hill et al. 2019), large-scale and long-term monitoring has come within reach. However, data management and dissemination of monitoring data remain challenging, as the development of software and infrastructure for managing such data lags behind. We want to fill this gap by providing a complete audio monitoring solution comprising affordable audio monitoring hardware, custom data management tools and storage infrastructure based on open-source hardware and software, biodiversity information standards, and integrable interfaces. The Scientific Monitoring Data Management and Online Repository (SIMON) consists of a portable data collector and a connected online repository. The data collector, a device for the automated extraction of the audio data from the audio loggers in the field, stores the data and metadata in an internal cache. Once connected to the internet via WiFi or a cable connection, the data are automatically uploaded to an online repository for automated analysis, annotation, data management and dissemination. To prevent SIMON from becoming yet another proprietary data store, the FAIR principles (Findable, Accessible, Interoperable, and Re-usable; Wilkinson et al. 2016) are at the very core of data managed in the online repository. We plan to offer an API (application programming interface) to disseminate data to established data infrastructures. A second API will allow the use of external services for data enrichment. While the system is still in the planning phase, we would like to take the opportunity to discuss with domain experts the requirements and implementation of different standards, namely ABCD (Access to Biological Collections Data task group, Biodiversity Information Standards (TDWG) 2007), Darwin Core (Darwin Core Task Group, Biodiversity Information Standards (TDWG) 2009) and Darwin Core Archive (Remsen et al. 2017), as well as connecting to external services and targeting data infrastructures.
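As a sketch of how a single recording's metadata might be packaged by the data collector before upload (the actual SIMON schema is not yet published), the Python snippet below pairs an assumed device/file block with existing Darwin Core terms so that records could later be disseminated to established infrastructures.

# Hypothetical sketch of a metadata payload for one AudioMoth recording.
# The device/file block structure is an assumption, not SIMON's actual schema;
# the keys in the "dwc" block reuse existing Darwin Core terms.

recording_metadata = {
    "device": {                               # assumed structure
        "model": "AudioMoth",
        "deviceID": "am-0042",
        "sampleRateHz": 48000,
    },
    "file": {
        "name": "am-0042_20230514_0430.wav",
        "sha256": "<checksum computed on the collector>",
    },
    "dwc": {                                  # Darwin Core terms for later dissemination
        "eventDate": "2023-05-14T04:30:00+02:00",   # example values throughout
        "decimalLatitude": 50.7374,
        "decimalLongitude": 7.0982,
        "samplingProtocol": "passive acoustic monitoring",
        "basisOfRecord": "MachineObservation",
    },
}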

