Unlocking Inventory Data Capture, Sharing and Reuse: The Humboldt Extension to Darwin Core

Author(s):  
Yanina Sica ◽  
Paula Zermoglio

Biodiversity inventories, i.e., recording multiple species at a specific place and time, are routinely performed and offer high-quality data for characterizing biodiversity and its change. Digitization, sharing and reuse of incidental point records (i.e., records that are not readily associated with systematic sampling or monitoring, typically museum specimens and many observations from citizen science projects) have been the focus of the biodiversity data community for many years. Only more recently has attention been directed towards mobilizing data from both new and longstanding inventories and monitoring efforts. These kinds of studies provide very rich data that can enable inferences about species absence, but their reliability depends on the methodology implemented and on survey effort and completeness. Information about these elements has often been regarded as metadata and captured in an unstructured manner, making its full use very challenging. Unlocking and integrating inventory data requires data standards that facilitate capture and sharing of data with the appropriate depth. The Darwin Core standard (Wieczorek et al. 2012) currently enables reporting some of the information contained in inventories, particularly through Darwin Core Event terms such as samplingProtocol, sampleSizeValue, sampleSizeUnit, and samplingEffort. However, it is limited in its ability to accommodate spatial, temporal, and taxonomic scopes, as well as other key aspects of the inventory sampling process, such as direct or inferred measures of sampling effort and completeness. The lack of a standardized way to share inventory data has hindered their mobilization, integration, and broad reuse. In an effort to overcome these limitations, a framework was developed to standardize inventory data reporting: Humboldt Core (Guralnick et al. 2018).
Humboldt Core identified three types of inventories (single, elementary, and summary inventories) and proposed a series of terms to report their content. These terms were organized into six categories: dataset and identification; geospatial and habitat scope; temporal scope; taxonomic scope; methodology description; and completeness and effort. While originally planned as a new TDWG standard, and currently implemented in Map of Life (https://mol.org/humboldtcore/), ratification was not pursued at the time, thus limiting broader community adoption. In 2021 the TDWG Humboldt Core Task Group was established to review how best to integrate the terms proposed in the original publication with existing standards and implementation schemas. The first goal of the task group was to determine whether a new, separate standard was needed or whether an extension to Darwin Core could accommodate the terms necessary to describe the relevant information elements. Since the different types of inventories can be thought of as Events with different nesting levels (events within events, e.g., plots within sites), and after an initial mapping to existing Darwin Core terms, it was deemed appropriate to start from a Darwin Core Event Core and build an extension to include Humboldt Core terms. The task group members are currently revising all original Humboldt Core terms, reformulating definitions, comments, and examples, and discarding or adding terms where needed. We are also gathering real datasets to test the use of the extension once an initial list of revised terms is ready, before undergoing the public review period established by the TDWG process. Through the ratification of Humboldt Core as a TDWG extension, we expect to provide the community with a solution for sharing and using inventory data that improves biodiversity data discoverability, interoperability and reuse while lowering the reporting burden at different levels (data collection, integration and sharing).
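The nesting of inventories as Events within Events can be sketched with the Darwin Core terms the abstract names. This is a minimal illustration only: the dwc: terms below are ratified Darwin Core, but the "inventory:" fields are invented placeholders standing in for extension terms that were still under revision at the time.

```python
# Sketch of a site-level Event and a nested plot-level Event.
# dwc: terms are real Darwin Core; "inventory:" names are hypothetical
# placeholders for Humboldt-style scope/effort terms, not final term names.

site_event = {
    "dwc:eventID": "site-001",
    "dwc:samplingProtocol": "point count",
    "dwc:sampleSizeValue": "10",
    "dwc:sampleSizeUnit": "minute",
    "dwc:samplingEffort": "2 observers x 10 minutes",
}

plot_event = {
    "dwc:eventID": "site-001-plot-A",
    "dwc:parentEventID": "site-001",  # plots nest within sites (events within events)
    # Illustrative inventory-scope fields (placeholder names):
    "inventory:targetTaxonomicScope": "Aves",
    "inventory:isSamplingEffortReported": "true",
    "inventory:samplingCompleteness": "0.85",
}

def nesting_depth(event, events_by_id):
    """Count how many levels of parent events sit above this event."""
    depth = 0
    parent = event.get("dwc:parentEventID")
    while parent:
        depth += 1
        parent = events_by_id[parent].get("dwc:parentEventID")
    return depth

events = {e["dwc:eventID"]: e for e in (site_event, plot_event)}
depth = nesting_depth(plot_event, events)  # plot sits one level below its site
```

Structuring scope and effort as typed fields on each Event, rather than free-text metadata, is what makes the completeness information machine-queryable.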

2021 ◽  
Author(s):  
Colombine Verzat ◽  
Jasmine Harley ◽  
Rickie Patani ◽  
Raphaëlle Luisier

Although morphological attributes of cells and their substructures are recognized readouts of physiological or pathophysiological states, they have been relatively understudied in amyotrophic lateral sclerosis (ALS) research. In this study we integrate multichannel fluorescence high-content microscopy data with deep-learning imaging methods to reveal, directly from unsegmented images, novel neurite-associated morphological perturbations in (ALS-causing) VCP-mutant human motor neurons (MNs). Surprisingly, we reveal that previously unrecognized disease-relevant information is embedded in the broadly used and often considered 'generic' biological markers of nuclei (DAPI) and neurons (βIII-tubulin). Additionally, we identify changes in the information content of ALS-related RNA-binding protein (RBP) immunofluorescence imaging captured in VCP-mutant MN cultures. Furthermore, by analyzing MN cultures exposed to different extrinsic stressors, we show that heat stress recapitulates key aspects of ALS. Our study therefore reveals disease-relevant information contained in a range of both generic and more specific fluorescent markers, and establishes the use of image-based deep-learning methods for rapid, automated and unbiased testing of biological hypotheses.


2018 ◽  
Vol 2 ◽  
pp. e25608 ◽  
Author(s):  
Lee Belbin ◽  
Arthur Chapman ◽  
John Wieczorek ◽  
Paula Zermoglio ◽  
Alex Thompson ◽  
...  

Task Group 2 of the TDWG Data Quality Interest Group aims to provide a standard suite of tests and resulting assertions that can assist with filtering occurrence records for as many applications as possible. Currently, 'data aggregators' such as the Global Biodiversity Information Facility (GBIF), the Atlas of Living Australia (ALA) and iDigBio run their own suites of tests over the records they receive and report the results of these tests (the assertions); there is, however, no standard reporting mechanism. We reasoned that the availability of an internationally agreed set of tests would encourage implementations by the aggregators, and at the data sources (museums, herbaria and others), so that issues could be detected and corrected early in the process. All the tests are limited to Darwin Core terms. The ~95 tests, refined from over 250 in use around the world, were classified into four output types: validations, notifications, amendments and measures. Validations test one or more Darwin Core terms, for example, that dwc:decimalLatitude is in a valid range (i.e., between -90 and +90 inclusive). Notifications report a status that a user of the record should know about, for example, if there is a user annotation associated with the record. Amendments are made to one or more Darwin Core terms when the information across the record can be improved, for example, if there is no value for dwc:scientificName, it can be filled in from a valid dwc:taxonID. Measures report values that may be useful for assessing the overall quality of a record, for example, the number of validation tests passed. Evaluation of the tests was complex and time-consuming, but the important parameters of each test have been consistently documented.
Each test has a globally unique identifier, a label, an output type, a resource type, the Darwin Core terms used, a description, a dimension (from the Framework on Data Quality from TG1), an example, references, implementations (if any), test prerequisites and notes. For each test, generic code is being written that should be easy for institutions to implement, be they aggregators or data custodians. A valuable product of the work of TG2 has been a set of general principles. One example is: "Darwin Core terms are either: literal verbatim (e.g., dwc:verbatimLocality) and cannot be assumed capable of validation; open-ended (e.g., dwc:behavior) and cannot be assumed capable of validation; or bounded by an agreed vocabulary or extents, and therefore capable of validation (e.g., dwc:countryCode)." Another is: "The criteria for including tests are that they are informative, relatively simple to implement, mandatory for amendments, and have power in that they will not likely result in 0% or 100% of all record hits." A third: "Do not ascribe precision where it is unknown." GBIF, the ALA and iDigBio have committed to implementing the tests once they have been finalized. We are confident that many museums and herbaria will also implement the tests over time. We anticipate that demonstration code and a test dataset that will validate the code will be available on project completion.
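Two of the four output types can be sketched directly from the examples given above. This is a minimal illustration with assumed function names and an invented taxonID lookup, not the Task Group's actual implementation:

```python
# Minimal sketch of a VALIDATION and an AMENDMENT in the sense described
# above. Function names and the lookup table are illustrative assumptions.

def validate_decimal_latitude(record):
    """VALIDATION: dwc:decimalLatitude must be a number in [-90, 90]."""
    value = record.get("dwc:decimalLatitude")
    try:
        return -90 <= float(value) <= 90
    except (TypeError, ValueError):
        return False  # missing or non-numeric values fail the validation

# Hypothetical lookup standing in for a real taxonomic backbone service.
TAXON_NAMES = {"taxon-123": "Vanellus vanellus"}

def amend_scientific_name(record):
    """AMENDMENT: fill a missing dwc:scientificName from a valid dwc:taxonID."""
    if not record.get("dwc:scientificName"):
        name = TAXON_NAMES.get(record.get("dwc:taxonID"))
        if name:
            return {**record, "dwc:scientificName": name}
    return record

record = {"dwc:decimalLatitude": "95.0", "dwc:taxonID": "taxon-123"}
ok = validate_decimal_latitude(record)   # False: latitude out of range
amended = amend_scientific_name(record)  # scientificName filled from taxonID
```

Note that the amendment returns a modified copy rather than mutating the record, so the original assertion trail is preserved.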


Author(s):  
Mathias Dillen ◽  
Elspeth Haston ◽  
Nicole Kearney ◽  
Deborah L Paul ◽  
Joaquim Santos ◽  
...  

The natural history specimens of the world have been documented on paper labels, often physically attached to the specimens themselves. As we transcribe these data to make them digital and more useful for analysis, we make interpretations. Sometimes these interpretations are trivial, because the label is unambiguous, but often the meaning is not so clear, even if it is easily read. One key element that suffers from considerable ambiguity is people's names. Though a person is indivisible, their name can change, is rarely unique and can be written in many ways. Yet knowing the people associated with data is incredibly useful. Data on people can be used to validate other data, simplify data capture, link together data across domains, reduce duplication of effort and facilitate data-gap analysis. In addition, people data enable the discovery of individuals unique to our collections, the collective charting of the history of scientific researchers and the provision of credit to the people who deserve it (Groom et al. 2020). We foresee a future where the people associated with collections are not ambiguous, are shared globally, and data of all kinds are linked through the people who generate them. The TDWG People in Biodiversity Data Task Group is therefore working on a guide to the disambiguation of people in natural history collections. The ultimate goal is to connect the various strings of characters on specimen labels and other documentation to persistent identifiers (PIDs) that unambiguously link a name "string" to the identity of a person. In working towards this goal, 150 volunteers in the Bionomia project have linked 21 million specimens to persistent identifiers for their collectors and determiners. An additional 2 million specimens with links to identifiers for people have already emerged directly from collections that make use of the recently ratified Darwin Core terms recordedByID and identifiedByID.
Furthermore, the CETAF Botany Pilot, conducted among a group of European herbaria and museums, has connected over 1.4 million specimens to disambiguated collectors (Güntsch et al. 2021). Still, given the estimated 2 billion natural history specimens globally (Ariño 2010), there is much more disambiguation to be done. The process of disambiguation starts with a trigger, which is often the transcription of a specimen's label data. Unambiguous identification of the collector may facilitate this transcription, as it offers knowledge of their biographical details and collecting habits, allowing us to infer missing information such as collecting date or locality. Another trigger might be the flagging of inconsistent data during data entry or by data quality processes, revealing for instance that multiple collectors have been conflated. A disambiguation trigger is followed by the gathering of data, then the evaluation of the results, and finally the documentation of the new information. Disambiguation is not always straightforward and there are many pitfalls. It requires access to biographical data, and identifiers have to be minted. In the case of living people, they have to cooperate with being disambiguated and we have to follow legal and ethical guidelines. In the case of dead people, particularly those long dead, disambiguation may require considerable research. We will present the progress made by the People in Biodiversity Data Task Group and its recommendations for disambiguation in collections. We want to encourage other institutions to engage with a global effort to link people to persistent identifiers and collaboratively improve all collection data.
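The first step of the process described above, matching a verbatim name string against candidate person records that carry persistent identifiers, can be sketched with a simple string-similarity comparison. The names, identifiers, and threshold below are invented for illustration; real disambiguation pipelines weigh much richer evidence (dates, localities, collecting habits) than string similarity alone.

```python
# Illustrative sketch: score a verbatim collector string against candidate
# person records with persistent identifiers. All names and PIDs are invented.
from difflib import SequenceMatcher

candidates = [
    {"name": "Smith, John A.", "pid": "https://orcid.org/0000-0000-0000-0001"},
    {"name": "Smythe, Joan", "pid": "https://orcid.org/0000-0000-0000-0002"},
]

def best_candidate(verbatim, candidates, threshold=0.5):
    """Return the candidate whose name best matches the verbatim string,
    or None if no candidate clears the (assumed) similarity threshold."""
    def score(c):
        return SequenceMatcher(None, verbatim.lower(), c["name"].lower()).ratio()
    best = max(candidates, key=score)
    return best if score(best) >= threshold else None

match = best_candidate("John A. Smith", candidates)   # matches the first record
no_match = best_candidate("Zzzz", candidates)         # nothing clears threshold
```

Returning None below a threshold, rather than the least-bad candidate, mirrors the caution the abstract calls for: a wrong link is worse than an unresolved name string.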


Author(s):  
Lauren Weatherdon

Ensuring that we have the data and information necessary to make informed decisions is a core requirement in an era of increasing complexity and anthropogenic impact. With cumulative challenges such as the decline in biodiversity and accelerating climate change, the need for spatially-explicit and methodologically-consistent data that can be compiled to produce useful and reliable indicators of biological change and ecosystem health is growing. Technological advances—including satellite imagery—are beginning to make this a reality, yet uptake of biodiversity information standards and scaling of data to ensure its applicability at multiple levels of decision-making are still in progress. The complementary Essential Biodiversity Variables (EBVs) and Essential Ocean Variables (EOVs), combined with Darwin Core and other data and metadata standards, provide the underpinnings necessary to produce data that can inform indicators. However, perhaps the largest challenge in developing global, biological change indicators is achieving consistent and holistic coverage over time, with recognition of biodiversity data as global assets that are critical to tracking progress toward the UN Sustainable Development Goals and Targets set by the international community (see Jensen and Campbell (2019) for discussion). Through this talk, I will describe some of the efforts towards producing and collating effective biodiversity indicators, such as those based on authoritative datasets like the World Database on Protected Areas (https://www.protectedplanet.net/), and work achieved through the Biodiversity Indicators Partnership (https://www.bipindicators.net/). I will also highlight some of the characteristics of effective indicators, and global biodiversity reporting and communication needs as we approach 2020 and beyond.


Database ◽  
2018 ◽  
Vol 2018 ◽  
Author(s):  
Nico M Franz ◽  
Beckett W Sterner

Abstract Growing concerns about the quality of aggregated biodiversity data are lowering trust in large-scale data networks. Aggregators frequently respond to quality concerns by recommending that biologists work with original data providers to correct errors ‘at the source.’ We show that this strategy falls systematically short of a full diagnosis of the underlying causes of distrust. In particular, trust in an aggregator is not just a feature of the data signal quality provided by the sources to the aggregator, but also a consequence of the social design of the aggregation process and the resulting power balance between individual data contributors and aggregators. The latter have created an accountability gap by downplaying the authorship and significance of the taxonomic hierarchies—frequently called ‘backbones’—they generate, and which are in effect novel classification theories that operate at the core of the data-structuring process. The Darwin Core standard for sharing occurrence records plays an under-appreciated role in maintaining the accountability gap, because this standard lacks the syntactic structure needed to preserve the taxonomic coherence of data packages submitted for aggregation, potentially leading to inferences that no individual source would support. Since high-quality data packages can mirror competing and conflicting classifications, i.e., unsettled systematic research, this plurality must be accommodated in the design of biodiversity data integration. Looking forward, a key directive is to develop new technical pathways and social incentives for experts to contribute directly to the validation of taxonomically coherent data packages as part of a greater, trustworthy aggregation process.


2008 ◽  
Vol 53 (No. 4) ◽  
pp. 139-148 ◽  
Author(s):  
J. Saborowski ◽  
J. Cancino

A large virtual population is created based on the GIS database of a forest district and inventory data. It serves as a population in which large-scale inventories with systematic and simple random poststratified estimators can be simulated and the gains in precision studied. Despite their self-weighting property, systematic samples combined with poststratification can still be clearly more efficient than unstratified systematic samples, the gain in precision being close to that of poststratified over simple random samples. The poststratified variance estimator for the conditional variance, given the within-strata sample sizes, served as a satisfactory estimator in the case of systematic sampling. The differences between conditional and unconditional variance were negligible for all sample sizes analyzed.


2015 ◽  
Vol 825-826 ◽  
pp. 844-851 ◽  
Author(s):  
Arne Ziebell ◽  
Oskar Schöppl ◽  
Roland Haubner ◽  
Thomas Konegger

Hybrid ball bearings, consisting of metallic washers in combination with ceramic bearing balls, offer a variety of significant advantages over standard steel bearings, including improved mechanical properties and reduced friction during operation. Key aspects of successful operation are the prevention of defects in both balls and washers, as well as knowledge of critical and optimal operating parameters. This information can be obtained through test rig trials, where vibration analysis has been found to be a versatile and efficient tool for characterizing operational status. In this contribution, hybrid thrust ball bearings with Si3N4 balls are investigated. After defined damage was introduced in different parts of the bearing, test rig trials were conducted, and the vibration behavior during operation was compared to that of new, unused bearings. The characteristic vibrational frequencies, obtained through a variety of software-based filter and analysis algorithms, were correlated with materialographic investigations of failed bearings. The proposed method was shown to yield valuable information about damage morphologies and, subsequently, about the status of the bearing during operation.
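The kind of spectral analysis used to extract characteristic frequencies from a vibration record can be sketched as a discrete Fourier transform followed by a peak search. The signal, sampling rate, and "defect" frequency below are synthetic assumptions, not the paper's bearing data, and a plain O(n²) DFT stands in for the production FFT and filtering pipeline.

```python
# Synthetic sketch: find the dominant frequency in a vibration signal via a
# plain DFT and a peak search over the magnitude spectrum. All numbers are
# invented; real analyses use FFTs plus filtering on measured bearing data.
import cmath
import math

fs = 1000.0          # sampling rate in Hz (assumed)
n = 200              # number of samples
defect_freq = 50.0   # injected "defect" tone for the synthetic signal

signal = [math.sin(2 * math.pi * defect_freq * t / fs) for t in range(n)]

def dominant_frequency(x, fs):
    """Return the frequency of the largest DFT magnitude (excluding DC)."""
    n = len(x)
    mags = []
    for k in range(1, n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append((abs(coeff), k))
    _, k_peak = max(mags)
    return k_peak * fs / n

peak = dominant_frequency(signal, fs)  # recovers the injected 50 Hz tone
```

In practice the recovered peaks are compared against the bearing's calculated defect frequencies (which depend on geometry and shaft speed) to localize the damaged component.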


2000 ◽  
Vol 40 (1) ◽  
pp. 417 ◽  
Author(s):  
R.J. Seggie ◽  
R.B. Ainsworth ◽  
D.A. Johnson ◽  
J.P.M. Koninx ◽  
B. Spaargaren ◽  
...  

The Sunrise and Troubadour fields form a complex of giant gas-condensate accumulations located in the Timor Sea some 450 km northwest of Darwin. Left unappraised for almost a quarter of a century since discovery, these stranded hydrocarbon accumulations have recently been brought, through renewed attention, to the point of commercialisation. A focussed appraisal program during 1997–1999, driven by expectations of growth in LNG and domestic gas markets, involved the acquisition and processing of an extensive grid of modern 2D seismic data and the drilling, coring and testing of three wells. The aim of this program was to better quantify both in-place hydrocarbon volumes (reservoir properties and their distribution) and hydrocarbon recovery efficiency (gas quality and deliverability). Maximum value has been extracted from these data via a combination of deterministic and probabilistic methods and the integration of analyses across all disciplines. This paper provides an overview of these efforts, describes the fields and details major subsurface uncertainties.
Key aspects are:
- 3D, object-based geological modelling of the reservoir, covering the spectrum of plausible sedimentological interpretations.
- Convolution of rock properties, derived from seismic (AVO) inversion, with 3D geological model realisations to define reservoir properties in inter-well areas.
- Incorporation of faults (both seismically mapped and probabilistically modelled sub-seismic faults) into both the static 3D reservoir models and the dynamic reservoir simulations.
- Interpretation of a tilted gas-water contact, apparently arising from flow of water in the Plover aquifer away from active tectonism to the north.
- Extensive gas and condensate fluid analysis and modelling.
- A scenario-based approach to dynamic modelling.

In summary, acquisition of an extensive suite of quality data during the past two to three years, coupled with novel, integrated, state-of-the-art analysis of the subsurface, has led to a major increase in estimates of potentially recoverable gas and condensate. Improved volumetric confidence, in conjunction with both traditional and innovative engineering design (e.g., Floating Liquefied Natural Gas technology), has made viable a range of possible commercial developments from 2005 onwards.


Author(s):  
Shenda Hong ◽  
Cao Xiao ◽  
Trong Nghia Hoang ◽  
Tengfei Ma ◽  
Hongyan Li ◽  
...  

In many situations, we need to build and deploy separate models in related environments with different data qualities. For example, an environment with strong observation equipment (e.g., intensive care units) often provides high-quality multi-modal data, acquired from multiple sensory devices and with rich feature representations. On the other hand, an environment with poor observation equipment (e.g., at home) only provides low-quality, uni-modal data with poor feature representations. To deploy a competitive model in a poor-data environment without requiring direct access to multi-modal data acquired from a rich-data environment, this paper develops and presents a knowledge distillation (KD) method (RDPD) to enhance a predictive model trained on poor data using knowledge distilled from a high-complexity model trained on rich, private data. We evaluated RDPD on three real-world datasets and showed that its distilled model consistently outperformed all baselines across all datasets, achieving its greatest improvement over a model trained only on low-quality data (24.56% on PR-AUC and 12.21% on ROC-AUC) and improving over a state-of-the-art KD model by 5.91% on PR-AUC and 4.44% on ROC-AUC.
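The generic knowledge-distillation objective underlying methods of this kind can be sketched in a few lines. This is the standard temperature-softened formulation, not the specific RDPD loss; all logits and hyperparameters are invented for illustration.

```python
# Sketch of the generic KD objective: the student matches the teacher's
# temperature-softened outputs via cross-entropy, mixed with the usual
# hard-label loss. Numbers and hyperparameters are illustrative only.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * soft-target cross-entropy + (1 - alpha) * hard-label loss."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = -sum(p * math.log(q) for p, q in zip(soft_teacher, soft_student))
    hard_loss = -math.log(softmax(student_logits)[label])
    return alpha * soft_loss + (1 - alpha) * hard_loss

loss = distillation_loss([2.0, 0.5, -1.0], [3.0, 1.0, -2.0], label=0)
```

The soft targets carry the teacher's "dark knowledge" about relative class similarities, which is what lets a student trained on poor, uni-modal inputs benefit from a teacher trained on rich, multi-modal data it never sees at deployment time.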


2019 ◽  
Vol 11 (5) ◽  
pp. 13531-13544 ◽  
Author(s):  
Devavrat Pawar ◽  
Howard P. Nelson ◽  
Divya R.L. Pawar ◽  
Sarika Khanwilkar

Reliable population estimates of apex predators, such as the Leopard Panthera pardus fusca, are important as they indicate ecosystem health, enable evaluation of the effectiveness of conservation efforts and provide a benchmark for future management decisions.  The present study is the first to estimate the abundance of Leopard, along with a possible prey profile, in Kuno Wildlife Sanctuary (KWLS) in central Madhya Pradesh (M.P.), India.  For systematic sampling, two study habitats of 15km² each were identified, one close to the park entrance and the other away from it; these were initially classed as 'good' and 'poor' on the basis of their position relative to the park entrance.  Sampling was carried out between March and April 2017, for a period of 18 days in each of the two study habitats.  Each habitat was divided into five blocks, and each block subdivided into three 1km² observation units.  In all, 16 trail cameras were placed in pairs, one set at a time in five of the blocks, over a six-day period.  The total sampling effort was 180 trap-nights.  Cameras were set to a trigger speed of three frames per 10 seconds, re-triggering only after a 20-minute interval upon infrared detection.  The data were analysed using closed population capture–recapture analyses in Program MARK to estimate Leopard abundance.  Seventy-eight Leopard detections representing eight unique individuals were recorded in the 30km² study site: seven Leopards in the good habitat and one in the poor habitat.  The abundance estimate for the good habitat was 11 Leopards (SE 4.6, 95% CI = 8–31 individuals).  Due to limited captures/recaptures in the poor habitat, abundance could not be estimated for this habitat class.
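The logic of closed-population capture-recapture estimation can be illustrated with the textbook Chapman-corrected Lincoln-Petersen estimator. This is far simpler than the likelihood-based closed-capture models fitted in Program MARK for this study, and the capture counts below are invented, not the study's data.

```python
# Illustrative sketch: Chapman-corrected Lincoln-Petersen closed-population
# estimator. Simpler than the Program MARK models used in the study;
# the session counts below are invented for demonstration.

def chapman_estimate(n1, n2, m2):
    """n1: individuals identified in session 1; n2: identified in session 2;
    m2: individuals seen in both sessions (recaptures)."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# Invented example: 6 individuals photographed in session 1, 5 in session 2,
# 3 of them seen in both sessions.
n_hat = chapman_estimate(6, 5, 3)  # abundance estimate of 9.5 individuals
```

The intuition carries over to the MARK models: the lower the recapture rate relative to total captures, the larger (and less certain) the implied population, which is why the single detection in the poor habitat left abundance inestimable there.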

