metadata record
Recently Published Documents

TOTAL DOCUMENTS: 31 (five years: 13)
H-INDEX: 4 (five years: 1)

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mark Edward Phillips ◽  
Hannah Tarver

Purpose
This study furthers metadata quality research by providing complementary network-based metrics and insights to analyze metadata records and identify areas for improvement.

Design/methodology/approach
Metadata record graphs apply network analysis to metadata field values; this study evaluates the interconnectedness of subjects within each Hub aggregated into the Digital Public Library of America. It also reviews the effects of NACO normalization (simulating the revision of values for consistency) and of breaking up pre-coordinated subject headings (simulating the application of the Faceted Application of Subject Terminology, FAST, to Library of Congress Subject Headings).

Findings
Network statistics complement count- or value-based metrics by providing context related to the number of records a user might actually find starting from one item and moving to others via shared subject values. Additionally, connectivity increases when values are normalized to correct or adjust for formatting differences, or when pre-coordinated subject strings are broken into separate topics.

Research limitations/implications
This analysis focuses on exact-string matches, the lowest common denominator for searching, although many search engines and digital library indexes use less stringent matching methods. In terms of practical implications for evaluating or improving subjects in metadata, the normalization components demonstrate where resources may be most effectively allocated for these activities (depending on a collection).

Originality/value
Although the individual components of this research are not particularly novel, network analysis has not generally been applied to metadata analysis. This research furthers previous studies related to metadata quality analysis of aggregations and digital collections in general.
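
To make the idea concrete, here is a minimal sketch of a metadata record graph in Python with networkx, under assumed toy data: records connect to their subject values, and connectivity is compared before and after a crude stand-in for NACO normalization and the splitting of pre-coordinated headings. The record IDs and subject strings are invented for illustration; the study's actual data comes from DPLA Hub aggregations.

```python
# Sketch: a metadata record graph linking records via shared subject values.
import networkx as nx

records = [
    {"id": "rec1", "subjects": ["Texas -- History", "Cattle drives"]},
    {"id": "rec2", "subjects": ["Texas -- History", "Railroads"]},
    {"id": "rec3", "subjects": ["railroads"]},  # case variant of "Railroads"
]

def normalize(value: str) -> str:
    """Crude stand-in for NACO normalization: lowercase, collapse whitespace."""
    return " ".join(value.lower().split())

def build_graph(records, split_precoordinated=False, nacoize=False):
    g = nx.Graph()
    for rec in records:
        g.add_node(rec["id"], kind="record")
        for subj in rec["subjects"]:
            # Optionally break "A -- B" pre-coordinated strings into topics.
            parts = subj.split(" -- ") if split_precoordinated else [subj]
            for part in parts:
                key = normalize(part) if nacoize else part
                g.add_node(key, kind="subject")
                g.add_edge(rec["id"], key)
    return g

# Fewer connected components after normalization/splitting means a user can
# reach more records from one starting item via shared subjects.
for kwargs in ({}, {"nacoize": True, "split_precoordinated": True}):
    g = build_graph(records, **kwargs)
    print(kwargs, "components:", len(list(nx.connected_components(g))))
```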


2021 ◽  
Vol 50 (2) ◽  
pp. 51-64
Author(s):  
András Simon ◽  
Péter Kiszl

Abstract
During this research, the catalogues of more than 200 libraries and museums in Hungary and its neighboring countries were examined. The authors calculated the number and size of the metadata records and of the full-content records in the databases of the institutions' collection management systems, as well as the type of the full-content data and the overall size of the databases. By analyzing the results, the goal was to answer three questions: (1) Can any significant difference be established between the results according to country, nationality, or type of institution? (2) How large is a metadata record or a full-content record? (3) Is it possible to establish a methodology for selecting a representative sample of institutions to facilitate further research? When planning the costs of data management, the size of the databases, the number of metadata records, and the variability of metadata and media records must all be considered. A distinction should be made between the indispensable “primary” data to be preserved long-term and the “secondary” data units derived from the primary data. This article investigates how to establish the size of primary data in the databases of collection management systems.
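
The kind of size accounting described above could be sketched as follows, assuming a collection management system whose catalogue can be queried as relational tables. The table and column names (bib_records, marc_blob, attachments, content) are hypothetical; real systems differ in schema and export format.

```python
# Sketch: counting metadata records and full-content records and summing
# their sizes, against an assumed SQLite export of a catalogue.
import sqlite3

conn = sqlite3.connect("catalogue.db")  # hypothetical export file
cur = conn.cursor()

# Number of metadata records and their aggregate size in bytes.
cur.execute("SELECT COUNT(*), COALESCE(SUM(LENGTH(marc_blob)), 0) FROM bib_records")
n_meta, meta_bytes = cur.fetchone()

# Full-content ("media") records, e.g. scanned images or PDFs.
cur.execute("SELECT COUNT(*), COALESCE(SUM(LENGTH(content)), 0) FROM attachments")
n_full, full_bytes = cur.fetchone()

print(f"metadata records: {n_meta}, average size: {meta_bytes / max(n_meta, 1):.0f} B")
print(f"full-content records: {n_full}, total size: {full_bytes / 1e9:.2f} GB")
```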


2021 ◽  
Author(s):  
Ionut Iosifescu Enescu ◽  
Gian-Kasper Plattner ◽  
Lucia Espona Pernas ◽  
Dominik Haas-Artho ◽  
Rebecca Buchholz

Environmental research data from the Swiss Federal Research Institute WSL, an institute of the ETH Domain, is published through the environmental data portal EnviDat (https://www.envidat.ch). EnviDat actively implements the FAIR (Findability, Accessibility, Interoperability and Reusability) principles and offers guidance and support to researchers throughout the research data publication process.

WSL strives to increase the fraction of environmental data easily available for reuse in the public domain. At the same time, WSL facilitates the publication of high-quality environmental research datasets by providing an appropriate infrastructure and a formal publication process, and by assigning Digital Object Identifiers (DOIs) and appropriate citation information.

Within EnviDat, we conceptualize and implement data publishing workflows that include automatic validation, interactive quality checks, and iterative improvement of metadata quality. The data publication workflow encompasses a number of steps, starting from the request for a DOI, through an approval process with a double-checking principle, to the submission of the metadata record to DataCite for the final data publication. This workflow can be viewed as a decentralized peer-review and quality-improvement process for safeguarding the quality of published environmental datasets. The workflow is being further developed and refined together with partner institutions within the ETH Domain.

We have defined and implemented additional features in EnviDat, such as (i) in-depth tracing of data provenance through related datasets; (ii) the ability to augment published research data with additional resources that support open science, such as model code and software; and (iii) a DataCRediT mechanism designed for specifying data authorship (Collection, Validation, Curation, Software, Publication, Supervision).

We foresee that these developments will help to further improve approaches to the modern documentation and exchange of scientific information. This is timely given the increasing expectations that institutions and researchers have towards the capabilities of research data portals and repositories in the environmental domain.
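
As a hedged sketch of the final workflow step, the following shows how an approved metadata record could be submitted to DataCite through its public REST API. The double-approval check, DOI prefix, credentials, and dataset fields are placeholders; EnviDat's actual payload and approval logic are richer than this.

```python
# Sketch: DOI registration via the DataCite REST API after double approval.
import requests

API = "https://api.datacite.org/dois"  # DataCite production endpoint

def publish_doi(record: dict, approved_by: list[str]) -> str:
    # Double-checking principle: require two distinct approvers.
    if len(set(approved_by)) < 2:
        raise PermissionError("two independent approvals required")

    payload = {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": f"10.12345/{record['id']}",  # hypothetical prefix
                "titles": [{"title": record["title"]}],
                "creators": [{"name": n} for n in record["authors"]],
                "publisher": "EnviDat",
                "publicationYear": record["year"],
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": f"https://www.envidat.ch/dataset/{record['id']}",
                "event": "publish",  # register and make findable
            },
        }
    }
    r = requests.post(API, json=payload, auth=("REPO_ID", "PASSWORD"))
    r.raise_for_status()
    return r.json()["data"]["id"]  # the registered DOI
```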


Author(s):  
James Farrow

The Next Generation Linkage Management System (NGLMS) was designed around keeping all data in a graph database. However, this constraint, while easily achievable for greenfield projects and/or new data linkage units, may not be easily met where legacy data exists.

Objectives and Approach
The NGLMS was extended to encompass systems where data is held partially or even completely in a relational database. By grouping the data managed by the NGLMS into system metadata, record link data and record data, and by allowing system metadata and record data to be stored separately and independently in either a relational or a graph database, the NGLMS allows hybrid installations of mixed graph and relational data and, with some loss of functionality, purely relational installations.

Results
The functionality of the NGLMS was expanded to allow review of existing legacy data stored in a relational database system. Through minor changes to the server used by the NGLMS Clerical Review tool (NGLMS-CR), the review tool was able to present the same interface and allow the same integration as projects stored completely within the graph database. Hybrid projects, where link information was stored in a graph, could be accommodated with no loss of functionality. Relational-only projects allowed clerical review of identified clusters in a manner identical to the graph-only NGLMS, but involved some curtailment of the advanced clustering, time-slicing, compositional and concurrent functionality of the NGLMS due to the loss of the functionality provided by a graph database. They do, however, provide an upgrade pathway to hybrid projects and then to graph-only projects.

Conclusion / Implications
Allowing the NGLMS to be configured as a hybrid system enables a gradual adoption of the NGLMS toolset and software. Purely relational data still allows the use of the NGLMS-CR with customisable workpools and workflows. A hybrid relational/graph system, where record data is kept in a relational store but cluster and linkage information is kept in the graph store, allows the use of legacy data without major disruption and provides a pathway to full adoption.
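
A conceptual sketch of the hybrid split described above follows; it is not NGLMS code (which the abstract does not show) but illustrates the design: record data lives in a relational store while cluster and linkage data lives in a graph store, with clusters derived from graph connectivity. All names are illustrative.

```python
# Sketch: record data in a relational store, link data in a graph store.
import sqlite3
import networkx as nx

class HybridLinkageStore:
    def __init__(self, db_path=":memory:"):
        self.records = sqlite3.connect(db_path)  # relational: record data
        self.links = nx.Graph()                  # graph: linkage/cluster data
        self.records.execute(
            "CREATE TABLE IF NOT EXISTS person (id TEXT PRIMARY KEY, payload TEXT)"
        )

    def add_record(self, rec_id: str, payload: str):
        # Legacy record data stays in the relational database.
        self.records.execute(
            "INSERT OR REPLACE INTO person VALUES (?, ?)", (rec_id, payload)
        )

    def add_link(self, a: str, b: str, weight: float):
        # Link information lives only in the graph store.
        self.links.add_edge(a, b, weight=weight)

    def cluster_of(self, rec_id: str):
        # Clusters fall out of graph connectivity; a purely relational
        # installation would have to approximate this, which is where the
        # advanced clustering functionality is curtailed.
        return nx.node_connected_component(self.links, rec_id)
```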


2020 ◽  
Vol 14 (4) ◽  
pp. e020015
Author(s):  
Filipi Miranda Soares ◽  
Benildes Coura Moreira dos Santos Maculan ◽  
Debora Pignatari Drucker ◽  
Antonio Mauro Saraiva

This research aims to propose principles for creating a metadata extension to the Darwin Core standard that addresses agrobiodiversity data, with a thematic scope on ecological interactions. These principles were compiled from the scientific literature, with special attention to the recommendations of the DCMI Abstract Model, which outlines the principles for creating metadata. The DCMI Abstract Model governs the creation of the Dublin Core metadata standard, upon which Darwin Core is based. The requirements of the ISO/IEC 11179-4:2004 standard for the definition of metadata were also taken into consideration. The research is in progress, so this article presents preliminary results. A prototype of a metadata record for the field of ecological interactions, which is the scope of this research within agrobiodiversity, was created to demonstrate the format the metadata will take when the extension is finalized. This research represents an initial effort to propose more effective tools for agrobiodiversity data management, but the discussions around the conceptual aspects of ecological interactions in agrobiodiversity and the relationship of the new metadata extension to the Darwin Core term set still need to mature and deepen, and a robust methodology for creating DwC extensions has yet to be developed.
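
A record under such an extension might look like the sketch below, which combines standard Darwin Core terms with invented interaction terms (the "agrodwc:" namespace and its terms are hypothetical, since the real extension's term set was still under development), plus an ISO/IEC 11179-style definition as the standard requires.

```python
# Sketch: a prototype metadata record mixing Darwin Core terms with
# hypothetical agrobiodiversity-interaction terms.
record = {
    # Standard Darwin Core terms (http://rs.tdwg.org/dwc/terms/)
    "dwc:occurrenceID": "example-occurrence-001",  # illustrative identifier
    "dwc:scientificName": "Apis mellifera",
    "dwc:eventDate": "2020-03-15",
    "dwc:country": "Brazil",
    # Hypothetical extension terms for ecological interactions
    "agrodwc:interactionType": "pollination",
    "agrodwc:targetScientificName": "Coffea arabica",
    "agrodwc:interactionEvidence": "field observation",
}

# Each new term would carry an ISO/IEC 11179-conformant definition, e.g.:
definitions = {
    "agrodwc:interactionType": (
        "The category of ecological interaction between the subject "
        "organism and the target organism, taken from a controlled vocabulary."
    ),
}
```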


2020 ◽  
Author(s):  
Eugene Burger ◽  
Benjamin Pfeil ◽  
Kevin O'Brien ◽  
Linus Kamb ◽  
Steve Jones ◽  
...  

Data assembly in support of global data products, such as GLODAP, and submission of data to national data centers to support long-term preservation demand significant effort. This is in addition to the effort required to perform quality control on the data prior to submission. Delays in data assembly can negatively affect the timely production of scientific indicators that depend upon these datasets, including products such as GLODAP. What if data submission, metadata assembly and quality control could all be rolled into a single application? To support more streamlined data management processes in the NOAA Ocean Acidification Program (OAP), we are developing such an application, and it has the potential to serve a broader community.

This application addresses the need for data contributing to analysis and synthesis products to be high quality, well documented, and accessible from the applications scientists prefer to use. The Scientific Data Integration System (SDIS), developed by the PMEL Science Data Integration Group, allows scientists to submit their data in a number of formats. Submitted data are checked for common errors. Metadata are extracted from the data and can then be complemented into a complete metadata record using the integrated metadata entry tool, which collects rich metadata that meets the carbon science community's requirements. Quality control for standard biogeochemical parameters, still under development, will be integrated into the application. The quality control routines will be implemented in close collaboration with colleagues from the Bjerknes Climate Data Centre (BCDC) within the Bjerknes Centre for Climate Research (BCCR). This presentation will highlight the capabilities that are now available, the implementation of the archive automation workflow, and its potential use in support of GLODAP data assembly efforts.
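
The flavor of automated checking and metadata extraction described above can be sketched as follows, for an assumed CSV submission with a few oceanographic columns. The required column names and checks are invented for illustration; SDIS's actual checks and supported formats are more extensive.

```python
# Sketch: error-check a submitted CSV and extract a metadata skeleton that
# the submitter later completes in the metadata entry tool.
import csv
from datetime import datetime

REQUIRED = {"expocode", "date", "latitude", "longitude", "sal", "temp"}

def check_submission(path: str):
    errors, rows = [], []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED - set(reader.fieldnames or [])
        if missing:
            errors.append(f"missing columns: {sorted(missing)}")
            return errors, {}
        for i, row in enumerate(reader, start=2):  # line 1 is the header
            try:
                datetime.fromisoformat(row["date"])
            except ValueError:
                errors.append(f"line {i}: unparseable date {row['date']!r}")
            if not -90 <= float(row["latitude"]) <= 90:
                errors.append(f"line {i}: latitude out of range")
            rows.append(row)
    # Metadata skeleton extracted from the data itself.
    meta = {
        "n_observations": len(rows),
        "time_coverage": (min(r["date"] for r in rows),
                          max(r["date"] for r in rows)) if rows else None,
    }
    return errors, meta
```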


2019 ◽  
Vol 11 (4) ◽  
pp. 1883-1903 ◽  
Author(s):  
Melita Keywood ◽  
Paul Selleck ◽  
Fabienne Reisen ◽  
David Cohen ◽  
Scott Chambers ◽  
...  

Abstract. The Sydney Particle Study involved the comprehensive measurement of meteorology, particles and gases at a location in western Sydney during February–March 2011 and April–May 2012. The aim of this study was to increase scientific understanding of particle formation and transformations in the Sydney airshed. In this paper we describe the methods used to collect and analyse particle and gaseous samples, as well as the methods employed for the continuous measurement of particle concentrations, particle microphysical properties, and gaseous concentrations. This paper also provides a description of the data collected and is a metadata record for the data sets published in Keywood et al. (2016a, https://doi.org/10.4225/08/57903B83D6A5D) and Keywood et al. (2016b, https://doi.org/10.4225/08/5791B5528BD63).


Author(s):  
Brenda Daly

The South African National Biodiversity Institute is the custodian of numerous national-level botanical and zoological datasets that have been collated over several decades, and it is mandated to ensure that taxonomic and ecological data are made available to the public through responsible data sharing. This study describes the nature of, and discusses relevant standards for, the case study of the National Vegetation Database; the process adopted in the development of a vegetation-plot database; and the data management practices currently undertaken across the various stages of research data management. Phytosociological data are a record of vegetation abundance, richness, density and the associated environmental variables within a specified area or plot, usually including a record of locality. The study aims to review the diversity of approaches to storing species-plot information in databases and to provide minimum data standards for these datasets. The surveying, classifying, and mapping of vegetation enable the monitoring of ecosystems and can ultimately lead to improved conservation planning and land management. A coordinated and integrated approach is therefore needed to record, rectify, and manage these data and to capture accurate metadata. Preliminary findings indicate that a lack of version control can compromise the authenticity of the data if records are altered or deleted. Data abundance (currently 53 500 plots within 384 sample projects, totalling 1 064 770 species occurrence records) is a challenge because the formats, methodologies, and metadata often differ across these research projects. The curation of plot data requires a standardised approach in the different steps from data acquisition to the provision of results. Species names need to coincide with currently accepted taxonomy, and although certain details are specific to a species-plot project depending on its research interest, various other data should be made consistent in terms of field names and formats to improve the quality of the resulting aggregated set of botanical records. All decisions to modify data records to achieve data consistency should be clearly explained in the metadata record for the dataset.
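
The standardisation step described above might look like the following sketch: source field names are mapped onto a common schema, species names are reconciled against an accepted-taxonomy lookup, and every modification is logged so the dataset's metadata record can explain it. The field mappings and synonym table are illustrative, not the institute's actual schema.

```python
# Sketch: standardise a vegetation-plot row and log every change for the
# dataset's metadata record.
FIELD_MAP = {"sp_name": "scientificName", "lat": "decimalLatitude",
             "lon": "decimalLongitude", "cover_pct": "coverPercent"}

ACCEPTED = {"acacia karroo": "Vachellia karroo"}  # synonym -> accepted name

def standardize(plot_row: dict, change_log: list) -> dict:
    out = {}
    for src, value in plot_row.items():
        field = FIELD_MAP.get(src, src)  # consistent field names
        if field != src:
            change_log.append(f"renamed field {src!r} -> {field!r}")
        out[field] = value
    # Reconcile species names with currently accepted taxonomy.
    name = str(out.get("scientificName", ""))
    accepted = ACCEPTED.get(name.strip().lower())
    if accepted:
        change_log.append(f"updated taxonomy: {name!r} -> {accepted!r}")
        out["scientificName"] = accepted
    return out

log: list[str] = []
row = standardize({"sp_name": "Acacia karroo", "lat": -29.1, "lon": 26.2}, log)
# `log` is later written into the metadata record for the dataset.
```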

