Paper: Metadata Quality in Time of Diverse Research Outputs

Author(s):  
Martyn Rittman
2021, Vol 10 (1), pp. 30
Author(s):  
Alfonso Quarati ◽  
Monica De Martino ◽  
Sergio Rosim

Thanks to the thousands of geo-referenced datasets they host, Open Government Data (OGD) portals are of great interest for any analysis or process relating to the territory. For this potential to be realized, users must be able to access and reuse these datasets. The quality of their metadata is often considered an obstacle to the full dissemination of OGD data. Starting from an experimental investigation of over 160,000 geospatial datasets belonging to six national and international OGD portals, this work first aims to provide an overview of how these portals are used, measured in terms of dataset views and downloads. Furthermore, to assess whether metadata quality influences the use of geospatial datasets, the metadata of each dataset was assessed and the correlation between these two variables was measured. The results showed a significant underutilization of geospatial datasets and generally poor metadata quality. In addition, only a weak correlation was found between use and metadata quality, not strong enough to assert with certainty that the latter is a determining factor of the former.
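The abstract does not state which correlation coefficient was used. As a hedged illustration only, the sketch below computes Spearman's rank correlation between a per-dataset usage measure (such as views) and a per-dataset metadata quality score. The function names, the choice of Spearman rather than Pearson, and the sample values are assumptions for illustration, not the authors' method or data.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(usage: Sequence[float], quality: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks (assumed metric)."""
    if len(usage) != len(quality) or len(usage) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(usage), average_ranks(quality))


if __name__ == "__main__":
    views = [3, 0, 120, 7, 15]            # hypothetical view counts
    quality = [0.4, 0.2, 0.9, 0.6, 0.3]   # hypothetical metadata quality scores
    print(spearman(views, quality))
```

A rank-based coefficient is less sensitive than Pearson's r to a handful of very popular datasets, which is why it is a common choice for heavily skewed usage counts.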


2021, Vol ahead-of-print (ahead-of-print)
Author(s):  
Mark Edward Phillips ◽  
Hannah Tarver

Purpose: This study furthers metadata quality research by providing complementary network-based metrics and insights for analyzing metadata records and identifying areas for improvement.

Design/methodology/approach: Metadata record graphs apply network analysis to metadata field values; this study evaluates the interconnectedness of subjects within each Hub aggregated into the Digital Public Library of America. It also reviews the effects of NACO normalization (simulating the revision of values for consistency) and of breaking up pre-coordinated subject headings (simulating the application of the Faceted Application of Subject Terminology to Library of Congress Subject Headings).

Findings: Network statistics complement count- or value-based metrics by providing context about the number of records a user might actually find by starting from one item and moving to others via shared subject values. Additionally, connectivity increases when values are normalized to correct or adjust for formatting differences, or when pre-coordinated subject strings are broken into separate topics.

Research limitations/implications: This analysis focuses on exact-string matches, the lowest common denominator for searching, although many search engines and digital library indexes may use less stringent matching methods. In terms of practical implications for evaluating or improving subjects in metadata, the normalization components show where resources may be allocated most effectively for these activities (depending on the collection).

Originality/value: Although the individual components of this research are not particularly novel, network analysis has not generally been applied to metadata analysis. This research extends previous studies of metadata quality in aggregations and digital collections in general.
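To make the idea of a metadata record graph concrete, the sketch below treats records as nodes linked whenever they share a subject value, and reports for each record how many records are reachable from it (the size of its connected component). It compares three matching regimes: exact strings, a normalized form, and normalized strings split on the `--` subdivision marker. `simple_normalize` is a hypothetical, much-simplified stand-in for NACO normalization (it only casefolds, tidies whitespace, and drops trailing punctuation), the splitting step only loosely mimics FAST-style faceting, and the records are invented toy data, not DPLA metadata.

```python
import re
from collections import defaultdict
from typing import Callable

# Hypothetical toy data: record id -> subject strings (not from DPLA).
RECORDS: dict[str, list[str]] = {
    "r1": ["Dogs -- Training."],
    "r2": ["dogs--training"],
    "r3": ["Dogs--Behavior"],
    "r4": ["Cats"],
}


def simple_normalize(value: str) -> str:
    """Simplified stand-in for NACO normalization (NOT the official rules)."""
    v = re.sub(r"\s+", " ", value.casefold()).strip()  # casefold, collapse spaces
    v = re.sub(r"\s*--\s*", "--", v)                   # unify spacing around '--'
    return re.sub(r"[.,;:]+(?=--|$)", "", v)           # drop punctuation before '--' or end


def split_precoordinated(value: str) -> list[str]:
    """Break a pre-coordinated heading into separate topics at '--'."""
    return [part.strip() for part in value.split("--") if part.strip()]


def exact_match(s: str) -> list[str]:
    return [s]


def normalized_match(s: str) -> list[str]:
    return [simple_normalize(s)]


def split_match(s: str) -> list[str]:
    return split_precoordinated(simple_normalize(s))


def reachable_counts(
    records: dict[str, list[str]], transform: Callable[[str], list[str]]
) -> dict[str, int]:
    """For each record, count the records reachable via shared subject values
    (its connected component size), using union-find."""
    parent = {rid: rid for rid in records}

    def find(rid: str) -> str:
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]  # path halving
            rid = parent[rid]
        return rid

    first_holder: dict[str, str] = {}  # subject value -> first record seen with it
    for rid, subjects in records.items():
        for subject in subjects:
            for value in transform(subject):
                if value in first_holder:
                    a, b = find(rid), find(first_holder[value])
                    if a != b:
                        parent[a] = b  # merge the two components
                else:
                    first_holder[value] = rid

    sizes: dict[str, int] = defaultdict(int)
    for rid in records:
        sizes[find(rid)] += 1
    return {rid: sizes[find(rid)] for rid in records}


if __name__ == "__main__":
    regimes = [("exact", exact_match), ("normalized", normalized_match), ("split", split_match)]
    for name, fn in regimes:
        counts = reachable_counts(RECORDS, fn)
        print(name, counts, sum(counts.values()) / len(counts))
```

On this toy input, each successive regime connects more records, which mirrors in miniature the kind of connectivity gain the abstract describes for normalization and for splitting pre-coordinated strings.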


Author(s):  
Maria Vardaki ◽  
Haralambos Papageorgiou

Quality was defined in the ISO (International Organization for Standardization) 8402-1986 standard as “the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs,” a definition that changed slightly in later ISO revisions. In statistics, however, “stated or implied needs” are mainly identified through several quality dimensions, criteria, or components for the collection, processing, and dissemination of statistical information to the public (see, for example, Eurostat, 2002a, 2002b; Office of Management and Budget [OMB], 2002; Organization for Economic Cooperation and Development [OECD], 2003; Statistics Canada, 2003; Statistics Finland, 2002).


2020, pp. 016555152096104
Author(s):  
Alfonso Quarati ◽  
Juliana E Raffaghelli

Open research data (ORD) have been considered a driver of scientific transparency. However, data friction, the underutilisation of data for a variety of reasons, has also been pointed out. A factor often blamed for the low usage of ORD is the quality of the data and their associated metadata. This work examines the use of ORD published in the Figshare scientific repository by scientific discipline and resource type, and compares it with the quality of their metadata. Considering all Figshare resources and carrying out a programmatic quality assessment of their metadata, our analysis highlighted two aspects. First, irrespective of the scientific domain considered, most ORD are underused, with exceptional cases that attract most of the researchers’ attention. Second, there was no evidence that the use of ORD is associated with good metadata publishing practices. These two findings prompted a reflection on the potential causes of such data friction.
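The abstract does not detail its programmatic assessment. A minimal sketch, assuming a simple completeness metric (the share of a hypothetical list of fields that are non-empty) and a simple concentration measure (the share of all views captured by the most-viewed fraction of items), might look like the following. The names in `FIELDS` only loosely resemble repository metadata and are assumptions, as is the choice of both metrics.

```python
import math
from typing import Any, Mapping, Sequence

# Hypothetical field list; the study's actual criteria are not given in the abstract.
FIELDS = ("title", "description", "tags", "categories", "license", "authors")


def is_filled(value: Any) -> bool:
    """Treat None, blank strings and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True


def completeness(record: Mapping[str, Any], fields: Sequence[str] = FIELDS) -> float:
    """Fraction of the listed fields that are present and non-empty (assumed metric)."""
    return sum(is_filled(record.get(f)) for f in fields) / len(fields)


def top_share(views: Sequence[int], fraction: float = 0.1) -> float:
    """Share of all views captured by the most-viewed `fraction` of items."""
    total = sum(views)
    if not views or total == 0:
        return 0.0
    k = max(1, math.ceil(fraction * len(views)))
    return sum(sorted(views, reverse=True)[:k]) / total


if __name__ == "__main__":
    record = {  # hypothetical record
        "title": "Survey data",
        "description": "",
        "tags": ["survey"],
        "categories": [],
        "license": "CC BY 4.0",
        "authors": [{"name": "A. Author"}],
    }
    print(completeness(record))  # 4 of the 6 listed fields are filled
    print(top_share([100, 1, 1, 1, 1, 0, 0, 0, 0, 0], 0.1))  # hypothetical views
```

Pairing a per-record score like `completeness` with per-record usage would then allow a correlation test of the kind summarized in the abstract; a high `top_share` corresponds to usage concentrated on a few exceptional items.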

