Making Heterogeneous Specimen Data ‘FAIR’: Implementing a digital specimen repository

Author(s):  
Abraham Nieva de la Hidalga ◽  
Alex Hardisty

The definition of a digital specimen is proposed to encompass the digital representation(s) of physical specimens from natural science collections. The digital specimen concept is intended to define a representation (digital object) that brings together an array of heterogeneous data types, which are themselves alternative physical specimen representations. In this case, the digital specimen (DS) holds references to specimen data from a collection management system, images, 3D models, research articles, DNA sequences, and collector information, among many other data types. The proposal is to create persistent relationships between the DS and other categories of digital objects (e.g. the resource types mentioned above, collections, storage platforms, organisations, databases, and provenance data). Complying with FAIR data principles (findability, accessibility, interoperability, and reuse), i.e., achieving data ‘FAIRness’, eases data integration, which is needed for cross-disciplinary linking and combination of data from different domains, making the DS a comprehensive package of information about a specimen. Implementation of and access to a digital specimen repository (DSR) as a Digital Object Architecture (Sharp 2016) component demonstrates the alignment of the DS concept and FAIR data principles (Wilkinson et al. 2016, Kahn and Wilensky 2006). The DSR fulfills four roles: data producer, resource manager, data publisher, and collaboration space. As data producer, the DSR allows acquisition and curation (indexing, storage) of DSs linking primary data, models, analyses, and other digital object types. As resource manager, the DSR manages access to distributed platforms, ranging from acquisition networks (digitisation stations, museums, herbariums) to processing services, advanced computational resources, data asset storage systems, and specialised servers. As data publisher, the DSR provides access to data assets from national and transnational data archives.
As collaboration space, the DSR supports users in accessing, sharing, and (re)using data assets and derived data products and services. Adopting the collaboration space and data publisher roles, the DSR implements interfaces that expose the DSs to the research community, fulfilling the FAIR findability, accessibility, and reuse principles. Adopting the data producer and resource manager roles, the DSR creates the meaningful and persistent relationships required to link DSs and other types of digital objects, fulfilling the FAIR interoperability principle. A prototype DSR based on the Cordra digital object repository has been deployed (Corporation for National Research Initiatives (CNRI) 2018, Reilly and Tupelo-Schneck 2010). The advantages of Cordra are: rapid deployment, a customisable object model, creation of relations between digital objects, and application program interfaces for programmatic access. Rapid deployment of the DSR provides a tangible target for discussing the implementation of the DS concept. The customisable object model enables the refinement and enhancement of the definition of DS in response to feedback from colleagues who have accessed the DSR and used its contents. Creating relations between digital objects enables flexible linking to digital objects stored in different repositories. Accessing the DSR programmatically through APIs enables extending the use of the repository to different platforms (e.g. mobile devices) as well as integration with other repositories and services. As well as supporting an HTTP-oriented API, Cordra implements the Digital Object Interface Protocol (DONA Foundation 2018), allowing the definition of operations to act directly on selected DSs in the repository. The DSR prototype has been demonstrated through the repository administrative interface and through a custom interface designed to facilitate access by different user groups, such as collection curators, researchers, teachers, and students.
The client interface has been designed to demonstrate a subset of the functionalities derived from user stories, which describe software features from the end-user perspective. Demonstrating the DSR capabilities as proposed will inform the refinement of the design of the DS model and provide early feedback about the needed software features.
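
The idea of a DS as an object holding persistent, typed references to other digital objects can be illustrated with a minimal sketch. This is not the Cordra object model or API, nor the authors' schema: the class names, field names, relation types (`hasImage`, `hasSequence`), and identifiers below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relation:
    """A typed link from a digital specimen to another digital object."""
    relation_type: str  # hypothetical relation name, e.g. "hasImage"
    target_id: str      # identifier of the linked object (placeholder values)


@dataclass
class DigitalSpecimen:
    """Hypothetical digital specimen: an identifier plus typed links."""
    identifier: str
    scientific_name: str
    relations: List[Relation] = field(default_factory=list)

    def link(self, relation_type: str, target_id: str) -> None:
        """Record a relationship to another digital object."""
        self.relations.append(Relation(relation_type, target_id))

    def targets(self, relation_type: str) -> List[str]:
        """Return the identifiers linked under one relation type, in insertion order."""
        return [r.target_id for r in self.relations
                if r.relation_type == relation_type]

    def to_record(self) -> Dict[str, object]:
        """Flatten the specimen into a plain dictionary (e.g. for JSON export)."""
        return {
            "id": self.identifier,
            "scientificName": self.scientific_name,
            "relations": [
                {"type": r.relation_type, "target": r.target_id}
                for r in self.relations
            ],
        }


# Usage with placeholder identifiers.
ds = DigitalSpecimen("ds/001", "Quercus robur")
ds.link("hasImage", "img/1")
ds.link("hasSequence", "seq/9")
ds.link("hasImage", "img/2")
```

Keeping the links as typed references rather than embedding the images or sequences themselves reflects the point made in the abstract: the linked objects can live in different repositories while the DS remains the single package that ties them together.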

2021 ◽  
Vol 3 ◽  
Author(s):  
Robert D. Stevenson ◽  
Todd Suomela ◽  
Heejun Kim ◽  
Yurong He

Data quality (DQ) is a major concern in citizen science (CS) programs and is often raised as an issue among critics of the CS approach. We examined CS programs and reviewed the kinds of data they produce to inform CS communities of strategies for DQ control. From our review of the literature and our experiences with CS, we identified seven primary types of data contributions. Citizens can carry instrument packages, invent or modify algorithms, sort and classify physical objects, sort and classify digital objects, collect physical objects, collect digital objects, and report observations. We found that data types were not constrained by subject domains, that a CS program may use multiple types, and that DQ requirements and evaluation strategies vary according to the data types. These types are useful for identifying structural similarities among programs across subject domains. We conclude that blanket criticism of CS data quality is no longer appropriate. In addition to the details of specific programs and variability among individuals, discussions can fruitfully focus on the data types in a program and the specific methods being used for DQ control as dictated by or appropriate for the type. Programs can reduce doubts about their DQ by becoming more explicit in communicating their data management practices.


2009 ◽  
Vol 4 (1) ◽  
pp. 71-83
Author(s):  
Ronald Jantz

In the future, a scholar or researcher will want to know that a digital object is trusted - that it is authentic and reliable. Digital objects can be surrogates, resulting from a digitization process, or they can be objects whose only form is digital. Much has been accomplished in existing open source digital library platforms to provide capabilities for preserving digital objects, including now ubiquitous features such as persistent identifiers, integrity checks, audit trails, and versioning. However, achieving a level of digital object authenticity will require a multi-dimensional approach involving policies, processes, and continued technological innovation. This paper proposes steps that the institution can take to ensure the availability of authentic digital objects in the future. In this proposal, authenticity is based on definitions from archival diplomatics and relies on methods from public key cryptography for digitally signing an object with a secure time stamp. Trustworthy processes, re-definition of traditional roles, and the implementation of technologies to support authenticity are all required to meet the needs of digital scholarship. Implementation and policy issues are discussed with specific attention to transformations required of the archival institution and the professional archivist.


Author(s):  
Wouter Addink ◽  
Alex Hardisty

In a Biodiversity_Next 2019 symposium, a vision of Digital Specimens based on the concept of a Digital Object Architecture (DOA) (Kahn and Wilensky 2006) was discussed as a new layer between the data infrastructure of natural science collections and user applications for processing and interacting with information about specimens and collections. This vision would enable the transformation of institutional curatorial practices into joint community curation of the scientific data by providing seamless global access to specimens and collections spanning multiple collection-holding institutions and sources. A DOA-based implementation (Lannom et al. 2020) also offers wider, more flexible, and ‘FAIR’ (Findable, Accessible, Interoperable, Reusable) access for varied research and policy uses: recognising curatorial work, annotating with latest taxonomic treatments, understanding variations, working with DNA sequences or chemical analyses, supporting regulatory processes for health, food, security, sustainability and environmental change, inventions/products critical to the bio-economy, and educational uses. To make this vision a reality, a specification is needed that describes what a Digital Specimen is and how to implement it technically. This specification is named 'openDS' for open Digital Specimen. It needs to describe how machines and humans can act on a Digital Specimen and gain attribution for their work; how the data can be serialized and packaged; and it needs to describe the object model (the scientific content part and its structure). The object model should describe how to include the specimen data itself as well as all data derived from the specimen, which is in principle the same as what the Extended Specimen model aims to describe. This part will therefore be developed in close collaboration with people working on that model.
After the Biodiversity_Next symposium, the idea of a standard for Digital Specimens was further discussed and detailed in a MOBILISE Workshop in Warsaw, 2020, with stakeholders such as GBIF, iDigBio, CETAF and DiSSCo. The workshop examined the technical basis of the new specification, agreed on its scope and structure, and laid groundwork for future activities in the Research Data Alliance (RDA), Biodiversity Information Standards (TDWG), and technical workshops. A working group in the DiSSCo Prepare project has begun work on the technical specification of the ‘open Digital Specimen’ (openDS). This specification will provide the definition of what a Digital Specimen is, its logical structure and content, and the operations permitted on it. The group is also working on a document with frequently asked questions. Realising the vision of Digital Specimen on a global level requires openDS to become a new TDWG standard and to be aligned with the vision for Extended Specimens. A TDWG Birds-of-a-Feather working session in September 2020 discusses and plans this further. The object model will include concepts from ABCD 3.0 and the EFG extension for geo-sciences, and will also extend from bco:MaterialSample in the OBO Foundry’s Biological Collection Ontology (BCO), which is linked to Darwin Core, and from iao:InformationContentEntity in the OBO Foundry's Information Artifact Ontology (IAO). openDS will also make use of the RDA/TDWG attribution metadata recommendation and other RDA recommendations. A publication is in preparation that describes the relationship with RDA recommendations in more detail, which will also be presented in the TDWG symposium.


2001 ◽  
Vol 30 (1) ◽  
Author(s):  
Stephen Chapman

The human-perceived content of any digital object is an artifact of systems, not an amalgamation of fixed physical attributes. Many of these objects are designed to have multiple renderings to optimize content for specific tasks, such as searching, sorting, analyzing, or reading. Using digital technology either to reproduce historic materials or to preserve original digital objects requires a definition of information integrity that accommodates function as well as appearance. The choice of encoding is critical to enable specified functionality, but the choice of repository is likely to be


2020 ◽  
Vol 21 (24) ◽  
pp. 9461
Author(s):  
Aurora Savino ◽  
Paolo Provero ◽  
Valeria Poli

Biological systems respond to perturbations through the rewiring of molecular interactions, organised in gene regulatory networks (GRNs). Among these, the increasingly high availability of transcriptomic data makes gene co-expression networks the most exploited ones. Differential co-expression networks are useful tools to identify changes in response to an external perturbation, such as mutations that predispose to cancer development and lead to changes in the activity of gene expression regulators or signalling. They can help explain the robustness of cancer cells to perturbations and identify promising candidates for targeted therapy, while providing higher specificity than standard co-expression methods. Here, we comprehensively review the literature about the methods developed to assess differential co-expression and their applications to cancer biology. Via the comparison of normal and diseased conditions and of different tumour stages, studies based on these methods led to the definition of pathways involved in gene network reorganisation upon oncogene mutation and tumour progression, often converging on immune system signalling. A relevant implementation still lagging behind is the integration of different data types, which would greatly improve network interpretability. Most importantly, evaluating the performance and predictivity of the large variety of mathematical models proposed urgently requires experimental validation and systematic comparison. We believe that future work on differential gene co-expression networks, complemented with additional omics data and experimentally tested, will considerably improve our insights into the biology of tumours.
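
To make the notion of differential co-expression concrete, the sketch below uses one of the simplest possible formulations: compute the Pearson correlation of every gene pair within each condition and take the difference between conditions. This is only an illustration, not a method attributed to the review; the function names, gene labels, and toy expression values are made up, and real analyses add statistical testing and multiple-testing correction.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length samples (non-zero variance assumed)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def differential_coexpression(
    cond_a: Dict[str, List[float]],
    cond_b: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """For each gene pair present in both conditions, return r_B - r_A."""
    genes = sorted(set(cond_a) & set(cond_b))
    return {
        (g1, g2): pearson(cond_b[g1], cond_b[g2]) - pearson(cond_a[g1], cond_a[g2])
        for g1, g2 in combinations(genes, 2)
    }


# Toy data: expression of three hypothetical genes across four samples.
normal = {"G1": [1, 2, 3, 4], "G2": [2, 4, 6, 8], "G3": [4, 3, 2, 1]}
tumour = {"G1": [1, 2, 3, 4], "G2": [8, 6, 4, 2], "G3": [4, 3, 2, 1]}

delta = differential_coexpression(normal, tumour)
```

In this toy example the pair (G1, G3) is equally correlated in both conditions, so its difference is zero, whereas the pairs involving G2 flip sign between conditions; the latter are the kind of rewired edges a differential network would retain.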


2008 ◽  
Vol 191 (1) ◽  
pp. 91-99 ◽  
Author(s):  
Marc Deloger ◽  
Meriem El Karoui ◽  
Marie-Agnès Petit

ABSTRACT The fundamental unit of biological diversity is the species. However, a remarkable extent of intraspecies diversity in bacteria was discovered by genome sequencing, and it reveals the need to develop clear criteria to group strains within a species. Two main types of analyses used to quantify intraspecies variation at the genome level are the average nucleotide identity (ANI), which detects the DNA conservation of the core genome, and the DNA content, which calculates the proportion of DNA shared by two genomes. Both estimates are based on BLAST alignments for the definition of DNA sequences common to the genome pair. Interestingly, however, results using these methods on intraspecies pairs are not well correlated. This prompted us to develop a genomic-distance index taking into account both criteria of diversity, which are based on DNA maximal unique matches (MUM) shared by two genomes. The values, called MUMi, for MUM index, correlate better with the ANI than with the DNA content. Moreover, the MUMi groups strains in a way that is congruent with routinely used multilocus sequence-typing trees, as well as with ANI-based trees. We used the MUMi to determine the relatedness of all available genome pairs at the species and genus levels. Our analysis reveals a certain consistency in the current notion of bacterial species, in that the bulk of intraspecies and intragenus values are clearly separable. It also confirms that some species are much more diverse than most. As the MUMi is fast to calculate, it offers the possibility of measuring genome distances on the whole database of available genomes.
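
The abstract does not give the formula for the index. The sketch below assumes the commonly quoted definition MUMi = 1 - L_mum / L_av, where L_mum is the total length of the non-overlapping maximal unique matches shared by the two genomes and L_av is the average of the two genome lengths; that definition is an assumption here and should be checked against the full paper. Finding the MUMs themselves (done with a dedicated alignment tool) is outside the scope of the sketch, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def mumi(mum_lengths: Sequence[int], genome_len_a: int, genome_len_b: int) -> float:
    """Genomic distance under the assumed definition MUMi = 1 - L_mum / L_av.

    mum_lengths: lengths of the non-overlapping maximal unique matches
                 shared by the two genomes (computed elsewhere).
    genome_len_a, genome_len_b: lengths of the two genomes in bases.
    """
    if genome_len_a <= 0 or genome_len_b <= 0:
        raise ValueError("genome lengths must be positive")
    l_mum = sum(mum_lengths)                   # total bases covered by MUMs
    l_av = (genome_len_a + genome_len_b) / 2   # average genome length
    return 1.0 - l_mum / l_av


# Toy numbers: two 1000-base genomes sharing MUMs of 300 and 200 bases.
distance = mumi([300, 200], 1000, 1000)
```

Under this definition the value approaches 0 for nearly identical genomes and 1 for genomes that share no unique matches, which is why it can serve as a distance.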


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Gianluca Solazzo ◽  
Ylenia Maruccia ◽  
Gianluca Lorenzo ◽  
Valentina Ndou ◽  
Pasquale Del Vecchio ◽  
...  

Purpose: This paper aims to highlight how big social data (BSD) and analytics exploitation may help destination management organisations (DMOs) to understand tourist behaviours and destination experiences and images. Gathering data from two different sources, Flickr and Twitter, textual and visual contents are used to perform different analytics tasks to generate insights on tourist behaviour and the affective aspects of the destination image. Design/methodology/approach: This work adopts a multimodal approach to BSD and analytics, considering multiple BSD sources and different analytics techniques on heterogeneous data types, to obtain complementary results on the Salento region (Italy) case study. Findings: Results show that the generated insights allow DMOs to discover unknown clusters of points of interest, identify trends and seasonal patterns of tourist demand, monitor topics and sentiment, and identify attractive places. DMOs can exploit these insights to address their needs in terms of decision support for the management and development of the destination, the enhancement of destination attractiveness, the shaping of new marketing and communication strategies, and the planning of tourist demand within the destination. Originality/value: The originality of this work is in the use of BSD and analytics techniques to give DMOs specific insights on a destination in a deep and wide fashion. Collected data are used with a multimodal analytic approach to build tourist characteristics, images, attitudes and preferred destination attributes, which represent for DMOs a unique means for problem-solving, decision-making, innovation and prediction.


2021 ◽  
Vol 2 (2) ◽  
pp. 95-103
Author(s):  
Trie Nadia Ayu Lizara ◽  
Timbul Simangunsong

This study examines the influence of the implementation of the annual corporate e-SPT, understanding of taxation, and taxpayer awareness on the compliance of corporate taxpayers in filing their annual returns. The study uses quantitative data from a primary source. Primary data were obtained from questionnaires distributed to corporate taxpayers at random, using the purposive sampling method, at the West Jakarta Middle Tax Office. A total of 100 questionnaires were distributed. The results indicate that the application of the annual corporate e-SPT has a significant effect on taxpayer compliance and taxpayer awareness, while understanding of taxation has no significant effect on taxpayer compliance.

