Are the FAIR Data Principles fair?

2017 ◽  
Vol 12 (2) ◽  
pp. 177-195 ◽  
Author(s):  
Alastair Dunning ◽  
Madeleine De Smaele ◽  
Jasmin Böhmer

This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles, and simultaneously to analyse how easy it is for data archives to adhere to the principles. The research took place from November 2016 to January 2017 and is underpinned by feedback from the repositories. The FAIR Data Principles comprise 15 facets corresponding to the four letters of FAIR: Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world. The European Commission has recently expanded its demand for research to produce open data, and the relevant guidelines (1) are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will be exposed to the guidelines, it is vitally important to understand their viability and to suggest where there may be room for modification and adjustment. This practice paper is connected to a dataset (Dunning et al., 2017) containing the original overview of the sample group statistics and graphs in an Excel spreadsheet. Over the course of two months, the web interfaces, help pages and metadata records of over 40 data repositories were examined in order to score each data repository against the FAIR principles and facets. A traffic-light rating system colour-codes each facet according to compliance and vagueness. The statistical analysis provides overall results, results by repository category, and results focussing on the individual principles and facets. The analysis comprises a statistical and descriptive evaluation, followed by elaborations on elements of the FAIR Data Principles, subject-specific and repository-specific differences, and what repositories can do to improve their information architecture.

(1) H2020 Guidelines on FAIR Data Management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
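The traffic-light rating described above can be sketched as a simple scoring scheme. This is an illustrative sketch only: the facet identifiers follow the FAIR lettering, but the point values and the example repository assessment are hypothetical, not the paper's actual spreadsheet.

```python
# Illustrative traffic-light scoring of FAIR facets. The point values
# (green=2, amber=1, red=0) and the example assessment are hypothetical.
RATINGS = {"green": 2, "amber": 1, "red": 0}  # compliant / vague / non-compliant

def score_repository(assessment):
    """Return (total, maximum) points for one repository's facet ratings."""
    total = sum(RATINGS[rating] for rating in assessment.values())
    return total, 2 * len(assessment)

# Hypothetical assessment of one repository against four facets
example = {"F1": "green", "A1": "green", "I1": "amber", "R1": "red"}
total, maximum = score_repository(example)
print(f"{total}/{maximum} points")  # 5/8 points
```

Summing such per-facet scores across the 40-plus repositories would yield the kind of overall and per-principle statistics the paper reports.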

2021 ◽  
Vol 16 (1) ◽  
pp. 21
Author(s):  
Chung-Yi Hou ◽  
Matthew S. Mayernik

For research data repositories, web interfaces are usually the primary, if not the only, method that data users have to interact with repository systems. Data users often search, discover, understand, access, and sometimes use data directly through repository web interfaces. Given that sub-par user interfaces can reduce the ability of users to locate, obtain, and use data, it is important to consider how repositories’ web interfaces can be evaluated and improved in order to ensure useful and successful user interactions. This paper discusses how usability assessment techniques are being applied to improve the functioning of data repository interfaces at the National Center for Atmospheric Research (NCAR). At NCAR, a new suite of data system tools is being developed, collectively called the NCAR Digital Asset Services Hub (DASH). Usability evaluation techniques have been used throughout the NCAR DASH design and implementation cycles in order to ensure that the systems work well together for the intended user base. By applying user studies, paper prototyping, competitive analysis, journey mapping, and heuristic evaluation, the NCAR DASH Search and Repository experiences provide examples of how data systems can benefit from usability principles and techniques. Integrating usability principles and techniques into repository system design and implementation workflows helps to optimize the systems’ overall user experience.


2020 ◽  
Vol 2 (1-2) ◽  
pp. 192-198 ◽  
Author(s):  
Mark Hahnel ◽  
Dan Valen

Data repository infrastructures for academics have appeared in waves since the dawn of Web technology. These waves are driven by changes in societal needs, archiving needs and the development of cloud computing resources. As such, the data repository landscape has many flavors when it comes to sustainability models, target audiences and feature sets. One thing that links all data repositories is a desire to make the content they host reusable, building on the core principle of cataloging content for economic efficiency and research speed. The FAIR principles are a common goal for all repository infrastructures to aim for. No matter the discipline or infrastructure, the goal of reusable content, for both humans and machines, is a common one. This is the first time that repositories can work toward a common goal that ultimately lends itself to interoperability. The idea that research can move further and faster as we un-silo these fantastic resources is an achievable one. This paper investigates the steps that existing repositories need to take in order to remain useful and relevant in a FAIR research world.


2013 ◽  
Vol 74 (2) ◽  
pp. 195-207 ◽  
Author(s):  
Jingfeng Xia ◽  
Ying Liu

This paper uses the Gene Expression Omnibus (GEO), a data repository in the biomedical sciences, to examine the usage patterns of open data repositories. It attempts to identify the degree of recognition of data reuse value and understand how e-science has impacted large-scale scholarship. By analyzing a list of 1,211 publications that cite GEO data to support their independent studies, it discovers that free data can support a wealth of high-quality investigations, that the rate of open data use has kept growing over the years, and that scholars in different countries show different rates of compliance with data-sharing policies.


Author(s):  
Ingrid Dillo ◽  
Lisa De Leeuw

Open data and data management policies that call for the long-term storage and accessibility of data are becoming more and more commonplace in the research community. With this, the need for trustworthy data repositories to store and disseminate data is growing. CoreTrustSeal, a community-based non-profit organisation, offers data repositories a core-level certification based on the DSA-WDS Core Trustworthy Data Repositories Requirements catalogue and procedures. This universal catalogue of requirements reflects the core characteristics of trustworthy data repositories. Core certification involves an uncomplicated process whereby data repositories supply evidence that they are sustainable and trustworthy. A repository first conducts an internal self-assessment, which is then reviewed by community peers. Once the self-assessment is found adequate, the CoreTrustSeal board certifies the repository with a CoreTrustSeal, valid for a period of three years. Being a certified repository has several external and internal benefits. For instance, it improves the quality and transparency of internal processes, increases awareness of and compliance with established standards, builds stakeholder confidence, enhances the reputation of the repository, and demonstrates that the repository is following good practices. It also offers a benchmark for comparison and helps determine the strengths and weaknesses of a repository. In the future we foresee larger uptake across domains, not least because within the European Open Science Cloud the FAIR principles, and therefore the certification of trustworthy digital repositories holding data, are becoming increasingly important. In addition, the CoreTrustSeal requirements will most probably become a European technical standard that can be used in procurement (currently under review by the European Commission).


Metabolomics ◽  
2019 ◽  
Vol 15 (10) ◽  
Author(s):  
Kevin M. Mendez ◽  
Leighton Pritchard ◽  
Stacey N. Reinke ◽  
David I. Broadhurst

Background: A lack of transparency and reporting standards in the scientific community has led to increasing and widespread concerns about the reproducibility and integrity of results. As an omics science, which generates vast amounts of data and relies heavily on data science for deriving biological meaning, metabolomics is highly vulnerable to irreproducibility. The metabolomics community has made substantial efforts to align with FAIR data standards by promoting open data formats, data repositories, online spectral libraries, and metabolite databases. Open data analysis platforms also exist; however, they tend to be inflexible and rely on the user to adequately report their methods and results. To enable FAIR data science in metabolomics, methods and results need to be transparently disseminated in a manner that is rapid, reusable, and fully integrated with the published work. To ensure broad use within the community, such a framework also needs to be inclusive and intuitive for computational novices and experts alike.

Aim of Review: To encourage metabolomics researchers from all backgrounds to take control of their own data science, mould it to their personal requirements, and enthusiastically share resources through open science.

Key Scientific Concepts of Review: This tutorial introduces the concept of interactive web-based computational laboratory notebooks. The reader is guided through a set of experiential tutorials specifically targeted at metabolomics researchers, based around the Jupyter Notebook web application, the GitHub data repository, and the Binder cloud computing platform.
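The notebook-based workflow this tutorial advocates can be illustrated with a minimal, self-contained analysis cell of the kind one would share via Jupyter and Binder. The metabolite names and intensity values below are hypothetical; a real notebook would load them from an open data format (e.g. a CSV or mzTab file) hosted in a data repository.

```python
import math

# Minimal notebook-style cell: log2 fold changes between two groups of
# metabolite intensities. All data here are hypothetical placeholders.
def log2_fold_change(case, control):
    """mean(case) / mean(control) on a log2 scale."""
    mean_case = sum(case) / len(case)
    mean_control = sum(control) / len(control)
    return math.log2(mean_case / mean_control)

intensities = {
    "glucose": ([10.0, 12.0, 11.0], [5.0, 6.0, 5.5]),
    "lactate": ([4.0, 4.2, 3.8], [8.0, 7.8, 8.2]),
}

for metabolite, (case, control) in intensities.items():
    print(metabolite, round(log2_fold_change(case, control), 2))
```

Because the code, data reference, and narrative live in one executable document, the analysis can be re-run and inspected by any reader, which is the reproducibility point the review makes.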


2018 ◽  
Author(s):  
Kristian Peters ◽  
James Bradbury ◽  
Sven Bergmann ◽  
Marco Capuccini ◽  
Marta Cascante ◽  
...  

Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism’s metabolism. The research field is dynamic and expanding, with applications across biomedical, biotechnological and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution.

Findings: The PhenoMeNal (Phenome and Metabolome aNalysis) e-infrastructure provides a complete, workflow-oriented, interoperable metabolomics data analysis solution for a modern infrastructure-as-a-service (IaaS) cloud platform. PhenoMeNal seamlessly integrates a wide array of existing open source tools, which are tested and packaged as Docker containers through the project’s continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm.

Conclusions: PhenoMeNal constitutes a keystone solution among cloud infrastructures available for metabolomics. It provides scientists with a ready-to-use, workflow-driven, reproducible and shareable data analysis platform, harmonizing software installation and configuration through user-friendly web interfaces. The deployed cloud environments can be dynamically scaled to enable large-scale analyses, which are interfaced through standard data formats, versioned, and tested for reproducibility and interoperability. The flexible implementation of PhenoMeNal allows easy adaptation of the infrastructure to other application areas and ‘omics research domains.


2020 ◽  
Author(s):  
Geoff Boeing

Cities worldwide exhibit a variety of street network patterns and configurations that shape human mobility, equity, health, and livelihoods. This study models and analyzes the street networks of every urban area in the world, using boundaries derived from the Global Human Settlement Layer. Street network data are acquired and modeled from OpenStreetMap with the open-source OSMnx software. In total, this study models over 160 million OpenStreetMap street network nodes and over 320 million edges across 8,914 urban areas in 178 countries, and attaches elevation and grade data. This article presents the study's reproducible computational workflow, introduces two new open data repositories of ready-to-use global street network models and calculated indicators, and discusses summary findings on street network form worldwide. It makes four contributions. First, it reports the methodological advances of this open-source workflow. Second, it produces an open data repository containing street network models for each urban area. Third, it analyzes these models to produce an open data repository containing street network form indicators for each urban area. No such global urban street network indicator dataset has previously existed. Fourth, it presents a summary analysis of urban street network form, reporting the first such worldwide results in the literature.
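Two of the kinds of form indicators such a dataset contains can be sketched on a toy graph. The study itself uses OSMnx on OpenStreetMap data; the hand-built graph, coordinates, and edge lengths below are purely illustrative.

```python
import math

# Toy undirected street network: nodes are id -> (x, y) coordinates,
# edges are (u, v, street_length). All values are illustrative.
nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0)}
edges = [(1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 1, 1.0), (1, 3, 1.6)]

def average_degree(nodes, edges):
    """Average number of edges incident to each node."""
    degree = {n: 0 for n in nodes}
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    return sum(degree.values()) / len(degree)

def circuity(nodes, edges):
    """Total street length divided by total straight-line distance.

    Values above 1 indicate how much the streets deviate from straight
    connections between their endpoints."""
    street = sum(length for _, _, length in edges)
    straight = sum(math.dist(nodes[u], nodes[v]) for u, v, _ in edges)
    return street / straight

print(average_degree(nodes, edges))          # 2.5
print(round(circuity(nodes, edges), 3))      # 1.034
```

At the study's scale the same quantities are computed per urban area over millions of nodes and edges, which is what makes the precomputed indicator repository valuable.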


Author(s):  
Kazuhiro Ohnishi ◽  
Fangyan Dong ◽  
Kaoru Hirota

A method for understanding the atmosphere is proposed for human-robot interactions in a multi-agent society. Each agent's individual assessment of the atmosphere is estimated using a Support Vector Regression (SVR) method over the emotions of all agents, and the atmosphere of the entire society is represented as a fuzzy set in a Fuzzy Atmosfield. This method provides the information that allows each agent (human or robot) in the society to understand the differences between the objective characteristics of the atmosphere and its own subjective assessment, and to make appropriate behavioral decisions thereafter. In the experiments, 13 scenarios are tested with four humans. The characteristics of the atmosphere are calculated by applying the proposed method to the emotion data from the four humans. The results are compared with the subjective atmosphere information from the four humans, and the average accuracy reaches 90%. This proposal is intended to realize customized services for human-robot interactions in a “Multi-Agent Fuzzy Atmosfield,” the subject of the authors’ group’s ongoing research project.
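The general idea of representing a shared atmosphere as a fuzzy set can be sketched as follows. This is only an illustrative sketch: the atmosphere labels, the triangular membership functions, and the agent scores are hypothetical, and the paper's actual Fuzzy Atmosfield construction and SVR model are not reproduced here.

```python
# Illustrative only: aggregate individual agents' atmosphere assessments
# (scores in [0, 1]) into fuzzy membership degrees for a few hypothetical
# atmosphere labels. Not the paper's actual Fuzzy Atmosfield model.
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

LABELS = {
    "calm":     (0.0, 0.0, 0.5),
    "friendly": (0.2, 0.5, 0.8),
    "lively":   (0.5, 1.0, 1.0),
}

def atmosphere(agent_scores):
    """Membership degree of each label at the mean agent assessment."""
    mean = sum(agent_scores) / len(agent_scores)
    return {label: round(triangular(mean, *abc), 2) for label, abc in LABELS.items()}

print(atmosphere([0.6, 0.7, 0.5, 0.6]))
```

An agent comparing these society-level membership degrees with its own individual assessment could then detect where its subjective impression diverges from the group, which is the decision input the method aims to provide.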


2014 ◽  
Vol 3 (3) ◽  
pp. 6
Author(s):  
Sorab Sadri

This paper is based on sixteen years of intensive examination and research into three industrial sectors (manufacturing, process and technology) of Western India, and all observations contained herein are born out of and relate directly to those sectors. The premise upon which we stand is that if business ethics and corporate governance co-exist, then with proper HR interventions a value-centred corporate culture will very likely emerge and the journey towards achieving organisational excellence becomes that much easier. In the postgraduate textbook entitled Organisational Excellence through Business Ethics and Corporate Governance, the authors began by defining ethics and stating that ethics is the precondition for generating value-centred corporate cultures. They then delved deep into what ethics entails and how it impacts the organisation as well as the individual within it. Thereafter they went into the concept of corporate governance, defined it, viewed how it developed, and examined how it was practised overseas and then how it came to India. In this paper the authors attempt to show how good governance is based on ethics, and how those who head the functions of people management, company secretaryship, and accountancy (cost and chartered) can gainfully use it to realise the larger interest of the organisations they belong to. To that extent, this paper is just what the title suggests: an opinion on the subject based on ongoing research.


2019 ◽  
Vol 2019 ◽  
pp. 1-15 ◽  
Author(s):  
C. Pommier ◽  
C. Michotey ◽  
G. Cornut ◽  
P. Roumet ◽  
E. Duchêne ◽  
...  

GnpIS is a data repository for plant phenomics that stores whole field and greenhouse experimental data, including environmental measures. It allows long-term access to datasets following the FAIR principles (Findable, Accessible, Interoperable, and Reusable) by using a flexible and original approach: a generic, ontology-driven data model and an innovative software architecture that uncouples data integration, storage, and querying. It takes advantage of international standards including the Crop Ontology, MIAPPE, and the Breeding API. GnpIS can handle data for a wide range of species and experiment types, including multiannual perennial-plant experimental networks and annual plant trials, with either raw data (direct measures) or computed traits. It also ensures integration and interoperability among phenotyping datasets and with genotyping data. This is achieved through careful curation and annotation of the key resources, conducted in close collaboration with the communities providing the data. Our repository follows Open Science data publication principles by ensuring the citability of each dataset. Finally, GnpIS's compliance with international standards enables its interoperability with other data repositories, allowing data links between phenotype and other data types. GnpIS can therefore contribute to emerging international federations of information systems.
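The interoperability the Breeding API (BrAPI) provides can be sketched from the consumer side. BrAPI responses wrap their payload in a "result" object containing a "data" list; the sample study records below are hypothetical, and a real client would fetch this JSON over HTTP from an endpoint such as /brapi/v2/studies rather than from an inline string.

```python
import json

# Hypothetical BrAPI-style response; real clients would GET this from a
# repository's /brapi/v2/studies endpoint.
sample_response = json.loads("""
{
  "result": {
    "data": [
      {"studyDbId": "101", "studyName": "Wheat drought trial", "commonCropName": "wheat"},
      {"studyDbId": "102", "studyName": "Grapevine phenology", "commonCropName": "grape"}
    ]
  }
}
""")

def study_names(response, crop=None):
    """List study names from a BrAPI-style response, optionally filtered by crop."""
    studies = response["result"]["data"]
    return [s["studyName"] for s in studies
            if crop is None or s["commonCropName"] == crop]

print(study_names(sample_response))                 # both studies
print(study_names(sample_response, crop="wheat"))   # ['Wheat drought trial']
```

Because every BrAPI-compliant repository exposes the same envelope and field names, the same client code works against GnpIS or any other compliant system, which is what enables the federations of information systems mentioned above.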

