Alignment of biomedical data repositories with open, FAIR, citable and trustworthy principles

2021
Author(s):
Fiona Murphy
Michael Bar-Sinai
Maryann E. Martone

Abstract: Increasing attention is being paid to the operation of biomedical data repositories in light of efforts to improve how scientific data is handled and made available for the long term. Simultaneously, groups around the world have been coming together to formalize principles that govern different aspects of open science and data sharing. The most well-known are the FAIR data principles. These are joined by principles and practices that govern openness, citation, credit and good stewardship (trustworthiness). Together, these define a framework for data repositories to support Open, FAIR, Citable and Trustworthy (OFCT) data. Here we developed an instrument using the open-source PolicyModels toolkit that attempts to operationalize key aspects of the OFCT principles, and applied the instrument to eight biomedical community repositories listed by the NIDDK Information Network (dkNET.org). The evaluation was performed through inspection of documentation and interaction with the sites. Overall, there was little explicit acknowledgement of any of the OFCT principles, although the majority of repositories provided at least some support for their tenets.
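The study's actual instrument was built with the PolicyModels toolkit; as a rough illustration of what operationalizing OFCT principles as a checklist-style assessment can look like, here is a minimal Python sketch. The criteria names, groupings and scoring below are our own invented example, not the instrument used in the paper.

```python
# Hypothetical sketch: scoring a repository against OFCT criteria.
# The tenet names and equal weighting are illustrative assumptions,
# not the actual PolicyModels decision graph from the study.

OFCT_CRITERIA = {
    "open":        ["public_access", "open_license"],
    "fair":        ["persistent_ids", "rich_metadata", "standard_vocabularies"],
    "citable":     ["doi_per_dataset", "citation_guidance"],
    "trustworthy": ["preservation_plan", "governance_statement"],
}

def score_repository(answers: dict[str, bool]) -> dict[str, float]:
    """Return the fraction of satisfied tenets per OFCT principle."""
    scores = {}
    for principle, tenets in OFCT_CRITERIA.items():
        met = sum(answers.get(t, False) for t in tenets)
        scores[principle] = met / len(tenets)
    return scores

# Example: a repository with open access and per-dataset DOIs,
# but no documented preservation plan.
print(score_repository({"public_access": True, "doi_per_dataset": True}))
```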

2019
Vol 21 (6)
pp. 1937-1953
Author(s):
Jussi Paananen
Vittorio Fortino

Abstract: The drug discovery process starts with the identification of a disease-modifying target. This critical step traditionally begins with manual investigation of the scientific literature and biomedical databases to gather evidence linking a molecular target to a disease, and to evaluate the efficacy, safety and commercial potential of the target. The high throughput and affordability of current omics technologies, which allow quantitative measurements of many putative targets (e.g. DNA, RNA, protein, metabolite), have exponentially increased the volume of scientific data available for this arduous task. Therefore, computational platforms that identify and rank disease-relevant targets from existing biomedical data sources, including omics databases, are needed. To date, more than 30 drug target discovery (DTD) platforms exist. They provide information-rich databases and graphical user interfaces to help scientists identify putative targets and pre-evaluate their therapeutic efficacy and potential side effects. Here we survey and compare a set of popular DTD platforms that utilize multiple data sources and omics-driven knowledge bases (either directly or indirectly) for identifying drug targets. We also provide a description of omics technologies and related data repositories which are important for DTD tasks.
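As a rough illustration of the kind of evidence aggregation such platforms perform, the sketch below ranks putative targets by weighted evidence scores pooled across omics sources. The gene names, source labels and weights are invented for the example and do not come from any of the surveyed platforms.

```python
# Illustrative only: rank hypothetical targets by summing per-source
# evidence scores, each weighted by how much we trust that source.

from collections import defaultdict

evidence = [
    ("GENE_A", "transcriptomics", 0.8),
    ("GENE_A", "genetics",        0.6),
    ("GENE_B", "proteomics",      0.9),
]
weights = {"genetics": 1.0, "transcriptomics": 0.7, "proteomics": 0.8}

scores: dict[str, float] = defaultdict(float)
for target, source, score in evidence:
    scores[target] += weights[source] * score

# Highest aggregate evidence first.
for target, total in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{target}: {total:.2f}")
```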


2020
Author(s):
Danny Brooke

For more than a decade, the Dataverse Project (dataverse.org) has provided an open-source platform used to build data repositories around the world. Core to its success is its hybrid development approach, which pairs a core team based at the Institute for Quantitative Social Science at Harvard University with an empowered, worldwide community contributing code, documentation, and other efforts towards open science. In addition to an overview of the platform and how to join the community, we'll discuss recent and future efforts towards large data support, geospatial data integrations, sensitive data support, integrations with reproducibility tools, access to computation resources, and many other useful features for researchers, journals, and institutions.
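For a sense of how researchers and integrators interact with a Dataverse installation programmatically, here is a minimal sketch against the public Search API described in the Dataverse guides. The server URL and query string are placeholders; an API token is only needed for non-public content, and field names should be checked against the installation's API version.

```python
# A minimal sketch, assuming the standard Dataverse Search API:
# GET <server>/api/search returns JSON with matching items.

import requests

SERVER = "https://demo.dataverse.org"  # placeholder; any Dataverse installation

resp = requests.get(
    f"{SERVER}/api/search",
    params={"q": "open science", "type": "dataset"},
    timeout=30,
)
resp.raise_for_status()

# Print persistent identifier and title of each matching dataset.
for item in resp.json()["data"]["items"]:
    print(item.get("global_id"), "-", item.get("name"))
```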


2021
Vol 29 (4)
pp. 209-217
Author(s):
Anton Boiko
Olha Kramarenko
Sardar Shabanov

Purpose: To determine the current state of development of open science within the paradigm of open research data in Ukraine and the world, and to analyze Ukraine's representation in the global research space in terms of research data exchange.

Design / Method / Research Approach: Methods of synthesis and of logical and comparative analysis were used to determine the dynamics of the number of research data journals and data files in the world, and to quantify the share of research data repositories in Ukraine and worldwide. Trend and bibliometric analysis were used to determine the share of publications with open primary data, analyze their thematic structures, identify the main scientific clusters of such publications, and examine geographic indicators and the share of publications by research institution.

Findings: The study found a tendency toward growth in both the number of data journals and the number of data files in Dryad (an open data repository). Analysis of the share of data repositories indexed in re3data (the registry of research data repositories) shows that 51% are repositories from European countries, with Germany leading at 460 repositories, followed by the United Kingdom (302 repositories) and France (116 repositories). Ukraine has only 2 data repositories indexed in re3data. The relevance of data sharing is confirmed by a fivefold increase in publications with datasets over the last 10 years (2011-2020). Research institutions and universities are the main sources of research data, concentrated mainly in chemistry (23.3%); biochemistry, genetics and molecular biology (13.8%); and medicine (12.9%). An analysis of the latest thematic groups formed from publications with datasets shows a significant correlation between publications with open primary data and COVID-19 studies. More than 50% of publications with datasets, both in Ukraine and worldwide, are aimed at achieving SDG 3 (Good Health and Well-Being).

Theoretical Implications: It is substantiated that Ukraine needs to implement specific tactical and strategic plans for open science and open access to research data.

Practical Implications: The results of the study can be used to support decision-making in the management of research data at the macro and micro levels.

Future Research: The present bibliometric analysis of the dissemination of data underlying research results did not assess quality indicators or compliance with the FAIR principles, even though accessibility and reusability are fundamental components of open science; this may be an area for further research. It would also be worthwhile to investigate how disclosure of the data underlying research results affects economic indicators as well as higher education rankings.

Research Limitations: Since the information base of the analysis was publications with datasets in Scopus-indexed journals, publications with datasets published in venues not covered by the Scopus bibliographic database were not included.

Paper type: Theoretical
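As a hedged sketch of how repository counts like those above can be reproduced, the snippet below queries the public re3data API. The endpoint and XML element names follow the API as publicly documented at the time of writing and should be verified against the current schema; per-country breakdowns require the per-repository detail endpoint.

```python
# A hedged sketch, assuming the documented re3data list endpoint
# (https://www.re3data.org/api/v1/repositories), which returns XML.

import requests
import xml.etree.ElementTree as ET

resp = requests.get("https://www.re3data.org/api/v1/repositories", timeout=60)
resp.raise_for_status()

root = ET.fromstring(resp.content)
repos = root.findall(".//repository")  # one element per indexed repository
print(f"{len(repos)} repositories currently indexed in re3data")

# Country totals (e.g. Germany vs. Ukraine) require fetching each
# repository's detail record at /api/v1/repository/<id>.
```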


Author(s):  
Ingrid Dillo
Lisa De Leeuw

Open data and data management policies that call for the long-term storage and accessibility of data are becoming more and more commonplace in the research community. With them, the need for trustworthy data repositories to store and disseminate data is growing. CoreTrustSeal, a community-based, non-profit organisation, offers data repositories a core-level certification based on the DSA-WDS Core Trustworthy Data Repositories Requirements catalogue and procedures. This universal catalogue of requirements reflects the core characteristics of trustworthy data repositories. Core certification involves an uncomplicated process whereby data repositories supply evidence that they are sustainable and trustworthy. A repository first conducts an internal self-assessment, which is then reviewed by community peers. Once the self-assessment is found adequate, the CoreTrustSeal board certifies the repository with a CoreTrustSeal, valid for a period of three years. Being a certified repository has several external and internal benefits: it improves the quality and transparency of internal processes, increases awareness of and compliance with established standards, builds stakeholder confidence, enhances the reputation of the repository, and demonstrates that the repository is following good practices. It also offers a benchmark for comparison and helps to determine the strengths and weaknesses of a repository. In the future we foresee a larger uptake across different domains, not least because within the European Open Science Cloud the FAIR principles, and therefore also the certification of trustworthy digital repositories holding data, are becoming increasingly important. In addition, the CoreTrustSeal requirements will most probably become a European Technical Standard that can be used in procurement (under review by the European Commission).
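The certification workflow described above (self-assessment, then peer review, then board award, with a three-year validity) can be modelled schematically. In the sketch below, every class and field name is our own illustration rather than any official CoreTrustSeal artifact.

```python
# A schematic model of the workflow: self-assessment -> peer review
# -> board award -> seal valid for three years. Illustrative only.

from dataclasses import dataclass
from datetime import date, timedelta

SEAL_VALIDITY = timedelta(days=3 * 365)  # the seal is valid for three years

@dataclass
class Certification:
    repository: str
    self_assessment_done: bool = False
    peer_review_passed: bool = False
    awarded_on: date | None = None

    def award(self, today: date) -> None:
        """Board awards the seal only after both prior stages succeed."""
        if not (self.self_assessment_done and self.peer_review_passed):
            raise ValueError("self-assessment and peer review must precede award")
        self.awarded_on = today

    def is_valid(self, today: date) -> bool:
        return self.awarded_on is not None and today - self.awarded_on <= SEAL_VALIDITY

cert = Certification("example-repo", self_assessment_done=True, peer_review_passed=True)
cert.award(date(2021, 1, 1))
print(cert.is_valid(date(2023, 12, 31)))  # True: still within three years
```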


2019
Vol 15 (2)
Author(s):  
Renata Curty

ABSTRACT The availability of scientific assets through data repositories has greatly increased as a result of government and institutional data-sharing policies and mandates for publicly funded research, allowing data to be reused for purposes not always anticipated by the primary researchers. Although the argument for data sharing is strongly grounded in the possibilities of data reuse and its contributions to scientific advancement, this subject remains peripheral in discussions about data science and open science. This paper follows a narrative review method to take a closer look at data reuse and better conceptualize the term, while proposing an early classification of five distinct data reuse approaches (repurposing, aggregation, integration, meta-analysis and reanalysis) based on hypothetical cases and examples from the literature. It also explores the determinants of what constitutes reusable data, and the relationship between data reusability and documentation quality. It presents some challenges associated with data documentation and points out initiatives and recommendations for overcoming them. The paper aims to contribute not only to conceptual advancement around the reusability and effective reuse of data, but also to prompt initiatives related to data documentation that increase the reuse potential of these scientific assets. Keywords: Data Reuse; Scientific Reproducibility; Reusability; Open Science; Research Data.


2018
Author(s):
Tomislav Hengl
Ichsani Wheeler
Robert A MacMillan

Using the term "Open Data" has become a bit of a fashion, but using it without clear specifications is misleading, i.e. it can be considered just an empty phrase. Probably even worse is the term "Open Science": can science be NOT open at all? Are we reinventing something that should be obvious from the start? This guide tries to clarify some key aspects of Open Data, Open Source Software and Crowdsourcing using examples of projects and businesses. It aims at helping you understand and appreciate the complexity of Open Data, Open Source software and Open Access publications. It was written specifically for producers and users of environmental data; however, the guide will likely be useful to any data producers and users.


2016
Vol 34 (2)
pp. 113-121
Author(s):
John I. Ogungbeni
Amaka R. Obiamalu
Samuel Ssemambo
Charles M. Bazibu

This study investigates the roles of academic libraries in propagating open science. The study is a qualitative survey based on a literature review. Various definitions of open science from different scholars and schools of thought were examined, and research articles on the effects of open science on research and on the place of academic libraries in scientific research were reviewed. Open science enhances collaboration and the sharing of resources among researchers, and metadata-related activities are more prevalent because of it. Open science has increased the relevance of science to our environment and the world, yet issues like privacy and the rightful authorship of scientific data remain among the challenges it faces. Academic libraries continue to take steps to be involved as key players in the propagation of open science through advocacy, the building of institutional data repositories, and serving as hubs for scientific collaboration, among others. Academic libraries have to do more in the areas of advocacy and provision of data repositories.


2021
Vol 4
Author(s):
Musa Abdulkareem
Steffen E. Petersen

COVID-19 has created enormous suffering, affecting lives and causing deaths. The ease with which this type of coronavirus can spread has exposed weaknesses in many healthcare systems around the world. Since its emergence, many governments, research communities, commercial enterprises, and other institutions and stakeholders around the world have been fighting in various ways to curb the spread of the disease. Science and technology have helped in the implementation of government policies directed toward mitigating the impacts of the pandemic and in diagnosing and providing care for the disease. Recent technological tools, artificial intelligence (AI) tools in particular, have also been explored to track the spread of the coronavirus, identify patients at high mortality risk, and diagnose the disease. This paper discusses the areas where AI techniques are being used in detection, diagnosis, epidemiological prediction, forecasting and social control for combating COVID-19, highlighting successful applications and underscoring issues that need to be addressed to achieve significant progress in battling COVID-19 and future pandemics. Several AI systems have been developed for diagnosing COVID-19 using medical imaging modalities such as chest CT and X-ray images. These AI systems differ mainly in their choice of algorithms for image segmentation, classification and disease diagnosis. Other AI-based systems have focused on predicting mortality rates, long-term patient hospitalization and patient outcomes for COVID-19. AI has huge potential in the battle against the COVID-19 pandemic, but successful practical deployments of these AI-based tools have so far been limited due to challenges such as limited data accessibility, the need for external evaluation of AI models, AI experts' limited awareness of the regulatory landscape governing the deployment of AI tools in healthcare, the need for clinicians and other experts to work with AI experts in a multidisciplinary context, and the need to address public concerns over data collection, privacy and protection. Having a dedicated team with expertise in medical data collection, privacy, access and sharing, using federated learning, whereby AI scientists hand over training algorithms to healthcare institutions to train models locally, and taking full advantage of biomedical data stored in biobanks can alleviate some of the problems posed by these challenges. Addressing these challenges will ultimately accelerate the translation of AI research into practical and useful solutions for combating pandemics.
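To make the federated-learning idea mentioned above concrete, here is a toy sketch of federated averaging in which each site trains a tiny linear model locally and only the weights, never the data, are shared for aggregation. It is a minimal pure-Python illustration under invented toy data, not a production framework or any of the systems surveyed.

```python
# Toy federated averaging (FedAvg-style, equal site sizes): each site
# runs local gradient descent on y ~ w0 + w1*x, then weights are averaged.

def local_update(weights: list[float], local_data: list[tuple[float, float]],
                 lr: float = 0.01) -> list[float]:
    """One local pass of gradient descent at a single site; data stays put."""
    w0, w1 = weights
    for x, y in local_data:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return [w0, w1]

def federated_average(site_weights: list[list[float]]) -> list[float]:
    """Aggregate by averaging weights across sites; no raw data is pooled."""
    n = len(site_weights)
    return [sum(ws[i] for ws in site_weights) / n for i in range(2)]

global_model = [0.0, 0.0]
sites = [[(1.0, 2.1), (2.0, 4.0)], [(3.0, 6.2), (4.0, 7.9)]]  # toy, non-clinical data
for _ in range(100):  # communication rounds
    updates = [local_update(global_model, data) for data in sites]
    global_model = federated_average(updates)
print(global_model)  # approaches y ~ 2x
```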


2020
Author(s):
Natalia Sergeyeva
Alexei Gvishiani
Anatoly Soloviev
Lyudmila Zabarinskaya
Tamara Krylova
...

Abstract. The K index is one of the oldest universal indices of geomagnetic activity. Introduced in 1938 by Julius Bartels, it is still widely used. Up to the present day, long-term time series of homogeneous K index records have been accumulated at data repositories all over the world. The multidecadal practice of its application makes it an indispensable source of information for retrospective analysis of solar-terrestrial interaction over nearly eight solar cycles. Most significantly, when studying historical geomagnetic data, K index datasheets are in certain cases far easier to analyse automatically than the conventional analogue magnetograms. The presented collection includes the results of K index determination at 41 geomagnetic observatories of the former USSR for the period from July 1957 to the early 1990s. This unique collection was formed at the World Data Center for Solar-Terrestrial Physics in Moscow. The historical data, which are offered to the international scientific community, cover the second half of the 20th century and can be used for retrospective analysis and the study of past geomagnetic events, as well as for data validation or forecasting (Sergeyeva et al., 2020). The dataset is available at: https://doi.org/10.1594/PANGAEA.922233 (last access: 16 September 2020).
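For readers who want to pull the collection programmatically, a minimal sketch using the third-party pangaeapy client (pip install pangaeapy) follows. The import path and attributes are as the library documents them; collection-type PANGAEA entries may expose individual files rather than one table, in which case the data would need to be downloaded per file.

```python
# A hedged sketch: fetch the K index collection from PANGAEA by the DOI
# cited in the abstract, via pangaeapy. The title and data attributes
# follow the library's documented interface.

from pangaeapy.pandataset import PanDataSet

ds = PanDataSet("10.1594/PANGAEA.922233")  # DOI from the abstract
print(ds.title)
print(ds.data.head())  # K index records as a pandas DataFrame, if tabular
```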



