Exploring barriers and solutions in advancing cross-centre population data science

Author(s): Kerina H Jones, Sharon M Heys, Helen Daniels, David V Ford

Introduction: It is widely acknowledged that population health and administrative data, especially when linked at the individual level, hold great value for research. Cross-centre working between data centres that provide access to such data has the potential to increase this value further by effectively expanding the data available for research. However, there is limited published information on how to address the challenges involved and achieve success. The aim of this paper is to explore perceived barriers and solutions to inform developments in cross-centre working across data centres.
Methods: We carried out a narrative literature review on data sharing and cross-centre working. We used a mixed-methods approach to assess the opinions of members of the public on cross-centre data sharing, and the views and experiences of data centre staff connected with the UK Farr Institute of Health Informatics Research.
Results: The literature review uncovered a wide range of practical and cultural issues. Our engagement with a public group suggested that cross-centre working in which anonymised data are moved between established centres is considered acceptable. The main themes emerging from discussions with data centre staff were dedicated resourcing, practical issues, information governance and culture.
Conclusion: In seeking to advance cross-centre working between data centres, we conclude that there is a need for dedicated resourcing, indicators that recognise data reuse, collaboration to solve common issues, and a balance between removing barriers where necessary and providing incentives. This will require ongoing commitment, engagement and a change in academic culture.

2019, Vol 15 (2)
Author(s): Renata Curty

ABSTRACT The availability of scientific assets through data repositories has increased greatly as a result of government and institutional data sharing policies and mandates for publicly funded research, allowing data to be reused for purposes not always anticipated by the researchers who produced or collected them. Paradoxically, although the argument for data sharing rests strongly on the potential for data reuse and its contributions to scientific advancement, the subject remains peripheral in discussions about data science and open science. This narrative review takes a closer look at data reuse in order to conceptualize the term more clearly, and proposes an initial classification of five distinct approaches to research data reuse (repurposing, aggregation, integration, meta-analysis and reanalysis), based on hypothetical scenarios paired with published cases of data reuse. It also explores the determinants of reusability, relating it to the quality of the documentation that accompanies the data. It discusses the challenges of data documentation and points to initiatives and recommendations for overcoming them. The arguments presented are intended to contribute not only to the conceptual advancement of data reuse and reusability, but also to actions on data documentation that increase the reuse potential of these scientific assets.
Keywords: Data Reuse; Scientific Reproducibility; Reusability; Open Science; Research Data.


Author(s): Peter Christen, Thilina Ranbaduge, Dinusha Vatsalan

Data linkage, the process of identifying records that refer to the same entities across databases, is a crucial component of Population Data Science. Data linkage has a history going back over fifty years, with many different methods and techniques developed in disciplines including computer science, statistics, and health informatics. Data linkage researchers and practitioners are commonly familiar only with the methods and techniques developed or used in their own discipline, and they often follow only research published at venues in that discipline. There is currently no single online resource that allows data linkage researchers and practitioners across disciplines to exchange ideas, post questions, or advertise new publications, software, open positions, or upcoming conferences and workshops. This leads to a communication gap in the multidisciplinary field of data linkage. We aim to address this gap with the DLforum, a public online discussion forum for data linkage. DLforum contains several discussion areas, including publication announcements, resources (software and data sets), information about upcoming conferences and workshops, job opportunities, and general questions related to data linkage. The forum includes a moderation process; all registered users can post content and reply to posts by other users. We anticipate that the number of registered users and the amount of content posted will show that such an online forum is of value to data linkage researchers and practitioners from different disciplines, allowing them to communicate effectively, exchange knowledge, and thus form an online community of practice. In this paper we describe the methods used to develop the DLforum, its structure and content, and our plan for evaluating the forum. The DLforum is freely available at: https://dmm.anu.edu.au/DLforum/
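To make the definition of data linkage above concrete, here is a minimal sketch of one simple approximate-matching approach: each field is split into character bigrams, fields are compared with the Dice coefficient, and record pairs whose mean similarity reaches a threshold are treated as matches. This is not the method of any particular paper or forum resource; the functions `qgrams`, `dice` and `link`, the field names, the example records and the 0.8 threshold are all hypothetical. It compares every pair of records, so it is only suitable for tiny inputs.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def qgrams(value: str, q: int = 2) -> Set[str]:
    """Lower-case, collapse whitespace, and split into overlapping q-grams."""
    s = " ".join(value.lower().split())
    if len(s) < q:
        return {s} if s else set()
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient of the q-gram sets of two strings (1.0 means identical sets)."""
    qa, qb = qgrams(a), qgrams(b)
    if not qa and not qb:
        return 1.0
    return 2 * len(qa & qb) / (len(qa) + len(qb))


def link(db_a: Dict[str, dict], db_b: Dict[str, dict], fields: List[str],
         threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Compare all record pairs (no blocking) and keep those whose mean field similarity >= threshold."""
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in product(db_a.items(), db_b.items()):
        score = sum(dice(rec_a[f], rec_b[f]) for f in fields) / len(fields)
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 3)))
    return matches


# Hypothetical records from two sources; "Jon" vs "John" simulates a data-entry variation.
hospital = {"h1": {"name": "Jon Smith", "town": "Cardiff"},
            "h2": {"name": "Mary Jones", "town": "Swansea"}}
registry = {"r1": {"name": "John Smith", "town": "Cardiff"},
            "r2": {"name": "Maria Evans", "town": "Newport"}}

print(link(hospital, registry, ["name", "town"]))  # [('h1', 'r1', 0.912)]
```

Real linkage systems add steps this sketch omits, such as blocking to avoid comparing all pairs, field-specific comparison functions and weights, and manual review of borderline pairs.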


ITNOW, 2021, Vol 63 (4), pp. 18-20
Author(s): John Booth

John Booth MBCS, Data Centre Energy Efficiency and Sustainability Consultant at Carbon3IT, explores the worsening trajectory of data centre energy use against a backdrop of COP26, climate change and proposed EU directives.


DOI: 10.29007/h27x, 2019
Author(s): Mohammed Alasmar, George Parisis

In this paper we present our work towards an evaluation platform for data centre transport protocols. We developed simulation models of NDP, a modern data transport protocol for data centres, of a FatTree network topology, and of per-packet ECMP load balancing. We also developed a data centre environment that can be used to evaluate and compare data transport protocols such as NDP and TCP. We describe how we integrated our models with the INET Framework and present example simulations that showcase the workings of the developed framework. To this end, we ran a comprehensive set of experiments and studied different components and parameters of the developed models.
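The abstract above names two building blocks, a FatTree topology and per-packet ECMP. The sketch below illustrates them in plain Python under stated assumptions; it is not the authors' INET/OMNeT++ model. It computes the component counts of a standard k-ary FatTree built from k-port switches (k pods, (k/2)^2 core switches, k^3/4 hosts, and one equal-cost inter-pod path per core switch), and contrasts classic per-flow ECMP, which hashes a flow identifier so all of its packets share one path, with per-packet ECMP, where each packet picks a path independently. The class and function names, the CRC32 hash and the random path choice are illustrative assumptions.

```python
import random
import zlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FatTree:
    """Component counts of a k-ary FatTree built from identical k-port switches."""
    k: int  # switch port count; must be even

    def __post_init__(self) -> None:
        if self.k < 2 or self.k % 2:
            raise ValueError("k must be an even integer >= 2")

    @property
    def hosts(self) -> int:
        return self.k ** 3 // 4  # k pods x k/2 edge switches x k/2 hosts per edge switch

    @property
    def core_switches(self) -> int:
        return (self.k // 2) ** 2

    @property
    def pod_switches(self) -> int:
        return self.k * self.k  # k pods, each with k/2 edge + k/2 aggregation switches

    def inter_pod_paths(self) -> int:
        """Equal-cost shortest paths between hosts in different pods: one per core switch."""
        return self.core_switches


def per_flow_ecmp(flow_id: str, n_paths: int) -> int:
    """Per-flow ECMP: hash the flow identifier, so every packet of the flow takes the same path."""
    return zlib.crc32(flow_id.encode()) % n_paths


def per_packet_ecmp(n_packets: int, n_paths: int, seed: int = 0) -> List[int]:
    """Per-packet ECMP: each packet independently picks a random equal-cost path."""
    rng = random.Random(seed)
    return [rng.randrange(n_paths) for _ in range(n_packets)]


ft = FatTree(k=4)
print(ft.hosts, ft.core_switches, ft.pod_switches, ft.inter_pod_paths())  # 16 4 16 4
print(per_flow_ecmp("10.0.0.2:5000->10.2.0.3:80", ft.inter_pod_paths()))  # fixed for this flow
print(per_packet_ecmp(8, ft.inter_pod_paths(), seed=1))  # one path index per packet
```

Spreading individual packets across all core paths balances load more evenly than per-flow hashing, at the cost of possible packet reordering, which the transport protocol then has to tolerate.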


2019, Vol 15 (2)
Author(s): Sonia Elisa Caregnato, Samile Andrea de Souza Vanz, Caterina Groposo Pavão, Paula Caroline Jardim Schifino Passos, Eduardo Borges, ...

ABSTRACT This article presents an exploratory analysis of the practices and perceptions of Brazilian researchers regarding open access to research data, based on data collected through a survey. The 4,676 responses show that, despite great interest in the topic, evidenced by the prevalence of variables related to data sharing and use and to institutional repositories, respondents lack clarity about the main related topics. We conclude that, although most researchers report that they share research data, making these data available openly and without restriction is not yet widely accepted.
Keywords: Open Research Data; Data Sharing; Data Reuse.


2015, Vol 10 (1), pp. 260-267
Author(s): Kevin Read, Jessica Athens, Ian Lamb, Joey Nicholson, Sushan Chin, ...

The Department of Population Health (DPH) at an academic medical center identified a need to facilitate research using large, externally funded datasets. Barriers identified included difficulty in accessing and working with the datasets and a lack of knowledge about institutional licenses. A need to facilitate sharing and reuse of datasets generated by researchers at the institution (internal datasets) was also recognized. The library partnered with a researcher in the DPH to create a catalog of external datasets that provided detailed metadata and access instructions. The catalog also listed researchers at the medical center and the main campus with expertise in using these external datasets, in order to facilitate research and cross-campus collaboration. Data description standards were reviewed to create a metadata set that supports access both to externally generated datasets and to the internally generated datasets that will form the next phase of the catalog's development. Interviews with a range of investigators at the institution identified DPH researchers as the most interested in data sharing, so targeted outreach to this group was undertaken. Initial outreach resulted in additional external datasets being described, new local experts volunteering, proposals for additional functionality, and interest from researchers in including their internal datasets in the catalog. Despite limited outreach, the catalog received ~250 unique page views in the three months after it went live. The establishment of the catalog also led to partnerships with the medical center’s data management core and the main university library. In its present state, the Data Catalog serves a direct user need of the Department of Population Health by describing large, externally funded datasets. The library will use this initial strong community of users to expand the catalog to include internally generated research datasets. Future expansion plans include working with DataCore and the main university library.
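The catalog described above pairs each dataset with descriptive metadata, access instructions, licensing information and local experts, and is meant to cover both external and internal datasets. As a minimal sketch under assumptions, such a record and a simple keyword search might look like the following. The `DatasetEntry` fields, the `search` function and the example entries are hypothetical and are not the catalog's actual schema or implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetEntry:
    """One catalog record. Field names are illustrative assumptions, not the catalog's real schema."""
    title: str
    description: str
    origin: str  # "external" (e.g. licensed or public) or "internal" (generated at the institution)
    access_instructions: str
    license_note: str = ""
    local_experts: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def search(catalog: List[DatasetEntry], term: str, origin: Optional[str] = None) -> List[DatasetEntry]:
    """Case-insensitive substring search over title, description and keywords."""
    needle = term.lower()
    hits = []
    for entry in catalog:
        if origin is not None and entry.origin != origin:
            continue  # optional filter: external vs internal datasets
        haystack = " ".join([entry.title, entry.description, *entry.keywords]).lower()
        if needle in haystack:
            hits.append(entry)
    return hits


# Hypothetical entries for illustration only.
catalog = [
    DatasetEntry(
        title="Example national health survey",
        description="Nationally representative survey of health behaviours.",
        origin="external",
        access_instructions="Request access through the institutional license contact.",
        license_note="Institutional license required",
        local_experts=["Researcher A (DPH)"],
        keywords=["survey", "population health"],
    ),
    DatasetEntry(
        title="Example departmental cohort study",
        description="Longitudinal cohort collected by an internal research group.",
        origin="internal",
        access_instructions="Contact the study team.",
        keywords=["cohort", "longitudinal"],
    ),
]

print([e.title for e in search(catalog, "survey")])  # ['Example national health survey']
```

Keeping internal and external datasets in one record type, distinguished by a single field, is one way to let a catalog grow from external to internal datasets without changing its search interface.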


2015
Author(s): Peter Weiland, Ina Dehnhard

The benefits of making research data permanently accessible through data archives are widely recognized: costs can be reduced by reusing existing data, research results can be compared and validated against results from archived studies, fraud can be detected more easily, and meta-analyses can be conducted. In addition, authors may gain recognition and reputation for producing the datasets. Since 2003, the accredited research data center PsychData (part of the Leibniz Institute for Psychology Information in Trier, Germany) has documented and archived research data from all areas of psychology and related fields. Initially, the main focus was on datasets with a high potential for reuse, e.g. longitudinal studies, large-scale cross-sectional studies, or studies conducted under historically unique conditions. Now, more and more journal publishers and project funding agencies require researchers to archive their data and make them accessible to the scientific community, so PsychData also has to serve this need.

In this presentation we report on our experiences in operating a discipline-specific research data archive in a domain where data sharing meets considerable resistance. We will focus on the challenges for data sharing and data reuse in psychology, e.g.:
- the large amount of domain-specific knowledge needed for data curation
- high costs of documenting the data, because of the wide range of non-standardized measures
- small teams and few established infrastructures compared with the "big data" disciplines
- studies in psychology that are not designed for reuse (in contrast to the social sciences)
- data protection
- resistance to sharing data

At the end of the presentation, we will provide a brief outlook on DataWiz, a new project funded by the German Research Foundation (DFG), in which tools will be developed to support researchers in documenting their data during the research phase.


2019, Vol 107 (4)
Author(s): Katherine G. Akers, Kevin B. Read, Liz Amos, Lisa M. Federer, Ayaba Logan, ...

As librarians are generally advocates of open access and data sharing, it is a bit surprising that peer-reviewed journals in the field of librarianship have been slow to adopt data sharing policies. Starting October 1, 2019, the Journal of the Medical Library Association (JMLA) is taking a step forward and implementing a firm data sharing policy to increase the rigor and reproducibility of published research, enable data reuse, and promote open science. This editorial explains the data sharing policy, describes how compliance with the policy will fit into the journal’s workflow, and provides further guidance for preparing for data sharing.


F1000Research, 2017, Vol 6, pp. 12
Author(s): Stéphanie Boué, Thomas Exner, Samik Ghosh, Vincenzo Belcastro, Joh Dokler, ...

The US FDA defines modified risk tobacco products (MRTPs) as products that aim to reduce harm or the risk of tobacco-related disease associated with commercially marketed tobacco products. Establishing a product’s potential as an MRTP requires scientific substantiation, including toxicity studies and measures of disease risk relative to those of cigarette smoking. Best practices encourage verification of the data from such studies through sharing and open standards. Building on the experience gained from the OpenTox project, a proof-of-concept database and website (INTERVALS) has been developed to share results from both in vivo inhalation studies and in vitro studies conducted by Philip Morris International R&D to assess candidate MRTPs. Because datasets are often generated with diverse methods and standards, they need to be traceable and curated, and the methods used need to be well described, so that knowledge can be gained using data science principles and tools. The data-management framework described here accounts for the latest standards of data sharing and research reproducibility. Curated data and method descriptions have been prepared in ISA-Tab format and stored in a database accessible via a search portal on the INTERVALS website. The portal allows users to browse the data by study or mechanism (e.g., inflammation, oxidative stress) and obtain information on study design, methods, and the most important results. Given the successful development of the initial infrastructure, the goal is to grow this initiative and establish a public repository for 21st-century preclinical systems toxicology MRTP assessment data and results that supports open data principles.
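ISA-Tab, mentioned above, stores study and assay descriptions as tab-delimited tables whose column headers follow conventions such as `Source Name`, `Sample Name`, `Characteristics[...]` and `Factor Value[...]`. The following is a minimal sketch, not INTERVALS code, that reads a small, simplified ISA-Tab-style study table with Python's standard `csv` module and groups samples by an experimental factor. The table contents, the `exposure` factor and the `samples_by_factor` function are invented for illustration; for real ISA-Tab files, tooling from the ISA-tools project would be the safer choice.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical, simplified ISA-Tab-style study table (all values invented for illustration).
STUDY_TAB = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\tFactor Value[exposure]\n"
    "subject_01\tMus musculus\tinhalation exposure\tlung_01\tsham\n"
    "subject_02\tMus musculus\tinhalation exposure\tlung_02\treference\n"
    "subject_03\tMus musculus\tinhalation exposure\tlung_03\tcandidate\n"
    "subject_04\tMus musculus\tinhalation exposure\tlung_04\tsham\n"
)


def samples_by_factor(study_text: str, factor: str) -> Dict[str, List[str]]:
    """Group 'Sample Name' values by the given 'Factor Value[...]' column."""
    # Tab-delimited parsing; the csv module also strips double quotes around values.
    reader = csv.DictReader(io.StringIO(study_text), delimiter="\t")
    column = f"Factor Value[{factor}]"
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        groups[row[column]].append(row["Sample Name"])
    return dict(groups)


print(samples_by_factor(STUDY_TAB, "exposure"))
# {'sham': ['lung_01', 'lung_04'], 'reference': ['lung_02'], 'candidate': ['lung_03']}
```

One limitation: real ISA-Tab tables often repeat headers such as `Protocol REF`, and `csv.DictReader` keeps only the last column with a given name, so a full parser has to work with column positions instead.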

