The Spectrum of Data sharing Policies in Neuroimaging Data Repositories

Mapping Intimacies ◽

10.31234/osf.io/cnuy7 ◽

2021 ◽

Author(s):

Anita Jwa ◽

Russell Poldrack

Keyword(s):

Data Sharing ◽

Data Privacy ◽

Public Investment ◽

Science Inquiry ◽

Open Science ◽

Machine Learning Techniques ◽

Data Repositories ◽

Learning Techniques ◽

Neuroimaging Data ◽

Regulatory Analysis

Sharing data is a scientific imperative that accelerates scientific discoveries, reinforces open science inquiry, and allows for efficient use of public investment and research resources. Considering these benefits, data sharing has been widely promoted in diverse fields and neuroscience has been no exception to this movement. For all its promise, however, the sharing of human neuroimaging data raises critical ethical and legal issues, such as data privacy. Recently, the heightened risks to data privacy posed by the exponential development in artificial intelligence and machine learning techniques have made data sharing more challenging; the regulatory landscape around data sharing has also been evolving rapidly. Here we present an in-depth ethical and regulatory analysis that will examine how neuroimaging data are currently shared against the backdrop of the relevant regulations and policies and how advanced software tools and algorithms might undermine subjects’ privacy in neuroimaging data sharing. This analysis will inform researchers on responsible practice of neuroimaging data sharing and shed light on a regulatory framework to provide adequate protection of neuroimaging data while maximizing the benefits of data sharing.

Abordagens de reúso e a questão da reusabilidade dos dados científicos | Approaches for data reuse and the issue of scientific data reusability

Liinc em Revista ◽

10.18617/liinc.v15i2.4777 ◽

2019 ◽

Vol 15 (2) ◽

Author(s):

Renata Curty

Keyword(s):

Data Sharing ◽

Data Science ◽

Meta Analysis ◽

Science Research ◽

Open Science ◽

Scientific Data ◽

Data Reuse ◽

Data Repositories ◽

Documentation Quality ◽

Data Documentation

RESUMO As diretivas governamentais e institucionais em torno do compartilhamento de dados de pesquisas financiadas com dinheiro público têm impulsionado a rápida expansão de repositórios digitais de dados a fim de disponibilizar esses ativos científicos para reutilização, com propósitos nem sempre antecipados, pelos pesquisadores que os produziram/coletaram. De modo contraditório, embora o argumento em torno do compartilhamento de dados seja fortemente sustentado no potencial de reúso e em suas consequentes contribuições para o avanço científico, esse tema permanece acessório às discussões em torno da ciência de dados e da ciência aberta. O presente artigo de revisão narrativa tem por objetivo lançar um olhar mais atento ao reúso de dados e explorar mais diretamente esse conceito, ao passo que propõe uma classificação inicial de cinco abordagens distintas para o reúso de dados de pesquisa (reaproveitamento, agregação, integração, metanálise e reanálise), com base em situações hipotéticas acompanhadas de casos de reúso de dados publicados na literatura científica. Também explora questões determinantes para a condição de reúso, relacionando a reusabilidade à qualidade da documentação que acompanha os dados. Oferece discussão sobre os desafios da documentação de dados, bem como algumas iniciativas e recomendações para que essas dificuldades sejam contornadas. Espera-se que os argumentos apresentados contribuam não somente para o avanço conceitual em torno do reúso e da reusabilidade de dados, mas também reverberem em ações relacionadas à documentação dos dados de modo a incrementar o potencial de reúso desses ativos científicos. Palavras-chave: Reúso de Dados; Reprodutibilidade Científica; Reusabilidade; Ciência Aberta; Dados de Pesquisa.
ABSTRACT The availability of scientific assets through data repositories has been greatly increased as a result of government and institutional data sharing policies and mandates for publicly funded research, allowing data to be reused for purposes not always anticipated by primary researchers. Despite the fact that the argument favoring data sharing is strongly grounded in the possibilities of data reuse and its contributions to scientific advancement, this subject remains peripheral to discussions about data science and open science. This paper follows a narrative review method to take a closer look at data reuse in order to better conceptualize this term, while proposing an early classification of five distinct data reuse approaches (repurposing, aggregation, integration, meta-analysis and reanalysis) based on hypothetical cases and literature examples. It also explores the determinants of what constitutes reusable data, and the relationship between data reusability and documentation quality. It presents some challenges associated with data documentation and points out some initiatives and recommendations to overcome such problems. It is expected to contribute not only to the conceptual advancement around the reusability and effective reuse of data, but also to prompt initiatives related to data documentation in order to increase the reuse potential of these scientific assets. Keywords: Data Reuse; Scientific Reproducibility; Reusability; Open Science; Research Data.

Adolescent Brain Cognitive Development (ABCD) Community MRI Collection and Utilities

10.1101/2021.07.09.451638 ◽

2021 ◽

Author(s):

Eric Feczko ◽

Greg Conan ◽

Scott Marek ◽

Brenden Tervo-Clemens ◽

Michaela Cordova ◽

...

Keyword(s):

Mental Health ◽

Cognitive Development ◽

Data Storage ◽

Data Privacy ◽

Scientific Progress ◽

Open Science ◽

Population Based ◽

Mental Health Research ◽

Data Set ◽

Neuroimaging Data

The Adolescent Brain Cognitive Development Study (ABCD), a 10-year longitudinal neuroimaging study of the largest population-based and demographically distributed cohort of 9-10 year olds (N=11,877), was designed to overcome reproducibility limitations of prior child mental health studies. Besides the fantastic wealth of research opportunities, the extremely large size of the ABCD data set also creates enormous data storage, processing, and analysis challenges for researchers. To ensure data privacy and safety, researchers are not currently able to share neuroimaging data derivatives through the central repository at the National Data Archive (NDA). However, sharing derived data amongst researchers laterally can powerfully accelerate scientific progress, to ensure the maximum public benefit is derived from the ABCD study. To simultaneously promote collaboration and data safety, we developed the ABCD-BIDS Community Collection (ABCC), which includes both curated processed data and software utilities for further analyses. The ABCC also enables researchers to upload their own custom-processed versions of ABCD data and derivatives for sharing with the research community. This NeuroResource is meant to serve as the companion guide for the ABCC. In section I we describe the ABCC. Section II highlights ABCC utilities that help researchers access, share, and analyze ABCD data, while section III provides two exemplar reproducibility analyses using ABCC utilities. We hope that adoption of the ABCC's data-safe, open-science framework will boost access and reproducibility, thus facilitating progress in child and adolescent mental health research.

A survey of researchers' needs and priorities for data sharing

10.31219/osf.io/njr5u ◽

2021 ◽

Author(s):

Iain Hrynaszkiewicz ◽

James Harney ◽

Lauren Cadwallader

Keyword(s):

Data Sharing ◽

Research Impact ◽

Open Science ◽

Research Data ◽

Data Reuse ◽

Data Availability ◽

Data Repositories ◽

Use Of Data ◽

Share Data ◽

Do So

PLOS has long supported Open Science. One of the ways in which we do so is via our stringent data availability policy established in 2014. Despite this policy, and more data sharing policies being introduced by other organizations, best practices for data sharing are adopted by a minority of researchers in their publications. Problems with effective research data sharing persist and these problems have been quantified by previous research as a lack of time, resources, incentives, and/or skills to share data. In this study we built on this research by investigating the importance of tasks associated with data sharing, and researchers’ satisfaction with their ability to complete these tasks. By investigating these factors we aimed to better understand opportunities for new or improved solutions for sharing data. In May-June 2020 we surveyed researchers from Europe and North America, asking them to rate tasks associated with data sharing on (i) their importance and (ii) their satisfaction with their ability to complete them. We received 728 completed and 667 partial responses. We calculated mean importance and satisfaction scores to highlight potential opportunities for new solutions and to compare different cohorts. Tasks relating to research impact, funder compliance, and credit had the highest importance scores. 52% of respondents reuse research data but the average satisfaction score for obtaining data for reuse was relatively low. Tasks associated with sharing data were rated somewhat important and respondents were reasonably well satisfied in their ability to accomplish them. Notably, this included tasks associated with best data sharing practice, such as use of data repositories.
However, the most common method for sharing data was in fact via supplemental files with articles, which is not considered to be best practice. We presume that researchers are unlikely to seek new solutions to a problem or task that they are satisfied in their ability to accomplish, even if many do not attempt this task. This implies there are few opportunities for new solutions or tools to meet these researcher needs. Publishers can likely meet these needs for data sharing by working to seamlessly integrate existing solutions that reduce the effort or behaviour change involved in some tasks, and focusing on advocacy and education around the benefits of sharing data. There may however be opportunities - unmet researcher needs - in relation to better supporting data reuse, which could be met in part by strengthening data sharing policies of journals and publishers, and improving the discoverability of data associated with published articles.
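
The scoring described in this abstract (mean importance and mean satisfaction per task, used to spot unmet needs) can be illustrated with a minimal sketch. The task names, the 1-5 rating scale, the sample ratings, and the rule "mean importance exceeds mean satisfaction" are all assumptions made for illustration; the abstract does not state the survey's actual scale or scoring rule.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical ratings on an assumed 1-5 scale:
# task -> list of (importance, satisfaction) pairs, one pair per respondent.
responses: Dict[str, List[Tuple[int, int]]] = {
    "obtain data for reuse": [(5, 2), (4, 2), (5, 3)],
    "deposit data in a repository": [(3, 4), (4, 4), (3, 5)],
}


def mean_scores(ratings: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (mean importance, mean satisfaction) for one task."""
    importance = mean(i for i, _ in ratings)
    satisfaction = mean(s for _, s in ratings)
    return importance, satisfaction


scores = {task: mean_scores(ratings) for task, ratings in responses.items()}

# Assumed rule: a task is a potential opportunity when respondents rate it
# as more important than they are satisfied with their ability to do it.
opportunities = [task for task, (imp, sat) in scores.items() if imp > sat]

for task, (imp, sat) in scores.items():
    print(f"{task}: importance={imp:.2f}, satisfaction={sat:.2f}")
print("potential opportunities:", opportunities)
```

With these made-up numbers only the reuse task is flagged, which mirrors the direction of the reported finding without reproducing any of the survey's data.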

The Role of Medication Data to Enhance the Prediction of Alzheimer’s Progression Using Machine Learning

Computational Intelligence and Neuroscience ◽

10.1155/2021/8439655 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Shaker El-Sappagh ◽

Tamer Abuhmed ◽

Bader Alouffi ◽

Radhya Sahal ◽

Naglaa Abdelhade ◽

...

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Support Vector ◽

K Nearest Neighbor ◽

Multimodal Data ◽

Learning Techniques ◽

Early Progression ◽

Neuroimaging Data

Early detection of Alzheimer’s disease (AD) progression is crucial for proper disease management. Most studies concentrate on neuroimaging data analysis of baseline visits only. They ignore the fact that AD is a chronic disease and that patients’ data are naturally longitudinal. In addition, there are no studies that examine the effect of dementia medicines on the behavior of the disease. In this paper, we propose a machine learning-based architecture for early progression detection of AD based on multimodal data combining AD drugs and cognitive scores. We compare the performance of five popular machine learning techniques (support vector machine, random forest, logistic regression, decision tree, and K-nearest neighbor) in predicting AD progression after 2.5 years. Extensive experiments are performed using an ADNI dataset of 1036 subjects. The cross-validation performance of most algorithms improved when the drug and cognitive score data were fused. The results indicate that the drugs a patient takes play an important role in the progression of AD.

Theoretical Approaches of Machine Learning to Schizophrenia

Engineering International ◽

10.18034/ei.v6i2.568 ◽

2018 ◽

Vol 6 (2) ◽

pp. 155-168 ◽

Cited By ~ 2

Author(s):

Naresh Babu Bynagari ◽

Takudzwa Fadziso

Keyword(s):

Machine Learning ◽

Disease Diagnosis ◽

Machine Learning Techniques ◽

Support Vector ◽

Healthy Controls ◽

Theoretical Approaches ◽

Automatic Categorization ◽

Learning Techniques ◽

Key Characteristics ◽

Neuroimaging Data

Machine learning techniques have been successfully used to analyze neuroimaging data in the context of disease diagnosis in recent years. In this study, we present an overview of contemporary support vector machine-based methods developed and used in psychiatric neuroimaging for schizophrenia research. We focus in particular on our group's algorithms, which have been used to categorize schizophrenia patients and healthy controls, and compare their accuracy findings to those of other recently published studies. First, we'll go over some basic pattern recognition and machine learning terms. Then, for each study, we describe and discuss it independently, emphasizing the key characteristics that distinguish each approach. Finally, conclusions are reached as a result of comparing the data obtained using the various methodologies presented to determine how beneficial automatic categorization systems are in understanding the molecular underpinnings of schizophrenia. The primary implications of applying these approaches in clinical practice are then discussed.

Is There a Social Life in Open Data? The Case of Open Data Practices in Educational Technology Research

Publications ◽

10.3390/publications7010009 ◽

2019 ◽

Vol 7 (1) ◽

pp. 9 ◽

Cited By ~ 1

Author(s):

Juliana Raffaghelli ◽

Stefania Manca

Keyword(s):

Educational Technology ◽

Data Sharing ◽

Social Life ◽

Social Activity ◽

Open Data ◽

Open Science ◽

Future Research ◽

Data Repositories ◽

Research Perspectives ◽

Open Datasets

In the landscape of Open Science, Open Data (OD) plays a crucial role as data are one of the most basic components of research, despite their diverse formats across scientific disciplines. Opening up data is a recent concern for policy makers and researchers, as the basis for good Open Science practices. The common factor underlying these new practices—the relevance of promoting Open Data circulation and reuse—is mostly a social form of knowledge sharing and construction. However, while data sharing is being strongly promoted by policy making and is becoming a frequent practice in some disciplinary fields, Open Data sharing is much less developed in Social Sciences and in educational research. In this study, practices of OD publication and sharing in the field of Educational Technology are explored. The aim is to investigate Open Data sharing in a selection of Open Data repositories, as well as in the academic social network site ResearchGate. The 23 Open Datasets selected across five OD platforms were analysed in terms of (a) the metrics offered by the platforms and the affordances for social activity; (b) the type of OD published; (c) the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles compliance; and (d) the extent of presence and related social activity on ResearchGate. The results show a very low social activity in the platforms and very few correspondences in ResearchGate that highlight a limited social life surrounding Open Datasets. Future research perspectives as well as limitations of the study are interpreted in the discussion.

Bringing Code to Data: Do Not Forget Governance

Journal of Medical Internet Research ◽

10.2196/18087 ◽

2020 ◽

Vol 22 (7) ◽

pp. e18087

Author(s):

Christine Suver ◽

Adrian Thorogood ◽

Megan Doerr ◽

John Wilbanks ◽

Bartha Knoppers

Keyword(s):

Data Sharing ◽

Clinical Data ◽

Local Control ◽

Data Privacy ◽

High Performance ◽

Data Access ◽

Open Science ◽

Data Sets ◽

Regulatory Frameworks ◽

Legal Constraints

Developing or independently evaluating algorithms in biomedical research is difficult because of restrictions on access to clinical data. Access is restricted because of privacy concerns, the proprietary treatment of data by institutions (fueled in part by the cost of data hosting, curation, and distribution), concerns over misuse, and the complexities of applicable regulatory frameworks. The use of cloud technology and services can address many of the barriers to data sharing. For example, researchers can access data in high performance, secure, and auditable cloud computing environments without the need for copying or downloading. An alternative path to accessing data sets requiring additional protection is the model-to-data approach. In model-to-data, researchers submit algorithms to run on secure data sets that remain hidden. Model-to-data is designed to enhance security and local control while enabling communities of researchers to generate new knowledge from sequestered data. Model-to-data has not yet been widely implemented, but pilots have demonstrated its utility when technical or legal constraints preclude other methods of sharing. We argue that model-to-data can make a valuable addition to our data sharing arsenal, with 2 caveats. First, model-to-data should only be adopted where necessary to supplement rather than replace existing data-sharing approaches given that it requires significant resource commitments from data stewards and limits scientific freedom, reproducibility, and scalability. Second, although model-to-data reduces concerns over data privacy and loss of local control when sharing clinical data, it is not an ethical panacea. Data stewards will remain hesitant to adopt model-to-data approaches without guidance on how to do so responsibly. To address this gap, we explored how commitments to open science, reproducibility, security, respect for data subjects, and research ethics oversight must be re-evaluated in a model-to-data context.
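
The model-to-data pattern described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the `DataSteward` class, the `min_cohort` threshold, the audit log, and the `mean_age` model are hypothetical names invented here, not part of any platform mentioned by the authors.

```python
from statistics import mean
from typing import Callable, Dict, List

Record = Dict[str, float]
Model = Callable[[List[Record]], Dict[str, float]]


class DataSteward:
    """Hypothetical steward that holds sequestered records and runs submitted models."""

    def __init__(self, records: List[Record], min_cohort: int = 5) -> None:
        self._records = records          # never handed back to the caller
        self._min_cohort = min_cohort    # assumed disclosure-control threshold
        self.audit_log: List[str] = []   # names of the models that were run

    def run(self, name: str, model: Model) -> Dict[str, float]:
        """Run a submitted model locally and return only its summary output."""
        if len(self._records) < self._min_cohort:
            raise PermissionError("cohort too small to release results")
        self.audit_log.append(name)
        # The model receives copies, so it cannot modify the stored records.
        result = model([dict(record) for record in self._records])
        return {key: float(value) for key, value in result.items()}


def mean_age(records: List[Record]) -> Dict[str, float]:
    """Researcher-supplied model: returns one aggregate statistic."""
    return {"mean_age": mean(record["age"] for record in records)}


steward = DataSteward([{"age": a} for a in (61.0, 64.0, 70.0, 72.0, 78.0)])
summary = steward.run("mean_age", mean_age)
print(summary)  # {'mean_age': 69.0}
```

Nothing in this sketch stops a submitted model from encoding individual values in its output, so it does not remove the need for review and oversight of submitted code, which is the governance point the abstract raises.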

Bringing Code to Data: Do Not Forget Governance (Preprint)

10.2196/preprints.18087 ◽

2020 ◽

Author(s):

Christine Suver ◽

Adrian Thorogood ◽

Megan Doerr ◽

John Wilbanks ◽

Bartha Knoppers

Keyword(s):

Data Sharing ◽

Clinical Data ◽

Local Control ◽

Data Privacy ◽

High Performance ◽

Data Access ◽

Open Science ◽

Data Sets ◽

Regulatory Frameworks ◽

Legal Constraints

Developing or independently evaluating algorithms in biomedical research is difficult because of restrictions on access to clinical data. Access is restricted because of privacy concerns, the proprietary treatment of data by institutions (fueled in part by the cost of data hosting, curation, and distribution), concerns over misuse, and the complexities of applicable regulatory frameworks. The use of cloud technology and services can address many of the barriers to data sharing. For example, researchers can access data in high performance, secure, and auditable cloud computing environments without the need for copying or downloading. An alternative path to accessing data sets requiring additional protection is the model-to-data approach. In model-to-data, researchers submit algorithms to run on secure data sets that remain hidden. Model-to-data is designed to enhance security and local control while enabling communities of researchers to generate new knowledge from sequestered data. Model-to-data has not yet been widely implemented, but pilots have demonstrated its utility when technical or legal constraints preclude other methods of sharing. We argue that model-to-data can make a valuable addition to our data sharing arsenal, with 2 caveats. First, model-to-data should only be adopted where necessary to supplement rather than replace existing data-sharing approaches given that it requires significant resource commitments from data stewards and limits scientific freedom, reproducibility, and scalability. Second, although model-to-data reduces concerns over data privacy and loss of local control when sharing clinical data, it is not an ethical panacea. Data stewards will remain hesitant to adopt model-to-data approaches without guidance on how to do so responsibly. To address this gap, we explored how commitments to open science, reproducibility, security, respect for data subjects, and research ethics oversight must be re-evaluated in a model-to-data context.

Data Sharing in Psychology

Psychology ◽

10.1093/obo/9780199828340-0272 ◽

2021 ◽

Author(s):

Joy Kennedy

Keyword(s):

Data Sharing ◽

Scientific Progress ◽

Secondary Data ◽

Open Science ◽

Primary Data ◽

Institutional Review Boards ◽

Data Repositories ◽

Replication Studies ◽

Explicit Consent ◽

Meta Analyses

Narrowly defined, data sharing is the practice of making scientific research data available to other researchers. However, the term is often used to include a variety of open-science practices, including making data, methodology (e.g., coding scheme), analytic syntax, and other research materials available to other researchers, as well as the reuse of those resources by others. There are multiple avenues for data sharing, for example data repositories (either subscription-based or free) or direct request to the researcher. Data sharing is a fairly common practice in the life and earth sciences. Excepting a handful of longitudinal projects, psychology lacks this robust historical precedent for sharing data. In fact, in the not-so-distant past, institutional review boards typically required that data be destroyed after a preset period in order to protect participants’ privacy—and some still do. And many researchers still do not take the first step—modifying their informed consent procedures to include explicit consent to share. Although still not frequent, data sharing in psychology is becoming more common. In part, this trend is being driven by the requirements set by publications and funding agencies. For publications, data sharing is intrinsic to transparency and replication of study findings. For funders, data sharing ensures greater return on investment—that expensive and time-consuming primary data collection does not wind up sitting on a dusty shelf, but rather can be reused for secondary data analysis to answer new questions. In psychology as in other fields, technological improvements in storage capacity and computing power have also facilitated data sharing and reuse. While many psychologists are still concerned that data sharing will result in being “scooped” or found in error, there is increasing recognition of the benefits of data sharing. 
First, data repositories ensure that data are archived, and that the burden of preservation does not fall on the researcher or the researcher’s institution. Sharing also increases the pace of scientific progress, as researchers can build on each other’s work. For example, researchers can learn how other experts approached measurement or coding of a given outcome. In replication studies, inconsistent findings can point to contextual variations in the construct under study, rather than researcher error. And in a field where null findings are often difficult to publish, sharing allows these data to be included in meta-analyses across studies to examine broader impacts. Most importantly, data sharing enhances transparency, a key ingredient in the scientific process.

Open science: The open clinical trials data journey

Clinical Trials ◽

10.1177/1740774519865512 ◽

2019 ◽

Vol 16 (5) ◽

pp. 539-546 ◽

Cited By ~ 2

Author(s):

Frank Rockhold ◽

Christina Bromley ◽

Erin K Wagner ◽

Marc Buyse

Keyword(s):

Data Sharing ◽

Data Privacy ◽

Resource Constraints ◽

Open Data ◽

Data Access ◽

Open Science ◽

Current Data ◽

Shared Data ◽

Privacy Issues ◽

And Training

Open data sharing and access has the potential to promote transparency and reproducibility in research, contribute to education and training, and prompt innovative secondary research. Yet, there are many reasons why researchers don’t share their data. These include, among others, time and resource constraints, patient data privacy issues, lack of access to appropriate funding, insufficient recognition of the data originators’ contribution, and the concern that commercial or academic competitors may benefit from analyses based on shared data. Nevertheless, there is a positive interest within and across the research and patient communities to create shared data resources. In this perspective, we will try to highlight the spectrum of “openness” and “data access” that exists at present, highlight the strengths and weaknesses of current data access platforms, present current examples of data sharing platforms, and propose guidelines to revise current data sharing practices going forward.
