Common Data Elements, Scalable Data Management Infrastructure, and Analytics Workflows for Large-Scale Neuroimaging Studies

2021 ◽  
Vol 12 ◽  
Author(s):  
Rayus Kuplicki ◽  
James Touthang ◽  
Obada Al Zoubi ◽  
Ahmad Mayeli ◽  
Masaya Misaki ◽  
...  

Neuroscience studies require considerable bioinformatic support and expertise. Numerous high-dimensional and multimodal datasets must be preprocessed and integrated to create robust and reproducible analysis pipelines. We describe common data elements and a scalable data management infrastructure that supports multiple analytics workflows, facilitating preprocessing, analysis, and sharing of large-scale multi-level data. The process uses the Brain Imaging Data Structure (BIDS) format and supports MRI, fMRI, EEG, clinical, and laboratory data. The infrastructure also accommodates other datasets, such as Fitbit data, and gives developers the flexibility to customize the integration of new data types. Exemplar results from 200+ participants and 11 different pipelines demonstrate the utility of the infrastructure.
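For readers unfamiliar with BIDS, the convention this infrastructure builds on can be sketched as a naming pattern over a per-subject directory tree. The regex below is a simplified illustration, not a full BIDS validator, and the example paths are hypothetical:

```python
import re

# Simplified BIDS-style pattern:
# sub-<label>/<datatype>/sub-<label>[_task-<label>][_run-<index>]_<suffix>.<ext>
BIDS_PATTERN = re.compile(
    r"^sub-(?P<sub>[A-Za-z0-9]+)/"
    r"(?P<datatype>anat|func|eeg)/"
    r"sub-(?P=sub)(_task-[A-Za-z0-9]+)?(_run-\d+)?"
    r"_(?P<suffix>T1w|bold|eeg)\.(nii\.gz|edf|json|tsv)$"
)

def is_bids_like(path: str) -> bool:
    """Return True if a relative path matches the simplified BIDS pattern."""
    return BIDS_PATTERN.match(path) is not None

paths = [
    "sub-01/anat/sub-01_T1w.nii.gz",             # structural MRI
    "sub-01/func/sub-01_task-rest_bold.nii.gz",  # fMRI
    "sub-01/eeg/sub-01_task-rest_eeg.edf",       # EEG
    "sub-01/fitbit_raw.csv",                     # extra data, not BIDS-named
]
for p in paths:
    print(p, is_bids_like(p))
```

The last path illustrates the point made in the abstract: auxiliary data such as Fitbit recordings fall outside the core specification, so an infrastructure must decide how to integrate them alongside BIDS-organized modalities.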


2021 ◽  
Vol 15 ◽  
Author(s):  
Tinashe M. Tapera ◽  
Matthew Cieslak ◽  
Max Bertolero ◽  
Azeez Adebimpe ◽  
Geoffrey K. Aguirre ◽  
...  

The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is the de facto standard for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are limited on Flywheel. To address these challenges, we developed “FlywheelTools,” a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.
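Heuristic-driven curation of the kind fw-heudiconv performs can be sketched as a mapping from scanner series descriptions to BIDS filename templates. The heuristic table and `curate` function below are hypothetical illustrations of the idea, not the fw-heudiconv API:

```python
from typing import Optional

# Hypothetical heuristic table: map scanner series descriptions to
# BIDS-compliant filename templates (illustrative names only).
HEURISTIC = {
    "MPRAGE":     "sub-{sub}/anat/sub-{sub}_T1w",
    "REST_BOLD":  "sub-{sub}/func/sub-{sub}_task-rest_bold",
    "NBACK_BOLD": "sub-{sub}/func/sub-{sub}_task-nback_bold",
}

def curate(series_description: str, subject: str) -> Optional[str]:
    """Return the BIDS path stem for a series, or None if unrecognized."""
    template = HEURISTIC.get(series_description)
    return template.format(sub=subject) if template else None

print(curate("MPRAGE", "01"))     # matched: filed under anat/
print(curate("LOCALIZER", "01"))  # unmatched: flagged for manual review
```

Unrecognized series returning `None` is the step an auditing tool like flaudit would surface, so that nothing silently drops out of the curated project.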


2020 ◽  
Author(s):  
Christopher R Madan

We are now in a time of readily available brain imaging data. Not only are researchers sharing data more than ever before, but large-scale data collection initiatives are also underway with the vision that many future researchers will use the data for secondary analyses. Here I provide an overview of available datasets and some example use cases, including examining individual differences, obtaining more robust findings, reproducibility (both through public input data and through availability as a replication sample), and methods development. I further discuss a variety of considerations associated with using existing data and the opportunities associated with large datasets. Suggestions for further readings on general neuroimaging and topic-specific discussions are also provided.


2021 ◽  
Vol 54 (4) ◽  
pp. 1-36
Author(s):  
Yunbo Tang ◽  
Dan Chen ◽  
Xiaoli Li

The past century has witnessed the grand success of brain imaging technologies, such as electroencephalography and magnetic resonance imaging, in probing cognitive states and pathological brain dynamics for neuroscience research and neurology practice. The human brain is "the most complex object in the universe," and brain imaging data (BID) routinely have multiple/many attributes and are highly non-stationary, since BID record the evolving processes of the brain(s) under examination from various views. Driven by increasingly high demands for precision, efficiency, and reliability in neuroscience and engineering tasks, dimensionality reduction has become a priority issue in BID analysis, needed to handle the notoriously high dimensionality and large scale of big BID sets as well as the enormously complicated interdependencies among data elements; this has become particularly urgent and challenging in the big data era. Dimensionality reduction theories and methods manifest unrivaled potential in revealing key insights into BID by offering low-dimensional representations/features that may preserve critical characterizations of massive neuronal activities and brain functional and/or malfunctional states of interest. This study surveys the most salient work along this direction, organized by a three-dimensional taxonomy with respect to (1) the scale of BID, a design consideration important for potential applications; (2) the order of BID, where a higher order denotes more BID attributes manipulable by the method; and (3) linearity, where the method's degree of linearity largely determines the "fidelity" of BID exploration. This study defines criteria for qualitative evaluation of these works in terms of effectiveness, interpretability, efficiency, and scalability. The classifications and evaluations based on the taxonomy provide comprehensive guides to (1) how existing research and development efforts are distributed and (2) their performance, features, and potential in influential applications, especially those involving big data. Finally, this study crystallizes the open technical issues and proposes research challenges that must be solved to enable further research in this area of great potential.
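As a toy illustration of the linear end of this taxonomy, the sketch below applies PCA (via SVD) to a simulated multi-channel recording. It assumes NumPy; the 64-channel mixture of a single 10 Hz "source" plus noise is invented for illustration and is not drawn from the surveyed work:

```python
import numpy as np

# Simulate a 64-channel recording driven by one shared latent source.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
source = np.sin(2 * np.pi * 10 * t)              # 10 Hz latent source
mixing = rng.normal(size=(64, 1))                # per-channel mixing weights
X = mixing @ source[None, :] + 0.1 * rng.normal(size=(64, 500))

# PCA via SVD of the channel-centered data.
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                  # variance explained per PC

k = 1                                            # keep the top component
X_reduced = U[:, :k].T @ Xc                      # (k, time) representation
print(f"variance explained by 1 component: {explained[0]:.2f}")
```

A single component captures most of the variance here because the data are genuinely low-rank; real BID rarely are, which is exactly why the higher-order and nonlinear branches of the taxonomy matter.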


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates, but efficient and appropriate selection of α can be challenging and becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and correlations across predictors, it is also not straightforwardly interpretable. Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets. Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations, and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for analysis of large complex datasets.
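The γ reparameterization can be sketched as follows: given a target ratio γ, search for the α whose ridge solution has that coefficient-norm ratio relative to the OLS solution. This simplified bisection (assuming NumPy, with invented simulated data) illustrates the idea only; the fracridge package uses a faster interpolation-based algorithm:

```python
import numpy as np

def fractional_ridge(X, y, gamma, lo=0.0, hi=1e8, iters=200):
    """Find ridge coefficients whose L2-norm is gamma times the OLS norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uy = U.T @ y

    def coefs(alpha):
        # Closed-form ridge solution via SVD; alpha=0 gives OLS.
        return Vt.T @ (s / (s**2 + alpha) * uy)

    norm_ols = np.linalg.norm(coefs(0.0))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # ||beta(alpha)|| decreases monotonically as alpha grows.
        if np.linalg.norm(coefs(mid)) / norm_ols > gamma:
            lo = mid
        else:
            hi = mid
    return coefs(0.5 * (lo + hi))

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(size=100)

beta = fractional_ridge(X, y, gamma=0.5)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(beta) / np.linalg.norm(beta_ols))  # ≈ 0.5
```

Because γ is a dimensionless ratio in [0, 1], the same grid of γ values is meaningful across models and datasets, which is the interpretability benefit the abstract emphasizes.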


Circulation ◽  
2018 ◽  
Vol 138 (Suppl_1) ◽  
Author(s):  
Helena Sviglin ◽  
Gauri Dandi ◽  
Eileen Navarro Almario ◽  
Tejas Patel ◽  
Colin O Wu ◽  
...  

Introduction: An objective of the Meta-AnalyTical Interagency Group (MATIG) is to conduct patient-level meta-analyses of cardiovascular outcomes using data from publicly available repositories. We describe challenges with data re-use from a seminal trial, provide a systematic approach to identify and curate data elements for hypothesis generation, and establish stackable trials to support these analyses. Methods: We used data from the ACCORD trial to assess risk factors, and their gender-specific differences, for the event of hospitalization or death due to heart failure (hdHF) in patients with type 2 diabetes*. We identified the data elements needed to answer the research questions, reviewed the trial protocol to verify definitions, extracted patient-level data, and performed quality assessment and statistical analysis. The results showed a gender difference in the effect of intensive vs. standard glucose-lowering therapy on hdHF. To validate the findings, we sought additional trials in BioLINCC to develop a compendium for meta-analysis, and repeated these steps for each trial. Results: Challenges for reusing the ACCORD trial included access to complete patient-level data and metadata. The compendium, developed to evaluate the stackability** of data across trials, identified differences in trial designs, patient populations, study interventions, and data elements that may impact the feasibility and interpretation of meta-analysis. An example of compendium components is shown in Table 1. Conclusion: High-quality metadata facilitate re-use of trial repository data. This compendium standardizes common data elements for gender-, race-, and age-group-specific outcome assessment in major clinical trials. It provides the framework to assess the fitness of trials for patient-level meta-analyses. Efforts are underway by MATIG to expand the compendium to include risk factors and major cardiovascular outcomes across multiple large trials for meta-analysis.


Author(s):  
Laura Dipietro ◽  
Seth Elkin-Frankston ◽  
Ciro Ramos-Estebanez ◽  
Timothy Wagner

The history of neuroscience has tracked with the evolution of science and technology. Today, neuroscience's trajectory is heavily dependent on computational systems and the availability of high-performance computing (HPC), which are becoming indispensable for building simulations of the brain, coping with high computational demands of analysis of brain imaging data sets, and developing treatments for neurological diseases. This chapter will briefly review the current and potential future use of supercomputers in neuroscience.


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Johanna Wagner ◽  
Ramon Martinez-Cancino ◽  
Arnaud Delorme ◽  
Scott Makeig ◽  
Teodoro Solis-Escalante ◽  
...  

Abstract In this report we present a mobile brain/body imaging (MoBI) dataset that allows study of source-resolved cortical dynamics supporting coordinated gait movements in a rhythmic auditory cueing paradigm. Use of an auditory pacing stimulus stream has been recommended to identify deficits and treat gait impairments in neurologic populations. Here, the rhythmic cueing paradigm required healthy young participants to walk on a treadmill (constant speed) while attempting to maintain step synchrony with an auditory pacing stream and to adapt their step length and rate to unanticipated shifts in tempo of the pacing stimuli (e.g., sudden shifts to a faster or slower tempo). High-density electroencephalography (EEG, 108 channels), surface electromyography (EMG, bilateral tibialis anterior), pressure sensors on the heel (to register timing of heel strikes), and goniometers (knee, hip, and ankle joint angles) were concurrently recorded in 20 participants. The data is provided in the Brain Imaging Data Structure (BIDS) format to promote data sharing and reuse, and allow the inclusion of the data into fully automated data analysis workflows.

