Scan Once, Analyse Many: Using Large Open-Access Neuroimaging Datasets to Understand the Brain

2021 ◽  
Author(s):  
Christopher R. Madan

Abstract
We are now in a time of readily available brain imaging data. Not only are researchers sharing data more than ever before, but large-scale data collection initiatives are also underway with the vision that many future researchers will use the data for secondary analyses. Here I provide an overview of available datasets and some example use cases. Example use cases include examining individual differences, obtaining more robust findings, reproducibility (both in providing public input data and in serving as a replication sample), and methods development. I further discuss a variety of considerations associated with using existing data and the opportunities associated with large datasets. Suggestions for further readings on general neuroimaging and topic-specific discussions are also provided.
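
As one illustrative route into this kind of secondary analysis (the paper surveys datasets and use cases rather than specific tooling), the sketch below pulls a small slice of an openly shared structural dataset with nilearn's fetchers; the dataset choice and the variables printed are assumptions for illustration, not examples drawn from the paper.

```python
# Minimal sketch of a secondary analysis on openly shared data, using
# nilearn's dataset fetchers (one of several possible access routes).
from nilearn import datasets, plotting

# Download structural maps for a handful of subjects from the open-access
# OASIS cohort (cached locally after the first call).
oasis = datasets.fetch_oasis_vbm(n_subjects=5)

# Each image is paired with phenotypic information, which is what makes
# individual-differences analyses on existing data possible.
print(oasis.ext_vars[:5])  # age, sex, clinical scores, ...
plotting.plot_anat(oasis.gray_matter_maps[0], title="Subject 1 grey matter")
plotting.show()
```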


2021 ◽  
Vol 15 ◽  
Author(s):  
Tinashe M. Tapera ◽  
Matthew Cieslak ◽  
Max Bertolero ◽  
Azeez Adebimpe ◽  
Geoffrey K. Aguirre ◽  
...  

The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is the de facto standard for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are not designed for use on cloud-based systems such as Flywheel. To address these challenges, we developed “FlywheelTools,” a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.
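
fw-heudiconv follows the heuristic-file convention popularized by heudiconv, in which a small Python module maps acquired series to BIDS names. The sketch below is modeled on that convention; the protocol-name strings it matches are assumptions for illustration and would need to reflect the sequence names used at a given site.

```python
# Heuristic sketch in the heudiconv style that fw-heudiconv builds on.
# Protocol names matched below are placeholders, not site-specific values.

def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Wrap a BIDS path template in the tuple heudiconv-style tools expect."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Assign each acquired series to a BIDS-named output."""
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key("sub-{subject}/func/sub-{subject}_task-rest_bold")

    info = {t1w: [], rest: []}
    for s in seqinfo:
        desc = s.protocol_name.lower()
        if "mprage" in desc:                      # assumed T1w protocol name
            info[t1w].append(s.series_id)
        elif "rest" in desc and "bold" in desc:   # assumed resting-state name
            info[rest].append(s.series_id)
    return info
```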


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Abstract
Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates, but efficient and appropriate selection of α can be challenging, and it becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and the correlations across predictors, it is also not straightforwardly interpretable.
Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets.
Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for the analysis of large, complex datasets.
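
The reference implementation is available at the repository linked above. The sketch below is a conceptual NumPy illustration of the reparameterization itself, not the fracridge package's API: ridge solutions are computed over a grid of α via one SVD, the achieved norm fraction γ(α) is measured, and the α matching a requested γ is found by interpolation.

```python
# Conceptual sketch of fractional ridge regression: pick the penalty alpha so
# that the fitted coefficients have a requested fraction gamma of the OLS norm.
import numpy as np

def ridge_coefs_via_svd(X, y, alphas):
    """Ridge solutions for many alphas at once, from a single SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    return np.stack([Vt.T @ (s / (s**2 + a) * uty) for a in alphas])

def fractional_ridge(X, y, gamma, alphas=np.logspace(-6, 6, 200)):
    """Return coefficients whose L2 norm is ~gamma times the OLS norm."""
    coefs = ridge_coefs_via_svd(X, y, alphas)
    ols_norm = np.linalg.norm(ridge_coefs_via_svd(X, y, [0.0])[0])
    fracs = np.linalg.norm(coefs, axis=1) / ols_norm  # gamma achieved per alpha
    # fracs decreases as alpha grows; interpolate to hit the requested gamma.
    alpha_star = np.interp(gamma, fracs[::-1], alphas[::-1])
    return ridge_coefs_via_svd(X, y, [alpha_star])[0], alpha_star

# Example: gamma = 0.3 requests coefficients at 30% of the unregularized norm.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 10)), rng.normal(size=100)
beta, alpha = fractional_ridge(X, y, gamma=0.3)
```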


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1512 ◽  
Author(s):  
Jing Ming ◽  
Eric Verner ◽  
Anand Sarwate ◽  
Ross Kelly ◽  
Cory Reed ◽  
...  

In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer times, extensive manual effort, and privacy issues for sensitive data. To remove these barriers and enable easier data sharing and analysis, we introduced a new, decentralized, privacy-enabled infrastructure model for brain imaging data called COINSTAC in 2016. We have continued development of COINSTAC since this model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including the implementation of additional decentralized algorithms, user interface enhancements, decentralized regression statistic calculation, and complete pipeline specifications.
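
As a generic illustration of how a regression can be estimated without pooling subject-level data (the textbook sufficient-statistics approach, not COINSTAC's actual message formats or privacy mechanisms), each site can share only its local aggregates XᵀX and Xᵀy, which a coordinator sums and solves.

```python
# Sketch of decentralized linear regression over horizontally partitioned data:
# sites share only aggregate statistics, never subject-level rows.
import numpy as np

def local_statistics(X_site, y_site):
    """Computed at each site; only these aggregates leave the site."""
    return X_site.T @ X_site, X_site.T @ y_site, len(y_site)

def aggregate_and_solve(site_stats):
    """Run by the coordinator on the pooled sufficient statistics."""
    xtx = sum(s[0] for s in site_stats)
    xty = sum(s[1] for s in site_stats)
    n = sum(s[2] for s in site_stats)
    return np.linalg.solve(xtx, xty), n

# Example with three simulated sites.
rng = np.random.default_rng(1)
sites = [(rng.normal(size=(50, 4)), rng.normal(size=50)) for _ in range(3)]
beta, n_total = aggregate_and_solve([local_statistics(X, y) for X, y in sites])
print(f"Pooled coefficients from {n_total} subjects:", beta)
```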


2021 ◽  
Vol 12 ◽  
Author(s):  
Rayus Kuplicki ◽  
James Touthang ◽  
Obada Al Zoubi ◽  
Ahmad Mayeli ◽  
Masaya Misaki ◽  
...  

Neuroscience studies require considerable bioinformatic support and expertise. Numerous high-dimensional and multimodal datasets must be preprocessed and integrated to create robust and reproducible analysis pipelines. We describe common data elements and a scalable data management infrastructure that allows multiple analytics workflows to facilitate the preprocessing, analysis, and sharing of large-scale, multi-level data. The process uses the Brain Imaging Data Structure (BIDS) format and supports MRI, fMRI, EEG, clinical, and laboratory data. The infrastructure also supports other data sources, such as Fitbit, and gives developers the flexibility to customize the integration of new data types. Exemplar results from more than 200 participants and 11 different pipelines demonstrate the utility of the infrastructure.
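
One reason a BIDS-conformant layout makes this kind of multi-pipeline infrastructure tractable is that inputs can be discovered with standardized queries rather than site-specific file globbing. The sketch below uses pybids, a community tool that is not part of the infrastructure described above; the dataset path is a placeholder.

```python
# Querying a BIDS-formatted dataset with pybids (illustrative only).
from bids import BIDSLayout

layout = BIDSLayout("/data/my_bids_study")  # hypothetical local path

# Because filenames and metadata follow the BIDS specification, any pipeline
# can discover its inputs with the same query.
bold_files = layout.get(suffix="bold", extension=".nii.gz",
                        return_type="filename")
subjects = layout.get_subjects()
print(f"{len(subjects)} subjects, {len(bold_files)} BOLD runs")
```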


Author(s):  
Carrie Lou Garberoglio

This chapter discusses how secondary analyses conducted with large-scale federal data sets offer a way of capturing national samples of the diverse population of deaf students, as well as important features that need to be considered when generalizing findings to practice. The author’s work with federal large-scale data sets has largely focused on an exploration of individual and systemic factors that influence postsecondary outcomes for deaf individuals. Large-scale data sets offer unique opportunities to efficiently test hypotheses and empirically address long-standing assumptions in the field through the use of large sample sizes that bring researchers closer to true representations of the heterogeneity in the Deaf community. Specific examples are shared that highlight some successes and challenges in this approach, and how researchers can best utilize large-scale data sets to conduct secondary analyses in their own work with deaf populations.


Neurology ◽  
2021 ◽  
DOI: 10.1212/WNL.0000000000012884
Author(s):  
Hugo Vrenken ◽  
Mark Jenkinson ◽  
Dzung Pham ◽  
Charles R.G. Guttmann ◽  
Deborah Pareto ◽  
...  

Multiple sclerosis (MS) patients have heterogeneous clinical presentations, symptoms, and progression over time, making MS difficult to assess and comprehend in vivo. The combination of large-scale data sharing and artificial intelligence creates new opportunities for monitoring and understanding MS using magnetic resonance imaging (MRI). First, the development of validated MS-specific image analysis methods can be boosted by verified reference, test, and benchmark imaging data. Using detailed expert annotations, artificial intelligence algorithms can be trained on such MS-specific data. Second, understanding of disease processes could be greatly advanced through shared data from large MS cohorts with clinical, demographic, and treatment information. Relevant patterns in such data that may be imperceptible to a human observer could be detected through artificial intelligence techniques. This applies from image analysis (lesions, atrophy, or functional network changes) to large multi-domain datasets (imaging, cognition, clinical disability, genetics, etc.). After reviewing data sharing and artificial intelligence, this paper highlights three areas that offer strong opportunities for making advances in the next few years: crowdsourcing, personal data protection, and organized analysis challenges. Difficulties, as well as specific recommendations to overcome them, are discussed, in order to best leverage data sharing and artificial intelligence to improve image analysis, imaging, and the understanding of MS.


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S23-S24
Author(s):  
Kendra L Seaman

Abstract
In concert with broader efforts to increase the reliability of social science research, there are several efforts to increase transparency and reproducibility in neuroimaging. The large-scale nature of neuroimaging data and constantly evolving analysis tools can make transparency challenging. I will describe emerging tools used to document, organize, and share behavioral and neuroimaging data. These tools include: (1) the preregistration of neuroimaging data sets, which increases openness and protects researchers from suspicions of p-hacking; (2) the conversion of neuroimaging data into a standardized format (the Brain Imaging Data Structure, BIDS) that enables standardized scripts to process and share neuroimaging data; and (3) the sharing of final neuroimaging results on NeuroVault, which allows the community to perform rapid meta-analyses. Using these tools improves workflows within labs, improves the overall quality of our science, and provides a potential model for other disciplines using large-scale data.
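
On point (3), maps shared on NeuroVault can also be pulled back down for reuse and meta-analysis. The sketch below uses nilearn's NeuroVault fetcher as one convenient client; the filters and fields printed are assumptions for illustration, and the abstract itself does not prescribe this tooling.

```python
# Downloading a few publicly shared statistical maps from NeuroVault for reuse.
from nilearn import datasets, plotting

nv = datasets.fetch_neurovault(max_images=5)  # default quality filters apply

for img, meta in zip(nv.images, nv.images_meta):
    print(meta.get("name"), meta.get("map_type"))

plotting.plot_stat_map(nv.images[0], title="First downloaded NeuroVault map")
plotting.show()
```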


2021 ◽  
Author(s):  
Noah F. Greenwald ◽  
Geneva Miller ◽  
Erick Moen ◽  
Alex Kong ◽  
Adam Kagel ◽  
...  

Abstract
Understanding the spatial organization of tissues is of critical importance for both basic and translational research. While recent advances in tissue imaging are opening an exciting new window into the biology of human tissues, interpreting the data that they create is a significant computational challenge. Cell segmentation, the task of uniquely identifying each cell in an image, remains a substantial barrier for tissue imaging, as existing approaches are inaccurate or require a substantial amount of manual curation to yield useful results. Here, we addressed the problem of cell segmentation in tissue imaging data through large-scale data annotation and deep learning. We constructed TissueNet, an image dataset containing >1 million paired whole-cell and nuclear annotations for tissue images from nine organs and six imaging platforms. We created Mesmer, a deep learning-enabled segmentation algorithm trained on TissueNet that performs nuclear and whole-cell segmentation in tissue imaging data. We demonstrated that Mesmer has better speed and accuracy than previous methods, generalizes to the full diversity of tissue types and imaging platforms in TissueNet, and achieves human-level performance for whole-cell segmentation. Mesmer enabled the automated extraction of key cellular features, such as subcellular localization of protein signal, which was challenging with previous approaches. We further showed that Mesmer could be adapted to harness cell lineage information present in highly multiplexed datasets. We used this enhanced version to quantify cell morphology changes during human gestation. All underlying code and models are released with permissive licenses as a community resource.
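
The abstract states that code and models are released; the sketch below shows how the Mesmer model is typically run through the DeepCell library (attributing the release to that package is an assumption here, since the abstract does not name it, and the call signature and channel layout reflect DeepCell's documented usage rather than anything quoted from the paper).

```python
# Whole-cell segmentation with the pretrained Mesmer model via DeepCell.
import numpy as np
from deepcell.applications import Mesmer

# Multiplexed tissue images are supplied as (batch, rows, cols, 2):
# channel 0 = nuclear marker, channel 1 = membrane/cytoplasm marker.
image = np.random.rand(1, 256, 256, 2).astype("float32")  # placeholder data

app = Mesmer()                                  # downloads pretrained weights
labels = app.predict(image, image_mpp=0.5,      # microns per pixel of the input
                     compartment="whole-cell")  # or "nuclear"

print(labels.shape)  # (1, 256, 256, 1): integer mask, one label per cell
```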

