SPARC Data Structure: Rationale and Design of a FAIR Standard for Biomedical Research Data

2021 ◽  
Author(s):  
Anita Bandrowski ◽  
Jeffrey S. Grethe ◽  
Anna Pilko ◽  
Tom Gillespie ◽  
Gabi Pine ◽  
...  

Abstract The NIH Common Fund’s Stimulating Peripheral Activity to Relieve Conditions (SPARC) initiative is a large-scale program that seeks to accelerate the development of therapeutic devices that modulate electrical activity in nerves to improve organ function. Integral to the SPARC program are the rich anatomical and functional datasets produced by investigators across the SPARC consortium that provide key details about organ-specific circuitry, including structural and functional connectivity, mapping of cell types, and molecular profiling. These datasets are provided to the research community through an open data platform, the SPARC Portal. To ensure SPARC datasets are Findable, Accessible, Interoperable and Reusable (FAIR), they are all submitted to the SPARC Portal following a standard scheme established by the SPARC Curation Team, called the SPARC Data Structure (SDS). Inspired by the Brain Imaging Data Structure (BIDS), the SDS has been designed to capture the large variety of data generated by SPARC investigators, who come from all fields of biomedical research. Here we present the rationale and design of the SDS, including a description of the SPARC curation process and of the automated tools for complying with the SDS, namely the SDS validator and Software to Organize Data Automatically (SODA) for SPARC. The objective is to provide detailed guidelines for anyone desiring to comply with the SDS. Since the SDS is suitable for any type of biomedical research data, it can be adopted by any group desiring to follow the FAIR data principles for managing their data, even outside the SPARC consortium. Finally, this manuscript provides a foundational framework that can be used by any organization desiring either to adapt the SDS to the specific needs of their data or to design their own FAIR data-sharing scheme from scratch.
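As a rough illustration of what SDS-style compliance checking involves, the sketch below tests a candidate dataset folder for a handful of top-level metadata files and folders. The specific names are assumptions modeled loosely on the BIDS/SDS family of layouts; the authoritative list is defined by the SDS specification and enforced by the SDS validator.

```python
from pathlib import Path

# Hypothetical top-level items loosely modeled on a BIDS-style layout;
# the authoritative requirements live in the SDS specification itself.
REQUIRED_FILES = ["dataset_description.xlsx", "subjects.xlsx", "submission.xlsx"]
REQUIRED_DIRS = ["primary"]

def check_sds_layout(root):
    """Return the required files/folders missing from a candidate dataset."""
    root = Path(root)
    missing = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    missing += [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    return missing
```

A curation tool built on such a check would report the missing items back to the submitter before the dataset enters the portal.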

2019 ◽  
Author(s):  
Horea-Ioan Ioanas ◽  
Markus Marks ◽  
Clément M. Garin ◽  
Marc Dhenain ◽  
Mehmet Fatih Yanik ◽  
...  

Abstract Large-scale research integration is contingent on seamless access to data in standardized formats. Standards enable researchers to understand external experiment structures, pool results, and apply homogeneous preprocessing and analysis workflows. In particular, they provide these features without the need for numerous potentially confounding compatibility add-ons. In small animal magnetic resonance imaging, an overwhelming proportion of data is acquired via the ParaVision software of the Bruker Corporation. The original data structure is predominantly transparent, but fundamentally incompatible with modern pipelines. Additionally, it sources metadata from free-field operator input, which diverges strongly between laboratories and researchers. In this article we present an open-source workflow which automatically converts and deposits data from the ParaVision structure into the widely supported and openly documented Brain Imaging Data Structure (BIDS). Complementing this workflow, we also present operator guidelines for appropriate ParaVision data input, and a programmatic walk-through detailing how preexisting scans with uninterpretable metadata records can easily be made compliant after acquisition.
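A central step in any such conversion workflow is mapping scanner metadata onto standardized BIDS-style filenames. The helper below is a minimal sketch of that naming step, assuming the public BIDS entity conventions (`sub-`, `ses-`, `acq-` prefixes); it is not code from the workflow itself.

```python
def bids_name(subject, session, modality, acquisition=None):
    """Compose a BIDS-style filename stem from scan metadata.

    Entity names follow the public BIDS conventions; mapping ParaVision
    operator fields onto them is an assumption for illustration.
    """
    parts = [f"sub-{subject}", f"ses-{session}"]
    if acquisition:
        parts.append(f"acq-{acquisition}")
    parts.append(modality)  # the modality suffix always comes last
    return "_".join(parts)
```

For example, `bids_name("01", "1", "T2w")` yields `sub-01_ses-1_T2w`, which then only needs the appropriate file extension.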


Semantic Web ◽  
2021 ◽  
pp. 1-32
Author(s):  
Houcemeddine Turki ◽  
Mohamed Ali Hadj Taieb ◽  
Thomas Shafee ◽  
Tiago Lubiana ◽  
Dariusz Jemielniak ◽  
...  

Information related to the COVID-19 pandemic ranges from biological to bibliographic, from geographical to genetic and beyond. The structure of the raw data is highly complex, so converting it to meaningful insight requires data curation, integration, extraction and visualization, the global crowdsourcing of which provides both additional challenges and opportunities. Wikidata is an interdisciplinary, multilingual, open collaborative knowledge base of more than 90 million entities connected by well over a billion relationships. It acts as a web-scale platform for broader computer-supported cooperative work and linked open data, since it can be written to and queried in multiple ways in near real time by specialists, automated tools and the public. The main query language, SPARQL, is a semantic language used to retrieve and process information from databases saved in Resource Description Framework (RDF) format. Here, we introduce four aspects of Wikidata that enable it to serve as a knowledge base for general information on the COVID-19 pandemic: its flexible data model, its multilingual features, its alignment to multiple external databases, and its multidisciplinary organization. The rich knowledge graph created for COVID-19 in Wikidata can be visualized, explored, and analyzed for purposes like decision support as well as educational and scholarly research.
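As a concrete illustration of how such a knowledge graph is queried, the sketch below assembles a SPARQL query for the public Wikidata endpoint (https://query.wikidata.org/sparql). The identifiers used (Q84263196 for COVID-19, P780 for symptoms) are assumptions and should be verified against Wikidata before use.

```python
def covid_symptom_query(limit=10):
    """Build a SPARQL query listing symptoms linked to the COVID-19 item.

    Q84263196 and P780 are assumed identifiers; check them on Wikidata.
    """
    return f"""
    SELECT ?symptom ?symptomLabel WHERE {{
      wd:Q84263196 wdt:P780 ?symptom .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """.strip()
```

The resulting string can be sent as the `query` parameter in an HTTP GET request to the endpoint above, which returns JSON or XML results in near real time.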


2018 ◽  
Author(s):  
Christopher Holdgraf ◽  
Stefan Appelhoff ◽  
Stephan Bickel ◽  
Kristofer Bouchard ◽  
Sasha D'Ambrosio ◽  
...  

Intracranial electroencephalography (iEEG) data offer a unique combination of high spatial and temporal resolution measures of the living human brain. However, data collection is limited to highly specialized clinical environments. To improve internal (re)use and external sharing of these unique data, we present a structure for storing and sharing iEEG data: BIDS-iEEG, an extension of the Brain Imaging Data Structure (BIDS) specification, along with freely available examples and a bids-starter-kit. BIDS is a framework for organizing and documenting data and metadata with the aim to make datasets more transparent and reusable and to improve reproducibility of research. It is a community-driven specification with an inclusive decision-making process. As an extension of the BIDS specification, BIDS-iEEG facilitates integration with other modalities such as fMRI, MEG, and EEG. As the BIDS-iEEG extension has received input from many iEEG researchers, it provides a common ground for data transfer within labs, between labs, and in open-data repositories. It will facilitate reproducible analyses across datasets, experiments, and recording sites, allowing scientists to answer more complex questions about the human brain. Finally, the cross-modal nature of BIDS will enable efficient consolidation of data from multiple sites for addressing questions about generalized brain function.
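BIDS-style layouts encode metadata as key-value pairs directly in the filename, which is what makes datasets machine-readable across modalities. A minimal parsing sketch, assuming the public BIDS naming conventions:

```python
def parse_bids_entities(filename):
    """Split a BIDS-style filename into its key-value entities and suffix.

    e.g. 'sub-01_ses-post_task-rest_ieeg.edf' ->
    ({'sub': '01', 'ses': 'post', 'task': 'rest'}, 'ieeg')
    """
    stem = filename.rsplit(".", 1)[0]       # drop the extension
    *pairs, suffix = stem.split("_")        # last token is the suffix
    entities = dict(p.split("-", 1) for p in pairs)
    return entities, suffix
```

Tools across the BIDS ecosystem rely on exactly this kind of parsing to locate matching recordings, events, and sidecar metadata files.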


2021 ◽  
Author(s):  
Seth Winfree ◽  
Andrew T McNutt ◽  
Suraj Khochare ◽  
Tyler J Borgard ◽  
Daria Barwinska ◽  
...  

The human kidney is a complex organ with various cell types that are intricately organized to perform key physiological functions and maintain homeostasis. New imaging modalities such as mesoscale and highly multiplexed fluorescence microscopy are increasingly applied to human kidney tissue to create single-cell-resolution datasets that are both spatially large and multi-dimensional. These single-cell-resolution, high-content imaging datasets have great potential to uncover the complex spatial organization and cellular make-up of the human kidney. Tissue cytometry is a novel approach for the quantitative analysis of imaging data, but the scale and complexity of such datasets pose unique challenges for processing and analysis. We have developed the Volumetric Tissue Exploration and Analysis (VTEA) software, a unique tool that integrates image processing, segmentation, and interactive cytometry analysis into a single framework on desktop computers. Supported by an extensible and open-source framework, VTEA's integrated pipeline now includes enhanced analytical tools, such as machine learning, data visualization, and neighborhood analyses for hyperdimensional large-scale imaging datasets. These capabilities enable the analysis of mesoscale two- and three-dimensional multiplexed human kidney imaging datasets (such as CODEX and 3D confocal multiplexed fluorescence imaging). We demonstrate the utility of this approach in identifying cell subtypes in the kidney based on labels, spatial association, and their microenvironment or neighborhood membership. VTEA provides an integrated and intuitive approach to deciphering the cellular and spatial complexity of the human kidney and complements other transcriptomic and epigenetic efforts to define the landscape of kidney cell types.
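The neighborhood analysis mentioned above can be illustrated generically: for each cell, summarize the label composition of its k nearest neighbors. This numpy sketch is an illustration of the idea, not VTEA's own implementation.

```python
import numpy as np

def neighborhood_composition(xy, labels, k=5):
    """For each cell, the fraction of its k nearest neighbors per label."""
    # Pairwise Euclidean distances between cell centroids.
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude the cell itself
    nn = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbors
    classes = np.unique(labels)
    comp = np.stack([(labels[nn] == c).mean(axis=1) for c in classes], axis=1)
    return classes, comp
```

Clustering rows of `comp` then groups cells by microenvironment rather than by their own label, which is how neighborhood membership can define subtypes.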


2021 ◽  
Author(s):  
Miguel Dasilva ◽  
Christian Brandt ◽  
Marc Alwin Gieselmann ◽  
Claudia Distler ◽  
Alexander Thiele

Abstract Top-down attention, controlled by frontal cortical areas, is a key component of cognitive operations. How different neurotransmitters and neuromodulators flexibly change the cellular and network interactions with attentional demands remains poorly understood. While acetylcholine and dopamine are critically involved, glutamatergic receptors have also been proposed to play important roles. To understand their contribution to attentional signals, we investigated how ionotropic glutamatergic receptors in the frontal eye field (FEF) of male macaques contribute to neuronal excitability and attentional control signals in different cell types. Broad-spiking and narrow-spiking cells both required N-methyl-D-aspartic acid and α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid receptor activation for normal excitability, thereby affecting ongoing or stimulus-driven activity. However, attentional control signals were not dependent on either glutamatergic receptor type in broad- or narrow-spiking cells. A further subdivision of cell types into different functional types using cluster analysis based on spike waveforms and spiking characteristics did not change the conclusions. This can be explained by a model where local blockade of specific ionotropic receptors is compensated by cell embedding in large-scale networks. It sets the glutamatergic system apart from the cholinergic system in FEF and demonstrates that a reduction in excitability is not sufficient to induce a reduction in attentional control signals.
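The broad- versus narrow-spiking distinction is conventionally drawn from the spike waveform, e.g. its trough-to-peak duration. The sketch below illustrates that idea with an assumed 0.4 ms cutoff; the study itself went further, using cluster analysis over several waveform and spiking features.

```python
import numpy as np

def classify_spike_width(waveforms, dt_ms, threshold_ms=0.4):
    """Label units 'narrow' or 'broad' by trough-to-peak duration.

    The 0.4 ms boundary is an illustrative, commonly used cutoff,
    not the study's actual criterion.
    """
    out = []
    for w in waveforms:
        trough = int(np.argmin(w))                    # spike trough
        peak = trough + int(np.argmax(w[trough:]))    # following peak
        width_ms = (peak - trough) * dt_ms
        out.append("narrow" if width_ms < threshold_ms else "broad")
    return out
```

Narrow-spiking units are typically taken as putative inhibitory interneurons and broad-spiking units as putative pyramidal cells, which is why the distinction matters for attributing attentional effects to cell types.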


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Abstract Background Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates. However, efficient and appropriate selection of α can be challenging. This becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and correlations across predictors, it is also not straightforwardly interpretable. Results The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets. Conclusion Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for the analysis of large complex datasets.
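The reparameterization can be sketched directly: solve the ridge problem via the SVD, then search over α until the coefficient norm is the requested fraction γ of the unregularized (OLS) norm. This bisection-based version is a minimal illustration; the fracridge package implements a faster, vectorized algorithm.

```python
import numpy as np

def frac_ridge(X, y, gamma, tol=1e-6):
    """Ridge coefficients whose L2-norm is a fraction `gamma` of the OLS norm.

    Minimal sketch: the SVD gives the ridge solution in closed form for
    any alpha, and the norm ratio decreases monotonically in alpha, so
    bisection finds the alpha matching the target gamma in (0, 1).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    coef = lambda a: Vt.T @ (s / (s**2 + a) * uty)   # ridge solution at alpha=a
    norm0 = np.linalg.norm(coef(0.0))                # OLS norm (gamma = 1)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(coef(hi)) / norm0 > gamma:  # grow hi to bracket target
        hi *= 10
    while hi - lo > tol * (1 + hi):
        mid = (lo + hi) / 2
        if np.linalg.norm(coef(mid)) / norm0 > gamma:
            lo = mid
        else:
            hi = mid
    return coef((lo + hi) / 2)
```

Because γ lives on a fixed, interpretable scale in (0, 1], the same grid of γ values is meaningful across datasets and models, unlike a grid of raw α values.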


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Lin Que ◽  
David Lukacsovich ◽  
Wenshu Luo ◽  
Csaba Földy

Abstract The diversity reflected by >100 different neural cell types fundamentally contributes to brain function, and a central idea is that neuronal identity can be inferred from genetic information. Recent large-scale transcriptomic assays seem to confirm this hypothesis, but a lack of morphological information has limited the identification of several known cell types. In this study, we used single-cell RNA-seq in morphologically identified parvalbumin interneurons (PV-INs) and studied their transcriptomic states in the morphological, physiological, and developmental domains. Overall, we find high transcriptomic similarity among PV-INs, with few genes showing divergent expression between morphologically different types. Furthermore, PV-INs show a uniform synaptic cell adhesion molecule (CAM) profile, suggesting that CAM expression in mature PV cells does not reflect wiring specificity after development. Together, our results suggest that while PV-INs differ in anatomy and in vivo activity, their continuous transcriptomic and homogeneous biophysical landscapes are not predictive of these distinct identities.


Epidemiologia ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 315-324
Author(s):  
Juan M. Banda ◽  
Ramya Tekumalla ◽  
Guanyu Wang ◽  
Jingyuan Yu ◽  
Tuo Liu ◽  
...  

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 (at the time of writing). This resource provides a freely available additional data source for researchers worldwide to pursue a wide and diverse range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.


2021 ◽  
Author(s):  
Áine Byrne ◽  
James Ross ◽  
Rachel Nicks ◽  
Stephen Coombes

Abstract Neural mass models have been used since the 1970s to model the coarse-grained activity of large populations of neurons. They have proven especially fruitful for understanding brain rhythms. However, although motivated by neurobiological considerations, they are phenomenological in nature, and cannot hope to recreate some of the rich repertoire of responses seen in real neuronal tissue. Here we consider a simple spiking neuron network model that has recently been shown to admit an exact mean-field description for both synaptic and gap-junction interactions. The mean-field model takes a similar form to a standard neural mass model, with an additional dynamical equation to describe the evolution of within-population synchrony. As well as reviewing the origins of this next-generation mass model, we discuss its extension to describe an idealised spatially extended planar cortex. To emphasise the usefulness of this model for EEG/MEG modelling, we show how it can be used to uncover the role of local gap-junction coupling in shaping large-scale synaptic waves.
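For orientation, the exact mean field underlying this class of next-generation models (for quadratic integrate-and-fire neurons with Lorentzian-distributed background drive, without gap junctions) can be written as:

```latex
\begin{aligned}
\tau \dot{r} &= \frac{\Delta}{\pi \tau} + 2 r v, \\
\tau \dot{v} &= v^{2} + \bar{\eta} + I(t) - (\pi \tau r)^{2},
\end{aligned}
```

where r is the population firing rate, v the mean membrane potential, Δ and η̄ the half-width and centre of the background-drive distribution, τ the membrane time constant, and I(t) the input. The within-population synchrony is then read off from the complex Kuramoto order parameter, which is related to πτr + iv by a conformal map; gap-junction terms extend these equations further.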


2021 ◽  
Vol 13 (5) ◽  
pp. 905
Author(s):  
Chuyi Wu ◽  
Feng Zhang ◽  
Junshi Xia ◽  
Yichen Xu ◽  
Guoqing Li ◽  
...  

Building damage status is vital for planning rescue and reconstruction after a disaster, yet it is hard to detect and to grade. Most existing studies focus on binary classification, and the attention of the model is easily distracted. In this study, we propose a Siamese neural network that can localize and classify damaged buildings in one pass. The main parts of this network are a variety of attention U-Nets using different backbones. The attention mechanism enables the network to focus on effective features and channels, reducing the impact of useless features. We train the networks using the xBD dataset, a large-scale dataset for the advancement of building damage assessment, and compare their balanced F (F1) scores. The scores show that SEResNeXt with an attention mechanism performs best, with an F1 score of 0.787. To improve accuracy, we fused the results and obtained the best overall F1 score of 0.792. To verify the transferability and robustness of the model, we selected data from the Maxar Open Data Program for two recent disasters and investigated the performance. By visual comparison, the results show that our model is robust and transferable.
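The Siamese idea, stripped of the U-Net and attention machinery, is that the pre- and post-disaster images pass through the same encoder (shared weights), and damage is read from a comparison of the two feature vectors. A toy numpy sketch, with a hypothetical linear encoder standing in for the real SEResNeXt backbone:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shared encoder: a fixed linear projection applied to BOTH images.
# Weight sharing across the two branches is what makes the network Siamese.
W = rng.standard_normal((8, 16))

def encode(img):
    """Flatten a 4x4 patch and project it to an 8-d feature vector."""
    return np.tanh(W @ img.reshape(-1))

def damage_features(pre, post):
    """Encode both epochs with the same weights, then compare; the
    difference vector would feed a damage-level classifier head."""
    return np.abs(encode(pre) - encode(post))
```

Identical pre/post patches yield a zero difference vector, while changed patches yield non-zero features, so the classifier head only needs to map the magnitude and pattern of change onto damage levels.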

