Decentralized Distribution-sampled Classification Models with Application to Brain Imaging

2019 ◽  
Author(s):  
Noah Lewis ◽  
Harshvardhan Gazula ◽  
Sergey M. Plis ◽  
Vince D. Calhoun

Abstract

Background: In this age of big data, large data stores allow researchers to compose robust models that are accurate and informative. In many cases, the data are stored in separate locations, requiring data transfer between local sites, which can cause various practical hurdles, such as privacy concerns or heavy network load. This is especially true for medical imaging data, which can be constrained by the Health Insurance Portability and Accountability Act (HIPAA). Medical imaging datasets can also contain many thousands or millions of features, further increasing network load.

New Method: Our research expands upon current decentralized classification research by implementing a new single-shot method for both neural networks and support vector machines. Our approach is to estimate the statistical distribution of the data at each local site and pass this information to the other local sites, where each site resamples from the individual distributions and trains a model on both locally available data and the resampled data.

Results: We show applications of our approach to handwritten digit classification as well as to multi-subject classification of brain imaging data collected from patients with schizophrenia and healthy controls. Overall, the results showed classification accuracy comparable to the centralized model with lower network load than multi-shot methods.

Comparison with Existing Methods: Many decentralized classifiers are multi-shot, requiring heavy network traffic. Our model attempts to alleviate this load while preserving prediction accuracy.

Conclusions: We show that our proposed approach performs comparably to a centralized approach while minimizing network traffic compared to multi-shot methods.

Highlights:
- A novel yet simple approach to decentralized classification
- Reduces total network load compared to current multi-shot algorithms
- Maintains prediction accuracy comparable to the centralized approach
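The single-shot scheme described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's exact estimator: the Gaussian-per-class summary, the two simulated "sites", and all parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def site_summary(X, y):
    """Summarize a site's data as per-class Gaussian parameters (the single-shot share)."""
    summary = {}
    for label in np.unique(y):
        Xc = X[y == label]
        summary[label] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc))
    return summary

def resample(summary):
    """Draw synthetic samples from another site's shared distribution parameters."""
    Xs, ys = [], []
    for label, (mu, cov, n) in summary.items():
        Xs.append(rng.multivariate_normal(mu, cov, size=n))
        ys.append(np.full(n, label))
    return np.vstack(Xs), np.concatenate(ys)

# Two toy "sites": same two classes, slightly shifted clusters
X1 = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y1 = np.array([0] * 50 + [1] * 50)
X2 = np.vstack([rng.normal(0.5, 1, (50, 2)), rng.normal(4.5, 1, (50, 2))])
y2 = np.array([0] * 50 + [1] * 50)

# Site 1 trains on its own data plus samples drawn from site 2's summary;
# site 2's raw data never leave site 2.
Xr, yr = resample(site_summary(X2, y2))
clf = LinearSVC().fit(np.vstack([X1, Xr]), np.concatenate([y1, yr]))
acc = clf.score(X2, y2)
```

Only the distribution parameters cross the network, which is what keeps the method single-shot and light on traffic.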

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1512 ◽  
Author(s):  
Jing Ming ◽  
Eric Verner ◽  
Anand Sarwate ◽  
Ross Kelly ◽  
Cory Reed ◽  
...  

In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer time, extensive manual effort, and privacy issues for sensitive data. To remove these barriers and enable easier data sharing and analysis, we introduced a new, decentralized, privacy-enabled infrastructure model for brain imaging data called COINSTAC in 2016. We have continued development of COINSTAC since this model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including implementation of additional decentralized algorithms, user interface enhancements, decentralized calculation of regression statistics, and complete pipeline specifications.
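The decentralized regression statistics mentioned here can, for ordinary least squares, be computed exactly without moving row-level data, because the normal equations decompose into per-site sums. A minimal sketch (the three simulated sites and the in-process "aggregator" are assumptions for illustration, not COINSTAC's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
beta_true = np.array([2.0, -1.0])

def local_moments(X, y):
    """Each site shares only aggregate moments (X^T X, X^T y), never raw rows."""
    return X.T @ X, X.T @ y

# Three simulated sites drawing from the same underlying linear model
sites = []
for _ in range(3):
    X = rng.normal(size=(40, 2))
    y = X @ beta_true + rng.normal(scale=0.1, size=40)
    sites.append((X, y))

# The aggregator sums the per-site moments and solves the normal equations,
# reproducing the pooled ordinary least squares fit exactly
moments = [local_moments(X, y) for X, y in sites]
XtX = sum(m[0] for m in moments)
Xty = sum(m[1] for m in moments)
beta_hat = np.linalg.solve(XtX, Xty)
```

Because the moments add, the decentralized solution matches what a centralized fit on the pooled data would give.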


2017 ◽  
Author(s):  
Vanessa Gómez-Verdejo ◽  
Emilio Parrado-Hernández ◽  
Jussi Tohka ◽  

Abstract

An important problem that hinders the use of supervised classification algorithms for brain imaging is that the number of variables per single subject far exceeds the number of training subjects available. Deriving multivariate measures of variable importance becomes a challenge in such scenarios. This paper proposes a new measure of variable importance termed sign-consistency bagging (SCB). The SCB captures variable importance by analyzing the sign consistency of the corresponding weights in an ensemble of linear support vector machine (SVM) classifiers. Further, the SCB variable importances are enhanced by means of transductive conformal analysis. This extra step is important when the data can be assumed to be heterogeneous. Finally, the proposal of these SCB variable importance measures is completed with the derivation of a parametric hypothesis test of variable importance. The new importance measures were compared with a t-test-based univariate measure and an SVM-based multivariate variable importance measure using anatomical and functional magnetic resonance imaging data. The obtained results demonstrated that the new SCB-based importance measures were superior to the compared methods in terms of reproducibility and classification accuracy.
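The sign-consistency idea can be illustrated with a small bootstrap ensemble of linear SVMs. The toy data and the simple |mean of signs| statistic below are illustrative stand-ins; the paper's SCB measure additionally includes transductive conformal analysis and a parametric hypothesis test, which are not reproduced here.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)

# Toy data: only feature 0 carries class signal; features 1-4 are pure noise
n = 200
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)

# Bootstrap an ensemble of linear SVMs and record the sign of each weight
signs = []
for _ in range(30):
    idx = rng.integers(0, n, size=n)
    w = LinearSVC(C=1.0).fit(X[idx], y[idx]).coef_[0]
    signs.append(np.sign(w))

# Sign consistency per feature: |mean of signs| stays near 1 for informative
# features and drifts toward 0 for noise features whose weight signs flip
consistency = np.abs(np.mean(signs, axis=0))
```

An informative variable keeps the same weight sign across resamples, which is exactly the stability that SCB turns into an importance score.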


2020 ◽  
pp. 084653712096734
Author(s):  
William Parker ◽  
Jacob L. Jaremko ◽  
Mark Cicero ◽  
Marleine Azar ◽  
Khaled El-Emam ◽  
...  

The application of big data, radiomics, machine learning, and artificial intelligence (AI) algorithms in radiology requires access to large data sets containing personal health information. Because machine learning projects often require collaboration between different sites or data transfer to a third party, precautions are required to safeguard patient privacy. Safety measures are required to prevent inadvertent access to and transfer of identifiable information. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI Ethical and Legal standing committee with the mandate to guide the medical imaging community in terms of best practices in data management, access to health care data, de-identification, and accountability practices. Part 2 of this article will inform CAR members on the practical aspects of medical imaging de-identification, strengths and limitations of de-identification approaches, list of de-identification software and tools available, and perspectives on future directions.


2021 ◽  
Author(s):  
Shaojia Ge ◽  
Erkki Tomppo ◽  
Yrjö Rauste ◽  
Ronald E. McRoberts ◽  
Jaan Praks ◽  
...  

Abstract

In this study, we assess the potential of long time series of Sentinel-1 SAR data to predict forest growing stock volume (GSV) and evaluate the temporal dynamics of the predictions. The boreal coniferous forest study site is located near the Hyytiälä forest station in central Finland and covers an area of 2,500 km² with nearly 17,000 stands. We considered several prediction approaches (linear, support vector and random forests regression) and fine-tuned them to predict growing stock volume in several evaluation scenarios. The analyses used 96 Sentinel-1 images acquired over three years. Different approaches for aggregating SAR images and choosing feature (predictor) variables were evaluated. Our results demonstrate a considerable decrease in the RMSE of growing stock volume predictions as the number of images increases. While prediction accuracy using individual Sentinel-1 images varied from 85 to 91 m³/ha RMSE (relative RMSE 50-53%), the RMSE with combined images decreased to 75.6 m³/ha (relative RMSE 44%). Feature extraction and dimension reduction techniques facilitated achieving near-optimal prediction accuracy using only 8-10 images. When using assemblages of eight consecutive images, the GSV was predicted with the greatest accuracy when initial acquisitions started between September and January.

Highlights:
- A time series of 96 Sentinel-1 images is analysed over a study area with 17,762 forest stands.
- Rigorous evaluation of tools for SAR feature selection and GSV prediction.
- Improved periodic seasonality using assemblages of consecutive Sentinel-1 images.
- Analysis of combining images acquired in "frozen" and "dry summer" conditions.
- Competitive estimates using calculation of prediction errors with stand-area weighting.
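The benefit of aggregating many acquisitions can be demonstrated on simulated data: temporal averaging suppresses per-image noise, so regression error falls as images are combined. The linear backscatter model and noise levels below are invented for illustration and are not calibrated to Sentinel-1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Simulated stands: true GSV plus, per image, a backscatter value that tracks
# GSV weakly under heavy speckle/weather noise (all numbers are invented)
n_stands, n_images = 500, 32
gsv = rng.uniform(50, 300, n_stands)
images = gsv[:, None] * 0.01 + rng.normal(scale=1.0, size=(n_stands, n_images))

def rmse_with(k):
    """Regress GSV on the temporal mean of the first k images; return RMSE."""
    feat = images[:, :k].mean(axis=1, keepdims=True)
    pred = LinearRegression().fit(feat, gsv).predict(feat)
    return float(np.sqrt(np.mean((pred - gsv) ** 2)))

rmse_1, rmse_32 = rmse_with(1), rmse_with(32)  # aggregation shrinks the error
```

Averaging k images divides the noise variance by k, which is the same mechanism behind the RMSE reduction the study reports when combining acquisitions.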


2018 ◽  
Author(s):  
Christopher Holdgraf ◽  
Stefan Appelhoff ◽  
Stephan Bickel ◽  
Kristofer Bouchard ◽  
Sasha D'Ambrosio ◽  
...  

Intracranial electroencephalography (iEEG) data offer a unique combination of high spatial and temporal resolution measures of the living human brain. However, data collection is limited to highly specialized clinical environments. To improve internal (re)use and external sharing of these unique data, we present a structure for storing and sharing iEEG data: BIDS-iEEG, an extension of the Brain Imaging Data Structure (BIDS) specification, along with freely available examples and a bids-starter-kit. BIDS is a framework for organizing and documenting data and metadata with the aim of making datasets more transparent and reusable and of improving the reproducibility of research. It is a community-driven specification with an inclusive decision-making process. As an extension of the BIDS specification, BIDS-iEEG facilitates integration with other modalities such as fMRI, MEG, and EEG. As the BIDS-iEEG extension has received input from many iEEG researchers, it provides a common ground for data transfer within labs, between labs, and in open-data repositories. It will facilitate reproducible analyses across datasets, experiments, and recording sites, allowing scientists to answer more complex questions about the human brain. Finally, the cross-modal nature of BIDS will enable efficient consolidation of data from multiple sites for addressing questions about generalized brain function.
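As a simplified illustration (consult the BIDS-iEEG specification for the authoritative naming rules and required files), a minimal BIDS-iEEG dataset might be laid out as:

```
my_dataset/
├── dataset_description.json
├── participants.tsv
└── sub-01/
    └── ses-01/
        └── ieeg/
            ├── sub-01_ses-01_task-rest_ieeg.edf
            ├── sub-01_ses-01_task-rest_ieeg.json
            ├── sub-01_ses-01_task-rest_channels.tsv
            └── sub-01_ses-01_electrodes.tsv
```

The key-value filename entities (sub-, ses-, task-) are what make BIDS datasets machine-readable across modalities and tools.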


2017 ◽  
Author(s):  
Guiomar Niso ◽  
Krzysztof J. Gorgolewski ◽  
Elizabeth Bock ◽  
Teon L. Brooks ◽  
Guillaume Flandin ◽  
...  

Abstract

We present a significant extension of the Brain Imaging Data Structure (BIDS) to support the specific aspects of magnetoencephalography (MEG) data. MEG provides direct measurement of brain activity with millisecond temporal resolution and unique source imaging capabilities. So far, BIDS has provided a solution for structuring the organization of magnetic resonance imaging (MRI) data, whose nature and acquisition parameters are different. Despite the lack of a standard data format for MEG, MEG-BIDS is a principled solution to store, organize and share the typically large data volumes produced. It builds on BIDS for MRI, and therefore readily yields a multimodal data organization by construction. This is particularly valuable for the anatomical and functional registration of MEG source imaging with MRI. With MEG-BIDS and a growing range of software adopting the standard, the MEG community has a solution to minimize curation overheads, reduce data handling errors and optimize usage of computational resources for analytics. The standard also includes well-defined metadata to facilitate future data harmonization and sharing efforts.


2020 ◽  
pp. 084653712096734
Author(s):  
William Parker ◽  
Jacob L. Jaremko ◽  
Mark Cicero ◽  
Marleine Azar ◽  
Khaled El-Emam ◽  
...  

The application of big data, radiomics, machine learning, and artificial intelligence (AI) algorithms in radiology requires access to large data sets containing personal health information. Because machine learning projects often require collaboration between different sites or data transfer to a third party, precautions are required to safeguard patient privacy. Safety measures are required to prevent inadvertent access to and transfer of identifiable information. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI Ethical and Legal standing committee with the mandate to guide the medical imaging community in terms of best practices in data management, access to health care data, de-identification, and accountability practices. Part 1 of this article will inform CAR members on principles of de-identification, pseudonymization, encryption, direct and indirect identifiers, k-anonymization, risks of reidentification, implementations, data set release models, and validation of AI algorithms, with a view to developing appropriate standards to safeguard patient information effectively.
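The k-anonymization concept listed above can be illustrated in a few lines: a release is k-anonymous when every combination of quasi-identifiers is shared by at least k records. The toy records and the decade/ZIP-prefix generalization below are invented for the example and are not drawn from the CAR guidance.

```python
from collections import Counter

def k_anonymity(records, quasi_ids):
    """Smallest equivalence-class size over the chosen quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())

records = [
    {"age": 34, "sex": "F", "zip": "10101"},
    {"age": 36, "sex": "F", "zip": "10101"},
    {"age": 34, "sex": "M", "zip": "10102"},
    {"age": 35, "sex": "M", "zip": "10102"},
]

raw_k = k_anonymity(records, ["age", "sex", "zip"])  # each row is unique: k = 1

# Generalize: bucket age into decades and truncate ZIP to a 3-digit prefix
for r in records:
    r["age"] = f"{(r['age'] // 10) * 10}s"
    r["zip"] = r["zip"][:3]

gen_k = k_anonymity(records, ["age", "sex", "zip"])  # k = 2 after generalization
```

Generalization trades data precision for a larger minimum equivalence class, lowering the reidentification risk the article discusses.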


Gut ◽  
2019 ◽  
Vol 68 (9) ◽  
pp. 1701-1715 ◽  
Author(s):  
Emeran A Mayer ◽  
Jennifer Labus ◽  
Qasim Aziz ◽  
Irene Tracey ◽  
Lisa Kilpatrick ◽  
...  

Imaging of the living human brain is a powerful tool to probe the interactions between brain, gut and microbiome in health and in disorders of brain–gut interactions, in particular IBS. While altered signals from the viscera contribute to clinical symptoms, the brain integrates these interoceptive signals with emotional, cognitive and memory related inputs in a non-linear fashion to produce symptoms. Tremendous progress has occurred in the development of new imaging techniques that look at structural, functional and metabolic properties of brain regions and networks. Standardisation in image acquisition and advances in computational approaches have made it possible to study large data sets of imaging studies, identify network properties and integrate them with non-imaging data. These approaches are beginning to generate brain signatures in IBS that share some features with those obtained in other often overlapping chronic pain disorders such as urological pelvic pain syndromes and vulvodynia, suggesting shared mechanisms. Despite this progress, the identification of preclinical vulnerability factors and outcome predictors has been slow. To overcome current obstacles, the creation of consortia and the generation of standardised multisite repositories for brain imaging and metadata from multisite studies are required.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors based on a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R²=0.84, MSE=0.55 for the training set and R²=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
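The descriptor-selection-plus-SVM workflow can be sketched with scikit-learn on synthetic data. The 500 synthetic "descriptors", the univariate F-test selection, and the linear kernel are illustrative choices, not the paper's exact protocol or descriptor set.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(4)

# Synthetic stand-in: 300 "compounds" x 500 "descriptors", only 10 of which
# actually drive the simulated activity value
X = rng.normal(size=(300, 500))
coef = np.zeros(500)
coef[:10] = rng.uniform(0.5, 1.0, 10)
y = X @ coef + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Descriptor selection followed by an SVM regressor, mirroring the workflow
model = make_pipeline(
    SelectKBest(f_regression, k=20),   # keep the 20 most relevant descriptors
    StandardScaler(),
    SVR(kernel="linear", C=1.0),
)
model.fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)
```

Wrapping selection inside the pipeline keeps it fitted on training data only, which is essential for the honest test-set R² the abstract reports.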

