Decentralized Distribution-sampled Classification Models with Application to Brain Imaging

2019 ◽  
Author(s):  
Noah Lewis ◽  
Harshvardhan Gazula ◽  
Sergey M. Plis ◽  
Vince D. Calhoun

Abstract

Background: In this age of big data, large data stores allow researchers to compose robust models that are accurate and informative. In many cases, the data are stored in separate locations, requiring data transfer between local sites, which can cause various practical hurdles, such as privacy concerns or heavy network load. This is especially true for medical imaging data, which can be constrained by the Health Insurance Portability and Accountability Act (HIPAA). Medical imaging datasets can also contain many thousands or millions of features, further increasing network load.

New Method: Our research expands upon current decentralized classification research by implementing a new single-shot method for both neural networks and support vector machines. Our approach is to estimate the statistical distribution of the data at each local site and pass this information to the other local sites, where each site resamples from the individual distributions and trains a model on both locally available data and the resampled data.

Results: We show applications of our approach to handwritten digit classification as well as to multi-subject classification of brain imaging data collected from patients with schizophrenia and healthy controls. Overall, the results showed classification accuracy comparable to the centralized model with lower network load than multi-shot methods.

Comparison with Existing Methods: Many decentralized classifiers are multi-shot, requiring heavy network traffic. Our model attempts to alleviate this load while preserving prediction accuracy.

Conclusions: We show that our proposed approach performs comparably to a centralized approach while minimizing network traffic compared to multi-shot methods.

Highlights:
- A novel yet simple approach to decentralized classification
- Reduces total network load compared to current multi-shot algorithms
- Maintains prediction accuracy comparable to the centralized approach
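The single-shot scheme described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the paper's exact estimator: the Gaussian-per-class summary, the two simulated "sites", and all parameter values are assumptions made for the example.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def site_summary(X, y):
    """Summarize a site's data as per-class Gaussian parameters (the single-shot share)."""
    summary = {}
    for label in np.unique(y):
        Xc = X[y == label]
        summary[label] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc))
    return summary

def resample(summary):
    """Draw synthetic samples from another site's shared distribution parameters."""
    Xs, ys = [], []
    for label, (mu, cov, n) in summary.items():
        Xs.append(rng.multivariate_normal(mu, cov, size=n))
        ys.append(np.full(n, label))
    return np.vstack(Xs), np.concatenate(ys)

# Two toy "sites": same two classes, slightly shifted clusters
X1 = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y1 = np.array([0] * 50 + [1] * 50)
X2 = np.vstack([rng.normal(0.5, 1, (50, 2)), rng.normal(4.5, 1, (50, 2))])
y2 = np.array([0] * 50 + [1] * 50)

# Site 1 trains on its own data plus samples drawn from site 2's summary;
# site 2's raw data never leave site 2.
Xr, yr = resample(site_summary(X2, y2))
clf = LinearSVC().fit(np.vstack([X1, Xr]), np.concatenate([y1, yr]))
acc = clf.score(X2, y2)
```

Only the distribution parameters cross the network, which is what keeps the method single-shot and light on traffic.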

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1512 ◽  
Author(s):  
Jing Ming ◽  
Eric Verner ◽  
Anand Sarwate ◽  
Ross Kelly ◽  
Cory Reed ◽  
...  

In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer time, extensive manual effort, and privacy issues for sensitive data. To remove these barriers and enable easier data sharing and analysis, we introduced a new, decentralized, privacy-enabled infrastructure model for brain imaging data called COINSTAC in 2016. We have continued development of COINSTAC since this model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including implementation of additional decentralized algorithms, user interface enhancements, decentralized calculation of regression statistics, and complete pipeline specifications.
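The decentralized regression statistics mentioned here can, for ordinary least squares, be computed exactly without moving row-level data, because the normal equations decompose into per-site sums. A minimal sketch (the three simulated sites and the in-process "aggregator" are assumptions for illustration, not COINSTAC's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
beta_true = np.array([2.0, -1.0])

def local_moments(X, y):
    """Each site shares only aggregate moments (X^T X, X^T y), never raw rows."""
    return X.T @ X, X.T @ y

# Three simulated sites drawing from the same underlying linear model
sites = []
for _ in range(3):
    X = rng.normal(size=(40, 2))
    y = X @ beta_true + rng.normal(scale=0.1, size=40)
    sites.append((X, y))

# The aggregator sums the per-site moments and solves the normal equations,
# reproducing the pooled ordinary least squares fit exactly
moments = [local_moments(X, y) for X, y in sites]
XtX = sum(m[0] for m in moments)
Xty = sum(m[1] for m in moments)
beta_hat = np.linalg.solve(XtX, Xty)
```

Because the moments add, the decentralized solution matches what a centralized fit on the pooled data would give.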


2017 ◽  
Author(s):  
Vanessa Gómez-Verdejo ◽  
Emilio Parrado-Hernández ◽  
Jussi Tohka ◽  

Abstract

An important problem that hinders the use of supervised classification algorithms for brain imaging is that the number of variables per single subject far exceeds the number of training subjects available. Deriving multivariate measures of variable importance becomes a challenge in such scenarios. This paper proposes a new measure of variable importance termed sign-consistency bagging (SCB). The SCB captures variable importance by analyzing the sign consistency of the corresponding weights in an ensemble of linear support vector machine (SVM) classifiers. Further, the SCB variable importances are enhanced by means of transductive conformal analysis. This extra step is important when the data can be assumed to be heterogeneous. Finally, the proposal of these SCB variable importance measures is completed with the derivation of a parametric hypothesis test of variable importance. The new importance measures were compared with a t-test-based univariate measure and an SVM-based multivariate variable importance measure using anatomical and functional magnetic resonance imaging data. The obtained results demonstrated that the new SCB-based importance measures were superior to the compared methods in terms of reproducibility and classification accuracy.
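The sign-consistency idea can be illustrated with a small bootstrap ensemble of linear SVMs. The toy data and the simple |mean of signs| statistic below are illustrative stand-ins; the paper's SCB measure additionally includes transductive conformal analysis and a parametric hypothesis test, which are not reproduced here.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)

# Toy data: only feature 0 carries class signal; features 1-4 are pure noise
n = 200
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)

# Bootstrap an ensemble of linear SVMs and record the sign of each weight
signs = []
for _ in range(30):
    idx = rng.integers(0, n, size=n)
    w = LinearSVC(C=1.0).fit(X[idx], y[idx]).coef_[0]
    signs.append(np.sign(w))

# Sign consistency per feature: |mean of signs| stays near 1 for informative
# features and drifts toward 0 for noise features whose weight signs flip
consistency = np.abs(np.mean(signs, axis=0))
```

An informative variable keeps the same weight sign across resamples, which is exactly the stability that SCB turns into an importance score.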


2020 ◽  
pp. 084653712096734
Author(s):  
William Parker ◽  
Jacob L. Jaremko ◽  
Mark Cicero ◽  
Marleine Azar ◽  
Khaled El-Emam ◽  
...  

The application of big data, radiomics, machine learning, and artificial intelligence (AI) algorithms in radiology requires access to large data sets containing personal health information. Because machine learning projects often require collaboration between different sites or data transfer to a third party, precautions are required to safeguard patient privacy. Safety measures are required to prevent inadvertent access to and transfer of identifiable information. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI Ethical and Legal standing committee with the mandate to guide the medical imaging community in terms of best practices in data management, access to health care data, de-identification, and accountability practices. Part 2 of this article will inform CAR members on the practical aspects of medical imaging de-identification, strengths and limitations of de-identification approaches, list of de-identification software and tools available, and perspectives on future directions.


2021 ◽  
Author(s):  
Shaojia Ge ◽  
Erkki Tomppo ◽  
Yrjö Rauste ◽  
Ronald E. McRoberts ◽  
Jaan Praks ◽  
...  

Abstract

In this study, we assess the potential of long time series of Sentinel-1 SAR data to predict forest growing stock volume (GSV) and evaluate the temporal dynamics of the predictions. The boreal coniferous forest study site is located near the Hyytiälä forest station in central Finland and covers an area of 2,500 km² with nearly 17,000 stands. We considered several prediction approaches (linear, support vector and random forests regression) and fine-tuned them to predict growing stock volume in several evaluation scenarios. The analyses used 96 Sentinel-1 images acquired over three years. Different approaches for aggregating SAR images and choosing feature (predictor) variables were evaluated. Our results demonstrate a considerable decrease in the RMSE of growing stock volume predictions as the number of images increases. While prediction accuracy using individual Sentinel-1 images varied from 85 to 91 m³/ha RMSE (relative RMSE 50-53%), the RMSE with combined images decreased to 75.6 m³/ha (relative RMSE 44%). Feature extraction and dimension reduction techniques facilitated achieving near-optimal prediction accuracy using only 8-10 images. When using assemblages of eight consecutive images, the GSV was predicted with the greatest accuracy when initial acquisitions started between September and January.

Highlights:
- A time series of 96 Sentinel-1 images is analysed over a study area with 17,762 forest stands.
- Rigorous evaluation of tools for SAR feature selection and GSV prediction.
- Improved periodic seasonality using assemblages of consecutive Sentinel-1 images.
- Analysis of combining images acquired in "frozen" and "dry summer" conditions.
- Competitive estimates using calculation of prediction errors with stand-area weighting.
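The benefit of aggregating many acquisitions can be demonstrated on simulated data: temporal averaging suppresses per-image noise, so regression error falls as images are combined. The linear backscatter model and noise levels below are invented for illustration and are not calibrated to Sentinel-1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Simulated stands: true GSV plus, per image, a backscatter value that tracks
# GSV weakly under heavy speckle/weather noise (all numbers are invented)
n_stands, n_images = 500, 32
gsv = rng.uniform(50, 300, n_stands)
images = gsv[:, None] * 0.01 + rng.normal(scale=1.0, size=(n_stands, n_images))

def rmse_with(k):
    """Regress GSV on the temporal mean of the first k images; return RMSE."""
    feat = images[:, :k].mean(axis=1, keepdims=True)
    pred = LinearRegression().fit(feat, gsv).predict(feat)
    return float(np.sqrt(np.mean((pred - gsv) ** 2)))

rmse_1, rmse_32 = rmse_with(1), rmse_with(32)  # aggregation shrinks the error
```

Averaging k images divides the noise variance by k, which is the same mechanism behind the RMSE reduction the study reports when combining acquisitions.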


2018 ◽  
Author(s):  
Christopher Holdgraf ◽  
Stefan Appelhoff ◽  
Stephan Bickel ◽  
Kristofer Bouchard ◽  
Sasha D'Ambrosio ◽  
...  

Intracranial electroencephalography (iEEG) data offer a unique combination of high spatial and temporal resolution measures of the living human brain. However, data collection is limited to highly specialized clinical environments. To improve internal (re)use and external sharing of these unique data, we present a structure for storing and sharing iEEG data: BIDS-iEEG, an extension of the Brain Imaging Data Structure (BIDS) specification, along with freely available examples and a bids-starter-kit. BIDS is a framework for organizing and documenting data and metadata with the aim of making datasets more transparent and reusable and of improving the reproducibility of research. It is a community-driven specification with an inclusive decision-making process. As an extension of the BIDS specification, BIDS-iEEG facilitates integration with other modalities such as fMRI, MEG, and EEG. As the BIDS-iEEG extension has received input from many iEEG researchers, it provides a common ground for data transfer within labs, between labs, and in open-data repositories. It will facilitate reproducible analyses across datasets, experiments, and recording sites, allowing scientists to answer more complex questions about the human brain. Finally, the cross-modal nature of BIDS will enable efficient consolidation of data from multiple sites for addressing questions about generalized brain function.
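As a simplified illustration (consult the BIDS-iEEG specification for the authoritative naming rules and required files), a minimal BIDS-iEEG dataset might be laid out as:

```
my_dataset/
├── dataset_description.json
├── participants.tsv
└── sub-01/
    └── ses-01/
        └── ieeg/
            ├── sub-01_ses-01_task-rest_ieeg.edf
            ├── sub-01_ses-01_task-rest_ieeg.json
            ├── sub-01_ses-01_task-rest_channels.tsv
            └── sub-01_ses-01_electrodes.tsv
```

The key-value filename entities (sub-, ses-, task-) are what make BIDS datasets machine-readable across modalities and tools.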


2017 ◽  
Author(s):  
Guiomar Niso ◽  
Krzysztof J. Gorgolewski ◽  
Elizabeth Bock ◽  
Teon L. Brooks ◽  
Guillaume Flandin ◽  
...  

Abstract

We present a significant extension of the Brain Imaging Data Structure (BIDS) to support the specific aspects of magnetoencephalography (MEG) data. MEG provides direct measurement of brain activity with millisecond temporal resolution and unique source imaging capabilities. So far, BIDS has provided a solution for structuring the organization of magnetic resonance imaging (MRI) data, whose nature and acquisition parameters are different. Despite the lack of a standard data format for MEG, MEG-BIDS is a principled solution to store, organize and share the typically large data volumes produced. It builds on BIDS for MRI, and therefore readily yields a multimodal data organization by construction. This is particularly valuable for the anatomical and functional registration of MEG source imaging with MRI. With MEG-BIDS and a growing range of software adopting the standard, the MEG community has a solution to minimize curation overheads, reduce data handling errors and optimize usage of computational resources for analytics. The standard also includes well-defined metadata to facilitate future data harmonization and sharing efforts.


2020 ◽  
pp. 084653712096734
Author(s):  
William Parker ◽  
Jacob L. Jaremko ◽  
Mark Cicero ◽  
Marleine Azar ◽  
Khaled El-Emam ◽  
...  

The application of big data, radiomics, machine learning, and artificial intelligence (AI) algorithms in radiology requires access to large data sets containing personal health information. Because machine learning projects often require collaboration between different sites or data transfer to a third party, precautions are required to safeguard patient privacy. Safety measures are required to prevent inadvertent access to and transfer of identifiable information. The Canadian Association of Radiologists (CAR) is the national voice of radiology committed to promoting the highest standards in patient-centered imaging, lifelong learning, and research. The CAR has created an AI Ethical and Legal standing committee with the mandate to guide the medical imaging community in terms of best practices in data management, access to health care data, de-identification, and accountability practices. Part 1 of this article will inform CAR members on principles of de-identification, pseudonymization, encryption, direct and indirect identifiers, k-anonymization, risks of reidentification, implementations, data set release models, and validation of AI algorithms, with a view to developing appropriate standards to safeguard patient information effectively.
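The k-anonymization concept listed above can be illustrated in a few lines: a release is k-anonymous when every combination of quasi-identifiers is shared by at least k records. The toy records and the decade/ZIP-prefix generalization below are invented for the example and are not drawn from the CAR guidance.

```python
from collections import Counter

def k_anonymity(records, quasi_ids):
    """Smallest equivalence-class size over the chosen quasi-identifiers."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())

records = [
    {"age": 34, "sex": "F", "zip": "10101"},
    {"age": 36, "sex": "F", "zip": "10101"},
    {"age": 34, "sex": "M", "zip": "10102"},
    {"age": 35, "sex": "M", "zip": "10102"},
]

raw_k = k_anonymity(records, ["age", "sex", "zip"])  # each row is unique: k = 1

# Generalize: bucket age into decades and truncate ZIP to a 3-digit prefix
for r in records:
    r["age"] = f"{(r['age'] // 10) * 10}s"
    r["zip"] = r["zip"][:3]

gen_k = k_anonymity(records, ["age", "sex", "zip"])  # k = 2 after generalization
```

Generalization trades data precision for a larger minimum equivalence class, lowering the reidentification risk the article discusses.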


Gut ◽  
2019 ◽  
Vol 68 (9) ◽  
pp. 1701-1715 ◽  
Author(s):  
Emeran A Mayer ◽  
Jennifer Labus ◽  
Qasim Aziz ◽  
Irene Tracey ◽  
Lisa Kilpatrick ◽  
...  

Imaging of the living human brain is a powerful tool to probe the interactions between brain, gut and microbiome in health and in disorders of brain–gut interactions, in particular IBS. While altered signals from the viscera contribute to clinical symptoms, the brain integrates these interoceptive signals with emotional, cognitive and memory related inputs in a non-linear fashion to produce symptoms. Tremendous progress has occurred in the development of new imaging techniques that look at structural, functional and metabolic properties of brain regions and networks. Standardisation in image acquisition and advances in computational approaches have made it possible to study large data sets of imaging studies, identify network properties and integrate them with non-imaging data. These approaches are beginning to generate brain signatures in IBS that share some features with those obtained in other often overlapping chronic pain disorders such as urological pelvic pain syndromes and vulvodynia, suggesting shared mechanisms. Despite this progress, the identification of preclinical vulnerability factors and outcome predictors has been slow. To overcome current obstacles, the creation of consortia and the generation of standardised multisite repositories for brain imaging and metadata from multisite studies are required.


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors based on a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R²=0.84, MSE=0.55 for the training set and R²=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
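The descriptor-selection-plus-SVM workflow can be sketched with scikit-learn on synthetic data. The 500 synthetic "descriptors", the univariate F-test selection, and the linear kernel are illustrative choices, not the paper's exact protocol or descriptor set.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(4)

# Synthetic stand-in: 300 "compounds" x 500 "descriptors", only 10 of which
# actually drive the simulated activity value
X = rng.normal(size=(300, 500))
coef = np.zeros(500)
coef[:10] = rng.uniform(0.5, 1.0, 10)
y = X @ coef + rng.normal(scale=0.3, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Descriptor selection followed by an SVM regressor, mirroring the workflow
model = make_pipeline(
    SelectKBest(f_regression, k=20),   # keep the 20 most relevant descriptors
    StandardScaler(),
    SVR(kernel="linear", C=1.0),
)
model.fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)
```

Wrapping selection inside the pipeline keeps it fitted on training data only, which is essential for the honest test-set R² the abstract reports.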

