dSNE: a visualization approach for use with decentralized data

2019 ◽  
Author(s):  
D. K. Saha ◽  
V. D. Calhoun ◽  
Y. Du ◽  
Z. Fu ◽  
S. R. Panta ◽  
...  

Abstract Visualization of high-dimensional, large-scale datasets via an embedding into a 2D map is a powerful exploration tool for assessing latent structure in the data and detecting outliers. It plays a vital role in the neuroimaging field, where it is sometimes the only way to perform quality control on a large dataset. Many methods have been developed for this task, but most rely on the assumption that all samples are locally available for the computation. Specifically, one needs access to all samples in order to compute the distance directly between all pairs of points to measure their similarity. But all pairs of samples may not always be available at a single location, for various reasons (e.g. privacy concerns for rare-disease data, institutional or IRB policies). This is quite common for biomedical data, e.g. neuroimaging and genetics, where privacy preservation is a major concern. In this scenario, a quality control tool that visualizes a decentralized dataset in its entirety via global aggregation of local computations is especially important, as it would allow screening of samples that cannot be evaluated otherwise. We introduce an algorithm to solve this problem: decentralized data stochastic neighbor embedding (dSNE). In our approach, data samples (i.e. brain images) located at different sites are simultaneously mapped into the same space according to their similarities. Yet the data never leave the individual sites, and no pairwise metric is ever directly computed between any two samples that are not collocated. Based on the Modified National Institute of Standards and Technology (MNIST) database and the Columbia Object Image Library (COIL-20) dataset, we introduce metrics for measuring embedding quality and use them to compare dSNE to its centralized counterpart. We also apply dSNE to various multi-site neuroimaging datasets and show promising results that highlight the potential of our decentralized visualization approach.

Author(s):  
Debbrata K. Saha ◽  
Vince D. Calhoun ◽  
Sandeep R. Panta ◽  
Sergey M. Plis

Visualization of high-dimensional, large-scale datasets via an embedding into a 2D map is a powerful exploration tool for assessing latent structure in the data and detecting outliers. Many methods have been developed for this task, but most assume that all pairs of samples are available for common computation. Specifically, the distances between all pairs of points need to be directly computable. In contrast, we work with sensitive neuroimaging data, where local sites cannot share their samples and distances cannot be easily computed across sites. Yet the desire is to let all the local data participate in collaborative computation without leaving their respective sites. In this scenario, a quality control tool that visualizes a decentralized dataset in its entirety via global aggregation of local computations is especially important, as it would allow screening of samples that cannot be evaluated otherwise. This paper introduces an algorithm to solve this problem: decentralized data stochastic neighbor embedding (dSNE). Based on the MNIST dataset, we introduce metrics for measuring embedding quality and use them to compare dSNE to its centralized counterpart. We also apply dSNE to a multi-site neuroimaging dataset with encouraging results.
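The similarity step that dSNE decentralizes can be illustrated with a minimal sketch of a t-SNE-style affinity computation restricted to the samples held at one site, so that no cross-site pairwise distance is ever formed. This is a simplification for illustration only: the `local_affinities` name and the fixed bandwidth `sigma` are assumptions, whereas t-SNE proper tunes the bandwidth per point via a perplexity search.

```python
import math

def local_affinities(points, sigma=1.0):
    """Within-site t-SNE-style conditional probabilities p(j|i).

    In a decentralized setting, each site can compute these only over
    its own samples; distances to samples at other sites are never
    computed. `points` is a list of equal-length coordinate tuples.
    """
    n = len(points)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # squared Euclidean distance from point i to each local point j
        d2 = [sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
              for j in range(n)]
        # Gaussian kernel weights; a point has zero affinity with itself
        w = [math.exp(-d2[j] / (2 * sigma ** 2)) if j != i else 0.0
             for j in range(n)]
        total = sum(w)
        for j in range(n):
            p[i][j] = w[j] / total
    return p
```

Each row of the returned matrix sums to 1, giving the conditional neighbor distribution for one local sample.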


2019 ◽  
Author(s):  
Md Rishad Ahmed ◽  
Yuan Zhang ◽  
Omer T. Inan ◽  
Hongen Liao

Abstract Autism spectrum disorder (ASD) is an intricate neuropsychiatric brain disorder characterized by social deficits and repetitive behaviors. Associated ASD biomarkers can help uncover the underlying roots of the disease and guide targeted diagnosis as well as treatment. Although deep learning approaches have been applied to functional magnetic resonance imaging (fMRI) based clinical or behavioral identification of ASD, most previous models are limited in their capacity to exploit the richness of the data: classification techniques often rely solely on region-based summaries and/or functional connectivity analysis from a single pipeline or a single site's dataset. Moreover, modeling the big data related to ASD remains difficult due to its complexity and heterogeneity, and classification from single-volume images has not been previously investigated. Considering these challenges, in this work we first design an image generator that produces single-volume brain images from the whole-brain image of each subject. Second, the single-volume images are analyzed by evaluating four deep learning approaches, including one amended volume-based Convolutional Neural Network framework, to classify ASD and typical control participants. Third, we propose a novel deep ensemble learning classifier that uses VGG16 as a feature extractor to further improve classification performance. Then, to evaluate classifier performance across sites, we apply the proposed method to each site individually and validate our findings against literature reports. We showcase our approaches on a large-scale multi-site brain imaging dataset (ABIDE), considering four preprocessing pipelines, and the outcomes demonstrate state-of-the-art performance compared with literature findings, as well as robustness and consistency.
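The ensemble step described above can be sketched with a minimal fusion rule over per-model predictions. The majority-vote rule and the `ensemble_predict` name are assumptions for illustration; the paper's ensemble operates on VGG16-extracted features, and its actual fusion strategy is not detailed in the abstract.

```python
from collections import Counter

def ensemble_predict(model_outputs):
    """Fuse per-model label predictions by majority vote.

    model_outputs: list of per-model prediction lists, one inner list
    per model, all of equal length (one label per test sample).
    Returns one fused label per sample.
    """
    n_samples = len(model_outputs[0])
    fused = []
    for i in range(n_samples):
        # count how many models voted for each label on sample i
        votes = Counter(m[i] for m in model_outputs)
        fused.append(votes.most_common(1)[0][0])
    return fused
```

With three models disagreeing on individual samples, the fused output follows the per-sample majority.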


2017 ◽  
Author(s):  
Oscar Esteban ◽  
Daniel Birman ◽  
Marie Schaer ◽  
Oluwasanmi O. Koyejo ◽  
Russell A. Poldrack ◽  
...  

Abstract Quality control of MRI is essential for excluding problematic acquisitions and avoiding bias in subsequent image processing and analysis. Visual inspection is subjective and impractical for large-scale datasets. Although automated quality assessments have been demonstrated on single-site datasets, it is unclear whether such solutions can generalize to unseen data acquired at new sites. Here, we introduce the MRI Quality Control tool (MRIQC), a tool for extracting quality measures and fitting a binary (accept/exclude) classifier. Our tool can be run both locally and as a free online service via the OpenNeuro.org portal. The classifier is trained on a publicly available, multi-site dataset (17 sites, N=1102). We perform model selection by evaluating different normalization and feature-exclusion approaches aimed at maximizing across-site generalization, and estimate an accuracy of 76%±13% on new sites using leave-one-site-out cross-validation. We confirm that result on a held-out dataset (2 sites, N=265), again obtaining 76% accuracy. Even though the performance of the trained classifier is statistically above chance, we show that it is susceptible to site effects and unable to account for artifacts specific to new sites. MRIQC performs with high accuracy in intra-site prediction, but performance on unseen sites leaves room for improvement, which may require more labeled data and new approaches to between-site variability. Overcoming these limitations is crucial for a more objective quality assessment of neuroimaging data, and for enabling the analysis of extremely large multi-site samples.
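The leave-one-site-out cross-validation used to estimate generalization to new sites can be sketched as follows. The splitter below is a minimal pure-Python stand-in (the function name is an assumption; MRIQC's actual pipeline uses standard machine-learning tooling for this): each fold holds out every sample from one acquisition site and trains on the rest.

```python
def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) splits.

    `sites` is a list of site labels, one per sample. Each fold's
    test set is exactly the samples from one site, so accuracy on it
    estimates performance on data from an unseen site.
    """
    for held_out in sorted(set(sites)):
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        yield held_out, train_idx, test_idx
```

Averaging per-fold accuracy over all sites gives the cross-site estimate; its spread across folds (the ±13% above) reflects how strongly site effects vary.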


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Amir Bahmani ◽  
Arash Alavi ◽  
Thore Buergel ◽  
Sushil Upadhyayula ◽  
Qiwen Wang ◽  
...  

Abstract The large amount of biomedical data derived from wearable sensors, electronic health records, and molecular profiling (e.g., genomics data) is rapidly transforming our healthcare systems. The increasing scale and scope of biomedical data not only generate enormous opportunities for improving health outcomes but also raise new challenges ranging from data acquisition and storage to data analysis and utilization. To meet these challenges, we developed the Personal Health Dashboard (PHD), which utilizes state-of-the-art security and scalability technologies to provide an end-to-end solution for big biomedical data analytics. The PHD platform is an open-source software framework that can be easily configured and deployed for any big data health project to store, organize, and process complex biomedical data sets, support real-time data analysis at both the individual and cohort levels, and ensure participant privacy at every step. In addition to presenting the system, we illustrate the use of the PHD framework for large-scale applications in emerging multi-omics disease studies, such as the collection and visualization of diverse data types (wearable, clinical, omics) at a personal level, the investigation of insulin resistance, and an infrastructure for the detection of presymptomatic COVID-19.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Nikhil Bhagwat ◽  
Amadou Barry ◽  
Erin W Dickie ◽  
Shawn T Brown ◽  
Gabriel A Devenyi ◽  
...  

Abstract Background The choice of preprocessing pipeline introduces variability in neuroimaging analyses that affects the reproducibility of scientific findings. Features derived from structural and functional MRI data are sensitive to algorithmic or parametric differences in preprocessing tasks such as image normalization, registration, and segmentation. Therefore, it is critical to understand and potentially mitigate the cumulative biases of pipelines in order to distinguish biological effects from methodological variance. Methods Here we use an open structural MRI dataset (ABIDE), supplemented with the Human Connectome Project, to highlight the impact of pipeline selection on cortical thickness measures. Specifically, we investigate the effect of (i) software tool (e.g., ANTS, CIVET, FreeSurfer), (ii) cortical parcellation (Desikan-Killiany-Tourville, Destrieux, Glasser), and (iii) quality control procedure (manual, automatic). We divide our statistical analyses by (i) method type, i.e., task-free (unsupervised) versus task-driven (supervised); and (ii) inference objective, i.e., neurobiological group differences versus individual prediction. Results Results show that software, parcellation, and quality control significantly affect task-driven neurobiological inference. Additionally, software selection strongly affects neurobiological (i.e., group) and individual task-free analyses, and quality control alters performance on the individual-centric prediction tasks. Conclusions This comparative performance evaluation partially explains the source of inconsistencies in neuroimaging findings. Furthermore, it underscores the need for more rigorous scientific workflows and accessible informatics resources to replicate and compare preprocessing pipelines, to address the compounding problem of reproducibility in the age of large-scale, data-driven computational neuroscience.


1966 ◽  
Vol 05 (02) ◽  
pp. 67-74 ◽  
Author(s):  
W. I. Lourie ◽  
W. Haenszel

Quality control of data collected in the United States by the Cancer End Results Program, utilizing punchcards prepared by participating registries in accordance with a Uniform Punchcard Code, is discussed. Existing arrangements decentralize responsibility for editing and related data processing to the local registries, with centralization of tabulating and statistical services in the End Results Section, National Cancer Institute. The most recent deck of punchcards represented over 600,000 cancer patients; approximately 50,000 newly diagnosed cases are added annually. Mechanical editing and inspection of punchcards and field audits are the principal tools for quality control. Mechanical editing of the punchcards includes testing for blank entries and detection of inadmissible or inconsistent codes. Highly improbable codes are subjected to special scrutiny. Field audits include the drawing of a 1-10 percent random sample of punchcards submitted by a registry; the charts are then reabstracted and recoded by an NCI staff member, and differences between the punchcard and the results of independent review are noted.
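The two quality-control mechanisms described, mechanical editing and random field-audit sampling, can be sketched in modern terms. The field names, the valid-code set, and both function names below are hypothetical illustrations, not the actual Uniform Punchcard Code.

```python
import random

# Hypothetical admissible codes; the real Uniform Punchcard Code differs.
VALID_SITE_CODES = {"01", "02", "03"}

def edit_record(record):
    """Mechanical edit of one registry record: flag blank entries and
    inadmissible codes, mirroring the checks described above."""
    errors = []
    for field in ("site", "sex", "year_dx"):
        if not record.get(field, "").strip():
            errors.append(f"blank entry: {field}")
    if record.get("site", "").strip() and record["site"] not in VALID_SITE_CODES:
        errors.append("inadmissible site code")
    return errors

def field_audit_sample(records, fraction=0.05, seed=0):
    """Draw a random sample (in the 1-10 percent range) of records
    for independent re-abstraction and recoding."""
    k = max(1, round(len(records) * fraction))
    return random.Random(seed).sample(records, k)
```

A record with a blank diagnosis year and an unknown site code would be flagged on both counts, while the audit sampler selects a reproducible 5% subset for review.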


Author(s):  
Yulia P. Melentyeva

In recent years, both the general public and specialists have shown great interest in matters of reading. Following the discussion and launch of the “Support and Development of Reading” National Program, many Russian libraries have been organizing large-scale events such as marathons, lecture cycles, and bibliographic trainings, which draw the attention of different social groups to reading. Individual forms of attracting people to reading are used much more rarely. In the author's view, the main reason for this is the lack of information about the forms and methods of attracting readers.


2021 ◽  
Author(s):  
Carmen Seller Oria ◽  
Adrian Thummerer ◽  
Jeffrey Free ◽  
Johannes A. Langendijk ◽  
Stefan Both ◽  
...  

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Tao Yue ◽  
Da Zhao ◽  
Duc T. T. Phan ◽  
Xiaolin Wang ◽  
Joshua Jonghyun Park ◽  
...  

Abstract The vascular network of the circulatory system plays a vital role in maintaining homeostasis in the human body. In this paper, a novel modular microfluidic system with a vertical two-layered configuration is developed to generate large-scale perfused microvascular networks in vitro. The two-layer polydimethylsiloxane (PDMS) configuration allows the tissue chambers and medium channels not only to be designed and fabricated independently but also to be aligned and bonded accordingly. This method produces a modular microfluidic system with the flexibility and scalability to design an integrated platform hosting multiple perfused vascularized tissues at high density. The medium channel was designed with a rhombic shape and fabricated to be semiclosed, forming a capillary burst valve in the vertical direction that serves as the interface between the medium channels and tissue chambers. Angiogenesis and anastomosis at the vertical interface were successfully achieved using different combinations of tissue chambers and medium channels. Various large-scale microvascular networks were generated and quantified in terms of vessel length and density. Minimal leakage of the perfused 70-kDa FITC-dextran confirmed the lumenization of the microvascular networks and the formation of tight vertical interconnections between the microvascular networks and medium channels in different structural layers. This platform enables the culture of interconnected, large-scale perfused vascularized tissue networks with high density and scalability for a wide range of multiorgan-on-a-chip applications, including basic biological studies and drug screening.


2020 ◽  
Vol 499 (2) ◽  
pp. 2934-2958
Author(s):  
A Richard-Laferrière ◽  
J Hlavacek-Larrondo ◽  
R S Nemmen ◽  
C L Rhea ◽  
G B Taylor ◽  
...  

ABSTRACT A variety of large-scale diffuse radio structures have been identified in many clusters with the advent of new state-of-the-art facilities in radio astronomy. Among these diffuse radio structures, radio mini-halos are found in the central regions of cool core clusters. Their origin is still unknown and they are challenging to discover; fewer than 30 have been published to date. Based on new VLA observations, we confirmed the mini-halo in the massive strong cool core cluster PKS 0745−191 (z = 0.1028) and discovered one in the massive cool core cluster MACS J1447.4+0827 (z = 0.3755). Furthermore, using a detailed analysis of all known mini-halos, we explore the relation between mini-halos and active galactic nucleus (AGN) feedback processes from the central galaxy. We find evidence of strong, previously unknown correlations between mini-halo radio power and X-ray cavity power, and between mini-halo radio power and the central galaxy's radio power related to the relativistic jets, when spectrally decomposing the AGN radio emission into a component for past outbursts and one for ongoing accretion. Overall, our study indicates that mini-halos are directly connected to the central AGN in clusters, consistent with previous suppositions. We hypothesize that AGN feedback may be one of the dominant mechanisms giving rise to mini-halos by injecting energy into the intra-cluster medium and reaccelerating an old population of particles, while sloshing motion may drive the overall shape of mini-halos inside cold fronts. AGN feedback may therefore not only play a vital role in offsetting cooling in cool core clusters, but may also play a fundamental role in re-energizing non-thermal particles in clusters.

