Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Anees Abrol ◽  
Zening Fu ◽  
Mustafa Salman ◽  
Rogers Silva ◽  
Yuhui Du ◽  
...  

Abstract Recent critical commentaries unfavorably compare deep learning (DL) with standard machine learning (SML) approaches for brain imaging data analysis. However, their conclusions are often based on pre-engineered features, depriving DL of its main advantage: representation learning. We conduct a large-scale systematic comparison profiled in multiple classification and regression tasks on structural MRI images and show the importance of representation learning for DL. Results show that if trained following prevalent DL practices, DL methods have the potential to scale particularly well and substantially improve compared to SML methods, while also presenting a lower asymptotic complexity in relative computational time, despite being more complex. We also demonstrate that DL embeddings span comprehensible task-specific projection spectra and that DL consistently localizes task-discriminative brain biomarkers. Our findings highlight the presence of nonlinearities in neuroimaging data that DL can exploit to generate superior task-discriminative representations for characterizing the human brain.

Author(s):  
Anees Abrol ◽  
Zening Fu ◽  
Mustafa Salman ◽  
Rogers Silva ◽  
Yuhui Du ◽  
...  

Abstract Previous successes of deep learning (DL) approaches on several complex tasks have hugely inflated expectations of their power to learn subtle properties of complex brain imaging data, and scale to large datasets. Perhaps as a reaction to this inflation, recent critical commentaries unfavorably compare DL with standard machine learning (SML) approaches for the analysis of brain imaging data. Yet, their conclusions are based on pre-engineered features, which deprives DL of its main advantage: representation learning. Here we evaluate this and show the importance of representation learning for DL performance on brain imaging data. We report our findings from a large-scale systematic comparison of SML approaches versus DL profiled in a ten-way age and gender-based classification task on 12,314 structural MRI images. Results show that DL methods, if implemented and trained following the prevalent DL practices, have the potential to substantially improve compared to SML approaches. We also show that DL approaches scale particularly well, presenting a lower asymptotic complexity in relative computational time, despite being more complex. Our analysis reveals that the performance improvement saturates as the training sample size grows, but that DL shows significantly higher performance throughout. We also show evidence that the superior performance of DL is primarily due to the excellent representation learning capabilities and that SML methods can perform equally well when operating on representations produced by the trained DL models. Finally, we demonstrate that DL embeddings span a comprehensible projection spectrum and that DL consistently localizes discriminative brain biomarkers, providing an example of the robustness of prediction relevance estimates. Our findings highlight the presence of non-linearities in brain imaging data that DL frameworks can exploit to generate superior predictive representations for characterizing the human brain, even with currently available data sizes.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1512 ◽  
Author(s):  
Jing Ming ◽  
Eric Verner ◽  
Anand Sarwate ◽  
Ross Kelly ◽  
Cory Reed ◽  
...  

In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer time, extensive manual effort, and privacy issues for sensitive data. To remove these barriers to enable easier data sharing and analysis, we introduced a new, decentralized, privacy-enabled infrastructure model for brain imaging data called COINSTAC in 2016. We have continued development of COINSTAC since this model was first introduced. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including additional decentralized algorithms implementation, user interface enhancement, decentralized regression statistic calculation, and complete pipeline specifications.


2021 ◽  
Vol 15 ◽  
Author(s):  
Tinashe M. Tapera ◽  
Matthew Cieslak ◽  
Max Bertolero ◽  
Azeez Adebimpe ◽  
Geoffrey K. Aguirre ◽  
...  

The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is the de facto standard for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are limited on Flywheel. To address these challenges, we developed “FlywheelTools,” a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.


2020 ◽  
Author(s):  
Sudhakar Tummala ◽  
Niels K. Focke

Abstract Rigid and affine registrations to a common template are essential steps in the pre-processing of brain structural magnetic resonance imaging (MRI) data. Manual quality checking (QC) of these registrations is tedious when the data contain several thousand images. Therefore, we propose a machine learning (ML) framework for fully automatic QC of these registrations via local computation of similarity functions such as normalized cross-correlation, normalized mutual information, and correlation ratio, which are used as features for training different ML classifiers. To facilitate supervised learning, misaligned images are generated. A structural MRI dataset consisting of 215 subjects from the Autism Brain Imaging Data Exchange is used for 5-fold cross-validation and testing. A few classifiers, such as kNN, AdaBoost, and random forest, reached testing F1-scores of 0.98 for QC of both rigid and affine registrations. These tested ML models could be deployed for practical use.
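
As a rough illustration of the kind of feature described above, the sketch below computes normalized cross-correlation locally over patches of two 3D volumes that are assumed to be already loaded as NumPy arrays of equal shape. The function names, the patch size, and the choice of non-overlapping cubic patches are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def normalized_cross_correlation(a, b):
    """Pearson-style NCC between two arrays of the same shape (range -1 to 1)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A constant patch has no defined correlation; report 0 by convention.
        return 0.0
    return float(np.dot(a, b) / denom)


def local_ncc_features(fixed, moving, patch=8):
    """NCC per non-overlapping cubic patch (hypothetical feature layout).

    `fixed` is the template volume and `moving` the registered image.
    Voxels beyond the last full patch along each axis are ignored.
    """
    fixed = np.asarray(fixed, dtype=float)
    moving = np.asarray(moving, dtype=float)
    if fixed.shape != moving.shape or fixed.ndim != 3:
        raise ValueError("expected two 3D volumes of identical shape")
    nx, ny, nz = fixed.shape
    feats = []
    for i in range(0, nx - patch + 1, patch):
        for j in range(0, ny - patch + 1, patch):
            for k in range(0, nz - patch + 1, patch):
                sl = (slice(i, i + patch),
                      slice(j, j + patch),
                      slice(k, k + patch))
                feats.append(normalized_cross_correlation(fixed[sl], moving[sl]))
    return np.array(feats)
```

A feature vector produced this way could then be passed to any off-the-shelf classifier; a well-aligned pair should give patch values close to 1, while a deliberately misaligned pair should give noticeably lower values.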


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S23-S24
Author(s):  
Kendra L Seaman

Abstract In concert with broader efforts to increase the reliability of social science research, there are several efforts to increase transparency and reproducibility in neuroimaging. The large-scale nature of neuroimaging data and constantly evolving analysis tools can make transparency challenging. I will describe emerging tools used to document, organize, and share behavioral and neuroimaging data. These tools include: (1) the preregistration of neuroimaging data sets, which increases openness and protects researchers from suspicions of p-hacking, (2) the conversion of neuroimaging data into a standardized format (Brain Imaging Data Structure: BIDS) that enables standardized scripts to process and share neuroimaging data, and (3) the sharing of final neuroimaging results on NeuroVault, which allows the community to do rapid meta-analysis. Using these tools improves workflows within labs, improves the overall quality of our science, and provides a potential model for other disciplines using large-scale data.


2021 ◽  
Author(s):  
Tinashe M. Tapera ◽  
Matthew Cieslak ◽  
Max Bertolero ◽  
Azeez Adebimpe ◽  
Geoffrey K. Aguirre ◽  
...  

Abstract The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is a data storage specification for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are not designed for use on cloud-based systems such as Flywheel. To address these challenges, we developed “FlywheelTools”, a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.


GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Abstract Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates. However, efficient and appropriate selection of α can be challenging. This becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and correlations across predictors, it is also not straightforwardly interpretable. Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and compare across models and datasets. Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations; and automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for analysis of large complex datasets.
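
The reparameterization described above can be sketched in a few lines. The example below is not the fracridge implementation and does not use its API. It is a minimal illustration that assumes a full-column-rank design matrix, uses the singular value decomposition to express the ridge solution for any α, and uses plain bisection (an assumption made here for simplicity) to find the α whose coefficient norm is a requested fraction γ of the unregularized norm. The function name `ridge_for_fraction` is hypothetical.

```python
import numpy as np


def ridge_for_fraction(X, y, frac, tol=1e-10, max_iter=200):
    """Return (beta, alpha) such that ||beta|| / ||beta_OLS|| is close to `frac`.

    Illustrative sketch only: assumes X has full column rank and 0 < frac <= 1.
    """
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must lie in (0, 1]")
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    ols_rot = uty / s                      # OLS coefficients in the rotated basis
    ols_norm = np.linalg.norm(ols_rot)
    if ols_norm == 0.0:
        raise ValueError("OLS solution is zero; the fraction is undefined")
    if frac == 1.0:
        return Vt.T @ ols_rot, 0.0

    def ratio(alpha):
        # Ridge shrinks each rotated coefficient by s_i^2 / (s_i^2 + alpha),
        # so the norm ratio decreases monotonically as alpha grows.
        return np.linalg.norm(s / (s**2 + alpha) * uty) / ols_norm

    lo, hi = 0.0, 1.0
    while ratio(hi) > frac:                # grow the bracket until it contains the target
        hi *= 10.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if ratio(mid) > frac:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    alpha = 0.5 * (lo + hi)
    beta = Vt.T @ (s / (s**2 + alpha) * uty)
    return beta, alpha
```

Calling `ridge_for_fraction(X, y, 0.5)` would return coefficients whose L2-norm is about half that of the ordinary least-squares solution, together with the α that produces them. Because the ratio is monotone in α, each distinct γ maps to a distinct solution, which is the property the abstract highlights.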


2018 ◽  
Vol 8 (4) ◽  
pp. 34 ◽  
Author(s):  
Vishal Saxena ◽  
Xinyu Wu ◽  
Ira Srivastava ◽  
Kehan Zhu

The ongoing revolution in Deep Learning is redefining the nature of computing, driven by the increasing number of pattern classification and cognitive tasks. Specialized digital hardware for deep learning still holds its predominance due to the flexibility offered by the software implementation and maturity of algorithms. However, it is increasingly desirable that cognitive computing occur at the edge, i.e., on hand-held devices that are energy constrained, which is energy prohibitive when employing digital von Neumann architectures. Recent explorations in digital neuromorphic hardware have shown promise, but offer lower neurosynaptic density than is needed for scaling to applications such as intelligent cognitive assistants (ICA). Large-scale integration of nanoscale emerging memory devices with Complementary Metal Oxide Semiconductor (CMOS) mixed-signal integrated circuits can herald a new generation of Neuromorphic computers that will transcend the von Neumann bottleneck for cognitive computing tasks. Such hybrid Neuromorphic System-on-a-chip (NeuSoC) architectures promise machine learning capability at chip-scale form factor, and several orders of magnitude improvement in energy efficiency. Practical demonstration of such architectures has been limited, as the performance of emerging memory devices falls short of the behavior expected from idealized memristor-based analog synapses, or weights, and novel machine learning algorithms are needed to take advantage of the device behavior. In this article, we review the challenges involved and present a pathway to realize large-scale mixed-signal NeuSoCs, from device arrays and circuits to spike-based deep learning algorithms with ‘brain-like’ energy-efficiency.


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.


Big data is large-scale data collected for knowledge discovery and has been widely used in various applications. Big data often includes image data from these applications and requires effective techniques to process it. This paper surveys research on big image data to analyze the performance of the methods. Deep learning techniques perform better than other methods, including wavelet-based methods. However, deep learning techniques require more computational time, which can be overcome by lightweight methods.

