Fractional ridge regression: a fast, interpretable reparameterization of ridge regression

GigaScience ◽  
2020 ◽  
Vol 9 (12) ◽  
Author(s):  
Ariel Rokem ◽  
Kendrick Kay

Abstract
Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One challenge of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates; however, efficient and appropriate selection of α can be challenging, and it becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and the correlations across predictors, it is also not straightforwardly interpretable.
Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. In brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and to compare across models and datasets.
Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different values of γ are guaranteed to differ, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding arduous manual exploration. These properties make fractional ridge regression particularly suitable for the analysis of large, complex datasets.
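
To make the reparameterization concrete, the sketch below shows the core idea under simple assumptions (a dense design matrix, a single target vector, and a plain SVD): the ridge coefficient norm is a monotonically decreasing function of α, so each requested fraction γ can be mapped back to an α by interpolation. The function name and grid choices are illustrative; this is not the fracridge package's implementation.

```python
import numpy as np

def fractional_ridge(X, y, fracs):
    """Illustrative sketch of the fractional-ridge idea: for each requested
    fraction gamma, find the ridge penalty alpha whose coefficient vector has
    an L2-norm equal to gamma times the norm of the unregularized (OLS)
    solution, then return those ridge coefficients."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    ols_coef = Vt.T @ (uty / s)                   # alpha = 0 solution
    ols_norm = np.linalg.norm(ols_coef)

    # Coefficient norm as a function of alpha, on a wide log-spaced grid.
    alpha_grid = np.concatenate([[0.0], np.logspace(-6, 6, 200) * s.max() ** 2])
    gammas = np.array([
        np.linalg.norm(Vt.T @ (s / (s ** 2 + a) * uty)) for a in alpha_grid
    ]) / ols_norm                                 # decreases from 1 toward 0

    coefs, alphas = [], []
    for frac in fracs:
        # Invert gamma(alpha) by interpolation (np.interp needs ascending x).
        alpha = np.interp(frac, gammas[::-1], alpha_grid[::-1])
        alphas.append(alpha)
        coefs.append(Vt.T @ (s / (s ** 2 + alpha) * uty))
    return np.stack(coefs, axis=-1), np.array(alphas)
```

For instance, `fractional_ridge(X, y, [0.1, 0.5, 0.9])` would return coefficient vectors whose norms are 10%, 50%, and 90% of the OLS norm, regardless of the scale of the data or the correlation structure of the predictors, which is what makes γ directly comparable across models and datasets.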

F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 1512 ◽  
Author(s):  
Jing Ming ◽  
Eric Verner ◽  
Anand Sarwate ◽  
Ross Kelly ◽  
Cory Reed ◽  
...  

In the era of Big Data, sharing neuroimaging data across multiple sites has become increasingly important. However, researchers who want to engage in centralized, large-scale data sharing and analysis must often contend with problems such as high database cost, long data transfer times, extensive manual effort, and privacy issues for sensitive data. To remove these barriers and enable easier data sharing and analysis, we introduced COINSTAC, a new decentralized, privacy-enabled infrastructure model for brain imaging data, in 2016, and we have continued its development since. One of the challenges with such a model is adapting the required algorithms to function within a decentralized framework. In this paper, we report on how we are solving this problem, along with our progress on several fronts, including the implementation of additional decentralized algorithms, user-interface enhancements, decentralized calculation of regression statistics, and complete pipeline specifications.
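
The abstract does not spell out the decentralized algorithms themselves, so the sketch below only illustrates one common pattern for decentralizing regression statistics: each site shares aggregate sufficient statistics (here XᵀX and Xᵀy) rather than raw subject-level data, and a coordinator combines them to reproduce the pooled least-squares fit. All names are illustrative; this is not COINSTAC code.

```python
import numpy as np

def site_summary(X, y):
    """Each site computes aggregate sufficient statistics locally;
    raw, potentially sensitive data never leave the site."""
    return X.T @ X, X.T @ y, len(y)

def aggregate_regression(summaries):
    """A coordinator combines per-site summaries to obtain the same
    least-squares coefficients a pooled analysis would produce."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    n = sum(s[2] for s in summaries)
    return np.linalg.solve(xtx, xty), n

# Two simulated "sites" with the same underlying model.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
summaries = []
for _ in range(2):
    X = rng.normal(size=(100, 3))
    y = X @ beta_true + rng.normal(scale=0.1, size=100)
    summaries.append(site_summary(X, y))
beta_hat, n_total = aggregate_regression(summaries)
```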


2021 ◽  
Vol 15 ◽  
Author(s):  
Tinashe M. Tapera ◽  
Matthew Cieslak ◽  
Max Bertolero ◽  
Azeez Adebimpe ◽  
Geoffrey K. Aguirre ◽  
...  

The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is the de facto standard for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are limited on Flywheel. To address these challenges, we developed “FlywheelTools,” a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.
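
As a concrete illustration of what curation into BIDS targets, BIDS prescribes a fixed directory layout and filename entities (subject, session, task, run, suffix). The helper below is purely illustrative and not part of FlywheelTools; it only shows the kind of path that tools such as fw-heudiconv generate automatically from scanner metadata.

```python
from pathlib import Path

def bids_func_path(root, subject, session, task, run):
    """Build a BIDS-style path for a functional run, e.g.
    sub-01/ses-1/func/sub-01_ses-1_task-rest_run-1_bold.nii.gz"""
    stem = f"sub-{subject}_ses-{session}_task-{task}_run-{run}_bold.nii.gz"
    return Path(root) / f"sub-{subject}" / f"ses-{session}" / "func" / stem

print(bids_func_path("/data/study", "01", "1", "rest", "1"))
```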


2019 ◽  
Vol 3 (Supplement_1) ◽  
pp. S23-S24 ◽
Author(s):  
Kendra L Seaman

Abstract In concert with broader efforts to increase the reliability of social science research, there are several efforts to increase transparency and reproducibility in neuroimaging. The large-scale nature of neuroimaging data and constantly evolving analysis tools can make transparency challenging. I will describe emerging tools used to document, organize, and share behavioral and neuroimaging data. These tools include: (1) preregistration of neuroimaging data sets, which increases openness and protects researchers from suspicions of p-hacking; (2) conversion of neuroimaging data into a standardized format (the Brain Imaging Data Structure, BIDS), which enables standardized scripts to process and share neuroimaging data; and (3) sharing of final neuroimaging results on NeuroVault, which allows the community to perform rapid meta-analyses. Using these tools improves workflows within labs, improves the overall quality of our science, and provides a potential model for other disciplines using large-scale data.


2020 ◽  
Author(s):  
Christopher R Madan

We are now in a time of readily available brain imaging data. Not only are researchers sharing data more than ever before, but large-scale data-collection initiatives are also underway with the vision that many future researchers will use the data for secondary analyses. Here I provide an overview of available datasets and some example use cases. Example use cases include examining individual differences, obtaining more robust findings, reproducibility (both through public input data and through availability as a replication sample), and methods development. I further discuss a variety of considerations associated with using existing data and the opportunities associated with large datasets. Suggestions for further reading on general neuroimaging and topic-specific discussions are also provided.


2021 ◽  
Author(s):  
Tinashe M. Tapera ◽  
Matthew Cieslak ◽  
Max Bertolero ◽  
Azeez Adebimpe ◽  
Geoffrey K. Aguirre ◽  
...  

Abstract The recent and growing focus on reproducibility in neuroimaging studies has led many major academic centers to use cloud-based imaging databases for storing, analyzing, and sharing complex imaging data. Flywheel is one such database platform that offers easily accessible, large-scale data management, along with a framework for reproducible analyses through containerized pipelines. The Brain Imaging Data Structure (BIDS) is a data storage specification for neuroimaging data, but curating neuroimaging data into BIDS can be a challenging and time-consuming task. In particular, standard solutions for BIDS curation are not designed for use on cloud-based systems such as Flywheel. To address these challenges, we developed “FlywheelTools”, a software toolbox for reproducible data curation and manipulation on Flywheel. FlywheelTools includes two elements: fw-heudiconv, for heuristic-driven curation of data into BIDS, and flaudit, which audits and inventories projects on Flywheel. Together, these tools accelerate reproducible neuroscience research on the widely used Flywheel platform.


2021 ◽  
Author(s):  
Christopher R. Madan

Abstract We are now in a time of readily available brain imaging data. Not only are researchers sharing data more than ever before, but large-scale data-collection initiatives are also underway with the vision that many future researchers will use the data for secondary analyses. Here I provide an overview of available datasets and some example use cases. Example use cases include examining individual differences, obtaining more robust findings, reproducibility (both through public input data and through availability as a replication sample), and methods development. I further discuss a variety of considerations associated with using existing data and the opportunities associated with large datasets. Suggestions for further reading on general neuroimaging and topic-specific discussions are also provided.


Author(s):  
Jun Huang ◽  
Linchuan Xu ◽  
Jing Wang ◽  
Lei Feng ◽  
Kenji Yamanishi

Existing multi-label learning (MLL) approaches mainly assume that all labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially in large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only reveal interesting knowledge but also help build a more robust learning model. In this paper, a novel approach named DLCL (Discovering Latent Class Labels for MLL) is proposed, which can not only discover the latent labels in the training data but also predict new instances with both the latent and known labels simultaneously. Extensive experiments show competitive performance of DLCL against other state-of-the-art MLL approaches.
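
The abstract does not describe the DLCL algorithm itself, so the snippet below only illustrates the conventional MLL setting that DLCL extends: a classifier trained on a fixed, fully observed label matrix can only ever predict labels from that known set. It uses scikit-learn as an assumed, generic toolkit and is not an implementation of DLCL.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Conventional MLL: the label matrix Y covers a fixed set of known labels.
X, Y = make_multilabel_classification(n_samples=500, n_features=20,
                                      n_classes=5, random_state=0)
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
Y_pred = clf.predict(X)   # predictions are limited to the known label set
```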


2021 ◽  
Vol 12 ◽  
Author(s):  
Rayus Kuplicki ◽  
James Touthang ◽  
Obada Al Zoubi ◽  
Ahmad Mayeli ◽  
Masaya Misaki ◽  
...  

Neuroscience studies require considerable bioinformatic support and expertise. Numerous high-dimensional and multimodal datasets must be preprocessed and integrated to create robust and reproducible analysis pipelines. We describe a set of common data elements and a scalable data management infrastructure that allow multiple analytics workflows to facilitate the preprocessing, analysis, and sharing of large-scale, multi-level data. The process uses the Brain Imaging Data Structure (BIDS) format and supports MRI, fMRI, EEG, clinical, and laboratory data. The infrastructure provides support for other datasets, such as Fitbit data, and gives developers the flexibility to customize the integration of new types of data. Exemplar results from 200+ participants and 11 different pipelines demonstrate the utility of the infrastructure.
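
One practical benefit of storing everything in BIDS is that analysis workflows can query the dataset programmatically instead of hard-coding paths. The snippet below is a generic illustration using the pybids library (assumed to be installed, with a BIDS-valid dataset at an illustrative path); it is not the authors' infrastructure.

```python
from bids import BIDSLayout  # pip install pybids

# Index a BIDS-formatted dataset and query it rather than hard-coding paths.
layout = BIDSLayout("/data/bids_dataset")
subjects = layout.get_subjects()
bold_files = layout.get(subject=subjects[0], suffix="bold",
                        extension=".nii.gz", return_type="filename")
```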


2021 ◽  
Vol 118 (47) ◽  
pp. e2109889118 ◽
Author(s):  
Christopher W. Lynn ◽  
Eli J. Cornblath ◽  
Lia Papadopoulos ◽  
Maxwell A. Bertolero ◽  
Danielle S. Bassett

Living systems break detailed balance at small scales, consuming energy and producing entropy in the environment to perform molecular and cellular functions. However, it remains unclear how broken detailed balance manifests at macroscopic scales and how such dynamics support higher-order biological functions. Here we present a framework to quantify broken detailed balance by measuring entropy production in macroscopic systems. We apply our method to the human brain, an organ whose immense metabolic consumption drives a diverse range of cognitive functions. Using whole-brain imaging data, we demonstrate that the brain nearly obeys detailed balance when at rest, but strongly breaks detailed balance when performing physically and cognitively demanding tasks. Using a dynamic Ising model, we show that these large-scale violations of detailed balance can emerge from fine-scale asymmetries in the interactions between elements, a known feature of neural systems. Together, these results suggest that violations of detailed balance are vital for cognition and provide a general tool for quantifying entropy production in macroscopic systems.
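
A common way to quantify broken detailed balance in a discrete time series is to compare the probabilities of forward and reverse transitions between states; the entropy production rate is the Kullback-Leibler divergence between them and vanishes under detailed balance. The sketch below implements that generic estimator on a pre-discretized state sequence; it is a simplified stand-in for, not a reproduction of, the authors' whole-brain analysis, which first clusters imaging data into states.

```python
import numpy as np

def entropy_production_rate(states, n_states):
    """Estimate entropy production per step for a discrete state sequence as
    sum_{i,j} P(i->j) * log(P(i->j) / P(j->i)), where P(i->j) is the joint
    probability of observing the transition i -> j. The estimate is zero when
    forward and reverse transitions are equally likely (detailed balance)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    p = counts / counts.sum()
    ep = 0.0
    for i in range(n_states):
        for j in range(n_states):
            # Skip unobserved reverse transitions (formally infinite terms).
            if p[i, j] > 0 and p[j, i] > 0:
                ep += p[i, j] * np.log(p[i, j] / p[j, i])
    return ep

# A biased cycle (mostly 0 -> 1 -> 2 -> 0) strongly breaks detailed balance,
# whereas an unbiased walk on the same states approximately obeys it.
rng = np.random.default_rng(0)

def biased_walk(p_forward, n=5000):
    s, seq = 0, [0]
    for _ in range(n):
        s = (s + 1) % 3 if rng.random() < p_forward else (s - 1) % 3
        seq.append(s)
    return seq

print(entropy_production_rate(biased_walk(0.9), 3))  # clearly positive
print(entropy_production_rate(biased_walk(0.5), 3))  # close to zero
```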


2013 ◽  
Vol 19 (6) ◽  
pp. 659-667 ◽  
Author(s):  
A Di Martino ◽  
C-G Yan ◽  
Q Li ◽  
E Denio ◽  
F X Castellanos ◽  
...  
