BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis

Author(s):  
Manoj Kumar ◽  
Cameron Thomas Ellis ◽  
Qihong Lu ◽  
Hejia Zhang ◽  
Mihai Capota ◽  
...  

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although expert researchers can use these tools, novice users face a steep learning curve. These difficulties stem from the need to learn new programming languages (e.g., Python) and to apply machine-learning methods to high-dimensional fMRI data, and from minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate them with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs, we provide detailed guidance on high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons.
These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.

2020 ◽  
Author(s):  
Manoj Kumar ◽  
Michael Anderson ◽  
James Antony ◽  
Christopher Baldassano ◽  
Paula Pacheco Brooks ◽  
...  

Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally-optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEM), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high performance compute (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve, and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.
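As a rough illustration of one technique named above, intersubject correlation (ISC) is often computed in a leave-one-out fashion: each subject's time course is correlated with the average time course of all other subjects. The sketch below uses only the Python standard library and is not BrainIAK's implementation; the function names `pearson` and `leave_one_out_isc` and the toy data are assumptions made for this example, and it handles a single voxel or region at a time.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither sequence is constant (non-zero variance).
    return cov / sqrt(var_x * var_y)


def leave_one_out_isc(timecourses):
    """Correlate each subject with the mean of all other subjects.

    timecourses: list of per-subject time series (one voxel or ROI),
    all of the same length. Returns one ISC value per subject.
    """
    iscs = []
    for i, held_out in enumerate(timecourses):
        others = [tc for j, tc in enumerate(timecourses) if j != i]
        # Average the remaining subjects time point by time point.
        mean_others = [sum(vals) / len(others) for vals in zip(*others)]
        iscs.append(pearson(held_out, mean_others))
    return iscs


# Hypothetical toy data: three subjects who respond alike, one who does not.
subjects = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
]
isc_values = leave_one_out_isc(subjects)
```

In real data the same computation would be repeated for every voxel, which is where an optimized and parallelized implementation becomes valuable.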


2021 ◽  
Author(s):  
Jin Wang ◽  
Jiacheng Wu ◽  
Mingda Li ◽  
Jiaqi Gu ◽  
Ariyam Das ◽  
...  

With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.
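For readers unfamiliar with semi-naive fixpoint computation, the following sketch shows the basic idea on the textbook transitive-closure program, without aggregates. It is a plain-Python illustration under stated assumptions, not the system described in the abstract: the function name `transitive_closure` and the example edges are hypothetical. The key point is that each round joins only the newly derived facts (the delta) against the base relation instead of re-deriving everything.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of the recursive Datalog program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    edges: iterable of (source, target) pairs.
    Returns the set of all derivable path facts.
    """
    edges = set(edges)

    # Index edge(Y, Z) by Y so the join in the recursive rule is a lookup.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous round

    while delta:
        # Apply the recursive rule to the new facts only.
        derived = {
            (x, z)
            for (x, y) in delta
            for z in successors.get(y, ())
        }
        # Keep only facts not seen before; stop when nothing new appears.
        delta = derived - path
        path |= delta

    return path


# Hypothetical example: a three-edge chain.
closure = transitive_closure([(1, 2), (2, 3), (3, 4)])
```

A naive evaluator would rejoin the full `path` relation on every round; restricting the join to `delta` gives the same fixpoint with less repeated work.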


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
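To make the notion of a Pareto-optimal trade-off concrete, the sketch below filters a set of measured configurations down to those that no other configuration beats on both runtime and energy. It is a minimal illustration, not the authors' statistical or machine learning models; the function name `pareto_front` and the sample measurements are assumptions made for this example.

```python
def pareto_front(points):
    """Return the non-dominated (runtime, energy) pairs, sorted by runtime.

    A point is dominated if another point is no worse on both axes
    and strictly better on at least one (lower is better on both).
    """
    front = []
    for i, (t_i, e_i) in enumerate(points):
        dominated = any(
            (t_j <= t_i and e_j <= e_i) and (t_j < t_i or e_j < e_i)
            for j, (t_j, e_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((t_i, e_i))
    return sorted(front)


# Hypothetical measurements: (runtime in seconds, energy in joules),
# one pair per user-controllable setting.
runs = [(10, 100), (11, 60), (12, 80), (15, 55), (10, 120)]
trade_offs = pareto_front(runs)
```

Each point left on the front is a defensible choice: moving along it trades some performance for lower energy use, which is the kind of option the abstract describes for AMG.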


2021 ◽  
Author(s):  
Lin Huang ◽  
Kun Qian

Early cancer detection greatly increases the chances for successful treatment, but available diagnostics for some tumours, including lung adenocarcinoma (LA), are limited. An ideal early-stage diagnosis of LA for large-scale clinical use must address quick detection, low invasiveness, and high performance. Here, we conduct machine learning of serum metabolic patterns to detect early-stage LA. We extract direct metabolic patterns by optimized ferric particle-assisted laser desorption/ionization mass spectrometry within 1 second, using only 50 nL of serum. We define a metabolic range of 100-400 Da with 143 m/z features. We diagnose early-stage LA with a sensitivity of ~70-90% and a specificity of ~90-93% through sparse regression machine learning of the patterns. We identify a biomarker panel of seven metabolites and relevant pathways to distinguish early-stage LA from controls (p < 0.05). Our approach advances the design of metabolic analysis for early cancer detection and holds promise as an efficient test for low-cost rollout to clinics.
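The sensitivity and specificity figures quoted above follow the standard definitions: sensitivity is the fraction of true cases that are detected, and specificity is the fraction of controls correctly ruled out. The sketch below computes both from binary labels; the function name `sensitivity_specificity` and the labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from binary labels.

    y_true, y_pred: equal-length sequences of 0 (control) or 1 (case).
    Assumes at least one case and one control are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical labels: 4 cases and 6 controls.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, preds)
```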


2021 ◽  
Author(s):  
Roman Nuterman ◽  
Dion Häfner ◽  
Markus Jochum

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model.
Therefore, we propose a new generation of geophysical models: One that combines high-level abstractions and user friendliness on one hand, and that leverages modern developments in high-performance computing and machine learning research on the other hand.
We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.


Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.


2018 ◽  
Vol 308 ◽  
pp. 21-33 ◽  
Author(s):  
L. Waller ◽  
A. Brovkin ◽  
L. Dorfschmidt ◽  
D. Bzdok ◽  
H. Walter ◽  
...  

2016 ◽  
Author(s):  
Pavel A. Brodskiy ◽  
Paulina M. Eberts ◽  
Cody Narciso ◽  
Jochen Kursawe ◽  
Alexander Fletcher ◽  
...  

Fluorescence micrographs naturally exhibit darkening around their edges (vignetting), which makes seamless stitching challenging. If vignetting is not corrected for, a stitched image will have visible seams where the individual images (tiles) overlap, introducing a systematic error into any quantitative analysis of the image. Although multiple vignetting correction methods exist, there remains no open-source tool that robustly handles large 2D immunofluorescence-based mosaic images. Here, we develop and validate QuickStitch, a tool that applies a recursive normalization algorithm to stitch large-scale immunofluorescence-based mosaics without incurring vignetting seams. We demonstrate how the tool works successfully for tissues of differing size, morphology, and fluorescence intensity. QuickStitch requires no specific information about the imaging system. It is provided as an open-source tool that is both user friendly and extensible, allowing straightforward incorporation into existing image processing pipelines. This enables studies that require accurate segmentation and analysis of high-resolution datasets when parameters of interest include both cellular-level phenomena and larger tissue-level regions of interest.

