BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis

Mapping Intimacies ◽

10.31219/osf.io/j4sbc ◽

2019 ◽

Cited By ~ 2

Author(s):

Manoj Kumar ◽

Cameron Thomas Ellis ◽

Qihong Lu ◽

Hejia Zhang ◽

Mihai Capota ◽

...

Keyword(s):

Machine Learning ◽

Functional Connectivity ◽

Open Source ◽

Programming Languages ◽

High Performance ◽

Large Scale ◽

Markov Models ◽

Matrix Analysis ◽

fMRI Analysis ◽

User Friendly

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although usable by expert researchers, novice users face a steep learning curve. These difficulties stem from the use of new programming languages (e.g., Python), learning how to apply machine-learning methods to high-dimensional fMRI data, and minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs we provide detailed guidance on high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons. 
These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.
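
As a rough illustration of one technique named in the abstract above, representational similarity analysis, the following minimal sketch builds a representational dissimilarity matrix (1 minus the Pearson correlation between condition patterns). It uses only the Python standard library and does not use the BrainIAK API; the function names `pearson` and `rdm` and the voxel values are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rdm(patterns):
    """Dissimilarity (1 - Pearson r) for every pair of condition patterns."""
    n = len(patterns)
    return [
        [0.0 if i == j else 1.0 - pearson(patterns[i], patterns[j])
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical voxel patterns, one list per condition (values are made up).
patterns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with the first pattern
    [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with the first pattern
]
matrix = rdm(patterns)
```

In this toy input the first two conditions have a dissimilarity near 0 and the first and third a dissimilarity near 2, the two extremes of the correlation-distance scale.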

BrainIAK: The Brain Imaging Analysis Kit

10.31219/osf.io/db2ev ◽

2020 ◽

Author(s):

Manoj Kumar ◽

Michael Anderson ◽

James Antony ◽

Christopher Baldassano ◽

Paula Pacheco Brooks ◽

...

Keyword(s):

Brain Imaging ◽

High Performance ◽

Markov Models ◽

Real Data ◽

Matrix Analysis ◽

Imaging Analysis ◽

Neural Basis ◽

Noise Characteristics ◽

fMRI Analysis ◽

The Brain

Functional magnetic resonance imaging (fMRI) offers a rich source of data for studying the neural basis of cognition. Here, we describe the Brain Imaging Analysis Kit (BrainIAK), an open-source, free Python package that provides computationally-optimized solutions to key problems in advanced fMRI analysis. A variety of techniques are presently included in BrainIAK: intersubject correlation (ISC) and intersubject functional connectivity (ISFC), functional alignment via the shared response model (SRM), full correlation matrix analysis (FCMA), a Bayesian version of representational similarity analysis (BRSA), event segmentation using hidden Markov models, topographic factor analysis (TFA), inverted encoding models (IEM), an fMRI data simulator that uses noise characteristics from real data (fmrisim), and some emerging methods. These techniques have been optimized to leverage the efficiencies of high performance compute (HPC) clusters, and the same code can be seamlessly transferred from a laptop to a cluster. For each of the aforementioned techniques, we describe the data analysis problem that the technique is meant to solve, and how it solves that problem; we also include an example Jupyter notebook for each technique and an annotated bibliography of papers that have used and/or described that technique. In addition to the sections describing various analysis techniques in BrainIAK, we have included sections describing the future applications of BrainIAK to real-time fMRI, tutorials that we have developed and shared online to facilitate learning the techniques in BrainIAK, computational innovations in BrainIAK, and how to contribute to BrainIAK. We hope that this manuscript helps readers to understand how BrainIAK might be useful in their research.
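
To make intersubject correlation (ISC) concrete, here is a minimal leave-one-out sketch in plain Python: each subject's time series is correlated with the average of the remaining subjects. This is an illustration under simplifying assumptions (a single voxel or region, no statistical testing), not BrainIAK's implementation; the names `pearson` and `leave_one_out_isc` and the time series are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def leave_one_out_isc(subjects):
    """Correlate each subject's time series with the mean of all others."""
    n_subj = len(subjects)
    n_time = len(subjects[0])
    iscs = []
    for s in range(n_subj):
        others = [subjects[k] for k in range(n_subj) if k != s]
        mean_others = [
            sum(o[t] for o in others) / len(others) for t in range(n_time)
        ]
        iscs.append(pearson(subjects[s], mean_others))
    return iscs


# Hypothetical single-region time series for three subjects (made-up values).
subjects = [
    [0.0, 1.0, 0.0, -1.0, 0.0, 1.0],
    [0.1, 0.9, 0.2, -1.1, -0.1, 1.2],
    [1.0, 0.0, -1.0, 0.0, 1.0, 0.0],  # a different response profile
]
isc_values = leave_one_out_isc(subjects)
```

With these inputs the first subject, whose response resembles the second subject's, gets a positive ISC, while the third subject, whose profile differs from the group, gets a negative one.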

Formal semantics and high performance in declarative machine learning using Datalog

The VLDB Journal ◽

10.1007/s00778-021-00665-6 ◽

2021 ◽

Author(s):

Jin Wang ◽

Jiacheng Wu ◽

Mingda Li ◽

Jiaqi Gu ◽

Ariyam Das ◽

...

Keyword(s):

Machine Learning ◽

High Performance ◽

Large Scale ◽

Formal Semantics ◽

Distributed Data ◽

Recursive Programs ◽

Diverse Application ◽

User Friendly ◽

Performance Gains ◽

New Framework

With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely declarative ML framework with a Datalog query interface. We show that using aggregates in recursive Datalog programs entails a concise expression of ML applications, while providing a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones. We further provide specialized compilation and planning techniques for semi-naive fixpoint computation in the presence of aggregates and optimization strategies that are effective on diverse recursive programs and distributed data platforms. To test and demonstrate these research advances, we have developed a powerful and user-friendly system on top of Apache Spark. Extensive evaluations on large-scale datasets illustrate that this approach will achieve promising performance gains while improving both programming flexibility and ease of development and deployment for ML applications.
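
The idea of semi-naive fixpoint computation with an aggregate inside recursion can be sketched on a textbook example: single-source shortest paths, where a `min` aggregate is applied in the recursive rule. The Python below is a single-machine illustration only, not the authors' system or its Spark-based planner; the function name `shortest_paths` and the edge facts are hypothetical. Each round joins only the facts that changed in the previous round (the delta) with the edge relation, and stops when the delta is empty.

```python
def shortest_paths(edges, source):
    """Semi-naive evaluation of a recursive rule with a min aggregate.

    edges: iterable of (src, dst, cost) facts with non-negative costs.
    Returns a dict mapping each reachable node to its minimum path cost.
    """
    # Index edge facts by source node so the join with delta is a lookup.
    out_edges = {}
    for src, dst, cost in edges:
        out_edges.setdefault(src, []).append((dst, cost))

    best = {source: 0}   # current aggregate value per group (node)
    delta = {source: 0}  # facts that changed in the previous iteration
    while delta:
        new_delta = {}
        for node, dist in delta.items():
            for dst, cost in out_edges.get(node, []):
                candidate = dist + cost
                # The min aggregate keeps a candidate only if it improves
                # on the value currently stored for that node.
                if candidate < best.get(dst, float("inf")):
                    best[dst] = candidate
                    new_delta[dst] = candidate
        delta = new_delta
    return best


# Hypothetical edge facts: (source, destination, cost).
edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 5)]
distances = shortest_paths(edges, "a")
```

Restricting each round to the delta avoids re-deriving facts that cannot change, which is the point of semi-naive evaluation over naive re-evaluation of the whole relation.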

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
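
The notion of Pareto-optimal trade-offs between runtime and energy can be illustrated with a small sketch that filters out dominated configurations. This is not the paper's predictive model, which estimates such trade-offs from a small number of runs; it only shows how a trade-off front is read from measured or predicted points. The function name `pareto_front`, the configuration labels, and all numbers are hypothetical.

```python
def pareto_front(configs):
    """Keep configurations not dominated in (runtime, energy).

    Lower is better for both metrics. A configuration is dominated if
    another one is no worse on both metrics and strictly better on one.
    """
    front = []
    for name, runtime, energy in configs:
        dominated = any(
            (r <= runtime and e <= energy) and (r < runtime or e < energy)
            for _, r, e in configs
        )
        if not dominated:
            front.append((name, runtime, energy))
    return sorted(front, key=lambda c: c[1])


# Hypothetical measurements: (label, runtime in seconds, energy in kJ).
measurements = [
    ("threads=32,freq=high", 100.0, 50.0),
    ("threads=32,freq=mid", 110.0, 35.0),
    ("threads=16,freq=mid", 140.0, 40.0),  # slower and costlier than the row above
    ("threads=16,freq=low", 160.0, 30.0),
]
front = pareto_front(measurements)
```

Every point left on the front represents a different balance: moving along it trades some runtime for lower energy use, which is the kind of option the abstract describes for AMG.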

High-performance Machine Learning in Enabling Large-scale Load Analysis Considering Class Imbalance and Frequency Domain Characteristics

2020 IEEE Sustainable Power and Energy Conference (iSPEC) ◽

10.1109/ispec50848.2020.9350922 ◽

2020 ◽

Author(s):

Xi Wang ◽

Quan Tang ◽

Haiyan Wang ◽

Ruiguang Ma ◽

Zizhuo Tang

Keyword(s):

Machine Learning ◽

Frequency Domain ◽

High Performance ◽

Large Scale ◽

Class Imbalance ◽

Load Analysis

Machine learning of serum metabolic patterns encodes early-stage lung adenocarcinoma

10.21203/rs.3.pex-963/v1 ◽

2021 ◽

Author(s):

Lin Huang ◽

Kun Qian

Keyword(s):

Machine Learning ◽

Lung Adenocarcinoma ◽

Cancer Detection ◽

High Performance ◽

Large Scale ◽

Early Cancer ◽

Early Stage ◽

Early Cancer Detection ◽

Ionization Mass ◽

Efficient Test

Early cancer detection greatly increases the chances for successful treatment, but available diagnostics for some tumours, including lung adenocarcinoma (LA), are limited. An ideal early-stage diagnosis of LA for large-scale clinical use must address quick detection, low invasiveness, and high performance. Here, we conduct machine learning of serum metabolic patterns to detect early-stage LA. We extract direct metabolic patterns by the optimized ferric particle-assisted laser desorption/ionization mass spectrometry within 1 second using only 50 nL of serum. We define a metabolic range of 100-400 Da with 143 m/z features. We diagnose early-stage LA with a sensitivity of ~70-90% and a specificity of ~90-93% through the sparse regression machine learning of patterns. We identify a biomarker panel of seven metabolites and relevant pathways to distinguish early-stage LA from controls (p < 0.05). Our approach advances the design of metabolic analysis for early cancer detection and holds promise as an efficient test for low-cost rollout to clinics.
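
For readers less familiar with the two diagnostic metrics reported above, the following sketch computes sensitivity (the fraction of true cases detected) and specificity (the fraction of controls correctly cleared) from binary labels. The labels are made up for illustration and are not the study's data; the function name `sensitivity_specificity` is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels: 4 cases followed by 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```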

Higher-level geophysical modelling

10.5194/egusphere-egu21-2127 ◽

2021 ◽

Author(s):

Roman Nuterman ◽

Dion Häfner ◽

Markus Jochum

Keyword(s):

Machine Learning ◽

Programming Languages ◽

High Performance ◽

Ocean Model ◽

User Friendliness ◽

Model Code ◽

Building Models ◽

Fortran Implementation ◽

High Level ◽

New Generation

Until recently, our pure Python, primitive equation ocean model Veros has been about 1.5x slower than a corresponding Fortran implementation. But thanks to a thriving scientific and machine learning library ecosystem, tremendous speed-ups on GPU, and to a lesser degree CPU, are within reach. Leveraging Google's JAX library, we find that our Python model code can reach a 2-5 times higher energy efficiency on GPU compared to a traditional Fortran model. Therefore, we propose a new generation of geophysical models: one that combines high-level abstractions and user friendliness on one hand, and that leverages modern developments in high-performance computing and machine learning research on the other hand. We discuss what there is to gain from building models in high-level programming languages, what we have achieved in Veros, and where we see the modelling community heading in the future.

Logging Analysis and Prediction in Open Source Java Project

Research Anthology on Usage and Development of Open Source Software ◽

10.4018/978-1-7998-9158-1.ch038 ◽

2021 ◽

pp. 733-761

Author(s):

Sangeeta Lal ◽

Neetu Sardana ◽

Ashish Sureka

Keyword(s):

Machine Learning ◽

Content Analysis ◽

Software Development ◽

Anomaly Detection ◽

Open Source ◽

Large Scale ◽

Source Code ◽

Scale Analysis ◽

Large Scale Analysis ◽

Research Questions

Log statements present in source code provide important information to the software developers because they are useful in various software development activities such as debugging, anomaly detection, and remote issue resolution. Most of the previous studies on logging analysis and prediction provide insights and results after analyzing only a few code constructs. In this chapter, the authors perform an in-depth, focused, and large-scale analysis of logging code constructs at two levels: the file level and catch-blocks level. They answer several research questions related to statistical and content analysis. Statistical and content analysis reveals the presence of differentiating properties among logged and nonlogged code constructs. Based on these findings, the authors propose a machine-learning-based model for catch-blocks logging prediction. The machine-learning-based model is found to be effective in catch-blocks logging prediction.

GraphVar 2.0: A user-friendly toolbox for machine learning on functional connectivity measures

Journal of Neuroscience Methods ◽

10.1016/j.jneumeth.2018.07.001 ◽

2018 ◽

Vol 308 ◽

pp. 21-33 ◽

Cited By ~ 5

Author(s):

L. Waller ◽

A. Brovkin ◽

L. Dorfschmidt ◽

D. Bzdok ◽

H. Walter ◽

...

Keyword(s):

Machine Learning ◽

Functional Connectivity ◽

User Friendly

Large-scale machine learning based on functional networks for biomedical big data with high performance computing platforms

Journal of Computational Science ◽

10.1016/j.jocs.2015.09.008 ◽

2015 ◽

Vol 11 ◽

pp. 69-81 ◽

Cited By ~ 32

Author(s):

Emad Elsebakhi ◽

Frank Lee ◽

Eric Schendel ◽

Anwar Haque ◽

Nagarajan Kathireason ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Functional Networks ◽

Computing Platforms ◽

Performance Computing

QuickStitch for seamless stitching of confocal mosaics through high-pass filtering and recursive normalization

10.1101/075440 ◽

2016 ◽

Author(s):

Pavel A. Brodskiy ◽

Paulina M. Eberts ◽

Cody Narciso ◽

Jochen Kursawe ◽

Alexander Fletcher ◽

...

Keyword(s):

Open Source ◽

Large Scale ◽

Imaging System ◽

Tissue Level ◽

Cellular Level ◽

Specific Information ◽

Open Source Tool ◽

The Individual ◽

User Friendly ◽

High Pass

Fluorescence micrographs naturally exhibit darkening around their edges (vignetting), which makes seamless stitching challenging. If vignetting is not corrected for, a stitched image will have visible seams where the individual images (tiles) overlap, introducing a systematic error into any quantitative analysis of the image. Although multiple vignetting correction methods exist, there remains no open-source tool that robustly handles large 2D immunofluorescence-based mosaic images. Here, we develop and validate QuickStitch, a tool that applies a recursive normalization algorithm to stitch large-scale immunofluorescence-based mosaics without incurring vignetting seams. We demonstrate how the tool works successfully for tissues of differing size, morphology, and fluorescence intensity. QuickStitch requires no specific information about the imaging system. It is provided as an open-source tool that is both user friendly and extensible, allowing straightforward incorporation into existing image processing pipelines. This enables studies that require accurate segmentation and analysis of high-resolution datasets when parameters of interest include both cellular-level phenomena and larger tissue-level regions of interest.
