SIMNETS: a computationally efficient and scalable framework for identifying sub-networks of functionally similar neurons

2018
Author(s):  
Jacqueline B. Hynes ◽  
David M. Brandman ◽  
Jonas B. Zimmerman ◽  
John P. Donoghue ◽  
Carlos E. Vargas-Irwin

Abstract
Recent technological advances have made it possible to simultaneously record the activity of thousands of individual neurons in the cortex of awake behaving animals. However, the comparatively slower development of analytical tools capable of handling the scale and complexity of large-scale recordings is a growing problem for the field of neuroscience. We present the Similarity Networks (SIMNETS) algorithm: a computationally efficient and scalable method for identifying and visualizing sub-networks of functionally similar neurons within larger simultaneously recorded ensembles. While traditional approaches tend to group neurons according to the statistical similarities of inter-neuron spike patterns, our approach begins by mathematically capturing the intrinsic relationship between the spike train outputs of each neuron across experimental conditions, before any comparisons are made between neurons. This strategy estimates the intrinsic geometry of each neuron’s output space, allowing us to capture the information processing properties of each neuron in a common format that is easily compared between neurons. Dimensionality reduction tools are then used to map high-dimensional neuron similarity vectors into a low-dimensional space where functional groupings are identified using clustering and statistical techniques. SIMNETS makes minimal assumptions about single neuron encoding properties; is efficient enough to run on consumer-grade hardware (100 neurons < 4 s run-time); and has a computational complexity that scales near-linearly with neuron number. These properties make SIMNETS well-suited for examining large networks of neurons during complex behaviors. We validate the ability of our approach to detect statistically and physiologically meaningful functional groupings in a population of synthetic neurons with known ground truth, as well as in three publicly available datasets of ensemble recordings from primate primary visual and motor cortex and the rat hippocampal CA1 region.
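
As a concrete illustration of the pipeline, here is a minimal sketch in Python. It assumes spike trains are summarized by per-trial spike counts and uses an absolute count difference as the spike train comparison; the actual algorithm uses proper spike train metrics (e.g., Victor-Purpura distance), so treat this as a toy stand-in rather than the authors' implementation.

```python
# A toy SIMNETS-style pipeline (not the authors' implementation).
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_neurons, n_trials = 20, 50
# counts[i, t] = spike count of neuron i on trial t (synthetic stand-in data)
counts = rng.poisson(lam=rng.uniform(2, 10, size=(n_neurons, 1)),
                     size=(n_neurons, n_trials))

# Step 1: for each neuron, a trial-by-trial distance matrix, flattened to a vector.
vecs = []
for i in range(n_neurons):
    d = np.abs(counts[i][:, None] - counts[i][None, :])  # |count difference|
    vecs.append(d[np.triu_indices(n_trials, k=1)])
vecs = np.array(vecs)

# Step 2: neuron-neuron similarity = correlation between their distance vectors.
sim = np.corrcoef(vecs)

# Step 3: embed neurons in a low-dimensional space and cluster.
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(1 - sim)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coords)
```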

Author(s):  
Samuel Melton ◽  
Sharad Ramanathan

Abstract
Motivation: Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions.
Results: We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace.
Availability and implementation: https://github.com/smelton/SMD.
Supplementary information: Supplementary data are available at Bioinformatics online.
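
A rough sketch of the core idea under stated assumptions: cluster proposals come from repeated randomly seeded k-means runs, and the discriminators are L1-regularized logistic classifiers whose averaged coefficient magnitudes score features. The authors' actual proposal and averaging scheme differs in detail (see https://github.com/smelton/SMD).

```python
# Feature scoring by averaging discriminators over proposed clusterings.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def feature_scores(X, n_proposals=50, k=2, seed=0):
    rng = np.random.default_rng(seed)
    scores = np.zeros(X.shape[1])
    for _ in range(n_proposals):
        # Propose a cluster configuration (here: one randomly seeded k-means).
        labels = KMeans(n_clusters=k, n_init=1,
                        random_state=int(rng.integers(1 << 31))).fit_predict(X)
        # Train a sparse linear discriminator on the proposal.
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        clf.fit(X, labels)
        scores += np.abs(clf.coef_).sum(axis=0)  # per-feature discriminative weight
    return scores / n_proposals                  # high score = informative feature
```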


2017
Author(s):  
Peiran Gao ◽  
Eric Trautmann ◽  
Byron Yu ◽  
Gopal Santhanam ◽  
Stephen Ryu ◽  
...  

Abstract
In many experiments, neuroscientists tightly control behavior, record many trials, and obtain trial-averaged firing rates from hundreds of neurons in circuits containing billions of behaviorally relevant neurons. Dimensionality reduction methods reveal a striking simplicity underlying such multi-neuronal data: they can be reduced to a low-dimensional space, and the resulting neural trajectories in this space yield a remarkably insightful dynamical portrait of circuit computation. This simplicity raises profound and timely conceptual questions. What are its origins and its implications for the complexity of neural dynamics? How would the situation change if we recorded more neurons? When, if at all, can we trust dynamical portraits obtained from measuring an infinitesimal fraction of task relevant neurons? We present a theory that answers these questions, and test it using physiological recordings from reaching monkeys. This theory reveals conceptual insights into how task complexity governs both neural dimensionality and accurate recovery of dynamic portraits, thereby providing quantitative guidelines for future large-scale experimental design.
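
The kind of analysis the abstract refers to can be sketched as follows: PCA on trial-averaged rates yields low-dimensional neural trajectories, and the participation ratio gives one common dimensionality estimate. The data below are a synthetic stand-in for the reaching-task recordings.

```python
# PCA trajectories and participation-ratio dimensionality on toy rates.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_neurons, n_timepoints = 100, 200
latent = np.cumsum(rng.normal(size=(3, n_timepoints)), axis=1)  # 3 true dynamics
rates = (rng.normal(size=(n_neurons, 3)) @ latent
         + 0.1 * rng.normal(size=(n_neurons, n_timepoints)))

pca = PCA().fit(rates.T)                 # treat time points as samples
traj = pca.transform(rates.T)[:, :3]     # low-dimensional neural trajectory
lam = pca.explained_variance_
pr = lam.sum() ** 2 / (lam ** 2).sum()   # participation ratio, ~3 for this toy
```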


2018
Author(s):  
Damon H. May ◽  
Jeffrey Bilmes ◽  
William S. Noble

Abstract
Despite an explosion of data in public repositories, peptide mass spectra are usually analyzed by each laboratory in isolation, treating each experiment as if it has no relationship to any others. This approach fails to exploit the wealth of existing, previously analyzed mass spectrometry data. Others have jointly analyzed many mass spectra, often using clustering. However, mass spectra are not necessarily best summarized as clusters, and although new spectra can be added to existing clusters, clustering methods previously applied to mass spectra do not allow new clusters to be defined without completely re-clustering. As an alternative, we propose to train a deep neural network, called “GLEAMS,” to learn an embedding of spectra into a low-dimensional space in which spectra generated by the same peptide are close to one another. We demonstrate empirically the utility of this learned embedding by propagating annotations from labeled to unlabeled spectra. We further use GLEAMS to detect groups of unidentified, proximal spectra representing the same peptide, and we show how to use these spectral communities to reveal misidentified spectra and to characterize frequently observed but consistently unidentified molecular species. We provide a software implementation of our approach, along with a tool to quickly embed additional spectra using a pre-trained model, to facilitate large-scale analyses.
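
A hedged sketch of the annotation-propagation step in an embedded space: unlabeled spectra inherit the peptide label of their nearest labeled neighbor, subject to a distance threshold. The embeddings below are random stand-ins and the threshold is arbitrary; neither reflects the tool's actual API or calibration.

```python
# Nearest-neighbor label propagation in a spectrum embedding (toy stand-ins).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
labeled_emb = rng.normal(size=(1000, 32))   # embeddings of identified spectra
labels = rng.integers(0, 50, size=1000)     # peptide IDs of the labeled spectra
query_emb = rng.normal(size=(200, 32))      # embeddings of unidentified spectra

nn = NearestNeighbors(n_neighbors=1).fit(labeled_emb)
dist, idx = nn.kneighbors(query_emb)
# Propagate a label only when the neighbor is close enough (threshold arbitrary).
propagated = np.where(dist[:, 0] < 1.0, labels[idx[:, 0]], -1)  # -1 = no match
```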


Author(s):  
Shimei Pan ◽  
Tao Ding

Automated representation learning is behind many recent success stories in machine learning. It is often used to transfer knowledge learned from a large dataset (e.g., raw text) to tasks for which only a small number of training examples are available. In this paper, we review recent advances in learning to represent social media users in low-dimensional embeddings. The technology is critical for creating high-performance social media-based models of human traits and behavior, since the ground truth for assessing latent human traits and behavior is often expensive to acquire at a large scale. In this survey, we review typical methods for learning unified user embeddings from heterogeneous user data (e.g., combining social media text with images to learn a unified user representation). Finally, we point out some current issues and future directions.
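
For illustration only (not a method from the survey), one common fusion recipe concatenates per-modality embeddings and learns a low-dimensional linear projection; here a truncated SVD plays the role of a linear autoencoder, and all array shapes are hypothetical.

```python
# Fusing hypothetical text and image features into one user embedding.
import numpy as np

rng = np.random.default_rng(3)
text_emb = rng.normal(size=(500, 300))    # e.g., averaged word vectors per user
image_emb = rng.normal(size=(500, 128))   # e.g., pooled CNN features per user
X = np.concatenate([text_emb, image_emb], axis=1)

# Linear projection via truncated SVD: a 64-d unified user embedding.
X = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
user_emb = U[:, :64] * S[:64]             # each row is one user's representation
```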


Author(s):  
Andrew Brock ◽  
Theodore Lim ◽  
J. M. Ritchie ◽  
Nick Weston

Large-scale scene generation is a computationally intensive operation, and added complexities arise when dynamic content generation is required. We propose a system capable of generating virtual content from non-expert input. The proposed system uses a 3-dimensional variational autoencoder to interactively generate new virtual objects by interpolating between extant objects in a learned low-dimensional space, as well as by randomly sampling in that space. We present an interface that allows a user to intuitively explore the latent manifold, taking advantage of the network’s ability to perform algebra in the latent space to help infer context and generalize to previously unseen inputs.
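
The latent-space operations described above reduce to simple vector arithmetic. In this sketch, `decode` would be the trained 3D VAE's decoder and is left as a placeholder; the dimensions and coefficients are illustrative assumptions.

```python
# Latent-space interpolation, sampling, and vector algebra (illustrative only).
import numpy as np

rng = np.random.default_rng(4)
latent_dim = 32
z_a, z_b = rng.normal(size=latent_dim), rng.normal(size=latent_dim)

# Interpolation between two extant objects' latent codes.
path = [(1 - t) * z_a + t * z_b for t in np.linspace(0.0, 1.0, 8)]

# Random sampling from the prior generates novel objects.
z_new = rng.normal(size=latent_dim)

# Latent algebra: move partway along the direction from one object to another;
# decode(z) -- the trained decoder, omitted here -- would yield the 3D object.
z_edit = z_a + 0.5 * (z_b - z_a)
```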


2019
Author(s):  
Ricard Argelaguet ◽  
Damien Arnol ◽  
Danila Bredikhin ◽  
Yonatan Deloro ◽  
Britta Velten ◽  
...  

Abstract
Technological advances have enabled the joint analysis of multiple molecular layers at single cell resolution. At the same time, increased experimental throughput has facilitated the study of larger numbers of experimental conditions. While methods for analysing single-cell data that model the resulting structure of either of these dimensions are beginning to emerge, current methods do not account for complex experimental designs that include both multiple views (modalities or assays) and groups (conditions or experiments). Here we present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of structured single cell multi-modal data. MOFA+ builds upon a Bayesian Factor Analysis framework combined with fast GPU-accelerated stochastic variational inference. Similar to existing factor models, MOFA+ allows for interpreting variation in single-cell datasets by pooling information across cells and features to reconstruct a low-dimensional representation of the data. Uniquely, the model supports flexible group-level sparsity constraints that allow joint modelling of variation across multiple groups and views. To illustrate MOFA+, we applied it to single-cell data sets of different scales and designs, demonstrating practical advantages when analyzing datasets with complex group and/or view structure. In a multi-omics analysis of mouse gastrulation, this joint modelling reveals coordinated changes between gene expression and epigenetic variation associated with cell fate commitment.
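
Not the MOFA+ API: the toy numpy sketch below illustrates only the underlying structure, shared factors Z (cells x K) with view-specific weight matrices W, fitted here by alternating least squares rather than the paper's GPU-accelerated stochastic variational inference, and without the group-level sparsity priors.

```python
# Toy multi-view factor analysis (not MOFA+): shared factors, per-view weights.
import numpy as np

rng = np.random.default_rng(5)
n_cells, K = 300, 5
views = {"rna": 1000, "methylation": 400}            # features per view (toy)
Z_true = rng.normal(size=(n_cells, K))
Y = {v: Z_true @ rng.normal(size=(K, d)) + 0.1 * rng.normal(size=(n_cells, d))
     for v, d in views.items()}

# Alternating least squares in place of stochastic variational inference.
Z = rng.normal(size=(n_cells, K))
for _ in range(50):
    W = {v: np.linalg.lstsq(Z, Y[v], rcond=None)[0] for v in views}
    W_all = np.concatenate([W[v] for v in views], axis=1)
    Y_all = np.concatenate([Y[v] for v in views], axis=1)
    Z = np.linalg.lstsq(W_all.T, Y_all.T, rcond=None)[0].T
```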


2016
Author(s):  
Tsvi Tlusty ◽  
Albert Libchaber ◽  
Jean-Pierre Eckmann

How DNA is mapped to functional proteins is a basic question of living matter. We introduce and study a physical model of protein evolution which suggests a mechanical basis for this map. Many proteins rely on large-scale motion to function. We therefore treat proteins as learning amorphous matter that evolves towards such a mechanical function: Genes are binary sequences that encode the connectivity of the amino acid network that makes a protein. The gene is evolved until the network forms a shear band across the protein, which allows for long-range, soft modes required for protein function. The evolution reduces the high-dimensional sequence space to a low-dimensional space of mechanical modes, in accord with the observed dimensional reduction between genotype and phenotype of proteins. Spectral analysis of the space of 10^6 solutions shows a strong correspondence between the localization of mechanical modes around the shear band and the sequence structure. Specifically, our model shows how mutations are correlated among amino acids whose interactions determine the functional mode.
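
A cartoon of the model, under the assumption that a gene is a binary connectivity pattern on the amino acid network and that "function" is scored by the softness (smallness) of the lowest non-trivial eigenvalue of the network Laplacian; the published model's mechanics and fitness criterion are more detailed.

```python
# Greedy "evolution" of a binary connectivity gene toward a soft mechanical mode.
import numpy as np

rng = np.random.default_rng(6)
n = 30                                           # amino-acid sites
gene = np.triu(rng.integers(0, 2, size=(n, n)), 1)
gene = gene + gene.T                             # symmetric connectivity matrix

def softness(adj):
    lap = np.diag(adj.sum(axis=1)) - adj         # network Laplacian
    return np.linalg.eigvalsh(lap)[1]            # lowest non-trivial eigenvalue

for _ in range(200):                             # point mutations: flip one bond
    i, j = rng.integers(0, n, size=2)
    if i == j:
        continue
    trial = gene.copy()
    trial[i, j] = trial[j, i] = 1 - trial[i, j]
    if softness(trial) < softness(gene):         # keep mutations that soften
        gene = trial
```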


2021
Vol 12
Author(s):  
Nicholas Panchy ◽  
Kazuhide Watanabe ◽  
Tian Hong

Large-scale transcriptome data, such as single-cell RNA-sequencing data, have provided unprecedented resources for studying biological processes at the systems level. Numerous dimensionality reduction methods have been developed to visualize and analyze these transcriptome data. In addition, several existing methods allow inference of functional variations among samples using gene sets with known biological functions. However, it remains challenging to analyze transcriptomes with reduced dimensions that are interpretable in terms of the dimensions' directionalities, transferable to new data, and directly expose the contributions or associations of individual genes. In this study, we used gene set non-negative principal component analysis (gsPCA) and non-negative matrix factorization (gsNMF) to analyze large-scale transcriptome datasets. We found that these methods provide low-dimensional information about the progression of biological processes in a quantitative manner, and that their performance is comparable to existing functional variation analysis methods in terms of distinguishing multiple cell states and samples from multiple conditions. Remarkably, upon training with a subset of data, these methods allow predictions of locations in the functional space using data from experimental conditions that were not exposed to the models. Specifically, our models predicted the extent of progression and reversion for cells in the epithelial-mesenchymal transition (EMT) continuum. These methods revealed a conserved EMT program among multiple types of single cells and tumor samples. Finally, we demonstrate that this approach is broadly applicable to data and gene sets beyond EMT and provide several recommendations on the choice between the two linear methods and the optimal algorithmic parameters. Our results show that simple constrained matrix decomposition can produce low-dimensional information in a functionally interpretable and transferable space, and can be widely useful for analyzing large-scale transcriptome data.
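
A minimal sketch of the gene-set NMF step, assuming a cells-by-genes expression matrix and a known gene-set index (`emt_genes` is hypothetical); the authors' gsNMF adds constraints and parameter choices beyond this plain scikit-learn call.

```python
# Gene-set NMF with scikit-learn (simplified relative to gsNMF).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(7)
expr = rng.random(size=(500, 2000))                   # toy non-negative cells x genes
emt_genes = rng.choice(2000, size=80, replace=False)  # hypothetical gene set

model = NMF(n_components=2, init="nndsvda", max_iter=500, random_state=0)
scores = model.fit_transform(expr[:, emt_genes])  # per-cell functional coordinates
loadings = model.components_                      # per-gene contributions

# Transferability: project held-out cells into the same trained space.
new_scores = model.transform(rng.random(size=(50, 2000))[:, emt_genes])
```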


2020
Vol 2020
pp. 1-13
Author(s):  
Yunfang Chen ◽  
Li Wang ◽  
Dehao Qi ◽  
Tinghuai Ma ◽  
Wei Zhang

The large-scale and complex structure of real networks brings enormous challenges to traditional community detection methods. In order to detect community structure in large-scale networks more accurately and efficiently, we propose a community detection algorithm based on a network embedding representation method. Firstly, to address the sparsity of network data, we use the DeepWalk model to embed the high-dimensional network into a low-dimensional space that preserves topological information. Then, the low-dimensional data are processed, with each node treated as a sample and each dimension of the node as a feature. Finally, the samples are fed into a Gaussian mixture model (GMM), and, in order to learn the number of communities automatically, variational inference is introduced into the GMM. Experimental results on the DBLP dataset show that the proposed method discovers communities in large-scale networks more effectively. Further analysis of the detected community structure reveals the organizational characteristics within communities.
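
The pipeline can be condensed as follows, assuming networkx, gensim, and scikit-learn are available: truncated random walks plus skip-gram stand in for DeepWalk, and BayesianGaussianMixture performs the variational inference that prunes unneeded mixture components. The karate club graph is a small stand-in for a DBLP-scale network.

```python
# DeepWalk-style embedding + variational GMM clustering on a small graph.
import networkx as nx
import numpy as np
from gensim.models import Word2Vec
from sklearn.mixture import BayesianGaussianMixture

G = nx.karate_club_graph()                  # small stand-in for a DBLP-scale graph
rng = np.random.default_rng(8)

def random_walk(g, start, length=20):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(list(g.neighbors(walk[-1]))))
    return [str(n) for n in walk]

walks = [random_walk(G, n) for n in G.nodes() for _ in range(10)]
emb = Word2Vec(walks, vector_size=16, window=5, min_count=0, sg=1, epochs=5)

X = np.array([emb.wv[str(n)] for n in G.nodes()])
bgmm = BayesianGaussianMixture(n_components=10, random_state=0).fit(X)
communities = bgmm.predict(X)               # effective number of clusters < 10
```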


2021
Vol 4
Author(s):  
Jacopo De Stefani ◽  
Gianluca Bontempi

State-of-the-art multivariate forecasting methods are restricted to low-dimensional tasks, linear dependencies and short horizons. Technological advances (notably the big data revolution) are instead shifting the focus to problems characterized by a large number of variables, non-linear dependencies and long forecasting horizons. In the last few years, the majority of the best performing techniques for multivariate forecasting have been based on deep-learning models. However, such models are characterized by high requirements in terms of data availability and computational resources and suffer from a lack of interpretability. To cope with the limitations of these methods, we propose an extension to the DFML framework, a hybrid forecasting technique inspired by the Dynamic Factor Model (DFM) approach, a successful forecasting methodology in econometrics. This extension improves the capabilities of the DFM approach by implementing and assessing both linear and non-linear factor estimation techniques, as well as model-driven and data-driven factor forecasting techniques. We assess several method integrations within the DFML, and we show that the proposed technique provides competitive results both in terms of forecasting accuracy and computational efficiency on multiple very large-scale (>10^2 variables and >10^3 samples) real forecasting tasks.
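
A simplified, DFM-style sketch of the linear variant only (PCA factor estimation plus a per-factor autoregressive forecaster); the DFML framework discussed above also covers non-linear and data-driven alternatives, which this sketch omits.

```python
# Linear DFM-style pipeline: PCA factors + autoregressive factor forecasts.
import numpy as np
from sklearn.decomposition import PCA
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(9)
T, n_vars, k = 500, 120, 3
F = np.cumsum(rng.normal(size=(T, k)), axis=0)   # latent factor dynamics
Y = F @ rng.normal(size=(k, n_vars)) + rng.normal(size=(T, n_vars))

pca = PCA(n_components=k).fit(Y)
factors = pca.transform(Y)                        # factor estimation

h = 10                                            # forecasting horizon
fc = np.column_stack([AutoReg(factors[:, i], lags=4).fit().forecast(steps=h)
                      for i in range(k)])
Y_hat = pca.inverse_transform(fc)                 # forecasts in variable space
```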

