YASS: Yet Another Spike Sorter

2017 ◽  
Author(s):  
JinHyung Lee ◽  
David Carlson ◽  
Hooshmand Shokri ◽  
Weichi Yao ◽  
Georges Goetz ◽  
...  

Abstract
Spike sorting is a critical first step in extracting neural signals from large-scale electrophysiological data. This manuscript describes an efficient, reliable pipeline for spike sorting on dense multi-electrode arrays (MEAs), where neural signals appear across many electrodes and spike sorting currently represents a major computational bottleneck. We present several new techniques that make dense MEA spike sorting more robust and scalable. Our pipeline is based on an efficient multi-stage “triage-then-cluster-then-pursuit” approach that initially extracts only clean, high-quality waveforms from the electrophysiological time series by temporarily skipping noisy or “collided” events (representing two neurons firing synchronously). This is accomplished by developing a neural network detection method followed by efficient outlier triaging. The clean waveforms are then used to infer the set of neural spike waveform templates through nonparametric Bayesian clustering. Our clustering approach adapts a “coreset” approach for data reduction and uses efficient inference methods in a Dirichlet process mixture model framework to dramatically improve the scalability and reliability of the entire pipeline. The “triaged” waveforms are then finally recovered with matching-pursuit deconvolution techniques. The proposed methods improve on the state-of-the-art in terms of accuracy and stability on both real and biophysically-realistic simulated MEA data. Furthermore, the proposed pipeline is efficient, learning templates and clustering much faster than real-time for a ≃ 500-electrode dataset, using primarily a single CPU core.
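The coreset-based data reduction mentioned in the abstract can be illustrated with a short sketch. This follows the generic "lightweight coreset" construction (importance sampling proportional to squared distance from the data mean), which is an assumption here, not necessarily the exact construction used in the YASS pipeline:

```python
import numpy as np

def lightweight_coreset(X, m, rng=None):
    """Sample a weighted coreset of m points from X (n x d).

    Illustrative sketch of coreset-style data reduction: points far
    from the data mean are sampled more often, and importance weights
    correct for the non-uniform sampling so that weighted statistics
    approximate those of the full dataset.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * d2 / d2.sum()   # mixture of uniform and distance-based
    idx = rng.choice(n, size=m, replace=True, p=q)
    w = 1.0 / (m * q[idx])              # importance weights
    return X[idx], w
```

Clustering is then run on the m weighted points instead of all n waveforms, which is where the scalability gain comes from.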

Author(s):  
JinHyung Lee ◽  
Catalin Mitelut ◽  
Hooshmand Shokri ◽  
Ian Kinsella ◽  
Nishchal Dethe ◽  
...  

Abstract
Spike sorting is a critical first step in extracting neural signals from large-scale multi-electrode array (MEA) data. This manuscript presents several new techniques that make MEA spike sorting more robust and accurate. Our pipeline is based on an efficient multi-stage “triage-then-cluster-then-pursuit” approach that initially extracts only clean, high-quality waveforms from the electrophysiological time series by temporarily skipping noisy or “collided” events (representing two neurons firing synchronously). This is accomplished by developing a neural network detection and denoising method followed by efficient outlier triaging. The denoised spike waveforms are then used to infer the set of spike templates through nonparametric Bayesian clustering. We use a divide-and-conquer strategy to parallelize this clustering step. Finally, we recover collided waveforms with matching-pursuit deconvolution techniques, and perform further split-and-merge steps to estimate additional templates from the pool of recovered waveforms. We apply the new pipeline to data recorded in the primate retina, where high firing rates and highly-overlapping axonal units provide a challenging testbed for the deconvolution approach; in addition, the well-defined mosaic structure of receptive fields in this preparation provides a useful quality check on any spike sorting pipeline. We show that our pipeline improves on the state-of-the-art in spike sorting (and outperforms manual sorting) on both real and semi-simulated MEA data with > 500 electrodes; open source code can be found at https://github.com/paninski-lab/yass.
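The matching-pursuit deconvolution step used to recover collided waveforms can be sketched in a minimal single-channel form. This is illustrative only: real pipelines operate on multi-channel data with upsampled, aligned templates, and the function names here are assumptions:

```python
import numpy as np

def matching_pursuit(signal, templates, threshold, max_spikes=1000):
    """Greedy deconvolution sketch: repeatedly find the (template, time)
    pair with the largest least-squares amplitude, record the spike, and
    subtract the scaled template from the residual."""
    resid = signal.astype(float).copy()
    spikes = []
    for _ in range(max_spikes):
        best_k, best_i, best_a = None, None, threshold
        for k, t in enumerate(templates):
            # least-squares amplitude of template t at every offset
            amp = np.correlate(resid, t, mode="valid") / (t @ t)
            i = int(np.argmax(amp))
            if amp[i] > best_a:
                best_k, best_i, best_a = k, i, amp[i]
        if best_k is None:   # nothing above threshold remains
            break
        spikes.append((best_k, best_i, best_a))
        resid[best_i:best_i + len(templates[best_k])] -= best_a * templates[best_k]
    return spikes, resid
```

Because each detected spike is subtracted before the next search, two overlapping ("collided") spikes can both be recovered, which is the property the pipeline relies on.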


Author(s):  
Naohiro Tawara ◽  
Tetsuji Ogawa ◽  
Shinji Watanabe ◽  
Atsushi Nakamura ◽  
Tetsunori Kobayashi

An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization, making it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with Markov chain Monte Carlo sampling and incorporated into the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM is successfully applied to large-scale data and outperforms conventional hierarchical agglomerative clustering, especially for large numbers of utterances.
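The key property of the Dirichlet process prior — that the number of clusters (here, speakers) need not be fixed in advance — can be shown with a short Chinese restaurant process simulation. This is a sketch of the prior underlying any DPMM, not the UO-DPMM sampler itself:

```python
import numpy as np

def crp_partition(n, alpha, rng=None):
    """Draw a random partition of n utterances from the Chinese
    restaurant process with concentration alpha. Each utterance joins
    an existing cluster with probability proportional to its size, or
    starts a new cluster with probability proportional to alpha, so
    the number of clusters is inferred rather than specified."""
    rng = np.random.default_rng(rng)
    assign, counts = [], []
    for _ in range(n):
        p = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(p), p=p / p.sum()))
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1
        assign.append(k)
    return assign
```

In the full model, this prior is combined with a per-speaker likelihood over utterance features and sampled with MCMC, so the posterior concentrates on plausible speaker counts.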


2016 ◽  
Author(s):  
Pierre Yger ◽  
Giulia L.B. Spampinato ◽  
Elric Esposito ◽  
Baptiste Lefebvre ◽  
Stéphane Deny ◽  
...  

Abstract
Understanding how assemblies of neurons encode information requires recording large populations of cells in the brain. In recent years, multi-electrode arrays and large silicon probes have been developed to record simultaneously from hundreds or thousands of densely packed electrodes. However, these new devices challenge the classical approach to spike sorting. Here we developed a new method to solve these issues, based on a highly automated algorithm to extract spikes from extracellular data, and show that this algorithm reaches near-optimal performance both in vitro and in vivo. The algorithm is composed of two main steps: 1) a “template-finding” phase to extract the cell templates, i.e. the pattern of activity evoked over many electrodes when one neuron fires an action potential; 2) a “template-matching” phase where the templates are matched to the raw data to find the locations of the spikes. Manual intervention by the user is reduced to a minimum, and the time spent on manual curation does not scale with the number of electrodes. We tested our algorithm on large-scale data from in vitro and in vivo recordings, from 32 to 4225 electrodes. We performed simultaneous extracellular and patch recordings to obtain “ground truth” data, i.e. cases where the solution to the sorting problem is at least partially known. The performance of our algorithm was always close to the best expected performance. We thus provide a general solution to sort spikes from large-scale extracellular recordings.
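The "template-finding" phase boils down to averaging the raw waveform snippets assigned to each putative cell. A minimal single-channel sketch, with hypothetical names (real pipelines align, whiten, and average multi-channel snippets):

```python
import numpy as np

def extract_templates(trace, cluster_spike_times, width=5):
    """Template-finding sketch: for each cluster, cut a window of
    +/- `width` samples around every spike time and average the
    snippets to form that cell's template.

    cluster_spike_times: dict mapping cluster id -> list of spike
    sample indices (assumed to come from a prior clustering step)."""
    templates = {}
    for cell, times in cluster_spike_times.items():
        snips = [trace[t - width:t + width + 1]
                 for t in times
                 if width <= t < len(trace) - width]  # skip edge spikes
        templates[cell] = np.mean(snips, axis=0)
    return templates
```

The resulting templates are then slid along the raw data in the "template-matching" phase to locate every spike, including overlapping ones.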


2016 ◽  
Author(s):  
Marius Pachitariu ◽  
Nicholas Steinmetz ◽  
Shabnam Kadir ◽  
Matteo Carandini ◽  
Kenneth D. Harris

Abstract
Advances in silicon probe technology mean that in vivo electrophysiological recordings from hundreds of channels will soon become commonplace. To interpret these recordings we need fast, scalable and accurate methods for spike sorting, whose output requires minimal time for manual curation. Here we introduce Kilosort, a spike sorting framework that meets these criteria, and show that it allows rapid and accurate sorting of large-scale in vivo data. Kilosort models the recorded voltage as a sum of template waveforms triggered on the spike times, allowing overlapping spikes to be identified and resolved. Rapid processing is achieved thanks to a novel low-dimensional approximation of the spatiotemporal distribution of each template, and to batch-based optimization on GPUs. A novel post-clustering merging step based on the continuity of the templates substantially reduces the need for subsequent manual curation. We compare Kilosort to an established algorithm on data obtained from 384-channel electrodes and show superior performance at much reduced processing times; data from 384-channel electrode arrays can be processed in approximately real time. Kilosort is an important step towards fully automated spike sorting of multichannel electrode recordings, and is freely available (github.com/cortex-lab/Kilosort).
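The low-dimensional template approximation can be illustrated with a truncated SVD of a spatiotemporal template matrix. This is a generic sketch of the idea (factoring each channels × samples template into a few spatial and temporal components), not Kilosort's exact parameterization:

```python
import numpy as np

def low_rank_template(W, rank=3):
    """Approximate a spatiotemporal template W (channels x samples)
    by a rank-`rank` factorization W ~= spatial @ temporal.

    Keeping only a few singular components shrinks both storage and
    the cost of correlating the template against the raw data."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    spatial = U[:, :rank] * s[:rank]   # per-channel loadings
    temporal = Vt[:rank]               # shared temporal waveforms
    return spatial, temporal
```

Correlating the data with `rank` short temporal waveforms and mixing the results with per-channel weights is much cheaper than a full channels × samples convolution, which is where the speed-up comes from.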


Biostatistics ◽  
2020 ◽  
Author(s):  
Yize Zhao ◽  
Tengfei Li ◽  
Hongtu Zhu

Summary
Heritability analysis plays a central role in quantitative genetics, describing the genetic contribution to human complex traits and prioritizing downstream analyses for large-scale phenotypes. Existing work largely focuses on modeling a single phenotype, and currently available multivariate phenotypic methods often suffer from scaling and interpretation issues. In this article, motivated by understanding how genetic underpinnings impact human brain variation, we develop an integrative Bayesian heritability analysis to jointly estimate heritabilities for high-dimensional neuroimaging traits. To induce sparsity and incorporate brain anatomical configuration, we impose hierarchical selection among both regional and local measurements based on the brain structural network and voxel dependence. We also use a nonparametric Dirichlet process mixture model to group single nucleotide polymorphism-associated phenotypic variations, providing biological plausibility. Through extensive simulations, we show the proposed method outperforms existing ones in heritability estimation and heritable-trait selection under various scenarios. We finally apply the method to two large-scale imaging genetics datasets, the Alzheimer’s Disease Neuroimaging Initiative and the United Kingdom Biobank, and show biologically meaningful results.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
James M. Kunert-Graf ◽  
Nikita A. Sakhanenko ◽  
David J. Galas

Abstract
Background: Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large.
Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based in information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10³ for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with large numbers of samples.
Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
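The general setup — a count-table statistic evaluated under many phenotype permutations — can be sketched as follows. This shows the intermediate optimization of precomputing a joint genotype code so that each permutation is only a re-count; the paper's transform goes further and updates the tables directly without re-counting. The function names and the choice of mutual information as the statistic are illustrative assumptions:

```python
import numpy as np

def mutual_info(counts):
    """Mutual information (in nats) of a 2-D contingency table."""
    p = counts / counts.sum()
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / outer[nz])).sum())

def permutation_pvalue(geno_code, pheno, n_perm=200, rng=None):
    """Permutation test of a genotype/binary-phenotype association.

    geno_code: precomputed joint genotype index per sample (e.g.
    3*g1 + g2 for a SNP pair), so each permutation rebuilds the
    count table with a single bincount instead of re-running the
    whole analysis pipeline."""
    rng = np.random.default_rng(rng)
    k = int(geno_code.max()) + 1

    def table(ph):
        return np.bincount(geno_code * 2 + ph, minlength=2 * k).reshape(k, 2)

    obs = mutual_info(table(pheno))
    hits = sum(mutual_info(table(rng.permutation(pheno))) >= obs
               for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)   # add-one to avoid p = 0
```

Even this re-counting version scales with the number of samples at every iteration; eliminating the per-permutation pass over the samples entirely is what yields the sample-size-insensitive speed-up reported in the abstract.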

