YASS: Yet Another Spike Sorter

2017 ◽  
Author(s):  
JinHyung Lee ◽  
David Carlson ◽  
Hooshmand Shokri ◽  
Weichi Yao ◽  
Georges Goetz ◽  
...  

Abstract
Spike sorting is a critical first step in extracting neural signals from large-scale electrophysiological data. This manuscript describes an efficient, reliable pipeline for spike sorting on dense multi-electrode arrays (MEAs), where neural signals appear across many electrodes and spike sorting currently represents a major computational bottleneck. We present several new techniques that make dense MEA spike sorting more robust and scalable. Our pipeline is based on an efficient multi-stage “triage-then-cluster-then-pursuit” approach that initially extracts only clean, high-quality waveforms from the electrophysiological time series by temporarily skipping noisy or “collided” events (representing two neurons firing synchronously). This is accomplished by developing a neural network detection method followed by efficient outlier triaging. The clean waveforms are then used to infer the set of neural spike waveform templates through nonparametric Bayesian clustering. Our clustering approach adapts a “coreset” approach for data reduction and uses efficient inference methods in a Dirichlet process mixture model framework to dramatically improve the scalability and reliability of the entire pipeline. The “triaged” waveforms are then finally recovered with matching-pursuit deconvolution techniques. The proposed methods improve on the state-of-the-art in terms of accuracy and stability on both real and biophysically-realistic simulated MEA data. Furthermore, the proposed pipeline is efficient, learning templates and clustering much faster than real-time for a ≃ 500-electrode dataset, using primarily a single CPU core.
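The coreset-based data reduction mentioned in the abstract can be illustrated with a short sketch. This follows the generic "lightweight coreset" construction (importance sampling proportional to squared distance from the data mean), which is an assumption here, not necessarily the exact construction used in the YASS pipeline:

```python
import numpy as np

def lightweight_coreset(X, m, rng=None):
    """Sample a weighted coreset of m points from X (n x d).

    Illustrative sketch of coreset-style data reduction: points far
    from the data mean are sampled more often, and importance weights
    correct for the non-uniform sampling so that weighted statistics
    approximate those of the full dataset.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    q = 0.5 / n + 0.5 * d2 / d2.sum()   # mixture of uniform and distance-based
    idx = rng.choice(n, size=m, replace=True, p=q)
    w = 1.0 / (m * q[idx])              # importance weights
    return X[idx], w
```

Clustering is then run on the m weighted points instead of all n waveforms, which is where the scalability gain comes from.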

Author(s):  
JinHyung Lee ◽  
Catalin Mitelut ◽  
Hooshmand Shokri ◽  
Ian Kinsella ◽  
Nishchal Dethe ◽  
...  

Abstract
Spike sorting is a critical first step in extracting neural signals from large-scale multi-electrode array (MEA) data. This manuscript presents several new techniques that make MEA spike sorting more robust and accurate. Our pipeline is based on an efficient multi-stage “triage-then-cluster-then-pursuit” approach that initially extracts only clean, high-quality waveforms from the electrophysiological time series by temporarily skipping noisy or “collided” events (representing two neurons firing synchronously). This is accomplished by developing a neural network detection and denoising method followed by efficient outlier triaging. The denoised spike waveforms are then used to infer the set of spike templates through nonparametric Bayesian clustering. We use a divide-and-conquer strategy to parallelize this clustering step. Finally, we recover collided waveforms with matching-pursuit deconvolution techniques, and perform further split-and-merge steps to estimate additional templates from the pool of recovered waveforms. We apply the new pipeline to data recorded in the primate retina, where high firing rates and highly-overlapping axonal units provide a challenging testbed for the deconvolution approach; in addition, the well-defined mosaic structure of receptive fields in this preparation provides a useful quality check on any spike sorting pipeline. We show that our pipeline improves on the state-of-the-art in spike sorting (and outperforms manual sorting) on both real and semi-simulated MEA data with > 500 electrodes; open source code can be found at https://github.com/paninski-lab/yass.
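The matching-pursuit deconvolution step used to recover collided waveforms can be sketched in a minimal single-channel form. This is illustrative only: real pipelines operate on multi-channel data with upsampled, aligned templates, and the function names here are assumptions:

```python
import numpy as np

def matching_pursuit(signal, templates, threshold, max_spikes=1000):
    """Greedy deconvolution sketch: repeatedly find the (template, time)
    pair with the largest least-squares amplitude, record the spike, and
    subtract the scaled template from the residual."""
    resid = signal.astype(float).copy()
    spikes = []
    for _ in range(max_spikes):
        best_k, best_i, best_a = None, None, threshold
        for k, t in enumerate(templates):
            # least-squares amplitude of template t at every offset
            amp = np.correlate(resid, t, mode="valid") / (t @ t)
            i = int(np.argmax(amp))
            if amp[i] > best_a:
                best_k, best_i, best_a = k, i, amp[i]
        if best_k is None:   # nothing above threshold remains
            break
        spikes.append((best_k, best_i, best_a))
        resid[best_i:best_i + len(templates[best_k])] -= best_a * templates[best_k]
    return spikes, resid
```

Because each detected spike is subtracted before the next search, two overlapping ("collided") spikes can both be recovered, which is the property the pipeline relies on.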


Author(s):  
Naohiro Tawara ◽  
Tetsuji Ogawa ◽  
Shinji Watanabe ◽  
Atsushi Nakamura ◽  
Tetsunori Kobayashi

An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization, making it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with Markov chain Monte Carlo sampling and incorporated into the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM is successfully applied to large-scale data and outperforms conventional hierarchical agglomerative clustering, especially for large numbers of utterances.
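The key property of the Dirichlet process prior — that the number of clusters (here, speakers) need not be fixed in advance — can be shown with a short Chinese restaurant process simulation. This is a sketch of the prior underlying any DPMM, not the UO-DPMM sampler itself:

```python
import numpy as np

def crp_partition(n, alpha, rng=None):
    """Draw a random partition of n utterances from the Chinese
    restaurant process with concentration alpha. Each utterance joins
    an existing cluster with probability proportional to its size, or
    starts a new cluster with probability proportional to alpha, so
    the number of clusters is inferred rather than specified."""
    rng = np.random.default_rng(rng)
    assign, counts = [], []
    for _ in range(n):
        p = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(p), p=p / p.sum()))
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1
        assign.append(k)
    return assign
```

In the full model, this prior is combined with a per-speaker likelihood over utterance features and sampled with MCMC, so the posterior concentrates on plausible speaker counts.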


2016 ◽  
Author(s):  
Pierre Yger ◽  
Giulia L.B. Spampinato ◽  
Elric Esposito ◽  
Baptiste Lefebvre ◽  
Stéphane Deny ◽  
...  

Abstract
Understanding how assemblies of neurons encode information requires recording large populations of cells in the brain. In recent years, multi-electrode arrays and large silicon probes have been developed to record simultaneously from hundreds or thousands of densely packed electrodes. However, these new devices challenge the classical approach to spike sorting. Here we developed a new method to solve these issues, based on a highly automated algorithm to extract spikes from extracellular data, and show that this algorithm reaches near-optimal performance both in vitro and in vivo. The algorithm is composed of two main steps: 1) a “template-finding” phase to extract the cell templates, i.e. the pattern of activity evoked over many electrodes when one neuron fires an action potential; 2) a “template-matching” phase where the templates are matched to the raw data to find the locations of the spikes. Manual intervention by the user is reduced to a minimum, and the time spent on manual curation does not scale with the number of electrodes. We tested our algorithm on large-scale data from in vitro and in vivo recordings, from 32 to 4225 electrodes. We performed simultaneous extracellular and patch recordings to obtain “ground truth” data, i.e. cases where the solution to the sorting problem is at least partially known. The performance of our algorithm was always close to the best expected performance. We thus provide a general solution to sort spikes from large-scale extracellular recordings.
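The "template-finding" phase boils down to averaging the raw waveform snippets assigned to each putative cell. A minimal single-channel sketch, with hypothetical names (real pipelines align, whiten, and average multi-channel snippets):

```python
import numpy as np

def extract_templates(trace, cluster_spike_times, width=5):
    """Template-finding sketch: for each cluster, cut a window of
    +/- `width` samples around every spike time and average the
    snippets to form that cell's template.

    cluster_spike_times: dict mapping cluster id -> list of spike
    sample indices (assumed to come from a prior clustering step)."""
    templates = {}
    for cell, times in cluster_spike_times.items():
        snips = [trace[t - width:t + width + 1]
                 for t in times
                 if width <= t < len(trace) - width]  # skip edge spikes
        templates[cell] = np.mean(snips, axis=0)
    return templates
```

The resulting templates are then slid along the raw data in the "template-matching" phase to locate every spike, including overlapping ones.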


2016 ◽  
Author(s):  
Marius Pachitariu ◽  
Nicholas Steinmetz ◽  
Shabnam Kadir ◽  
Matteo Carandini ◽  
Kenneth D. Harris

Abstract
Advances in silicon probe technology mean that in vivo electrophysiological recordings from hundreds of channels will soon become commonplace. To interpret these recordings we need fast, scalable and accurate methods for spike sorting, whose output requires minimal time for manual curation. Here we introduce Kilosort, a spike sorting framework that meets these criteria, and show that it allows rapid and accurate sorting of large-scale in vivo data. Kilosort models the recorded voltage as a sum of template waveforms triggered on the spike times, allowing overlapping spikes to be identified and resolved. Rapid processing is achieved thanks to a novel low-dimensional approximation of the spatiotemporal distribution of each template, and to batch-based optimization on GPUs. A novel post-clustering merging step based on the continuity of the templates substantially reduces the need for subsequent manual curation. We compare Kilosort to an established algorithm on data obtained from 384-channel electrodes and show superior performance at much reduced processing times; data from 384-channel electrode arrays can be processed in approximately real time. Kilosort is an important step towards fully automated spike sorting of multichannel electrode recordings, and is freely available (github.com/cortex-lab/Kilosort).
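The low-dimensional template approximation can be illustrated with a truncated SVD of a spatiotemporal template matrix. This is a generic sketch of the idea (factoring each channels × samples template into a few spatial and temporal components), not Kilosort's exact parameterization:

```python
import numpy as np

def low_rank_template(W, rank=3):
    """Approximate a spatiotemporal template W (channels x samples)
    by a rank-`rank` factorization W ~= spatial @ temporal.

    Keeping only a few singular components shrinks both storage and
    the cost of correlating the template against the raw data."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    spatial = U[:, :rank] * s[:rank]   # per-channel loadings
    temporal = Vt[:rank]               # shared temporal waveforms
    return spatial, temporal
```

Correlating the data with `rank` short temporal waveforms and mixing the results with per-channel weights is much cheaper than a full channels × samples convolution, which is where the speed-up comes from.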


Biostatistics ◽  
2020 ◽  
Author(s):  
Yize Zhao ◽  
Tengfei Li ◽  
Hongtu Zhu

Summary
Heritability analysis plays a central role in quantitative genetics, describing the genetic contribution to human complex traits and prioritizing downstream analyses for large-scale phenotypes. Existing work largely focuses on modeling a single phenotype, and currently available multivariate phenotypic methods often suffer from scaling and interpretation issues. In this article, motivated by understanding how genetic underpinnings impact human brain variation, we develop an integrative Bayesian heritability analysis to jointly estimate heritabilities for high-dimensional neuroimaging traits. To induce sparsity and incorporate brain anatomical configuration, we impose hierarchical selection among both regional and local measurements based on the brain structural network and voxel dependence. We also use a nonparametric Dirichlet process mixture model to group single nucleotide polymorphism-associated phenotypic variations, providing biological plausibility. Through extensive simulations, we show the proposed method outperforms existing ones in heritability estimation and heritable-trait selection under various scenarios. We finally apply the method to two large-scale imaging genetics datasets, the Alzheimer’s Disease Neuroimaging Initiative and the United Kingdom Biobank, and show biologically meaningful results.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
James M. Kunert-Graf ◽  
Nikita A. Sakhanenko ◽  
David J. Galas

Abstract
Background: Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large.
Results: In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based in information theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10³ for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with large numbers of samples.
Conclusions: The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
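The general setup — a count-table statistic evaluated under many phenotype permutations — can be sketched as follows. This shows the intermediate optimization of precomputing a joint genotype code so that each permutation is only a re-count; the paper's transform goes further and updates the tables directly without re-counting. The function names and the choice of mutual information as the statistic are illustrative assumptions:

```python
import numpy as np

def mutual_info(counts):
    """Mutual information (in nats) of a 2-D contingency table."""
    p = counts / counts.sum()
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / outer[nz])).sum())

def permutation_pvalue(geno_code, pheno, n_perm=200, rng=None):
    """Permutation test of a genotype/binary-phenotype association.

    geno_code: precomputed joint genotype index per sample (e.g.
    3*g1 + g2 for a SNP pair), so each permutation rebuilds the
    count table with a single bincount instead of re-running the
    whole analysis pipeline."""
    rng = np.random.default_rng(rng)
    k = int(geno_code.max()) + 1

    def table(ph):
        return np.bincount(geno_code * 2 + ph, minlength=2 * k).reshape(k, 2)

    obs = mutual_info(table(pheno))
    hits = sum(mutual_info(table(rng.permutation(pheno))) >= obs
               for _ in range(n_perm))
    return (1 + hits) / (1 + n_perm)   # add-one to avoid p = 0
```

Even this re-counting version scales with the number of samples at every iteration; eliminating the per-permutation pass over the samples entirely is what yields the sample-size-insensitive speed-up reported in the abstract.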

