Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery

2021 ◽  
Author(s):  
Lucas Ondel

This work investigates subspace non-parametric models for the task of learning a set of acoustic units from unlabeled speech recordings. We constrain the base measure of a Dirichlet process mixture with a phonetic subspace, estimated from other source languages, to build an "educated prior", thereby forcing the learned acoustic units to resemble phones of known source languages. Two types of models are proposed: (i) the Subspace HMM (SHMM), which assumes that the phonetic subspace is the same for every language, and (ii) the Hierarchical-Subspace HMM (H-SHMM), which relaxes this assumption and allows a language-specific subspace to be estimated on the unlabeled target data. These models are applied to three languages (English, Yoruba and Mboshi) and compared with various competitive acoustic unit discovery baselines. Experimental results show that both subspace models outperform other systems in terms of clustering quality and segmentation accuracy. Moreover, the H-SHMM provides results superior to the SHMM, supporting the idea that language-specific priors are preferable to language-agnostic priors for acoustic unit discovery.
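As a rough illustration of what constraining the base measure to a subspace can mean, the sketch below draws truncated stick-breaking weights for a Dirichlet process and samples each unit's parameter vector as an affine map of a low-dimensional Gaussian latent vector. It is a minimal sketch under stated assumptions, not the SHMM or H-SHMM itself: the matrix `W`, the offset `b`, the dimensions, and the function names are hypothetical, and no HMM structure or inference is included.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Return truncated stick-breaking weights of a Dirichlet process."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def sample_from_subspace(W, b, rng):
    """Draw h ~ N(0, I) and map it to parameter space: eta = W h + b."""
    latent_dim = len(W[0])
    h = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return [
        sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
        for row, b_i in zip(W, b)
    ]


rng = random.Random(0)

# Hypothetical 4-dimensional parameter space spanned by a 2-dimensional subspace.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [0.5, -0.5, 0.0, 0.0]

weights = stick_breaking_weights(alpha=1.0, num_sticks=10, rng=rng)
units = [sample_from_subspace(W, b, rng) for _ in weights]
```

Every sampled vector in `units` lies on the same two-dimensional affine subspace of the four-dimensional parameter space, which is the sense in which the prior restricts what a unit can look like.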


Author(s):  
Nico Borgsmüller ◽  
Jose Bonet ◽  
Francesco Marass ◽  
Abel Gonzalez-Perez ◽  
Nuria Lopez-Bigas ◽  
...  

Abstract: The high resolution of single-cell DNA sequencing (scDNA-seq) offers great potential to resolve intra-tumor heterogeneity by distinguishing clonal populations based on their mutation profiles. However, the increasing size of scDNA-seq data sets and technical limitations, such as high error rates and a large proportion of missing values, complicate this task and limit the applicability of existing methods. Here we introduce BnpC, a novel non-parametric method to cluster individual cells into clones and infer their genotypes based on their noisy mutation profiles. BnpC employs a Dirichlet process mixture model coupled with a Markov chain Monte Carlo sampling scheme, including a modified split-merge move and a novel posterior estimator to predict clones and genotypes. We benchmarked our method comprehensively against state-of-the-art methods on simulated data of various sizes, and applied it to three cancer scDNA-seq data sets. On simulated data, BnpC compared favorably against current methods in terms of accuracy, runtime, and scalability. Its inferred genotypes were the most accurate, and it was the only method able to run and produce results on data sets with 10,000 cells. On tumor scDNA-seq data, BnpC was able to identify clonal populations missed by the original cluster analysis but supported by supplementary experimental data. With ever-growing scDNA-seq data sets, scalable and accurate methods such as BnpC will become increasingly relevant, not only to resolve intra-tumor heterogeneity but also as a pre-processing step to reduce data size. BnpC is freely available under the MIT license at https://github.com/cbg-ethz/BnpC.
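A Dirichlet process mixture does not fix the number of clusters in advance. One common way to see this is the Chinese restaurant process, the prior over partitions that such a model induces. The sketch below samples cluster assignments from that prior only. It is a generic illustration, not BnpC's sampler: the function name, the seed, and the concentration value `alpha` are assumptions, and no mutation data or error model is involved.

```python
import random


def crp_assignments(num_items, alpha, rng):
    """Sample cluster labels for num_items items from a Chinese restaurant process."""
    assignments = []
    counts = []  # counts[k] = number of items currently in cluster k
    for i in range(num_items):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new cluster
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


rng = random.Random(42)
assignments, counts = crp_assignments(num_items=200, alpha=1.0, rng=rng)
num_clusters = len(counts)
```

Larger values of `alpha` make new clusters more likely, so the number of clusters grows with both the data size and the concentration parameter instead of being set by hand.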


Author(s):  
Wen Cheng ◽  
Gurdiljot Singh Gill ◽  
Tom Vo ◽  
Jiao Zhou ◽  
Taha Sakrani

This paper presents a comprehensive analysis of a bivariate Dirichlet process mixture spatial model for estimating pedestrian and bicycle crash counts. The study focuses on active transportation at the traffic analysis zone (TAZ) level by developing a semi-parametric model that accounts for unobserved heterogeneity by combining three components: a bivariate specification for the correlation among crash modes, spatial random effects for the impact of neighboring TAZs, and a Dirichlet process mixture for the random intercept. Three alternate models, one Dirichlet and two parametric, are also developed for comparison based on different criteria. Bicycle and pedestrian crashes are observed to share three influential variables: K12 student enrollment (positively correlated), bike-lane density, and the percentage of arterial roads. The heterogeneity error term demonstrates a statistically significant correlation between bicycle and pedestrian crashes, whereas the spatial random effect term indicates the absence of a significant correlation for the area under focus. The Dirichlet models are consistently superior to the non-Dirichlet ones under all evaluation criteria. Moreover, the Dirichlet models can identify distinct latent subpopulations and suggest that the normality assumption for the intercept made by traditional parametric models does not hold for the TAZ-level crash dataset of the current study.


2018 ◽  
Vol 35 (6) ◽  
pp. 953-961 ◽  
Author(s):  
Tiehang Duan ◽  
José P Pinto ◽  
Xiaohui Xie

Abstract

Motivation: With the development of droplet-based systems, massive single-cell transcriptome data sets have become available, which enable analysis of cellular and molecular processes at single-cell resolution and are instrumental to understanding many biological processes. While state-of-the-art clustering methods have been applied to the data, they face challenges in the following aspects: (i) the clustering quality still needs to be improved; (ii) most models need prior knowledge of the number of clusters, which is not always available; (iii) there is a demand for faster computational speed.

Results: We propose to tackle these challenges with Parallelized Split Merge Sampling on Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods that perform sampling on each single data point, the split-merge mechanism samples on the cluster level, which significantly improves convergence and optimality of the result. The model is highly parallelized and can utilize the computing power of high performance computing (HPC) clusters, enabling massive inference on huge datasets. Experimental results show that the model outperforms current widely used models in both clustering quality and computational speed.

Availability and implementation: Source code is publicly available at https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package.

Supplementary information: Supplementary data are available at Bioinformatics online.
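To make the idea of sampling on the cluster level concrete, the sketch below computes one ingredient of a split-merge Metropolis-Hastings move: the log ratio of the Dirichlet process partition prior for a split state (two clusters of sizes `n1` and `n2`) against the merged state (one cluster of size `n1 + n2`). It assumes the standard Chinese restaurant process prior with concentration `alpha`. The function name is hypothetical, and the likelihood and proposal terms that a full acceptance ratio needs are omitted. It is not taken from the Para-DPMM source code.

```python
import math


def log_split_prior_ratio(n1, n2, alpha):
    """Log of P(split partition) / P(merged partition) under a CRP prior.

    The CRP prior of a partition is proportional to
    alpha ** K * prod_k (n_k - 1)!, so splitting one cluster of size
    n1 + n2 into clusters of sizes n1 and n2 changes the prior by
    alpha * (n1 - 1)! * (n2 - 1)! / (n1 + n2 - 1)!.
    """
    return (
        math.log(alpha)
        + math.lgamma(n1)
        + math.lgamma(n2)
        - math.lgamma(n1 + n2)
    )


# Two data points with alpha = 1: together and apart are equally likely a priori.
ratio_two_points = math.exp(log_split_prior_ratio(1, 1, alpha=1.0))

# Splitting a large cluster in half is heavily penalized by the prior alone,
# so the data likelihood has to justify the split.
log_ratio_large = log_split_prior_ratio(50, 50, alpha=1.0)
```

Because a single move reassigns a whole group of points at once, a sampler using such moves can leave configurations that point-by-point updates would cross only slowly.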


Author(s):  
Mehdi Ahmadian ◽  
Xubin Song

Abstract: A non-parametric model for magneto-rheological (MR) dampers is presented. After discussing the merits of parametric and non-parametric models for MR dampers, test data for an MR damper are used to develop a non-parametric model. The results of the model are compared with the test data to illustrate its accuracy. The comparison shows that the non-parametric model accurately predicts the damper force characteristics, including the damper non-linearity and electro-magnetic saturation. It is further shown that the non-parametric model can be numerically solved more efficiently than the parametric models.


2008 ◽  
Vol 35 (5) ◽  
pp. 567-582 ◽  
Author(s):  
Adam J. Branscum ◽  
Timothy E. Hanson ◽  
Ian A. Gardner
