Hierarchical Dirichlet Process
Recently Published Documents


TOTAL DOCUMENTS

101
(FIVE YEARS 21)

H-INDEX

12
(FIVE YEARS 1)

Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3127
Author(s):  
Federico Bassetti ◽  
Lucia Ladelli

We introduce mixtures of species sampling sequences (mSSS) and discuss how these sequences are related to various types of Bayesian models. As a particular case, we recover species sampling sequences with general (not necessarily diffuse) base measures. These models include some “spike-and-slab” non-parametric priors recently introduced to provide sparsity. Furthermore, we show how mSSS arise while considering hierarchical species sampling random probabilities (e.g., the hierarchical Dirichlet process). Extending previous results, we prove that mSSS are obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. Using this representation, we give an explicit expression of the Exchangeable Partition Probability Function of the partition generated by an mSSS. Some special cases are discussed in detail—in particular, species sampling sequences with general base measures and a mixture of species sampling sequences with Gibbs-type latent partition. Finally, we give explicit expressions of the predictive distributions of an mSSS.
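The representation described above (values of an exchangeable sequence assigned to the classes of a latent exchangeable random partition) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: the latent partition is a Chinese restaurant process with concentration `alpha`, the block values are drawn i.i.d. (a special case of exchangeable), and the base measure is a hypothetical spike-and-slab mixture with an atom at zero. The function names are illustrative only.

```python
import random


def crp_partition(n, alpha, rng):
    """Sample a partition of n items from a Chinese restaurant process."""
    labels = []  # labels[i] = index of the block that item i belongs to
    sizes = []   # sizes[k] = current number of items in block k
    for _ in range(n):
        # An existing block k has weight sizes[k]; a new block has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


def spike_and_slab(rng, spike_prob=0.5):
    """Hypothetical non-diffuse base measure: an atom at 0 plus a N(0, 1) slab."""
    return 0.0 if rng.random() < spike_prob else rng.gauss(0.0, 1.0)


def sample_sequence(n, alpha, base_sampler, seed=0):
    """Assign one base-measure draw to every block of a latent partition."""
    rng = random.Random(seed)
    labels = crp_partition(n, alpha, rng)
    block_values = [base_sampler(rng) for _ in range(max(labels) + 1)]
    return labels, [block_values[k] for k in labels]


labels, values = sample_sequence(20, alpha=1.0, base_sampler=spike_and_slab)
# With an atom in the base measure, distinct latent blocks can share the value 0.0,
# so the partition induced by ties in `values` can be coarser than `labels`.
print(len(set(labels)), len(set(values)))
```

Because the base measure has an atom, the observed clustering of the values need not coincide with the latent partition, which is the situation the general (not necessarily diffuse) base measure is meant to cover.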


2021 ◽  
Author(s):  
Leonardo H. Rocha ◽  
Daniel Welter ◽  
Denio Duarte

Probabilistic topic approaches are tools for discovering and exploring hidden thematic structures in text collections. Given a collection of documents, the task of extracting topics consists of building a vocabulary from the collection and computing the probability that each word belongs to a document in the collection. Then, based on the desired number of topics, the probability that each word is associated with a given topic is computed. A topic is thus a set of words ordered by their probability of being associated with the topic. Several approaches for building topic models are found in the literature, e.g., Hierarchical Dirichlet Process (HDP), Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), and Dirichlet-multinomial Regression (DMR). This work seeks to assess the quality of the topics built by the four approaches mentioned. Quality is measured by coherence metrics, and all approaches receive the same document collection as input: 50,000 news articles from the websites of Breitbart, Business Insider, The Atlantic, CNN, and New York Times. The results show that DMR and LDA are the best models for extracting topics from the collection used.
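As a rough illustration of the two ideas in this abstract, the sketch below ranks the words of a topic by probability and scores the top words with a coherence measure. The abstract does not say which coherence metrics were used; the UMass-style score here is only one common choice and is an assumption, as are the toy topic, the toy documents, and the function names.

```python
import math


def top_words(topic_word_probs, n=3):
    """Return the n words of a topic with the highest probability."""
    ranked = sorted(topic_word_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def umass_coherence(words, documents):
    """UMass-style coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over l < m.

    D(...) counts the documents containing all the given words. Every word
    in `words` is assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*ws):
        return sum(1 for d in doc_sets if all(w in d for w in ws))

    score = 0.0
    for m in range(1, len(words)):
        for l in range(m):
            score += math.log((doc_freq(words[m], words[l]) + 1) / doc_freq(words[l]))
    return score


# Toy data, invented for illustration only.
topic = {"election": 0.5, "vote": 0.3, "senate": 0.15, "recipe": 0.05}
docs = [["election", "vote"], ["election", "senate"], ["vote", "election"]]

best = top_words(topic, n=2)
print(best, umass_coherence(best, docs))
```

Higher (less negative) scores indicate that the top words of a topic tend to co-occur in the same documents.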


2021 ◽  
Vol 11 (14) ◽  
pp. 6603
Author(s):  
Monika Tanwar ◽  
Hyunseok Park ◽  
Nagarajan Raghavan

In this study, we present a state-based diagnostic and prognostic methodology for lubricating oil degradation based on a nonparametric Bayesian approach, i.e., sticky hierarchical Dirichlet process–hidden Markov model (HDP-HMM). An accurate health state-space assessment for diagnostics and prognostics has always been unobservable and hypothetical in the past. The lubrication condition monitoring (LCM) data is generally segregated as “healthy or unhealthy”, representing a binary state-based perspective to the problem. This two-state performance-based formulation poses limitations to the precision and accuracy of the diagnosis and prognosis for real data wherein there may be multiple states of discrete performance that are characteristic of the system functionality. In particular, the reversible and nonlinear time-series trends of degradation data increase the complexity of state-based modeling. We propose a multistate diagnostic and prognostic framework for LCM data in the wear-out phase (i.e., the unhealthy portion of degradation data), accounting for irregular oil replenishment and oil change effects (i.e., nonlinearity in the degradation signal). The LCM data is simulated for an elementary mechanical system with four components. The sticky HDP sets the prior for the HMM parameters. The unsupervised learning over infinite observations and emission reveals four discrete health states and helps estimate the associated state transition probabilities. The inferred state sequence provides information relating to the state dynamics, which provides further guidance to maintenance decision making. The decision making is further backed by prognostics based on the conditional reliability function and mean residual life estimation.
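To make the statement that the sticky HDP sets the prior for the HMM parameters more concrete, here is a minimal sketch of drawing a transition matrix from a finite (weak-limit) approximation of the sticky HDP-HMM prior. It is not the authors' implementation: the truncation level `L`, the hyperparameter values, and the function names are assumptions, and `kappa > 0` is assumed so that every row has a strictly positive self-transition weight.

```python
import random


def dirichlet(shapes, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma(shape, 1) draws."""
    # A tiny floor keeps the shape strictly positive, as gammavariate requires.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in shapes]
    total = sum(draws)
    return [d / total for d in draws]


def sticky_transition_matrix(L, gamma, alpha, kappa, seed=0):
    """Weak-limit sketch: beta ~ Dir(gamma/L, ..., gamma/L),
    row j ~ Dir(alpha * beta + kappa * e_j), with e_j the j-th unit vector."""
    rng = random.Random(seed)
    beta = dirichlet([gamma / L] * L, rng)  # global state weights shared by all rows
    rows = []
    for j in range(L):
        # The extra mass kappa on the diagonal favors staying in state j.
        shapes = [alpha * beta[k] + (kappa if k == j else 0.0) for k in range(L)]
        rows.append(dirichlet(shapes, rng))
    return beta, rows


beta, P = sticky_transition_matrix(L=4, gamma=2.0, alpha=1.0, kappa=50.0)
for row in P:
    print([round(p, 3) for p in row])
```

A larger `kappa` concentrates each row on its diagonal entry, which discourages rapid switching between inferred health states; with `kappa` near zero the rows revert to the ordinary HDP-HMM prior.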


2020 ◽  
Author(s):  
Shai He ◽  
Aaron Schein ◽  
Vishal Sarsani ◽  
Patrick Flaherty

There are distinguishing features or “hallmarks” of cancer that are found across tumors, individuals, and types of cancer, and these hallmarks can be driven by specific genetic mutations. Yet, within a single tumor there is often extensive genetic heterogeneity as evidenced by single-cell and bulk DNA sequencing data. The goal of this work is to jointly infer the underlying genotypes of tumor subpopulations and the distribution of those subpopulations in individual tumors by integrating single-cell and bulk sequencing data. Understanding the genetic composition of the tumor at the time of treatment is important in the personalized design of targeted therapeutic combinations and monitoring for possible recurrence after treatment. We propose a hierarchical Dirichlet process mixture model that incorporates the correlation structure induced by a structured sampling arrangement and we show that this model improves the quality of inference. We develop a representation of the hierarchical Dirichlet process prior as a Gamma-Poisson hierarchy and we use this representation to derive a fast Gibbs sampling inference algorithm using the augment-and-marginalize method. Experiments with simulated data show that our model outperforms standard numerical and statistical methods for decomposing admixed count data. Analyses of a real acute lymphoblastic leukemia sequencing dataset show that our model improves upon state-of-the-art bioinformatic methods. An interpretation of the results of our model on this real dataset reveals co-mutated loci across samples.

