Dirichlet process
Recently Published Documents


Total documents: 737 (five years: 186)
H-index: 41 (five years: 4)

Author(s):  
Sujatha Arun Kokatnoor ◽  
Balachandran Krishnan

This research aims to identify the reasons behind new COVID-19 cases as perceived by the public, using data specific to India. The analysis uses machine learning approaches, and the inferences are validated with medical professionals. Data processing and analysis proceed in three steps. First, the dimensionality of the vector space model (VSM) is reduced through an improvised feature engineering (FE) process that uses weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by removal of weak features using a feature hashing technique. Second, an enhanced K-means clustering algorithm groups public posts from Twitter®. Third, latent Dirichlet allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in new COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. With the improvised two-step FE process, the LDA model improved by 14% in coherence score, and by 19% and 15% compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP), respectively, yielding 14 root causes for the spike in the disease.
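
As an illustration of the cluster-quality measure mentioned above, the following is a minimal sketch of one common variant of the Dunn index: the smallest distance between points in different clusters divided by the largest within-cluster diameter. The abstract does not say which variant the authors used, so the function name `dunn_index`, the Euclidean metric, and the toy points are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn index for a list of clusters, each a list of coordinate tuples.

    Assumed variant: minimum pairwise distance between points of different
    clusters, divided by the maximum pairwise distance within any cluster.
    Larger values indicate more compact, better separated clusters.
    """
    if len(clusters) < 2 or any(len(c) == 0 for c in clusters):
        raise ValueError("need at least two non-empty clusters")

    # Largest within-cluster diameter (0.0 if every cluster is a singleton).
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )

    # Smallest distance between points that belong to different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )

    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter


# Hypothetical toy data: two tight clusters three units apart.
toy_clusters = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
print(dunn_index(toy_clusters))  # 3.0
```

Comparing this value for two clusterings of the same data is how a relative improvement such as the one reported above could be computed.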


Sensors ◽  
2022 ◽  
Vol 22 (1) ◽  
pp. 388
Author(s):  
Bahman Moraffah ◽  
Antonia Papandreou-Suppappola

The paper considers the problem of tracking an unknown and time-varying number of unlabeled moving objects using multiple unordered measurements with unknown association to the objects. The proposed tracking approach integrates Bayesian nonparametric modeling with Markov chain Monte Carlo methods to estimate the parameters of each object when present in the tracking scene. In particular, we adopt the dependent Dirichlet process (DDP) to learn the multiple object state prior by exploiting inherent dynamic dependencies in the state transition using the dynamic clustering property of the DDP. Using the DDP to draw the mixing measures, Dirichlet process mixtures are used to learn and assign each measurement to its associated object identity. The Bayesian posterior to estimate the target trajectories is efficiently implemented using a Gibbs sampler inference scheme. A second tracking approach is proposed that replaces the DDP with the dependent Pitman–Yor process in order to allow greater flexibility in clustering. The improved tracking performance of the new approaches is demonstrated by comparison to the generalized labeled multi-Bernoulli filter.
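
To make the clustering behavior behind these priors concrete, here is a minimal sketch of the standard (non-dependent) Pitman–Yor seating rule, which reduces to the Dirichlet process rule when the discount is zero. It shows only the prior assignment weights; the time dependence of the DDP and the measurement likelihoods used in the paper's Gibbs sampler are not modeled. The function name `pitman_yor_weights` and its parameters are hypothetical.

```python
def pitman_yor_weights(cluster_sizes, concentration, discount=0.0):
    """Prior probabilities for assigning the next item to each cluster.

    Returns (existing, new): `existing[j]` is the probability of joining
    cluster j, `new` the probability of opening a new cluster.
    With discount == 0 this is the Dirichlet process (Chinese restaurant) rule.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    n = sum(cluster_sizes)
    k = len(cluster_sizes)
    if n == 0:
        # The first item always opens a new cluster.
        return [], 1.0

    denom = n + concentration
    existing = [(size - discount) / denom for size in cluster_sizes]
    new = (concentration + discount * k) / denom
    return existing, new


# Hypothetical example: two clusters holding 3 and 1 items.
print(pitman_yor_weights([3, 1], concentration=1.0))
print(pitman_yor_weights([3, 1], concentration=1.0, discount=0.5))
```

A positive discount moves probability mass from existing clusters to the new-cluster option, which is one way to read the added flexibility in clustering that the abstract attributes to the Pitman–Yor variant.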


2021 ◽  
Author(s):  
Chong Zhong ◽  
Zhihua Ma ◽  
Junshan Shen ◽  
Catherine Liu

Compared with the frequentist paradigm, the Bayesian paradigm handles complex censoring schemes well, which makes it possible to fit complicated survival models with feasible computation. In this chapter, we present a recent trend in Bayesian computing, the automation of posterior sampling, through a Bayesian analysis of multivariate survival outcomes with a complicated data structure. Motivated by relaxing the strong assumption of proportionality and the restriction of a common baseline population, we propose a generalized shared frailty model that includes both parametric and nonparametric frailty random effects to incorporate both treatment-wise and temporal variation for multiple events. We develop a survival-function version of the ANOVA dependent Dirichlet process to model the dependency among the baseline survival functions. Posterior sampling is carried out automatically by the No-U-Turn sampler in Stan, a contemporary Bayesian computing tool. The proposed model is validated by an analysis of the bladder cancer recurrence data, and the estimates are consistent with existing results. Our model and Bayesian inference provide evidence that the Bayesian paradigm fosters complex modeling and feasible computing in survival analysis, and that Stan eases posterior inference.
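
For orientation, the classical shared frailty model that this abstract sets out to generalize can be written as below. This is the textbook form, not the authors' generalized model, whose exact specification the abstract does not give. The symbols are assumptions: $i$ indexes subjects, $j$ indexes events, $w_i$ is the shared frailty, $h_0$ the common baseline hazard, and $x_{ij}$ the covariates.

```latex
h_{ij}(t \mid w_i) \;=\; w_i \, h_0(t) \, \exp\!\left( x_{ij}^{\top} \beta \right),
\qquad w_i \sim \text{a frailty distribution with } w_i > 0 .
```

The single baseline $h_0$ and the multiplicative covariate effect correspond to the two restrictions named in the abstract: a common baseline population and proportionality.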


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3127
Author(s):  
Federico Bassetti ◽  
Lucia Ladelli

We introduce mixtures of species sampling sequences (mSSS) and discuss how these sequences are related to various types of Bayesian models. As a particular case, we recover species sampling sequences with general (not necessarily diffuse) base measures. These models include some “spike-and-slab” non-parametric priors recently introduced to provide sparsity. Furthermore, we show how mSSS arise while considering hierarchical species sampling random probabilities (e.g., the hierarchical Dirichlet process). Extending previous results, we prove that mSSS are obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. Using this representation, we give an explicit expression of the Exchangeable Partition Probability Function of the partition generated by an mSSS. Some special cases are discussed in detail—in particular, species sampling sequences with general base measures and a mixture of species sampling sequences with Gibbs-type latent partition. Finally, we give explicit expressions of the predictive distributions of an mSSS.
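
As a reference point for the Exchangeable Partition Probability Function (EPPF) mentioned above, the classical Dirichlet process case with a diffuse base measure has a closed form. This is the well-known special case, not the paper's general mSSS expression; the symbols $\theta$ (concentration), $n_1, \dots, n_k$ (block sizes) and $n = \sum_j n_j$ are notation assumed here.

```latex
p(n_1, \dots, n_k) \;=\; \frac{\theta^{k}}{(\theta)_{n}} \prod_{j=1}^{k} (n_j - 1)!,
\qquad (\theta)_{n} = \theta (\theta + 1) \cdots (\theta + n - 1).
```

For $n = 2$ this gives $1/(\theta + 1)$ for a single block and $\theta/(\theta + 1)$ for two singletons, which sum to one. When the base measure has atoms, distinct latent blocks can share the same value, so the partition induced by the observed values is no longer the latent one; this is the situation the latent-partition representation in the abstract addresses.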


2021 ◽  
Author(s):  
Roy Ken Costilla Monteagudo

Model-based approaches to clustering continuous and cross-sectional data are abundant and well established. In contrast, equivalent approaches for repeated ordinal data are less common and remain an active area of research. In this dissertation, we propose several models to cluster repeated ordinal data using finite mixtures. In doing so, we explore several ways of incorporating the correlation due to the repeated measurements while taking into account the ordinal nature of the data. In particular, we extend the Proportional Odds model to incorporate latent random effects and latent transitional terms. These two ways of incorporating the correlation are also known as parameter-dependent and data-dependent models in the time-series literature. In contrast to most of the existing literature, our aim is classification and not parameter estimation; that is, we seek flexible and parsimonious ways to estimate latent populations and classification probabilities for repeated ordinal data. We estimate the models using frequentist (Expectation-Maximization algorithm) and Bayesian (Markov chain Monte Carlo) inference methods and compare the advantages and disadvantages of both approaches with simulated and real datasets. To compare models, we use several information criteria (AIC, BIC, DIC and WAIC) as well as a Bayesian nonparametric approach (Dirichlet process mixtures). With regard to applications, we illustrate the models using self-reported health status in Australia (poor to excellent), life satisfaction in New Zealand (completely agree to completely disagree) and agreement with a reference genome of infant gut bacteria (equal, segregating and variant) from baby stool samples.
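
The basic Proportional Odds model that the dissertation extends can be sketched as follows. This is a minimal illustration of the standard model under one common sign convention, P(Y <= k) = logistic(cutpoint_k - eta); it does not include the mixture components, random effects, or transitional terms described above, which would presumably enter through the linear predictor. The names `proportional_odds_probs`, `cutpoints`, and `eta` are hypothetical.

```python
from math import exp


def _logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def proportional_odds_probs(cutpoints, eta):
    """Category probabilities under a proportional odds model.

    cutpoints: strictly increasing thresholds (K - 1 values for K categories).
    eta: linear predictor for one observation.
    Assumed convention: P(Y <= k) = logistic(cutpoints[k] - eta).
    """
    if any(b <= a for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cutpoints must be strictly increasing")

    # Cumulative probabilities, closed off with P(Y <= K) = 1.
    cumulative = [_logistic(c - eta) for c in cutpoints] + [1.0]

    # Differences of consecutive cumulative probabilities give category probabilities.
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical three-category example (e.g. an ordinal scale with two thresholds).
print(proportional_odds_probs([-1.0, 1.0], eta=0.0))
```

Increasing `eta` shifts probability toward the higher categories while the cutpoints stay fixed, which is the proportional odds assumption.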



