Partially collapsed Gibbs sampling for latent Dirichlet allocation

2019 ◽  
Vol 131 ◽  
pp. 208-218 ◽  
Author(s):  
Hongju Park ◽  
Taeyoung Park ◽  
Yung-Seop Lee
2017 ◽  
Vol 23 (2) ◽  
pp. 429-458
Author(s):  
Victor Araújo

Abstract: The formation of multiparty governments heightens the risk of information asymmetry between principals and agents, so that cabinet conflicts over policy are reflected in the behavior of parties in parliament. Several studies show that mutual monitoring among the parties in the cabinet is a way to compensate for the information loss inherent in delegation. Whereas the literature tends to focus on the policy formulation stage, by analyzing the governments formed in Brazil between 1995 and 2014 I argue that there is a more diverse set of strategies that allow parties to scrutinize the policies implemented by their cabinet partners. Using network analysis and quantitative text analysis techniques (the Gibbs sampling method, a Bayesian algorithm derived from latent Dirichlet allocation, LDA), I show that, when ministerial portfolios are distributed to actors with distinct policy preferences, parties intensify their use of Requests for Information (Requerimentos de Informação, RIC) to monitor the ministries and policies that interest them. The structure of intra-cabinet control networks varies with the salience of the ministries: the parties responsible for the portfolios with the largest budget allocations are the actors with the highest degree of centrality in the mutual monitoring networks.


Author(s):  
Bambang Subeno ◽  
Retno Kusumaningrum ◽  
Farikhin Farikhin

Latent Dirichlet Allocation (LDA) is a probabilistic model that groups the hidden topics in documents into a predefined number of topics. Choosing the number of topics K poorly leaves words only weakly correlated with their topics: a K that is too large or too small causes inaccurate topic grouping when the training model is formed. This study aims to determine the optimal number of topics for a corpus in the LDA method using the maximum likelihood and Minimum Description Length (MDL) approaches. The experiments use Indonesian news articles in datasets of 25, 50, 90, and 600 documents, with word counts of 3898, 7760, 13005, and 4365, respectively. The results show that the maximum likelihood and MDL approaches yield the same optimal number of topics. The optimal number of topics is influenced by the alpha and beta parameters. In addition, the number of documents does not affect the computation time, but the number of words does. Computation times for the four datasets are 2.9721, 6.49637, 13.2967, and 3.7152 seconds. The optimised number of topics is then used in LDA as a classification model. The experiment shows that the highest average accuracy is 61%, obtained with alpha 0.1 and beta 0.001.
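To make the model-selection idea concrete, here is a minimal sketch of scoring a fitted LDA model by log-likelihood and by a two-part MDL-style code length. The abstract does not state the exact MDL formula used, so the penalty term below (half the number of free parameters times the log of the token count) is an assumption, and the names `log_likelihood` and `mdl_score` are hypothetical. Documents are lists of integer word ids; `theta` holds per-document topic proportions and `phi` per-topic word distributions.

```python
import math

def log_likelihood(docs, theta, phi):
    """Corpus log-likelihood: sum over tokens of log sum_k theta[d][k] * phi[k][w]."""
    total = 0.0
    for d, doc in enumerate(docs):
        for w in doc:
            total += math.log(sum(t * p[w] for t, p in zip(theta[d], phi)))
    return total

def mdl_score(docs, theta, phi):
    """Assumed two-part code length: -log L + (free parameters / 2) * log N.

    This BIC-style penalty is an illustrative choice, not the formula
    from the study summarised above.
    """
    n_tokens = sum(len(doc) for doc in docs)
    k, v = len(phi), len(phi[0])
    # Each topic has v - 1 free word probabilities; each document k - 1 free weights.
    free_params = k * (v - 1) + len(docs) * (k - 1)
    return -log_likelihood(docs, theta, phi) + 0.5 * free_params * math.log(n_tokens)

# Toy check: one document, one topic, uniform distribution over two words.
toy_docs = [[0, 1]]
toy_theta = [[1.0]]
toy_phi = [[0.5, 0.5]]
toy_ll = log_likelihood(toy_docs, toy_theta, toy_phi)
toy_mdl = mdl_score(toy_docs, toy_theta, toy_phi)
```

Under this sketch, one would fit LDA for several candidate values of K and keep the K with the highest likelihood or the lowest MDL score; the abstract reports that both criteria selected the same K on its datasets.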


2016 ◽  
Vol 8 (4) ◽  
pp. 100-113 ◽  
Author(s):  
Xiongwen Pang ◽  
Benshuai Wan ◽  
Huifang Li ◽  
Weiwei Lin

Latent Dirichlet Allocation (LDA) is an efficient method of text mining, but applying LDA directly to Chinese micro-blog texts does not work well because micro-blogs are more social, brief, and closely related to each other. Based on LDA, this paper proposes a Micro-blog Relation LDA model (MR-LDA), which takes the relations among Chinese micro-blog documents into consideration to help topic mining in micro-blogs. The authors extend LDA in two ways. First, they aggregate several Chinese micro-blogs into a single micro-blog document to solve the problem of short texts. Second, they model the generation process of Chinese micro-blogs more accurately by taking the relationships between micro-blog documents into consideration. MR-LDA is therefore better suited to modelling Chinese micro-blog data. Gibbs sampling is used for inference in the model. Experimental results on real datasets show that the MR-LDA model offers an effective solution to text mining for Chinese micro-blogs.
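For orientation, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA. It is not the MR-LDA sampler: the abstract does not give its update equations, so the relation terms are left out. The function name `lda_gibbs`, its parameters, and the toy corpus are illustrative assumptions. Each document is a list of integer word ids; under the aggregation step described above, one such list would hold the tokens of several micro-blogs.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampler for standard LDA (an illustrative sketch)."""
    rng = random.Random(seed)
    ndk = [[0] * num_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens assigned to each topic
    z = []                                               # topic of every token

    # Random initial assignment.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw

# Toy corpus over a 5-word vocabulary (word ids 0..4).
toy_corpus = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 3, 4]]
doc_topic, topic_word = lda_gibbs(toy_corpus, num_topics=2, vocab_size=5)
```

The returned count matrices can be normalised (after adding alpha and beta) to estimate the document-topic and topic-word distributions.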


Author(s):  
Priyanka R. Patil ◽  
Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text and collections of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles between users, and recommends friends to users whose lifestyles are highly similar. Motivated by this, a user's daily life is modelled as life documents, from which lifestyles are extracted using the Latent Dirichlet Allocation algorithm. Manual methods cannot be relied on for checking research papers, because the assigned reviewer may have insufficient knowledge of the research disciplines and reviewers may hold different subjective views, causing possible misinterpretations. There is an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software. Text mining can address the problem of automatically checking research papers semantically. The proposed method finds the similarity of text across a collection of documents using the Latent Dirichlet Allocation (LDA) algorithm and Latent Semantic Analysis (LSA) with synonyms, which finds synonyms of text index-wise using the English WordNet dictionary; a second algorithm, LSA without synonyms, finds the similarity of text based on the index alone. LSA with synonyms achieves higher accuracy when synonyms are considered for matching.
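The effect of synonym matching on similarity can be illustrated with a small sketch. This is a simplification: it uses bag-of-words cosine similarity rather than full LSA (there is no SVD step), and the `SYNONYMS` table is a hypothetical hand-written stand-in for a WordNet lookup. The function names are likewise illustrative.

```python
import math
from collections import Counter

# Hypothetical toy synonym map; the described method uses WordNet instead.
SYNONYMS = {"automobile": "car", "vehicle": "car", "quick": "fast"}

def vectorize(text, synonyms=None):
    """Term-count vector; optionally map each token to a canonical synonym."""
    tokens = text.lower().split()
    if synonyms:
        tokens = [synonyms.get(t, t) for t in tokens]
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc1 = "the quick car"
doc2 = "the fast automobile"

plain = cosine(vectorize(doc1), vectorize(doc2))                       # only "the" matches
merged = cosine(vectorize(doc1, SYNONYMS), vectorize(doc2, SYNONYMS))  # all terms match
```

Without synonym merging the two sentences share a single term; with it they become identical term vectors, which is the intuition behind the higher accuracy reported for the synonym-aware variant.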

