The Latent Dirichlet Allocation model with covariates (LDAcov): A case study on the effect of fire on species composition in Amazonian forests

2021 ◽  
Author(s):  
Denis Valle ◽  
Gilson Shimizu ◽  
Rafael Izbicki ◽  
Leandro Maracahipes ◽  
Divino Vicente Silverio ◽  
...  
2021 ◽  
Author(s):  
Jorge Arturo Lopez

Extracting topics from large text corpora helps improve Software Engineering (SE) processes. Latent Dirichlet Allocation (LDA) is one of the algorithmic tools used to understand, search, exploit, and summarize a large corpus of documents. However, calibrating the models is computationally expensive, especially when iterating over a large number of topics. Our goal is to create a simple formula that lets analysts estimate the number of topics such that the top X topics include the desired proportion of the documents under study. We derived the formula from an empirical analysis of three SE-related text corpora. We believe that practitioners can use our formula to expedite LDA analysis. The formula is also of interest to theoreticians, as it suggests that different SE text corpora have similar underlying properties.
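
To make the quantity in this abstract concrete, the sketch below computes, for an already fitted topic model, the smallest X such that the X most populated topics cover a desired proportion of documents. It is not the authors' formula, which the abstract does not state. The function name `topics_needed` and the input format (one dominant topic id per document) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Sequence


def topics_needed(dominant_topics: Sequence[int], target: float) -> int:
    """Return the smallest X such that the X most populated topics
    together contain at least `target` (a fraction in (0, 1]) of the documents.

    `dominant_topics` is a hypothetical input: one topic id per document,
    e.g. the topic with the highest weight in that document.
    """
    if not dominant_topics:
        raise ValueError("at least one document is required")
    if not 0 < target <= 1:
        raise ValueError("target must be in the interval (0, 1]")

    # Number of documents per topic, largest topics first.
    counts = sorted(Counter(dominant_topics).values(), reverse=True)
    total = len(dominant_topics)

    covered = 0
    for x, count in enumerate(counts, start=1):
        covered += count
        if covered >= target * total:
            return x
    return len(counts)


# Toy data: 10 documents spread over 4 topics (sizes 4, 3, 2, 1).
docs = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
half = topics_needed(docs, 0.5)   # topics needed to cover half the documents
full = topics_needed(docs, 1.0)   # topics needed to cover all documents
```

This direct count requires a fitted model for each candidate number of topics, which is the expensive calibration step the abstract says the formula is meant to shorten.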


2017 ◽  
Vol 10 ◽  
pp. 403-421 ◽  
Author(s):  
Putu Manik Prihatini ◽  
I Ketut Gede Darma Putra ◽  
Ida Ayu Dwi Giriantari ◽  
Made Sudarma
