Supervised Dynamic Topic Models for Associative Topic Extraction with A Numerical Time Series

Author(s):  
Sungrae Park ◽  
Wonsung Lee ◽  
Il-Chul Moon
2016 ◽  
Vol 43 (1) ◽  
pp. 88-102 ◽  
Author(s):  
Sergey I. Nikolenko ◽  
Sergei Koltcov ◽  
Olessia Koltsova

Qualitative studies, such as sociological research, opinion analysis and media studies, can benefit greatly from automated topic mining provided by topic models such as latent Dirichlet allocation (LDA). However, examples of qualitative studies that employ topic modelling as a tool are currently few and far between. In this work, we identify two important problems along the way to using topic models in qualitative studies: lack of a good quality metric that closely matches human judgement in understanding topics and the need to indicate specific subtopics that a specific qualitative study may be most interested in mining. For the first problem, we propose a new quality metric, tf-idf coherence, that reflects human judgement more accurately than regular coherence, and conduct an experiment to verify this claim. For the second problem, we propose an interval semi-supervised approach (ISLDA) where certain predefined sets of keywords (that define the topics researchers are interested in) are restricted to specific intervals of topic assignments. Our experiments show that ISLDA is better for topic extraction than LDA in terms of tf-idf coherence, number of topics identified to predefined keywords and topic stability. We also present a case study on a Russian LiveJournal dataset aimed at ethnicity discourse analysis.


2015 ◽  
Vol 51 (5) ◽  
pp. 737-755 ◽  
Author(s):  
Sungrae Park ◽  
Wonsung Lee ◽  
Il-Chul Moon

Author(s):  
Minor Eduardo Quesada Grosso ◽  
Edgar Casasola Murillo ◽  
Jorge Antonio Leoni de León

Abstract: Mining and exploiting data from social networks has been the focus of many efforts, but despite the resources and energy invested, much remains to be done given the complexity of the task, which requires a multidisciplinary approach. Specifically, for this research, the content of texts published regularly and at a very rapid pace on microblogging sites (e.g. Twitter.com) can be used to analyze global and local trends. These trends are marked by emerging topics that are distinguished from others by a sudden and accelerated rate of posts related to the same topic; in other words, by an increase in popularity over relatively short periods, a day or a few hours, for example (Wanner et al.). The problem, then, is twofold: first to extract the topics, then to identify which of those topics are trending. A recent solution, known as the Bursty Biterm Topic Model (BBTM), is an algorithm for identifying trending topics that performs well on Twitter but requires a great amount of computer processing. Hence, this research aims to evaluate whether it is possible to reduce the amount of processing required while obtaining equally good results. This reduction is carried out by discriminating among the co-occurrences of words (biterms) that BBTM uses to model trending topics. In contrast to our previous work, in this research we carry out a more complete and exhaustive set of experiments.
Spanish Abstract (translated): Mining and exploiting the data contained in social networks has been the focus of multiple efforts. However, despite the resources and energy invested, much remains to be done given its complexity. Specifically, this research focuses on the content of the texts published regularly on microblogging sites (for example, Twitter.com), which can be used to analyze trends. These trends are marked by emerging topics that are distinguished from the rest by a sudden and accelerated increase in posts related to the same topic; in other words, by an increase in popularity over relatively short periods, a day or a few hours. Consequently, the problem is twofold: first to extract the topics being written about, and then to identify which of those topics are trending. A recent solution, known as the Bursty Biterm Topic Model (BBTM), is an algorithm that uses word co-occurrences (biterms) to identify emerging topics and achieves good results on Twitter. However, its computational complexity is high and it requires a considerable amount of processing. Hence, this research seeks to evaluate whether it is possible to reduce the amount of processing required and still obtain results of equally good quality. This reduction is carried out by discriminating among the word co-occurrences (biterms) used by BBTM to model emerging topics. In contrast to the previous work, this research carries out more complete and exhaustive experiments.
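
To make the notion of a biterm concrete, the following minimal Python sketch extracts unordered word pairs from short posts and then discards the rare ones. The abstract does not state which discrimination rule the authors apply, so the frequency threshold in `filter_biterms`, like all function names and the sample posts, is a hypothetical stand-in for illustration only and is not the paper's method.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) that co-occur in one short text."""
    # Sorting each pair makes ("a", "b") and ("b", "a") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2) if pair[0] != pair[1]]


def count_biterms(posts):
    """Count how often each biterm appears across a collection of posts."""
    counts = Counter()
    for post in posts:
        counts.update(extract_biterms(post.lower().split()))
    return counts


def filter_biterms(counts, min_count=2):
    """Hypothetical discrimination rule: keep biterms seen at least min_count times."""
    return {biterm: c for biterm, c in counts.items() if c >= min_count}


# Made-up example posts, not data from the study.
posts = ["earthquake hits city", "city earthquake relief", "new phone launch"]
counts = count_biterms(posts)
kept = filter_biterms(counts, min_count=2)
print(kept)
```

Fewer retained biterms means fewer pairs for a model such as BBTM to process, which is the kind of saving the abstract is concerned with; whether quality is preserved depends on the actual rule used.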


2018 ◽  
Vol 44 (4) ◽  
pp. 719-754 ◽  
Author(s):  
Jing Li ◽  
Yan Song ◽  
Zhongyu Wei ◽  
Kam-Fai Wong

Conventional topic models are ineffective for topic extraction from microblog messages, because the data sparseness exhibited in short messages lacking structure and contexts results in poor message-level word co-occurrence patterns. To address this issue, we organize microblog messages as conversation trees based on their reposting and replying relations, and propose an unsupervised model that jointly learns word distributions to represent: (1) different roles of conversational discourse, and (2) various latent topics in reflecting content information. By explicitly distinguishing the probabilities of messages with varying discourse roles in containing topical words, our model is able to discover clusters of discourse words that are indicative of topical content. In an automatic evaluation on large-scale microblog corpora, our joint model yields topics with better coherence scores than competitive topic models from previous studies. Qualitative analysis on model outputs indicates that our model induces meaningful representations for both discourse and topics. We further present an empirical study on microblog summarization based on the outputs of our joint model. The results show that the jointly modeled discourse and topic representations can effectively indicate summary-worthy content in microblog conversations.


1994 ◽  
Vol 144 ◽  
pp. 279-282
Author(s):  
A. Antalová

Abstract: The occurrence of LDE-type flares in the last three solar cycles has been investigated. The Fourier spectrum was calculated for the time series of LDE-type flare occurrence during the 20th, the 21st and the rising part of the 22nd cycle. LDE-type flares (Long Duration Events in SXR) are associated with interplanetary protons (both SEP and STIP), energized coronal arches and type IV radio emission. Generally, in all the cycles considered, LDE-type flares mainly originated during a 6-year interval of the respective cycle (2 years before and 4 years after the sunspot cycle maximum). The following significant periodicities were found:
• in the 20th cycle: 1.4, 2.1, 2.9, 4.0, 10.7 and 54.2 months,
• in the 21st cycle: 1.2, 1.6, 2.8, 4.9, 7.8 and 44.5 months,
• in the 22nd cycle, until March 1992: 1.4, 1.8, 2.4, 7.2, 8.7, 11.8 and 29.1 months,
• over the whole interval (1969-1992):
a) the longer periodicities: 232.1, 121.1 (the dominant one, i.e. 10.1 years), 80.7, 61.9 and 25.6 months,
b) the shorter periodicities: 4.7, 5.0, 6.8, 7.9, 9.1, 15.8 and 20.4 months.
Fourier analysis of the LDE-type flare index (FI) yields significant peaks at 2.3 - 2.9 months and 4.2 - 4.9 months. These short periodicities agree remarkably well across all three of the last solar cycles. The longer periodicities differ from cycle to cycle.
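
As a rough illustration of how periodicities in months can be read off a Fourier spectrum, the sketch below computes a naive periodogram of a monthly series and reports the period of the strongest peak. The synthetic sine series, the function names and the choice of a plain discrete Fourier transform are assumptions made for this example; the abstract does not describe the author's actual procedure or significance test.

```python
import cmath
import math


def periodogram(series):
    """Naive DFT power spectrum; returns (period_in_samples, power) pairs."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant (zero-frequency) term
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered))
        spectrum.append((n / k, abs(coeff) ** 2 / n))  # frequency index k -> period n/k
    return spectrum


def dominant_period(series):
    """Period (in samples, here months) of the strongest spectral peak."""
    return max(periodogram(series), key=lambda pair: pair[1])[0]


# Synthetic monthly series with a 12-month cycle over 10 years (NOT real flare data).
monthly = [math.sin(2 * math.pi * t / 12) for t in range(120)]
print(dominant_period(monthly))
```

With a series of N monthly samples, only periods of the form N/k are resolved, which is one reason long periodicities such as those quoted for the whole 1969-1992 interval are spaced far apart.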

