Trending Topic Extraction Using Topic Models and Biterm Discrimination

Author(s): Minor Eduardo Quesada Grosso, Edgar Casasola Murillo, Jorge Antonio Leoni de León

Abstract: Mining and exploiting social network data has been the focus of many efforts, but despite the resources and energy invested, much remains to be done given the complexity of the task, which calls for a multidisciplinary approach. Specifically, this research concerns the texts published regularly and at a very rapid pace on microblogging sites (e.g., Twitter.com), whose content can be used to analyze global and local trends. These trends are marked by emerging topics that stand out from others through a sudden, accelerated rate of posts on the same topic; in other words, through an increase in popularity over a relatively short period, such as a day or a few hours (Wanner et al.). The problem is therefore twofold: first to extract the topics, then to identify which of those topics are trending. A recent solution, the Bursty Biterm Topic Model (BBTM), is an algorithm that uses word co-occurrences (biterms) to identify trending topics. It performs well on Twitter, but its computational complexity is high and it requires a considerable amount of processing. Hence, this research evaluates whether the amount of processing required can be reduced while obtaining equally good results. The reduction is achieved by discriminating among the word co-occurrences (biterms) that BBTM uses to model trending topics. In contrast to our previous work, this research carries out a more complete and exhaustive set of experiments.
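To make the notion of a biterm concrete, the following minimal sketch extracts unordered word pairs from tokenized short texts and then discards rare pairs before any topic model would see them. The abstract does not state which discrimination criterion the authors use, so the frequency threshold `min_count`, the function names, and the toy posts are all assumptions made for illustration; treating each text as a set of unique words is also a simplification.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return the unordered word pairs (biterms) that co-occur in one short text.

    Simplification: repeated words are collapsed, and pairs are sorted so that
    (a, b) and (b, a) count as the same biterm.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def count_biterms(docs):
    """Count how often each biterm occurs across a collection of short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


def discriminate(counts, min_count=2):
    """Keep only biterms seen at least min_count times.

    Hypothetical criterion: the paper's actual discrimination rule is not
    given in the abstract.
    """
    return {biterm: c for biterm, c in counts.items() if c >= min_count}


# Toy posts, invented for illustration only.
docs = [
    ["earthquake", "costa", "rica"],
    ["earthquake", "rica", "alert"],
    ["coffee", "morning"],
]
counts = count_biterms(docs)
kept = discriminate(counts, min_count=2)
print(kept)
```

Under this assumed criterion, fewer biterms reach the model, which is the kind of processing reduction the abstract describes; whether quality is preserved is the empirical question the paper studies.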

2021, Vol 11 (18), pp. 8708
Author(s): Yue Niu, Hongjie Zhang, Jing Li

In recent years, short texts have become a prevalent form of text on the internet. Because each text is short, conventional topic models suffer from the sparsity of word co-occurrence information when applied to them. Researchers have proposed various customized topic models for short texts that supply additional word co-occurrence information. However, these models cannot incorporate sufficient semantic word co-occurrence information and may introduce additional noise. To address these issues, we propose a self-aggregated topic model incorporating document embeddings. Aggregating short texts into long documents according to document embeddings can provide sufficient word co-occurrence information and avoid incorporating non-semantic word co-occurrence information. However, document embeddings of short texts contain a lot of noise resulting from the sparsity of word co-occurrence information, so we discard the noise by converting the document embeddings into global and local semantic information. The global semantic information is the similarity probability distribution over the entire dataset, and the local semantic information is the distances between similar short texts. We then adopt a nested Chinese restaurant process to incorporate these two kinds of information. Finally, we compare our model with several state-of-the-art models on four real-world short-text corpora. The experimental results show that our model achieves better performance in terms of topic coherence and classification accuracy.
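The idea of aggregating short texts into longer pseudo-documents according to their embeddings can be illustrated with a deliberately simple stand-in. The sketch below is not the nested Chinese restaurant process the abstract describes: it is a greedy, threshold-based grouping by cosine similarity, and the `threshold` value, the function names, and the two-dimensional toy embeddings are assumptions chosen only to show the aggregation step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def aggregate(texts, embeddings, threshold=0.8):
    """Greedily merge each short text into the first group whose seed embedding
    is similar enough; otherwise start a new group.

    Returns one concatenated pseudo-document per group.
    """
    groups = []  # each group: {"seed": embedding of first member, "texts": [...]}
    for text, emb in zip(texts, embeddings):
        for group in groups:
            if cosine(group["seed"], emb) >= threshold:
                group["texts"].append(text)
                break
        else:
            groups.append({"seed": emb, "texts": [text]})
    return [" ".join(group["texts"]) for group in groups]


# Toy texts and hand-made 2-D embeddings, invented for illustration only.
texts = ["goal scored late", "match ends in draw", "stocks fall sharply"]
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
pseudo_docs = aggregate(texts, embeddings)
print(pseudo_docs)
```

The longer pseudo-documents contain more word co-occurrences than any single short text, which is the motivation given in the abstract; a conventional topic model could then be fitted on them.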


2018, Vol 45 (4), pp. 554-570
Author(s): Jian Jin, Qian Geng, Haikun Mou, Chong Chen

Interdisciplinary studies are becoming increasingly popular, and research domains of many experts are becoming diverse. This phenomenon brings difficulty in recommending experts to review interdisciplinary submissions. In this study, an Author–Subject–Topic (AST) model is proposed with two versions. In the model, reviewers’ subject information is embedded to analyse topic distributions of submissions and reviewers’ publications. The major difference between the AST and Author–Topic models lies in the introduction of a ‘Subject’ layer, which supervises the generation of hierarchical topics and allows sharing of subjects among authors. To evaluate the performance of the AST model, papers in Information System and Management (a typical interdisciplinary domain) in a famous Chinese academic library are investigated. Comparative experiments are conducted, which show the effectiveness of the AST model in topic distribution analysis and reviewer recommendation for interdisciplinary studies.


Complexity, 2020, Vol 2020, pp. 1-13
Author(s): Yanni Liu, Dongsheng Liu, Yuwei Chen

With the rapid development of the mobile Internet, the social network has become an important platform for users to receive, release, and disseminate information. To obtain more valuable information and supervise public opinion effectively, it is necessary to study public opinion, sentiment tendency, and the evolution of hot events in the social networks of a smart city. In view of social networks' characteristics, such as short text, rich topics, diverse sentiments, and timeliness, this paper models text with word co-occurrence based on the topic model. In addition, sentiment computing and the time factor are incorporated to construct the dynamic topic-sentiment mixture model (TSTS). Four hot events were then randomly selected from the microblog as datasets to evaluate the TSTS model in terms of topic feature extraction, sentiment analysis, and change over time. The results show that the TSTS model outperforms traditional models in topic extraction and sentiment analysis. Meanwhile, by fitting the time curve of hot events, the rules governing how comments change in the social network are obtained.


2020
Author(s): Fernando Miró-Llinares

Nowadays it is easy to find public statements that question the state of freedom of expression in different democracies, perhaps as a result of the political tensions to which democratic states have been subjected in recent years. Spain is no exception to these diagnoses. Both the international indicators that try to measure the situation and evolution of freedom of expression in different States and academic scholars highlight the excessive criminalization of certain forms of speech, which end up in criminal proceedings that convict people who make offensive statements, mainly through social networks. However, to reach this diagnosis it is necessary to put together all the symptoms that would lead to that conclusion. Therefore, in this paper I analyze two main indicators that could shed more light on the state of freedom of expression in Spain and the impact that social networks have had on it. Firstly, I analyze the legislative evolution of expression offences since 1995 to evaluate the limits placed on certain expressions, and conclude that over the years the punitive scope of what cannot be expressed has indeed been extended, thus limiting freedom of expression, at least in the abstract. Secondly, I analyze the jurisprudential evolution of all these crimes since 1995 to show that the proliferation of convictions from 2015 to the present does reflect an increase in the criminalization of expressions made predominantly through social networks such as Twitter and Facebook.
To conclude, I reflect on the possibility that the recent acquittal by the Constitutional Court of César Strawberry, singer of the band Def con Dos, will increase the feeling that, from now on, all expression is admissible and will therefore increase free expression in general and on social networks in particular, since our legislator does not seem willing to reverse the excessive criminalization of certain offenses. I also reflect on the need to approach freedom of expression more empirically, evaluating not only the limitations that the law and judicial processes impose on it, but also the extent to which citizens in general, and users of social networks in particular, have stopped expressing their opinions without ever having gone through criminal proceedings, because only in this way will it be possible to determine the state of health of our right to freedom of expression.


2020, Vol 17 (01), pp. 240-255
Author(s): Gustavo Souza Santos

#VEMPRARUA: journeys of a space in network. Abstract: The June Journeys (Jornadas de Junho) were an insurgency movement that spread across Brazil in June 2013. They grew out of the MPL's initiatives in protest against the public transportation fare in São Paulo, but their scope expanded to encompass a series of social demands originating in the core of Brazilian society, across the extent and particularities of the national territory. In the dynamics of the protests, alternative and autonomous communication networks served as instruments of mobilization, information, and cohesion for the demonstrations, through devices and social networks. This study reflects on the dynamics of the June Journeys, considering cyberspace as an element that brings together socio-spatial and insurgency practices, and seeks to bring Geography and cyberspace closer together in the context of the case examined. Keywords: June Journeys. Social Movements. Space. Cyberspace. Network.


2020
Author(s): Diogo Nolasco, Jonice Oliveira

The rumor detection problem on social networks has attracted considerable attention in recent years with the rise of concerns about fake news and disinformation. Most previous work has focused on detecting rumors in individual messages, classifying whether a post or blog entry is a rumor or not. This paper proposes a topic-level method for rumor detection that identifies whether a social topic related to a scientific topic is a rumor. We propose applying a topic model to the social and scientific domains and correlating the topics found in order to detect those most prone to be rumors. Results from applying the method to the Zika epidemic scenario show evidence that the least correlated topics contain a mix of rumors and local community discussions.
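The correlation step can be sketched as follows, assuming each topic is represented as a word-probability distribution. The abstract does not name the correlation measure, so cosine similarity is used here as a stand-in, and the function names, topic labels, and toy distributions are hypothetical. Each social topic is scored by its best match among the scientific topics, and topics are ranked from least to most correlated.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    shared = set(p) & set(q)
    dot = sum(p[w] * q[w] for w in shared)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def rank_by_lowest_correlation(social_topics, scientific_topics):
    """Score each social topic by its highest similarity to any scientific topic
    and return (name, score) pairs sorted from least to most correlated."""
    scored = []
    for name, topic in social_topics.items():
        best = max(cosine(topic, sci) for sci in scientific_topics.values())
        scored.append((name, best))
    return sorted(scored, key=lambda pair: pair[1])


# Toy topic distributions, invented for illustration only (not results from the paper).
scientific_topics = {
    "sci_transmission": {"mosquito": 0.5, "virus": 0.3, "transmission": 0.2},
}
social_topics = {
    "soc_a": {"mosquito": 0.6, "virus": 0.4},
    "soc_b": {"vaccine": 0.7, "conspiracy": 0.3},
}
ranked = rank_by_lowest_correlation(social_topics, scientific_topics)
print(ranked)
```

In this sketch, the topics at the head of the ranking would be the candidates to inspect for rumors; as the abstract notes, low correlation can also indicate local community discussion, so the ranking is a filter rather than a classifier.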

