topic modeling
Recently Published Documents


TOTAL DOCUMENTS

2019
(FIVE YEARS 1182)

H-INDEX

42
(FIVE YEARS 13)

2022 ◽  
Vol 54 (7) ◽  
pp. 1-35
Author(s):  
Uttam Chauhan ◽  
Apurva Shah

We are not able to deal with a mammoth text corpus without summarizing them into a relatively small subset. A computational tool is extremely needed to understand such a gigantic pool of text. Probabilistic Topic Modeling discovers and explains the enormous collection of documents by reducing them in a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of the topic modeling techniques and review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word embedded topic models, and topic models in multilingual perspectives. Besides, the research work for topic modeling in a distributed environment, topic visualization approaches also have been explored. We also covered the implementation and evaluation techniques for topic models in brief. Comparison matrices have been shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions have been discussed.


Author(s):  
Pooja Kherwa ◽  
Poonam Bansal

The Covid-19 pandemic is the deadliest outbreak in our living memory. So, it is need of hour, to prepare the world with strategies to prevent and control the impact of the epidemics. In this paper, a novel semantic pattern detection approach in the Covid-19 literature using contextual clustering and intelligent topic modeling is presented. For contextual clustering, three level weights at term level, document level, and corpus level are used with latent semantic analysis. For intelligent topic modeling, semantic collocations using pointwise mutual information(PMI) and log frequency biased mutual dependency(LBMD) are selected and latent dirichlet allocation is applied. Contextual clustering with latent semantic analysis presents semantic spaces with high correlation in terms at corpus level. Through intelligent topic modeling, topics are improved in the form of lower perplexity and highly coherent. This research helps in finding the knowledge gap in the area of Covid-19 research and offered direction for future research.


The Covid-19 pandemic is the deadliest outbreak in our living memory. So, it is need of hour, to prepare the world with strategies to prevent and control the impact of the epidemics. In this paper, a novel semantic pattern detection approach in the Covid-19 literature using contextual clustering and intelligent topic modeling is presented. For contextual clustering, three level weights at term level, document level, and corpus level are used with latent semantic analysis. For intelligent topic modeling, semantic collocations using pointwise mutual information(PMI) and log frequency biased mutual dependency(LBMD) are selected and latent dirichlet allocation is applied. Contextual clustering with latent semantic analysis presents semantic spaces with high correlation in terms at corpus level. Through intelligent topic modeling, topics are improved in the form of lower perplexity and highly coherent. This research helps in finding the knowledge gap in the area of Covid-19 research and offered direction for future research.


2022 ◽  
Vol 9 (2) ◽  
pp. 142-148
Author(s):  
Nico Manzonelli ◽  
Taylor Brown ◽  
Antonio Avellandea-Ruiz ◽  
William Bagley ◽  
Ian Kloo

Traditionally, a significant part of assessing information operations (IO) relies on subject matter experts’ time- intensive study of publicly available information (PAI). Now, with massive amounts PAI made available via the Internet, analysts are faced with the challenge of effectively leveraging massive quantities of PAI to draw meaningful conclusions. This paper presents an automated method for collecting and analyzing large amounts of PAI from China that could better inform assessments of IO campaigns. We implement a multi-model system that involves data acquisition via web scraping and analysis using natural language processing (NLP) techniques with a focus on topic modeling and sentiment analysis. After conducting a case study on China’s current relationship with Taiwan and comparing the results to validated research by a subject matter expert, it is clear that our methodology is valuable for drawing general conclusions and pinpointing important dialogue over a massive amount of PAI.


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Krishnadas Nanath ◽  
Supriya Kaitheri ◽  
Sonia Malik ◽  
Shahid Mustafa

Purpose The purpose of this paper is to examine the factors that significantly affect the prediction of fake news from the virality theory perspective. The paper looks at a mix of emotion-driven content, sentimental resonance, topic modeling and linguistic features of news articles to predict the probability of fake news. Design/methodology/approach A data set of over 12,000 articles was chosen to develop a model for fake news detection. Machine learning algorithms and natural language processing techniques were used to handle big data with efficiency. Lexicon-based emotion analysis provided eight kinds of emotions used in the article text. The cluster of topics was extracted using topic modeling (five topics), while sentiment analysis provided the resonance between the title and the text. Linguistic features were added to the coding outcomes to develop a logistic regression predictive model for testing the significant variables. Other machine learning algorithms were also executed and compared. Findings The results revealed that positive emotions in a text lower the probability of news being fake. It was also found that sensational content like illegal activities and crime-related content were associated with fake news. The news title and the text exhibiting similar sentiments were found to be having lower chances of being fake. News titles with more words and content with fewer words were found to impact fake news detection significantly. Practical implications Several systems and social media platforms today are trying to implement fake news detection methods to filter the content. This research provides exciting parameters from a viral theory perspective that could help develop automated fake news detectors. Originality/value While several studies have explored fake news detection, this study uses a new perspective on viral theory. It also introduces new parameters like sentimental resonance that could help predict fake news. This study deals with an extensive data set and uses advanced natural language processing to automate the coding techniques in developing the prediction model.


2022 ◽  
Vol 6 (GROUP) ◽  
pp. 1-33
Author(s):  
Fayika Farhat Nova ◽  
Amanda Coupe ◽  
Elizabeth D. Mynatt ◽  
Shion Guha ◽  
Jessica A. Pater

A growing body of HCI research has sought to understand how online networks are utilized in the adoption and maintenance of disordered activities and behaviors associated with mental illness, including eating habits. However, individual-level influences over discrete online eating disorder (ED) communities are not yet well understood. This study reports results from a comprehensive network and content analysis (combining computational topic modeling and qualitative thematic analysis) of over 32,000 public tweets collected using popular ED-related hashtags during May 2020. Our findings indicate that this ED network in Twitter consists of multiple smaller ED communities where a majority of the nodes are exposed to unhealthy ED contents through retweeting certain influential central nodes. The emergence of novel linguistic indicators and trends (e.g., "#meanspo") also demonstrates the evolving nature of the ED network. This paper contextualizes ED influence in online communities through node-level participation and engagement, as well as relates emerging ED contents with established online behaviors, such as self-harassment.


2022 ◽  
Author(s):  
Rob Churchill ◽  
Lisa Singh

Topic models have been applied to everything from books to newspapers to social media posts in an effort to identify the most prevalent themes of a text corpus. We provide an in-depth analysis of unsupervised topic models from their inception to today. We trace the origins of different types of contemporary topic models, beginning in the 1990s, and we compare their proposed algorithms, as well as their different evaluation approaches. Throughout, we also describe settings in which topic models have worked well and areas where new research is needed, setting the stage for the next generation of topic models.


Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 189
Author(s):  
Álvaro de Pablo ◽  
Oscar Araque ◽  
Carlos A. Iglesias

The analysis of the content of posts written on social media has established an important line of research in recent years. The study of these texts, as well as their relationship with each other and their dependence on the platform on which they are written, enables the behavior analysis of users and their opinions with respect to different domains. In this work, a hybrid machine learning-based system has been developed to classify texts using topic modeling techniques and different word-vector representations, as well as traditional text representations. The system has been trained with ride-hailing posts extracted from Reddit, showing promising performance. Then, the generated models have been tested with data extracted from other sources such as Twitter and Google Play, classifying these texts without retraining any models and thus performing Transfer Learning. The obtained results show that our proposed architecture is effective when performing Transfer Learning from data-rich domains and applying them to other sources.


Sign in / Sign up

Export Citation Format

Share Document