Topic Modeling: Latent Semantic Analysis for the Social Sciences

The Covid-19 pandemic is the deadliest outbreak in our living memory. So, it is need of hour, to prepare the world with strategies to prevent and control the impact of the epidemics. In this paper, a novel semantic pattern detection approach in the Covid-19 literature using contextual clustering and intelligent topic modeling is presented. For contextual clustering, three level weights at term level, document level, and corpus level are used with latent semantic analysis. For intelligent topic modeling, semantic collocations using pointwise mutual information(PMI) and log frequency biased mutual dependency(LBMD) are selected and latent dirichlet allocation is applied. Contextual clustering with latent semantic analysis presents semantic spaces with high correlation in terms at corpus level. Through intelligent topic modeling, topics are improved in the form of lower perplexity and highly coherent. This research helps in finding the knowledge gap in the area of Covid-19 research and offered direction for future research.

Download Full-text

Estimating Topic Modeling Performance with Sharma–Mittal Entropy

Entropy ◽

10.3390/e21070660 ◽

2019 ◽

Vol 21 (7) ◽

pp. 660 ◽

Cited By ~ 6

Author(s):

Sergei Koltcov ◽

Vera Ignatenko ◽

Olessia Koltsova

Keyword(s):

Latent Semantic Analysis ◽

Topic Modeling ◽

Statistical Physics ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Probabilistic Latent Semantic Analysis ◽

Model Parameters ◽

Theoretical Ground ◽

Text Documents ◽

Optimizing Model

Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method to solve partially the problems of optimizing model parameters, simultaneously accounting for semantic stability. Our method is inspired by the concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models: probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which is able to account for just one of the parameters of our interest. We demonstrate that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters, simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can be used to contribute to theory construction for machine learning, a rapidly-developing sphere that currently lacks a consistent theoretical ground.

Download Full-text

Semantic Pattern Detection in COVID-19 Using Contextual Clustering and Intelligent Topic Modeling

International Journal of E-Health and Medical Communications ◽

10.4018/ijehmc.20220701.oa7 ◽

2022 ◽

Vol 13 (2) ◽

pp. 1-17

Author(s):

Pooja Kherwa ◽

Poonam Bansal

Keyword(s):

Latent Semantic Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Pattern Detection ◽

Future Research ◽

Detection Approach ◽

Semantic Spaces ◽

And Control ◽

The Impact

The Covid-19 pandemic is the deadliest outbreak in our living memory. So, it is need of hour, to prepare the world with strategies to prevent and control the impact of the epidemics. In this paper, a novel semantic pattern detection approach in the Covid-19 literature using contextual clustering and intelligent topic modeling is presented. For contextual clustering, three level weights at term level, document level, and corpus level are used with latent semantic analysis. For intelligent topic modeling, semantic collocations using pointwise mutual information(PMI) and log frequency biased mutual dependency(LBMD) are selected and latent dirichlet allocation is applied. Contextual clustering with latent semantic analysis presents semantic spaces with high correlation in terms at corpus level. Through intelligent topic modeling, topics are improved in the form of lower perplexity and highly coherent. This research helps in finding the knowledge gap in the area of Covid-19 research and offered direction for future research.

Download Full-text

A comparative analysis of Latent Semantic analysis and Latent Dirichlet allocation topic modeling methods using Bible data

Indian Journal of Science and Technology ◽

10.17485/ijst/v13i44.1479 ◽

2020 ◽

Vol 13 (44) ◽

pp. 4474-4482

Author(s):

Vasantha Kumari Garbhapu ◽

Keyword(s):

Latent Semantic Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Word Association ◽

Machine Learning Algorithms ◽

Superior Performance ◽

Data Set ◽

Document Similarity ◽

Dirichlet Allocation

Objective: To compare the topic modeling techniques, as no free lunch theorem states that under a uniform distribution over search problems, all machine learning algorithms perform equally. Hence, here, we compare Latent Semantic Analysis (LSA) or Latent Dirichlet Allocation (LDA) to identify better performer for English bible data set which has not been studied yet. Methods: This comparative study divided into three levels: In the first level, bible data was extracted from the sources and preprocessed to remove the words and characters which were not useful to obtain the semantic structures or necessary patterns to make the meaningful corpus. In the second level, the preprocessed data were converted into a bag of words and numerical statistic TF-IDF (Term Frequency – Inverse Document Frequency) is used to assess how relevant a word is to a document in a corpus. In the third level, Latent Semantic analysis and Latent Dirichlet Allocations methods were applied over the resultant corpus to study the feasibility of the techniques. Findings: Based on our evaluation, we observed that the LDA achieves 60 to 75% superior performance when compared to LSA using document similarity within-corpus, document similarity with the unseen document. Additionally, LDA showed better coherence score (0.58018) than LSA (0.50395). Moreover, when compared to any word within-corpus, the word association showed better results with LDA. Some words have homonyms based on the context; for example, in the bible; bear has a meaning of punishment and birth. In our study, LDA word association results are almost near to human word associations when compared to LSA. Novelty: LDA was found to be the computationally efficient and interpretable method in adopting the English Bible dataset of New International Version that was not yet created. Keywords: Topic modeling; LSA; LDA; word association; document similarity;Bible data set

Download Full-text

Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis

10.1063/1.5139825 ◽

2019 ◽

Author(s):

Siti Qomariyah ◽

Nur Iriawan ◽

Kartika Fithriasari

Keyword(s):

Latent Semantic Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Twitter Data ◽

Dirichlet Allocation

Download Full-text

Word2vec-based latent semantic analysis (W2V-LSA) for topic modeling: A study on blockchain technology trend analysis

Expert Systems with Applications ◽

10.1016/j.eswa.2020.113401 ◽

2020 ◽

Vol 152 ◽

pp. 113401 ◽

Cited By ~ 5

Author(s):

Suhyeon Kim ◽

Haecheong Park ◽

Junghye Lee

Keyword(s):

Trend Analysis ◽

Latent Semantic Analysis ◽

Topic Modeling ◽

Semantic Analysis ◽

Blockchain Technology ◽

Technology Trend

Download Full-text

Topic modeling for frame analysis: A study of media debates on climate change in India and USA

Global Media and Communication ◽

10.1177/17427665211023984 ◽

2021 ◽

pp. 174276652110239

Author(s):

Tuukka Ylä-Anttila ◽

Veikko Eranti ◽

Anna Kukkonen

Keyword(s):

Social Sciences ◽

Climate Change ◽

Topic Modeling ◽

Frame Analysis ◽

Topic Models ◽

Specific Data ◽

The Social ◽

Public Debates

We argue that ‘topics’ of topic models can be used as a useful proxy for frames if (1) frames are operationalized as connections between concepts; (2) theme-specific data are used; and (3) topics are validated in terms of frame analysis. Demonstrating this, we analyse 12 climate change frames used by NGOs, governments and experts in Indian and US media, gathered by topic modeling. We contribute methodologically to topic modeling in the social sciences and frame analysis of public debates, and empirically to research on climate change media debates.

Download Full-text

Humanistic interpretation and machine learning

Synthese ◽

10.1007/s11229-020-02806-w ◽

2020 ◽

Author(s):

Juho Pääkkönen ◽

Petri Ylikoski

Keyword(s):

Social Sciences ◽

Machine Learning ◽

Unsupervised Learning ◽

Text Analysis ◽

Topic Modeling ◽

Scientific Evidence ◽

Original Text ◽

The Social ◽

Unsupervised Approach ◽

Social Scientific

Abstract This paper investigates how unsupervised machine learning methods might make hermeneutic interpretive text analysis more objective in the social sciences. Through a close examination of the uses of topic modeling—a popular unsupervised approach in the social sciences—it argues that the primary way in which unsupervised learning supports interpretation is by allowing interpreters to discover unanticipated information in larger and more diverse corpora and by improving the transparency of the interpretive process. This view highlights that unsupervised modeling does not eliminate the researchers’ judgments from the process of producing evidence for social scientific theories. The paper shows this by distinguishing between two prevalent attitudes toward topic modeling, i.e., topic realism and topic instrumentalism. Under neither can modeling provide social scientific evidence without the researchers’ interpretive engagement with the original text materials. Thus the unsupervised text analysis cannot improve the objectivity of interpretation by alleviating the problem of underdetermination in interpretive debate. The paper argues that the sense in which unsupervised methods can improve objectivity is by providing researchers with the resources to justify to others that their interpretations are correct. This kind of objectivity seeks to reduce suspicions in collective debate that interpretations are the products of arbitrary processes influenced by the researchers’ idiosyncratic decisions or starting points. The paper discusses this view in relation to alternative approaches to formalizing interpretation and identifies several limitations on what unsupervised learning can be expected to achieve in terms of supporting interpretive work.

Download Full-text

Approaches and Methodologies in the Social Sciences

10.1017/cbo9780511801938 ◽

2008 ◽

Cited By ~ 88

Keyword(s):

Social Sciences ◽

The Social

Download Full-text