A novel fuzzy k-means latent semantic analysis (FKLSA) approach for topic modeling over medical and health text corpora



Estimating Topic Modeling Performance with Sharma–Mittal Entropy

Entropy, Vol 21 (7), pp. 660, 2019. DOI: 10.3390/e21070660. Cited By ~ 6

Author(s): Sergei Koltcov, Vera Ignatenko, Olessia Koltsova

Keyword(s): Latent Semantic Analysis, Topic Modeling, Statistical Physics, Latent Dirichlet Allocation, Semantic Analysis, Probabilistic Latent Semantic Analysis, Model Parameters, Theoretical Ground, Text Documents, Optimizing Model

Topic modeling is a popular approach for clustering text documents. However, current tools have a number of unsolved problems, such as instability and a lack of criteria for selecting the values of model parameters. In this work, we propose a method that partially solves the problem of optimizing model parameters while simultaneously accounting for semantic stability. Our method is inspired by concepts from statistical physics and is based on Sharma–Mittal entropy. We test our approach on two models, probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA) with Gibbs sampling, and on two datasets in different languages. We compare our approach against a number of standard metrics, each of which can account for just one of the parameters of interest. We demonstrate that Sharma–Mittal entropy is a convenient tool for selecting both the number of topics and the values of hyper-parameters while simultaneously controlling for semantic stability, which none of the existing metrics can do. Furthermore, we show that concepts from statistical physics can contribute to theory construction for machine learning, a rapidly developing field that currently lacks a consistent theoretical ground.
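To make the quantity named in this abstract more concrete, the following is a minimal sketch of the generic two-parameter Sharma–Mittal entropy of a discrete distribution. The abstract does not say how the authors map topic-model quantities onto the entropy, so the function names, the toy distribution, and the chosen values of q and r are illustrative assumptions, not the paper's procedure.

```python
import math


def sharma_mittal_entropy(probs, q, r):
    """Generic Sharma-Mittal entropy of a discrete distribution.

    Textbook form, valid for q != 1 and r != 1:
        S = ((sum_i p_i**q) ** ((1 - r) / (1 - q)) - 1) / (1 - r)
    """
    if q == 1 or r == 1:
        raise ValueError("this sketch only handles q != 1 and r != 1")
    power_sum = sum(p ** q for p in probs if p > 0)
    return (power_sum ** ((1 - r) / (1 - q)) - 1) / (1 - r)


def tsallis_entropy(probs, q):
    """Tsallis entropy, the special case of Sharma-Mittal entropy with r == q."""
    power_sum = sum(p ** q for p in probs if p > 0)
    return (power_sum - 1) / (1 - q)


# Hypothetical word distribution for one topic (assumption, not from the paper).
topic_word_probs = [0.5, 0.25, 0.125, 0.125]

print(sharma_mittal_entropy(topic_word_probs, q=2.0, r=2.0))
print(tsallis_entropy(topic_word_probs, q=2.0))
```

Setting r equal to q reduces the expression to the Tsallis entropy, which gives a quick sanity check on the implementation.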


Semantic Pattern Detection in COVID-19 Using Contextual Clustering and Intelligent Topic Modeling

International Journal of E-Health and Medical Communications, Vol 13 (2), pp. 1-17, 2022. DOI: 10.4018/ijehmc.20220701.oa7

Author(s): Pooja Kherwa, Poonam Bansal

Keyword(s): Latent Semantic Analysis, Topic Modeling, Latent Dirichlet Allocation, Semantic Analysis, Pattern Detection, Future Research, Detection Approach, Semantic Spaces, And Control, The Impact

The Covid-19 pandemic is the deadliest outbreak in living memory. It is therefore urgent to equip the world with strategies to prevent and control the impact of epidemics. This paper presents a novel approach to semantic pattern detection in the Covid-19 literature using contextual clustering and intelligent topic modeling. For contextual clustering, weights at three levels (term, document, and corpus) are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and latent Dirichlet allocation is applied. Contextual clustering with latent semantic analysis yields semantic spaces in which terms are highly correlated at the corpus level. With intelligent topic modeling, the topics improve, showing lower perplexity and higher coherence. This research helps identify knowledge gaps in Covid-19 research and offers directions for future research.
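As an illustration of the collocation-scoring step mentioned in this abstract, here is a minimal sketch that scores adjacent word pairs with pointwise mutual information, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The abstract does not describe the authors' tokenization, thresholds, or their LBMD scoring, so the function name `bigram_pmi` and the toy documents are assumptions, and LBMD is not implemented here.

```python
import math
from collections import Counter


def bigram_pmi(tokenized_docs):
    """Return a dict mapping each adjacent word pair to its PMI score (base 2)."""
    unigram_counts = Counter()
    bigram_counts = Counter()
    for tokens in tokenized_docs:
        unigram_counts.update(tokens)
        # Pair every token with its right-hand neighbour.
        bigram_counts.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigram_counts.values())
    total_bigrams = sum(bigram_counts.values())

    scores = {}
    for (first, second), count in bigram_counts.items():
        p_pair = count / total_bigrams
        p_first = unigram_counts[first] / total_unigrams
        p_second = unigram_counts[second] / total_unigrams
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Hypothetical, already tokenized documents (assumption for illustration only).
docs = [
    ["viral", "load", "in", "patients"],
    ["viral", "load", "and", "immune", "response"],
    ["immune", "response", "in", "patients"],
]

# Print pairs from the highest to the lowest PMI score.
for pair, score in sorted(bigram_pmi(docs).items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs with high PMI co-occur more often than their individual frequencies would predict, which is why they are candidates for being treated as single collocation tokens before a topic model is fitted.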


A comparative analysis of Latent Semantic analysis and Latent Dirichlet allocation topic modeling methods using Bible data

Indian Journal of Science and Technology, Vol 13 (44), pp. 4474-4482, 2020. DOI: 10.17485/ijst/v13i44.1479

Author(s): Vasantha Kumari Garbhapu

Keyword(s): Latent Semantic Analysis, Topic Modeling, Latent Dirichlet Allocation, Semantic Analysis, Word Association, Machine Learning Algorithms, Superior Performance, Data Set, Document Similarity, Dirichlet Allocation

Objective: To compare topic modeling techniques. The no-free-lunch theorem states that, under a uniform distribution over search problems, all machine learning algorithms perform equally, so the better method has to be identified empirically for a given data set. Here we compare Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) to identify the better performer on an English Bible data set, which has not been studied before. Methods: The comparative study is divided into three levels. In the first level, Bible data were extracted from the sources and preprocessed to remove words and characters that do not help in obtaining semantic structures or patterns, yielding a meaningful corpus. In the second level, the preprocessed data were converted into a bag of words, and the TF-IDF (Term Frequency – Inverse Document Frequency) statistic was used to assess how relevant a word is to a document in the corpus. In the third level, LSA and LDA were applied to the resulting corpus to study the feasibility of the techniques. Findings: In our evaluation, LDA achieved 60 to 75% better performance than LSA on document similarity within the corpus and on document similarity with an unseen document. LDA also showed a better coherence score (0.58018) than LSA (0.50395). When compared for any word within the corpus, word association also showed better results with LDA. Some words have several meanings depending on context; in the Bible, for example, "bear" can refer to both punishment and birth. In our study, LDA word associations were closer to human word associations than those of LSA. Novelty: LDA was found to be the more computationally efficient and interpretable method on the English Bible data set (New International Version), a data set that had not previously been created. Keywords: Topic modeling; LSA; LDA; word association; document similarity; Bible data set
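The second level of the method above (bag of words, then TF-IDF) and the document-similarity evaluation can be sketched as follows. The abstract does not specify the TF-IDF variant or the similarity measure, so this sketch assumes term frequency normalized by document length, idf = ln(N / df), and cosine similarity; the function names and toy documents are hypothetical, and the LSA and LDA steps themselves are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)  # bag-of-words counts for this document
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical preprocessed documents (assumption for illustration only).
docs = [
    ["light", "world"],
    ["light", "darkness"],
    ["bread", "wine"],
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # documents share "light"
print(cosine_similarity(vectors[0], vectors[2]))  # documents share no terms
```

A term that appears in every document receives weight zero under this idf definition, which is the intended effect: such a term does not help distinguish one document from another.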


Fast Memory Integration Facilitated by Schema Consistency

2018. DOI: 10.1101/253393. Cited By ~ 2

Author(s): Qiong Zhang, Vencislav Popov, Griffin E. Koch, Regina C. Calloway, Marc N. Coutanche

Keyword(s): Prior Knowledge, Latent Semantic Analysis, Semantic Analysis, Alternative Mechanism, Paired Associates, Fast Learning, Text Corpora, Memory Integration, New Information, Fast Memory

Abstract: Many of our everyday decisions are based not only on memories of direct experiences, but also on memories that are integrated across multiple distinct experiences. Sometimes memory integration between existing memories and newly learnt information occurs rapidly, without requiring inference at the time of a decision. Such fast memory integration is known to be supported by the hippocampus but not the neocortex. In this study, we explore an alternative mechanism of fast memory integration, through related prior knowledge (i.e., schema), which is associated with neocortical learning. Paired associates were selected to be schema consistent or inconsistent, and this selection was confirmed with a latent semantic analysis of text corpora. We observed that after enabling fast learning by using material that is consistent with a schema, faster memory integration can occur. This result suggests that the hippocampus-mediated integration of new information is not the only available mechanism that supports fast memory integration.
