topic model
Recently Published Documents

2022 ◽  
Vol 9 (3) ◽  
pp. 1-22
Mohammad Daradkeh

This study presents a data analytics framework for analyzing the topics and sentiments associated with COVID-19 vaccine misinformation on social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. A Latent Dirichlet Allocation (LDA) topic model was used to identify the dominant topics in COVID-19 vaccine misinformation. The sentiment orientation of misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the number of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation included 21 major topics. Across all misinformation topics, the average number of replies, retweets, and likes of tweets with negative sentiment was 2.26, 2.68, and 3.29 times higher, respectively, than those with positive sentiment.
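
The engagement comparison described in this abstract can be illustrated with a minimal sketch. It assumes the classic pooled-variance (Student's) form of the independent-samples t-test; the abstract does not say which variant was used. The function name `independent_t` and the retweet counts are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Return (t statistic, degrees of freedom) for a pooled-variance
    independent-samples t-test on two lists of numbers."""
    n1, n2 = len(a), len(b)
    # Pooled sample variance, weighting each group by its degrees of freedom
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up retweet counts, for illustration only (not the study's data)
negative = [12, 9, 15, 20, 7, 11]
positive = [4, 6, 3, 8, 5, 4]

t_stat, dof = independent_t(negative, positive)
# Ratio of mean engagement, the kind of "times higher" figure the abstract reports
ratio = mean(negative) / mean(positive)
print(f"t = {t_stat:.2f}, dof = {dof}, mean ratio = {ratio:.2f}")
```

A p-value would additionally require the t distribution's CDF, which the Python standard library does not provide; a statistics package would normally supply it.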

2022 ◽  
Vol 34 (3) ◽  
pp. 1-13
Jianzu Wu ◽  
Kunxin Zhang

This article examines the policy implementation literature using a text mining technique, known as a structural topic model (STM), to conduct a comprehensive analysis of 547 articles published by 11 major journals between 2000 and 2019. The subject analyzed was the policy implementation literature, and the search included titles, keywords, and abstracts. The application of the STM not only allowed us to provide snapshots of different research topics and variation across covariates but also let us track the evolution and influence of topics over time. Examining the policy implementation literature has contributed to the understanding of public policy areas; the authors also provided recommendations for future studies in policy implementation.

2022 ◽  
Vol 40 (3) ◽  
pp. 1-30
Qianqian Xie ◽  
Yutao Zhu ◽  
Jimin Huang ◽  
Pan Du ◽  
Jian-Yun Nie

Due to the overload of published scientific articles, citation recommendation has long been a critical research problem for automatically recommending the most relevant citations of given articles. Relational topic models (RTMs) have shown promise on citation prediction via joint modeling of document contents and citations. However, existing RTMs can only capture pairwise or direct (first-order) citation relationships among documents. The indirect (high-order) citation links have been explored in graph neural network–based methods, but these methods suffer from the well-known explainability problem. In this article, we propose a model called Graph Neural Collaborative Topic Model that takes advantage of both relational topic models and graph neural networks to capture high-order citation relationships and to have higher explainability due to the latent topic semantic structure. Experiments on three real-world citation datasets show that our model outperforms several competitive baseline methods on citation recommendation. In addition, we show that our approach can learn better topics than the existing approaches. The recommendation results can be well explained by the underlying topics.

2022 ◽  
Vol 22 (1) ◽  
Guanglei Yu ◽  
Linlin Zhang ◽  
Ying Zhang ◽  
Jiaqi Zhou ◽  
Tao Zhang ◽  

Abstract Background: Rapid advances in information technology have made risk stratification easier to adopt, which benefits both patients and clinicians. Risk stratification supports accurate, individualized prevention and therapeutic decision making. Hospital discharge records (HDRs) routinely include accurate diagnostic conclusions for patients. For this reason, we propose an improved supervised model for risk stratification based on HDRs for coronary heart disease (CHD). Methods: We introduce an improved four-layer supervised latent Dirichlet allocation (sLDA) approach, called the Hierarchical sLDA model, which encodes patient features in HDRs as one-hot patient feature-value pairs according to clinical guidelines for CHD lab tests. To address the missing-data and data-imbalance problems, RFs and SMOTE are used, respectively. After TF-IDF processing of the datasets, a variational Bayes expectation-maximization method and a generalized linear model are used to recognize the latent clinical state of a patient, i.e., risk stratification, and to predict CHD. Accuracy, macro-F1, and training and testing time were used to evaluate the model. Results: Based on the characteristics of our datasets, i.e., patient feature-value pairs, we construct a supervised topic model by adding one more Dirichlet distribution hyperparameter to sLDA. Compared with the established supervised Multi-class sLDA model, our approach improves training time by 59.74% and testing time by 25.58% with almost no loss of average prediction accuracy on our datasets. Conclusions: A model for risk stratification and prediction of CHD based on sLDA was proposed. Experimental results show that the proposed Hierarchical sLDA model is competitive in time performance and accuracy. Hierarchical processing of patient features can substantially mitigate the low efficiency and time-consuming Gibbs sampling of the sLDA model.
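
The idea of encoding patient features as one-hot feature-value pairs can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the feature names `lab_a` and `lab_b`, the cut-off values, and the three-level scheme are all hypothetical placeholders, not clinical guidance and not taken from the paper.

```python
# Hypothetical cut-offs for illustration only; NOT clinical guidance and
# NOT the thresholds used in the paper.
HYPOTHETICAL_CUTOFFS = {
    "lab_a": (1.0, 2.0),     # value < 1.0 -> low, < 2.0 -> normal, else high
    "lab_b": (50.0, 100.0),
}
LEVELS = ("low", "normal", "high")

# Fixed vocabulary of every possible feature-value pair
VOCAB = [f"{feat}={lvl}" for feat in HYPOTHETICAL_CUTOFFS for lvl in LEVELS]


def to_pairs(record):
    """Discretize raw lab values into 'feature=level' tokens."""
    pairs = []
    for feature, value in record.items():
        low, high = HYPOTHETICAL_CUTOFFS[feature]
        if value < low:
            level = "low"
        elif value < high:
            level = "normal"
        else:
            level = "high"
        pairs.append(f"{feature}={level}")
    return pairs


def one_hot(pairs):
    """Map feature-value tokens onto a 0/1 vector over VOCAB."""
    present = set(pairs)
    return [1 if token in present else 0 for token in VOCAB]


record = {"lab_a": 2.5, "lab_b": 70.0}  # made-up patient record
pairs = to_pairs(record)
vector = one_hot(pairs)
print(pairs, vector)
```

Treating each pair as a "word" lets a record play the role of a document in a topic model, which is how the abstract frames the input to sLDA.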

2022 ◽  
Bing Sun ◽  
Zhuofang Ju

Abstract Against the backdrop of green development, new energy vehicles (NEVs), an important strategic emerging industry, play a crucial role in energy conservation and emission reduction. In the post-epidemic era, steadily advancing the promotion of NEVs will be a hot topic. Based on heterogeneous source data, and combining the Latent Dirichlet Allocation (LDA) topic model, Social Network Analysis (SNA), and econometric methods, this paper explores whether individual purchase decisions and company-level cooperative research and development advance the promotion of NEVs. The results show that, whether for BEV, HEV, or PHEV, users are more concerned about space dimensions, power performance, and design style; patent collaboration network analysis indicates that NEV enterprises are establishing close partnerships, which will drive the promotion of NEVs; for BEV and HEV models, NEV companies will invest in more patents, and R&D investment will better expedite the advancement of NEVs.

2022 ◽  
Yuening Wang ◽  
Rodrigo Benavides ◽  
Luda Diatchenko ◽  
Audrey Grant ◽  
Yue Li

Large biobank repositories of clinical conditions and medications data open opportunities to investigate the phenotypic disease network. To enable systematic investigation of entire structured phenomes, we present the graph embedded topic model (GETM). We offer two main methodological contributions in GETM. First, to aid topic inference, we integrate existing biomedical knowledge graph information, in the form of pre-trained graph embeddings, into the embedded topic model. Second, leveraging deep learning techniques, we developed a variational autoencoder framework to infer patient phenotypic mixtures. For interpretability, we use a linear decoder to simultaneously infer the bi-modal distributions of the disease conditions and medications. We applied GETM to UK Biobank (UKB) self-reported clinical phenotype data, which contain conditions and medications for 457,461 individuals. Compared to existing methods, GETM demonstrates overall superior performance in imputing missing conditions and medications. Here, we focused on characterizing pain phenotypes recorded in the questionnaire of the UKB individuals. GETM accurately predicts the status of chronic musculoskeletal (CMK) pain, chronic pain by body-site, and non-specific chronic pain using past conditions and medications. Our analyses revealed not only the known pain-related topics but also the surprising predominance of medications and conditions in the cardiovascular category among the most predictive topics across chronic pain phenotypes.

2022 ◽  
Vol 2022 ◽  
pp. 1-12
Jialin Ma ◽  
Xiaoqiang Gong ◽  
Zhaojun Wang ◽  
Qian Xie

Syndrome differentiation is the most basic diagnostic method in traditional Chinese medicine (TCM). The process of syndrome differentiation is difficult and challenging due to its complexity, diversity, and vagueness. Recently, artificial intelligence methods have been introduced to discover the regularities of syndrome differentiation from TCM medical records, but the existing DM algorithms fail to consider how a syndrome is generated according to TCM theories. In this paper, we propose a novel topic model framework named the syndrome differentiation topic model (SDTM) to dynamically characterize the process of syndrome differentiation. The SDTM framework utilizes latent Dirichlet allocation (LDA) to discover the latent semantic relationships between symptoms and syndromes in a large body of Chinese medical records. We also use a similarity measurement method to map the uninterpretable topics to the labeled syndromes. Finally, a Bayesian method is used to obtain the final differentiated syndromes. Experimental results show the superiority of SDTM over existing topic models for the task of syndrome differentiation.

2022 ◽  
Shixiong Wang ◽  
Yajuan Xu ◽  
Xianyun Tian ◽  
Yu Song ◽  
Yanyu Luo ◽  

Abstract Background: The use of social media before bedtime usually results in late bedtimes, which is a prevalent cause of insufficient sleep among the general population of most countries. However, it is still unclear how people with late bedtimes use social media, which is crucial for adopting targeted behavior interventions to prevent insufficient sleep. Methods: In this study, we randomly selected 100,000 users from Sina Weibo and collected all their posts through web crawling. Posting time was used as a proxy to identify nights on which a user stays up late. A text classifier and a topic model were developed to identify the emotional states and themes of their posts. We also analyzed their posting/reposting activity, time-use patterns, and geographical distribution. Results: Our analyses show that habitually late sleepers express fewer emotions and use social media more for entertainment and getting information. People who rarely stay up late feel worse when staying up late, and they use social media more for emotional expression. People with late bedtimes mainly live in developed areas and use smartphones more when staying up late. Conclusion: This study depicts the online behavior of people with late bedtimes, which helps in understanding them and thereby adopting appropriately targeted interventions to avoid insufficient sleep.
