Latent Dirichlet Allocation
Recently Published Documents


TOTAL DOCUMENTS: 1386 (FIVE YEARS: 823)

H-INDEX: 37 (FIVE YEARS: 13)

2022, Vol 9 (3), pp. 1-22
Author(s): Mohammad Daradkeh

This study presents a data analytics framework for analyzing the topics and sentiments associated with COVID-19 vaccine misinformation on social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. A Latent Dirichlet Allocation (LDA) topic model was used to identify the dominant topics in COVID-19 vaccine misinformation, and the sentiment orientation of misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the numbers of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation included 21 major topics. Across all misinformation topics, the average numbers of replies, retweets, and likes of tweets with negative sentiment were 2.26, 2.68, and 3.29 times higher, respectively, than those of tweets with positive sentiment.
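As a rough illustration of the sentiment-comparison step, the sketch below pairs a lexicon-based scorer (NLTK's VADER, standing in for the unspecified lexicon) with an independent-samples t-test over engagement counts; the tweets and counts are invented, not the study's sample.

```python
# Minimal, self-contained sketch: lexicon-based sentiment orientation followed
# by an independent-samples t-test over engagement metrics.
# Requires: nltk.download("vader_lexicon")
import pandas as pd
from scipy.stats import ttest_ind
from nltk.sentiment import SentimentIntensityAnalyzer

tweets = pd.DataFrame({
    "text": [
        "this vaccine is dangerous and untested, avoid it",
        "grateful the vaccine rollout is finally protecting people",
        "they are hiding the real side effects from everyone",
        "happy to report no issues after my second dose",
    ],
    "replies":  [40, 12, 55, 9],
    "retweets": [80, 25, 110, 18],
    "likes":    [150, 60, 300, 45],
})

sia = SentimentIntensityAnalyzer()
tweets["sentiment"] = tweets["text"].apply(
    lambda t: "positive" if sia.polarity_scores(t)["compound"] >= 0 else "negative"
)

neg = tweets[tweets["sentiment"] == "negative"]
pos = tweets[tweets["sentiment"] == "positive"]

# Compare engagement of negative vs. positive tweets, as in the study design.
for metric in ("replies", "retweets", "likes"):
    t_stat, p_val = ttest_ind(neg[metric], pos[metric], equal_var=False)
    print(f"{metric}: t={t_stat:.2f}, p={p_val:.3f}, "
          f"neg/pos mean ratio={neg[metric].mean() / pos[metric].mean():.2f}")
```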


2022, Vol 30 (6), pp. 1-21
Author(s): Lei Li, Shaojun Ma, Runqi Wang, Yiping Wang, Yilin Zheng

Abundant natural resources are the basis of urbanisation and industrialisation, and citizens are the key factor in promoting a sustainable supply of natural resources and the high-quality development of urban areas. This study focuses on the co-production behaviours of citizens regarding urban natural resource assets in the age of big data, and uses the Latent Dirichlet Allocation algorithm and stepwise regression analysis to evaluate citizens’ experiences and feelings related to the urban capitalisation of natural resources. The results show that, firstly, a machine learning algorithm based on natural language processing can effectively identify and address demands on urban natural resource assets. Secondly, in their experience of urban natural resources, citizens pay more attention to the combination of history, culture, infrastructure and natural landscape, and unique natural resources can enhance citizens’ sense of participation. Finally, the scenery, entertainment, and the quality and value of urban natural resources are the main factors influencing citizens’ satisfaction.
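A minimal sketch of the two-stage analysis described above, assuming gensim for LDA and statsmodels for a forward stepwise regression; the tokenized posts and satisfaction scores are toy placeholders, not the study's data.

```python
# LDA topic proportions as candidate predictors, then forward stepwise OLS
# against a (placeholder) satisfaction score.
import numpy as np
import statsmodels.api as sm
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy tokenized citizen posts standing in for the real social-media corpus.
docs = [
    ["park", "lake", "history", "culture", "walk"],
    ["museum", "heritage", "culture", "crowded"],
    ["river", "scenery", "clean", "infrastructure"],
    ["ticket", "price", "value", "entertainment"],
] * 10

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=4, passes=5, random_state=0)

# Per-document topic proportions as candidate regressors.
X = np.array([[p for _, p in lda.get_document_topics(bow, minimum_probability=0.0)]
              for bow in corpus])
y = np.random.default_rng(0).random(len(docs))  # placeholder satisfaction scores

# Forward stepwise selection: greedily add the topic that improves adjusted R^2.
selected, remaining, best_so_far = [], list(range(X.shape[1])), -np.inf
while remaining:
    scores = [(sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().rsquared_adj, j)
              for j in remaining]
    score, j = max(scores)
    if score <= best_so_far:
        break
    best_so_far = score
    selected.append(j)
    remaining.remove(j)

print("topics retained by stepwise regression:", selected)
```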


2022, Vol 54 (7), pp. 1-35
Author(s): Uttam Chauhan, Apurva Shah

A mammoth text corpus cannot be handled without summarizing it into a relatively small subset; computational tools are needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling and then review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from a multilingual perspective. In addition, research on topic modeling in distributed environments and on topic visualization approaches is explored. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are presented for the experimental results of the various categories of topic modeling, and diverse technical challenges and future directions are discussed.
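To make the preliminaries concrete, here is a minimal gensim-based sketch of the standard LDA workflow the survey reviews, with toy documents and two common evaluation views (perplexity and coherence); none of it is taken from the survey itself.

```python
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy tokenized documents; a real corpus would be preprocessed the same way.
docs = [
    ["topic", "model", "discovers", "latent", "themes"],
    ["dirichlet", "prior", "controls", "topic", "sparsity"],
    ["word", "embedding", "extends", "topic", "model"],
    ["hierarchical", "model", "captures", "topic", "structure"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

# Two common evaluation views discussed in the topic-modeling literature.
print("log perplexity:", lda.log_perplexity(corpus))
print("c_v coherence:",
      CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                     coherence="c_v").get_coherence())
```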


2022, Vol 40 (3), pp. 1-24
Author(s): Jiashu Zhao, Jimmy Xiangji Huang, Hongbo Deng, Yi Chang, Long Xia

In this article, we propose a Latent Dirichlet Allocation (LDA)-based topic-graph probabilistic personalization model for Web search. This model represents a user graph in a latent topic graph and simultaneously estimates the probabilities that the user is interested in the topics, as well as the probabilities that the user is not interested in them. For a given query issued by the user, webpages with higher relevance to the interesting topics are promoted, and webpages more relevant to the non-interesting topics are penalized. In particular, we simulate a user’s search intent by building two profiles: a positive user profile for the probabilities that the user is interested in the topics and a corresponding negative user profile for the probabilities that the user is not interested in them. The profiles are estimated from the user’s search logs: a clicked webpage is assumed to include interesting topics, while a skipped (viewed but not clicked) webpage is assumed to cover some topics that are not interesting to the user. These estimations are performed in the latent topic space generated by LDA. Moreover, a new approach is proposed to estimate the correlation between a given query and the user’s search history, so as to determine how much personalization should be applied to the query. We compare our proposed models with several strong baselines, including state-of-the-art personalization approaches. Experiments conducted on a large-scale collection of real user search logs illustrate the effectiveness of the proposed models.
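The sketch below is a loose, hypothetical illustration of the positive/negative profile idea rather than the authors' model: clicked pages are averaged into a positive topic profile, skipped pages into a negative one, and candidate pages are promoted or penalized by their similarity to each; all topic distributions and the mixing weight gamma are made up.

```python
import numpy as np

NUM_TOPICS = 5

def build_profile(doc_topic_rows):
    """Average the LDA topic distributions of a set of pages."""
    rows = np.asarray(doc_topic_rows)
    return rows.mean(axis=0) if len(rows) else np.zeros(NUM_TOPICS)

clicked = [[0.70, 0.10, 0.10, 0.05, 0.05],   # clicked pages -> interesting topics
           [0.60, 0.20, 0.10, 0.05, 0.05]]
skipped = [[0.05, 0.05, 0.10, 0.20, 0.60]]   # viewed but not clicked -> non-interesting

positive_profile = build_profile(clicked)
negative_profile = build_profile(skipped)

def personalized_score(base_relevance, page_topics, gamma=0.5):
    """Promote pages near the positive profile, penalize those near the negative one."""
    page = np.asarray(page_topics)
    boost = page @ positive_profile - page @ negative_profile
    return base_relevance + gamma * boost

candidates = {
    "pageA": (0.62, [0.65, 0.15, 0.10, 0.05, 0.05]),
    "pageB": (0.64, [0.05, 0.05, 0.10, 0.20, 0.60]),
}
ranked = sorted(candidates.items(),
                key=lambda kv: personalized_score(*kv[1]), reverse=True)
print([name for name, _ in ranked])  # pageA outranks pageB after personalization
```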


Author(s): Pooja Kherwa, Poonam Bansal

The Covid-19 pandemic is the deadliest outbreak in living memory, so there is an urgent need to prepare the world with strategies to prevent and control the impact of such epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature is presented, using contextual clustering and intelligent topic modeling. For contextual clustering, three levels of weights, at the term, document, and corpus levels, are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and Latent Dirichlet Allocation is applied. Contextual clustering with latent semantic analysis yields semantic spaces with highly correlated terms at the corpus level. Through intelligent topic modeling, topics are improved, with lower perplexity and higher coherence. This research helps in identifying knowledge gaps in the area of Covid-19 research and offers directions for future work.
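As a small illustration of the collocation-selection step, the sketch below ranks bigrams by pointwise mutual information with NLTK's collocation finder; it stands in for the paper's PMI/LBMD scoring, which is not reproduced here, and the sample text is invented.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Invented sample text; real input would be the tokenized Covid-19 literature.
tokens = (
    "covid vaccine trials show spike protein responses and spike protein binding "
    "in covid vaccine studies of spike protein antibodies"
).lower().split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)            # keep collocations seen at least twice

measures = BigramAssocMeasures()
print(finder.nbest(measures.pmi, 5))   # top candidate collocations by PMI
```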


Author(s): Sujatha Arun Kokatnoor, Balachandran Krishnan

The main focus of this research is to find the reasons behind the fresh cases of COVID-19 from the public’s perception, for data specific to India. The analysis is done using machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis are accomplished in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improvised feature engineering (FE) process that uses a weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used to group the public posts from Twitter®. In the last step, Latent Dirichlet Allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improvised two-step FE process, the LDA model improved its coherence score by 14%, and by 19% and 15% when compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP) respectively, thereby yielding 14 root causes for the spike in the disease.
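A rough scikit-learn sketch of the feature-engineering and clustering stages, under the assumption that trigram hashing followed by TF-IDF weighting approximates the weighted TF-IDF plus feature-hashing step; the posts, hash-bucket count, and cluster count are placeholders.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Placeholder posts standing in for the Twitter data used in the study.
posts = [
    "crowded markets and no masks after lockdown lifted",
    "large wedding gatherings reported in several districts",
    "people skipping second dose of the vaccine",
    "election rallies with huge crowds and no distancing",
] * 5

# Uni/bi/trigram hashing into 2**12 buckets (arbitrary), then TF-IDF weighting.
vectorizer = make_pipeline(
    HashingVectorizer(ngram_range=(1, 3), n_features=2**12, alternate_sign=False),
    TfidfTransformer(),
)
X = vectorizer.fit_transform(posts)

# Standard K-means stands in for the paper's enhanced variant.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:8])
```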


2022, Vol 34 (3), pp. 1-21
Author(s): Xue Yu

The purpose of this work is to address the problems of sparse data, low recommendation precision and recall, and cold start in current personalized tourism recommendation systems. First, a context-based personalized recommendation model (CPRM) is established using the Labeled Latent Dirichlet Allocation (Labeled-LDA) algorithm; the precision and recall of interest-point recommendation are improved by mining the context information in unstructured text. Then, an interest-point recommendation framework based on a convolutional neural network (IPRC) is established: the semantic and emotional information in the comment text is extracted to identify user preferences, and the scores of interest points in the target location are predicted in combination with geographical influence factors. Finally, real datasets are adopted to evaluate the recommendation precision and recall of the above two models and their performance in solving the cold-start problem.
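The sketch below is only a generic Keras illustration of the CNN-scoring idea behind a framework like IPRC (the Labeled-LDA component and geographical factors are omitted): embed comment tokens, convolve to pick up semantic and emotional cues, and regress a point-of-interest score. All data, vocabulary sizes, and layer widths are invented.

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB, MAXLEN = 5000, 60
reviews = np.random.randint(1, VOCAB, size=(256, MAXLEN))  # padded token ids (fake)
scores = np.random.rand(256)                               # normalized POI ratings (fake)

model = models.Sequential([
    layers.Embedding(VOCAB, 64),                 # token embeddings
    layers.Conv1D(128, 5, activation="relu"),    # local semantic/sentiment features
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),       # predicted score in [0, 1]
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(reviews, scores, epochs=2, batch_size=32, verbose=0)
print(model.predict(reviews[:3], verbose=0).ravel())
```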


2022, Vol 24 (3), pp. 1-19
Author(s): Nikhlesh Pathik, Pragya Shukla

In this digital era, people are keen to share their feedback about any product, service, or current issue on social networks and other platforms. A careful analysis of this feedback can give a clear picture of what people think about a particular topic. This work proposes an almost unsupervised Aspect-Based Sentiment Analysis approach for textual reviews. Latent Dirichlet Allocation, along with linguistic rules, is used for aspect extraction. Aspects are ranked based on their probability distribution values and then clustered into predefined categories using frequent terms together with domain knowledge. The SentiWordNet lexicon is used for sentiment scoring and classification. Experiments with two popular datasets show the superiority of our strategy compared to existing methods, with 85% average accuracy when tested on manually labeled data.
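A minimal sketch of the SentiWordNet scoring step, assuming the LDA plus linguistic-rule stage has already produced aspect/opinion-word pairs; the pairs below are hypothetical and the NLTK corpora must be downloaded first.

```python
# Requires: nltk.download("sentiwordnet"); nltk.download("wordnet")
from nltk.corpus import sentiwordnet as swn

def sentiword_score(word: str, pos: str = "a") -> float:
    """Average (positive - negative) score over a word's SentiWordNet synsets."""
    synsets = list(swn.senti_synsets(word, pos))
    if not synsets:
        return 0.0
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

# Hypothetical (aspect, opinion word) pairs from the extraction stage.
aspect_opinions = {"battery": "poor", "camera": "excellent", "screen": "bright"}
for aspect, opinion in aspect_opinions.items():
    score = sentiword_score(opinion)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{aspect}: {opinion} -> {score:+.2f} ({label})")
```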

