Latent Dirichlet Allocation
Recently Published Documents


TOTAL DOCUMENTS: 1386 (FIVE YEARS: 823)

H-INDEX: 37 (FIVE YEARS: 13)

2022, Vol 9 (3), pp. 1-22
Author(s): Mohammad Daradkeh

This study presents a data analytics framework for analyzing the topics and sentiments associated with COVID-19 vaccine misinformation on social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. A Latent Dirichlet Allocation (LDA) topic model was used to identify the dominant topics in COVID-19 vaccine misinformation, and the sentiment orientation of misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the numbers of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation included 21 major topics. Across all misinformation topics, the average numbers of replies, retweets, and likes of tweets with negative sentiment were 2.26, 2.68, and 3.29 times higher, respectively, than those of tweets with positive sentiment.
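As a rough illustration of the sentiment-comparison step, the sketch below pairs a lexicon-based scorer (NLTK's VADER, standing in for the unspecified lexicon) with an independent-samples t-test over engagement counts; the tweets and counts are invented, not the study's sample.

```python
# Minimal, self-contained sketch: lexicon-based sentiment orientation followed
# by an independent-samples t-test over engagement metrics.
# Requires: nltk.download("vader_lexicon")
import pandas as pd
from scipy.stats import ttest_ind
from nltk.sentiment import SentimentIntensityAnalyzer

tweets = pd.DataFrame({
    "text": [
        "this vaccine is dangerous and untested, avoid it",
        "grateful the vaccine rollout is finally protecting people",
        "they are hiding the real side effects from everyone",
        "happy to report no issues after my second dose",
    ],
    "replies":  [40, 12, 55, 9],
    "retweets": [80, 25, 110, 18],
    "likes":    [150, 60, 300, 45],
})

sia = SentimentIntensityAnalyzer()
tweets["sentiment"] = tweets["text"].apply(
    lambda t: "positive" if sia.polarity_scores(t)["compound"] >= 0 else "negative"
)

neg = tweets[tweets["sentiment"] == "negative"]
pos = tweets[tweets["sentiment"] == "positive"]

# Compare engagement of negative vs. positive tweets, as in the study design.
for metric in ("replies", "retweets", "likes"):
    t_stat, p_val = ttest_ind(neg[metric], pos[metric], equal_var=False)
    print(f"{metric}: t={t_stat:.2f}, p={p_val:.3f}, "
          f"neg/pos mean ratio={neg[metric].mean() / pos[metric].mean():.2f}")
```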


2022, Vol 30 (6), pp. 1-21
Author(s): Lei Li, Shaojun Ma, Runqi Wang, Yiping Wang, Yilin Zheng

Abundant natural resources are the basis of urbanisation and industrialisation, and citizens are the key factor in promoting a sustainable supply of natural resources and the high-quality development of urban areas. This study focuses on the co-production behaviours of citizens regarding urban natural resource assets in the age of big data, and uses the Latent Dirichlet Allocation algorithm and stepwise regression analysis to evaluate citizens’ experiences and feelings related to the urban capitalisation of natural resources. The results show that, firstly, a machine learning algorithm based on natural language processing can effectively identify and address demands on urban natural resource assets. Secondly, in their experience of urban natural resources, citizens pay more attention to the combination of history, culture, infrastructure and natural landscape, and unique natural resources can enhance citizens’ sense of participation. Finally, the scenery, entertainment, and the quality and value of urban natural resources are the main factors influencing citizens’ satisfaction.
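A minimal sketch of the two-stage analysis described above, assuming gensim for LDA and statsmodels for a forward stepwise regression; the tokenized posts and satisfaction scores are toy placeholders, not the study's data.

```python
# LDA topic proportions as candidate predictors, then forward stepwise OLS
# against a (placeholder) satisfaction score.
import numpy as np
import statsmodels.api as sm
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy tokenized citizen posts standing in for the real social-media corpus.
docs = [
    ["park", "lake", "history", "culture", "walk"],
    ["museum", "heritage", "culture", "crowded"],
    ["river", "scenery", "clean", "infrastructure"],
    ["ticket", "price", "value", "entertainment"],
] * 10

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=4, passes=5, random_state=0)

# Per-document topic proportions as candidate regressors.
X = np.array([[p for _, p in lda.get_document_topics(bow, minimum_probability=0.0)]
              for bow in corpus])
y = np.random.default_rng(0).random(len(docs))  # placeholder satisfaction scores

# Forward stepwise selection: greedily add the topic that improves adjusted R^2.
selected, remaining, best_so_far = [], list(range(X.shape[1])), -np.inf
while remaining:
    scores = [(sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit().rsquared_adj, j)
              for j in remaining]
    score, j = max(scores)
    if score <= best_so_far:
        break
    best_so_far = score
    selected.append(j)
    remaining.remove(j)

print("topics retained by stepwise regression:", selected)
```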


2022, Vol 54 (7), pp. 1-35
Author(s): Uttam Chauhan, Apurva Shah

A mammoth text corpus cannot be handled without summarizing it into a relatively small subset; computational tools are needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling and then review its extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from a multilingual perspective. In addition, research on topic modeling in distributed environments and on topic visualization approaches is explored. We also briefly cover implementation and evaluation techniques for topic models. Comparison matrices are presented for the experimental results of the various categories of topic modeling, and diverse technical challenges and future directions are discussed.
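To make the preliminaries concrete, here is a minimal gensim-based sketch of the standard LDA workflow the survey reviews, with toy documents and two common evaluation views (perplexity and coherence); none of it is taken from the survey itself.

```python
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# Toy tokenized documents; a real corpus would be preprocessed the same way.
docs = [
    ["topic", "model", "discovers", "latent", "themes"],
    ["dirichlet", "prior", "controls", "topic", "sparsity"],
    ["word", "embedding", "extends", "topic", "model"],
    ["hierarchical", "model", "captures", "topic", "structure"],
]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=42)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)

# Two common evaluation views discussed in the topic-modeling literature.
print("log perplexity:", lda.log_perplexity(corpus))
print("c_v coherence:",
      CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                     coherence="c_v").get_coherence())
```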


2022, Vol 40 (3), pp. 1-24
Author(s): Jiashu Zhao, Jimmy Xiangji Huang, Hongbo Deng, Yi Chang, Long Xia

In this article, we propose a Latent Dirichlet Allocation (LDA)-based topic-graph probabilistic personalization model for Web search. This model represents a user graph in a latent topic graph and simultaneously estimates the probabilities that the user is interested in the topics, as well as the probabilities that the user is not interested in them. For a given query issued by the user, webpages with higher relevance to the interesting topics are promoted, and webpages more relevant to the non-interesting topics are penalized. In particular, we simulate a user’s search intent by building two profiles: a positive user profile for the probabilities that the user is interested in the topics and a corresponding negative user profile for the probabilities that the user is not interested in them. The profiles are estimated from the user’s search logs: a clicked webpage is assumed to include interesting topics, while a skipped (viewed but not clicked) webpage is assumed to cover some topics that are not interesting to the user. These estimations are performed in the latent topic space generated by LDA. Moreover, a new approach is proposed to estimate the correlation between a given query and the user’s search history, so as to determine how much personalization should be applied to the query. We compare our proposed models with several strong baselines, including state-of-the-art personalization approaches. Experiments conducted on a large-scale collection of real user search logs illustrate the effectiveness of the proposed models.
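The sketch below is a loose, hypothetical illustration of the positive/negative profile idea rather than the authors' model: clicked pages are averaged into a positive topic profile, skipped pages into a negative one, and candidate pages are promoted or penalized by their similarity to each; all topic distributions and the mixing weight gamma are made up.

```python
import numpy as np

NUM_TOPICS = 5

def build_profile(doc_topic_rows):
    """Average the LDA topic distributions of a set of pages."""
    rows = np.asarray(doc_topic_rows)
    return rows.mean(axis=0) if len(rows) else np.zeros(NUM_TOPICS)

clicked = [[0.70, 0.10, 0.10, 0.05, 0.05],   # clicked pages -> interesting topics
           [0.60, 0.20, 0.10, 0.05, 0.05]]
skipped = [[0.05, 0.05, 0.10, 0.20, 0.60]]   # viewed but not clicked -> non-interesting

positive_profile = build_profile(clicked)
negative_profile = build_profile(skipped)

def personalized_score(base_relevance, page_topics, gamma=0.5):
    """Promote pages near the positive profile, penalize those near the negative one."""
    page = np.asarray(page_topics)
    boost = page @ positive_profile - page @ negative_profile
    return base_relevance + gamma * boost

candidates = {
    "pageA": (0.62, [0.65, 0.15, 0.10, 0.05, 0.05]),
    "pageB": (0.64, [0.05, 0.05, 0.10, 0.20, 0.60]),
}
ranked = sorted(candidates.items(),
                key=lambda kv: personalized_score(*kv[1]), reverse=True)
print([name for name, _ in ranked])  # pageA outranks pageB after personalization
```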


Author(s): Pooja Kherwa, Poonam Bansal

The Covid-19 pandemic is the deadliest outbreak in living memory, so there is an urgent need to prepare the world with strategies to prevent and control the impact of such epidemics. In this paper, a novel semantic pattern detection approach for the Covid-19 literature is presented, using contextual clustering and intelligent topic modeling. For contextual clustering, three levels of weights, at the term, document, and corpus levels, are used with latent semantic analysis. For intelligent topic modeling, semantic collocations are selected using pointwise mutual information (PMI) and log frequency biased mutual dependency (LBMD), and Latent Dirichlet Allocation is applied. Contextual clustering with latent semantic analysis yields semantic spaces with highly correlated terms at the corpus level. Through intelligent topic modeling, topics are improved, with lower perplexity and higher coherence. This research helps in identifying knowledge gaps in the area of Covid-19 research and offers directions for future work.
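As a small illustration of the collocation-selection step, the sketch below ranks bigrams by pointwise mutual information with NLTK's collocation finder; it stands in for the paper's PMI/LBMD scoring, which is not reproduced here, and the sample text is invented.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

# Invented sample text; real input would be the tokenized Covid-19 literature.
tokens = (
    "covid vaccine trials show spike protein responses and spike protein binding "
    "in covid vaccine studies of spike protein antibodies"
).lower().split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)            # keep collocations seen at least twice

measures = BigramAssocMeasures()
print(finder.nbest(measures.pmi, 5))   # top candidate collocations by PMI
```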


Author(s): Sujatha Arun Kokatnoor, Balachandran Krishnan

The main focus of this research is to find the reasons behind the fresh cases of COVID-19 from the public’s perception, for data specific to India. The analysis is done using machine learning approaches, and the inferences are validated with medical professionals. The data processing and analysis are accomplished in three steps. First, the dimensionality of the vector space model (VSM) is reduced with an improvised feature engineering (FE) process that uses a weighted term frequency-inverse document frequency (TF-IDF) and forward scan trigrams (FST), followed by removal of weak features using a feature hashing technique. In the second step, an enhanced K-means clustering algorithm is used to group the public posts from Twitter®. In the last step, Latent Dirichlet Allocation (LDA) is applied to discover the trigram topics relevant to the reasons behind the increase in fresh COVID-19 cases. The enhanced K-means clustering improved the Dunn index value by 18.11% compared with the traditional K-means method. By incorporating the improvised two-step FE process, the LDA model improved its coherence score by 14%, and by 19% and 15% when compared with latent semantic analysis (LSA) and the hierarchical Dirichlet process (HDP) respectively, thereby yielding 14 root causes for the spike in the disease.
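A rough scikit-learn sketch of the feature-engineering and clustering stages, under the assumption that trigram hashing followed by TF-IDF weighting approximates the weighted TF-IDF plus feature-hashing step; the posts, hash-bucket count, and cluster count are placeholders.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline

# Placeholder posts standing in for the Twitter data used in the study.
posts = [
    "crowded markets and no masks after lockdown lifted",
    "large wedding gatherings reported in several districts",
    "people skipping second dose of the vaccine",
    "election rallies with huge crowds and no distancing",
] * 5

# Uni/bi/trigram hashing into 2**12 buckets (arbitrary), then TF-IDF weighting.
vectorizer = make_pipeline(
    HashingVectorizer(ngram_range=(1, 3), n_features=2**12, alternate_sign=False),
    TfidfTransformer(),
)
X = vectorizer.fit_transform(posts)

# Standard K-means stands in for the paper's enhanced variant.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:8])
```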


2022, Vol 34 (3), pp. 1-21
Author(s): Xue Yu

The purpose of this work is to address the problems of sparse data, low recommendation precision and recall, and cold start in current personalized tourism recommendation systems. First, a context-based personalized recommendation model (CPRM) is established using the Labeled Latent Dirichlet Allocation (Labeled-LDA) algorithm; the precision and recall of interest-point recommendation are improved by mining the context information in unstructured text. Then, an interest-point recommendation framework based on a convolutional neural network (IPRC) is established: the semantic and emotional information in the comment text is extracted to identify user preferences, and the scores of interest points in the target location are predicted in combination with geographical influence factors. Finally, real datasets are adopted to evaluate the recommendation precision and recall of the above two models and their performance in solving the cold-start problem.
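The sketch below is only a generic Keras illustration of the CNN-scoring idea behind a framework like IPRC (the Labeled-LDA component and geographical factors are omitted): embed comment tokens, convolve to pick up semantic and emotional cues, and regress a point-of-interest score. All data, vocabulary sizes, and layer widths are invented.

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB, MAXLEN = 5000, 60
reviews = np.random.randint(1, VOCAB, size=(256, MAXLEN))  # padded token ids (fake)
scores = np.random.rand(256)                               # normalized POI ratings (fake)

model = models.Sequential([
    layers.Embedding(VOCAB, 64),                 # token embeddings
    layers.Conv1D(128, 5, activation="relu"),    # local semantic/sentiment features
    layers.GlobalMaxPooling1D(),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),       # predicted score in [0, 1]
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(reviews, scores, epochs=2, batch_size=32, verbose=0)
print(model.predict(reviews[:3], verbose=0).ravel())
```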


2022, Vol 24 (3), pp. 1-19
Author(s): Nikhlesh Pathik, Pragya Shukla

In this digital era, people are keen to share their feedback about any product, service, or current issue on social networks and other platforms. A careful analysis of this feedback can give a clear picture of what people think about a particular topic. This work proposes an almost unsupervised Aspect-Based Sentiment Analysis approach for textual reviews. Latent Dirichlet Allocation, along with linguistic rules, is used for aspect extraction. Aspects are ranked based on their probability distribution values and then clustered into predefined categories using frequent terms together with domain knowledge. The SentiWordNet lexicon is used for sentiment scoring and classification. Experiments with two popular datasets show the superiority of our strategy compared to existing methods, with 85% average accuracy when tested on manually labeled data.
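A minimal sketch of the SentiWordNet scoring step, assuming the LDA plus linguistic-rule stage has already produced aspect/opinion-word pairs; the pairs below are hypothetical and the NLTK corpora must be downloaded first.

```python
# Requires: nltk.download("sentiwordnet"); nltk.download("wordnet")
from nltk.corpus import sentiwordnet as swn

def sentiword_score(word: str, pos: str = "a") -> float:
    """Average (positive - negative) score over a word's SentiWordNet synsets."""
    synsets = list(swn.senti_synsets(word, pos))
    if not synsets:
        return 0.0
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

# Hypothetical (aspect, opinion word) pairs from the extraction stage.
aspect_opinions = {"battery": "poor", "camera": "excellent", "screen": "bright"}
for aspect, opinion in aspect_opinions.items():
    score = sentiword_score(opinion)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    print(f"{aspect}: {opinion} -> {score:+.2f} ({label})")
```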

