Cluster Analysis for Internet Public Sentiment in Universities by Combining Methods

Author(s):  
Na Zheng ◽  
Jie Yu Wu

A clustering method that combines Latent Dirichlet Allocation (LDA) with the vector space model (VSM) to compute text similarity is presented. Text similarity is calculated separately with LDA topic models and with the VSM weighting strategy, and the two results are linearly combined into a single similarity measure. The k-means algorithm is then applied for cluster analysis. This approach addresses both the loss of deep semantic information in traditional text clustering and LDA's inability to distinguish texts when the dimensionality is reduced too far. Deep semantic information is thus mined from the text, and clustering efficiency is improved. Comparisons with traditional methods show that the algorithm improves the performance of text clustering.
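
A minimal sketch of the combined-similarity idea, using scikit-learn. The mixing weight lam is an assumption, and since vanilla k-means expects feature vectors rather than similarities, the rows of the combined similarity matrix are used as features here; this is a common workaround, not necessarily the paper's exact procedure:

```python
# Combine LDA topic similarity with VSM (TF-IDF) similarity, then cluster.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.cluster import KMeans

docs = ["campus forum post about tuition policy",
        "students discuss dormitory living conditions",
        "tuition fee policy announced by the university",
        "dormitory heating problems reported on forum"]

# VSM similarity: cosine over TF-IDF weights.
tfidf = TfidfVectorizer().fit_transform(docs)
sim_vsm = cosine_similarity(tfidf)

# LDA similarity: cosine over document-topic distributions.
counts = CountVectorizer().fit_transform(docs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
sim_lda = cosine_similarity(theta)

# Linear combination of the two similarity measures.
lam = 0.5  # assumed mixing weight
sim = lam * sim_vsm + (1 - lam) * sim_lda

# k-means on the rows of the combined similarity matrix.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(sim)
print(labels)
```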

2021 ◽  
Vol 13 (19) ◽  
pp. 10856
Author(s):  
I-Cheng Chang ◽  
Tai-Kuei Yu ◽  
Yu-Jie Chang ◽  
Tai-Yi Yu

Facing the big-data wave, this study applied artificial intelligence to extract knowledge and establish a feasible process for supplying innovative value in environmental education. Intelligent agents and natural language processing (NLP) are two key areas leading the trend in artificial intelligence; this research adopted NLP to analyze the research topics of environmental education journals in the Web of Science (WoS) database during 2011–2020 and to interpret the categories and characteristics of the papers' abstracts. The corpus was built from the abstracts and keywords of journal papers and analyzed with text mining, cluster analysis, latent Dirichlet allocation (LDA), and co-word analysis. The classification of feature words was determined and reviewed by domain experts, and the associated TF-IDF weights were calculated for the subsequent cluster analysis, which combined hierarchical clustering with K-means. Hierarchical clustering and LDA set the number of required categories at seven, and K-means classified the documents into those seven categories. Co-word analysis was used to check the suitability of the K-means classification, terms with high TF-IDF weights were analyzed for each K-means group, and the terms of the different topics were examined with the LDA technique. A comparison of the results showed that most categories recognized by the K-means and LDA methods were the same and shared similar words, while two categories differed slightly. The involvement of field experts supported the consistency and correctness of the classified topics and documents.
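
A rough sketch of the described pipeline on toy data: TF-IDF weighting, hierarchical (Ward) clustering to suggest a category count, then K-means with that count. The corpus and the cluster count are illustrative stand-ins for the paper's expert-reviewed feature words and seven categories:

```python
# TF-IDF vectors -> hierarchical clustering to pick k -> K-means with that k.
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

abstracts = [
    "outdoor learning improves environmental literacy",
    "climate change curriculum for secondary schools",
    "recycling behavior and environmental attitudes",
    "teacher training for sustainability education",
    "climate literacy in teacher education programs",
    "household recycling intention survey",
]

# TF-IDF weights over the feature words (here: all terms of the toy corpus).
X = TfidfVectorizer().fit_transform(abstracts).toarray()

# Hierarchical clustering (Ward linkage) to suggest a category count.
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")  # 3 clusters for this toy corpus

# K-means with the suggested number of clusters.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(hier_labels, km_labels)
```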


2015 ◽  
Vol 2015 ◽  
pp. 1-15
Author(s):  
Hailin Liu ◽  
Ling Xu ◽  
Mengning Yang ◽  
Meng Yan ◽  
Xiaohong Zhang

Latent Dirichlet Allocation (LDA) is a statistical topic model that has been widely used to abstract semantic information from software source code. A failure is an observable error in program behavior. This work investigates whether semantic information and the failures recorded in a system's history can be used to predict component failures. We use LDA to extract topics from source code and propose a new metric, topic failure density (TFD), by mapping failures to these topics. By exploring the basic information of topics from neighboring versions of a system, we obtain a similarity matrix; multiplying the TFD by this similarity matrix yields the estimated TFD of the next version. The predictions achieve an average 77.8% agreement with the real failures when considering the top 3 and bottom 3 components ordered by number of failures. We use the Spearman coefficient to measure the statistical correlation between the actual and estimated failure rates; the validation results range from 0.5342 to 0.8337, outperforming a comparable method. This suggests that our topic-similarity-based predictor does a fine job of component failure prediction.
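
The propagation step can be stated compactly: if tfd_n is the TFD vector over topics in version n and S is the topic-similarity matrix between versions n and n+1, the estimate is tfd_{n+1} = tfd_n · S. A small numeric sketch with invented values:

```python
# Propagate Topic Failure Density (TFD) across versions via topic similarity.
import numpy as np

# TFD for 3 topics in version n (failures mapped to topics, density per topic).
tfd_n = np.array([0.40, 0.15, 0.05])

# Similarity between topics of version n (rows) and version n+1 (columns),
# e.g. cosine similarity of the topics' word distributions. Values invented.
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.2, 0.8],
])

# Estimated TFD for version n+1.
tfd_next = tfd_n @ sim
print(tfd_next)  # [0.39  0.155 0.055]
```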


Author(s):  
Lidong Zhai ◽  
Zhaoyun Ding ◽  
Yan Jia ◽  
Bin Zhou

LDA (Latent Dirichlet Allocation), proposed by Blei, is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words; it does not model the word positions within each document. In this paper, a Word Position-Related LDA Model is proposed that takes the word positions of every document in the corpus into account, so that each word is also characterized by a distribution over word positions. At the same time, the precision of topic-word interpretability is improved by integrating the word-position distribution with an appropriate word degree, accounting for the different word degrees at different word positions. Finally, a new size-aware word intrusion method is proposed to improve the assessment of topic-word interpretability. Experimental results on the NIPS corpus show that the Word Position-Related LDA Model improves the precision of topic-word interpretability, with an average improvement of about 9.67%. Comparison of the experimental data also shows that the size-aware word intrusion method interprets the topic words' semantic information more comprehensively and effectively.
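
A loose, post-hoc illustration of the position-weighting intuition (the paper extends LDA's generative process itself, which this sketch does not do): re-rank a topic's words by a position-dependent weight. The corpus, the weighting function, and the topic-word probabilities are all invented for illustration:

```python
# Re-score a topic's words by where they tend to occur in documents.
import numpy as np
from collections import defaultdict

docs = [["neural", "network", "training", "loss", "network"],
        ["training", "deep", "network", "loss", "gradient"]]

# Mean relative position of each word across the corpus (0 = start of doc).
positions = defaultdict(list)
for doc in docs:
    for i, w in enumerate(doc):
        positions[w].append(i / max(len(doc) - 1, 1))

# Position weight: words occurring earlier get larger weight (an assumption).
pos_weight = {w: 1.0 - np.mean(p) for w, p in positions.items()}

# Plain LDA-style topic-word probabilities for one topic (invented numbers).
phi = {"network": 0.30, "training": 0.25, "loss": 0.20,
       "gradient": 0.15, "deep": 0.10}

# Position-weighted score, renormalized, used to rank the topic's words.
scores = {w: phi[w] * pos_weight[w] for w in phi}
total = sum(scores.values())
ranked = sorted(((w, s / total) for w, s in scores.items()), key=lambda x: -x[1])
print(ranked)
```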


2011 ◽  
Vol 474-476 ◽  
pp. 2071-2078
Author(s):  
Zheng Yu Zhu ◽  
Shu Jia Dong ◽  
Chun Lei Yu ◽  
Jie He

Many existing text clustering algorithms overlook the semantic information between words and therefore compute text similarity with lower accuracy. This paper proposes a new hybrid text clustering algorithm (HCA) based on HowNet semantics. It calculates the semantic similarity of words from their semantic concept descriptions in HowNet and combines this with maximum-weight matching on a bipartite graph to compute a semantics-based text similarity. On top of this new text similarity, HCA is designed by combining an improved genetic algorithm with the k-medoids algorithm. Comparative experiments show that (1) HCA achieves better quality than two existing traditional clustering algorithms, and (2) when their cosine text similarity is replaced with the new semantics-based text similarity, the quality of all three clustering algorithms improves significantly.
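
A minimal sketch of the bipartite-matching text similarity, with a stub lookup table standing in for HowNet's sememe-based word similarity and the normalization chosen for illustration:

```python
# Text similarity via maximum-weight matching of word pairs (Hungarian algorithm).
import numpy as np
from scipy.optimize import linear_sum_assignment

def word_sim(w1, w2):
    # Stub standing in for HowNet concept-description word similarity.
    table = {("car", "automobile"): 0.9, ("fast", "quick"): 0.8}
    if w1 == w2:
        return 1.0
    return table.get((w1, w2)) or table.get((w2, w1)) or 0.1

def text_sim(t1, t2):
    # Weight matrix of word-pair similarities between the two texts.
    W = np.array([[word_sim(a, b) for b in t2] for a in t1])
    # Maximum-weight matching on the bipartite graph (negate to maximize).
    rows, cols = linear_sum_assignment(-W)
    # Normalize the matched weight by the longer text's length (an assumption).
    return W[rows, cols].sum() / max(len(t1), len(t2))

print(text_sim(["car", "fast"], ["automobile", "quick", "red"]))  # ~0.567
```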

