A novel word clustering algorithm based on latent semantic analysis

Author(s): J.R. Bellegarda, J.W. Butzberger, Yen-Lu Chow, N.B. Coccaro, D. Naik

Author(s): Yvette Awuor, Robert Oboko

Online discussion forums have rapidly gained usage in e-learning systems, placing a heavy burden on course instructors in terms of moderating student discussions. Previous methods of assessing student participation in online discussions followed strictly quantitative approaches that did not necessarily capture the students' effort. Along with this growth in usage comes a need for accelerated knowledge-extraction tools for analysing and presenting online messages in a useful and meaningful manner. This article discusses a qualitative approach that involves content analysis of the discussions and the generation of clustered keywords, which can be used to identify topics of discussion. The authors apply a new k-means++ clustering algorithm combined with latent semantic analysis to assess the topics expressed by students in online discussion forums, and compare it with the standard k-means++ algorithm. Using a Moodle course management forum to validate the proposed algorithm, the authors show that the k-means++ clustering algorithm with latent semantic analysis performs better than a stand-alone k-means++.
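As an illustration of the kind of pipeline this abstract describes, the following is a minimal sketch assuming scikit-learn: TF-IDF term vectors are projected into a low-rank LSA space with truncated SVD, then clustered with k-means++ seeding. The sample posts and parameter values are illustrative assumptions, not the study's configuration.

```python
# Minimal LSA + k-means++ sketch (illustrative parameters, toy forum posts).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.pipeline import make_pipeline
from sklearn.cluster import KMeans

posts = [
    "How do I configure the Moodle forum for group work?",
    "Group work settings are under course administration.",
    "What is latent semantic analysis used for?",
    "LSA maps words and documents into a shared concept space.",
]

# Term-document weighting.
tfidf = TfidfVectorizer(stop_words="english")

# LSA: rank-k truncated SVD of the TF-IDF matrix, followed by length
# normalisation so that Euclidean k-means approximates cosine similarity.
lsa = make_pipeline(TruncatedSVD(n_components=2, random_state=0),
                    Normalizer(copy=False))
X = lsa.fit_transform(tfidf.fit_transform(posts))

# k-means with k-means++ initialisation (scikit-learn's default seeding).
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)  # cluster label per post; clustered keywords come from cluster terms
```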


2021, Vol. 11 (24), pp. 11897
Author(s): Quanying Cheng, Yunqiang Zhu, Jia Song, Hongyun Zeng, Shu Wang, ...

Geospatial data is an indispensable data resource for research and applications in many fields. The technologies and applications related to geospatial data are constantly advancing and updating, so identifying the technologies and applications among them will help foster and fund further innovation. Through topic analysis, new research hotspots can be discovered by tracing the whole development process of a topic. At present, the main methods for determining topics are peer review and bibliometrics; however, these merely review relevant literature or perform simple frequency analysis. This paper proposes a new topic discovery method that combines a word embedding method based on a pre-trained model, BERT, with a spherical k-means clustering algorithm, and uses the similarity between literature and topics to assign publications to different topics. The proposed method was applied to 266 pieces of literature related to geospatial data from the past five years. First, according to the number of publications, a trend analysis of technologies and applications related to geospatial data in several leading countries was conducted. Then, the consistency between the proposed method and the existing method PLSA (Probabilistic Latent Semantic Analysis) was evaluated using two topic-coherence indicators (i.e., U-Mass and NPMI). The results show that the method proposed in this paper can reveal text content well, determine development trends, and produce more coherent topics, and that the overall performance of BERT-LSA is better than that of PLSA under both NPMI and U-Mass. The method is not limited to trend analysis of the data in this paper; it can also be used for topic analysis of other types of texts.
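A minimal sketch of the embedding-and-clustering step described above, assuming a sentence-transformers model as a stand-in for the paper's pre-trained BERT; the model name, cluster count, and sample abstracts are illustrative assumptions, not the study's configuration. k-means on unit-normalised vectors approximates spherical k-means, and each document is assigned to the topic centroid with the highest cosine similarity.

```python
# Sketch: transformer embeddings + (approximate) spherical k-means.
import numpy as np
from sentence_transformers import SentenceTransformer  # stand-in for BERT
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

abstracts = [
    "A crowdsourcing platform for collecting geospatial data.",
    "Deep learning for land-cover classification from satellite imagery.",
    "Sharing and discovery of geospatial data on the web.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model name
emb = normalize(model.encode(abstracts))          # unit-length embedding rows

# k-means on the unit sphere approximates spherical k-means.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(emb)
centroids = normalize(km.cluster_centers_)

# Cosine similarity between each document and each topic centroid;
# the argmax gives the document-to-topic assignment used for trend analysis.
sim = emb @ centroids.T
print(sim.argmax(axis=1))
```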


2012, Vol. 132 (9), pp. 1473-1480
Author(s): Masashi Kimura, Shinta Sawada, Yurie Iribe, Kouichi Katsurada, Tsuneo Nitta

Author(s): Priyanka R. Patil, Shital A. Patil

Similarity View is an application for visually comparing and exploring multiple models of text over a collection of documents. Friendbook discovers users' lifestyles from user-centric sensor data, measures the similarity of lifestyles among users, and recommends friends to users whose lifestyles are highly similar. Motivated by modelling a user's daily life as life documents, lifestyles are extracted using the Latent Dirichlet Allocation (LDA) algorithm. Manual techniques cannot be used for checking research papers, as the assigned reviewer may have insufficient knowledge of the research disciplines, and differing subjective views can cause misinterpretations. There is an urgent need for an effective and feasible approach to checking submitted research papers with the support of automated software, and text-mining methods can solve the problem of checking research papers semantically and automatically. The proposed method finds the similarity of texts in a collection of documents by using the Latent Dirichlet Allocation (LDA) algorithm together with Latent Semantic Analysis (LSA): an LSA-with-synonyms variant finds synonyms of indexed terms using the English WordNet dictionary, while an LSA-without-synonyms variant computes text similarity on the index terms alone. The accuracy of LSA with synonyms is higher when synonyms are considered for matching.
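A minimal sketch of the LSA-with-synonyms idea, assuming NLTK's English WordNet and scikit-learn; the helper expand_with_synonyms and the sample documents are hypothetical illustrations, not the authors' implementation. Each document is expanded with WordNet synonyms of its tokens before the usual TF-IDF plus truncated-SVD similarity computation (requires nltk.download("wordnet")).

```python
# Sketch: synonym expansion via WordNet, then LSA similarity.
from nltk.corpus import wordnet
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def expand_with_synonyms(text):
    # Hypothetical helper: append WordNet synonyms of each token to the text.
    tokens = text.lower().split()
    expanded = list(tokens)
    for tok in tokens:
        for syn in wordnet.synsets(tok):
            expanded.extend(l.name().replace("_", " ") for l in syn.lemmas())
    return " ".join(expanded)

docs = [
    "automatic assessment of submitted research papers",
    "automated evaluation of academic manuscripts",
    "sensor data from mobile phones",
]

expanded = [expand_with_synonyms(d) for d in docs]
X = TfidfVectorizer(stop_words="english").fit_transform(expanded)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Pairwise document similarity in the LSA space; synonym expansion lets
# "assessment" and "evaluation" contribute to the same latent dimensions.
print(cosine_similarity(Z))
```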


This article examines the method of latent semantic analysis (LSA), its advantages and disadvantages, and the possibility of adapting it to arrays of unstructured data, which make up most of the information that Internet users deal with. To extract context-dependent word meanings through the statistical processing of large sets of textual data, LSA operates on numeric word-by-text matrices, whose rows correspond to words and whose columns correspond to text units. The grouping of words into themes and the representation of text units in the theme space is accomplished by applying one of two matrix decompositions to the data: singular value decomposition or non-negative matrix factorization. LSA studies have shown that the word and text similarities obtained in this way closely match human judgement. Based on the methods described above, the author has developed and proposed a new way of finding semantic links between unstructured data, namely information on social networks. The method is based on latent semantic and frequency analyses and involves processing the retrieved search results, splitting each remaining text (post) into separate words, considering each word within a window of n words to the right and left, counting the number of occurrences of each term, and consulting a pre-built semantic resource (dictionary, ontology, RDF schema, ...). The developed method and algorithm have been tested on six well-known social networks, interacting with each through the respective social network's API. The average score of the author's results exceeded that of the networks' own search. The results obtained in the course of this work can be used in the development of recommendation, search and other systems related to the search, categorization and filtering of information.
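A minimal sketch of the windowed counting step described above, assuming toy posts and a window of n = 2; the exact counting and weighting scheme of the author's method is not specified here, so this is only one plausible reading. Word co-occurrences within n words to the left and right are tallied, and an SVD of the resulting word-context matrix yields LSA-style word vectors for similarity search.

```python
# Sketch: windowed co-occurrence counting + SVD word vectors.
from collections import Counter
import numpy as np

posts = [
    "latent semantic analysis finds hidden topics in text",
    "social networks produce large volumes of unstructured text",
]
n = 2  # context window: n words to the right and left (assumed value)

cooc = Counter()
term_freq = Counter()
for post in posts:
    words = post.split()
    for i, w in enumerate(words):
        term_freq[w] += 1  # occurrences of each term
        for j in range(max(0, i - n), min(len(words), i + n + 1)):
            if j != i:
                cooc[(w, words[j])] += 1  # neighbour within the window

# Assemble the word-context count matrix.
vocab = sorted(term_freq)
idx = {w: k for k, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for (w, c), count in cooc.items():
    M[idx[w], idx[c]] = count

# Rank-k SVD: rows of U * S are dense word vectors for semantic-link search.
U, S, _ = np.linalg.svd(M)
word_vecs = U[:, :2] * S[:2]
print(word_vecs.shape)
```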

