MULTITOPIC TEXT CLUSTERING AND CLUSTER LABELING USING CONTEXTUALIZED WORD EMBEDDINGS

2020 ◽  
Vol 0 (4) ◽  
pp. 95-105
Author(s):  
Z. V. Ostapiuk ◽  
T. O. Korotyeyeva
Author(s):  
Vivek Mehta ◽  
Seema Bawa ◽  
Jasmeet Singh

Abstract: A massive amount of textual data now exists in digital repositories in the form of research articles, news articles, reviews, Wikipedia articles, books, and so on. Text clustering is a fundamental data mining technique for categorization, topic extraction, and information retrieval. Textual datasets, especially those containing a large number of documents, are sparse and high-dimensional. Hence, traditional clustering techniques such as K-means, agglomerative clustering, and DBSCAN do not perform well on them. In this paper, a clustering technique especially suited to large text datasets is proposed that overcomes these limitations. The proposed technique, named WEClustering, is based on word embeddings derived from a recent deep learning model, “Bidirectional Encoder Representations from Transformers” (BERT). It deals effectively with the problem of high dimensionality, so more accurate clusters are formed. The technique is validated on several datasets of varying sizes, and its performance is compared with other widely used and state-of-the-art clustering techniques. The experimental comparison shows that the proposed technique gives a significant improvement over the other techniques, as measured by metrics such as Purity and Adjusted Rand Index.
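The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of Purity and the Adjusted Rand Index; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def purity(true_labels, cluster_ids):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for label, cid in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cid, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # assumption: treat the degenerate case as perfect agreement
    cell_counts = Counter(zip(true_labels, cluster_ids))
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_classes = sum(comb(c, 2) for c in class_sizes.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_sizes.values())
    expected = sum_classes * sum_clusters / comb(n, 2)
    max_index = (sum_classes + sum_clusters) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Toy example (hypothetical labels): six documents, three true topics.
true_topics = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 1, 2]
print(purity(true_topics, predicted))
print(adjusted_rand_index(true_topics, predicted))
```

Purity rewards clusters dominated by a single class but does not penalize over-splitting, which is one reason it is commonly reported together with a chance-corrected measure such as the Adjusted Rand Index.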


2010 ◽  
Vol 30 (7) ◽  
pp. 1933-1935 ◽  
Author(s):  
Wen-ming ZHANG ◽  
Jiang WU ◽  
Xiao-jiao YUAN

2017 ◽  
Author(s):  
Su-Youn Yoon ◽  
Chong Min Lee ◽  
Ikkyu Choi ◽  
Xinhao Wang ◽  
Matthew Mulholland ◽  
...  

Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: Given the increasing volume of text documents on Internet pages, handling such a large amount of information becomes highly complex. Text clustering is commonly treated as an optimization problem in which a large amount of text information is organized into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, β-hill climbing, which addresses text document clustering by modeling the β-hill climbing technique to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It is introduced to strike a balance between local and global search. Local search methods such as k-medoid and k-means have been applied successfully to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique on the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
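The role of the β parameter can be sketched in Python under stated assumptions. The abstract does not give the operator's details, so this sketch assumes the common formulation in which, at each iteration, one document is reassigned (local move) and then every document is independently reassigned at random with probability β (global move), with the candidate kept only if it is no worse. The objective (sum of squared distances to cluster centroids), the function names, and the toy points are all hypothetical choices for illustration, not the paper's setup.

```python
import random


def cost(points, assignment, k):
    """Sum of squared distances from each point to its cluster centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, assignment):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    total = 0.0
    for p, c in zip(points, assignment):
        centroid = [s / counts[c] for s in sums[c]]
        total += sum((x - m) ** 2 for x, m in zip(p, centroid))
    return total


def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    """Hill climbing over cluster assignments with an assumed beta operator."""
    rng = random.Random(seed)
    n = len(points)
    current = [rng.randrange(k) for _ in range(n)]
    current_cost = cost(points, current, k)
    for _ in range(iterations):
        candidate = current[:]
        # Local move: reassign one randomly chosen document.
        candidate[rng.randrange(n)] = rng.randrange(k)
        # Assumed beta operator: each document is reassigned at random
        # with probability beta, giving occasional larger jumps.
        for i in range(n):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_cost = cost(points, candidate, k)
        # Greedy acceptance: keep the candidate only if it is no worse.
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost


# Toy example (hypothetical 2-D document vectors forming two groups).
docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, final_cost = beta_hill_climbing(docs, k=2)
print(labels, final_cost)
```

With β set to 0 the loop reduces to plain hill climbing with single-document moves; larger values of β trade some local refinement for wider exploration of the assignment space.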


2019 ◽  
Vol 3 (2) ◽  
pp. 159-183 ◽  
Author(s):  
Vijaya Kumari Yeruva ◽  
Sidrah Junaid ◽  
Yugyung Lee
