MULTITOPIC TEXT CLUSTERING AND CLUSTER LABELING USING CONTEXTUALIZED WORD EMBEDDINGS

Abstract: A massive amount of textual data now exists in digital repositories in the form of research articles, news articles, reviews, Wikipedia articles, books, and so on. Text clustering is a fundamental data mining technique for categorization, topic extraction, and information retrieval. Textual datasets, especially those containing a large number of documents, are sparse and high-dimensional. Hence, traditional clustering techniques such as K-means, agglomerative clustering, and DBSCAN do not perform well on them. This paper proposes a clustering technique, named WEClustering, that is especially suited to large text datasets and overcomes these limitations. The technique is based on word embeddings derived from a recent deep learning model, Bidirectional Encoder Representations from Transformers (BERT). It deals effectively with the problem of high dimensionality, so more accurate clusters are formed. The technique is validated on several datasets of varying sizes, and its performance is compared with other widely used and state-of-the-art clustering techniques. The experimental comparison shows that the proposed technique gives a significant improvement over the other techniques as measured by metrics such as Purity and Adjusted Rand Index.
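
The two evaluation metrics named in the abstract, Purity and Adjusted Rand Index, can be illustrated with a small sketch. This is not code from the paper: the function names `purity` and `adjusted_rand_index` and the toy labels are assumptions chosen for illustration, and only the standard textbook definitions of the two metrics are used.

```python
from collections import Counter
from math import comb


def purity(true_labels, cluster_ids):
    """Fraction of documents that fall in the majority class of their cluster."""
    by_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


def adjusted_rand_index(true_labels, cluster_ids):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no document pairs to disagree on
    cell_counts = Counter(zip(true_labels, cluster_ids))
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Count same-group pairs in each cell, each class, and each cluster.
    sum_cells = sum(comb(v, 2) for v in cell_counts.values())
    sum_classes = sum(comb(v, 2) for v in class_sizes.values())
    sum_clusters = sum(comb(v, 2) for v in cluster_sizes.values())
    expected = sum_classes * sum_clusters / comb(n, 2)
    max_index = (sum_classes + sum_clusters) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial and identical
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: six documents, two true topics, two predicted clusters.
true_topics = ["a", "a", "a", "b", "b", "b"]
predicted = [0, 0, 1, 1, 1, 1]
print(purity(true_topics, predicted))
print(adjusted_rand_index(true_topics, predicted))
```

Purity rewards clusters dominated by a single class but can be inflated by producing many small clusters, while the Adjusted Rand Index corrects for chance agreement, which is one reason the two are often reported together.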

Improvement of Short Text Clustering Based on Weighted Word Embeddings

Web Information Systems and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60029-7_24 ◽

2020 ◽

pp. 260-267

Author(s):

Nan Yang ◽

Qing Liu ◽

Yaping Li

Keyword(s):

Text Clustering ◽

Word Embeddings ◽

Short Text ◽

Short Text Clustering

Combining Word Embeddings and Feature Embeddings for Fine-grained Relation Extraction

10.3115/v1/n15-1155 ◽

2015 ◽

Cited By ~ 5

Author(s):

Mo Yu ◽

Matthew R. Gormley ◽

Mark Dredze

Keyword(s):

Relation Extraction ◽

Word Embeddings ◽

Fine Grained

Inherently Interpretable Sparse Word Embeddings through Sparse Coding

10.36934/t2020-011 ◽

2020 ◽

Author(s):

Adly Templeton

Keyword(s):

Sparse Coding ◽

Word Embeddings

K-means text clustering algorithm based on density and nearest neighbor

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01933 ◽

2010 ◽

Vol 30 (7) ◽

pp. 1933-1935 ◽

Cited By ~ 6

Author(s):

Wen-ming ZHANG ◽

Jiang WU ◽

Xiao-jiao YUAN

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

Text Clustering

Off-Topic Spoken Response Detection with Word Embeddings

10.21437/interspeech.2017-388 ◽

2017 ◽

Author(s):

Su-Youn Yoon ◽

Chong Min Lee ◽

Ikkyu Choi ◽

Xinhao Wang ◽

Matthew Mulholland ◽

...

Keyword(s):

Word Embeddings ◽

Response Detection

An Improved β-hill Climbing Optimization Technique for Solving the Text Documents Clustering Problem

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405614666180903112541 ◽

2020 ◽

Vol 16 (4) ◽

pp. 296-306 ◽

Cited By ~ 3

Author(s):

Laith Mohammad Abualigah ◽

Essam Said Hanandeh ◽

Ahamad Tajudin Khader ◽

Mohammed Abdallh Otair ◽

Shishir Kumar Shandilya

Keyword(s):

Optimization Technique ◽

Document Clustering ◽

Text Clustering ◽

Hill Climbing ◽

Text Documents ◽

Clustering Problem ◽

Text Document ◽

Text Information ◽

Amount Of Knowledge ◽

The Hill

Background: Given the increasing volume of text documents on Internet pages, managing such a large amount of information has become highly complex. Text clustering is commonly formulated as an optimization problem: organizing a large amount of text information into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, β-hill climbing, which addresses text document clustering by partitioning similar documents into the same cluster. Methods: The β parameter is the primary innovation of the β-hill climbing technique. It was introduced to balance local and global search. Local search methods, such as k-medoids and k-means, have been applied successfully to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique on the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
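
The idea of balancing local and global search with a β parameter can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the operators, so the sketch assumes that a solution is a vector of cluster assignments, that the local move reassigns one item, that the β step resets each position to a random cluster with probability `beta`, and that the objective is the within-cluster sum of squared distances. The names `cost` and `beta_hill_climbing` and the toy 2-D points (standing in for document vectors) are hypothetical.

```python
import random


def cost(points, assignment, k):
    """Sum of squared distances from each point to its cluster centroid."""
    dims = len(points[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for point, cluster in zip(points, assignment):
        counts[cluster] += 1
        for d in range(dims):
            sums[cluster][d] += point[d]
    total = 0.0
    for point, cluster in zip(points, assignment):
        centroid = [s / counts[cluster] for s in sums[cluster]]
        total += sum((x - m) ** 2 for x, m in zip(point, centroid))
    return total


def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    """Hill climbing with an assumed beta-controlled random reset step."""
    rng = random.Random(seed)
    current = [rng.randrange(k) for _ in points]
    current_cost = cost(points, current, k)
    for _ in range(iterations):
        candidate = current[:]
        # Local move: reassign one randomly chosen point to a random cluster.
        candidate[rng.randrange(len(points))] = rng.randrange(k)
        # Beta step (assumed form): reset each position with probability beta,
        # which occasionally produces larger, more exploratory jumps.
        for i in range(len(candidate)):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_cost = cost(points, candidate, k)
        # Greedy selection: keep the candidate only if it is no worse.
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost


# Hypothetical toy data: two well-separated groups of 2-D points.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
assignment, final_cost = beta_hill_climbing(toy_points, k=2)
print(assignment, final_cost)
```

With `beta=0` the loop reduces to plain single-move hill climbing; larger values make each candidate more random, trading refinement for exploration.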

Contextual Word Embeddings and Topic Modeling in Healthy Dieting and Obesity

Journal of Healthcare Informatics Research ◽

10.1007/s41666-019-00052-5 ◽

2019 ◽

Vol 3 (2) ◽

pp. 159-183 ◽

Cited By ~ 1

Author(s):

Vijaya Kumari Yeruva ◽

Sidrah Junaid ◽

Yugyung Lee

Keyword(s):

Topic Modeling ◽

Word Embeddings
