A Visual Analytics Approach for Interactive Document Clustering

2020 ◽  
Vol 10 (1) ◽  
pp. 1-33
Author(s):  
Ehsan Sherkat ◽  
Evangelos E. Milios ◽  
Rosane Minghim
2013 ◽  
Vol 22 (05) ◽  
pp. 1360008 ◽  
Author(s):  
PATRICIA J. CROSSNO ◽  
ANDREW T. WILSON ◽  
TIMOTHY M. SHEAD ◽  
WARREN L. DAVIS ◽  
DANIEL M. DUNLAVY

We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, document relationships identified by models, and the impact of models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
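The concept-to-topic matching described in this abstract can be illustrated with a minimal sketch. It assumes that each LSA concept and each LDA topic is available as a term-weight vector over a shared vocabulary; the function names, the similarity threshold, and the toy vectors are hypothetical and are not taken from TopicView itself.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bipartite_edges(lsa_concepts, lda_topics, threshold=0.5):
    """Return (concept_index, topic_index, similarity) for every pair whose
    cosine similarity is at or above the threshold, strongest edge first.
    The threshold is an assumption made for this sketch."""
    edges = []
    for i, concept in enumerate(lsa_concepts):
        for j, topic in enumerate(lda_topics):
            similarity = cosine(concept, topic)
            if similarity >= threshold:
                edges.append((i, j, similarity))
    return sorted(edges, key=lambda edge: -edge[2])


# Toy term-weight vectors over a hypothetical four-term vocabulary.
lsa_concepts = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
lda_topics = [[0.0, 0.5, 0.5, 0.0], [0.9, 0.1, 0.0, 0.0]]

edges = bipartite_edges(lsa_concepts, lda_topics)
for concept_index, topic_index, similarity in edges:
    print(f"LSA concept {concept_index} <-> LDA topic {topic_index}: {similarity:.3f}")
```

Each retained edge links one LSA concept to one LDA topic, so the result can be drawn as a bipartite graph with the similarity as the edge weight. Real LSA factors can contain negative weights, and how those are handled is a design choice that this sketch does not address.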


Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: As the volume of text documents on Internet pages grows, managing such a large amount of information becomes increasingly complex. Text clustering is a common optimization problem in which a large collection of text documents is organized into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, β-hill climbing, which solves the text document clustering problem by partitioning similar documents into the same cluster. Methods: The β parameter is the primary innovation of the β-hill climbing technique. It was introduced to balance local and global search. Local search methods, such as the k-medoids and k-means techniques, have been applied successfully to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing achieved better results than the original hill climbing technique on the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
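The idea of balancing local and global search with a β parameter can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the β operator reassigns each document to a random cluster with probability β, it uses the sum of squared distances to cluster centroids as a stand-in objective, and it works on toy two-dimensional points instead of real document vectors. All names and parameter values are hypothetical.

```python
import random


def cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid.
    Empty clusters contribute nothing. This objective is an assumption."""
    total = 0.0
    for cluster in range(k):
        members = [p for p, label in zip(points, labels) if label == cluster]
        if not members:
            continue
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def beta_hill_climbing(points, k, beta=0.05, iterations=2000, seed=0):
    """Hill climbing over cluster assignments with an assumed beta operator."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    best = cost(points, labels, k)
    for _ in range(iterations):
        candidate = labels[:]
        # Local move: reassign one randomly chosen document.
        candidate[rng.randrange(len(points))] = rng.randrange(k)
        # Assumed beta operator: each assignment is reset at random with
        # probability beta, which adds a global exploration step.
        for i in range(len(candidate)):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_cost = cost(points, candidate, k)
        # Accept the candidate only if it is no worse than the current solution.
        if candidate_cost <= best:
            labels, best = candidate, candidate_cost
    return labels, best


# Toy data: two loose groups of points standing in for document vectors.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, best = beta_hill_climbing(points, k=2)
print(labels, round(best, 3))
```

Setting `beta=0` reduces the loop to plain hill climbing with single-document moves, which makes the two variants easy to compare on the same data.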


Heliyon ◽  
2020 ◽  
Vol 6 (12) ◽  
pp. e05733
Author(s):  
Mohammad Fadhli Asli ◽  
Muzaffar Hamzah ◽  
Ag Asri Ag Ibrahim ◽  
Enna Ayub

Author(s):  
Ruina Bai ◽  
Ruizhang Huang ◽  
Yanping Chen ◽  
Yongbin Qin
