Efficient sparse spherical k-means for document clustering

Background: Given the increasing volume of text documents on Internet pages, managing such a large amount of information has become highly complex. Text clustering is a common optimization problem that organizes a large collection of text documents into subsets of similar, coherent clusters. Aims: This paper presents a novel local search technique for clustering, β-hill climbing, and applies it to the text document clustering problem by modeling β-hill climbing to partition similar documents into the same cluster. Methods: The β parameter is the primary innovation of the β-hill climbing technique; it was introduced to balance local and global search. Local search methods, such as the k-medoids and k-means techniques, have been successfully applied to the text document clustering problem. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed β-hill climbing achieved better results than the original hill climbing technique on the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
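The abstract describes β-hill climbing only at a high level. A minimal sketch of the general idea for clustering, under stated assumptions, may help: a solution is one cluster label per document; a neighborhood move (often called the N-operator) reassigns one random document, the β operator then reassigns each document at random with probability `beta` to add global exploration, and a candidate is kept only if its fitness does not decrease. The fitness used here (sum of cosine similarities between each document and its cluster centroid), the greedy acceptance rule, and all function and parameter names are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fitness(docs: List[Vector], labels: List[int], k: int) -> float:
    """Assumed objective: sum of cosine similarities of documents to their cluster centroid (higher is better)."""
    dim = len(docs[0])
    centroids = [[0.0] * dim for _ in range(k)]
    for doc, c in zip(docs, labels):
        for j, x in enumerate(doc):
            centroids[c][j] += x  # summed vector has the same direction as the mean
    return sum(cosine(doc, centroids[c]) for doc, c in zip(docs, labels))


def beta_hill_climbing(docs: List[Vector], k: int, beta: float = 0.05,
                       iterations: int = 2000, seed: int = 0) -> Tuple[List[int], float]:
    """Hypothetical β-hill climbing sketch over cluster-label vectors."""
    rng = random.Random(seed)
    n = len(docs)
    current = [rng.randrange(k) for _ in range(n)]  # random initial partition
    current_score = fitness(docs, current, k)
    for _ in range(iterations):
        candidate = current[:]
        # N-operator (exploitation): move one randomly chosen document to a random cluster.
        candidate[rng.randrange(n)] = rng.randrange(k)
        # β-operator (exploration): reassign each document at random with probability beta.
        for i in range(n):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        score = fitness(docs, candidate, k)
        # Greedy acceptance: keep the candidate only if it is at least as good.
        if score >= current_score:
            current, current_score = candidate, score
    return current, current_score


if __name__ == "__main__":
    # Toy term vectors: two documents lean toward term 0, two toward term 1.
    docs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    labels, score = beta_hill_climbing(docs, k=2)
    print(labels, round(score, 3))
```

Setting `beta=0` reduces the sketch to plain hill climbing, which is the baseline the abstract compares against; larger values make the search more random.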


Deep Multi-view Document Clustering with Enhanced Semantic Embedding

Information Sciences ◽

10.1016/j.ins.2021.02.027 ◽

2021 ◽

Author(s):

Ruina Bai ◽

Ruizhang Huang ◽

Yanping Chen ◽

Yongbin Qin

Keyword(s):

Document Clustering ◽

Semantic Embedding


Automatic trend detection: Time-biased document clustering

Knowledge-Based Systems ◽

10.1016/j.knosys.2021.106907 ◽

2021 ◽

pp. 106907

Author(s):

Sahar Behpour ◽

Mohammadmahdi Mohammadi ◽

Mark V. Albert ◽

Zinat S. Alam ◽

Lingling Wang ◽

...

Keyword(s):

Document Clustering ◽

Trend Detection ◽

Detection Time


Similarity Measure Approaches Applied in Text Document Clustering for Information Retrieval

2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC) ◽

10.1109/pdgc50313.2020.9315851 ◽

2020 ◽

Author(s):

Naveen Kumar ◽

Sanjay Kumar Yadav ◽

Divakar Singh Yadav

Keyword(s):

Information Retrieval ◽

Similarity Measure ◽

Document Clustering ◽

Text Document


Unsupervised neural networks for automatic Arabic text summarization using document clustering and topic modeling

Expert Systems with Applications ◽

10.1016/j.eswa.2021.114652 ◽

2021 ◽

Vol 172 ◽

pp. 114652

Author(s):

Nabil Alami ◽

Mohammed Meknassi ◽

Noureddine En-nahnahi ◽

Yassine El Adlouni ◽

Ouafae Ammor

Keyword(s):

Neural Networks ◽

Topic Modeling ◽

Document Clustering ◽

Text Summarization ◽

Arabic Text ◽

Arabic Text Summarization ◽

Unsupervised Neural Networks


Natural language processing methods for knowledge management—Applying document clustering for fast search and grouping of engineering documents

Concurrent Engineering ◽

10.1177/1063293x20982973 ◽

2021 ◽

pp. 1063293X2098297

Author(s):

Ivar Örn Arnarsson ◽

Otto Frost ◽

Emil Gustavsson ◽

Mats Jirstrand ◽

Johan Malmqvist

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Knowledge ◽

Clustering Algorithms ◽

Document Clustering ◽

Unstructured Data ◽

Free Text ◽

Engineering Change ◽

Engineering Documents

Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research shows that product developers find current IT systems unable to accurately retrieve relevant documents containing unstructured data. In this research, we demonstrate a method that uses Natural Language Processing and document clustering algorithms to find structurally or contextually related documents in databases of Engineering Change Request documents. The aim is to radically decrease the time needed to search for related engineering documents, organize search results, and create labeled clusters from these documents using Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied found relevant document clusters for the queries tested.
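The abstract does not specify which Natural Language Processing or clustering algorithms were used. As a rough illustration of the retrieval step only, the sketch below assumes a common baseline: TF-IDF term weighting (with a smoothed IDF) and cosine similarity to rank free-text documents against a query. The function names and the sample Engineering Change Request texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return sparse TF-IDF vectors (term -> weight) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so terms present in every document still get a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: count * idf[t] for t, count in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_related(query: str, docs: List[str]) -> List[Tuple[int, float]]:
    """Rank documents by TF-IDF cosine similarity to the query (most related first)."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the corpus are ignored.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    requests = [  # hypothetical change-request texts
        "Brake pedal bracket cracked during fatigue test",
        "Update wiring harness routing near battery",
        "Bracket crack found on brake pedal after durability test",
    ]
    for idx, score in rank_related("brake bracket crack", requests):
        print(idx, round(score, 3))
```

The same TF-IDF vectors could be fed to a clustering algorithm to group related requests; exact string matching is a limitation here (e.g. "crack" and "cracked" are different terms), which stemming or embeddings would address.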
