Efficient sparse spherical k-means for document clustering

Author(s):  
Johannes Knittel ◽  
Steffen Koch ◽  
Thomas Ertl
Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: As the volume of text documents on the Internet grows, managing this information becomes increasingly difficult. Text clustering is a common optimization problem in which a large collection of text documents is partitioned into coherent clusters of similar documents. Aims: This paper presents a novel local search technique, β-hill climbing, for text document clustering; the technique is modeled so that similar documents are assigned to the same cluster. Methods: The β parameter is the primary innovation of the β-hill climbing technique. It was introduced to balance local and global search. Local search methods such as k-medoid and k-means have been applied successfully to text document clustering. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics, taken from the Laboratory of Computational Intelligence (LABIC). The results showed that the proposed β-hill climbing outperformed the original hill climbing technique on the text clustering problem. Conclusion: Adding the β operator to hill climbing improves text clustering performance.
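The abstract does not spell out the operators or the objective function, so the following is only a minimal sketch of how a β-hill climbing loop for clustering might look. It assumes a neighborhood step that moves one document to a random cluster, a β step that re-draws each label at random with probability β, and an objective equal to the total cosine similarity between each document and its cluster centroid. The function names, the toy vectors, and the default parameter values are hypothetical and not taken from the paper.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fitness(docs, labels, k):
    """Assumed objective: sum of cosine similarities between each document
    and the (unnormalized) centroid of its cluster."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    for doc, c in zip(docs, labels):
        for j, x in enumerate(doc):
            sums[c][j] += x
    # Cosine is scale invariant, so the vector sum stands in for the centroid.
    return sum(cosine(doc, sums[c]) for doc, c in zip(docs, labels))


def beta_hill_climbing(docs, k, beta=0.05, iterations=2000, seed=0):
    """Sketch of beta-hill climbing over cluster labels (assumed operators)."""
    rng = random.Random(seed)
    n = len(docs)
    best = [rng.randrange(k) for _ in range(n)]
    best_fit = fitness(docs, best, k)
    for _ in range(iterations):
        cand = best[:]
        # Neighborhood step (local search): move one random document.
        cand[rng.randrange(n)] = rng.randrange(k)
        # Beta step (global search): re-draw each label with probability beta.
        for i in range(n):
            if rng.random() < beta:
                cand[i] = rng.randrange(k)
        cand_fit = fitness(docs, cand, k)
        # Keep the candidate only if it is at least as good.
        if cand_fit >= best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit


# Toy term-weight vectors (hypothetical data, two visibly distinct groups).
docs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
        [0.0, 1.0], [0.1, 0.9], [0.05, 1.0]]
labels, score = beta_hill_climbing(docs, k=2)
print(labels, round(score, 3))
```

In this sketch, setting `beta=0` disables the random re-draw and leaves only the single-document move, which corresponds to plain hill climbing; larger values of `beta` trade local refinement for broader exploration.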


Author(s):  
Ruina Bai ◽  
Ruizhang Huang ◽  
Yanping Chen ◽  
Yongbin Qin

2021 ◽  
pp. 106907
Author(s):  
Sahar Behpour ◽  
Mohammadmahdi Mohammadi ◽  
Mark V. Albert ◽  
Zinat S. Alam ◽  
Lingling Wang ◽  
...  

2021 ◽  
Vol 172 ◽  
pp. 114652
Author(s):  
Nabil Alami ◽  
Mohammed Meknassi ◽  
Noureddine En-nahnahi ◽  
Yassine El Adlouni ◽  
Ouafae Ammor

2021 ◽  
pp. 1063293X2098297
Author(s):  
Ivar Örn Arnarsson ◽  
Otto Frost ◽  
Emil Gustavsson ◽  
Mats Jirstrand ◽  
Johan Malmqvist

Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research indicates that product developers find that current IT systems cannot accurately retrieve relevant documents containing unstructured data. In this research, we demonstrate a method that uses Natural Language Processing and document clustering algorithms to find structurally or contextually related documents in databases of Engineering Change Request documents. The aim is to radically decrease the time needed to search for related engineering documents, organize search results, and create labeled clusters from these documents. A domain knowledge expert at the case company evaluated the results and confirmed that the applied algorithms found relevant document clusters for the queries tested.


2020 ◽  
Vol 10 (1) ◽  
pp. 1-33
Author(s):  
Ehsan Sherkat ◽  
Evangelos E. Milios ◽  
Rosane Minghim
