An Improved Clustering Method for Text Documents Using Neutrosophic Logic

Author(s):  
Nadeem Akhtar ◽  
Mohammad Naved Qureshi ◽  
Mohd Vasim Ahamad
2016 ◽  
Vol 36 (4) ◽  
pp. 197-203
Author(s):  
Wesam AbdulKarem Hamood ◽  
◽  
Mohammad Naved Qureshi

Author(s):  
Rizal Tjut Adek ◽  
Rozzy Kesuma Dinata ◽  
Ananda Ditha

The rapid progress in the field of information technology, especially the internet, has given birth to a lot of information. The ease of publishing an article on a website causes an explosion of news pages which will certainly confuse readers. The diversity and the increasing number of news articles make it increasingly difficult for internet users to find news and large piles of news data on online newspaper sites in Aceh. The grouping of text documents is needed to classify news in online newspapers in Aceh based on the content contained in news articles. In this study, the process of grouping online news in Aceh was tried using the Agglomerative Hierarchical Clustering method. News is grouped with a Bottom-Up design strategy that starts with placing each object as a cluster then combined into a larger cluster based on the similarity of keywords in each news, then the cluster results are compared and put into each news category. The research design was carried out in a structured manner using data flow diagrams in forming the research framework. The study was conducted by taking online news text data on 10 online news websites in Aceh from July 2016 to March 2017 with 1000 randomly generated documents. The process of crawling news data is done using a php script which will only take text files from the news on the website. News grouping is done based on religion, politics, law, sports, tourism, education, culture, economy and technology. The results of the grouping performance of the Agglomerative Hierarchical Clustering method in this study have an average accuracy of 89.84%.


2018 ◽  
Vol 7 (3) ◽  
pp. 213-224
Author(s):  
Rafał Woźniak ◽  
Piotr Ożdżyński ◽  
Danuta Zakrzewska

The development of Internet resulted in an increasing number of online text re-positories. In many cases, documents are assigned to more than one class and automatic multi-label classification needs to be used. When the number of labels exceeds the number of the documents, effective label space dimension reduction may signifi-cantly improve classification accuracy, what is a major priority in the medical field. In the paper, we propose document clustering for label selection. We use semi-clustering method, by considering graph representation, where documents are represented by vertices and edge weights are calculated according to their mutual similarity. Assigning documents to semi-clusters helps in reducing number of labels, further used in multilabel classification process. The performance of the method is examined by experiments conducted on real medical datasets.


2017 ◽  
Vol 79 (5) ◽  
Author(s):  
Athraa Jasim Mohammed ◽  
Yuhanis Yusof ◽  
Husniza Husni

Text clustering is one of the text mining tasks that is employed in search engines. Discovering the optimal number of clusters for a dataset or repository is a challenging problem. Various clustering algorithms have been reported in the literature but most of them rely on a pre-defined value of the k clusters. In this study, a variant of Firefly algorithm, termed as FireflyClust, is proposed to automatically cluster text documents in a hierarchical manner. The proposed clustering method operates based on five phases: data pre-processing, clustering, item re-location, cluster selection and cluster refinement. Experiments are undertaken based on different selections of threshold value. Results on the TREC collection named TR11, TR12, TR23 and TR45, showed that the FireflyClust is a better approach than the Bisect K-means, hybrid Bisect K-means and Practical General Stochastic Clustering Method. Such a result would enlighten the directions in developing a better information retrieval engine for this dynamic and fast growing big data era.


Sign in / Sign up

Export Citation Format

Share Document