Document Categorization Using Graph Structuring

Author(s):  
Sandipan Sarma ◽  
Punyajoy Saha ◽  
Jaya Sil
Algorithms ◽  
2016 ◽  
Vol 9 (2) ◽  
pp. 27 ◽  
Author(s):  
Abdullah Ayedh ◽  
Guanzheng TAN ◽  
Khaled Alwesabi ◽  
Hamdi Rajeh

2006 ◽  
Vol 42 (2) ◽  
pp. 387-406 ◽  
Author(s):  
Sun Lee Bang ◽  
Jae Dong Yang ◽  
Hyung Jeong Yang

Author(s):  
Vimuktha E. Salis ◽  
Ranjana S. Chakrasali ◽  
Chowdaiah Pathanjali

Author(s):  
Jan Žižka ◽  
František Dařena

The automated categorization of unstructured textual documents according to their semantic content plays an important role, particularly given the ever-growing volume of such data originating from the Internet. Given a sufficient number of labeled examples, a suitable supervised machine-learning classifier can be trained. When no labels are available, an unsupervised learning method can be applied; however, the missing label information often leads to worse classification results. This chapter demonstrates a semi-supervised learning method in which a small set of manually labeled examples improves the categorization process in comparison with clustering, with results comparable to those of supervised learning. For illustration, a real-world dataset from the Internet is used as the input to supervised, unsupervised, and semi-supervised learning. Results are shown for different numbers of starting labeled samples used as “seeds” to automatically label the remaining unlabeled items.
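The seed-based labeling idea can be illustrated with a minimal self-training sketch. This is not the chapter's method, which the abstract does not specify in detail. It assumes bag-of-words vectors, cosine similarity, and a nearest-centroid rule, and the names `vectorize`, `cosine`, `centroids`, and `self_train` are hypothetical helpers introduced only for this example.

```python
from collections import Counter
import math


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labeled):
    """Sum the vectors of each class into one centroid per label."""
    cents = {}
    for vec, label in labeled:
        cents.setdefault(label, Counter()).update(vec)
    return cents


def self_train(seeds, unlabeled):
    """Label `unlabeled` texts one at a time, most confident first.

    seeds: list of (text, label) pairs labeled by hand (assumed input).
    unlabeled: list of distinct texts without labels.
    Returns a dict mapping each unlabeled text to its assigned label.
    """
    labeled = [(vectorize(t), y) for t, y in seeds]
    pending = {t: vectorize(t) for t in unlabeled}
    assigned = {}
    while pending:
        cents = centroids(labeled)
        # Find the (document, label) pair with the highest similarity.
        best_text, best_label, best_sim = None, None, -1.0
        for text, vec in pending.items():
            for label, cent in cents.items():
                sim = cosine(vec, cent)
                if sim > best_sim:
                    best_text, best_label, best_sim = text, label, sim
        # Treat the newly labeled document as training data from now on.
        assigned[best_text] = best_label
        labeled.append((pending.pop(best_text), best_label))
    return assigned


seeds = [
    ("great phone fast battery", "pos"),
    ("terrible service rude staff", "neg"),
]
unlabeled = [
    "fast delivery great price",
    "rude and terrible support",
    "battery life great",
]
labels = self_train(seeds, unlabeled)
```

With more seeds per class the initial centroids are better, which is one way to read the abstract's comparison across different numbers of starting labeled samples.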


2014 ◽  
Vol 4 (3) ◽  
pp. 1-13
Author(s):  
Khadoudja Ghanem

In this paper, the authors propose a semantic approach to document categorization. The idea is to create a semantic index (a representative term vector) for each category by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (global LSA) is applied to a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in Information Retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on a popular dataset, the 20 Newsgroups corpus. The results show the effectiveness of the method compared with the classic KNN and SVM classifiers as well as with methods presented in the literature. Experimental results show that the new method achieves high precision and recall and that classification accuracy is significantly improved.
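The global LSA step, retrieving the class most similar to a query from a term-class matrix, might look roughly like the sketch below. It is a simplification under stated assumptions. The per-category local LSA and clustering are replaced by plain summed term counts, the query is folded in with the standard LSA formula q_k = qᵀ U_k Σ_k⁻¹, and the function names `build_term_class_matrix` and `classify` are hypothetical. It uses NumPy's `np.linalg.svd`.

```python
import numpy as np


def build_term_class_matrix(docs_by_class):
    """Build a term-by-class count matrix (a stand-in for the semantic index)."""
    vocab = sorted({w for docs in docs_by_class.values()
                    for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    classes = sorted(docs_by_class)
    A = np.zeros((len(vocab), len(classes)))
    for j, c in enumerate(classes):
        for d in docs_by_class[c]:
            for w in d.lower().split():
                A[index[w], j] += 1.0
    return A, index, classes


def classify(query, A, index, classes, k=2):
    """Return the class whose LSA representation is closest to the query."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep at most k dimensions and drop numerically zero singular values.
    k = min(k, int(np.sum(s > 1e-12)))
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T
    # Term-count vector of the query; out-of-vocabulary words are ignored.
    q = np.zeros(A.shape[0])
    for w in query.lower().split():
        if w in index:
            q[index[w]] += 1.0
    # Fold the query into the latent space, then compare by cosine similarity.
    q_k = (q @ U_k) / s_k
    denom = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k) + 1e-12
    sims = (V_k @ q_k) / denom
    return classes[int(np.argmax(sims))]


docs_by_class = {
    "sport": ["football match goal", "team wins match"],
    "tech": ["new cpu chip", "software update"],
}
A, index, classes = build_term_class_matrix(docs_by_class)
predicted = classify("goal match team", A, index, classes)
```

A query with no known words produces all-zero similarities, so this sketch falls back to the first class in sorted order. A real system would need an explicit policy for that case.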

