An Intelligent framework for E-Recruitment System Based on Text Categorization and Semantic Analysis

Author(s):  
Razkeen Shaikh ◽  
Nikita Phulkar ◽  
Harsha Bhute ◽  
Sana Kauser Shaikh ◽  
Prajakta Bhapkar
2014 ◽  
Vol 4 (3) ◽  
pp. 1-13
Author(s):  
Khadoudja Ghanem

In this paper the authors propose a semantic approach to document categorization. The idea is to create for each category a semantic index (representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (Global LSA) is adopted on a term-Class matrix in order to retrieve the class which is the most similar to the query (document to classify) in the same way where the LSA is used to retrieve documents which are the most similar to a query in Information Retrieval. The proposed system is evaluated on a popular dataset which is 20 Newsgroup corpus. Obtained results show the effectiveness of the method compared with those obtained with the classic KNN and SVM classifiers as well as with methods presented in the literature. Experimental results show that the new method has high precision and recall rates and classification accuracy is significantly improved.


2009 ◽  
Vol 18 (02) ◽  
pp. 239-272 ◽  
Author(s):  
SUJEEVAN ASEERVATHAM

Kernels are widely used in Natural Language Processing as similarity measures within inner-product based learning methods like the Support Vector Machine. The Vector Space Model (VSM) is extensively used for the spatial representation of the documents. However, it is purely a statistical representation. In this paper, we present a Concept Vector Space Model (CVSM) representation which uses linguistic prior knowledge to capture the meanings of the documents. We also propose a linear kernel and a latent kernel for this space. The linear kernel takes advantage of the linguistic concepts whereas the latent kernel combines statistical and linguistic concepts. Indeed, the latter kernel uses latent concepts extracted by the Latent Semantic Analysis (LSA) in the CVSM. The kernels were evaluated on a text categorization task in the biomedical domain. The Ohsumed corpus, well known for being difficult to categorize, was used. The results have shown that the CVSM improves performance compared to the VSM.


2011 ◽  
Vol 32 (3) ◽  
pp. 441-448 ◽  
Author(s):  
Zhixing Li ◽  
Zhongyang Xiong ◽  
Yufang Zhang ◽  
Chunyong Liu ◽  
Kuan Li

2009 ◽  
Vol 34 ◽  
pp. 443-498 ◽  
Author(s):  
E. Gabrilovich ◽  
S. Markovitch

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.


Author(s):  
Khadoudja Ghanem

In this paper the authors propose a semantic approach to document categorization. The idea is to create for each category a semantic index (representative term vector) by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (Global LSA) is adopted on a term-Class matrix in order to retrieve the class which is the most similar to the query (document to classify) in the same way where the LSA is used to retrieve documents which are the most similar to a query in Information Retrieval. The proposed system is evaluated on a popular dataset which is 20 Newsgroup corpus. Obtained results show the effectiveness of the method compared with those obtained with the classic KNN and SVM classifiers as well as with methods presented in the literature. Experimental results show that the new method has high precision and recall rates and classification accuracy is significantly improved.


Sign in / Sign up

Export Citation Format

Share Document