Semantic Similarity Metric and its Application in Text Classification
Text classification is the task of assigning natural language documents to predefined categories based on their content. The main concern of this paper is to improve the accuracy of a text classification system by combining an improved CHI method with a semantic similarity metric. First, we use an improved CHI method to select features from the raw feature set in order to reduce its dimensionality. Second, we calculate the semantic distance between the text feature vector and the category feature vector to determine the category of the document. Finally, we carry out a series of experiments comparing our method with other methods using the F1-measure. Experimental results show that the new method makes an important improvement in all categories.
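To make the two-step pipeline concrete, the following is a minimal sketch under stated assumptions, not the method proposed in the paper. It uses the standard chi-square (CHI) statistic rather than the improved variant, and plain cosine similarity as a stand-in for the semantic similarity metric, since the abstract specifies neither. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, category):
    """Standard CHI score of `term` for `category` (assumed baseline, not the improved variant)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # in category, contains term
        elif has_term:
            b += 1  # other category, contains term
        elif in_cat:
            c += 1  # in category, lacks term
        else:
            d += 1  # other category, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest CHI score over any category."""
    vocab = sorted({t for doc in docs for t in doc})
    cats = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def vectorize(tokens, features):
    """Term-frequency vector restricted to the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


def cosine(u, v):
    """Cosine similarity (assumed stand-in for the semantic similarity metric)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def category_vectors(docs, labels, features):
    """Sum the document vectors of each category into one category vector."""
    sums = {}
    for tokens, label in zip(docs, labels):
        acc = sums.setdefault(label, [0] * len(features))
        for i, x in enumerate(vectorize(tokens, features)):
            acc[i] += x
    return sums


def classify(tokens, cat_vecs, features):
    """Assign the category whose vector is most similar to the document vector."""
    vec = vectorize(tokens, features)
    return max(sorted(cat_vecs), key=lambda c: cosine(vec, cat_vecs[c]))


# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

features = select_features(docs, labels, k=4)
cat_vecs = category_vectors(docs, labels, features)
print(features)
print(classify(["goal", "team", "win"], cat_vecs, features))
```

In this sketch the feature selection step shrinks the vocabulary before any vectors are built, so the similarity computation only runs over the retained dimensions. A semantic metric would additionally let related but non-identical terms contribute to the score, which cosine similarity over raw term counts does not do.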