Support Vector Machines for Text Categorization in Chinese Question Classification

Author(s):  
Xu-dong Lin ◽  
Hong Peng ◽  
Bo Liu
Author(s):  
Cecilio Angulo ◽  
Luis Gonzalez-Abril

Support Vector Machines (SVMs) are learning machines, originally designed for binary classification problems, that implement the well-known Structural Risk Minimization (SRM) inductive principle to obtain good generalization from a limited number of learning patterns (Vapnik, 1998). The optimization criterion for these machines is to maximize the margin between two classes, i.e. the distance between two parallel hyperplanes that separate the vectors of the two classes: the larger the margin separating the classes, the smaller the VC dimension of the learning machine, which theoretically ensures good generalization performance (Vapnik, 1998), as has been demonstrated in a number of real applications (Cristianini, 2000). The formulation admits the kernel trick, which increases the capacity of these algorithms: learning is performed not directly in the original data space but in a new space called the feature space. For this reason, the algorithm is one of the most representative of the so-called Kernel Machines (KMs). The main theory was originally developed in the sixties and seventies by V. Vapnik and A. Chervonenkis (Vapnik et al., 1963, Vapnik et al., 1971, Vapnik, 1995, Vapnik, 1998) on the basis of a separable binary classification problem; however, widespread use of these learning algorithms did not take place until the nineties (Boser et al., 1992). SVMs have been used extensively in all kinds of learning problems, mainly classification, but also regression (Schölkopf et al., 2004) and clustering (Ben-Hur et al., 2001). The fields of Optical Character Recognition (Cortes et al., 1995) and Text Categorization (Sebastiani, 2002) were the most important initial applications of SVMs.
With the extended application of new kernels, novel applications have emerged in the field of Bioinformatics; specifically, many works deal with the classification of gene expression data (Microarray Gene Expression) (Brown et al., 1997) and with detecting structures between proteins and their relationship with DNA chains (Jaakkola et al., 2000). Other applications include image identification, voice recognition, time series prediction, etc. A more extensive list of applications can be found in (Guyon, 2006).
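The margin and kernel-trick ideas in the abstract above can be illustrated with a small sketch. It is not taken from the chapter: the toy points, the hyperplane `(w, b)`, and the helper names are assumptions chosen for illustration. It uses the standard canonical form in which the two margin hyperplanes are w·x + b = +1 and w·x + b = -1, so the margin width is 2/||w||, and it checks that a degree-2 polynomial kernel equals a dot product in an explicit feature space.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))

def satisfies_margin(w, b, points, labels):
    """True if every point lies on or beyond its own class's margin hyperplane."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in zip(points, labels))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the original space."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit feature map for poly2_kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

# Hypothetical, linearly separable toy data (not from the source).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# A separating hyperplane picked by hand for this toy set (an assumption).
w, b = (0.5, 0.5), -1.0

if __name__ == "__main__":
    print("constraints hold:", satisfies_margin(w, b, points, labels))
    print("margin width:", margin_width(w))
    x, z = (1.0, 2.0), (3.0, 4.0)
    print("kernel value:", poly2_kernel(x, z))
    print("feature-space dot:", dot(poly2_features(x), poly2_features(z)))
```

The last two printed values agree up to floating-point rounding, which is the point of the kernel trick: the feature-space inner product is obtained without ever constructing the feature vectors.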


2010 ◽  
pp. 1778-1787
Author(s):  
Dion Hoe-Lian Goh ◽  
Khasfariyati Razikin ◽  
Alton Y.K. Chua ◽  
Chei Sian Lee ◽  
Schubert Foo

Social tagging is the process by which users assign freely selected terms to resources and share them with other users. This approach enables users to annotate or describe resources, and also allows them to locate new resources through the collective intelligence of other users. Social tagging offers a new avenue for resource discovery compared with taxonomies and subject directories created by experts. This chapter investigates the effectiveness of tags as resource descriptors, using text categorization via support vector machines (SVM). Two text categorization experiments were conducted, using tags and Web pages from del.icio.us. The first experiment used terms as its features, while the second used both terms and tags as its feature set. The experiments yielded macroaveraged precision, recall, and F-measure scores of 52.66%, 54.86%, and 52.05%, respectively. In terms of microaveraged values, the experiments obtained 64.76% for precision, 54.40% for recall, and 59.14% for F-measure. The results suggest that the tags were not always reliable indicators of the resource contents. At the same time, the terms-only experiment produced better results than the experiment with both terms and tags. Implications of our work and opportunities for future work are also discussed.
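The difference between the macroaveraged and microaveraged scores reported above can be sketched as follows. This is an illustration only: the category names and counts are hypothetical and do not come from the chapter, and the macroaveraged F-measure is assumed here to be the mean of the per-category F-measures, which is one common convention; the chapter may define it differently.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def macro_average(counts):
    """Score each category separately, then average: every category weighs the same."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

def micro_average(counts):
    """Pool the counts over all categories, then score once: every decision weighs the same."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)

# Hypothetical per-category (tp, fp, fn) counts: one large, one small category.
counts = {"news": (90, 10, 10), "blogs": (1, 1, 3)}

if __name__ == "__main__":
    print("macro P/R/F:", macro_average(counts))
    print("micro P/R/F:", micro_average(counts))
```

With counts like these, the microaverage is dominated by the large category while the macroaverage is pulled down by the small, poorly classified one. A gap between the two averages therefore often points to uneven performance across categories of different sizes.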



