Support Vector Machines for Text Categorization in Chinese Question Classification

Support Vector Machines (SVMs) are learning machines, originally designed for binary classification problems, that implement the Structural Risk Minimization (SRM) inductive principle to obtain good generalization from a limited number of training patterns (Vapnik, 1998). Their optimization criterion is to maximize the margin between the two classes, i.e. the distance between two parallel hyperplanes that separate the vectors of each class: the larger the margin separating the classes, the smaller the VC dimension of the learning machine, which in theory ensures good generalization performance (Vapnik, 1998) and has been demonstrated in a number of real applications (Cristianini, 2000). The formulation admits the kernel trick, which increases the capacity of these algorithms: learning is performed not in the original data space but in a new space called the feature space. For this reason, SVMs are among the most representative of the so-called Kernel Machines (KMs). The main theory was developed in the 1960s and 1970s by V. Vapnik and A. Chervonenkis (Vapnik et al., 1963; Vapnik et al., 1971; Vapnik, 1995; Vapnik, 1998) on the basis of a separable binary classification problem; however, these learning algorithms did not come into widespread use until the 1990s (Boser et al., 1992). SVMs have been used extensively in many kinds of learning problems, mainly classification, but also regression (Schölkopf et al., 2004) and clustering (Ben-Hur et al., 2001). Optical Character Recognition (Cortes et al., 1995) and Text Categorization (Sebastiani, 2002) were the most important initial applications of SVMs.
With the wider adoption of new kernels, novel applications have emerged in Bioinformatics; in particular, many works address the classification of gene expression data (Microarray Gene Expression) (Brown et al., 1997) and the detection of structures between proteins and their relationship with DNA chains (Jaakkola et al., 2000). Other applications include image identification, voice recognition, and time series prediction. A more extensive list of applications can be found in (Guyon, 2006).
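The margin and the kernel trick described in this abstract can be illustrated with a small sketch. It is not taken from the chapter: the function names, the sample vectors, and the choice of a homogeneous degree-2 polynomial kernel are assumptions made for illustration. It assumes the usual canonical form in which the two parallel hyperplanes are w·x + b = +1 and w·x + b = -1, so that the margin is 2/||w||, and it checks that a kernel evaluated in the original space equals an inner product in an explicit feature space.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(dot(w, w))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the input space."""
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map matching poly2_kernel (2-D inputs only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


# Hypothetical sample points, chosen only to make the arithmetic easy to follow.
x, z = (1.0, 2.0), (3.0, -1.0)

k_input = poly2_kernel(x, z)                              # no feature map needed
k_feature = dot(poly2_features(x), poly2_features(z))     # same value, explicit map

print(k_input, k_feature)
print(margin_width((3.0, 4.0)))  # ||w|| = 5, so the margin is 2/5
```

The kernel is never asked to build the feature vectors, which is why learning in a high-dimensional feature space can stay cheap; a smaller ||w|| corresponds to a wider margin.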

Least squares twin support vector machines for text categorization

2015 39th National Systems Conference (NSC) ◽ 10.1109/natsys.2015.7489094 ◽ 2015

Author(s): M. Arun Kumar ◽ M. Gopal

Keyword(s): Support Vector Machines ◽ Least Squares ◽ Text Categorization ◽ Support Vector ◽ Twin Support Vector Machines ◽ Vector Machines

A comprehensive comparative study on term weighting schemes for text categorization with support vector machines

Special interest tracks and posters of the 14th international conference on World Wide Web - WWW '05 ◽ 10.1145/1062745.1062854 ◽ 2005 ◽ Cited By ~ 48

Author(s): Man Lan ◽ Chew-Lim Tan ◽ Hwee-Boon Low ◽ Sam-Yuan Sung

Keyword(s): Support Vector Machines ◽ Comparative Study ◽ Text Categorization ◽ Support Vector ◽ Term Weighting ◽ Weighting Schemes ◽ Vector Machines

Virtual relevant documents in text categorization with support vector machines

Information Processing & Management ◽ 10.1016/j.ipm.2006.08.010 ◽ 2007 ◽ Vol 43 (4) ◽ pp. 902-913 ◽ Cited By ~ 9

Author(s): Kyung-Soon Lee ◽ Kyo Kageura

Keyword(s): Support Vector Machines ◽ Text Categorization ◽ Support Vector ◽ Vector Machines

On the Effectiveness of Social Tagging for Resource Discovery

Social Computing ◽ 10.4018/978-1-60566-984-7.ch116 ◽ 2010 ◽ pp. 1778-1787

Author(s): Dion Hoe-Lian Goh ◽ Khasfariyati Razikin ◽ Alton Y.K. Chua ◽ Chei Sian Lee ◽ Schubert Foo

Keyword(s): Support Vector Machines ◽ Text Categorization ◽ Collective Intelligence ◽ Resource Discovery ◽ Social Tagging ◽ Support Vector ◽ Web Pages ◽ Vector Machines ◽ Future Work ◽ F Measure

Social tagging is the process by which users assign freely selected terms to resources and share them with other users. This approach enables users to annotate or describe resources, and also allows them to locate new resources through the collective intelligence of other users. Social tagging offers a new avenue for resource discovery compared with taxonomies and subject directories created by experts. This chapter investigates the effectiveness of tags as resource descriptors, using text categorization via support vector machines (SVM). Two text categorization experiments were conducted, using tags and Web pages from del.icio.us. The first experiment used only terms as features, while the second used both terms and tags as part of its feature set. The experiments yielded macroaveraged precision, recall, and F-measure scores of 52.66%, 54.86%, and 52.05%, respectively. In terms of microaveraged values, the experiments obtained 64.76% for precision, 54.40% for recall, and 59.14% for F-measure. The results suggest that tags were not always reliable indicators of resource content. In addition, the terms-only experiment produced better results than the experiment with both terms and tags. Implications of the work and opportunities for future work are also discussed.
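The abstract reports both macroaveraged and microaveraged scores; the difference between the two can be sketched as follows. The per-category counts below are made up for illustration and are not the chapter's data, and the function names are hypothetical. The sketch also assumes one common convention for the macroaveraged F-measure (the mean of per-category F-measures); the abstract does not say which convention the authors used.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-measure from true/false positive and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def macro_average(counts):
    """Score each category separately, then take the unweighted mean."""
    scores = [prf(*c) for c in counts]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


def micro_average(counts):
    """Pool the counts over all categories, then score once."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return prf(tp, fp, fn)


# Made-up (tp, fp, fn) counts: one large, easy category and one small, hard one.
counts = [(90, 10, 10), (1, 4, 9)]

macro = macro_average(counts)
micro = micro_average(counts)
print("macro P/R/F:", macro)
print("micro P/R/F:", micro)
```

Macroaveraging gives every category equal weight, so small and difficult categories pull the score down; microaveraging is dominated by the categories with the most documents. That difference is one reason the two sets of figures in an evaluation can diverge.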

On the Effectiveness of Social Tagging for Resource Discovery

Handbook of Research on Digital Libraries ◽ 10.4018/978-1-59904-879-6.ch025 ◽ 2009 ◽ pp. 251-260

Author(s): Dion Hoe-Lian Goh ◽ Khasfariyati Razikin ◽ Alton Y.K. Chua ◽ Chei Sian Lee ◽ Schubert Foo

Keyword(s): Support Vector Machines ◽ Text Categorization ◽ Collective Intelligence ◽ Resource Discovery ◽ Social Tagging ◽ Support Vector ◽ Web Pages ◽ Vector Machines ◽ Future Work ◽ F Measure

Social tagging is the process by which users assign freely selected terms to resources and share them with other users. This approach enables users to annotate or describe resources, and also allows them to locate new resources through the collective intelligence of other users. Social tagging offers a new avenue for resource discovery compared with taxonomies and subject directories created by experts. This chapter investigates the effectiveness of tags as resource descriptors, using text categorization via support vector machines (SVM). Two text categorization experiments were conducted, using tags and Web pages from del.icio.us. The first experiment used only terms as features, while the second used both terms and tags as part of its feature set. The experiments yielded macroaveraged precision, recall, and F-measure scores of 52.66%, 54.86%, and 52.05%, respectively. In terms of microaveraged values, the experiments obtained 64.76% for precision, 54.40% for recall, and 59.14% for F-measure. The results suggest that tags were not always reliable indicators of resource content. In addition, the terms-only experiment produced better results than the experiment with both terms and tags. Implications of the work and opportunities for future work are also discussed.
