Text Classification Using Ensemble Of Non-Linear Support Vector Machines

With the advent of the digital era, billions of documents are generated every day that need to be managed, processed, and classified. An enormous volume of text data is available on the World Wide Web and from other sources. The first step in managing this mammoth volume of data is classifying the available documents into the right categories. Supervised machine learning approaches try to solve the problem of document classification, but working on large data sets with heterogeneous classes is a big challenge. Automatic tagging and classification of text documents is useful because of its many potential applications, such as classifying emails as spam or non-spam and news articles as political, entertainment, stock market, sports news, etc. The paper proposes a novel approach for classifying text into known classes using an ensemble of refined Support Vector Machines. The advantage of the proposed technique is that it can considerably reduce the size of the training data by adopting dimensionality reduction as a pre-training step. The proposed technique has been applied to three benchmark data sets, namely the CMU Dataset, the 20 Newsgroups Dataset, and the Classic Dataset. Experimental results show that the proposed approach is more accurate and efficient than other state-of-the-art methods.
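The general shape of such a pipeline (dimensionality reduction followed by an ensemble of non-linear SVMs) can be sketched as below. This is a minimal illustration and not the paper's method: the abstract does not say how the SVMs are "refined" or combined, so the use of TF-IDF features, truncated SVD, an RBF kernel, bagging, and the toy corpus are all assumptions made here with scikit-learn.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus; not taken from any of the benchmark data sets.
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the goalkeeper",
    "a tennis player won the final set",
    "stocks fell as the market closed",
    "investors sold shares after the report",
    "the bank raised interest rates",
    "the market rallied on strong earnings",
]
labels = ["sports"] * 4 + ["finance"] * 4

model = make_pipeline(
    TfidfVectorizer(),                             # text -> sparse term weights
    TruncatedSVD(n_components=4, random_state=0),  # assumed reduction step
    BaggingClassifier(                             # assumed ensembling scheme
        SVC(kernel="rbf", gamma="scale"),          # non-linear (RBF) SVM
        n_estimators=5,
        max_samples=0.75,   # each SVM sees 6 of the 8 documents ...
        bootstrap=False,    # ... drawn without replacement, so both classes appear
        random_state=0,
    ),
)
model.fit(docs, labels)

predictions = model.predict(
    ["the goalkeeper saved a goal", "shares rose in the market"]
)
print(list(predictions))
```

Each SVM in the ensemble is trained on the reduced representation rather than on the full term space, which is the kind of saving the abstract attributes to the pre-training reduction step.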

2011 ◽  
Vol 383-390 ◽  
pp. 925-930
Author(s):  
Chun Cheng Zhang ◽  
Xiang Guang Chen ◽  
Yuan Qing Xu

To improve the forecasting accuracy of indoor thermal comfort, the basic principles of the fuzzy c-means clustering algorithm (FCM) and support vector machines (SVM) are analyzed. An SVM forecasting method based on FCM data preprocessing is proposed in this paper. With the proposed method, large data sets can be divided into multiple mixed groups, and each group is represented by a single regression model. Support vector machines based on fuzzy c-means clustering (FCM+SVM) and a BP neural network based on fuzzy c-means clustering (FCM+BPNN) are each applied to forecast the PMV (predicted mean vote) index. The experimental results demonstrate that the FCM+SVM method has better forecasting accuracy than the FCM+BPNN method.
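The cluster-then-regress idea can be sketched as follows. This is a hedged illustration, not the authors' implementation: the standard FCM updates are written out in NumPy, while the hard assignment by largest membership, the RBF kernel, the helper names (`fuzzy_c_means`, `fit_fcm_svr`, `predict_fcm_svr`), and the synthetic data standing in for PMV measurements are all assumptions.

```python
import numpy as np
from sklearn.svm import SVR


def fuzzy_c_means(X, n_clusters, m=2.0, n_iters=100, seed=0):
    """Standard fuzzy c-means; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), n_clusters))
    u /= u.sum(axis=1, keepdims=True)        # memberships of a sample sum to 1
    for _ in range(n_iters):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)          # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u


def fit_fcm_svr(X, y, n_clusters=2):
    """Fit one SVR per cluster (assumed: hard assignment by max membership)."""
    centers, u = fuzzy_c_means(X, n_clusters)
    hard = u.argmax(axis=1)
    models = {
        k: SVR(kernel="rbf").fit(X[hard == k], y[hard == k])
        for k in range(n_clusters)
        if np.any(hard == k)                 # skip clusters that ended up empty
    }
    return centers, models


def predict_fcm_svr(centers, models, X):
    """Route each sample to the model of its nearest cluster center."""
    keys = np.array(sorted(models))
    dist = np.linalg.norm(X[:, None, :] - centers[keys][None, :, :], axis=2)
    nearest = keys[dist.argmin(axis=1)]
    y_hat = np.empty(len(X))
    for k in keys:
        mask = nearest == k
        if mask.any():
            y_hat[mask] = models[int(k)].predict(X[mask])
    return y_hat


# Synthetic stand-in data (two well-separated groups); not PMV measurements.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(5.0, 0.5, (40, 2))])
y = X.sum(axis=1) + rng.normal(0.0, 0.1, 80)

centers, models = fit_fcm_svr(X, y, n_clusters=2)
y_hat = predict_fcm_svr(centers, models, X[:5])
print(y_hat)
```

Because each regressor is fitted only on its own group, every SVR sees a fraction of the full data set.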


Author(s):  
RONAN COLLOBERT ◽  
YOSHUA BENGIO ◽  
SAMY BENGIO

A challenge for statistical learning is to deal with large data sets, e.g. in data mining. The training time of ordinary Support Vector Machines is at least quadratic, which raises a serious research challenge if we want to deal with data sets of millions of examples. We propose a "hard parallelizable mixture" methodology which yields significantly reduced training time through modularization and parallelization: the training data is iteratively partitioned by a "gater" model in such a way that it becomes easy to learn an "expert" model separately in each region of the partition. A probabilistic extension and the use of a set of generative models allow representing the gater so that all pieces of the model are locally trained. For SVMs, time complexity appears empirically to locally grow linearly with the number of examples, while generalization performance can be enhanced. For the probabilistic version of the algorithm, the iterative algorithm provably goes down in a cost function that is an upper bound on the negative log-likelihood.
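A rough calculation shows why partitioning can turn quadratic training cost into roughly linear growth. It rests on assumptions that are not claims from the abstract: that training one SVM on n examples costs exactly c n^2, that the gater's own training cost is ignored, and that each expert receives an equal share of the data. The symbols c, K, and s are introduced here for illustration only.

```latex
% Assumption: one SVM trained on n examples costs c n^2.
T_{\text{single}}(n) = c\,n^{2}

% K experts, each trained on a disjoint subset of n/K examples:
T_{\text{experts}}(n, K) = K \cdot c\left(\frac{n}{K}\right)^{2} = \frac{c\,n^{2}}{K}

% Hold the subset size s = n/K fixed as n grows, so K = n/s:
T_{\text{experts}}(n) = \frac{n}{s} \cdot c\,s^{2} = c\,s\,n
```

Under these assumptions the cost of one pass over the experts grows linearly in n, and if the K experts are trained in parallel the wall-clock time of that pass is about c s^2, independent of n.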


Author(s):  
Marianne Maktabi ◽  
Hannes Köhler ◽  
Magarita Ivanova ◽  
Thomas Neumuth ◽  
Nada Rayes ◽  
...  
