Text Classification Based on Nonlinear Dimensionality Reduction Techniques and Support Vector Machines

Author(s): Lukui Shi, Jun Zhang, Enhai Liu, Pilian He
2015, Vol. 2015, pp. 1-8

Author(s): Oliver Kramer

Cascade support vector machines have been introduced as an extension of classic support vector machines that allows fast training on large data sets. In this work, we combine cascade support vector machines with dimensionality-reduction-based preprocessing. The cascade principle enables fast learning by dividing the training set into subsets and merging the results of each cascade level based on its support vectors. Combining the cascade with dimensionality reduction as preprocessing yields a significant speedup, often without loss of classifier accuracy, while the high-dimensional pendants of the low-dimensional support vectors are carried into each new cascade level. We analyze and compare various instantiations of dimensionality reduction preprocessing for cascade SVMs with principal component analysis, locally linear embedding, and isometric mapping. The experimental analysis on artificial and real-world benchmark problems covers cascade-specific parameters such as intermediate training set sizes and dimensionalities.
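
To make the cascade principle concrete, below is a minimal sketch of a two-level cascade SVM with PCA preprocessing, assuming scikit-learn and NumPy; the sample size, subset count, and PCA target dimensionality are illustrative choices rather than values from the paper, and for brevity the second level trains in the reduced space instead of on the high-dimensional pendants of the support vectors.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

# Preprocessing: project the data to a low-dimensional space before training.
pca = PCA(n_components=10)
X_low = pca.fit_transform(X)

# Level 1: split the training set into subsets and train one SVM per subset.
n_subsets = 4
rng = np.random.default_rng(0)
subsets = np.array_split(rng.permutation(len(X_low)), n_subsets)

sv_indices = []
for idx in subsets:
    svm = SVC(kernel="rbf").fit(X_low[idx], y[idx])
    # Keep only this subset's support vectors for the next cascade level.
    sv_indices.extend(idx[svm.support_])

# Level 2: train on the union of the support vectors from level 1.
final_svm = SVC(kernel="rbf").fit(X_low[sv_indices], y[sv_indices])
print("level-2 training set size:", len(sv_indices))

The speedup comes from each level training on far fewer points than the full set; repeating the split-and-merge step yields deeper cascades.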


Cutting-edge techniques have added considerable value to Artificial Intelligence (AI) and Machine Learning (ML), which are rapidly attracting interest across numerous areas of research. Clustering and dimensionality reduction techniques are among the most widely used methods in machine learning today. Fundamentally, clustering techniques such as K-means and hierarchical clustering are used to assign data points to the appropriate groups in a cluster structure. Clustering can be applied in recommendation systems, in the analysis of users on social media platforms, and to categorize patients with particular diseases into specific age groups, among other uses. Dimensionality reduction methods such as Principal Component Analysis and Linear Discriminant Analysis resemble clustering in some respects, but they reduce the size of the data before the clusters are plotted. In this paper, a comparative and predictive analysis is performed on three datasets from the UCI machine learning benchmark, namely IRIS, Wine, and Seed, using four distinct techniques. The class prediction analysis of each dataset is carried out through a Flask app. The main aim is to form a good clustering pattern for each dataset under the given techniques. The experimental analysis measures the accuracy of the resulting clusters using different machine learning classifiers, namely Logistic Regression, K-nearest neighbors, Support Vector Machine, Gaussian Naïve Bayes, Decision Tree Classifier, and Random Forest Classifier. Cohen's kappa is used as an additional accuracy indicator to compare the obtained classification results. It is observed that K-means and hierarchical clustering provide a better clustering pattern for the input datasets than the dimensionality reduction techniques. The clustering structure is well formed under all techniques, and the KNN classifier achieves improved accuracy across all techniques and datasets.
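
As a rough illustration of the evaluation pipeline described above, the following sketch, assuming scikit-learn, clusters the Iris data with K-means and scores a KNN classifier with both plain accuracy and Cohen's kappa; the cluster count and neighbor count are illustrative, not values from the paper.

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Form the clustering pattern (Iris has three classes, hence k=3).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Score how well a classifier recovers the cluster assignments.
X_train, X_test, c_train, c_test = train_test_split(
    X, cluster_labels, test_size=0.3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, c_train)
pred = knn.predict(X_test)

print("accuracy:   ", accuracy_score(c_test, pred))
print("Cohen kappa:", cohen_kappa_score(c_test, pred))

The same loop can be repeated with the other clustering or reduction techniques and classifiers to reproduce a comparison of this kind.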

