Support Vector Machines and Kernel Functions for Text Processing

This work presents kernel functions that can be used with the Support Vector Machine (SVM) learning algorithm to solve the automatic text classification task. First, the Vector Space Model for text processing is presented. According to this model, text is seen as a set of vectors in a high-dimensional space; extensions and alternative models are then derived, and some preprocessing procedures are discussed. The SVM learning algorithm, widely employed for text classification, is outlined: its decision procedure is obtained as the solution of an optimization problem. The “kernel trick”, which allows the algorithm to be applied in non-linearly separable cases, is presented, along with some kernel functions currently used in text applications. Finally, some text classification experiments employing the SVM classifier are conducted to illustrate some text preprocessing techniques and the presented kernel functions.
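As a rough illustration of the two ideas named in this abstract, the sketch below builds TF-IDF vectors for a toy corpus and checks that a degree-2 polynomial kernel computed in the input space equals an inner product in an explicit, much larger feature space. The corpus, the smoothed idf formula, and all function names are assumptions made for this example, not taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    """Return one L2-normalized sparse TF-IDF vector (dict: term -> weight) per document."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        # Assumed idf variant: log(N / df) + 1, so terms present in every document keep weight.
        vec = {t: c * (math.log(n_docs / doc_freq[t]) + 1.0) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def dot(u, v):
    """Inner product of two sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def quadratic_kernel(u, v):
    """Homogeneous polynomial kernel of degree 2, computed in the input space."""
    return dot(u, v) ** 2

def quadratic_feature_map(u):
    """Explicit feature map for quadratic_kernel: one feature per ordered term pair."""
    return {(s, t): ws * wt for s, ws in u.items() for t, wt in u.items()}

# Toy corpus (hypothetical), tokenized on whitespace.
docs = [d.split() for d in (
    "the kernel trick maps text",
    "the kernel maps vectors",
    "football match tonight",
)]
v0, v1, v2 = tfidf_vectors(docs)

implicit = quadratic_kernel(v0, v1)                                   # never builds pair features
explicit = dot(quadratic_feature_map(v0), quadratic_feature_map(v1))  # builds them explicitly
print(implicit, explicit)
```

The two printed values agree up to rounding, which is the point of the kernel trick: the SVM only needs kernel values, so the pairwise feature space never has to be materialized.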

A Kernel-Based Approach for Biomedical Named Entity Recognition

The Scientific World JOURNAL ◽

10.1155/2013/950796 ◽

2013 ◽

Vol 2013 ◽

pp. 1-7 ◽

Cited By ~ 8

Author(s):

Rakesh Patra ◽

Sujan Kumar Saha

Keyword(s):

Kernel Function ◽

Text Processing ◽

Named Entity Recognition ◽

Kernel Functions ◽

Entity Recognition ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Named Entity ◽

Tree Kernel

The support vector machine (SVM) is one of the popular machine learning techniques used in various text processing tasks, including named entity recognition (NER). The performance of the SVM classifier largely depends on the appropriateness of the kernel function. In the last few years a number of task-specific kernel functions have been proposed and used in various text processing tasks, for example, the string kernel, graph kernel, and tree kernel. So far very few efforts have been devoted to the development of NER-specific kernels. In the literature we found that the tree kernel has been used in the NER task only for entity boundary detection or reannotation. The conventional tree kernel is unable to execute the complete NER task on its own. In this paper we propose a kernel function, motivated by the tree kernel, which is able to perform the complete NER task. To examine the effectiveness of the proposed kernel, we applied it to the openly available JNLPBA 2004 data. Our kernel executes the complete NER task and achieves reasonable accuracy.
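For orientation, here is a minimal sketch of a conventional tree kernel in the subset-tree style (the Collins and Duffy recursion), which counts the tree fragments two parse trees share. It is not the NER-specific kernel proposed in the paper, whose definition the abstract does not give. The nested-tuple tree encoding, the `decay` parameter, and the example trees are assumptions for illustration.

```python
def production(node):
    """Production at a node: its label plus the kind and label of each direct child."""
    children = tuple(("word", c) if isinstance(c, str) else ("node", c[0]) for c in node[1:])
    return (node[0], children)

def internal_nodes(tree):
    """All non-leaf nodes of a tree encoded as (label, child, ...); leaves are plain strings."""
    if isinstance(tree, str):
        return []
    found = [tree]
    for child in tree[1:]:
        found.extend(internal_nodes(child))
    return found

def common_subtrees(n1, n2, decay=1.0):
    """Weighted count of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Equal productions guarantee c1 and c2 are both words or both internal nodes.
        if not isinstance(c1, str):
            score *= 1.0 + common_subtrees(c1, c2, decay)
    return score

def tree_kernel(t1, t2, decay=1.0):
    """Sum the fragment counts over every pair of internal nodes."""
    return sum(common_subtrees(a, b, decay)
               for a in internal_nodes(t1) for b in internal_nodes(t2))

# Two hypothetical noun-phrase parses that differ only in the noun.
t1 = ("NP", ("DT", "the"), ("NN", "protein"))
t2 = ("NP", ("DT", "the"), ("NN", "gene"))
print(tree_kernel(t1, t2))  # shared fragments: [DT the], [NP DT NN], [NP [DT the] NN]
```

A kernel of this kind compares whole trees, which is consistent with the abstract's remark that it has been used for boundary detection or reannotation rather than for the complete NER task.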

Automated News Classification using N-gram Model and Key Features of Nepali Language

SCITECH Nepal ◽

10.3126/scitech.v13i1.23504 ◽

2018 ◽

Vol 13 (1) ◽

pp. 64-69

Author(s):

Dinesh Dangol ◽

Rupesh Dahi Shrestha ◽

Arun Timalsina

Keyword(s):

Text Classification ◽

English Language ◽

Promising Result ◽

Text Processing ◽

Automatic Text Classification ◽

Key Features ◽

Research Experiment ◽

N Gram ◽

Automated Text Processing ◽

Automatic Text

With the increasing trend of publishing news online on websites, automatic text processing becomes more and more important. Automatic text classification has been a focus of many researchers in different languages for decades. There is a huge body of research on the features of the English language and their use in automated text processing. This research implements Nepali language key features for automatic text classification of Nepali news. In particular, studying the impact of Nepali language based features, which are extremely different from those of English, is more challenging because of the higher level of complexity to be resolved. The research experiment using the vector space model, the n-gram model and key-feature-based processing specific to the Nepali language shows promising results compared to the bag-of-words model for the task of automated Nepali news classification.
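The difference between a bag-of-words representation and an n-gram representation can be sketched as follows. The whitespace tokenizer and the function names are assumptions; the Nepali-specific key features used in the paper are not described in the abstract and are not modeled here.

```python
from collections import Counter

def word_ngrams(tokens, n):
    """All contiguous word n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_features(text, max_n=2):
    """Count every word n-gram for n = 1..max_n; max_n=1 is a plain bag of words."""
    tokens = text.split()  # assumed tokenizer: whitespace only
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(tokens, n))
    return features

bag_of_words = ngram_features("a b a", max_n=1)
with_bigrams = ngram_features("a b a", max_n=2)
print(bag_of_words)
print(with_bigrams)
```

With `max_n=1` the word order is lost, whereas the bigram features keep local order ("a b" versus "b a"), which is the extra information an n-gram model offers a classifier.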

Algorithms of Hard c-Means Clustering Using Kernel Functions in Support Vector Machines

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2003.p0019 ◽

2003 ◽

Vol 7 (1) ◽

pp. 19-24 ◽

Cited By ~ 17

Author(s):

Sadaaki Miyamoto ◽

Youichi Nakayama ◽

Keyword(s):

Support Vector Machines ◽

Dimensional Space ◽

Iterative Algorithms ◽

Kernel Functions ◽

High Dimensional ◽

Support Vector ◽

High Dimensional Space ◽

Numerical Examples ◽

Vector Machines

We discuss hard c-means clustering using a mapping into a high-dimensional space considered within the theory of support vector machines. Two types of iterative algorithms are developed. Effectiveness of the proposed method is shown by numerical examples.
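The abstract does not spell out its two iterative algorithms. The sketch below shows only the generic idea of running hard c-means in a kernel-induced space, where the squared distance to a cluster mean is computed from kernel values alone. The RBF kernel, its `gamma` value, the initial labels, and the data are assumptions for illustration.

```python
import math

def rbf(x, y, gamma=0.1):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_hard_c_means(points, initial_labels, kernel=rbf, max_iter=100):
    """Reassign each point to the nearest cluster mean in feature space until labels stop changing."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]  # Gram matrix
    labels = list(initial_labels)
    for _ in range(max_iter):
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(i)
        # ||phi(x_i) - m_k||^2 = K_ii - (2/|C_k|) sum_j K_ij + (1/|C_k|^2) sum_{j,l} K_jl
        within = {k: sum(K[j][l] for j in m for l in m) / len(m) ** 2
                  for k, m in members.items()}

        def sq_dist(i, k):
            m = members[k]
            return K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[k]

        new_labels = [min(sorted(members), key=lambda k: sq_dist(i, k)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Two well-separated groups with a deliberately poor starting assignment.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = kernel_hard_c_means(points, initial_labels=[0, 1, 0, 1, 0, 1])
print(result)
```

The cluster means are never formed explicitly; only the Gram matrix is needed, which is what makes the mapping into a high-dimensional space practical.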

Stroke Outcome Measurements From Electronic Medical Records: Cross-sectional Study on the Effectiveness of Neural and Nonneural Classifiers

JMIR Medical Informatics ◽

10.2196/29120 ◽

2021 ◽

Vol 9 (11) ◽

pp. e29120

Author(s):

Bruna Stella Zanotto ◽

Ana Paula Beck da Silva Etges ◽

Avner dal Bosco ◽

Eduardo Gabriel Cortes ◽

Renata Ruschel ◽

...

Keyword(s):

Machine Learning ◽

Electronic Medical Records ◽

Text Classification ◽

Medical Records ◽

State Of The Art ◽

Support Vector ◽

Data Set ◽

Automatic Text Classification ◽

Baseline Characteristics ◽

Automatic Text

Background: With the rapid adoption of electronic medical records (EMRs), there is an ever-increasing opportunity to collect data and extract knowledge from EMRs to support patient-centered stroke management. Objective: This study aims to compare the effectiveness of state-of-the-art automatic text classification methods in classifying data to support the prediction of clinical patient outcomes and the extraction of patient characteristics from EMRs. Methods: Our study addressed the computational problems of information extraction and automatic text classification. We identified essential tasks to be considered in an ischemic stroke value-based program. The 30 selected tasks were classified (manually labeled by specialists) according to the following value agenda: tier 1 (achieved health care status), tier 2 (recovery process), care related (clinical management and risk scores), and baseline characteristics. The analyzed data set was retrospectively extracted from the EMRs of patients with stroke from a private Brazilian hospital between 2018 and 2019. A total of 44,206 sentences from free-text medical records in Portuguese were used to train and develop 10 supervised computational machine learning methods, including state-of-the-art neural and nonneural methods, along with ontological rules. As an experimental protocol, we used a 5-fold cross-validation procedure repeated 6 times, along with subject-wise sampling. A heatmap was used to display comparative result analyses according to the best algorithmic effectiveness (F1 score), supported by statistical significance tests. A feature importance analysis was conducted to provide insights into the results. Results: The top-performing models were support vector machines trained with lexical and semantic textual features, showing the importance of dealing with noise in EMR textual representations. The support vector machine models produced statistically superior results in 71% (17/24) of tasks, with an F1 score >80% regarding care-related tasks (patient treatment location, fall risk, thrombolytic therapy, and pressure ulcer risk), the process of recovery (ability to feed orally or ambulate and communicate), health care status achieved (mortality), and baseline characteristics (diabetes, obesity, dyslipidemia, and smoking status). Neural methods were largely outperformed by more traditional nonneural methods, given the characteristics of the data set. Ontological rules were also effective in tasks such as baseline characteristics (alcoholism, atrial fibrillation, and coronary artery disease) and the Rankin scale. The complementarity in effectiveness among models suggests that a combination of models could enhance the results and cover more tasks in the future. Conclusions: Advances in information technology capacity are essential for scalability and agility in measuring health status outcomes. This study allowed us to measure effectiveness and identify opportunities for automating the classification of outcomes of specific tasks related to clinical conditions of stroke victims, and thus ultimately assess the possibility of proactively using these machine learning techniques in real-world situations.
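The evaluation protocol in this abstract (5-fold cross-validation repeated 6 times with subject-wise sampling, scored by F1) can be sketched as below. Reading "subject-wise" as "all sentences of one patient stay in the same fold" is an assumption, as are the function names, the round-robin fold assignment, and the toy subject identifiers.

```python
import random

def subject_wise_folds(subject_ids, n_folds=5, n_repeats=6, seed=0):
    """Yield (repeat, fold, train_idx, test_idx); each subject's samples share one fold."""
    rng = random.Random(seed)
    subjects = sorted(set(subject_ids))
    for repeat in range(n_repeats):
        shuffled = subjects[:]
        rng.shuffle(shuffled)
        fold_of = {s: i % n_folds for i, s in enumerate(shuffled)}  # round-robin assignment
        for fold in range(n_folds):
            test = [i for i, s in enumerate(subject_ids) if fold_of[s] == fold]
            train = [i for i, s in enumerate(subject_ids) if fold_of[s] != fold]
            yield repeat, fold, train, test

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2TP / (2TP + FP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# One subject id per sentence (hypothetical data: 7 patients, 10 sentences).
sentence_subjects = ["p1", "p1", "p2", "p3", "p3", "p4", "p5", "p5", "p6", "p7"]
splits = list(subject_wise_folds(sentence_subjects))
print(len(splits))  # 6 repeats x 5 folds
```

Splitting by subject rather than by sentence avoids training and testing on text from the same patient, which would otherwise inflate the measured F1.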

Improvised Admissible Kernel Function for Support Vector Machines in Banach Space for Multiclass Data

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v11i2.1173 ◽

2013 ◽

Vol 11 (2) ◽

pp. 2273-2278

Author(s):

Sangeetha Rajendran ◽

B. Kalpana

Keyword(s):

Banach Space ◽

Learning Theory ◽

Intelligent Systems ◽

Learning Algorithm ◽

Kernel Functions ◽

Learning Ability ◽

Support Vector ◽

Training Time ◽

Vector Machines ◽

Optimal Kernel

Classification based on supervised learning theory is one of the most significant tasks frequently accomplished by so-called Intelligent Systems. Contrary to the traditional classification techniques that are used to validate or contradict a predefined hypothesis, kernel based classifiers offer the possibility to frame new hypotheses using statistical learning theory (Sangeetha and Kalpana, 2010). The Support Vector Machine (SVM) is a standard kernel based learning algorithm that improves its learning ability through experience. It is a highly accurate, robust and optimal kernel based classification technique that is well suited to many real-time applications. In this paper, kernel functions related to Hilbert space and Banach space are explained. The experiments are carried out using benchmark multiclass datasets taken from the UCI Machine Learning Repository, and performance is compared using various metrics such as support vectors, support vector percentage, training time and accuracy.

English Article Style Recognition and Matching by Using Web Semantics

International Journal of Mobile Computing and Multimedia Communications ◽

10.4018/ijmcmc.293751 ◽

2022 ◽

Vol 13 (2) ◽

pp. 0-0

Keyword(s):

Support Vector Machine ◽

Text Classification ◽

Traditional Method ◽

Utilization Rate ◽

Support Vector ◽

Automatic Text Classification ◽

Internet Information ◽

The Face ◽

Massive Information ◽

Automatic Text

With the explosion of internet information, people feel helpless and find it difficult to choose in the face of massive information. However, the traditional method of organizing a huge set of original documents is not only time-consuming and laborious, but also far from ideal. Automatic text classification can liberate users from tedious document processing work, recognize and distinguish different document contents more conveniently, make a large number of complicated documents institutionalized and systematized, and greatly improve the utilization rate of information. This paper adopts a term-based model to extract features in web semantics to represent documents. The extracted web semantics features are used to train a reduced support vector machine. The experimental results show that the proposed method can correctly identify most of the writing styles.

IMPLEMENTATION OF SUPPORT VECTOR MACHINES FOR CLASSIFICATION OF CLINICAL DATASETS

International Journal of Information Acquisition ◽

10.1142/s0219878910002269 ◽

2010 ◽

Vol 07 (04) ◽

pp. 347-356

Author(s):

E. SIVASANKAR ◽

R. S. RAJESH

Keyword(s):

Feature Extraction ◽

Dimensional Space ◽

Principal Component ◽

Kernel Functions ◽

Support Vector ◽

Optimal Hyperplane ◽

Higher Dimensional ◽

Vector Machines ◽

Map Data

In this paper, Principal Component Analysis is used for feature extraction, and a statistical learning based Support Vector Machine is designed for functional classification of clinical data. Appendicitis data collected from BHEL Hospital, Trichy are used and classified into three classes. Feature extraction transforms the data in the high-dimensional space to a space of fewer dimensions. The classification is done by constructing an optimal hyperplane that separates the members from the nonmembers of the class. For linearly nonseparable data, kernel functions are used to map the data to a higher-dimensional space, where the optimal hyperplane is found. This paper works with different SVMs based on radial basis and polynomial kernels, and their performances are compared.

The effect of gamma value on support vector machine performance with different kernels

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i5.pp5497-5506 ◽

2020 ◽

Vol 10 (5) ◽

pp. 5497

Author(s):

Intisar Shadeed Al-Mejibli ◽

Jwan K. Alwan ◽

Dhafar Hamed Abd

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Learning Algorithm ◽

Kernel Functions ◽

Supervised Machine Learning ◽

Support Vector ◽

Svm Classifier ◽

Machine Performance ◽

Rbf Kernel ◽

The Impact

Currently, the support vector machine (SVM) is regarded as one of the supervised machine learning algorithms that provide analysis of data for classification and regression. This technique is implemented in many fields such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many other areas. The performance of SVM is affected by some parameters used in the training phase, and the parameter settings can have a profound impact on the resulting engine’s implementation. This paper investigates SVM performance as a function of the gamma parameter for the kernels used. It studies the impact of the gamma value on the efficiency of the SVM classifier using different kernels on datasets with various descriptions. The SVM classifier has been implemented in Python. The kernel functions investigated are polynomial, radial basis function (RBF) and sigmoid. The UC Irvine Machine Learning Repository is the source of all the datasets used. Generally, the results show an uneven effect of the three kernels on classification accuracy across the datasets used. Changing the gamma value, taking the dataset into consideration, influences the polynomial and sigmoid kernels, while the performance of the RBF kernel is more stable across different values of gamma, as its accuracy changes only slightly.
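To make concrete where gamma enters each of the three kernels studied, the sketch below uses the common LIBSVM-style parameterization. The abstract does not state which library or exact formulas the authors used, so the formulas, the `degree` and `coef0` defaults, and the sample points are assumptions.

```python
import math

def dot(u, v):
    """Inner product of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma):
    """exp(-gamma * ||u - v||^2): gamma sets how fast similarity decays with distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def polynomial_kernel(u, v, gamma, degree=3, coef0=0.0):
    """(gamma * <u, v> + coef0) ** degree: gamma scales the inner product before the power."""
    return (gamma * dot(u, v) + coef0) ** degree

def sigmoid_kernel(u, v, gamma, coef0=0.0):
    """tanh(gamma * <u, v> + coef0): gamma controls how quickly the kernel saturates."""
    return math.tanh(gamma * dot(u, v) + coef0)

u, v = (1.0, 2.0), (2.0, 0.0)
for gamma in (0.01, 0.1, 1.0, 10.0):
    print(gamma,
          rbf_kernel(u, v, gamma),
          polynomial_kernel(u, v, gamma),
          sigmoid_kernel(u, v, gamma))
```

The loop only shows how the raw kernel values respond to gamma for one pair of points; it says nothing about classification accuracy, which depends on the dataset and the other SVM parameters.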

A Portable, Wireless Photoplethysomography Sensor for Assessing Health of Arteriovenous Fistula Using Class-Weighted Support Vector Machine

Sensors ◽

10.3390/s18113854 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3854 ◽

Cited By ~ 1

Author(s):

Paul Chao ◽

Pei-Yu Chiang ◽

Yung-Hua Kao ◽

Tse-Yi Tu ◽

Chih-Yu Yang ◽

...

Keyword(s):

Arteriovenous Fistula ◽

Dynamic Range ◽

Digital Signal ◽

Kernel Functions ◽

Support Vector ◽

Dimensionless Number ◽

Svm Classifier ◽

Type Ii Error ◽

Vector Machines ◽

Signal Processing Algorithms

A portable, wireless photoplethysmography (PPG) sensor for assessing arteriovenous fistula (AVF) using class-weighted support vector machines (SVM) is presented in this study. Currently, in hospitals, AVFs are assessed by ultrasound Doppler machines, which are bulky, expensive, complicated to operate, and time-consuming. In this study, new PPG sensors were proposed and developed successfully to provide portable and inexpensive solutions for AVF assessments. To develop the sensor, first, by combining dimensionless number analysis and the optical Beer-Lambert law, five input features were derived for the SVM classifier. Next, to increase the signal-to-noise ratio (SNR) of PPG signals, the front-end readout circuitries were designed to fully use the dynamic range of the analog-to-digital converter (ADC) by controlling the circuitry gain and the light intensity of the light-emitting diode (LED). Digital signal processing algorithms were then proposed to check and fix signal anomalies. Finally, the class-weighted SVM classifiers employed five different kernel functions to assess AVF quality. The assessment results were provided to doctors for diagnosis and for determining the proper ensuing treatments. The experimental results showed that the proposed PPG sensors achieved an accuracy of 89.11% in assessing the health of AVF, with a type II error of only 9.59%.
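Two quantities from this abstract, per-class weights and the type II error, can be sketched as follows. The abstract does not say how the class weights were chosen or which class counts as positive, so the inverse-frequency weighting, the `positive` label, and the toy labels are all assumptions.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Assumed heuristic: weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}

def type_ii_error_rate(y_true, y_pred, positive=1):
    """Share of truly positive cases that the classifier missed (false negative rate)."""
    missed = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    actual_positive = sum(1 for t in y_true if t == positive)
    return missed / actual_positive if actual_positive else 0.0

weights = balanced_class_weights([0, 0, 0, 1])
miss_rate = type_ii_error_rate([1, 1, 1, 0], [1, 0, 1, 0])
print(weights, miss_rate)
```

In a class-weighted SVM such weights scale the misclassification penalty per class, so errors on the rarer class cost more during training.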
