Keyword Spotting from Online Chinese Handwritten Documents Using One-Versus-All Character Classification Model

In this paper, we propose a method for text-query-based keyword spotting from online Chinese handwritten documents using a character classification model. The similarity between the query word and the handwriting is obtained by combining the character classification scores. The classifier is trained with a one-versus-all strategy so that it gives high similarity to the target class and low scores to the others. Using character classification-based word similarity also helps overcome the out-of-vocabulary (OOV) problem. We use a character-synchronous dynamic search algorithm to efficiently spot the query word in a large database. The retrieval performance is further improved by using competing character confusion and writer-adaptive thresholds. Our experimental results on a large handwriting database, CASIA-OLHWDB, justify the superiority of one-versus-all trained classifiers and the benefits of confidence transformation, character confusion and adaptive thresholds. In particular, a one-versus-all trained prototype classifier performs as well as a linear support vector machine (SVM) classifier but requires much less storage for the index file. The experimental comparison with keyword spotting based on handwritten text recognition also demonstrates the effectiveness of the proposed method.
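To make the idea of combining character classification scores concrete, here is a minimal sketch. It assumes a fixed character segmentation and combines scores by simple averaging; both are simplifications chosen for illustration, since the abstract does not give the combination rule and the paper itself searches with a character-synchronous dynamic algorithm. The names `word_similarity`, `spot`, and the score tables are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input: one dict per segmented character candidate, mapping a
# character class to the one-versus-all classifier score for that candidate.
CharScores = Dict[str, float]


def word_similarity(query: str, window: List[CharScores]) -> float:
    """Average the per-character scores of the query over an aligned window.

    Averaging is an assumption made for illustration only.
    """
    if not query:
        raise ValueError("query must contain at least one character")
    # A class missing from a candidate's score table counts as 0.0.
    scores = [cand.get(ch, 0.0) for ch, cand in zip(query, window)]
    return sum(scores) / len(scores)


def spot(query: str, line: List[CharScores], threshold: float) -> List[Tuple[int, float]]:
    """Return (start index, similarity) for every window reaching the threshold."""
    hits = []
    for start in range(len(line) - len(query) + 1):
        sim = word_similarity(query, line[start:start + len(query)])
        if sim >= threshold:
            hits.append((start, sim))
    return hits


if __name__ == "__main__":
    line = [
        {"我": 0.9, "找": 0.3},
        {"中": 0.8, "申": 0.4},
        {"国": 0.7, "围": 0.5},
    ]
    print(spot("中国", line, threshold=0.6))
```

Because the similarity is built from per-character scores rather than from a word lexicon, any query string can be scored, which is one way to read the abstract's remark about the OOV problem. A fixed global threshold is used here; the writer-adaptive thresholds mentioned above are not modeled.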


A New SVM Kernel for Keyword Spotting Using Confidence Measures

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500104 ◽

2015 ◽

Vol 24 (03) ◽

pp. 1550010 ◽

Cited By ~ 1

Author(s):

Yassine Ben Ayed

Keyword(s):

Support Vector Machines ◽

Hidden Markov ◽

Support Vector ◽

SVM Classifier ◽

Keyword Spotting ◽

Confidence Measure ◽

Confidence Measures ◽

Vector Machines ◽

Harmonic Means ◽

Speech Recognizer

In this paper, we propose an alternative keyword spotting method relying on confidence measures and support vector machines. Confidence measures are computed from phone information provided by a hidden Markov model (HMM)-based speech recognizer. We use three kinds of techniques, i.e., arithmetic, geometric and harmonic means, to compute a confidence measure for each word. The acceptance/rejection decision for a word is based on the confidence vector processed by the SVM classifier, for which we propose a new Beta kernel. The performance of the proposed SVM classifier is compared with spotting methods based on some confidence means. Experimental results presented in this paper show that the proposed SVM classifier method improves the performance of the keyword spotting system.
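The three means named above can be sketched as follows. The phone-level scores and the function `confidence_vector` are hypothetical, and stacking the three means into one vector is an assumption about how the confidence vector might be formed; the abstract does not specify its contents, and the Beta kernel is not reproduced here.

```python
import math
from typing import List


def arithmetic_mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def geometric_mean(xs: List[float]) -> float:
    # Computed in the log domain; requires strictly positive inputs.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs: List[float]) -> float:
    # Requires strictly positive inputs.
    return len(xs) / sum(1.0 / x for x in xs)


def confidence_vector(phone_scores: List[float]) -> List[float]:
    """Hypothetical word-level vector built from phone-level confidences."""
    return [
        arithmetic_mean(phone_scores),
        geometric_mean(phone_scores),
        harmonic_mean(phone_scores),
    ]


if __name__ == "__main__":
    # Illustrative phone confidences for one hypothesized word.
    print(confidence_vector([0.2, 0.8]))
```

For positive inputs the arithmetic mean is never smaller than the geometric mean, which is never smaller than the harmonic mean, so the harmonic mean penalizes a single poorly matched phone the most.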


Data Mining Technology Application in False Text Information Recognition

Mobile Information Systems ◽

10.1155/2021/4206424 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Jie Wan ◽

Xue Cao ◽

Kun Yao ◽

Donghui Yang ◽

E. Peng ◽

...

Keyword(s):

Data Mining ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Characteristic Matrix ◽

Mining Technology ◽

Technology Application ◽

Text Information ◽

The Government ◽

Effect Of The Support

False information on the Internet is increasingly recognized as a serious harm to society. To recognize false text information, this paper proposes an effective method for mining text features in the field of false drug advertisements. Firstly, false and real drug advertisements were collected from official websites to build a database of false and real drug advertisements. Secondly, by performing feature extraction on the text of drug advertisements, this work built a characteristic matrix based on the effective features and assigned a positive or negative label to each feature vector of the matrix according to whether or not it came from a fake medical advertisement. Thirdly, this study trained and tested several different classifiers, selected the classification model with the best performance in identifying false drug advertisements, and found the key characteristics that determine the classification. Finally, the model with the best performance was used to predict new false drug advertisements collected from Sina Weibo. In identifying false drug advertisements, the support vector machine (SVM) classifier built on the feature set after feature selection performed best. The findings of this study can provide an effective method for the government to identify and combat false advertisements. This study also serves as a reference for using text data mining technology to identify and detect information fraud.


A Study of Supplier Selection Method Based on SVM for Weighting Expert Evaluation

Discrete Dynamics in Nature and Society ◽

10.1155/2021/8056209 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Li Zhao ◽

Wenjing Qi ◽

Meihong Zhu

Keyword(s):

Supplier Selection ◽

Strategic Decision ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Preference Order ◽

Expert Evaluation ◽

Learning To Learn ◽

Evaluation Data ◽

Evaluation Information

Choosing suppliers scientifically is an important part of the strategic decision-making management of enterprises. Expert evaluation is subjective and uncontrollable; sometimes evaluations are biased, which leads to controversial or unfair results in supplier selection. To tackle this problem, this paper proposes a novel method that employs machine learning to learn the credibility of each expert from historical data, which is then converted to weights in the evaluation process. We first use the Support Vector Machine (SVM) classifier to classify the historical evaluation data of experts and calculate the experts’ evaluation credibility, then determine the weights of the evaluation experts, and finally aggregate the weighted evaluation results to obtain a preference order of suppliers. The main contribution of this method is that it overcomes the shortcomings of multiple conversions and large loss of evaluation information, maintains the initial evaluation information to the maximum extent, and improves the credibility of evaluation results and the fairness and scientific rigor of supplier selection. The results show that it is feasible to classify the past evaluation data of the evaluation experts with the SVM classification model, and that the expert weights determined on the basis of the evaluation credibility of experts are adjustable.
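The last two steps of the pipeline (credibility to weights, then weighted aggregation into a preference order) might look like the sketch below. It assumes the credibility values have already been produced by some classifier and that weights are obtained by simple normalization; the abstract does not state the exact conversion, and all names and numbers are hypothetical.

```python
from typing import Dict, List


def expert_weights(credibility: Dict[str, float]) -> Dict[str, float]:
    """Normalize credibility values so that the weights sum to 1 (an assumption)."""
    total = sum(credibility.values())
    return {expert: c / total for expert, c in credibility.items()}


def rank_suppliers(scores: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> List[str]:
    """Order suppliers by weighted score; scores[expert][supplier] is a rating."""
    suppliers = list(next(iter(scores.values())).keys())
    totals = {
        s: sum(weights[e] * scores[e][s] for e in scores)
        for s in suppliers
    }
    return sorted(totals, key=lambda s: totals[s], reverse=True)


if __name__ == "__main__":
    # Hypothetical credibility values, e.g. derived from classified history.
    credibility = {"expert_a": 0.9, "expert_b": 0.3}
    scores = {
        "expert_a": {"S1": 8.0, "S2": 6.0},
        "expert_b": {"S1": 4.0, "S2": 9.0},
    }
    print(rank_suppliers(scores, expert_weights(credibility)))
```

With equal weights the same ratings would rank S2 first, so the example shows how down-weighting a less credible expert can change the preference order.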


Morphological Neuroimaging Biomarkers for Tinnitus: Evidence Obtained by Applying Machine Learning

Neural Plasticity ◽

10.1155/2019/1712342 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11

Author(s):

Yawen Liu ◽

Haijun Niu ◽

Jianming Zhu ◽

Pengfei Zhao ◽

Hongxia Yin ◽

...

Keyword(s):

Machine Learning ◽

Healthy Subjects ◽

Morphological Changes ◽

Gray Matter Volume ◽

Brain Regions ◽

Classification Model ◽

Middle Temporal Gyrus ◽

Support Vector ◽

SVM Classifier ◽

Neuroimaging Biomarkers

According to previous studies, many neuroanatomical alterations have been detected in patients with tinnitus. However, the results of these studies have been inconsistent. The objective of this study was to explore the cortical/subcortical morphological neuroimaging biomarkers that may characterize idiopathic tinnitus using machine learning methods. Forty-six patients with idiopathic tinnitus and fifty-six healthy subjects were included in this study. For each subject, the gray matter volume of 61 brain regions was extracted as an original feature pool. From this feature pool, a hybrid feature selection algorithm combining the F-score and sequential forward floating selection (SFFS) methods was performed to select features. Then, the selected features were used to train a support vector machine (SVM) model. The area under the curve (AUC) and accuracy were used to assess the performance of the classification model. As a result, a combination of 13 cortical/subcortical brain regions was found to have the highest classification accuracy for effectively differentiating patients with tinnitus from healthy subjects. These brain regions include the bilateral hypothalamus, right insula, bilateral superior temporal gyrus, left rostral middle frontal gyrus, bilateral inferior temporal gyrus, right inferior parietal lobule, right transverse temporal gyrus, right middle temporal gyrus, right cingulate gyrus, and left superior frontal gyrus. The accuracy in the training and test datasets was 80.49% and 80.00%, respectively, and the AUC was 0.8586. To the best of our knowledge, this is the first study to elucidate brain morphological changes in patients with tinnitus by applying an SVM classifier. This study provides validated cortical/subcortical morphological neuroimaging biomarkers to differentiate patients with tinnitus from healthy subjects and contributes to the understanding of neuroanatomical alterations in patients with tinnitus.
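As an illustration of the filter stage of such a hybrid selection, the sketch below ranks features by an F-score. It assumes the F-score is the two-class discriminative score commonly used for feature ranking (between-class separation of the means divided by the sum of within-class sample variances); the abstract does not give the formula, the SFFS stage is not shown, and the data are made up.

```python
from typing import List, Sequence


def f_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """F-score of one feature given its values in the two classes.

    Assumes at least two samples per class and non-zero within-class variance.
    """
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (
        sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
        + sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1)
    )
    return between / within


def rank_features(pos_rows: List[List[float]], neg_rows: List[List[float]]) -> List[int]:
    """Return feature indices sorted from most to least discriminative."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([r[j] for r in pos_rows], [r[j] for r in neg_rows])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Hypothetical regional volumes: rows are subjects, columns are features.
    patients = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0]]
    controls = [[3.0, 4.5], [3.2, 3.5], [2.8, 4.0]]
    print(rank_features(patients, controls))
```

In a hybrid scheme, a ranking like this would typically be used to prune or order the feature pool before a wrapper search such as SFFS evaluates candidate subsets with the classifier.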


Categorization of Common Pigmented Skin Lesions (CPSL) using Multi-Deep Features and Support Vector Machine

10.21203/rs.3.rs-136988/v1 ◽

2021 ◽

Author(s):

SANTI BEHERA ◽

PRABIRA SETHY

Keyword(s):

Support Vector Machine ◽

Skin Cancer ◽

Principal Component ◽

Skin Lesions ◽

Classification Model ◽

Healthcare Sector ◽

Support Vector ◽

SVM Classifier ◽

Deep Feature ◽

Pigmented Skin Lesions

The skin is the body's largest organ, weighing approximately 8 pounds in the average adult. It insulates us and shields our bodies from hazards. However, the skin is also vulnerable to damage and can lose its original appearance, turning brown, black, or blue, or combinations of those colors; such changes are known as pigmented skin lesions. These common pigmented skin lesions (CPSL) are the leading factor in skin cancer. In the healthcare sector, the categorization of CPSL remains a major problem because of inaccurate outputs, overfitting, and high computational costs. Hence, we propose a classification model based on multi-deep features and a support vector machine (SVM) for the classification of CPSL. The proposed system comprises two phases. In the first phase, the performance of 11 CNN models is evaluated in a deep feature extraction approach with SVM. In the second phase, the deep features of the three top-performing CNN models are concatenated and classified with SVM: 8192 and 12288 features are obtained by combining two and three networks, respectively, each contributing 4096 features. These features are also given to the SVM classifiers. The SVM results are also evaluated after applying principal component analysis (PCA) to the combined 8192 and 12288 features. The highest results are obtained with 12288 features. In the experiments, the combination of the deep features of AlexNet, VGG16, and VGG19 achieved the highest accuracy of 91.7% using the SVM classifier. The results show that the proposed method is a useful tool for CPSL classification.
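The feature counts quoted above follow directly from concatenation, as this small sketch shows. The zero-filled vectors stand in for real network activations, and the helper `concatenate_features` is hypothetical.

```python
from typing import List

FEATURES_PER_NETWORK = 4096  # per-network feature count stated in the abstract


def concatenate_features(*vectors: List[float]) -> List[float]:
    """Join per-network feature vectors end to end into one vector."""
    combined: List[float] = []
    for vector in vectors:
        combined.extend(vector)
    return combined


if __name__ == "__main__":
    # Placeholder activations; real values would come from the CNN layers.
    net_a = [0.0] * FEATURES_PER_NETWORK
    net_b = [0.0] * FEATURES_PER_NETWORK
    net_c = [0.0] * FEATURES_PER_NETWORK
    print(len(concatenate_features(net_a, net_b)))         # two networks
    print(len(concatenate_features(net_a, net_b, net_c)))  # three networks
```

Two networks give 2 × 4096 = 8192 features and three give 3 × 4096 = 12288, matching the sizes reported for the combined feature sets.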


Power Transformer Insulation Assessment Based on Oil-Paper Measurement Data Using SVM-Classifier

10.20944/preprints201806.0002.v1 ◽

2018 ◽

Author(s):

Suwarno ◽

Rahman A. Prasojo

Keyword(s):

Power Transformer ◽

Measurement Data ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Classification Analysis ◽

Dielectric Characteristics ◽

Paper Condition ◽

Paper Insulation ◽

Insulation Condition

The condition of oil-immersed paper insulation is a crucial aspect of diagnosing a power transformer's life condition. The measurement and testing database collected over the years has made it possible for researchers to apply classification analysis to in-service power transformers. This article presents a classification analysis of transformer oil-immersed paper insulation condition. The measurement data (dielectric characteristics, dissolved gas analysis, and furanic compounds) of 149 transformers with a primary voltage of 150 kV were gathered and analyzed. The algorithm used for developing the classification model is the Support Vector Machine (SVM). The model was trained and tested using different datasets. Different models were created and the best one chosen, resulting in 90.63% accuracy in predicting the oil-immersed paper insulation condition. The model was then applied to classify the oil-paper condition of 19 transformers for which furan data were not available. The classification results were combined, reviewed, and compared with conventional assessment methods and standards, confirming that the developed model can classify the current oil-paper condition based on dissolved gases and dielectric characteristics.


Multiclass patent document classification

Artificial Intelligence Research ◽

10.5430/air.v7n1p1 ◽

2017 ◽

Vol 7 (1) ◽

pp. 1 ◽

Cited By ~ 8

Author(s):

Chaitanya Anne ◽

Avdesh Mishra ◽

Md Tamjidul Hoque ◽

Shengru Tu

Keyword(s):

Text Classification ◽

Information Gain ◽

Document Classification ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

SVM Classifier ◽

Practical Reasons ◽

Patent Document ◽

Vast Number

Text classification is used in information extraction and retrieval from text, and it has been considered an important step in managing the vast and expanding number of records available in digital form. This article addresses the problem of classifying patent documents into fifteen different categories or classes, where some classes overlap with each other for practical reasons. For the development of the classification model using machine learning techniques, useful features have been extracted from the given documents. The features are used to classify patent documents as well as to generate useful tag-words. The overall objective of this work is to systematize NASA’s patent management by developing a set of automated tools that can assist NASA in managing and marketing its portfolio of intellectual properties (IP), and to enable easier discovery of relevant IP by users. We have identified an array of methods that can be applied, such as k-Nearest Neighbors (kNN), two variations of the Support Vector Machine (SVM) algorithm, and two tree-based classification algorithms: Random Forest and J48. The major research steps in this paper consist of filtering techniques for variable selection, information gain and feature correlation analysis, and training and testing potential models using effective classifiers. Further, the obstacles associated with the imbalanced data were mitigated by adding pseudo-synthetic data wherever appropriate, which resulted in a superior SVM classifier-based model.
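Information gain, one of the filtering criteria mentioned above, can be sketched for a single term as follows. This uses the standard entropy-based definition; the tiny document set, the class labels, and the binary "term present" feature are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Reduction in label entropy after splitting on the feature's values."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(feature):
        subset: List[Hashable] = [y for f, y in zip(feature, labels) if f == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


if __name__ == "__main__":
    labels = ["optics", "optics", "materials", "materials"]
    has_term_laser = [True, True, False, False]  # perfectly informative term
    has_term_method = [True, False, True, False]  # uninformative term
    print(information_gain(has_term_laser, labels))
    print(information_gain(has_term_method, labels))
```

A term that splits the documents exactly along class lines recovers the full label entropy, while a term spread evenly across classes yields zero gain and would be a candidate for removal.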


Emphasis Learning, Features Repetition in Width Instead of Length to Improve Classification Performance: Case Study—Alzheimer’s Disease Diagnosis

Sensors ◽

10.3390/s20030941 ◽

2020 ◽

Vol 20 (3) ◽

pp. 941

Author(s):

Hamid Akramifard ◽

MohammadAli Balafar ◽

SeyedNaser Razavi ◽

Abd Rahman Ramli

Keyword(s):

Alzheimer’s Disease ◽

Principal Component ◽

Classification Problem ◽

Disease Diagnosis ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Small Subset ◽

Alzheimer’s Disease Diagnosis

In the past decade, many studies have been conducted to advance computer-aided systems for Alzheimer’s disease (AD) diagnosis. Most recently developed systems have concentrated on extracting and combining features from MRI, PET, and CSF. For the most part, they have obtained very high performance. However, improving the performance of a classification problem is complicated, specifically when the model’s accuracy or other performance measurements are higher than 90%. In this study, a novel methodology is proposed to address this problem, specifically in Alzheimer’s disease diagnosis classification. This methodology is the first of its kind in the literature, based on the notion of replication in the feature space instead of the traditional sample space. Briefly, the main steps of the proposed method include extracting, embedding, and exploring the best subset of features. For feature extraction, we adopt VBM-SPM; for embedding features, a concatenation strategy is used on the features to ultimately create one feature vector for each subject. Principal component analysis is applied to extract new features, forming a low-dimensional compact space. A novel process is applied by replicating selected components, assessing the classification model, and repeating the replication until performance divergence or convergence. The proposed method aims to explore the most significant features and the highest-performing model at the same time, to classify normal subjects from AD and mild cognitive impairment (MCI) patients. In each epoch, a small subset of candidate features is assessed by a support vector machine (SVM) classifier. This repeating procedure is continued until the highest performance is achieved. Experimental results reveal the highest performance reported in the literature for this specific classification problem. We obtained a model with accuracies of 98.81%, 81.61%, and 81.40% for AD vs. normal control (NC), MCI vs. NC, and AD vs. MCI classification, respectively.
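One way to picture replication "in width" is to append copies of selected feature columns to every subject's vector, as in the sketch below. The function names and toy values are hypothetical, and the distance calculation is only an illustration of why repeated columns can influence a distance- or kernel-based classifier; it is not the authors' procedure.

```python
from typing import List, Sequence


def replicate_features(rows: Sequence[Sequence[float]],
                       selected: Sequence[int],
                       copies: int) -> List[List[float]]:
    """Append `copies` extra copies of the selected columns to each row."""
    return [list(row) + [row[j] for j in selected] * copies for row in rows]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    # Two subjects described by two (hypothetical) principal components.
    subjects = [[1.0, 0.0], [0.0, 2.0]]
    widened = replicate_features(subjects, selected=[0], copies=2)
    print(widened)
    print(squared_distance(*subjects), squared_distance(*widened))
```

Each extra copy of a column adds that column's squared difference to the distance once more, so replicating a component acts like giving it a larger weight while the number of samples stays unchanged.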


Healthy Fruits Image Label Categorization through Color Shape and Texture Features Based on Machine Learning Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7740.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 34-40

Keyword(s):

Machine Learning ◽

Feature Fusion ◽

Learning Algorithm ◽

Texture Features ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Paper Machine ◽

Textual Features ◽

Occurrence Matrix

The categorization of fruit according to visual quality has recently experienced tremendous growth in the field of agriculture and food products. Due to post-harvest losses during handling and processing, there is an increasing demand in the agro-industry for quality products, which requires accurate prediction of the fruit type. Various machine learning techniques have been successfully applied to classify fruit on a binary-class basis. In this paper, a machine learning technique is used to automate the categorization process and to improve the classification accuracy for different types of fruit through feature selection. To categorize images, domain-specific features such as color, shape and texture features are considered. Statistical color features are extracted from the image, a bounding-box feature is used for shape, and the gray-level co-occurrence matrix (GLCM) is used to extract the texture features of an image. These features are combined by feature fusion into a single feature vector. A support vector machine (SVM) classification model is trained using training-set features on the fruit360 dataset, which includes six fruit categories (classes) with two subcategories (sub-classes), forming a multiclass classification task. We present a one-vs-one coding design of error-correcting output codes (ECOC) applied to the SVM classifier; validation followed a fivefold cross-validation strategy. The results show that the texture features combined with color and shape features improved fruit classification accuracy.
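A gray-level co-occurrence matrix counts how often a pixel with level i has a neighbor with level j at a fixed offset. The sketch below builds a non-symmetric GLCM for one offset and derives the standard contrast statistic from it; the 3×3 image, the number of gray levels, and the choice of contrast as the texture feature are illustrative assumptions, not details from the paper.

```python
from typing import List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[int]]:
    """Count pairs (image[r][c], image[r + dy][c + dx]) for one pixel offset."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix: List[List[int]]) -> float:
    """Contrast: sum over (i, j) of (i - j)^2 times the normalized count."""
    total = sum(sum(row) for row in matrix)
    return sum(
        (i - j) ** 2 * count / total
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )


if __name__ == "__main__":
    # Tiny image quantized to 3 gray levels (0, 1, 2).
    image = [
        [0, 0, 1],
        [1, 2, 2],
        [0, 1, 2],
    ]
    m = glcm(image, levels=3)  # horizontal neighbor, distance 1
    print(m)
    print(contrast(m))
```

In practice several offsets and statistics (for example energy or homogeneity) are usually computed and concatenated with the color and shape features before training the classifier.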


Automatic Detection of Arrhythmias From An ECG Signal Using An Auto-Encoder And SVM Classifier

10.21203/rs.3.rs-981164/v1 ◽

2021 ◽

Author(s):

Manoj Kumar Ojha ◽

Sulochna Wadhwani ◽

Arun Kumar Wadhwani ◽

Anupam Shukla

Keyword(s):

Feature Learning ◽

Classification Model ◽

Support Vector ◽

SVM Classifier ◽

Premature Ventricular Contractions ◽

Bundle Branch Block ◽

ECG Signals ◽

Average Accuracy ◽

Different Types ◽

Care Systems

Millions of people worldwide are affected by arrhythmias. Arrhythmias are abnormalities in the functioning of the heart. Some arrhythmias are harmful to the heart and can cause sudden death. The electrocardiogram (ECG) is a significant tool in cardiology for the diagnosis of arrhythmic beats. Computer-aided diagnosis (CAD) systems have been proposed in several studies to automatically classify different types of arrhythmias from ECG signals. To improve the classification of arrhythmias, a new end-to-end feature learning and classification model has been developed. This work focuses on the implementation of a one-dimensional convolutional neural network (1D-CNN) model based on an auto-encoder convolution network (ACN) that learns the best ECG features from each heartbeat window. We then apply a Support Vector Machine (SVM) classifier to the auto-encoded features to detect the four different types of arrhythmic beats, including normal beats, using the MIT-BIH arrhythmia database. The arrhythmic beats are left bundle branch block (L), right bundle branch block (R), paced beats (P), and premature ventricular contractions (V). The statistical performance of the model is evaluated using tenfold cross-validation, yielding an overall accuracy of 98.84%, average accuracy of 99.53%, sensitivity of 98.24% and precision of 97.58%. This model presents better results than other state-of-the-art models. Therefore, this approach may also help in clinical heart care systems.
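The abstract reports both an overall and an average accuracy. The sketch below shows one common way the two can be computed from a multiclass confusion matrix: overall accuracy as the fraction of correctly classified beats, and average accuracy as the mean of per-class one-versus-rest accuracies. These definitions are an assumption, since the abstract does not state them, and the confusion matrix is made up.

```python
from typing import List


def overall_accuracy(cm: List[List[int]]) -> float:
    """Correct predictions divided by all predictions; cm[true][predicted]."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def average_accuracy(cm: List[List[int]]) -> float:
    """Mean over classes of (TP + TN) / total, treating each class one-vs-rest."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        per_class.append((tp + tn) / total)
    return sum(per_class) / k


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
    cm = [
        [5, 1, 0],
        [0, 4, 1],
        [0, 0, 9],
    ]
    print(overall_accuracy(cm), average_accuracy(cm))
```

Under these definitions, with K classes the average accuracy equals 1 − (2/K)(1 − overall accuracy), because every misclassified beat counts as an error for exactly two classes. Assuming five beat classes (normal plus the four listed types), an overall accuracy of 98.84% would correspond to roughly 99.54%, close to the reported 99.53%.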
