Sentiment polarity classification of tweets using an extended dictionary

To classify text by its sentiment polarity (positive or negative), we propose extending a corpus of 68,000 tweets with word definitions from the dictionary of the Real Academia Española de la Lengua (RAE). We considered 28,000 combinations of 6 Word2Vec and support vector machine parameters to evaluate how much including the RAE dictionary definitions improves classification performance. We found that this corpus extension significantly improves classification accuracy. We therefore conclude that including the RAE dictionary increases the semantic relations learned by Word2Vec, which allows better classification accuracy.

Classification of Rusty and Non-Rusty Images

International Journal of Natural Computing Research ◽

10.4018/ijncr.2020100101 ◽

2020 ◽

Vol 9 (4) ◽

pp. 1-17

Author(s):

Mridu Sahu ◽

Tushar Jani ◽

Maski Saijahnavi ◽

Amrit Kumar ◽

Upendra Chaurasiya ◽

...

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Data Augmentation ◽

Support Vector ◽

Suitable Model ◽

Online Databases ◽

Perlin Noise ◽

Combined Features ◽

Different Parts

Rust detection is necessary for the proper working and maintenance of machines for security purposes. Images are one suggested means of rust detection, since rust can be detected even in areas that a human cannot reach. However, there is a lack of online databases that provide a dataset large enough to identify the most suitable model for further use. This paper provides a data augmentation technique based on Perlin noise, and the generated images are then tested on standard features (i.e., statistical values and entropy, along with SIFT and SURF methods). Two widely used classifiers, naïve Bayes and support vector machine, are tested on the classification of rusty and non-rusty images. The support vector machine provides better classification accuracy, which also suggests that the combined features of statistics, SIFT, and SURF are able to differentiate the images. Hence, the approach can be further used to detect rust in different parts of machines.
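The statistical and entropy features mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which statistics were used, so the choice of mean, variance, and Shannon entropy of the grey-level histogram is an assumption, and the function name `image_features` is hypothetical.

```python
import math
from collections import Counter


def image_features(pixels):
    """Compute simple global features of a grayscale image.

    `pixels` is a 2D list of integer grey levels. The feature set
    (mean, variance, histogram entropy) is an illustrative assumption,
    not the feature set of the paper.
    """
    flat = [value for row in pixels for value in row]
    n = len(flat)
    mean = sum(flat) / n
    variance = sum((value - mean) ** 2 for value in flat) / n
    # Shannon entropy (in bits) of the grey-level histogram.
    histogram = Counter(flat)
    entropy = -sum((c / n) * math.log2(c / n) for c in histogram.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


if __name__ == "__main__":
    # A tiny 2x2 "image" with two equally frequent grey levels.
    print(image_features([[0, 0], [255, 255]]))
```

Feature vectors of this kind would typically be concatenated with local descriptors before being passed to a classifier.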

Feature Extraction and Classification of MRI Using Hybrid RBF Kernel and SVM

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit2176104 ◽

2021 ◽

pp. 418-426

Author(s):

Suhas S ◽

Dr. C. R. Venugopal

Keyword(s):

Support Vector Machine ◽

Image Retrieval ◽

Classification Accuracy ◽

Kernel Functions ◽

Polynomial Kernel ◽

Support Vector ◽

SVM Classifier ◽

MR Images ◽

RBF Kernel

This paper develops and presents an enhanced system for classifying MR images that combines kernels with a support vector machine, along with the design and development of a content-based image retrieval (CBIR) system. Content-based image retrieval is the process of finding relevant images in a large image database using visual queries. Medical imaging has led to the growth of large image collections. An Oriented Rician Noise Reduction Anisotropic Diffusion filter is used for image denoising. A modified hybrid Otsu algorithm is used for image segmentation. Texture features are extracted using the GLCM method. A genetic algorithm with joint entropy is adopted for feature selection. Classification is done by a support vector machine with various kernels, and the performance is validated. A classification accuracy of 98.83% is obtained using SVM with the GRBF kernel. Various features are extracted and used to classify MR images into five different categories. The performance of the MC-SVM classifier is compared across different kernel functions. From the analysis and from performance measures such as classification accuracy, it is inferred that brain and spinal cord MRI classification is better done using MC-SVM with the Gaussian RBF kernel function than with linear or polynomial kernel functions. The proposed system can provide high classification accuracy with a low error rate.
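The three kernel families compared in the abstract above have standard textbook definitions, sketched below. The parameter names and default values (`gamma`, `degree`, `coef0`) are assumptions for illustration; the abstract does not report the settings that were used.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the plain dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    u, v = [1.0, 0.0], [0.0, 1.0]
    print(linear_kernel(u, v), polynomial_kernel(u, v), rbf_kernel(u, v))
```

The RBF kernel depends only on the distance between the two feature vectors, so it equals 1 for identical inputs and decays toward 0 as they move apart.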

Spectral-Spatial Classification of Hyperspectral Image Based on Support Vector Machine

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2021010103 ◽

2021 ◽

Vol 16 (1) ◽

pp. 56-74

Author(s):

Weiwei Yang ◽

Haifeng Song

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Spatial Information ◽

Hyperspectral Image ◽

State Of The Art ◽

Support Vector ◽

SVM Classifier ◽

Spatial Classification ◽

Homogeneous Regions

Recent research has shown that integrating spatial information is a powerful tool for improving the classification accuracy of hyperspectral images (HSI). However, partitioning the homogeneous regions of an HSI remains a challenging task. This paper proposes a novel spectral-spatial classification method inspired by the support vector machine (SVM). The model consists of a spectral-spatial feature extraction channel (SSC) and an SVM classifier. The SSC extracts the spatial-spectral features of the HSI, and the SVM classifies the extracted features, so the model can automatically extract HSI features and classify them. Experiments are conducted on a benchmark HSI dataset (Indian Pines). The proposed method is found to yield more accurate classification results than state-of-the-art techniques.

AGNES-SMOTE: An Oversampling Algorithm Based on Hierarchical Clustering and Improved SMOTE

Scientific Programming ◽

10.1155/2020/8837357 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Xin Wang ◽

Yue Yang ◽

Mingsong Chen ◽

Qin Wang ◽

Qin Qin ◽

...

Keyword(s):

Support Vector Machine ◽

Hierarchical Clustering ◽

Classification Accuracy ◽

Classification Performance ◽

Support Vector ◽

Seed Sample ◽

Imbalanced Datasets ◽

Centroid Method ◽

SVM Classification ◽

Sampling Process

To address the low classification accuracy on imbalanced datasets, an oversampling algorithm, AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), based on hierarchical clustering and improved SMOTE, is proposed. Its key procedures are: hierarchically clustering the majority samples and the minority samples separately; dividing the minority subclusters on the basis of the obtained majority subclusters; selecting a “seed sample” based on the sampling weight and probability distribution of each minority subcluster; and restricting the generation of new samples to a certain area by the centroid method during sampling. The combination of AGNES-SMOTE and SVM (Support Vector Machine) is presented for classifying imbalanced datasets. Experiments on UCI datasets compare the performance of different algorithms from the literature. The experimental results indicate that AGNES-SMOTE excels at synthesizing new samples and improves SVM classification performance on imbalanced datasets.
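For orientation, the interpolation step of plain SMOTE, which AGNES-SMOTE builds on, can be sketched as follows. This is not the AGNES-SMOTE algorithm itself: the hierarchical clustering, sampling weights, and centroid restriction described in the abstract are omitted, and the function name `smote_oversample` and its parameters are hypothetical.

```python
import random


def smote_oversample(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic minority samples by plain SMOTE.

    Each synthetic point lies on the line segment between a randomly
    chosen seed sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_oversample(minority, 3):
        print(point)
```

Because every synthetic sample is a convex combination of two existing minority samples, it always falls inside the region spanned by the minority class.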

Phishing Website Classification using Least Square Twin Support Vector Machine

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a3905.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2063-2068

Keyword(s):

Support Vector Machine ◽

Cyber Security ◽

Classification Accuracy ◽

Personal Information ◽

Classification Problem ◽

Least Square ◽

Twin Support Vector Machine ◽

Support Vector ◽

Security Issue

Phishing is one of the luring techniques attackers use to abuse the personal details of clients. It is a serious cyber security issue that involves imitating a legitimate website to deceive online users and steal their personal information. Phishing can be viewed as a special type of classification problem in which the classifier is built from a substantial number of website features. The best features must be identified to improve the classifier's accuracy. This study highlights the important features of websites that are used to distinguish phishing websites from legitimate ones, and presents a Decision Tree Least Square Twin Support Vector Machine (DT-LST-SVM) scheme for classifying phishing websites. The UCI public domain benchmark website phishing dataset was used to run the experiments on the proposed classifier with different kernel functions and to calculate the classification accuracy of the classifiers. Computational results show that the DT-LST-SVM scheme yields better classification accuracy on the phishing website classification dataset.

Classification of P2P Traffic Based on a Heteromorphic Ensemble Learning Model

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.2693 ◽

2014 ◽

Vol 687-691 ◽

pp. 2693-2697

Author(s):

Li Ding ◽

Li Mao ◽

Xiao Feng Wang

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Ensemble Learning ◽

Classification Accuracy ◽

Learning Algorithm ◽

Learning Model ◽

Support Vector ◽

Machine Learning Algorithm ◽

Data Environment

A single machine learning algorithm shows shortcomings when the data environment changes during application. This article puts forward a heteromorphic ensemble learning model made up of Bayes, support vector machine (SVM), and decision tree classifiers, which classifies P2P traffic by the voting principle. The experiment shows that the model can significantly improve classification accuracy and has good stability.
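The voting principle described in the abstract above can be sketched as a simple majority vote over the label predictions of the three base classifiers. Unweighted voting, the function name `majority_vote`, and the example labels are assumptions; the abstract does not specify the voting scheme in detail.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by unweighted majority vote.

    `predictions` holds one list of predicted labels per classifier,
    all of the same length. On a tie, the label seen first wins.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers on three flows.
    bayes = ["p2p", "web", "p2p"]
    svm = ["p2p", "p2p", "web"]
    tree = ["web", "p2p", "web"]
    print(majority_vote([bayes, svm, tree]))
```

With three voters and two classes a tie cannot occur, which is one practical reason for using an odd number of base classifiers.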

Imbalanced learning: Improving classification of diabetic neuropathy from magnetic resonance imaging

PLoS ONE ◽

10.1371/journal.pone.0243907 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243907

Author(s):

Kevin Teh ◽

Paul Armitage ◽

Solomon Tesfaye ◽

Dinesh Selvarajah ◽

Iain D. Wilkinson

Keyword(s):

Magnetic Resonance Imaging ◽

Support Vector Machine ◽

Class Imbalance ◽

Nearest Neighbors ◽

Classification Performance ◽

Support Vector ◽

Imbalanced Learning ◽

Resonance Imaging ◽

K Nearest Neighbors

One of the fundamental challenges when dealing with medical imaging datasets is class imbalance. Class imbalance occurs when the number of instances in the class of interest is relatively low compared to the rest of the data. This study applies oversampling strategies in an attempt to balance the classes and improve classification performance. We evaluated four classifiers, k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron (MLP), and decision trees (DT), with 73 oversampling strategies. In this work, we used imbalanced learning oversampling techniques to improve classification in datasets that are distinctively sparser and clustered. This work reports the best oversampling and classifier combinations and concludes that using oversampling methods always outperforms using no oversampling, hence improving the classification results.
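As a baseline illustration of class balancing, the sketch below uses random oversampling, which duplicates minority samples until every class matches the largest one. It is the simplest member of the oversampling family and is not claimed to be one of the 73 strategies evaluated in the study; the function name `random_oversample` is hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


if __name__ == "__main__":
    x = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    _, balanced_labels = random_oversample(x, y)
    print(Counter(balanced_labels))
```

In an evaluation, oversampling is normally applied to the training split only, so that duplicated or synthetic samples do not leak into the test data.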

Polarity Classification of Traffic Related Tweets

10.5753/eniac.2018.4417 ◽

2018 ◽

Author(s):

Clarissa Castellã Xavier

Keyword(s):

Support Vector Machine ◽

Maximum Entropy ◽

Traffic Management ◽

Support Vector ◽

Average Precision ◽

Learning Methods ◽

Management Agency ◽

Average Recall ◽

Polarity Classification

In this paper we present a study of polarity classification of tweets in the traffic domain. Specifically, we use Portuguese-language data from an account maintained by a traffic management agency. We evaluate the performance of three learning methods: SVM (Support Vector Machine), Naive Bayes and Maximum Entropy. We also explore how the use of a balanced vs. an unbalanced corpus affects the models' behavior. The results show that, in this context, an ML classifier obtains better results than those reported in the literature. In our experiments, SVM trained with a balanced corpus outperforms all tested models, achieving 99% Accuracy, Average Recall and Average Precision.
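The sketch below shows why average recall and average precision are reported alongside accuracy when balanced and unbalanced corpora are compared. It assumes that "average" means the unweighted (macro) mean over classes, which the abstract does not state; the function name and the toy labels are hypothetical.

```python
def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall): the unweighted mean of
    the per-class precision and recall values."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # A class that is never predicted gets precision 0 by convention.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


if __name__ == "__main__":
    # A classifier that always predicts the majority class.
    y_true = ["neg"] * 9 + ["pos"]
    y_pred = ["neg"] * 10
    print(macro_precision_recall(y_true, y_pred))
```

In this toy case accuracy is 90%, yet the macro-averaged scores are far lower because the minority class is never recognized.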

Epileptic seizure detection from EEG signals with phase–amplitude cross-frequency coupling and support vector machine

International Journal of Modern Physics B ◽

10.1142/s0217979218500868 ◽

2018 ◽

Vol 32 (08) ◽

pp. 1850086 ◽

Cited By ~ 3

Author(s):

Yang Liu ◽

Jiang Wang ◽

Lihui Cai ◽

Yingyuan Chen ◽

Yingmei Qin

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Epileptic Seizures ◽

Classification Performance ◽

Support Vector ◽

SVM Classifier ◽

EEG Signals ◽

Frequency Coupling ◽

Phase Amplitude ◽

Cross Frequency Coupling

As a pattern of cross-frequency coupling (CFC), phase–amplitude coupling (PAC) depicts the interaction between the phase and amplitude of distinct frequency bands of the same signal, and has been shown to be closely related to the brain’s cognitive and memory activities. This work uses PAC and a support vector machine (SVM) classifier to identify epileptic seizures from electroencephalogram (EEG) data. Entropy-based modulation index (MI) matrices are used to express the strength of PAC, and features extracted from them serve as the input to the classifier. On the Bonn database, which contains five datasets of EEG segments obtained from healthy volunteers and epileptic subjects, a classification accuracy of 100% is achieved in distinguishing ictal (seizure) data from healthy data, and an accuracy of 97.67% is reached in distinguishing ictal EEG signals from inter-ictal EEGs. On the CHB–MIT database, a group of epileptic EEGs recorded continuously with scalp electrodes, a classification accuracy of 97.50% is obtained, and a rise in the MI value is found 6 s before seizure onset. The classification performance in this work is effective, and PAC can be considered a useful tool for detecting and predicting epileptic seizures and for providing a reference for clinical diagnosis.
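One widely used entropy-based modulation index normalizes the mean amplitude in each phase bin to a distribution and measures how far that distribution departs from uniform. The sketch below assumes this formulation, MI = (log N - H) / log N for N phase bins with Shannon entropy H; the abstract does not confirm the exact definition used, and the function name is hypothetical.

```python
import math


def modulation_index(mean_amplitude_per_bin):
    """Entropy-based modulation index of a phase-amplitude histogram.

    `mean_amplitude_per_bin` holds the mean high-frequency amplitude
    observed in each low-frequency phase bin. Returns 0 when amplitude
    is uniform over phase and 1 when it is concentrated in one bin.
    """
    n_bins = len(mean_amplitude_per_bin)
    total = sum(mean_amplitude_per_bin)
    distribution = [a / total for a in mean_amplitude_per_bin]
    # Shannon entropy of the normalized amplitude distribution.
    entropy = -sum(p * math.log(p) for p in distribution if p > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)


if __name__ == "__main__":
    print(modulation_index([1.0] * 18))           # no coupling
    print(modulation_index([4.0, 2.0, 1.0, 1.0]))  # some coupling
```

Computing this value for each pair of phase and amplitude frequency bands would give a matrix of the kind from which classifier features could be extracted.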

Imbalanced Support Vector Machine Classification Based on Hyper-Sphere

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.339.384 ◽

2013 ◽

Vol 339 ◽

pp. 384-388

Author(s):

Cun He Li ◽

Rui Xue Chen ◽

Yi Zhao Ouyang

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Learning Algorithm ◽

Classification Performance ◽

Training Data ◽

Support Vector ◽

Classification Rate ◽

Important Method ◽

Selection Technique ◽

Unbalanced Classification

In classification, when the training data are unevenly distributed between classes, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are normally difficult to recognize fully. Unbalanced classification is an important issue, and the hyper-sphere support vector machine is an important method for it, but this algorithm has a defect. To significantly improve classification performance on imbalanced datasets, we propose a new method based on the Generalized Hyper-sphere Support Vector Machine to enhance classification accuracy for the minority classes. A support vector machine (SVM) is then used as the base classifier and trained on the reprocessed dataset. Our experimental results demonstrate that the proposed selection technique improves the classification rate of rare events, and that it also improves overall accuracy compared with an SVM trained without data pre-processing.