Sentiment polarity classification of tweets using a extended dictionary

2018 ◽  
Vol 21 (62) ◽  
pp. 1
Author(s):  
Jorge E. Camargo ◽  
Vladimir Vargas-Calderon ◽  
Nelson Vargas ◽  
Liliana Calderón-Benavides

With the purpose of classifying text based on its sentiment polarity (positive or negative), we proposed extending a corpus of 68,000 tweets with word definitions from the dictionary of the Real Academia Española de la Lengua (RAE). A set of 28,000 combinations of six Word2Vec and support vector machine parameters was considered in order to evaluate how much the inclusion of the RAE dictionary definitions improves classification performance. We found that such a corpus extension significantly improves classification accuracy. We therefore conclude that including the RAE dictionary increases the semantic relations learned by Word2Vec, allowing better classification accuracy.
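The corpus extension described above can be illustrated with a minimal sketch. The abstract does not say how definitions are merged into the corpus, so this assumes one possible reading: the tokenized definition of every dictionary word that occurs in the tweets is appended as an extra "sentence" before the embedding model is trained. The function name `extend_corpus` and the toy definitions are hypothetical and are not taken from the RAE dictionary.

```python
def extend_corpus(tweets, definitions):
    """Return the tweets plus the definition of every known word they contain.

    tweets:      list of token lists (one list per tweet)
    definitions: dict mapping a word to its tokenized definition (toy data here)
    """
    vocabulary = {token for tweet in tweets for token in tweet}
    # Sorting keeps the order of the appended definitions deterministic.
    extra = [definitions[word] for word in sorted(vocabulary) if word in definitions]
    return tweets + extra


# Toy data, invented for illustration only.
tweets = [["qué", "buen", "día"], ["mal", "servicio"]]
definitions = {
    "buen": ["que", "tiene", "bondad"],
    "mal": ["lo", "contrario", "al", "bien"],
}
extended = extend_corpus(tweets, definitions)
```

The extended list of token lists would then be passed to whichever Word2Vec implementation is in use, in place of the tweets alone.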

2020 ◽  
Vol 9 (4) ◽  
pp. 1-17
Author(s):  
Mridu Sahu ◽  
Tushar Jani ◽  
Maski Saijahnavi ◽  
Amrit Kumar ◽  
Upendra Chaurasiya ◽  
...  

Rust detection is necessary for the proper working and maintenance of machines and for security purposes. Images are one of the suggested platforms for rust detection, since rust can be detected even in areas that a human cannot reach. However, there is a lack of online databases that provide a dataset large enough to identify the most suitable model for further use. This paper provides a data augmentation technique based on Perlin noise, and the generated images are then tested on standard features (i.e., statistical values and entropy, along with the SIFT and SURF methods). The two most generalized classifiers, naïve Bayes and support vector machine, are identified and tested to measure classification performance on rusty and non-rusty images. The support vector machine provides better classification accuracy, which also suggests that the combined features of statistics, SIFT, and SURF are able to differentiate the images. Hence, it can be further used to detect rust in different parts of machines.


Author(s):  
Suhas S ◽  
Dr. C. R. Venugopal

An enhanced system for classifying MR images using an association of kernels with a support vector machine is developed and presented in this paper, along with the design and development of a content-based image retrieval (CBIR) system. Content-based image retrieval is the process of finding relevant images in a large image database using visual queries. Medical imaging has led to the growth of large image collections. An Oriented Rician Noise Reduction Anisotropic Diffusion filter is used for image denoising. A modified hybrid Otsu algorithm is used for image segmentation. The texture features are extracted using the GLCM method. A genetic algorithm with joint entropy is adopted for feature selection. Classification is done by a support vector machine with various kernels, and the performance is validated. A classification accuracy of 98.83% is obtained using SVM with the GRBF kernel. Various features have been extracted, and these features are used to classify MR images into five different categories. The performance of the MC-SVM classifier is compared across different kernel functions. From the analysis and performance measures such as classification accuracy, it is inferred that brain and spinal cord MRI classification is done better using MC-SVM with a Gaussian RBF kernel function than with linear or polynomial kernel functions. The proposed system can provide the best classification performance, with high accuracy and a low error rate.


Author(s):  
Weiwei Yang ◽  
Haifeng Song

Recent research has shown that the integration of spatial information is a powerful tool for improving the classification accuracy of hyperspectral images (HSI). However, partitioning homogeneous regions of an HSI remains a challenging task. This paper proposes a novel spectral-spatial classification method inspired by the support vector machine (SVM). The model consists of a spectral-spatial feature extraction channel (SSC) and an SVM classifier. The SSC extracts spatial-spectral features of the HSI, and the SVM classifies the extracted features, so the model can extract and classify HSI features automatically. Experiments are conducted on a benchmark HSI dataset (Indian Pines). The proposed method is found to yield more accurate classification results than state-of-the-art techniques.


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Xin Wang ◽  
Yue Yang ◽  
Mingsong Chen ◽  
Qin Wang ◽  
Qin Qin ◽  
...  

To address the low classification accuracy obtained on imbalanced datasets, an oversampling algorithm, AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), based on hierarchical clustering and an improved SMOTE, is proposed. Its key procedures are: hierarchically clustering the majority samples and the minority samples separately; dividing the minority subclusters on the basis of the obtained majority subclusters; selecting a “seed sample” based on the sampling weight and probability distribution of each minority subcluster; and restricting the generation of new samples to a certain area by the centroid method during sampling. The combination of AGNES-SMOTE and SVM (Support Vector Machine) is presented to deal with the classification of imbalanced datasets. Experiments on UCI datasets are conducted to compare the performance of different algorithms mentioned in the literature. Experimental results indicate that AGNES-SMOTE excels at synthesizing new samples and improves SVM classification performance on imbalanced datasets.
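For orientation, the interpolation step that SMOTE-style methods build on can be sketched as follows. This is plain SMOTE-like interpolation between a minority sample and one of its nearest minority neighbours, not AGNES-SMOTE itself: the hierarchical clustering, seed-sample weighting and centroid restriction described above are omitted. The function name `smote_like` and its parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric tuples (at least two samples)
    k:        number of nearest minority neighbours to choose from
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


minority_samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote_like(minority_samples, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the region those samples span. This is the property that AGNES-SMOTE, per the abstract, further restricts with its centroid method.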


Phishing is one of the luring techniques that attackers use to abuse clients' personal details. It is a serious cyber security issue that involves imitating a legitimate website to deceive online users and steal their personal information. Phishing detection can be viewed as a special type of classification problem in which the classifier is built from a substantial number of website features. It is necessary to identify the best features in order to improve classifier accuracy. This study highlights the important features of websites that are used to distinguish phishing websites from legitimate ones by presenting a scheme, Decision Tree Least Square Twin Support Vector Machine (DT-LST-SVM), for the classification of phishing websites. The UCI public-domain benchmark website phishing dataset was used to run experiments on the proposed classifier with different kernel functions and to calculate the classification accuracy of the classifiers. Computational results show that the DT-LST-SVM scheme yields better classification accuracy on the phishing website classification dataset.


2014 ◽  
Vol 687-691 ◽  
pp. 2693-2697
Author(s):  
Li Ding ◽  
Li Mao ◽  
Xiao Feng Wang

A single machine learning algorithm presents shortcomings when the data environment changes during application. This article puts forward a heteromorphic ensemble learning model made up of Bayes, support vector machine (SVM) and decision tree classifiers, which classifies P2P traffic by a voting principle. The experiment shows that the model can significantly improve classification accuracy and has good stability.
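The voting principle mentioned above can be sketched as a simple majority vote over the labels predicted by each base classifier. The abstract does not specify the voting rule, so unweighted majority voting is an assumption here, and the prediction lists below are invented placeholders rather than outputs of trained models.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    predictions: list of label lists, one per classifier, all of equal length.
    Ties are resolved in favour of the label seen first among the tied ones.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical predictions from the three base classifiers on four flows.
bayes_pred = ["p2p", "web", "p2p", "web"]
svm_pred = ["p2p", "p2p", "web", "web"]
tree_pred = ["web", "p2p", "p2p", "web"]

ensemble = majority_vote([bayes_pred, svm_pred, tree_pred])
print(ensemble)  # ['p2p', 'p2p', 'p2p', 'web']
```

With three classifiers and two classes a tie cannot occur, which is one reason an odd number of voters is a common choice.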


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243907
Author(s):  
Kevin Teh ◽  
Paul Armitage ◽  
Solomon Tesfaye ◽  
Dinesh Selvarajah ◽  
Iain D. Wilkinson

One of the fundamental challenges when dealing with medical imaging datasets is class imbalance. Class imbalance occurs when the number of instances in the class of interest is relatively low compared to the rest of the data. This study aims to apply oversampling strategies in an attempt to balance the classes and improve classification performance. We evaluated four different classifiers, k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron (MLP) and decision trees (DT), with 73 oversampling strategies. In this work, we used imbalanced learning oversampling techniques to improve classification in datasets that are distinctively sparser and clustered. This work reports the best combinations of oversampling method and classifier and concludes that using oversampling methods always outperforms using no oversampling, hence improving the classification results.


2018 ◽  
Author(s):  
Clarissa Castellã Xavier

In this paper we present a study of polarity classification of tweets in the traffic domain. Specifically, we use Portuguese-language data from an account maintained by a traffic management agency. We evaluate the performance of three learning methods: SVM (Support Vector Machine), Naive Bayes and Maximum Entropy. We also explore how the use of a balanced vs. an unbalanced corpus affects the models' behavior. The results show that, in this context, an ML classifier obtains better results than those reported in the literature. In our experiments, SVM trained with a balanced corpus outperforms all tested models, achieving 99% Accuracy, Average Recall and Average Precision.
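The three reported measures can be computed as in the sketch below. The abstract does not define "Average Recall" and "Average Precision", so this assumes they are per-class recall and precision averaged with equal weight across classes (macro averaging). The function name `macro_scores` and the labels are illustrative only.

```python
def macro_scores(y_true, y_pred):
    """Return (accuracy, macro-averaged recall, macro-averaged precision)."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls, precisions = [], []
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        actual = sum(t == label for t in y_true)
        predicted = sum(p == label for p in y_pred)
        # A class that never occurs (or is never predicted) contributes 0.
        recalls.append(true_pos / actual if actual else 0.0)
        precisions.append(true_pos / predicted if predicted else 0.0)
    return accuracy, sum(recalls) / len(labels), sum(precisions) / len(labels)


# Toy labels, invented for illustration.
y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg"]
accuracy, avg_recall, avg_precision = macro_scores(y_true, y_pred)
```

On an unbalanced corpus, accuracy can look high while the macro averages stay low, because each class counts equally in the averages regardless of its size.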


2018 ◽  
Vol 32 (08) ◽  
pp. 1850086
Author(s):  
Yang Liu ◽  
Jiang Wang ◽  
Lihui Cai ◽  
Yingyuan Chen ◽  
Yingmei Qin

As a pattern of cross-frequency coupling (CFC), phase–amplitude coupling (PAC) depicts the interaction between the phase and amplitude of distinct frequency bands from the same signal, and has been shown to be closely related to the brain’s cognitive and memory activities. This work utilized PAC and a support vector machine (SVM) classifier to identify epileptic seizures from electroencephalogram (EEG) data. Entropy-based modulation index (MI) matrices are used to express the strength of PAC, and features extracted from them serve as the input to the classifier. On the Bonn database, which contains five datasets of EEG segments obtained from healthy volunteers and epileptic subjects, a classification accuracy of 100% is achieved in distinguishing ictal data from healthy data, and an accuracy of 97.67% is reached in distinguishing ictal EEG signals from inter-ictal EEGs. On the CHB–MIT database, a group of epileptic EEGs recorded continuously with scalp electrodes, a classification accuracy of 97.50% is obtained, and a rise in the MI value is found 6 s before seizure onset. The classification performance in this work is effective, and PAC can be considered a useful tool for detecting and predicting epileptic seizures and for providing a reference for clinical diagnosis.
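An entropy-based modulation index can be sketched as follows. The abstract does not give the formula, so this assumes the widely used normalized-entropy form: mean amplitude is computed per phase bin, normalized into a distribution P, and MI = (log N - H(P)) / log N, where N is the number of bins and H is the Shannon entropy. The bin count of 18 and the function name `modulation_index` are assumptions. Extracting phase and amplitude from the EEG (band-pass filtering and the Hilbert transform) is outside the scope of the sketch.

```python
import math


def modulation_index(phases, amplitudes, n_bins=18):
    """Normalized-entropy modulation index from paired phase/amplitude samples.

    phases:     instantaneous phases in radians (low-frequency band)
    amplitudes: instantaneous amplitude envelope (high-frequency band)
    Returns 0 for a flat amplitude distribution over phase, 1 when all
    amplitude falls into a single phase bin.
    """
    width = 2 * math.pi / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for phi, amp in zip(phases, amplitudes):
        idx = int((phi % (2 * math.pi)) / width) % n_bins
        sums[idx] += amp
        counts[idx] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    if total == 0:
        raise ValueError("amplitudes must not all be zero")
    p = [m / total for m in means]
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)


# Synthetic check: 360 phase samples spread evenly over one cycle.
phases = [(i + 0.5) * 2 * math.pi / 360 for i in range(360)]
flat = modulation_index(phases, [1.0] * 360)
peaked = modulation_index(phases, [1.0 if i < 20 else 0.0 for i in range(360)])
```

Here `flat` is close to 0 (no coupling) and `peaked` is close to 1 (all amplitude locked to one phase bin). Real EEG values fall between these extremes.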


2013 ◽  
Vol 339 ◽  
pp. 384-388
Author(s):  
Cun He Li ◽  
Rui Xue Chen ◽  
Yi Zhao Ouyang

In classification, when the training data are unevenly distributed between classes, the learning algorithm is generally dominated by the features of the majority classes, and features of the minority classes are normally difficult to recognize fully. The hyper-sphere support vector machine is an important method for unbalanced classification, but this algorithm has a defect. In order to significantly improve classification performance on imbalanced datasets, we propose a new method based on the Generalized Hyper-sphere Support Vector Machine to enhance classification accuracy for the minority classes. A support vector machine (SVM) is then used as the base classifier to train on the reprocessed dataset. Our experimental results demonstrate that the proposed selection technique improves the classification rate of rare events, and also improves on the overall accuracy of an SVM without data pre-processing.

