Performance analysis of sentiments in Twitter dataset using SVM models

Sentiment analysis is a current research topic for many researchers, who apply supervised and other machine learning algorithms. The analysis can be performed on movie reviews, Twitter reviews, online product reviews, blogs, discussion forums, Myspace comments and social networks. Here, a Twitter data set is analyzed using a support vector machine (SVM) classifier with various parameters. The content of each tweet is classified to determine whether it contains factual data or opinion data. Deeper analysis is then required to determine the opinion expressed in the tweets posted by individuals. The sentiment is classified into positive, negative and neutral. Based on this classification and analysis, important decisions can be made to improve productivity. The performance of the SVM radial kernel, SVM linear grid and SVM radial grid models was compared, and the SVM linear grid model was found to perform better than the other SVM models.
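
A minimal sketch of the "SVM linear grid" idea: a linear SVM with one-vs-rest classes for positive, negative and neutral tweets, where the regularization strength is chosen by a small grid search on a validation split. The abstract does not name the toolkit, features or grid, so the bag-of-words features, the Pegasos solver, the (hypothetical) function names and the lambda values below are all assumptions; the radial-kernel variants are not shown.

```python
import random
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def tokenize(text):
    # Lower-cased whitespace tokenization; a real tweet pipeline would also
    # handle hashtags, @mentions, URLs and emoticons.
    return text.lower().split()


def build_vocab(texts):
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    # Sparse bag-of-words vector {feature index: count}; unknown words are dropped.
    return Counter(vocab[t] for t in tokenize(text) if t in vocab)


def dot(w, x):
    return sum(w.get(i, 0.0) * v for i, v in x.items())


def train_pegasos(xs, ys, lam, epochs=50, seed=0):
    """Binary linear SVM (hinge loss + L2 penalty lam) trained with Pegasos.

    ys must be +1/-1. No separate bias term is used, as in basic Pegasos.
    """
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = ys[i] * dot(w, xs[i]) < 1  # margin under the current weights
            for k in w:  # gradient step on the L2 term shrinks every weight
                w[k] *= 1.0 - eta * lam
            if violates:  # hinge-loss subgradient step towards the example
                for k, v in xs[i].items():
                    w[k] = w.get(k, 0.0) + eta * ys[i] * v
    return w


def train_ovr(xs, labels, lam):
    # One-vs-rest: one binary SVM per sentiment class.
    return {c: train_pegasos(xs, [1 if y == c else -1 for y in labels], lam) for c in LABELS}


def predict(models, x):
    return max(LABELS, key=lambda c: dot(models[c], x))


def accuracy(models, xs, labels):
    return sum(predict(models, x) == y for x, y in zip(xs, labels)) / len(labels)


def grid_search(train, valid, lambdas=(1.0, 0.1, 0.01, 0.001)):
    """Pick the regularization strength with the best validation accuracy (hypothetical grid)."""
    vocab = build_vocab(text for text, _ in train)
    xtr = [vectorize(text, vocab) for text, _ in train]
    ytr = [y for _, y in train]
    xva = [vectorize(text, vocab) for text, _ in valid]
    yva = [y for _, y in valid]
    best = max(lambdas, key=lambda lam: accuracy(train_ovr(xtr, ytr, lam), xva, yva))
    return vocab, train_ovr(xtr, ytr, best), best
```

Swapping `train_pegasos` for a kernelized solver would give the radial-kernel variants; the grid search wrapper stays the same.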

Vehicle Price Prediction using SVM Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue · 10.35940/ijitee.g5915.069820 · 2020 · Vol 9 (8) · pp. 398-401

Keyword(s): Machine Learning · Research Area · Machine Learning Algorithms · Machine Learning Techniques · Support Vector · Data Set · Network Support · Java Application · Learning Techniques · The Individual

Vehicle price prediction has become a popular research area, and it requires considerable effort and expert knowledge of the field. A number of different attributes are measured and considered in order to make the prediction more reliable and accurate. To predict the price of used vehicles, a well-defined model was developed using three machine learning techniques: Artificial Neural Network, Support Vector Machine and Random Forest. These techniques were applied to the whole group of data items rather than to individual items. The data was taken from a web portal and used for the prediction; it was collected using a web scraper written in the PHP programming language. The performance of the different machine learning algorithms was compared to find the best result for the given data set. The final prediction model was integrated into a Java application.

Sentiment Analysis of Student’s Opinion on Programming Assessment: Evaluation of Naïve Bayes over Support Vector Machines

International Journal of Innovative Computing · 10.11113/ijic.v10n2.278 · 2020 · Vol 10 (2)

Author(s): Mahmood Umar · Nor Bahiah Ahmad · Anazida Zainal

Keyword(s): Support Vector Machines · Sentiment Analysis · Naive Bayes · Naïve Bayes · Machine Learning Algorithms · Experimental Result · Support Vector · Small Data · Data Set · Vector Machines

This study investigates the performance of machine learning algorithms for sentiment analysis of students’ opinions on programming assessment. Previous research shows that Support Vector Machines (SVM) perform best among all techniques in sentiment analysis, followed by Naïve Bayes (NB). This study proposes a framework for classifying sentiments as positive or negative using the NB algorithm and a lexicon-based approach on a small data set. The performance of the NB algorithm was evaluated against SVM. NB and SVM outperform the lexicon-based (opinion lexicon) technique in terms of accuracy in the specific domain for which they are trained; the lexicon-based technique, on the other hand, avoids the difficult steps needed to train a classifier. Data from 75 first-year undergraduate students taking a programming subject in the School of Computing, Universiti Teknologi Malaysia, was analyzed. The students’ sentiments were gathered from their opinions on the zero-score policy for programs that fail to compile during a skill-based test. The results reveal that the students tend to have negative sentiments about programming assessment because it gives them feelings of fear. Applying the NB algorithm yields a prediction accuracy of 85%, which outperforms both SVM (70%) and the lexicon-based approach (60%). The results show that NB works better than SVM and the lexicon-based approach on a small dataset.
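
For illustration, a hedged sketch of the two lighter-weight approaches compared in the study: a multinomial Naive Bayes classifier with Laplace smoothing, and a lexicon-based scorer that needs no training. The tokenizer, the smoothing constant, the class and function names, and the tiny word lists are hypothetical stand-ins, not the study's actual features or opinion lexicon.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing (hypothetical helper)."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        # Log prior P(class), estimated from class frequencies.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, c in zip(texts, labels):
            self.counts[c].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word, c):
        # Smoothed log P(word | class).
        num = self.counts[c][word] + self.alpha
        den = self.totals[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, text):
        words = [w for w in tokenize(text) if w in self.vocab]  # ignore unseen words
        scores = {c: self.log_prior[c] + sum(self.log_likelihood(w, c) for w in words)
                  for c in self.classes}
        return max(scores, key=scores.get)


# Hypothetical toy opinion lexicon, for illustration only.
POSITIVE_WORDS = {"fair", "good", "helpful"}
NEGATIVE_WORDS = {"unfair", "scary", "stressful"}


def lexicon_predict(text):
    # Count lexicon hits; no training step is needed. Ties default to negative.
    tokens = tokenize(text)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    return "positive" if score > 0 else "negative"
```

The contrast in the abstract shows up directly here: `MultinomialNB` must be fitted on labelled opinions from the target domain, while `lexicon_predict` works immediately but only knows the words in its lexicon.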

Author identification for Under-Resourced language (KadazanDusun)

Indonesian Journal of Electrical Engineering and Computer Science · 10.11591/ijeecs.v17.i1.pp248-255 · 2020 · Vol 17 (1) · pp. 248 · Cited By ~ 1

Author(s): Nursyahirah Tarmizi · Suhaila Saee · Dayang Hanani Abang Ibrahim

Keyword(s): Machine Learning · Naive Bayes · Naïve Bayes · Machine Learning Algorithms · Support Vector · SVM Classifier · Identification Task · Data Set · Short Text · Author Identification

This paper presents the task of Author Identification for the KadazanDusun language, using tweets as the data source for identifying the authors of short KadazanDusun texts. KadazanDusun is considered one of the under-resourced languages in Malaysia. The aim of this paper is to demonstrate Author Identification of short text in KadazanDusun. The paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing stylometric features. Stylometric features quantify the writing styles of authors; they include character n-grams and word n-grams. The Author Identification workflow uses a machine learning approach to solve a single-label multi-class problem and predict the author of a given KadazanDusun message. Two classifiers, Naïve Bayes and Support Vector Machine (SVM), are compared in terms of accuracy. The results show that combining word-level unigrams and {1-5}-grams with character 3-grams gives the most relevant stylometric features for identifying the author of a KadazanDusun message, with an accuracy of 80.17%. The results also show that the SVM classifier outperformed Naïve Bayes in this Author Identification task, with an accuracy of 80.17%.
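
The stylometric features described above can be sketched as sparse n-gram counts. This assumes plain whitespace word splitting and raw character windows (spaces included); the paper's exact preprocessing is not stated, and the function names are hypothetical. The resulting count vectors would then be passed to a Naïve Bayes or SVM classifier.

```python
from collections import Counter


def char_ngrams(text, n=3):
    # Overlapping character windows, including spaces between words.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n_min=1, n_max=5):
    # All word n-grams for n in [n_min, n_max], e.g. (1, 5) for {1-5}-grams.
    words = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


def stylometric_features(text, word_range=(1, 1), char_n=3):
    """Hypothetical helper: word n-gram counts plus character n-gram counts.

    Prefixes keep the two feature families apart in a single sparse vector.
    """
    feats = Counter("w:" + g for g in word_ngrams(text, *word_range))
    feats.update("c:" + g for g in char_ngrams(text, char_n))
    return feats
```

Calling `stylometric_features(message, word_range=(1, 5))` would give the {1-5}-gram plus character 3-gram combination; `word_range=(1, 1)` gives the unigram variant.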

A Novel Points of Interest Selection Method For SVM-based Profiled Attacks

Journal of Science and Technology on Information security · 10.54654/isj.v2i12.117 · 2021 · Vol 2 (12) · pp. 45-58

Author(s): Tran Ngoc Quy · Nguyen Hong Quang

Keyword(s): Selection Process · Machine Learning Algorithms · Support Vector · SVM Classifier · Side Channel · Data Set · Points Of Interest · Mode Decomposition · Evaluation Board · DPA Contest

Currently, profiled attacks are among the most powerful side-channel attacks (SCA). Machine learning algorithms such as the support vector machine (SVM) are used to improve the effectiveness of these attacks. One issue with SVM-based profiled attacks is extracting points of interest (POIs), or features, from power traces. This work proposes a novel method for selecting POIs from power traces by combining variational mode decomposition (VMD) and Gram-Schmidt orthogonalization (GSO). VMD is used to decompose the power traces into sub-signals (modes), and a GSO-based POI selection process is then conducted on these sub-signals. The selected POIs are used by an SVM classifier to conduct the profiled attack. This attack method outperforms other profiled attacks in the same attack scenario. Experiments were performed on a trace data set collected from an Atmega8515 smart card running AES-128 on the Sakura-G/W side-channel evaluation board, and on the DPA Contest v4 dataset, to verify the effectiveness of the method in reducing the number of power traces needed for the attacks, especially with noisy power traces.
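
The GSO step can be illustrated with generic Gram-Schmidt orthogonal forward selection: repeatedly pick the candidate point whose column is most collinear with a target vector, then project that column out of the target and of the remaining candidates. This is a sketch under assumptions: the VMD decomposition is omitted (the columns could come from the samples of one VMD mode), and the target leakage model and the function names are hypothetical rather than taken from the paper.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gso_rank(features, target, k):
    """Hypothetical helper: rank feature columns by Gram-Schmidt forward selection.

    features: list of columns, each holding one candidate POI (sample index)
              across all traces; typically mean-centred beforehand.
    target:   leakage-model value per trace (e.g. a Hamming weight; assumed).
    Returns the indices of the k selected columns, in selection order.
    """
    cols = {j: list(col) for j, col in enumerate(features)}
    t = list(target)
    selected = []

    def cos2(j):
        # Squared cosine between candidate column j and the residual target.
        c = cols[j]
        denom = dot(c, c) * dot(t, t)
        return dot(c, t) ** 2 / denom if denom > 0 else 0.0

    for _ in range(min(k, len(cols))):
        best = max(cols, key=cos2)
        selected.append(best)
        b = cols.pop(best)
        bb = dot(b, b)
        if bb == 0:
            continue
        # One Gram-Schmidt step: remove the component along the selected
        # column from the target and from every remaining candidate.
        coef = dot(t, b) / bb
        t = [ti - coef * bi for ti, bi in zip(t, b)]
        for j in list(cols):
            cj = dot(cols[j], b) / bb
            cols[j] = [ci - cj * bi for ci, bi in zip(cols[j], b)]
    return selected
```

Because each chosen column is projected out, later picks favour points carrying information not already explained by earlier ones, which is the motivation for orthogonalization over simple correlation ranking.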

TREMO: A dataset for emotion analysis in Turkish

Journal of Information Science · 10.1177/0165551518761014 · 2018 · Vol 44 (6) · pp. 848-860 · Cited By ~ 2

Author(s): Mansur Alp Tocoglu · Adil Alpkocak

Keyword(s): Vector Space Model · Classification Problem · Machine Learning Algorithms · Support Vector · SVM Classifier · Training Process · Validation Process · Space Model · Vector Machines · F Measure

This study presents a new dataset to be used in emotion extraction studies in Turkish text. We consider emotion extraction as a supervised text classification problem, which thereby requires a dataset for the training process. To satisfy this requirement, we aim to create a new dataset containing data for the six emotion categories: happiness, fear, anger, sadness, disgust and surprise. To gather this dataset, we conducted a survey and collected 27,350 entries from 4709 individuals. In the next step, we performed a validation process in which annotators validated each entry one by one by assigning a related emotion category. As a result of this process, we obtained two datasets, one raw and the other validated. Subsequently, we generated four versions of these two datasets using two different stemming methods and then modelled them using a vector space model. Then, we ran machine learning algorithms, including complement naive Bayes (CNB), random forest (RF), decision tree C4.5 (J48) and an updated version of support vector machines (SVMs), on the models to calculate the accuracy, precision, recall and F-measure values. Based on the results we obtained, we concluded that the SVM classifier yielded the highest performance value and that the models trained with a validated dataset provide more accurate results than the models trained with a non-validated dataset.
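
The evaluation measures named above can be computed from true and predicted labels as follows. This sketch assumes macro-averaging over the emotion categories, which the abstract does not specify; the function name is illustrative.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, per-class (precision, recall, F1) and their macro average."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-measure: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = (precision, recall, f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Macro average: unweighted mean over categories (assumed averaging scheme).
    macro = tuple(sum(scores[m] for scores in per_class.values()) / len(labels)
                  for m in range(3))
    return accuracy, per_class, macro
```

Macro-averaging weights every emotion equally; a weighted average would instead favour the more frequent categories in the dataset.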

Using support vector machines to improve elemental ion identification in macromolecular crystal structures

Acta Crystallographica Section D Biological Crystallography · 10.1107/s1399004715004241 · 2015 · Vol 71 (5) · pp. 1147-1158 · Cited By ~ 3

Author(s): Nader Morshed · Nathaniel Echols · Paul D. Adams

Keyword(s): Support Vector Machines · High Resolution · Crystal Structures · Metal Binding · Model Building · Protein Crystal · Support Vector · SVM Classifier · Data Set · Vector Machines

In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here, the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based on a priori expectations of the chemical environment and X-ray scattering.

Improving the Efficiency of Heart Disease Prediction Using SVM and a Novel Tree Specific Random Forest Classifier (NTSRF)

Alinteri Journal of Agricultural Sciences · 10.47059/alinteri/v36i1/ajas21087 · 2021 · Vol 36 (1) · pp. 616-622

Author(s): P. Harish · Dr. R. Sabitha

Keyword(s): Heart Disease · Random Forest · Random Forest Classifier · Support Vector · SVM Classifier · Disease Prediction · Data Set · Medical Sector · Accuracy And Precision · Better Than

Aim: The objective of this work is to evaluate the accuracy and precision of the Support Vector Machine (SVM) and Random Forest (RF) classification algorithms in predicting heart disease. Materials and Methods: The Random Forest classifier is applied to a health dataset that consists of 304 records. A framework for heart disease prediction in the medical sector that compares the Random Forest and SVM classifiers has been proposed and developed. The sample size was 21 per group. The accuracy and precision of the classifiers were evaluated and recorded. Results: The SVM classifier predicts heart disease on the data set used at a rate of 53.04%, whereas the Random Forest classifier does so at a rate of 83.2%. The reported significance value is 0.0; hence RF is better than SVM. Conclusion: Random Forest performs better than SVM in terms of both precision and accuracy.

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters · 10.2174/0929866526666191002104735 · 2020 · Vol 27 (4) · pp. 329-336 · Cited By ~ 1

Author(s): Lei Xu · Guangmin Liang · Baowen Chen · Xu Tan · Huaikun Xiang · ...

Keyword(s): Support Vector Machine · Cell Wall · Experimental Results · Computational Method · Lytic Enzyme · Support Vector · Lytic Enzymes · Data Set · Optimal Feature · Better Than

Background: Cell lytic enzymes are a kind of highly evolved protein that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way of curing bacterial infections. When antibiotics are used, the problem of drug resistance becomes increasingly serious; therefore, cell lytic enzymes are a good choice for curing bacterial infections. Cell lytic enzymes include endolysins and autolysins, and the difference between them lies in the purpose for which they break the cell wall. Identifying the type of a cell lytic enzyme is therefore meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, it is worthwhile to study their types. However, detecting the type of a cell lytic enzyme by experimental methods is time-consuming. Thus, an efficient computational method for predicting the type of cell lytic enzyme is proposed in our work. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Then each protein is represented by its tripeptide composition. Features with a larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of cell lytic enzyme. Results: With the proposed method, the experimental results show that the overall accuracy can reach 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% in relative terms ((97.06 - 92.9)/92.9 ≈ 4.5%). The performance of the proposed method is stable when the number of selected features ranges from 40 to 70.
The overall accuracy of the tripeptide optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18 percentage points when using the tripeptide optimal feature set. Conclusion: This paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the 44 selected features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
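
Tripeptide composition maps a protein sequence to a 20^3 = 8000-dimensional vector of tripeptide frequencies. A minimal sketch, assuming the 20 standard amino acids, overlapping windows, and normalization by the number of valid windows; the function name is hypothetical, and the paper's exact normalization and its confidence-based feature selection are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]  # 8000 entries
INDEX = {tp: i for i, tp in enumerate(TRIPEPTIDES)}


def tripeptide_composition(seq):
    """Hypothetical helper: frequency of each tripeptide over overlapping windows.

    Windows containing non-standard residues (e.g. 'X') are skipped, and the
    counts are normalized by the number of valid windows (an assumption).
    """
    seq = seq.upper()
    vec = [0.0] * len(TRIPEPTIDES)
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if w in INDEX]
    for w in valid:
        vec[INDEX[w]] += 1
    if valid:
        vec = [v / len(valid) for v in vec]
    return vec
```

A feature-selection step would then keep only a small subset of these 8000 positions (44 in the reported best result) before SVM training.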

Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences

BioMed Research International · 10.1155/2016/4783801 · 2016 · Vol 2016 · pp. 1-9 · Cited By ~ 13

Author(s): Ji-Yong An · Fan-Rong Meng · Zhu-Hong You · Yu-Hong Fang · Yu-Jun Zhao · ...

Keyword(s): Protein Sequences · Relevance Vector Machine · Experimental Results · Computational Method · Support Vector · SVM Classifier · Local Phase · Local Phase Quantization · Phase Quantization · Better Than

We propose a novel computational method, known as RVM-LPQ, that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict protein-protein interactions (PPIs) from protein sequences. The main improvements come from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments on the Yeast and Human datasets and achieve high accuracies of 92.65% and 97.62%, respectively, which are significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method is clearly better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool for future proteomics research.
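
The 5-fold cross-validation protocol can be sketched generically: shuffle the sample indices, split them into five folds, and train on four folds while testing on the fifth. The shuffling seed, the function names and the `fit_predict` callback interface are hypothetical, and the PSSM, LPQ, PCA and RVM stages are not shown.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds (assumes n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Mean accuracy over k folds.

    fit_predict(X_train, y_train, X_test) -> predicted labels is a hypothetical
    callback standing in for any classifier (RVM, SVM, ...).
    """
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)
```

Every sample is tested exactly once, so the mean fold accuracy uses all of the data for evaluation without ever testing on a sample seen during training of that fold.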

Heart disease prediction using machine learning techniques : a survey

International Journal of Engineering & Technology · 10.14419/ijet.v7i2.8.10557 · 2018 · Vol 7 (2.8) · pp. 684 · Cited By ~ 12

Author(s): V V. Ramalingam · Ayantan Dandapath · M Karthik Raja

Keyword(s): Machine Learning · Learning Algorithms · Machine Learning Algorithms · Machine Learning Techniques · Support Vector · Complex Data · Learning Techniques · Vector Machines · Supervised Learning Algorithms · Life Threatening

Heart-related diseases, or cardiovascular diseases (CVDs), have been the main cause of a large number of deaths worldwide over the last few decades and have emerged as the most life-threatening diseases, not only in India but throughout the world. There is therefore a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. In recent times, many researchers have used several machine learning techniques to help the health care industry and professionals in the diagnosis of heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.
