scholarly journals Performance analysis of sentiments in Twitter dataset using SVM models

Author(s):  
Lakshmana Kumar Ramasamy ◽  
Seifedine Kadry ◽  
Yunyoung Nam ◽  
Maytham N. Meqdad

Sentiment Analysis is a current research topic by many researches using supervised and machine learning algorithms. The analysis can be done on movie reviews, twitter reviews, online product reviews, blogs, discussion forums, Myspace comments and social networks. The Twitter data set is analyzed using support vector machines (SVM) classifier with various parameters. The content of tweet is classified to find whether it contains fact data or opinion data. The deep analysis is required to find the opinion of the tweets posted by the individual. The sentiment is classified in to positive, negative and neutral. From this classification and analysis, an important decision can be made to improve the productivity. The performance of SVM radial kernel, SVM linear grid and SVM radial grid was compared and found that SVM linear grid performs better than other SVM models.

The prediction of price for a vehicle has been more popular in research area, and it needs predominant effort and information about the experts of this particular field. The number of different attributes is measured and also it has been considerable to predict the result in more reliable and accurate. To find the price of used vehicles a well defined model has been developed with the help of three machine learning techniques such as Artificial Neural Network, Support Vector Machine and Random Forest. These techniques were used not on the individual items but for the whole group of data items. This data group has been taken from some web portal and that same has been used for the prediction. The data must be collected using web scraper that was written in PHP programming language. Distinct machine learning algorithms of varying performances had been compared to get the best result of the given data set. The final prediction model was integrated into Java application


2020 ◽  
Vol 10 (2) ◽  
Author(s):  
Mahmood Umar ◽  
Nor Bahiah Ahmad ◽  
Anazida Zainal

This study investigates the performance of machine learning algorithms for sentiment analysis of students’ opinions on programming assessment. Previous researches show that Support Vector Machines (SVM) performs the best among all techniques, followed by Naïve Bayes (NB) in sentiment analysis. This study proposes a framework for classifying sentiments, as positive or negative using NB algorithm and Lexicon-based approach on small data set. The performance of NB algorithm was evaluated using SVM. NB and SVM conquer the Lexicon-based approach opinion lexicon technique in terms of accuracy in the specific area for which it is trained. The Lexicon-based technique, on the other hand, avoids difficult steps needed to train the classifier. Data was analyzed from 75 first year undergraduate students in School of Computing, Universiti Teknologi Malaysia taking programming subject. The student’s sentiments were gathered based on their opinions for the zero-score policy for unsuccessful compilation of program during skill-based test. The result of the study reveals that the students tend to have negative sentiments on programming assessment as it gives them scary emotions. The experimental result of applying NB algorithm yields a prediction accuracy of 85% which outperform both the SVM with 70% and Lexicon-based approach with 60% accuracy. The result shows that NB works better than SVM and Lexicon-based approach on small dataset. 


Author(s):  
Nursyahirah Tarmizi ◽  
Suhaila Saee ◽  
Dayang Hanani Abang Ibrahim

<span>This paper presents the task of Author Identification for KadazanDusun language by using tweets as the source of data to perform Author Identification task of short text on KadazanDusun, which is considered as one the under-resourced language in Malaysia. The aim of this paper is to demonstrate Author Identification of short text on KadazanDusun. Besides, this paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing the stylometric features. Stylometric features are used to quantify the writing styles of the authors which includes character n-grams and word n-grams. The workflow of Author Identification implements the machine learning approach to solve the single-labelled multi-class problem and predict the author of a given message in KadazanDusun. Two classifiers are used to compare the accuracy including Naïve Bayes and Support Vector Machine (SVM). The results show that the combination of n-grams which is word-level unigram and {1-5}-grams with character 3-grams are the most relevant stylometric features in identifying the author of KadazanDusun message with an accuracy of 80.17%. The results also show that SVM classifier has outperformed Naive Bayes in this Author Identification task with the accuracy of 80.17%.</span>


2021 ◽  
Vol 2 (12) ◽  
pp. 45-58
Author(s):  
Tran Ngoc Quy ◽  
Nguyen Hong Quang

Abstract—Currently, one of the most powerful side channel attacks (SCA) is profiled attack. Machine learning algorithms, for example support vector machine (SVM), are currently used to improve the effectiveness of the attack. One issue of using SVM-based profiled attack is extracting points of interest (POIs), or features from power traces. Our work proposes a novel method for POIs selection of power traces based on the combining variational mode decomposition (VMD) and Gram-Schmidt orthogonalization (GSO). VMD is used to decompose the power traces into sub-signals (modes) and POIs selection process based on GSO is conducted on these sub-signals. As a result, the selected POIs are used for SVM classifier to conduct profiled attack. This attack method outperforms other profiled attacks in the same attack scenario. Experiments were performed on a trace data set collected from the Atmega8515 smart card with AES-128 run on the Sakura-G/W side channel evaluation board and the DPA Contest v4 dataset to verify the effectiveness of our method in reducing number of power traces for the attacks, especially with noisy power traces.Tóm tắt—Hiện nay, tấn công mẫu được xem là một trong những tấn công kênh kề (SCA) mạnh. Các thuật toán học máy, ví dụ như máy vector hỗ trợ (SVM), thường được sử dụng để nâng cao hiệu quả của tấn công mẫu. Một thách thức đối với tấn công mẫu sử dụng SVM là cần phải tìm được các điểm thích hợp (POI) hay các đặc trưng từ vết điện năng tiêu thụ. Công trình nghiên cứu này đề xuất một phương pháp mới đề tìm POI của vết điện năng tiêu thụ bằng cách kết hợp kỹ thuật phân tích mode biến phân (VMD) và quá trình trực giao hóa Gram-Schmidt (GSO). Trong đó, VMD được sử dụng để phân tách vết điện năng tiêu thụ thành các tín hiệu con còn gọi là VMD mode và việc lựa chọn POIs trên VMD mode này được thực hiện dựa trên quá trình GSO. Dựa trên phương pháp lựa chọn POIs này, chúng tôi đề xuất phương pháp tấn công mẫu sử dụng SVM có hiệu quả tốt hơn các tấn công mẫu khác ở cùng kịch bản tấn công. Các thí nghiệm tấn công được thực hiện trên tập dữ liệu được thu thập từ thẻ thông minh Atmega8515 cài đặt AES-128 chạy trên nền tảng thiết bị tấn công kênh kề Sakura-G/W và tập dữ liệu DPA Contest v4, để chứng minh tính hiệu quả của phương pháp của chúng tôi, trong việc giảm số lượng vết điện năng tiêu thụ cần cho cuộc tấn công, đặc biệt trong trường hợp các điện năng tiêu thụ có nhiễu.


2018 ◽  
Vol 44 (6) ◽  
pp. 848-860 ◽  
Author(s):  
Mansur Alp Tocoglu ◽  
Adil Alpkocak

This study presents a new dataset to be used in emotion extraction studies in Turkish text. We consider emotion extraction as a supervised text classification problem, which thereby requires a dataset for the training process. To satisfy this requirement, we aim to create a new dataset containing data for the six emotion categories: happiness, fear, anger, sadness, disgust and surprise. To gather this dataset, we conducted a survey and collected 27,350 entries from 4709 individuals. In the next step, we performed a validation process in which annotators validated each entry one by one by assigning a related emotion category. As a result of this process, we obtained two datasets, one raw and the other validated. Subsequently, we generated four versions of these two datasets using two different stemming methods and then modelled them using a vector space model. Then, we ran machine learning algorithms, including complement naive Bayes (CNB), random forest (RF), decision tree C4.5 (J48) and an updated version of support vector machines (SVMs), on the models to calculate the accuracy, precision, recall and F-measure values. Based on the results we obtained, we concluded that the SVM classifier yielded the highest performance value and that the models trained with a validated dataset provide more accurate results than the models trained with a non-validated dataset.


2015 ◽  
Vol 71 (5) ◽  
pp. 1147-1158 ◽  
Author(s):  
Nader Morshed ◽  
Nathaniel Echols ◽  
Paul D. Adams

In the process of macromolecular model building, crystallographers must examine electron density for isolated atoms and differentiate sites containing structured solvent molecules from those containing elemental ions. This task requires specific knowledge of metal-binding chemistry and scattering properties and is prone to error. A method has previously been described to identify ions based on manually chosen criteria for a number of elements. Here, the use of support vector machines (SVMs) to automatically classify isolated atoms as either solvent or one of various ions is described. Two data sets of protein crystal structures, one containing manually curated structures deposited with anomalous diffraction data and another with automatically filtered, high-resolution structures, were constructed. On the manually curated data set, an SVM classifier was able to distinguish calcium from manganese, zinc, iron and nickel, as well as all five of these ions from water molecules, with a high degree of accuracy. Additionally, SVMs trained on the automatically curated set of high-resolution structures were able to successfully classify most common elemental ions in an independent validation test set. This method is readily extensible to other elemental ions and can also be used in conjunction with previous methods based ona prioriexpectations of the chemical environment and X-ray scattering.


2021 ◽  
Vol 36 (1) ◽  
pp. 616-622
Author(s):  
P. Harish ◽  
Dr.R. Sabitha

Aim: The objective of the work is to evaluate the accuracy and precision in predicting the heart disease using Support Vector Machine (SVM) and Random Forest (RF) classification algorithms. Materials and Methods: Random Forest Classifier is applied on a Health dataset that consists of 304 records. A framework for heart disease prediction in the medical sector comparing Random Forest and SVM classifiers has been proposed and developed. The sample size was measured as 21 per group. The accuracy and the precision of the classifiers was evaluated and recorded. Results: The SVM classifier produces 53.04% in predicting the heart disease on the data set used whereas the Random forest classifier predicts the same at the rate of 83.2%. The significant value is 0.0. Hence RF is better than SVM. Conclusion: The performance of Random forest is better compared with SVM in terms of both precision and accuracy.


2020 ◽  
Vol 27 (4) ◽  
pp. 329-336 ◽  
Author(s):  
Lei Xu ◽  
Guangmin Liang ◽  
Baowen Chen ◽  
Xu Tan ◽  
Huaikun Xiang ◽  
...  

Background: Cell lytic enzyme is a kind of highly evolved protein, which can destroy the cell structure and kill the bacteria. Compared with antibiotics, cell lytic enzyme will not cause serious problem of drug resistance of pathogenic bacteria. Thus, the study of cell wall lytic enzymes aims at finding an efficient way for curing bacteria infectious. Compared with using antibiotics, the problem of drug resistance becomes more serious. Therefore, it is a good choice for curing bacterial infections by using cell lytic enzymes. Cell lytic enzyme includes endolysin and autolysin and the difference between them is the purpose of the break of cell wall. The identification of the type of cell lytic enzymes is meaningful for the study of cell wall enzymes. Objective: In this article, our motivation is to predict the type of cell lytic enzyme. Cell lytic enzyme is helpful for killing bacteria, so it is meaningful for study the type of cell lytic enzyme. However, it is time consuming to detect the type of cell lytic enzyme by experimental methods. Thus, an efficient computational method for the type of cell lytic enzyme prediction is proposed in our work. Method: We propose a computational method for the prediction of endolysin and autolysin. First, a data set containing 27 endolysins and 41 autolysins is built. Then the protein is represented by tripeptides composition. The features are selected with larger confidence degree. At last, the classifier is trained by the labeled vectors based on support vector machine. The learned classifier is used to predict the type of cell lytic enzyme. Results: Following the proposed method, the experimental results show that the overall accuracy can attain 97.06%, when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9%). The performance of our proposed method is stable, when the selected feature number is from 40 to 70. The overall accuracy of tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%. The experimental results also demonstrate that the overall accuracy is improved by nearly 18% when using the tripeptides optimal feature set. Conclusion: The paper proposed an efficient method for identifying endolysin and autolysin. In this paper, support vector machine is used to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy for identification of the type of cell lytic enzyme. Support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Ji-Yong An ◽  
Fan-Rong Meng ◽  
Zhu-Hong You ◽  
Yu-Hong Fang ◽  
Yu-Jun Zhao ◽  
...  

We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements are the results of representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using a Principal Component Analysis (PCA), and using a Relevance Vector Machine (RVM) based classifier. We perform 5-fold cross-validation experiments onYeastandHumandatasets, and we achieve very high accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous works. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on theYeastdataset. The experimental results demonstrate that our RVM-LPQ method is obviously better than the SVM-based method. The promising experimental results show the efficiency and simplicity of the proposed method, which can be an automatic decision support tool for future proteomics research.


2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart related diseases or Cardiovascular Diseases (CVDs) are the main reason for a huge number of death in the world over the last few decades and has emerged as the most life-threatening disease, not only in India but in the whole world. So, there is a need of reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and the professionals in the diagnosis of heart related diseases. This paper presents a survey of various models based on such algorithms and techniques andanalyze their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), NaïveBayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found very popular among the researchers.


Sign in / Sign up

Export Citation Format

Share Document