Cyberbullying Sentiment Analysis with Word2Vec and One-Against-All Support Vector Machine

Depression and social anxiety are the two main negative impacts of cyberbullying. A survey conducted by UNICEF on 3 September 2019 showed that 1 in 3 young people in 30 countries had been victims of cyberbullying. This research applies sentiment analysis to detect comments that contain cyberbullying. The cyberbullying dataset is the Toxic Comment Classification Challenge dataset, obtained from Kaggle. Pre-processing consists of four stages: comment generalization (converting text to lowercase and removing punctuation), tokenization, stop word removal, and lemmatization. Word2Vec is used as the word embedding for the sentiment analysis. The One-Against-All (OAA) method with a Support Vector Machine (SVM) model is then used to make multi-label predictions. The SVM model is tuned with Randomized Search CV. The model is evaluated with the micro-averaged F1 score, to assess prediction accuracy, and Hamming loss, to assess the fraction of sample-label pairs that are incorrectly classified. The Word2Vec and OAA SVM implementation gives the best result for data pre-processed with comment generalization, tokenization, stop word removal, and lemmatization, and stored as 100 features in the Word2Vec model. The tuned model achieves a micro-averaged F1 score of 83.40% and a Hamming loss of 15.13%.

Index Terms: Sentiment Analysis; Word Embedding; Word2Vec; One-Against-All; Support Vector Machine; Toxic Comment Classification Challenge; Multi Labelling
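
The two evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes binary indicator matrices (one row per comment, one column per label); the toy labels and predictions are hypothetical and are not taken from the paper, and the helper names `micro_f1` and `hamming_loss` are chosen here for illustration only.

```python
from typing import Sequence

Rows = Sequence[Sequence[int]]


def micro_f1(y_true: Rows, y_pred: Rows) -> float:
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) pair."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for truth, guess in zip(true_row, pred_row):
            if truth == 1 and guess == 1:
                tp += 1
            elif truth == 0 and guess == 1:
                fp += 1
            elif truth == 1 and guess == 0:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def hamming_loss(y_true: Rows, y_pred: Rows) -> float:
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        truth != guess
        for true_row, pred_row in zip(y_true, y_pred)
        for truth, guess in zip(true_row, pred_row)
    )
    return wrong / total if total else 0.0


# Hypothetical toy data: 3 comments, 3 labels (not from the paper).
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_PRED = [[1, 0, 0], [0, 1, 1], [1, 0, 0]]

print(f"micro F1     = {micro_f1(Y_TRUE, Y_PRED):.4f}")
print(f"Hamming loss = {hamming_loss(Y_TRUE, Y_PRED):.4f}")
```

On this toy data there are 3 true positives, 1 false positive, and 2 false negatives, so the micro-averaged F1 is 6/9, and 3 of the 9 sample-label pairs are wrong, so the Hamming loss is 3/9.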

Study on influencing factors of prediction accuracy of support vector machine (SVM) model for NOx emission of a hydrogen enriched compressed natural gas engine

Fuel ◽

10.1016/j.fuel.2018.07.009 ◽

2018 ◽

Vol 234 ◽

pp. 954-964 ◽

Cited By ~ 8

Author(s):

Hao Duan ◽

Yue Huang ◽

Roopesh Kumar Mehra ◽

Panpan Song ◽

Fanhua Ma

Keyword(s):

Support Vector Machine ◽

Natural Gas ◽

Influencing Factors ◽

Prediction Accuracy ◽

NOx Emission ◽

Support Vector ◽

Compressed Natural Gas ◽

Gas Engine ◽

Natural Gas Engine ◽

SVM Model

Stable Isotope Ratio and Elemental Profile Combined with Support Vector Machine for Provenance Discrimination of Oolong Tea (Wuyi-Rock Tea)

Journal of Analytical Methods in Chemistry ◽

10.1155/2017/5454231 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Yun-xiao Lou ◽

Xian-shu Fu ◽

Xiao-ping Yu ◽

Zi-hong Ye ◽

Hai-feng Cui ◽

...

Keyword(s):

Support Vector Machine ◽

Stable Isotope ◽

Isotope Ratio ◽

Prediction Accuracy ◽

Stable Isotope Ratio ◽

Classification Model ◽

Support Vector ◽

SVM Model ◽

Metallic Element ◽

Instrumental Methods

This paper focuses on an effective method for discriminating the geographical origin of Wuyi-Rock tea using the stable isotope ratio (SIR) and metallic element profiling (MEP) combined with support vector machine (SVM) analysis. Wuyi-Rock tea (n=99) collected from nine producing areas and non-Wuyi-Rock tea (n=33) from eleven nonproducing areas were analysed for SIR and MEP by established methods. The SVM model based on coupled data produced the best prediction accuracy (0.9773). This result shows that instrumental methods combined with a classification model can provide an effective and stable tool for provenance discrimination. Moreover, every feature variable in the stable isotope and metallic element data was ranked by its contribution to the model. The results show that δ²H, δ¹⁸O, Cs, Cu, Ca, and Rb contents are significant indicators for provenance discrimination and that not all of the metallic elements improve the prediction accuracy of the SVM model.

Short-term Load Forecasting Model for Microgrid Based on HSA-SVM

MATEC Web of Conferences ◽

10.1051/matecconf/201817301007 ◽

2018 ◽

Vol 173 ◽

pp. 01007

Author(s):

Han Aoyang ◽

Yu Litao ◽

An Shuhuai ◽

Zhang Zhisheng

Keyword(s):

Support Vector Machine ◽

Prediction Accuracy ◽

Harmony Search ◽

Load Forecasting ◽

Support Vector ◽

Forecasting Model ◽

Short Term ◽

Search Optimization ◽

SVM Model ◽

Short Term Load Forecasting

Short-term load forecasting for microgrids is the basis of research on microgrid scheduling techniques. Accurate load forecasting for a microgrid provides the necessary basis for cooperative optimization scheduling. This paper constructs a short-term load forecasting model for microgrids based on a support vector machine (SVM). The harmony search optimization algorithm (HSA) is used to optimize the parameters of the SVM model because it offers fast convergence and better optimization ability. Simulation and testing on an actual microgrid load system show that the HSA-SVM short-term load forecasting model can effectively improve prediction accuracy.
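
As an illustration of how a harmony search can tune model parameters, the following is a minimal sketch of the textbook form of the algorithm, not the authors' implementation. The objective `toy_validation_error` is a hypothetical stand-in for an SVM validation error over two parameters (assumed here to be log-scaled C and gamma), and all settings (`memory_size`, `hmcr`, `par`, `bandwidth`, `iterations`) are assumed values chosen for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def harmony_search(
    objective: Callable[[List[float]], float],
    bounds: Bounds,
    memory_size: int = 10,
    hmcr: float = 0.9,       # harmony memory considering rate
    par: float = 0.3,        # pitch adjusting rate
    bandwidth: float = 0.1,  # pitch step, as a fraction of each range
    iterations: int = 500,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` inside `bounds` with a basic harmony search."""
    rng = random.Random(seed)
    # Initialise the harmony memory with random candidate solutions.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    scores = [objective(harmony) for harmony in memory]

    for _ in range(iterations):
        candidate = []
        for dim, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Reuse a value from memory, optionally nudging it.
                value = memory[rng.randrange(memory_size)][dim]
                if rng.random() < par:
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Otherwise draw a fresh random value.
                value = rng.uniform(lo, hi)
            candidate.append(min(hi, max(lo, value)))

        candidate_score = objective(candidate)
        worst = max(range(memory_size), key=scores.__getitem__)
        # Replace the worst stored harmony only if the candidate is better.
        if candidate_score < scores[worst]:
            memory[worst], scores[worst] = candidate, candidate_score

    best = min(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_validation_error(params: List[float]) -> float:
    """Hypothetical stand-in for an SVM validation error (not real data)."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]
best_params, best_error = harmony_search(toy_validation_error, BOUNDS)
print(best_params, best_error)
```

In a real setting the objective would train an SVM with the candidate parameters and return its validation error on held-out load data; the search loop itself would stay the same.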

Prediction of Consumption Choices of Low-Income Groups in a Mixed-Income Community Using a Support Vector Machine Method

Sustainability ◽

10.3390/su11143981 ◽

2019 ◽

Vol 11 (14) ◽

pp. 3981 ◽

Cited By ~ 1

Author(s):

Xiaoqian Zu ◽

Yongxiang Wu ◽

Zhenduo Zhang ◽

Lu Yu

Keyword(s):

Support Vector Machine ◽

Low Income ◽

Latent Variables ◽

Prediction Accuracy ◽

Support Vector ◽

Income Groups ◽

Consumption Choices ◽

SVM Model ◽

Low Income Groups ◽

Mixed Income

To examine how cross-strata neighboring behavior in a mixed-income community can influence the consumption choices of individuals in low-income groups, and to improve the prediction accuracy of the consumption choice model of low-income groups for small sample sizes, we developed a support vector machine (SVM) algorithm based on the influence of neighboring behavior. We substituted the predicted latent variables into the SVM classifier and constructed an SVM prediction model with latent variables based on reference group theory. We established the model parameters using cross-validation and used low-income residents from a mixed-income community in Shanghai as study subjects to empirically test the model’s performance. The results show that the SVM selection model with latent variables has good prediction accuracy. The proposed model’s accuracy improved by 1.29% over the particle swarm optimization (PSO)-SVM model without latent variables, and by 19.35% over the SVM model with latent variables. The proposed model can be employed to predict the consumption choices of individuals in low-income groups. This paper offers a theoretical reference for investigating neighboring behavior in a mixed-income community and the consumption choices of individuals in low-income groups, and is of practical importance for urban community planning systems.

Support Vector Machine based Word Embedding and Feature Reduction for Sentiment Analysis-A Study

2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC) ◽

10.1109/iccmc48092.2020.iccmc-00035 ◽

2020 ◽

Author(s):

Prajakta P. Shelke ◽

Ankita N. Korde

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Feature Reduction ◽

Word Embedding ◽

Support Vector

Apakah Youtuber Indonesia Kena Bully Netizen? [Are Indonesian Youtubers Bullied by Netizens?]

Jurnal ULTIMA InfoSys ◽

10.31937/si.v11i2.1764 ◽

2020 ◽

Vol 11 (2) ◽

pp. 130-134

Author(s):

Joviano Siahaan ◽

Wella Wella ◽

Ririn Ikana Desanti

Keyword(s):

Support Vector Machine ◽

Text Mining ◽

Sentiment Analysis ◽

Test Data ◽

Electronic Communication ◽

High Accuracy ◽

Support Vector ◽

Data Cleansing ◽

SVM Model ◽

The Subject

This study examines the cyberbullying experienced by Indonesian Youtubers in their Instagram comment sections. Cyberbullying is the use of electronic communication to bully a person, typically by sending messages of an intimidating or threatening nature. Youtubers are the subject of this research because of their massive following, which constantly responds to every piece of content posted on their Instagram pages. The Support Vector Machine (SVM) algorithm was chosen for this sentiment analysis because of its high accuracy. The data used in this analysis was retrieved from 10 Indonesian Youtuber Instagram accounts. To analyze this data, several steps were carried out, including text mining, data cleansing, data modeling, and applying the model to test data. The SVM model, which achieved an accuracy of 81.2%, classified 49.524% of the comments in Indonesian Youtubers' comment sections as cyberbullying.

Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

SVM Model ◽

Artificial Neural Network (ANN) ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results indicate that the optimized SVM model performed better than the other models in predicting concrete strength. Based on the SVM model, a simulation method of variable limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
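
The K-Fold cross-validation step mentioned in this abstract can be sketched as follows. This is a minimal, unshuffled splitter written for illustration (in practice the samples are usually shuffled first), and the "model" is a deliberately trivial mean predictor standing in for the SVM. The function names and the strength values are hypothetical, not taken from the paper.

```python
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_validated_mae(targets: Sequence[float], k: int) -> float:
    """Mean absolute error of a mean predictor, averaged over k folds."""
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(targets), k):
        # "Fit": the placeholder model just memorises the training mean.
        prediction = sum(targets[i] for i in train_idx) / len(train_idx)
        # "Score": evaluate only on the held-out fold.
        error = sum(abs(targets[i] - prediction) for i in val_idx) / len(val_idx)
        fold_errors.append(error)
    return sum(fold_errors) / k


# Hypothetical compressive-strength values in MPa (illustrative only).
STRENGTHS = [42.0, 40.5, 39.0, 37.2, 36.1, 35.0, 33.8, 32.5, 31.9, 30.4]
print(cross_validated_mae(STRENGTHS, k=5))
```

Every sample is used for validation exactly once, so the averaged error reflects performance on data the model did not see during fitting.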

Algorithm Comparation of Naive Bayes and Support Vector Machine based on Particle Swarm Optimization in Sentiment Analysis of Freight Forwarding Services

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1840 ◽

2020 ◽

Vol 4 (2) ◽

pp. 362-369

Author(s):

Sharazita Dyah Anggita ◽

Ikmah

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

The Public ◽

SVM Algorithm ◽

Bayes Algorithm ◽

Freight Forwarding ◽

Improved Accuracy

Community demand for freight forwarding services is increasing with the growth of the marketplace. The public currently shares opinions about freight forwarding services through many channels, one of which is the social media platform Twitter. Sentiment analysis makes it possible to see whether an opinion has a positive or negative tendency. Methods that can be applied to sentiment analysis include the Naive Bayes algorithm and the Support Vector Machine (SVM). This research implements both algorithms, optimized with the Particle Swarm Optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm, and an accuracy improvement of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327910999200528114552 ◽

2020 ◽

Vol 10 (4) ◽

pp. 582-593

Author(s):

Midde Venkateswarlu Naik ◽

D. Vasumathi ◽

A.P. Siva Kumar

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Global Warming ◽

Particle Swarm Optimization ◽

Sentiment Analysis ◽

Optimization Technique ◽

Particle Swarm ◽

Sentiment Classification ◽

Support Vector ◽

Swarm Optimization

Aims: This work proposes an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role in helping people extract valid decision points about subjects such as movie ratings, educational institutes, or politics. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification through a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have reported sentiment classifier accuracy of up to 94% on English text. In the proposed scheme, the average accuracy of the sentiment classifier on different datasets reached 99% by tuning SVM parameters, such as the constant C and the kernel gamma value, in association with the PSO optimization technique. The method uses three publicly available datasets: airline sentiment, weather, and global warming. The results were trained and tested using 10-fold cross-validation (FCV) and a confusion matrix to estimate sentiment classifier accuracy.

Background: Sentiment analysis (SA), or opinion mining, has attracted scientific interest as a research domain. A key area is sentiment classification on semi-structured or unstructured data in different languages, which has become a major research topic. User-generated content (UGC) from different sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media provides substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about a specific entity. SA is a computational analysis that determines the actual opinion about an entity as expressed in text. It is also described as the computation of emotional polarity expressed over social media as natural text in various languages. Usually, the best automatic sentiment classifier model depends on the feature selection and classification algorithms.

Methods: The proposed work uses a support vector machine for classification and particle swarm optimization for feature selection. Various permutations and combinations of parameters are tuned, with and without the kernel technique, to obtain the desired sentiment classification results on three datasets (airline, global warming, and weather sentiment) that are freely hosted for research.

Results: The proposed method achieved an average accuracy of 99.2% in classifying sentiment on the different datasets, outperforming other machine learning techniques. This high accuracy in classifying sentiment or opinion in review text demonstrates greater effectiveness than existing sentiment classifiers.

Conclusion: Sentiment classifier accuracy, the objective of this research, was increased with a kernel-based Support Vector Machine (SVM) using parameter optimization. The optimal features for classifying sentiment or opinion in review documents were determined with a particle swarm optimization approach. The results were simulated on three freely available datasets: airline sentiment data, weather sentiment data, and global warming data.
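
The statistical measures listed in this abstract (precision, recall, sensitivity, specificity, accuracy) all derive from a binary confusion matrix, as the following minimal sketch shows. The labels and predictions are hypothetical toy values, and the helper names are chosen here for illustration; none of this comes from the paper's experiments.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count TP, FP, TN, FN for one positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summary_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the usual rates from confusion-matrix counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    recall = _ratio(tp, tp + fn)
    return {
        "precision": _ratio(tp, tp + fp),
        "recall": recall,
        "sensitivity": recall,  # sensitivity is another name for recall
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical toy labels: 1 = positive sentiment, 0 = negative sentiment.
TRUE_LABELS = [1, 1, 1, 0, 0, 0, 1, 0]
PREDICTIONS = [1, 0, 1, 0, 1, 0, 1, 1]

COUNTS = confusion_counts(TRUE_LABELS, PREDICTIONS)
METRICS = summary_metrics(COUNTS)
print(COUNTS)
print(METRICS)
```

Here the counts are 3 true positives, 2 false positives, 2 true negatives, and 1 false negative, giving precision 0.6, recall and sensitivity 0.75, specificity 0.5, and accuracy 0.625.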
