Opinion mining on newspaper headlines using SVM and NLP

Opinion mining, also known as sentiment analysis, is a technique that uses natural language processing (NLP) to classify text by the opinion it expresses. Various NLP tools are available for processing text data. Many studies have addressed opinion mining for online blogs, Twitter, Facebook, etc. This paper proposes a new opinion mining technique that applies Support Vector Machine (SVM) and NLP tools to newspaper headlines. Relative words are generated using Stanford CoreNLP and passed to the SVM using a count vectorizer. A comparison of three models using confusion matrices indicates that Tf-idf with linear SVM provides better accuracy on the smaller dataset, while on the larger dataset the SGD and linear SVM model outperforms the other models.
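The kind of comparison described in this abstract can be sketched with scikit-learn. The headlines, labels, and model names below are invented for illustration, the Stanford CoreNLP step is left out, and the models are scored on their own training data purely for brevity. It is a minimal sketch of pairing Tf-idf features with a linear SVM and with an SGD-trained linear SVM, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy headlines (1 = positive, 0 = negative); not from the paper.
headlines = [
    "Markets rally as economy shows strong growth",
    "Local team wins championship in thrilling final",
    "New hospital opens, bringing hope to rural families",
    "Scientists celebrate breakthrough in clean energy",
    "Floods destroy homes and displace thousands",
    "Company collapses amid fraud scandal",
    "Violent storm kills dozens in coastal towns",
    "Unemployment rises as factories shut down",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Two candidate models: both use Tf-idf features and a linear SVM objective,
# but the second is trained with stochastic gradient descent (hinge loss).
models = {
    "tfidf_linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "tfidf_sgd_svm": make_pipeline(
        TfidfVectorizer(),
        SGDClassifier(loss="hinge", random_state=0),
    ),
}

matrices = {}
for name, model in models.items():
    model.fit(headlines, labels)
    # Scored on the training data only to keep the sketch short.
    predictions = model.predict(headlines)
    matrices[name] = confusion_matrix(labels, predictions, labels=[0, 1])
    print(name)
    print(matrices[name])
```

In a real comparison the confusion matrices would be computed on held-out headlines rather than on the training set.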


Klasifikasi Laporan Keluhan Pelayanan Publik Berdasarkan Instansi Menggunakan Metode LDA-SVM [Classification of Public Service Complaint Reports by Agency Using the LDA-SVM Method]

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021863768 ◽

2021 ◽

Vol 8 (6) ◽

pp. 1265

Author(s):

Muhammad Alkaff ◽

Andreyan Rizky Baskara ◽

Irham Maulani

Keyword(s):

Support Vector Machine ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Confusion Matrix ◽

Support Vector ◽

Topic Distribution ◽

The Government ◽

Dirichlet Allocation

A service system named Lapor! conveys the public's aspirations and complaints about Indonesian government services. The Government has long used this system to respond to the Indonesian people's problems with bureaucracy. However, the increasing volume of reports, together with the fact that operators sort them by reading every complaint that comes through the system, causes frequent errors in which operators forward reports to the wrong agencies. Therefore, a solution is needed that can automatically determine a report's context using Natural Language Processing techniques. This study aims to build automatic classification of reports, based on their topics, to the authorized agencies by combining Latent Dirichlet Allocation (LDA) and Support Vector Machine (SVM). Topic modeling for each report is carried out using the LDA method, which extracts reports to find specific patterns in the documents and outputs topic distribution values. The classification that determines each report's destination agency is then carried out using the SVM, based on the topic values extracted by the LDA method. The LDA-SVM model's performance is measured using a confusion matrix by calculating accuracy, precision, recall, and F1 score. Test results using a train-test split with a 70:30 ratio show that the model performs well, with 79.85% accuracy, 79.98% precision, 72.37% recall, and a 74.67% F1 score.
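The LDA-SVM pipeline in this abstract (word counts, then LDA topic distributions, then an SVM over those distributions, evaluated on a 70:30 split) can be sketched as follows with scikit-learn. The complaint texts, agency names, number of topics, linear kernel, and macro averaging are all assumptions made for illustration; the abstract does not state them.

```python
from itertools import combinations

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical complaint vocabulary for two invented agencies.
road_terms = ["road", "pothole", "bridge", "traffic", "streetlight"]
health_terms = ["clinic", "doctor", "vaccine", "hospital", "medicine"]

reports, agencies = [], []
for agency, terms in [("public_works", road_terms), ("health", health_terms)]:
    for first, second in combinations(terms, 2):
        reports.append(f"complaint about the {first} and the {second}")
        agencies.append(agency)

# 70:30 train-test split, stratified so both agencies appear in each part.
train_reports, test_reports, train_agencies, test_agencies = train_test_split(
    reports, agencies, test_size=0.3, random_state=0, stratify=agencies
)

# Word counts -> LDA topic distribution -> SVM classifier.
model = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),
    SVC(kernel="linear"),
)
model.fit(train_reports, train_agencies)

# Topic distribution values that the SVM receives as features.
topic_values = model[:-1].transform(test_reports)
predicted = model.predict(test_reports)

scores = {
    "accuracy": accuracy_score(test_agencies, predicted),
    "precision": precision_score(
        test_agencies, predicted, average="macro", zero_division=0
    ),
    "recall": recall_score(
        test_agencies, predicted, average="macro", zero_division=0
    ),
    "f1": f1_score(test_agencies, predicted, average="macro", zero_division=0),
}
print(scores)
```

Each row of `topic_values` sums to 1, which is what makes it usable as a fixed-length feature vector regardless of report length.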


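Analisis Sentimen Sistem Ganjil Genap di Tol Bekasi Menggunakan Algoritma Support Vector Machine [Sentiment Analysis of the Odd-Even System on the Bekasi Toll Road Using the Support Vector Machine Algorithm]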

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v3i2.1050 ◽

2019 ◽

Vol 3 (2) ◽

pp. 243-250

Author(s):

Heru Sukma Utama ◽

Didi Rosiyadi ◽

Bobby Suryo Prakoso ◽

Dedi Ariadarma

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Opinion Mining ◽

Confusion Matrix ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

Toll Road ◽

Svm Algorithm ◽

Svm Model ◽

Textual Data

Sentiment analysis of the odd-even system on the Bekasi toll road using the Support Vector Machine algorithm is a process of automatically understanding, extracting, and processing textual data from social media. The purpose of this study was to determine the accuracy, recall, and precision of opinion mining with the Support Vector Machine algorithm, in order to provide information on community sentiment on social media toward the effectiveness of the odd-even system on the Bekasi toll road. The research method was text mining of comments on posts about the odd-even system on the Bekasi toll road on Twitter, Instagram, YouTube, and Facebook. The steps were preprocessing, transformation, data mining, and evaluation, followed by information gain feature selection, selection by weight, and application of the SVM model. The confusion matrix for the SVM model gave an accuracy of 78.18%, a precision of 74.03%, and a sensitivity (recall) of 86.82%. The study concludes that the Support Vector Machine algorithm can be used to analyze sentiment about the odd-even system on the Bekasi toll road.
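Accuracy, precision, and recall are all read off the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and are not the study's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the positive-sentiment class.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all comments labelled correctly
        "precision": tp / (tp + fp),     # share of predicted positives that are right
        "recall": tp / (tp + fn),        # share of actual positives that are found
    }


# Hypothetical counts chosen for easy arithmetic; not taken from the paper.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(metrics)
```

A recall that is higher than the precision, as reported in the abstract, means the model misses few positive comments but labels some negative ones as positive.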


An Extension of the VSM Documents Representation using Word Embedding

Balkan Region Conference on Engineering and Business Education ◽

10.1515/cplbu-2017-0033 ◽

2017 ◽

Vol 2 (1) ◽

pp. 249-257

Author(s):

Daniel Morariu ◽

Lucian Vințan ◽

Radu Crețulescu

Keyword(s):

Support Vector Machine ◽

Natural Language Processing ◽

Language Processing ◽

Learning Algorithm ◽

Word Embedding ◽

Support Vector ◽

Science Students ◽

Document Representation ◽

Learning Time ◽

Educational Paradigm

In this paper, we present experiments that try to integrate the power of the Word Embedding (WE) representation into real document classification problems. Word Embedding is a recent trend in natural language processing that represents each word of a document as a vector. This representation embeds the semantic context in which the word occurs most frequently. We include this new representation in a classical VSM document representation and evaluate it using a learning algorithm based on the Support Vector Machine. The added information makes classification more difficult because it increases the learning time and the memory needed. The results obtained are slightly weaker than those of the classical VSM document representation. By adding the WE representation to the classical VSM representation, we want to improve the current educational paradigm for computer science students, which is generally limited to the VSM representation.
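One simple way to extend a VSM vector with word-embedding information is to append the mean embedding of the document's words to its term-frequency vector. The abstract does not say how the two representations were combined, so the averaging and concatenation below, along with the tiny 3-dimensional embeddings, are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 3-dimensional word embeddings; real ones have far more dimensions.
embeddings = {
    "svm": np.array([0.9, 0.1, 0.0]),
    "kernel": np.array([0.8, 0.2, 0.1]),
    "football": np.array([0.0, 0.9, 0.3]),
    "goal": np.array([0.1, 0.8, 0.4]),
}
vocabulary = sorted(embeddings)  # fixed column order for the VSM part
embedding_size = 3


def vsm_vector(tokens):
    """Classical VSM part: one term-frequency entry per vocabulary word."""
    return np.array([tokens.count(word) for word in vocabulary], dtype=float)


def mean_embedding(tokens):
    """Word-embedding part: average of the vectors of the known words."""
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return np.zeros(embedding_size)
    return np.mean(known, axis=0)


def extended_vector(tokens):
    """Concatenate the VSM vector with the averaged word embedding."""
    return np.concatenate([vsm_vector(tokens), mean_embedding(tokens)])


document = "svm kernel svm".split()
vector = extended_vector(document)
print(vector)
```

The extended vector is longer than the plain VSM vector, which is consistent with the abstract's remark that the added information increases learning time and memory use.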


Human-Centered A.I. and Security Primitives

Journal of Computer Science Research ◽

10.30564/jcsr.v2i4.2534 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Alex Mathew

Keyword(s):

Artificial Intelligence ◽

Support Vector Machine ◽

Natural Language Processing ◽

Speech Recognition ◽

Mathematical Models ◽

Image Recognition ◽

Language Processing ◽

Modern World ◽

Support Vector ◽

Answering Questions

The paper reviews how human-centered artificial intelligence and security primitives have influenced life in the modern world and how they will be useful in the future. Human-centered A.I. has enhanced our capabilities by way of intelligent, human-informed technology. It has created technology that lets machines and computers carry out their functions intelligently. Security primitives have enhanced the safety of data and increased the accessibility of data from anywhere, regardless of whether the password is known. This has improved personalized customer activities and closed the gap between human and machine. This success is due to the use of heuristics, which solve problems experimentally; support vector machines, which evaluate and group data; and natural language processing systems, which change speech to language. The results will lead to image recognition, games, speech recognition, translation, and question answering. In conclusion, human-centered A.I. and security primitives are an advanced mode of technology that uses statistical mathematical models to provide tools for performing certain work. The results keep advancing and spreading over the years, and they will become common in our lives.


Text classification based on gated recurrent unit combines with support vector machine

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i4.pp3734-3742 ◽

2020 ◽

Vol 10 (4) ◽

pp. 3734

Author(s):

Muhammad Zulqarnain ◽

Rozaida Ghazali ◽

Yana Mazwin Mohmad Hassim ◽

Muhammad Rehan

Keyword(s):

Support Vector Machine ◽

Language Processing ◽

Text Classification ◽

Classification Model ◽

Support Vector ◽

Text Data ◽

Intelligent Technique ◽

Learning Tasks ◽

The One ◽

Gated Recurrent Unit

As humanity produces a large and growing amount of unstructured text data on the Internet, intelligent techniques are required to process it and extract different types of knowledge from it. Gated recurrent unit (GRU) and support vector machine (SVM) have been successfully applied to Natural Language Processing (NLP) systems with comparatively remarkable results. GRU networks perform well in sequential learning tasks and overcome the issues of “vanishing and explosion of gradients” in standard recurrent neural networks (RNNs) when capturing long-term dependencies. In this paper, we propose a text classification model that improves on this norm by using a linear support vector machine (SVM) as the replacement for Softmax in the final output layer of a GRU model. Furthermore, the cross-entropy function is replaced with a margin-based function. Empirical results show that the proposed GRU-SVM model achieves comparatively better results than the baseline approaches BLSTM-C and DABN.
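The swap described here, from Softmax with cross-entropy to a linear SVM with a margin-based loss, can be illustrated on a single vector of output-layer scores. The abstract does not say which margin-based function was used; the multiclass hinge loss below is one common choice and is an assumption, as are the example scores.

```python
import numpy as np


def cross_entropy_loss(scores, target):
    """Softmax followed by cross-entropy for one example."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target])


def multiclass_hinge_loss(scores, target, margin=1.0):
    """Margin-based loss: penalize every wrong class whose score comes
    within `margin` of the score of the correct class."""
    margins = np.maximum(0.0, scores - scores[target] + margin)
    margins[target] = 0.0  # the correct class does not compete with itself
    return float(margins.sum())


# Hypothetical scores from the final layer of a GRU for three classes.
scores = np.array([2.0, 0.5, -1.0])

for target in range(3):
    print(
        target,
        round(cross_entropy_loss(scores, target), 4),
        multiclass_hinge_loss(scores, target),
    )
```

The hinge loss is exactly zero once the correct class leads by the full margin, whereas cross-entropy stays positive, so the two objectives push the network differently on examples that are already classified correctly.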


Natural Language Processing and Machine Learning Classifier used for Detecting the Author of the Sentence

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4098.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 936-939 ◽

Cited By ~ 6

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Support Vector ◽

Learning Classifier ◽

Number Of Classes

The author of a sentence in a collective document can be detected by choosing a suitable set of features and applying Natural Language Processing together with Machine Learning. The basic idea is to train a machine to identify the name of the author of a specific sentence. This is done using 8 different NLP steps, such as applying a stemming algorithm, finding stop-list words, and preprocessing the data, and then passing the result to a machine learning classifier, a Support Vector Machine (SVM), which classifies the dataset into a number of classes, each specifying an author, and names the author of each sentence with an accuracy of 82%. This paper helps readers who are interested in knowing the names of the authors who have written some specific words.


ANALISIS SENTIMEN KOMENTAR FACEBOOK BERBASIS LEXICON DAN SUPPORT VECTOR MACHINE [Lexicon- and Support Vector Machine-Based Sentiment Analysis of Facebook Comments]

SAINTEKBU ◽

10.32764/saintekbu.v12i2.855 ◽

2020 ◽

Vol 12 (2) ◽

pp. 40-44

Author(s):

Iin Kurniasari

Keyword(s):

Support Vector Machine ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Support Vector

Facebook is one of the most frequently used social media platforms, especially during the current COVID-19 pandemic. A great deal of public sentiment circulates, particularly on Facebook, in the form of comments on information about COVID-19, and it is challenging to analyze for several purposes. NLP (Natural Language Processing) techniques consisting of case folding, tokenizing, filtering, and stemming can be used in this case. This study focuses on developing sentiment analysis on Facebook using a lexicon and Support Vector Machine. The lexicon-based approach obtained lower accuracy than the Support Vector Machine.
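The four preprocessing steps named in this abstract (case folding, tokenizing, filtering, and stemming) followed by lexicon scoring can be sketched in plain Python. The stop list, suffix rules, lexicon, and English example comments are all hypothetical stand-ins; the study worked with Facebook comments and its own resources, which the abstract does not list.

```python
import re

# Hypothetical resources, kept tiny for illustration.
STOP_WORDS = {"the", "is", "a", "and", "this", "very", "are", "on"}
LEXICON = {"good": 1, "help": 1, "bad": -1, "frighten": -1}
SUFFIXES = ("ing", "ful", "ed", "s")


def case_fold(text):
    """Case folding: lower-case the whole comment."""
    return text.lower()


def tokenize(text):
    """Tokenizing: split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text)


def filter_tokens(tokens):
    """Filtering: drop stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def stem(token):
    """Stemming: strip one known suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lexicon_sentiment(text):
    """Sum lexicon polarities of the stemmed tokens and map to a label."""
    tokens = [stem(token) for token in filter_tokens(tokenize(case_fold(text)))]
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("This vaccine news is very helpful and good"))
print(lexicon_sentiment("The rumors are FRIGHTENING"))
print(lexicon_sentiment("Schools reopen on Monday"))
```

A lexicon scorer like this only recognizes words it already lists, which is one plausible reason a trained classifier such as an SVM can reach higher accuracy on the same comments.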


Classification of Current Procedural Terminology Codes from Electronic Health Record Data Using Machine Learning

Anesthesiology ◽

10.1097/aln.0000000000003150 ◽

2020 ◽

Vol 132 (4) ◽

pp. 738-749 ◽

Cited By ~ 1

Author(s):

Michael L. Burns ◽

Michael R. Mathis ◽

John Vandervest ◽

Xinyu Tan ◽

Bo Lu ◽

...

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Quality Improvement ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Support Vector ◽

Current Procedural Terminology ◽

Test Dataset ◽

Quality Improvement Research

Background: Accurate anesthesiology procedure code data are essential to quality improvement, research, and reimbursement tasks within anesthesiology practices. Advanced data science techniques, including machine learning and natural language processing, offer opportunities to develop classification tools for Current Procedural Terminology codes across anesthesia procedures.

Methods: Models were created using a Train/Test dataset including 1,164,343 procedures from 16 academic and private hospitals. Five supervised machine learning models were created to classify anesthesiology Current Procedural Terminology codes, with accuracy defined as first choice classification matching the institutional-assigned code existing in the perioperative database. The two best performing models were further refined and tested on a Holdout dataset from a single institution distinct from Train/Test. A tunable confidence parameter was created to identify cases for which models were highly accurate, with the goal of at least 95% accuracy, above the reported 2018 Centers for Medicare and Medicaid Services (Baltimore, Maryland) fee-for-service accuracy. Actual submitted claim data from billing specialists were used as a reference standard.

Results: Support vector machine and neural network label-embedding attentive models were the best performing models, respectively, demonstrating overall accuracies of 87.9% and 84.2% (single best code), and 96.8% and 94.0% (within top three). Classification accuracy was 96.4% in 47.0% of cases using support vector machine and 94.4% in 62.2% of cases using label-embedding attentive model within the Train/Test dataset. In the Holdout dataset, respective classification accuracies were 93.1% in 58.0% of cases and 95.0% among 62.0%. The most important feature in model training was procedure text.

Conclusions: Through application of machine learning and natural language processing techniques, highly accurate real-time models were created for anesthesiology Current Procedural Terminology code classification. The increased processing speed and a priori targeted accuracy of this classification approach may provide performance optimization and cost reduction for quality improvement, research, and reimbursement tasks reliant on anesthesiology procedure codes.
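The abstract reports results in the form "accuracy X% in Y% of cases" and "within top three", which suggests trading coverage for accuracy with a confidence cut-off. How the authors defined their confidence parameter is not stated here; the sketch below assumes it is a threshold on the model's highest predicted probability, and the probabilities and labels are hypothetical.

```python
import numpy as np


def accuracy_at_confidence(probabilities, true_labels, threshold):
    """Return (coverage, accuracy) over the cases whose top predicted
    probability is at least `threshold`; accuracy is None if no case qualifies."""
    probabilities = np.asarray(probabilities, dtype=float)
    true_labels = np.asarray(true_labels)
    predicted = probabilities.argmax(axis=1)
    confidence = probabilities.max(axis=1)
    covered = confidence >= threshold
    coverage = float(covered.mean())
    if not covered.any():
        return coverage, None
    accuracy = float((predicted[covered] == true_labels[covered]).mean())
    return coverage, accuracy


def top_k_accuracy(probabilities, true_labels, k):
    """Share of cases whose true code is among the k highest-scored codes."""
    probabilities = np.asarray(probabilities, dtype=float)
    top_k = np.argsort(probabilities, axis=1)[:, -k:]
    hits = [label in row for label, row in zip(true_labels, top_k)]
    return float(np.mean(hits))


# Hypothetical predicted probabilities over three codes for four procedures.
probabilities = [
    [0.90, 0.05, 0.05],
    [0.50, 0.30, 0.20],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
]
true_labels = [0, 1, 1, 2]

print(accuracy_at_confidence(probabilities, true_labels, threshold=0.6))
print(top_k_accuracy(probabilities, true_labels, k=2))
```

Raising the threshold shrinks the share of cases the model handles automatically while raising accuracy on that share; the remaining cases would go to manual review.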


Sarcasm Detection in Text Data Using Glove Embedding

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37663 ◽

2021 ◽

Vol 9 (8) ◽

pp. 2495-2499

Author(s):

Samrudhi Naik

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Processing Technique ◽

Text Data ◽

Natural Language Processing Technique

Abstract: Sarcasm is a way of expressing feelings in which people say or write something completely different from, or opposite to, what they actually mean. Hence it is very difficult to identify sarcasm. It is usually an ironic or satirical remark tempered by humor. People mainly use it to say the opposite of what is true in order to make someone look or feel foolish. Understanding sarcasm can improve the accuracy of sentiment analysis. Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative, or neutral. This helps in identifying the opinions of users, individuals, or society. In this project, an attempt is made to develop a model that detects whether or not a sentence is sarcastic. Keywords: Sarcasm detection, GloVe Embedding, LSTM, Natural Language Processing, Sentiment
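The keywords mention GloVe embeddings feeding an LSTM. A typical first step in such a setup is turning a GloVe text file (one word per line followed by its vector components, separated by spaces) into an embedding matrix indexed by the model's vocabulary. The sketch below shows only that step; the in-memory "file", the 3-dimensional vectors, and the vocabulary are hypothetical, and the abstract does not describe the project's actual code.

```python
import io

import numpy as np


def load_glove(handle):
    """Parse GloVe text format: a word, then its vector components, per line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = np.array([float(value) for value in parts[1:]])
    return vectors


def build_embedding_matrix(vocabulary, vectors, dimension):
    """Row 0 is reserved for padding; words without a vector stay all-zero."""
    word_index = {word: position + 1 for position, word in enumerate(vocabulary)}
    matrix = np.zeros((len(vocabulary) + 1, dimension))
    for word, row in word_index.items():
        if word in vectors:
            matrix[row] = vectors[word]
    return word_index, matrix


# Hypothetical stand-in for a GloVe file; real vectors have 50 to 300 dimensions.
fake_glove_file = io.StringIO("great 0.1 0.2 0.3\noh 0.0 0.5 0.1\n")
glove_vectors = load_glove(fake_glove_file)

vocabulary = ["oh", "great", "sarcasm"]
word_index, embedding_matrix = build_embedding_matrix(
    vocabulary, glove_vectors, dimension=3
)
print(word_index)
print(embedding_matrix)
```

The resulting matrix would typically initialize the embedding layer that precedes the LSTM, with each sentence encoded as a sequence of row indices.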


The Applications of Support Vector Machine in Natural Language Processing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.427-429.2572 ◽

2013 ◽

Vol 427-429 ◽

pp. 2572-2575

Author(s):

Xiao Hua Li ◽

Shu Xian Liu

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Basic Knowledge ◽

Support Vector

This article provides a brief introduction to Natural Language Processing and basic knowledge of Machine Learning and Support Vector Machine at first, and then, gives a more detailed introduction about how to use SVM models in several major directions about NLP, and at the end, a brief summary about the application of SVM in Natural Language Processing is given.
