Development of a Text Preprocessing Module for Indonesian-Language Formalization and Spell-Checking Cases in the Web Mining Simple Solution (WMSS) Application

Abstract: Social media data is now widely used for sentiment analysis and other related analyses. In practice, however, data collected from social media usually contains errors that affect the analysis results: the use of non-standard (informal) words and spelling errors in written words. The proposed solution is word formalization combined with spell checking, implemented as a preprocessing module that handles both kinds of error. Formalization converts words into their formal form based on KBBI (Kamus Besar Bahasa Indonesia, the official Indonesian dictionary), while spell checking uses spelling correction with three methods: edit distance, bigram, and edit distance + rule. Besides applying the formalization and spelling correction methods, the study compares the results of the three spelling correction methods. Edit distance + rule achieves the highest accuracy, 83.39%, compared with edit distance and bigram, and it is also the fastest of the three. Overall, KBBI-based formalization and spelling correction address both problems and can therefore improve the accuracy of the analysis results.

Keywords: preprocessing, spelling correction, edit distance, bigram
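
The two preprocessing steps can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical informal-to-formal map and word list standing in for the KBBI-based resources (the paper's actual data is not shown), uses Levenshtein distance for the edit distance method, and assumes the bigram method scores candidates by character-bigram Dice similarity, which may differ from the paper's bigram formulation. The edit distance + rule variant is omitted because the abstract does not describe its rules.

```python
# Hypothetical stand-ins for the KBBI-based resources; NOT the paper's data.
FORMAL_MAP = {"gak": "tidak", "udah": "sudah", "yg": "yang"}  # informal -> formal
LEXICON = {"tidak", "sudah", "yang", "saya", "makan", "nasi", "rumah", "pergi"}


def formalize(token: str) -> str:
    """Replace an informal word with its formal form when the map has one."""
    return FORMAL_MAP.get(token, token)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the sets of adjacent character pairs (assumed variant)."""
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ba and not bb:
        return 1.0 if a == b else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))


def correct_by_edit_distance(token: str) -> str:
    if token in LEXICON:
        return token
    # Iterate in sorted order so ties resolve deterministically.
    return min(sorted(LEXICON), key=lambda w: edit_distance(token, w))


def correct_by_bigram(token: str) -> str:
    if token in LEXICON:
        return token
    return max(sorted(LEXICON), key=lambda w: bigram_similarity(token, w))


def preprocess(text: str, corrector=correct_by_edit_distance) -> list[str]:
    """Lowercase, split on whitespace, formalize, then spell-correct each token."""
    return [corrector(formalize(t)) for t in text.lower().split()]


print(preprocess("saya gak makn nasi"))  # ['saya', 'tidak', 'makan', 'nasi']
```

Character-bigram similarity is weak on transpositions in this sketch: "nsai" shares no bigram with "nasi" even though their edit distance is only 2, which is one way the methods can rank candidates differently.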

Jurnal Matematika Statistika dan Komputasi ◽

10.20956/jmsk.v15i2.5574 ◽

2018 ◽

Vol 15 (2) ◽

pp. 92

Author(s):

Umi Chuzaimah Chuzaimah Zulkifli

Keyword(s):

Web Mining ◽

Edit Distance ◽

Simple Solution ◽

Spelling Correction ◽

Bahasa Indonesia

Using Social Media in Tourist Sentiment Analysis: A Case Study of Andalusia during the Covid-19 Pandemic

Sustainability ◽

10.3390/su13073836 ◽

2021 ◽

Vol 13 (7) ◽

pp. 3836

Author(s):

David Flores-Ruiz ◽

Adolfo Elizondo-Salto ◽

María de la O. Barroso-González

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Cost Savings ◽

Mass Scale ◽

Tourism Sector ◽

The Social ◽

Analytical Tools ◽

Media Data

This paper explores the role of social media in tourist sentiment analysis. To do this, it describes previous studies that have carried out tourist sentiment analysis using social media data, before analyzing changes in tourists’ sentiments and behaviors during the COVID-19 pandemic. In the case study, which focuses on Andalusia, the changes experienced by the tourism sector in the southern Spanish region as a result of the COVID-19 pandemic are assessed using the Andalusian Tourism Situation Survey (ECTA). This information is then compared with data obtained from a sentiment analysis based on the social network Twitter. On the basis of this comparative analysis, the paper concludes that it is possible to identify and classify tourists’ perceptions using sentiment analysis on a mass scale with the help of statistical software (RStudio and Knime). The sentiment analysis using Twitter data correlates with and is supplemented by information from the ECTA survey, with both analyses showing that tourists placed greater value on safety and preferred to travel individually to nearby, less crowded destinations since the pandemic began. Of the two analytical tools, sentiment analysis can be carried out on social media on a continuous basis and offers cost savings.

Polarity Classification of Arabic Sentiments

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016070103 ◽

2016 ◽

Vol 11 (3) ◽

pp. 32-49 ◽

Cited By ~ 5

Author(s):

Mohammed N. Al-Kabi ◽

Heider A. Wahsheh ◽

Izzat M. Alsmadi

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Operating Characteristic ◽

Opinion Mining ◽

Online Social Network ◽

The Social ◽

Polarity Classification ◽

Arabic Sentiment Analysis ◽

Modern Standard

Sentiment analysis (opinion mining) is associated with social media and usually aims to automatically identify the polarity of social media users' points of view on different aspects of life. The polarity of a sentiment reflects its author's point of view on a certain issue. This study presents a new method, called Detection of Arabic Sentiment Analysis Polarity (DASAP), to identify the polarity of Arabic reviews and comments, whether they are written in Modern Standard Arabic (MSA) or in an Arabic dialect, and whether or not they include emoticons. A modest dataset of Arabic comments, posts, and reviews was collected from online social network websites (Facebook, blogs, YouTube, and Twitter) and used to evaluate the effectiveness of DASAP with Receiver Operating Characteristic (ROC) prediction quality measurements.
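
ROC-based evaluation sweeps a decision threshold over a classifier's scores and plots the true positive rate against the false positive rate; the area under that curve (AUC) summarizes it in one number. The sketch below is a generic illustration with made-up labels and scores, not the DASAP method or its data.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points from sweeping a threshold down through the scores.

    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Examples tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # 0.8889
```

The trapezoidal AUC here equals the fraction of positive/negative pairs that the scores rank correctly (8 of 9 pairs in the example).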

The Effects of Different Kernels in SVM Sentiment Analysis on Mass Social Distancing

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2020.v09.i02.p01 ◽

2020 ◽

Vol 9 (2) ◽

pp. 161

Author(s):

Komang Dhiyo Yonatha Wijaya ◽

Anak Agung Istri Ngurah Eka Karyawati

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Kernel Method ◽

Idle Time ◽

Linear Kernel ◽

Social Distancing ◽

The Social ◽

Negative Sentiment ◽

F Measure ◽

Kernel Yield

During the pandemic, social media has become a major means of communication. One widely used platform is Twitter, whose messages are called tweets. Indonesia is currently undergoing mass social distancing, and during this time most people use social media to spend their idle time. However, this sometimes results in negative sentiment used to insult an individual or group. To filter such tweets, sentiment analysis was performed with an SVM using three different kernels. Tweets were labeled into three classes: positive, neutral, and negative. The experiments were conducted to determine which kernel performs better. In the sentiment analysis performed, the SVM with a linear kernel yielded the best score; some experiments show that the linear kernel reaches a precision of 57%, a recall of 50%, and an F-measure of 44%.
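
The abstract does not name the three kernels; linear, polynomial, and RBF (Gaussian) kernels are common choices and are assumed here. This sketch shows only the kernel functions and the binary kernel SVM decision function f(x) = Σ_i α_i y_i K(x_i, x) + b. The support vectors and coefficients in the usage lines are hypothetical, and a three-class task would combine several such binary decision functions (for example, one-vs-rest).

```python
import math


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, dual_coefs, bias, kernel):
    """Binary kernel SVM score; dual_coefs[i] holds alpha_i * y_i.

    A positive score means class +1, a negative score means class -1.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical "trained" values, only to show how the pieces fit together.
svs = [[1.0, 0.0], [0.0, 1.0]]
coefs = [0.5, -0.5]
for k in (linear_kernel, polynomial_kernel, rbf_kernel):
    print(k.__name__, decision_function([2.0, 1.0], svs, coefs, 0.1, k))
```

Swapping the `kernel` argument is all that changes between the three variants, which is what makes a per-kernel comparison like the one in the abstract straightforward to set up.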

Sentiment Analysis for the Social Media

Proceedings of the SouthEast Conference (ACM SE '17) ◽

10.1145/3077286.3077569 ◽

2017 ◽

Cited By ~ 2

Author(s):

Elif Uysal ◽

Semih Yumusak ◽

Kasim Oztoprak ◽

Erdogan Dogdu

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

The Social

Reviewing Sentiment Analysis at the Shallow End

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.84.8274 ◽

2020 ◽

Vol 8 (4) ◽

pp. 47-62

Author(s):

Francisca Oladipo ◽

Ogunsanya, F. B ◽

Musa, A. E. ◽

Ogbuju, E. E ◽

Ariwa, E.

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Information Exchange ◽

Training Data ◽

Data Set ◽

The Social ◽

Machine Learning Approach ◽

Media Space ◽

Social Media Platforms

The social media space has evolved into a vast labyrinth of information-exchange platforms, and with the growing adoption of different social media platforms, interest has increased in sentiment analysis as a paradigm for mining and analyzing users' opinions and sentiments based on their posts. In this paper, we present a review of contextual sentiment analysis of social media entries, with a specific focus on Twitter. Sentiment analysis follows two broad approaches: the machine learning approach, which uses classification techniques to classify text and is further divided into supervised and unsupervised learning; and the lexicon-based approach, which, unlike the machine learning approach, uses a dictionary and requires no training or test data set.
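
A lexicon-based classifier needs no training data: it looks words up in a sentiment dictionary and aggregates their scores. The sketch below uses a small hypothetical lexicon and the simplifying assumption that a negator flips the polarity of the next word only; real lexicons and negation rules are far richer.

```python
import re

# Hypothetical mini lexicon: word -> polarity weight.
LEXICON = {"good": 1, "great": 2, "safe": 1, "bad": -1,
           "late": -1, "crowded": -1, "terrible": -2}
NEGATORS = {"not", "never", "no"}


def lexicon_score(text: str) -> int:
    """Sum word polarities; a negator flips only the word right after it."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        score += -value if negate else value
        negate = False
    return score


def classify(text: str) -> str:
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The service was not good"))  # negative
```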

Bi-Lingual (English, Punjabi) Sarcastic Sentiment Analysis by using Classification Methods

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i8053.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 1383-1388

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Main Step ◽

Online Data ◽

Social Media Data ◽

The Social ◽

Negative Comments ◽

Regional Languages ◽

Media Data ◽

Day By Day

Sentiment analysis is one of the most active topics in text mining. As social media data grows day by day, a main need of data scientists is to classify the data so that it can be used for decision making or knowledge discovery. Nowadays almost everything and everyone is online, so tracking the latest trends in business or daily life requires considering online data. Sentiment analysis mainly focuses on positive and negative comments in order to build a well-defined picture of what is or is not trending, but sarcasm distorts the data: a sarcastic negative comment may be treated as positive because it contains positive words. It is therefore necessary to detect sarcasm in online data. Social media data is available in many languages, so sentiment analysis in regional languages is also an important step. The proposed work focuses on two languages, Punjabi and English, and uses deep-learning-based neural networks to detect sarcasm in both. Three datasets are considered: a balanced English dataset, a balanced Punjabi dataset, and an unbalanced Punjabi dataset. Six models were used to check the accuracy of the classified data: LSTM with a word embedding layer, BiLSTM with …, LSTM+LSTM, BiLSTM+BiLSTM, LSTM+BiLSTM, and CNN. LSTM achieved the highest accuracy on the balanced Punjabi and English datasets (95.63% and 94.17%, respectively), and BiLSTM achieved the highest accuracy on the unbalanced Punjabi dataset (96.31%).

Visual Sentiment Analysis on Social Media Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit2174101 ◽

2021 ◽

pp. 366-372

Author(s):

Harshala Bhoir ◽

K. Jayamalini

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

State Of The Art ◽

Social Media Data ◽

Media Image ◽

The Social ◽

Textual Data ◽

Art Works ◽

Negative Sentiment ◽

Media Data

Visual sentiment analysis aims to automatically recognize positive and negative emotions from images, videos, graphics, stickers, and similar content. To estimate the polarity of the sentiment evoked by images, most state-of-the-art works exploit the text that the user attaches to a social post. However, such textual data is typically noisy because of the user's subjectivity, and it often includes text intended to maximize the post's diffusion. The proposed system instead extracts and uses an objective text description of each image, generated automatically from its visual content, rather than the classic subjective text provided by the user. It extracts three views of a social media image (a visual view, a subjective text view, and an objective text view) and assigns a positive, negative, or neutral sentiment polarity based on a hypothesis table.

Comparison of the Naive Bayes Algorithm with a Genetic Algorithm in Sentiment Analysis of Busway Users

Jurnal Teknik Komputer ◽

10.31294/jtk.v5i2.5406 ◽

2019 ◽

Vol 5 (2) ◽

pp. 227-234

Author(s):

Riska Aryanti ◽

Atang Saepudin ◽

Eka Fitriani ◽

Rifky Permana ◽

Dede Firmansyah Saefudin

Keyword(s):

Genetic Algorithm ◽

Social Media ◽

Sentiment Analysis ◽

Web Site ◽

Naive Bayes ◽

Naïve Bayes ◽

Classification Algorithm ◽

The Social ◽

The Will ◽

Research Classification

Congestion in Indonesia's major cities is caused by the proliferation of private vehicles. Some people express their opinions about the busway through social media and other websites, and these opinions can be used in sentiment analysis to determine whether busway users' reviews are positive or negative. The results of the sentiment analysis can help in assessing and evaluating busway use, and are also expected to help improve TransJakarta facilities so that users tend to hold positive opinions. The study adds preprocessing stages using gataframework to complete processes that cannot be done in RapidMiner. The methodology applies a genetic algorithm for feature selection together with the naive Bayes algorithm. Testing on the research case shows that naive Bayes with genetic-algorithm feature selection achieves a reasonably good accuracy of 88.55% and an AUC of 0.813, which falls in the good classification diagnostic level. Naive Bayes based on a genetic algorithm can therefore be recommended as a reasonably good classification algorithm for analyzing busway user sentiment. Based on this analysis, it is hoped that private vehicle users will switch to the busway, which will reduce congestion.
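
The naive Bayes classifier itself can be sketched as a multinomial model with add-one (Laplace) smoothing. This is a generic textbook version trained on hypothetical toy reviews; the genetic-algorithm feature selection from the abstract, which would decide which words the model keeps, is not shown.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (tokens, label). Returns class priors, word counts, vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest log posterior (up to a constant)."""
    priors, word_counts, vocab = model
    best_label, best_logp = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        logp = math.log(prior)
        for w in tokens:
            if w not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace smoothing: add 1 to every count so no probability is 0.
            logp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical toy reviews (already tokenized).
train = [
    ("busway fast clean comfortable".split(), "positive"),
    ("clean bus friendly driver".split(), "positive"),
    ("busway late crowded".split(), "negative"),
    ("crowded dirty bus late again".split(), "negative"),
]
model = train_nb(train)
print(predict_nb(model, "clean fast busway".split()))  # positive
print(predict_nb(model, "late crowded bus".split()))   # negative
```

Working in log space avoids floating-point underflow when many word probabilities are multiplied together.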

Scraping of Social Media Data Using Python 3 and Performing Data Analytics Using Microsoft Power BI

June-2020 - International Journal of Engineering Sciences & Research Technology ◽

10.29121/ijesrt.v9.i7.2020.8 ◽

2020 ◽

Vol 9 (7) ◽

pp. 66-79

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Data Analytics ◽

Marketing Strategies ◽

Social Media Data ◽

The Social ◽

The Right ◽

Media Data ◽

Course Of Study

The manifestation of humanity is driven by the fulfillment of desires, which have traditionally been satisfied by society and its resources. Since the advent of social media, societal boundaries have shrunk but desires have not, so desires are now fulfilled through social media. Business plutocrats recognized this phenomenon very early and began using social media as a tool to satisfy human desires. Before satisfying those desires, however, businesses need to identify the specific desires of each individual; doing so helps marketing agencies develop user-specific marketing strategies. These desires are expressed explicitly through sentiments on social media, so sentiment analysis can provide insight into an individual's desires, and the resulting patterns and insights help businesses market their products to the right person. Sentiments and expressions can be captured using web scraping. These points outline the course of study followed by this paper: performing data analytics on social media data scraped using Python.
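
The scraping step can be illustrated with Python's standard-library `html.parser`, extracting post text from HTML that has already been downloaded. The class name `post-text` and the sample markup are hypothetical, the sketch assumes well-formed markup in which every non-void element is closed, and the paper's actual scraping tools and target sites are not described in the abstract. Fetching pages (subject to each platform's terms of service) is left out.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PostTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class="post-text"):  # hypothetical class name
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text fragments of the post being read
        self.posts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.current.append(" ")  # treat <br>, <img>, etc. as a space
            return
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            self.posts.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


SAMPLE_HTML = (
    '<div><p class="post-text">Great   product!<br>Love it</p>'
    '<p class="meta">ignore me</p>'
    '<div class="post-text card"><span>Too</span> expensive</div></div>'
)
parser = PostTextExtractor()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.posts)  # ['Great product! Love it', 'Too expensive']
```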
