Lexicon Based Sentiment Analysis in Indonesia Languages : A Systematic Literature Review

RSF Conference Series: Engineering and Technology ◽

10.31098/cset.v1i1.397 ◽

2021 ◽

Vol 1 (1) ◽

pp. 363-367

Author(s):

Yuli Fauziah ◽

Bambang Yuwono ◽

Agus Sasmito Aribowo

Keyword(s):

Literature Review ◽

Sentiment Analysis ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Main Question ◽

Maximum Accuracy ◽

Stop Word ◽

Positive Sentiment ◽

Negative Sentiment

This systematic literature review (SLR) aims to determine trends in lexicon-based sentiment analysis research on the Indonesian language over the last two years. The study focuses on the preprocessing used in lexicon-based sentiment analysis studies from that period, the lexicons used in those studies, and the classification accuracy they report. The main question of this SLR is which lexicon-based sentiment analysis techniques provide the highest accuracy. The most widely used preprocessing methods in previous research are tokenization, case conversion, stemming, punctuation removal, stop-word removal, removal or replacement of emoji and emoticons, and normalization or slang-word conversion. In previous studies, the sentiment label was calculated by comparing the number of negative sentiment keywords with the number of positive sentiment keywords in a sentence. The maximum accuracy reported in previous studies is 90%. The most widely used lexicons are NRC and Inset, which is a lexicon dictionary in Indonesian. This knowledge can be used to propose a better model for lexicon-based sentiment analysis in Indonesian.
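
As a minimal sketch of the keyword-counting approach summarized in this abstract, the Python example below applies a few of the listed preprocessing steps (case conversion, punctuation removal, tokenization, slang-word conversion, stop-word removal) and then labels a sentence by comparing positive and negative lexicon hits. The tiny word lists are hypothetical placeholders and are not entries from NRC or Inset; stemming is omitted, and returning "neutral" on a tie is an assumption rather than something the reviewed studies are known to do.

```python
import string

# Hypothetical toy word lists for illustration only (not taken from NRC or Inset).
POSITIVE = {"bagus", "senang", "suka"}
NEGATIVE = {"buruk", "kecewa", "benci"}
STOP_WORDS = {"yang", "dan", "ini", "itu", "saya"}
SLANG = {"bgs": "bagus", "sy": "saya"}  # slang word -> normalized form


def preprocess(text):
    """Lowercase, strip punctuation, tokenize, normalize slang, drop stop words."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = [SLANG.get(tok, tok) for tok in text.split()]
    return [tok for tok in tokens if tok not in STOP_WORDS]


def label_sentiment(text):
    """Label a sentence by comparing positive and negative keyword counts."""
    tokens = preprocess(text)
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # assumption: ties are treated as neutral


print(label_sentiment("Sy suka film ini, bgs!"))
print(label_sentiment("Pelayanan buruk dan saya kecewa."))
```

In practice the toy sets would be replaced by a full lexicon, and the order of the preprocessing steps may matter (for example, slang conversion before stop-word removal, as done here).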

Data Mining-based Financial Statement Fraud Detection: Systematic Literature Review and Meta-analysis to Estimate Data Sample Mapping of Fraudulent Companies Against Non-fraudulent Companies

Global Business Review ◽

10.1177/0972150920984857 ◽

2021 ◽

pp. 097215092098485

Author(s):

Sonika Gupta ◽

Sushil Kumar Mehta

Keyword(s):

Machine Learning ◽

Data Mining ◽

Literature Review ◽

Systematic Literature Review ◽

Classification Accuracy ◽

Meta Analysis ◽

Financial Statement ◽

Research Articles ◽

Financial Statement Fraud ◽

Data Mining Techniques

Data mining techniques have proven quite effective not only in detecting financial statement frauds but also in discovering other financial crimes, such as credit card frauds, loan and security frauds, corporate frauds, bank and insurance frauds, etc. Classification of data mining techniques, in recent years, has been accepted as one of the most credible methodologies for the detection of symptoms of financial statement frauds through scanning the published financial statements of companies. The retrieved literature that has used data mining classification techniques can be broadly categorized on the basis of the type of technique applied, as statistical techniques and machine learning techniques. The biggest challenge in executing the classification process using data mining techniques lies in collecting the data sample of fraudulent companies and mapping the sample of fraudulent companies against non-fraudulent companies. In this article, a systematic literature review (SLR) of studies from the area of financial statement fraud detection has been conducted. The review has considered research articles published between 1995 and 2020. Further, a meta-analysis has been performed to establish the effect of data sample mapping of fraudulent companies against non-fraudulent companies on the classification methods through comparing the overall classification accuracy reported in the literature. The retrieved literature indicates that a fraudulent sample can either be equally paired with non-fraudulent sample (1:1 data mapping) or be unequally mapped using 1:many ratio to increase the sample size proportionally. Based on the meta-analysis of the research articles, it can be concluded that machine learning approaches, in comparison to statistical approaches, can achieve better classification accuracy, particularly when the availability of sample data is low. High classification accuracy can be obtained with even a 1:1 mapping data set using machine learning classification approaches.
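
To make the 1:1 versus 1:many data mapping concrete, here is a small Python sketch that pairs each fraudulent company with a fixed number of non-fraudulent companies. The function name `map_samples`, the company identifiers, and the purely random selection are assumptions made for illustration; the abstract does not say how the reviewed studies chose their matches.

```python
import random


def map_samples(fraud, non_fraud, ratio=1, seed=0):
    """Pair each fraudulent firm with `ratio` distinct non-fraudulent firms.

    ratio=1 gives a 1:1 mapping; ratio>1 gives a 1:many mapping.
    Matching here is purely random (an assumption for illustration).
    """
    needed = len(fraud) * ratio
    if needed > len(non_fraud):
        raise ValueError("not enough non-fraudulent firms for the requested ratio")
    rng = random.Random(seed)
    picked = rng.sample(non_fraud, needed)  # sampling without replacement
    return {
        firm: picked[i * ratio:(i + 1) * ratio]
        for i, firm in enumerate(fraud)
    }


fraud_firms = ["F1", "F2", "F3"]                 # hypothetical identifiers
clean_firms = [f"N{i}" for i in range(1, 13)]    # hypothetical identifiers

one_to_one = map_samples(fraud_firms, clean_firms, ratio=1)
one_to_many = map_samples(fraud_firms, clean_firms, ratio=3)
print(len(one_to_one) + sum(len(v) for v in one_to_one.values()))    # total sample size, 1:1
print(len(one_to_many) + sum(len(v) for v in one_to_many.values()))  # total sample size, 1:3
```

With three fraudulent firms, the 1:1 mapping yields a sample of 6 companies and the 1:3 mapping a sample of 12, which shows how the 1:many ratio enlarges the data set proportionally.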

Development and Application of Sentiment Analysis Tools in Software Engineering: A Systematic Literature Review

Evaluation and Assessment in Software Engineering ◽

10.1145/3463274.3463328 ◽

2021 ◽

Author(s):

Martin Obaidi ◽

Jil Klünder

Keyword(s):

Software Engineering ◽

Literature Review ◽

Sentiment Analysis ◽

Systematic Literature Review ◽

Analysis Tools

Sentiment Analysis in Education Domain: A Systematic Literature Review

Communications in Computer and Information Science - Technologies and Innovation ◽

10.1007/978-3-030-00940-3_21 ◽

2018 ◽

pp. 285-297 ◽

Cited By ~ 3

Author(s):

Karen Mite-Baidal ◽

Carlota Delgado-Vera ◽

Evelyn Solís-Avilés ◽

Ana Herrera Espinoza ◽

Jenny Ortiz-Zambrano ◽

...

Keyword(s):

Literature Review ◽

Sentiment Analysis ◽

Systematic Literature Review

Analysis of Social Media Users Sentiments against Omnibus Law Based on Hashtags on Twitter

SISTEMASI ◽

10.32520/stmsi.v11i1.1685 ◽

2022 ◽

Vol 11 (1) ◽

pp. 197

Author(s):

Okta Fanny ◽

Heri Suroyo

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Main Topic ◽

Accuracy Score ◽

Test Results ◽

Bayes Classifier ◽

The Public ◽

Average Accuracy ◽

Positive Sentiment ◽

Negative Sentiment

From the research that has been done, it can be concluded that sentiment analysis can be used to gauge the sentiment of the public, especially Twitter users, towards the Omnibus Law. The sentiment analysis found that neutral sentiment had the largest share at 55%, followed by positive sentiment at 35% and negative sentiment at 10%. The results showed that, on the pro hashtag data, the Naïve Bayes Classifier achieved an average accuracy of 92.1%, an average precision of 94.8% and an average recall of 90.7%. On the counter hashtag data, it achieved an average accuracy of 98.3%, an average precision of 97.6% and an average recall of 98.7%. In the text cloud analysis conducted on the combined pro and counter hashtags, the dominant word is "Omnibus Law", which indicates that all of the scraped hashtags do discuss the main topic, the Omnibus Law.
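
For readers unfamiliar with the three metrics reported in this abstract, the Python sketch below computes accuracy, precision and recall from the counts of a binary confusion matrix using their standard definitions. The counts in the example are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for illustration only.
acc, prec, rec = classification_metrics(tp=45, fp=5, fn=10, tn=40)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

An "average" score such as those quoted above would typically be the mean of these values over several test runs or folds, although the abstract does not state how the averaging was done.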

Mining social media data to investigate patient perceptions regarding DMARD pharmacotherapy for rheumatoid arthritis

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-217333 ◽

2020 ◽

Vol 79 (11) ◽

pp. 1432-1437 ◽

Cited By ~ 1

Author(s):

Chanakya Sharma ◽

Samuel Whittle ◽

Pari Delir Haghighi ◽

Frada Burstein ◽

Roee Sa'adon ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Social Media ◽

Side Effects ◽

Sentiment Analysis ◽

Computer Algorithms ◽

Social Media Platforms ◽

Synthetic Agents ◽

Positive Sentiment ◽

Negative Sentiment ◽

Media Data

Objectives: We hypothesise that patients have a positive sentiment regarding biological/targeted synthetic disease modifying anti-rheumatic drugs (b/tsDMARDs) and a negative sentiment towards conventional synthetic agents (csDMARDs). We analysed discussions on social media platforms regarding DMARDs to understand the collective sentiment expressed towards these medications. Methods: Treato analytics were used to download all available posts on social media about DMARDs in the context of rheumatoid arthritis. Strict filters ensured that user generated content was downloaded. The sentiment (positive or negative) expressed in these posts was analysed for each DMARD using sentiment analysis. We also analysed the reason(s) for this sentiment for each DMARD, looking specifically at efficacy and side effects. Results: Computer algorithms analysed millions of social media posts and included 54 742 posts about DMARDs. We found that both classes had an overall positive sentiment. The ratio of positive to negative posts was higher for b/tsDMARDs (1.210) than for csDMARDs (1.048). Efficacy was the most commonly mentioned reason in posts with a positive sentiment and lack of efficacy was the most commonly mentioned reason for a negative sentiment. These were followed by the presence/absence of side effects in negative or positive posts, respectively. Conclusions: Public opinion on social media is generally positive about DMARDs. Lack of efficacy followed by side effects were the most common themes in posts with a negative sentiment. There are clear reasons why a DMARD generates a positive or negative sentiment. As sentiment analysis technology becomes more refined, targeted studies could be done to analyse these reasons and allow clinicians to tailor DMARDs to match patient needs.

Analyzing the Sentiment of MOOC Discussion Posts

Alberta Academic Review ◽

10.29173/aar36 ◽

2019 ◽

Vol 2 (2) ◽

pp. 1-2

Author(s):

Haniya Ahmed ◽

Kenny Wong

Keyword(s):

Natural Language ◽

Sentiment Analysis ◽

Online Course ◽

Massive Open Online ◽

Massive Open Online Course ◽

Positive Sentiment ◽

Negative Sentiment

The purpose of the project is to identify common difficulties that learners may face and to understand their emotions as they progress through MOOCs. MOOC is an abbreviation for Massive Open Online Course, and the research deals with data from ten different Coursera courses. The data is used to extract pieces of text that students have written. Those texts are then sent to the Google Cloud Natural Language API, a service that returns a sentiment analysis of a text. The main goal is to assist instructors in monitoring MOOCs by helping to improve the courses, so that it is more efficient and easier for students to progress. To achieve this, the first step is to gather all the data from each of the courses. A program then loads all of that data into one large database. The work is done in PyCharm, using Python and SQL to load the data into the database. Once the database is created, code is written to select only the pieces of information that are needed. These texts should be where students make comments or ask questions. Next, the data is queried to send these texts to the Google Cloud Natural Language API. Here, the program breaks all the sentences down into individual words. The program then categorizes each word according to whether its connotation is positive, negative or neutral. Next, all the words are sorted according to their connotations. The overall sentiment depends on the emotion that has the highest count. If positives and negatives balance out, the sentiment is neutral. Sentiment scores range from -1 to 1, where -1 is the most negative, 1 is the most positive and anything near 0 is neutral. Positive sentiment scores indicate to instructors that students are doing well on their course, and neutral sentiment scores indicate that the course balances difficult and easy tasks. However, negative sentiment is the most important to instructors, since it indicates that students are struggling and that the course needs to be improved.
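
The score interpretation described in this abstract can be sketched in Python as follows. This example does not call the Google Cloud Natural Language API; it only shows how scores in the range -1 to 1 might be turned into labels and combined. The width of the "near 0" neutral band (`NEUTRAL_BAND`) and the tie-handling rule are assumptions, since the abstract does not give exact values.

```python
from collections import Counter

NEUTRAL_BAND = 0.25  # hypothetical cutoff for "near 0"; not specified in the abstract


def score_to_label(score):
    """Map a sentiment score in [-1, 1] to 'positive', 'negative' or 'neutral'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment score must lie in [-1, 1]")
    if score > NEUTRAL_BAND:
        return "positive"
    if score < -NEUTRAL_BAND:
        return "negative"
    return "neutral"


def overall_sentiment(scores):
    """Return the label with the strictly highest count; otherwise 'neutral'.

    Assumption: any tie for first place (for example, balanced positives
    and negatives) is reported as neutral.
    """
    counts = Counter(score_to_label(s) for s in scores)
    pos, neg, neu = counts["positive"], counts["negative"], counts["neutral"]
    if pos > neg and pos > neu:
        return "positive"
    if neg > pos and neg > neu:
        return "negative"
    return "neutral"


post_scores = [0.8, 0.5, -0.9, 0.1]  # hypothetical per-post scores
print([score_to_label(s) for s in post_scores])
print(overall_sentiment(post_scores))
```

A course whose posts mostly fall below the negative cutoff would be flagged as one where students may be struggling.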

Sentiment Analysis for Malay Language: Systematic Literature Review

2018 International Conference on Information and Communication Technology for the Muslim World (ICT4M) ◽

10.1109/ict4m.2018.00063 ◽

2018 ◽

Cited By ~ 1

Author(s):

Dini Handayani ◽

Normi Sham Awang Abu Bakar ◽

Hamwira Yaacob ◽

Mustafa Ali Abuzaraida

Keyword(s):

Literature Review ◽

Sentiment Analysis ◽

Systematic Literature Review

Sentiment Analysis of Microtakaful Industry: Comparison between Indonesia and Malaysia

International Journal of Nusantara Islam ◽

10.15575/ijni.v6i1.3004 ◽

2019 ◽

Vol 6 (1) ◽

pp. 20-34 ◽

Cited By ~ 1

Author(s):

Aam Slamet Rusydiana ◽

Irman Firmansyah ◽

Lina Marlina

Keyword(s):

Sentiment Analysis ◽

Analytical Tool ◽

Public Response ◽

Public Sentiment ◽

Positive Sentiment ◽

Negative Sentiment ◽

Industry Comparison

It is important to study public sentiment towards the presence of microtakaful in a country in order to understand the public response to it. This study aimed to determine public sentiment towards microtakaful in Indonesia and in Malaysia. Data were collected from 40 articles, journals and other writings. The data were analyzed using the Semantria software as a text analysis tool. The results showed that, in Indonesia, 52% of the community showed positive sentiment towards the existence of microtakaful, 28% showed negative sentiment and 20% showed neutral sentiment. In Malaysia, 62% showed positive sentiment, 23% negative sentiment and 15% neutral sentiment.
