Rule Based Morphological Variation Removable Stemming Algorithm

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c6200.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 1809-1814

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Morphological Variation ◽

Morphological Changes ◽

The Internet ◽

Rule Based ◽

The People ◽

Stop Word ◽

Preprocessing Technique ◽

Better Than

Sentiment analysis is a technique for analyzing people's opinions, attitudes, sentiments and emotions towards a particular object. It predicts the opinion expressed in review sentences in four steps: preprocessing, feature selection, classification and sentiment prediction. Preprocessing is the most important step and consists of many techniques, including stop word removal, punctuation removal and conversion of numbers to number names. Stemming is another important preprocessing technique; it transforms the words in a text into their grammatical root form and is mainly used to strengthen the retrieval of information from the internet. Many morphologically rich languages show an immense amount of morphological variation in their words, which poses vast challenges. Many stemming algorithms exist, based on different techniques, and they have several drawbacks. The aim of this paper is to propose a rule based stemmer that is a truncating stemmer. The new stemming mechanism in this paper has brought about many morphological changes. The new rule based morphological variation removable stemming algorithm is better than existing algorithms such as the New Porter, Paice/Lovins and Lancaster stemming algorithms.
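The preprocessing techniques named in this abstract (stop word removal, punctuation removal, conversion of numbers to number names) can be illustrated with a minimal Python sketch. The stop-word list, the function names and the 0 to 99 number range are assumptions made for illustration only; the abstract does not specify how the authors implement these steps.

```python
import string

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "it", "this"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out 0..99; larger numbers are left as digits in this sketch."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    return str(n)


def preprocess(text):
    """Return the cleaned token list for one review sentence."""
    # Punctuation is stripped first so that "and," still matches a stop word.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = []
    for tok in text.lower().split():
        if tok.isdecimal():
            # Number-to-name conversion, e.g. "42" becomes "forty", "two".
            tokens.extend(number_to_words(int(tok)).split())
        elif tok not in STOP_WORDS:
            tokens.append(tok)
    return tokens
```

For example, `preprocess("The battery lasted 42 hours, and it is great!")` yields `["battery", "lasted", "forty", "two", "hours", "great"]`. Stemming would then be applied to each remaining token.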


A Rule Based Stemmer

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9545.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2026-2029

Keyword(s):

Sentiment Analysis ◽

Web Mining ◽

Web Content ◽

Rule Based ◽

Web Content Mining ◽

Digital World ◽

Stop Word ◽

Enormous Amount ◽

Content Mining ◽

Preprocessing Technique

The present digital world generates an enormous amount of data instantaneously, and mining knowledge from it effectively is the need of the hour. Sentiment analysis, a part of web content mining, which is itself a subfield of web mining, has gained momentum in research. It analyses the opinions of a wide variety of people all over the world. Sentiment analysis encompasses preprocessing, feature selection, classification and sentiment prediction. Preprocessing is an important process that involves many techniques. Stop word removal, punctuation removal and conversion of numbers to number names are some of the basic ones. Stemming is yet another important preprocessing technique; it reduces the different forms of a word to its root. There are basically three types of stemmers: truncating, statistical and hybrid. The aim of this paper is to propose a rule based stemmer that is a truncating stemmer. It uses rules for truncation and replacement. The input data passes through a series of rules. If the condition specified by a rule is satisfied, the associated rule is executed; otherwise the input is checked against the next rule, and the process continues. The result of execution is the stemmed words. The performance of the proposed rule based stemmer is compared with that of existing stemmers in the same rule based category, namely Porter and Lancaster. Various metrics have been used for evaluation. The observations show that the proposed stemmer outperforms the Porter and Lancaster stemmers in terms of the correctly stemmed words factor, and shows a good average conflation factor and fewer over-stemming and under-stemming errors.
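The rule cascade described in this abstract (check a condition, apply the associated truncation or replacement, otherwise move on to the next rule) can be sketched as follows. The rule table, the minimum-stem-length condition and the function name `stem` are hypothetical illustrations; they are not the rules of the proposed stemmer, which the abstract does not list.

```python
# Each rule is (suffix to match, replacement, minimum stem length that must
# remain after truncation). These rules are made-up examples, NOT the
# paper's rule set.
RULES = [
    ("sses", "ss", 1),   # classes -> class
    ("ies", "y", 2),     # studies -> study
    ("ing", "", 3),      # playing -> play
    ("ed", "", 3),       # jumped  -> jump
    ("ly", "", 3),       # quickly -> quick
    ("ss", "ss", 1),     # glass   -> glass (guards the plain "s" rule below)
    ("s", "", 3),        # cats    -> cat
]


def stem(word):
    """Apply the first rule whose condition holds; otherwise return the word."""
    word = word.lower()
    for suffix, replacement, min_stem in RULES:
        stem_len = len(word) - len(suffix)
        # Condition: the suffix matches and enough of the word remains.
        if word.endswith(suffix) and stem_len >= min_stem:
            # Truncate the suffix, then append the replacement (may be empty).
            return word[:stem_len] + replacement
    # No rule fired: the word is returned unchanged.
    return word
```

With this table, `stem("studies")` gives `"study"` and `stem("sing")` stays `"sing"`, because truncating "ing" would leave a stem shorter than three letters. Rule order matters in a cascade like this one: the first matching rule wins, so longer suffixes are listed before shorter ones.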


Sentiment Analysis through Machine Learning

Journal of Southwest Jiaotong University ◽

10.35741/issn.0258-2724.56.3.32 ◽

2021 ◽

Vol 56 (3) ◽

pp. 384-393

Author(s):

Md. Abbas Ali Khan ◽

Ali-Emran ◽

Md. Alamgir Kabir ◽

Mohammad Hanif Ali ◽

A. K. M. Fazlul Haque

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Short Distance ◽

The Internet ◽

Text Document ◽

Ride Sharing ◽

Wheeled Vehicle ◽

The People ◽

Day By Day ◽

Rainy Days

In recent years, App-Based Transportation Systems (ABTS) such as ride sharing (Uber, Patho) have become increasingly popular. In our daily life, the rickshaw (a 3-wheeled vehicle, usually for one or two passengers, that one man pulls) is the most important vehicle for short distances. Adding this vehicle to the ABTS would be very helpful, particularly during the rainy season in Bangladesh. On days of heavy rain in our city, Dhaka, other vehicles such as CNGs, cars and bikes become unusable because the roads go underwater. However, a rickshaw puller can still serve in these conditions. More importantly, the conventional rickshaw is unable to provide such a service properly. In this regard, we propose an App-Based Rickshaw (ABR), which uses the internet to make covering a distance convenient. To this end, we collected data through closed questionnaires from several types of people. The collected data are text based. Our aim is therefore to perform Sentiment Analysis (SA) of people's responses through machine learning and to check the feasibility of applying the approach in the real world.


Analisis Sentimen Terhadap Review Film Menggunakan Metode Modified Balanced Random Forest dan Mutual Information

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i2.2844 ◽

2021 ◽

Vol 5 (2) ◽

pp. 415

Author(s):

Firdausi Nuzula Zamzami ◽

Adiwijaya Adiwijaya ◽

Mahendra Dwifebri P

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Mutual Information ◽

Sentiment Analysis ◽

Information Exchange ◽

The Internet ◽

Machine Learning Method ◽

Learning Method ◽

Internet Information

Information exchange is currently among the most common activities on the internet. It can take many forms, such as expressing opinions on social media, and reviewing a film is one of them. When people review a film, they use emotion to express their feelings, which can be positive or negative. The fast growth of the internet has made information more diverse, plentiful and unstructured. Sentiment analysis can handle this, because it is a classification process, carried out automatically by a computer system, for understanding the opinions, interactions and emotions in a document or text. One suitable machine learning method is the Modified Balanced Random Forest. To deal with the varied data, Mutual Information is used for feature selection. With these two methods, the system achieves an accuracy of 79% and an F1-score of 75%.
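As an illustration of Mutual Information as a feature selection criterion, the sketch below scores each term by the mutual information between its presence in a document and the class label, I(T; C) = sum over t, c of P(t, c) * log2(P(t, c) / (P(t) * P(c))), and keeps the top k terms. The function names and the set-of-tokens document representation are assumptions; the abstract does not say which estimator or implementation the authors use.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI (in bits) between 'term occurs in the document' and the class label.

    docs is a list of token sets, labels the matching list of class labels.
    """
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    label_counts = Counter(labels)
    mi = 0.0
    # Only observed (presence, label) pairs contribute; unseen pairs add 0.
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_term * p_label))
    return mi


def select_features(docs, labels, k):
    """Return the k terms with the highest mutual information."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab,
                    key=lambda t: mutual_information(docs, labels, t),
                    reverse=True)
    return ranked[:k]
```

On a toy corpus of four reviews, `{"great", "film"}` and `{"great", "acting"}` labelled positive and `{"boring", "film"}` and `{"boring", "plot"}` labelled negative, the terms "great" and "boring" each score 1 bit, while "film" scores 0 because it appears equally often in both classes.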


Data’s Intimacy

Technologies of Speculation ◽

10.18574/nyu/9781479860234.003.0005 ◽

2020 ◽

pp. 76-113

Author(s):

Sun-ha Hong

Keyword(s):

Internet Of Things ◽

The Self ◽

The Internet ◽

Care Of The Self ◽

The People ◽

The World ◽

Smart Machines ◽

Market Driven ◽

The Internet Of Things ◽

Better Than

Today, machines observe, record, and sense the world—not just for us but also often instead of us and indifferently to our meaning. The intertwined problems of technological knowledge and (our) knowledge of technology manifest in the growing industry of smart machines, the Internet of Things, and other means for self-tracking. The automation of the care of the self is buoyed by a popular fantasy of data’s intimacy, of machines that know you better than yourself. Yet as the technology becomes normalized, the hacker ethic gives way to a market-driven shift in which more and more of “my” personal truth is colonized by machines (and the people behind the machines) that I cannot question.


Sentiment Analysis of Chinese Micro Blog Using Machine Learning and an Improved Feature Selection Method

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.631-632.1219 ◽

2014 ◽

Vol 631-632 ◽

pp. 1219-1223

Author(s):

Jia Hao Chen ◽

Jian Hua Wu

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Sentiment Analysis ◽

Rapid Development ◽

Feature Selection Method ◽

Selection Method ◽

Media Services ◽

Social Media Service ◽

Better Than

With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, ordinary manual work cannot deal with such a large number of subjective messages. As a new kind of social media service, the micro blog has been widely accepted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM is close to Naïve Bayes and that both are better than logistic regression in most cases.


A comparative study of sentiment analysis using SVM and SentiWordNet

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i3.pp902-909 ◽

2019 ◽

Vol 13 (3) ◽

pp. 902 ◽

Cited By ~ 7

Author(s):

Mohammad Fikri ◽

Riyanarto Sarno

Keyword(s):

Sentiment Analysis ◽

Extraction Method ◽

Support Vector ◽

The Internet ◽

Imbalanced Dataset ◽

Rule Based ◽

Inverse Document Frequency ◽

Feature Extraction Method ◽

Document Frequency ◽

Svm Algorithm

Sentiment analysis has grown rapidly, which impacts the number of services using the internet that are popping up in Indonesia. In this research, sentiment analysis is performed with a rule-based method using SentiWordNet and with the Support Vector Machine (SVM) algorithm using Term Frequency–Inverse Document Frequency (TF-IDF) as the feature extraction method. Since the number of sentences in the positive, negative and neutral classes is imbalanced, an oversampling method is implemented. For the imbalanced dataset, the rule-based SentiWordNet and SVM algorithm achieve accuracies of 56% and 76%, respectively. However, for the balanced dataset, the rule-based SentiWordNet and SVM algorithm achieve accuracies of 52% and 89%, respectively.
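The two data-preparation ideas mentioned in this abstract, TF-IDF weighting and oversampling of the smaller classes, can be sketched in plain Python. This is one common textbook variant (term frequency normalised by document length, idf = ln(N / df)) combined with simple random oversampling. Both choices are assumptions, since the abstract names neither the exact TF-IDF formula nor the oversampling method used.

```python
import math
import random
from collections import Counter


def tf_idf(corpus):
    """corpus: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels
```

In this variant a term that occurs in every document gets weight 0, because ln(N / N) = 0; libraries often use smoothed variants that avoid this. Oversampling is normally applied to the training split only, so that duplicated samples do not leak into the evaluation data.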


Sarcasm Revealing using Rule Based Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2978.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 2104-2108

Keyword(s):

Sentiment Analysis ◽

Social Phenomena ◽

Rule Based ◽

The People ◽

The One

Sentiment analysis is the process of finding out whether a person's opinion is positive, negative or neutral. Nowadays people express their opinions on matters such as marketed products and political and social phenomena mostly online. These opinions may be positive, negative or neutral. It is very difficult for a machine to identify them, and the field has many open issues. One of these issues is sarcasm detection. People sometimes give their opinion sarcastically, that is, they say something positive about an object when they mean something negative. The machine will take such an opinion as positive, so the final polarity of the product will be wrong. The purpose of this paper is to find sentences of this type and correct their polarity value.


Metode Seleksi Fitur Untuk Klasifikasi Sentimen Menggunakan Algoritma Naive Bayes: Sebuah Literature Review

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.2983 ◽

2021 ◽

Vol 5 (3) ◽

pp. 799

Author(s):

Fitria Septianingrum ◽

Agung Susilo Yuda Irawan

Keyword(s):

Feature Selection ◽

Literature Review ◽

Sentiment Analysis ◽

Industrial Revolution ◽

Naive Bayes ◽

Feature Selection Method ◽

Naïve Bayes ◽

Digital Data ◽

The Internet ◽

Bayes Algorithm

In today's era of the Industrial Revolution 4.0, the internet is a necessity in people's daily lives. The high intensity of internet use in the community causes information to spread widely and quickly. This rapid distribution of information goes hand in hand with the growth of digital data, so the public opinions contained in it become important. This digital data can be processed with sentiment analysis to obtain useful information about issues developing in the community or to find out public opinion on a company's product. Because many sentiment analysis studies apply the Naive Bayes algorithm, the researchers are interested in studying the use of feature selection for that algorithm. Therefore, this research was conducted to determine which feature selection method is the most optimal when combined with the Naive Bayes algorithm, using the Systematic Literature Review (SLR) research method. The study concludes that the most optimal feature selection method when combined with the Naive Bayes algorithm is Particle Swarm Optimization (PSO), with an average accuracy of 89.08%.


Sentiment Analysis of Japanese Tourism Online Reviews

Journal of Data and Information Science ◽

10.2478/jdis-2019-0005 ◽

2019 ◽

Vol 4 (1) ◽

pp. 89-113

Author(s):

Chuanming Yu ◽

Xingyu Zhu ◽

Bolin Feng ◽

Lin Cai ◽

Lu An

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Large Scale ◽

Empirical Studies ◽

Online Reviews ◽

Learning Technology ◽

Rule Based ◽

Document Frequency ◽

Overall Performance ◽

Tourism Attractions

Purpose: Online reviews of tourism attractions provide important references for potential tourists choosing tourism spots. The main goal of this study is to conduct sentiment analysis that helps users comprehend the large volume of reviews, based on comments about Chinese attractions from the Japanese tourism website 4Travel.

Design/methodology/approach: Different statistics- and rule-based methods are used to analyze the sentiment of the reviews. Three groups of novel statistics-based methods combining feature selection functions and the traditional term frequency-inverse document frequency (TF-IDF) method are proposed. We also construct seven groups of different rule-based methods. The macro-average and micro-average values for the best classification results of the methods are calculated, and the performance of the methods is shown.

Findings: We compare the statistics-based and rule-based methods separately and compare the overall performance of the two kinds of method. According to the results, the combination of feature selection functions and weightings can strongly improve the overall performance. The emotional vocabulary in the field of tourism (EVT), kaomojis, and negative and transitional words can notably improve the performance in all three categories. The rule-based methods outperform the statistics-based ones by a narrow margin.

Research limitation: Two limitations can be addressed: 1) the empirical studies to verify the validity of the proposed methods are conducted only on the Japanese language; and 2) deep learning technology has not been incorporated in the methods.

Practical implications: The results help to elucidate the intrinsic characteristics of the Japanese language and their influence on sentiment analysis. These findings also provide practical usage guidelines within the field of sentiment analysis of Japanese online tourism reviews.

Originality/value: Our research is of practical value. Currently, there are no studies that focus on the sentiment analysis of Japanese reviews about Chinese attractions.
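The difference between the macro-average and micro-average values mentioned in this abstract can be shown with a short sketch that computes both averages of the F1 score over a multi-class prediction. The function names and the choice of F1 as the averaged measure are assumptions; the abstract does not state which measure the authors average.

```python
def per_class_counts(y_true, y_pred):
    """Return {class: (true positives, false positives, false negatives)}."""
    counts = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        counts[c] = (tp, fp, fn)
    return counts


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores: every class counts equally."""
    counts = per_class_counts(y_true, y_pred)
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(y_true, y_pred):
    """F1 over the pooled counts: every sample counts equally."""
    counts = per_class_counts(y_true, y_pred)
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)
```

When classes are imbalanced the two averages can differ noticeably: the macro average is pulled down by a poorly predicted rare class, whereas the micro average is dominated by the frequent classes. For single-label classification, the micro-averaged F1 equals plain accuracy.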


Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i9.113121 ◽

2017 ◽

Vol 5 (9) ◽

Cited By ~ 1

Author(s):

Rajwinder Kaur

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Selection Methods
