Rule Based Morphological Variation Removable Stemming Algorithm

2019 ◽  
Vol 8 (4) ◽  
pp. 1809-1814

Sentiment analysis is a technique for analyzing people's opinions, attitudes, sentiments and emotions towards a particular object. It predicts the opinion expressed in review sentences in four steps: preprocessing, feature selection, classification and sentiment prediction. Preprocessing is the main step and comprises many techniques, including stop word removal, punctuation removal and conversion of numbers to number names. Stemming is another important preprocessing technique; it transforms the words in a text into their grammatical root form and is mainly used to improve the retrieval of information from the internet. Many morphologically rich languages show an immense amount of morphological variation in their words, which poses vast challenges. Many stemming algorithms exist, based on different techniques, and each has several drawbacks. The aim of this paper is to propose a rule-based stemmer that is a truncating stemmer. The new stemming mechanism in this paper brings about many morphological changes. The new rule based morphological variation removable stemming algorithm performs better than existing algorithms such as the New Porter, Paice/Lovins and Lancaster stemming algorithms.

The present digital world generates an enormous amount of data every instant, and mining knowledge from it effectively is the need of the hour. Sentiment analysis, a part of web content mining, which is itself a subfield of web mining, has gained momentum as a research area. It analyses the opinions of a wide variety of people all over the world. Sentiment analysis encompasses preprocessing, feature selection, classification and sentiment prediction. Preprocessing is an important stage that involves many techniques; stop word removal, punctuation removal and conversion of numbers to number names are some of the basic ones. Stemming is yet another important preprocessing technique, which reduces the different forms of a word to its root. There are basically three types of stemmers: truncating, statistical and hybrid. The aim of this paper is to propose a rule-based stemmer that is a truncating stemmer. It relies on rules for truncation and replacement. The input data passes through a series of rules: if the condition specified for a rule is satisfied, the associated rule is executed; otherwise the input is checked against the next rule, and the process continues. The result of execution is the stemmed words. The performance of the proposed rule-based stemmer is compared with that of existing stemmers in the same rule-based category, namely Porter and Lancaster, using various evaluation metrics. The observations reveal that the proposed stemmer outperforms the Porter and Lancaster stemmers in terms of correctly stemmed words, shows a good average conflation factor, and produces fewer over-stemming and under-stemming errors.
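The rule cascade described in this abstract can be sketched roughly as follows. This is a minimal illustration only: the abstract does not list the actual rules, so the four suffix rules, the length thresholds and the names `RULES`, `stem` and `stem_all` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# A rule pairs a condition with an action (truncation or replacement).
# NOTE: these rules are illustrative assumptions, not the paper's rule set.
Rule = Tuple[Callable[[str], bool], Callable[[str], str]]

RULES: List[Rule] = [
    # Replacement rule: "ies" -> "y" (e.g. "studies" -> "study").
    (lambda w: len(w) > 4 and w.endswith("ies"), lambda w: w[:-3] + "y"),
    # Truncation rule: drop "ing" when at least three letters remain.
    (lambda w: len(w) > 5 and w.endswith("ing"), lambda w: w[:-3]),
    # Truncation rule: drop "ed" when at least three letters remain.
    (lambda w: len(w) > 4 and w.endswith("ed"), lambda w: w[:-2]),
    # Truncation rule: drop a plural "s", but leave words ending in "ss".
    (lambda w: len(w) > 3 and w.endswith("s") and not w.endswith("ss"),
     lambda w: w[:-1]),
]


def stem(word: str) -> str:
    """Apply the first rule whose condition holds; otherwise try the next."""
    word = word.lower()
    for condition, action in RULES:
        if condition(word):
            return action(word)
    # No rule matched: the word is returned unchanged.
    return word


def stem_all(words: List[str]) -> List[str]:
    """Stem every word in a list."""
    return [stem(w) for w in words]


print(stem_all(["studies", "walking", "jumped", "cats", "glass"]))
# -> ['study', 'walk', 'jump', 'cat', 'glass']
```

A naive rule set like this one still over-stems some words (for example, it maps "movies" to "movy"), which is the kind of error the over-stemming and under-stemming metrics mentioned above are meant to measure.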


2021 ◽  
Vol 56 (3) ◽  
pp. 384-393
Author(s):  
Md. Abbas Ali Khan ◽  
Ali-Emran ◽  
Md. Alamgir Kabir ◽  
Mohammad Hanif Ali ◽  
A. K. M. Fazlul Haque

In recent years, App-Based Transportation Systems (ABTS) such as ride sharing (Uber, Patho) have become increasingly popular. In daily life, the rickshaw (a three-wheeled vehicle, usually for one or two passengers, pulled by one man) is the most important vehicle for short distances. Adding this vehicle to the ABTS would be very helpful, specifically during the rainy season in Bangladesh. On days of heavy rain in our city, Dhaka, other vehicles such as CNGs, cars and bikes become unusable because the roads go under water, whereas a rickshaw puller can still operate in these conditions. More importantly, the conventional rickshaw is unable to provide such a service properly. We therefore propose an App-Based Rickshaw (ABR), which offers a convenient way to cover such distances through the internet. To this end, we collected data through closed-ended questionnaires from several types of people. The collected data are in the form of text documents. Our aim is therefore to perform Sentiment Analysis (SA) of people's responses using machine learning and to check the feasibility of applying the ABR in the real world.


2021 ◽  
Vol 5 (2) ◽  
pp. 415
Author(s):  
Firdausi Nuzula Zamzami ◽  
Adiwijaya Adiwijaya ◽  
Mahendra Dwifebri P

Information exchange now takes place mostly on the internet. It can happen in many ways, such as expressing opinions on social media, and reviewing a film is one of them. When people review a film, they draw on their emotions to express their feelings, which can be positive or negative. The fast growth of the internet has made information more diverse, plentiful and unstructured. Sentiment analysis can handle this, because it is a classification process, carried out automatically by a computer system, for understanding the opinions, interactions and emotions in a document or text. One suitable machine learning method is the Modified Balanced Random Forest. To deal with the varied data, Mutual Information is used for feature selection. With these two methods, the system achieves an accuracy of 79% and an F1-score of 75%.
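As a rough illustration of Mutual Information feature selection, the sketch below scores each term by the mutual information between its presence in a review and the review's sentiment label, then keeps the top-scoring terms. The abstract does not say how the scores were computed, so the binary presence/absence formulation, the toy data and the names `mutual_information` and `select_top_k` are assumptions; the Modified Balanced Random Forest classifier is not shown.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between term presence and class label.

    docs is a list of token sets and labels the matching list of classes.
    """
    n = len(docs)
    joint = Counter()
    for tokens, label in zip(docs, labels):
        joint[(term in tokens, label)] += 1

    # Marginal counts for term presence and for each class.
    term_counts, label_counts = Counter(), Counter()
    for (present, label), count in joint.items():
        term_counts[present] += count
        label_counts[label] += count

    # Sum p(t, c) * log2(p(t, c) / (p(t) * p(c))) over observed pairs.
    score = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_label = label_counts[label] / n
        score += p_joint * math.log2(p_joint / (p_term * p_label))
    return score


def select_top_k(docs, labels, k):
    """Return the k terms with the highest score (ties broken by name)."""
    vocabulary = sorted({t for tokens in docs for t in tokens})
    scored = [(mutual_information(docs, labels, t), t) for t in vocabulary]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy reviews (hypothetical data, for illustration only).
docs = [{"great", "film"}, {"great", "plot"},
        {"boring", "film"}, {"boring", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
print(select_top_k(docs, labels, 2))  # -> ['boring', 'great']
```

In this toy data, "great" and "boring" each determine the label completely (1 bit), while "film" and "plot" appear equally often in both classes and score 0, so they would be discarded.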


Author(s):  
Sun-ha Hong

Today, machines observe, record, and sense the world—not just for us but also often instead of us and indifferently to our meaning. The intertwined problems of technological knowledge and (our) knowledge of technology manifest in the growing industry of smart machines, the Internet of Things, and other means for self-tracking. The automation of the care of the self is buoyed by a popular fantasy of data’s intimacy, of machines that know you better than yourself. Yet as the technology becomes normalized, the hacker ethic gives way to a market-driven shift in which more and more of “my” personal truth is colonized by machines (and the people behind the machines) that I cannot question.


2014 ◽  
Vol 631-632 ◽  
pp. 1219-1223
Author(s):  
Jia Hao Chen ◽  
Jian Hua Wu

With the rapid development of the Internet and the emergence of social media services, many users are becoming creators of social information. However, manual work cannot cope with such a large number of subjective messages. As a new kind of social media service, the micro blog has been widely accepted and can be used for sentiment analysis. This paper compares the performance of three machine learning methods on sentiment analysis of Chinese micro blogs. We also propose an improved feature selection method that increases classification accuracy. Experimental results show that SVM performs close to Naïve Bayes and that both are better than logistic regression in most cases.


Author(s):  
Mohammad Fikri ◽  
Riyanarto Sarno

Sentiment analysis has grown rapidly, which has an impact on the number of internet-based services emerging in Indonesia. In this research, sentiment analysis is performed with a rule-based method supported by SentiWordNet and with the Support Vector Machine (SVM) algorithm, using Term Frequency–Inverse Document Frequency (TF-IDF) as the feature extraction method. Since the numbers of sentences in the positive, negative and neutral classes are imbalanced, an oversampling method is applied. On the imbalanced dataset, the rule-based SentiWordNet method and the SVM algorithm achieve accuracies of 56% and 76%, respectively. On the balanced dataset, however, they achieve accuracies of 52% and 89%, respectively.
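The abstract does not say which oversampling method was used. As one simple possibility, the sketch below balances the classes by random duplication: minority-class sentences are resampled with replacement until every class is as large as the biggest one. The function name `random_oversample`, the fixed seed and the toy sentences are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Balance classes by duplicating randomly chosen minority samples.

    NOTE: random duplication is an assumed technique; the paper may have
    used a different oversampling method.
    """
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced data: 3 positive, 2 negative, 1 neutral sentence.
sentences = ["s1", "s2", "s3", "s4", "s5", "s6"]
classes = ["pos", "pos", "pos", "neg", "neg", "neu"]
_, balanced = random_oversample(sentences, classes)
print(Counter(balanced))
```

Oversampling is normally applied to the training split only, so that duplicated sentences do not leak into the data used to measure accuracy.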


Sentiment analysis is the process of determining whether a person's opinion is positive, negative or neutral. Nowadays people mostly share their opinions on areas such as product marketing and political and social phenomena online. It is very difficult for a machine to identify these opinions, and the field has many open issues. One of them is sarcasm detection. People sometimes give their opinions sarcastically, that is, they say something positive about an object when they mean something negative. The machine will take such an opinion as positive, so the final polarity of the product will be wrong. The purpose of this paper is to find these types of sentences and correct the polarity value.


2021 ◽  
Vol 5 (3) ◽  
pp. 799
Author(s):  
Fitria Septianingrum ◽  
Agung Susilo Yuda Irawan

In the current era of the industrial revolution 4.0, the internet is a necessity in people's daily lives. The high intensity of internet use in the community causes information to spread widely and quickly. This rapid distribution of information goes hand in hand with the growth of digital data, so the public opinions contained in that data become important. Such digital data can be processed with sentiment analysis to obtain useful information about issues developing in the community or to find out public opinion on a company's product. Many studies on sentiment analysis apply the Naive Bayes algorithm, so the researchers are interested in studying the use of feature selection for this algorithm. This research was therefore conducted, using the Systematic Literature Review (SLR) method, to determine which feature selection method performs best when combined with the Naive Bayes algorithm. The study concludes that the best feature selection method in combination with Naive Bayes is Particle Swarm Optimization (PSO), with an average accuracy of 89.08%.


2019 ◽  
Vol 4 (1) ◽  
pp. 89-113
Author(s):  
Chuanming Yu ◽  
Xingyu Zhu ◽  
Bolin Feng ◽  
Lin Cai ◽  
Lu An

Abstract

Purpose: Online reviews of tourism attractions provide important references for potential tourists choosing tourism spots. The main goal of this study is to conduct sentiment analysis that helps users comprehend the large volume of reviews, based on comments about Chinese attractions from the Japanese tourism website 4Travel.

Design/methodology/approach: Different statistics-based and rule-based methods are used to analyze the sentiment of the reviews. Three groups of novel statistics-based methods combining feature selection functions and the traditional term frequency-inverse document frequency (TF-IDF) method are proposed. We also construct seven groups of different rule-based methods. The macro-average and micro-average values for the best classification results of the methods are calculated, and the performance of the methods is reported.

Findings: We compare the statistics-based and rule-based methods separately and compare the overall performance of the two kinds of methods. According to the results, the combination of feature selection functions and weightings can strongly improve the overall performance. The emotional vocabulary in the field of tourism (EVT), kaomojis, and negative and transitional words can notably improve the performance in all three categories. The rule-based methods outperform the statistics-based ones by a narrow margin.

Research limitation: Two limitations should be noted: 1) the empirical studies verifying the validity of the proposed methods were conducted only on the Japanese language; and 2) deep learning technology has not been incorporated in the methods.

Practical implications: The results help to elucidate the intrinsic characteristics of the Japanese language and their influence on sentiment analysis. These findings also provide practical usage guidelines within the field of sentiment analysis of Japanese online tourism reviews.

Originality/value: Our research is of practical value. Currently, there are no studies that focus on the sentiment analysis of Japanese reviews about Chinese attractions.
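The difference between macro-averaging and micro-averaging over the three sentiment categories can be shown with a short sketch. The abstract does not state which metric is averaged, so the use of F1 here is an assumption, as are the function names and the toy labels.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 score from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_micro_f1(true_labels, predicted_labels):
    """Return (macro-averaged F1, micro-averaged F1) for single-label data."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    classes = sorted(set(true_labels) | set(predicted_labels))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels, for illustration only.
truth = ["pos", "pos", "pos", "neg", "neu"]
guess = ["pos", "pos", "neg", "neg", "neu"]
macro, micro = macro_micro_f1(truth, guess)
print(round(macro, 4), round(micro, 4))  # -> 0.8222 0.8
```

Macro-averaging weights every category equally, so small categories count as much as large ones, whereas micro-averaging weights every review equally; with exactly one label per review, micro-averaged F1 equals plain accuracy.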

