Improving Accuracy of The Sentence-Level Lexicon-Based Sentiment Analysis Using Machine Learning

Author(s):  
Titya Eng ◽  
Md Rashed Ibn Nawab ◽  
Kazi Md Shahiduzzaman

Sentiment analysis studies people's attitudes, opinions, evaluations, emotions, and sentiments toward entities such as products, topics, individuals, services, and issues, and classifies whether an opinion or evaluation is favorable toward the entity or not. It has received growing research attention in recent years because of its scientific and commercial benefits. This research aims to develop a better approach to sentence-level sentiment analysis by combining lexicon resources with a machine learning method. Because review data on the internet is unstructured and noisy, this research applies several preprocessing techniques to clean the data before it is processed by the algorithms discussed in subsequent sections. The lexicon-building process, and how the lexicon is handled and combined with the machine learning algorithm to predict sentiment, are also discussed. In sentiment analysis, a sentence's sentiment can be classified into three classes: positive, negative, or neutral. In this work, however, we exclude neutral sentiment to avoid ambiguity and unnecessary complexity. The experimental results show that the proposed algorithm outperforms the baseline machine learning algorithms. We used four distinct datasets and several performance measures to check and validate the proposed method's robustness.
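To make the idea of combining a lexicon with a learned classifier more concrete, here is a minimal sketch of how lexicon scores could be turned into per-sentence features. The tiny `LEXICON`, the `NEGATORS` set, and the negation rule are illustrative assumptions; the abstract does not describe the authors' actual lexicon or feature design.

```python
import re

# Hypothetical toy lexicon (assumption); a real lexicon would be far larger.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
# Hypothetical negation words (assumption).
NEGATORS = {"not", "never", "no"}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def lexicon_features(sentence):
    """Return (positive_score, negative_score, token_count) for one sentence.

    A negator flips the polarity of the next sentiment-bearing word.
    This is a deliberate simplification, not the paper's rule.
    """
    pos = neg = 0.0
    negate = False
    tokens = tokenize(sentence)
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
            continue
        score = LEXICON.get(tok, 0.0)
        if score != 0.0:
            if negate:
                score = -score
                negate = False
            if score > 0:
                pos += score
            else:
                neg += -score
    return pos, neg, len(tokens)


def lexicon_label(sentence):
    """Binary label (neutral excluded); ties default to positive."""
    pos, neg, _ = lexicon_features(sentence)
    return "positive" if pos >= neg else "negative"
```

In a hybrid setup, the tuple returned by `lexicon_features` could be appended to whatever feature vector the machine learning classifier already consumes, so the classifier can learn when to trust the lexicon.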

Author(s):  
P. Priyanga ◽  
N. C. Naveen

This article describes how healthcare organizations are growing and are potential beneficiaries of the data that is generated and gathered. From hospitals to clinics, data and analytics can be a powerful tool for improving patient care and satisfaction efficiently. In developing countries, cardiovascular diseases have a huge impact on increasing death rates and are expected by the end of 2020 in spite of the best clinical practices. Current Machine Learning (ML) algorithms are adapted to estimate heart disease risk in middle-aged patients. To predict heart disease, this research work carries out a detailed analysis that takes into account the angiographic heart disease status (i.e. ≥ 50% diameter narrowing). Deep Neural Network (DNN), Extreme Learning Machine (ELM), K-Nearest Neighbor (KNN), and Support Vector Machine (SVM) learning algorithms (the latter with linear and polynomial kernel functions) are considered in this work. The accuracy and results of these algorithms are analyzed by comparing their effectiveness.
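The linear and polynomial kernel functions mentioned for the SVM can be illustrated with a minimal sketch. The parameter names `degree`, `gamma`, and `coef0` and their default values are assumptions chosen for illustration; the abstract does not state the settings used.

```python
def linear_kernel(x, y):
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree.

    degree, gamma, and coef0 are illustrative defaults (assumption).
    """
    return (gamma * linear_kernel(x, y) + coef0) ** degree
```

A linear kernel yields a straight decision boundary in the original feature space, while the polynomial kernel lets the SVM separate classes along curved boundaries without computing the expanded features explicitly.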


2019 ◽  
Vol 16 (10) ◽  
pp. 4425-4430 ◽  
Author(s):  
Devendra Prasad ◽  
Sandip Kumar Goyal ◽  
Avinash Sharma ◽  
Amit Bindal ◽  
Virendra Singh Kushwah

Machine Learning is a growing area of computer science in today’s era. This article focuses on prediction analysis using the K-Nearest Neighbors (KNN) machine learning algorithm. Data in the dataset are processed, analyzed, and predicted using this algorithm. Various machine learning algorithms are introduced and their pros and cons are discussed. The KNN algorithm is studied in detail and implemented on the specified data with certain parameters. The research work elucidates prediction analysis and applies it to predicting the quality of restaurants.
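As a rough illustration of how KNN makes a prediction, the sketch below classifies a query point by majority vote among its `k` nearest training points under Euclidean distance. The feature values and the "low"/"high" quality labels in the usage note are hypothetical; the abstract does not specify the dataset's features or the parameters used.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels among the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

For example, with hypothetical training points `[(1, 1), (1, 2), (5, 5), (6, 5)]` labeled `["low", "low", "high", "high"]`, the query `(1.5, 1.5)` has two "low" neighbors and one "high" neighbor among its three closest points, so the vote goes to "low".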


2020 ◽  
Vol 17 (9) ◽  
pp. 4294-4298
Author(s):  
B. R. Sunil Kumar ◽  
B. S. Siddhartha ◽  
S. N. Shwetha ◽  
K. Arpitha

This paper uses distinct machine learning algorithms and explores their features. The primary advantage of machine learning is that an algorithm can make predictions automatically by learning what to do with the information it is given. This paper presents the concept of machine learning and its algorithms, which can be used for applications such as health care, sentiment analysis, and many more. Programmers are sometimes unsure which algorithm to apply to their applications. This paper offers guidance on choosing an algorithm on the basis of how accurately it fits. Based on the collected data, one of the algorithms can be selected according to its pros and cons. Using the data set, a base model is developed, trained, and tested. The trained model is then ready for prediction and can be deployed on the basis of feasibility.


2020 ◽  
Vol 17 (7) ◽  
pp. 2869-2875
Author(s):  
Sajay Thomas Samuel ◽  
Booma Poolan Marikannan

Machine learning can help people perform complex tasks and solve problems, as it learns patterns from historical data and makes predictions based on them. This research addresses movie reviews on social media, specifically Twitter: it gathers tweets about movies and displays a rating based on the sentiment of the tweets. Twitter is an online social media website where people from all walks of life communicate by tweeting short updates within a character limit of 280 characters. Twitter continues to grow as a business and has become one of the biggest platforms for communication and instant messaging. Because of the large number of users, voluminous amounts of data are available that can be used to gain more in-depth information and insights and to extract sentiment by analysing the tweets. Today, many applications use sentiment analysis in various fields, for example to get insights about a particular brand or product. Doing sentiment analysis in the traditional ways can be time consuming and very complex. The aim of this research is to investigate the domain of sentiment analysis and incorporate a machine learning algorithm to create a system that can obtain and display the ratings of a particular movie. The machine learning algorithms used are the Naïve Bayes classifier and SVM. The algorithm with the better accuracy will be chosen for the implementation phase.
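For readers unfamiliar with the Naïve Bayes classifier named above, the following is a minimal multinomial Naïve Bayes sketch with add-one (Laplace) smoothing. It is not the authors' implementation, and the four training "tweets" are invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, doc):
    """Return the class with the highest log-posterior for `doc`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(class_counts):
        # Log prior: fraction of training documents in this class.
        logp = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in doc.lower().split():
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            logp += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical toy training data (assumption, not from the paper).
TRAIN_DOCS = [
    "loved this movie",
    "great acting great plot",
    "boring movie",
    "terrible plot awful acting",
]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]
MODEL = train_naive_bayes(TRAIN_DOCS, TRAIN_LABELS)
```

Calling `predict_naive_bayes(MODEL, "great movie")` favors the positive class because "great" appears only in positive training examples. A real system would train on many labeled tweets and compare held-out accuracy against the SVM before choosing one.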


In a large distributed virtualized environment, predicting the alerting source from its text is a daunting task. This paper explores the option of using a machine learning algorithm to solve this problem. Unfortunately, our training dataset is highly imbalanced: 96% of the alerting data is reported by 24% of the alerting sources. This is the expected dataset in any live distributed virtualized environment, where new versions of devices produce relatively few alerts compared with older devices. Classification on such an imbalanced dataset presents a different set of challenges from binary classification. This type of skewed data distribution makes conventional machine learning less effective, especially when predicting alerts from minority device types. Our challenge is to build a robust model that can cope with this imbalanced dataset and achieve a relatively high level of prediction accuracy. This research work started with traditional regression and classification algorithms using a bag-of-words model. Then word2vec and doc2vec models were used to represent the words in vector form, which preserves the semantic meaning of the sentence. With this approach, alerting texts with similar messages have the same vector representation. The vectorized alerting text is used with Logistic Regression for model building. This yields better accuracy, but the model is relatively complex and demands more computational resources. Finally, a simple neural network is used for this multi-class text classification problem, built with the Keras and TensorFlow libraries. A simple two-layer neural network yielded 99% accuracy, even though our training dataset was not balanced. This paper goes through a qualitative evaluation of the different machine learning algorithms and their respective results. Finally, the two-layer deep learning model is selected as the final solution, since it requires relatively less time and fewer resources while giving better accuracy.
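The bag-of-words representation used as the starting point can be sketched as follows. The alert messages in the usage note are hypothetical, and whitespace tokenization is an assumption; the abstract does not describe the preprocessing actually applied.

```python
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index (sorted for stability)."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}


def bag_of_words(text, vocab):
    """Return a count vector for `text`; tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector
```

For instance, building the vocabulary from the hypothetical alerts "disk full on host", "host unreachable", and "disk latency high on disk" gives seven columns, and the third alert maps to `[2, 0, 1, 0, 1, 1, 0]`. Word order is lost in this representation, which is one reason embedding models such as word2vec and doc2vec are often tried next.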


Author(s):  
Satwik P M and Dr. Meenatchi Sundram

In this research article, we present a new approach to flood prediction using an advanced machine learning algorithm from the neural network class, one that performs well in data operations and predictive analytics. The article discusses in detail the process of evaluating the prediction of flood occurrences. We compare our work with many existing algorithms and with different research approaches. Compared with previous research, we observe that Neural Turing networks predict rainfall and flood-based disasters over consecutive periods of 10, 15, and 20 years with 93.8% accuracy. The research is analyzed with various parameters and compared with other studies implemented with other machine learning algorithms. The idea of the research is described and evaluated against previous work using different evaluation parameters, including the number of iterations, or epochs.


Ethiopia is the leading producer of chickpea in Africa and among the top ten producers of chickpea in the world. Debre Zeit Agriculture Research Center is a research center in Ethiopia mandated with the improvement of chickpea and other crops. Genome-enabled prediction technologies are trying to transform the classification of chickpea types and upgrade the existing identification paradigm. The identification of chickpea types in Ethiopia is still done manually. Domain experts try to recognize every chickpea type, and the manner and efficiency of identification depend mainly on the skills and experience of those experts, which frequently causes errors and inaccuracy. Most research on the classification and identification of crops has been done outside Ethiopia; for local and emerging varieties, there is a need to design a classification model that assists chickpea selection mechanisms, and the accuracy of existing algorithms should also be verified and optimized. The main aim of this study is to design a chickpea type classification model using machine learning algorithms. This research work uses a total of 8303 records with 8 features, with 80% used for training and 20% for testing. Data preprocessing was done to prepare the dataset for the experiments. ANN, SVM, and DT were used to build the model. To evaluate the performance of the model, a confusion matrix with accuracy, recall, and precision was used. The experimental results show that the best-performing algorithm was the decision tree, which achieved 97.5% accuracy. After the evaluation of results found in this research work, agriculture research centers and companies have benefited. The chickpea type classification model will be applied at Debre Zeit Agriculture Research Center in Ethiopia as a base to support the experts during the chickpea type identification process. In addition, it enables the experts to save time, effort, and cost with the support of the identification model. Moreover, this research can serve as a cornerstone in the area and be referred to by future researchers in the domain.
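The evaluation setup described above (an 80/20 split, then accuracy, precision, and recall derived from the confusion matrix) can be sketched in a few lines. The class labels in the usage note are placeholders, and rounding the training size to the nearest integer is an assumption; the abstract does not say how the split was rounded.

```python
def split_sizes(n_records, train_fraction=0.8):
    """Return (n_train, n_test) for a simple train/test split."""
    n_train = round(n_records * train_fraction)
    return n_train, n_records - n_train


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision_recall(y_true, y_pred, cls):
    """Per-class precision and recall computed from confusion-matrix counts."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With 8303 records, `split_sizes(8303)` gives 6642 training and 1661 test records. Given placeholder labels `["a", "a", "b", "b", "a"]` and predictions `["a", "b", "b", "b", "a"]`, accuracy is 0.8, class "a" has precision 1.0 and recall 2/3, and class "b" has precision 2/3 and recall 1.0.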


2019 ◽  
Vol 10 (1) ◽  
pp. 38-62
Author(s):  
Megha Rathi ◽  
Vikas Pareek

Recent advances in mobile technology and machine learning together steer us to create a mobile-based healthcare app for disease recommendation. In this study, the authors develop an Android-based healthcare app which will detect all kinds of diseases in no time. The authors developed a novel, hybrid machine-learning algorithm in order to provide more accurate results. For this purpose, the authors combined two machine-learning algorithms, SVM and GA. The proposed algorithm will enhance accuracy and at the same time reduce the complexity and the number of attributes in the database. The algorithm is also analyzed using statistical measures such as accuracy, the confusion matrix, and the ROC curve. The pivotal intent of this research work is to create an Android-based healthcare app which will predict disease when provided with certain details. For a disease like cancer, for which a series of tests is required for confirmation, this app will quickly detect cancer, which is helpful to doctors as they can start the right course of treatment right away. Further, this app will also recommend a diet fitting the patient profile.


2014 ◽  
Vol 945-949 ◽  
pp. 3418-3423
Author(s):  
Lin Du ◽  
Wei Ran Xu ◽  
Ping Yang Liu

As sentence-level sentiment analysis has been studied extensively, it has been shown that the syntactic structure of a sentence usually holds important information for sentiment analysis, especially for handling polarity reversal. However, previous attempts to use such structural information mainly focus on making predefined rules, which requires considerable linguistic expertise from the rule-maker, and the procedure itself is often manual and time consuming. To solve this problem, in this paper we propose a novel, simple vector model that represents a sentence’s syntactic structure and its prior sentiment information uniformly and rapidly. Experimental results show that our proposed approach performs well on the COAE 2013 dataset and could also be used by machine learning algorithms to extract more distinguishing features automatically.


Kerntechnik ◽  
2022 ◽  
Vol 0 (0) ◽  
Author(s):  
Hong Xu ◽  
Tao Tang ◽  
Baorui Zhang ◽  
Yuechan Liu

Opinion mining and sentiment analysis based on social media have developed in recent years, especially with the popularity of social media and the development of machine learning. In the nuclear engineering and technology community, however, sentiment analysis is seldom studied, let alone automatic analysis using machine learning algorithms. This work concentrates on mining public sentiment toward nuclear energy in German-speaking countries from public comments on nuclear news in social media, using an automatic methodology, since the comments are closer to the public's real opinions than the news itself. The results showed that the majority of comments were neutral in sentiment. 23% of comments were positive in tone, approximately 4 times as many as those in negative tones. The issues of public concern are innovative technology development, safety, nuclear waste, accidents, and the cost of nuclear power. Decision tree, random forest, and long short-term memory (LSTM) networks are adopted for the automatic sentiment analysis. The results show that all of the proposed methods can be applied in practice to some extent. As a deep learning algorithm, however, LSTM achieves the highest accuracy, approximately 85.6%, along with the best robustness of all.

