Analyzing Text Data for Opinion Mining

Opinion mining is the extraction of subjective information from text data using tools such as natural language processing (NLP) and text analysis. Automated opinion mining often uses machine learning, a type of artificial intelligence (AI), to mine text for sentiment. Opinion mining, also called sentiment analysis, involves building a system to collect and categorize opinions about a product. This project addresses the problem of sentiment analysis on Twitter, that is, classifying tweets as positive, negative or neutral according to the sentiment they express. Twitter is an online micro-blogging and social-networking platform that allows users to write short status updates of at most 140 characters. It is a rapidly expanding service with over 200 million registered users, of which 100 million are active users; half of them log on to Twitter daily, generating nearly 250 million tweets per day. Given this volume of usage, we hope to obtain a reflection of public sentiment by analysing the sentiments expressed in the tweets. Analysing public sentiment is important for many applications, such as firms gauging the market response to their products, predicting political elections and predicting socioeconomic phenomena such as stock exchange movements.
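
The three-way labelling described above can be illustrated with a deliberately simple baseline. The sketch below is not the machine-learning system the abstract refers to: it swaps in a lexicon count, and the word lists and the function name `classify_tweet` are hypothetical, chosen only to show what a positive/negative/neutral decision looks like.

```python
import re

# Hypothetical, tiny sentiment lexicons chosen for illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}


def classify_tweet(text: str) -> str:
    """Label a tweet positive, negative or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no hits, or positive and negative hits cancel out
```

A trained classifier would replace the lexicon count with learned weights, but the output contract (one of three labels per tweet) stays the same.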

Opinion mining from noisy text data

Proceedings of the second workshop on Analytics for noisy unstructured text data - AND '08 ◽

10.1145/1390749.1390763 ◽

2008 ◽

Cited By ~ 10

Author(s):

Lipika Dey ◽

S K Mirajul Haque

Keyword(s):

Opinion Mining ◽

Text Data ◽

Noisy Text

Opinion mining on newspaper headlines using SVM and NLP

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i3.pp2152-2163 ◽

2019 ◽

Vol 9 (3) ◽

pp. 2152 ◽

Cited By ~ 1

Author(s):

Chaudhary Jashubhai Rameshbhai ◽

Joy Paulose

Keyword(s):

Support Vector Machine ◽

Natural Language Processing ◽

Language Processing ◽

Opinion Mining ◽

Confusion Matrix ◽

Support Vector ◽

Text Data ◽

Mining Technique ◽

Svm Model ◽

Linear Svm

Opinion mining, also known as sentiment analysis, is a technique that uses natural language processing (NLP) to classify the opinion expressed in text. Various NLP tools are available for processing text data. Much research has been done on opinion mining for online blogs, Twitter, Facebook and similar sources. This paper proposes a new opinion mining technique using a Support Vector Machine (SVM) and NLP tools on newspaper headlines. Relative words are generated using Stanford CoreNLP and passed to the SVM using a count vectorizer. A comparison of three models using confusion matrices indicates that Tf-idf with a linear SVM provides better accuracy for the smaller dataset, while for the larger dataset the SGD and linear SVM model outperforms the other models.
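
To make the ingredients named in this abstract concrete, here is a minimal pure-Python sketch of TF-IDF features, a linear SVM trained by stochastic gradient descent on the hinge loss, and a confusion matrix. It is an illustration under stated assumptions, not the paper's pipeline: it does not use Stanford CoreNLP or any SVM library, the headlines and labels are invented, and all function names and hyperparameters (`epochs`, `lr`, `reg`) are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (a stand-in for real NLP preprocessing)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(term for term in doc if term in idf)
    vec = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}


def score(w, b, x):
    """Signed distance-like score w.x + b for a sparse vector x."""
    return sum(w.get(term, 0.0) * v for term, v in x.items()) + b


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, reg=0.01, seed=0):
    """SGD on the L2-regularised hinge loss; labels must be +1 or -1."""
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            margin = y * score(w, b, x)
            for term in w:                      # gradient step for the L2 penalty
                w[term] *= 1.0 - lr * reg
            if margin < 1.0:                    # hinge loss is active for this sample
                for term, v in x.items():
                    w[term] = w.get(term, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(w, b, x):
    return 1 if score(w, b, x) >= 0.0 else -1


def confusion_matrix(actual, predicted):
    """Counter keyed by (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


# Hypothetical headlines and labels (+1 positive, -1 negative), invented for illustration.
TRAIN = [
    ("markets rally as profits surge", 1),
    ("team wins championship in style", 1),
    ("economy grows faster than expected", 1),
    ("floods destroy homes in region", -1),
    ("company collapses amid fraud scandal", -1),
    ("crash kills dozens on highway", -1),
]


def run_demo():
    """Train on TRAIN and return the fitted pieces plus a training confusion matrix."""
    docs = [tokenize(text) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    idf = fit_idf(docs)
    w, b = train_linear_svm([tfidf(d, idf) for d in docs], labels)
    predicted = [predict(w, b, tfidf(d, idf)) for d in docs]
    return idf, w, b, confusion_matrix(labels, predicted)
```

Calling `run_demo()` returns the fitted weights together with a confusion matrix over the training headlines. A real comparison like the one in the abstract would evaluate on held-out headlines and would normally rely on an established library rather than this hand-rolled trainer.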

Aspect Based Sentiment Analysis for E-Commerce Shopping Website

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39117 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1819-1822

Author(s):

Neha V. Thakare

Keyword(s):

Sentiment Analysis ◽

Opinion Mining ◽

Emotion Detection ◽

Research Knowledge ◽

Text Data ◽

Customer Expectations ◽

Customer Reviews ◽

Classification Feature ◽

On Line ◽

Area Unit

Abstract: Sentiment analysis is the most commonly used approach to analyze data that is in the form of text and to identify sentiment content in that text. Opinion mining is another name for sentiment analysis. A wide range of text data is being generated in the form of suggestions, feedback, tweets, and comments. E-commerce portals are generating large amounts of data every day in the form of customer reviews. Analyzing e-commerce data can help online retailers understand customer expectations, offer a better shopping experience, and increase sales. Sentiment analysis can be used to identify positive, negative, and neutral information in customer reviews. Researchers have developed many techniques for sentiment analysis. Keywords: Sentiment analysis, Sentiment classification, Feature selection, Emotion detection, Customer Reviews

Sarcasm Detection in Text Data Using Glove Embedding

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37663 ◽

2021 ◽

Vol 9 (8) ◽

pp. 2495-2499

Author(s):

Samrudhi Naik

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Processing Technique ◽

Text Data ◽

Natural Language Processing Technique

Abstract: Sarcasm is a way of expressing feelings in which people say or write something completely different from, or opposite to, what they actually mean, which makes it very difficult to identify. It is usually an ironic or satirical remark tempered by humor. People mainly use it to say the opposite of what is true in order to make someone look or feel foolish. Understanding sarcasm can improve the accuracy of sentiment analysis. Sentiment analysis (or opinion mining) is a natural language processing technique used to determine whether data is positive, negative or neutral. This helps in identifying the opinions of users, individuals or society. In this project an attempt is made to develop a model that detects whether or not a sentence is sarcastic. Keywords: Sarcasm detection, GloVe Embedding, LSTM, Natural Language Processing, Sentiment
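
The keywords mention GloVe embeddings and an LSTM, but the abstract does not describe how they are wired together. As a hedged sketch of the usual first step only, the code below reads vectors in the plain-text GloVe layout (a word followed by space-separated floats on each line) and turns a token list into a fixed-length sequence of vectors that a sequence model could consume. The three-dimensional vectors, the names `load_glove` and `embed_sequence`, and the zero-vector handling of unknown words are assumptions for illustration.

```python
import io

# Hypothetical 3-dimensional vectors in the plain-text GloVe layout
# (real GloVe files use 50 to 300 dimensions and far larger vocabularies).
SAMPLE_GLOVE = """\
great 0.9 0.1 0.0
weather 0.2 0.7 0.1
oh 0.1 0.1 0.8
"""


def load_glove(handle):
    """Parse 'word v1 v2 ... vd' lines into a dict of word -> list of floats."""
    table = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(value) for value in parts[1:]]
    return table


def embed_sequence(tokens, table, dim, max_len):
    """Look up each token, use a zero vector for unknown words, pad or truncate to max_len."""
    rows = [list(table.get(token, [0.0] * dim)) for token in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows
```

For example, `embed_sequence("oh great weather again".split(), load_glove(io.StringIO(SAMPLE_GLOVE)), 3, 6)` yields six rows, with zero vectors for the unknown word and for the padding positions.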

Opinion mining from noisy text data

International Journal on Document Analysis and Recognition (IJDAR) ◽

10.1007/s10032-009-0090-z ◽

2009 ◽

Vol 12 (3) ◽

pp. 205-226 ◽

Cited By ~ 49

Author(s):

Lipika Dey ◽

Sk. Mirajul Haque

Keyword(s):

Opinion Mining ◽

Text Data ◽

Noisy Text

An Approach To Twitter Sentiment Analysis Over Hadoop

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.20110 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 374

Author(s):

Yazala Ritika Siril Paul ◽

Dilipkumar A. Borikar

Keyword(s):

Sentiment Analysis ◽

Opinion Mining ◽

Emotional State ◽

Streaming Data ◽

Data Streaming ◽

Text Data ◽

Twitter Data ◽

The People ◽

Data Platform ◽

Apache Hive

Sentiment analysis is the process of identifying people’s attitudes and emotional states from the language they use on social websites or other sources. The main aim is to identify a set of potential features in a review and extract the opinion expressions about those features by making full use of their associations. Posting on Twitter has become routine for people around the world, who share thousands of reactions and opinions on every topic, every second of every day. It is like one big psychological database that is constantly being updated and can be used to analyze people’s sentiments. Hadoop is one of the best options available for sentiment analysis of Twitter data, and it also works with distributed big data, streaming data, text data and so on. This paper provides an efficient mechanism for performing sentiment analysis (opinion mining) on Twitter data over the Hortonworks Data Platform, which provides Hadoop on Windows, with the assistance of Apache Flume, Apache HDFS and Apache Hive.

Sentiment classification of social media reviews using an ensemble classifier

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i1.pp355-363 ◽

2019 ◽

Vol 16 (1) ◽

pp. 355 ◽

Cited By ~ 1

Author(s):

Savita Sangam ◽

Subhash Shinde

Keyword(s):

Social Media ◽

Opinion Mining ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Support Vector ◽

Business Organizations ◽

Text Data ◽

Proposed Model ◽

Show Business ◽

Use Of Social Media

It has become common practice for business organizations and individuals to use social media to share opinions about products or services. Consumers are also ready to share their views on particular products or commodities. As a result, a huge amount of unstructured social media data is generated day by day, and heaps of text data gradually accumulate in areas such as automated business, education, health care and show business. Opinion mining, also referred to as sentiment analysis or sentiment classification, deals with mining review text and classifying the opinions or sentiments in that text as positive or negative. In this paper we propose an ensemble classifier model consisting of a Support Vector Machine and an Artificial Neural Network. It combines the knowledge from two feature sets for sentiment classification. The proposed model shows acceptable performance in terms of accuracy when compared with the baseline model.

A Numerically Coded File of Operative Procedures Derived from a Free Text Data Collection System: A Measure of the Accuracy

Methods of Information in Medicine ◽

10.1055/s-0038-1635717 ◽

1976 ◽

Vol 15 (01) ◽

pp. 21-28 ◽

Cited By ~ 3

Author(s):

Carmen A. Scudiero ◽

Ruth L. Wong

Keyword(s):

Data Collection ◽

Pap Smear ◽

Operative Procedures ◽

Free Text ◽

Collection System ◽

Process Data ◽

Text Data ◽

Data Collection System ◽

History Of ◽

Correlation System

A free text data collection system has been developed at the University of Illinois utilizing single word, syntax free dictionary lookup to process data for retrieval. The source document for the system is the Surgical Pathology Request and Report form. To date 12,653 documents have been entered into the system. The free text data was used to create an IRS (Information Retrieval System) database. A program to interrogate this database has been developed to numerically code operative procedures. A total of 16,519 procedure records were generated. One and nine tenths percent of the procedures could not be fitted into any procedure category; 6.1% could not be specifically coded, while 92% were coded into specific categories. A system of PL/1 programs has been developed to facilitate manual editing of these records, which can be performed in a reasonable length of time (1 week). This manual check reveals that these 92% were coded with precision = 0.931 and recall = 0.924. Correction of the readily correctable errors could improve these figures to precision = 0.977 and recall = 0.987. Syntax errors were relatively unimportant in the overall coding process, but did introduce significant error in some categories, such as when right-left-bilateral distinction was attempted. The coded file that has been constructed will be used as an input file to a gynecological disease/PAP smear correlation system. The outputs of this system will include retrospective information on the natural history of selected diseases and a patient log providing information to the clinician on patient follow-up. Thus a free text data collection system can be utilized to produce numerically coded files of reasonable accuracy. Further, these files can be used as a source of useful information both for the clinician and for the medical researcher.
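
The abstract reports precision and recall without defining them. Under the standard definitions, precision is the share of assigned codes that are correct, TP / (TP + FP), and recall is the share of true instances that received the correct code, TP / (TP + FN). The sketch below computes both; the function name is hypothetical, and the counts in the usage note are invented so that the ratios land near the reported figures. They are not taken from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts, using 0.0 when a denominator is empty."""
    assigned = true_positives + false_positives   # everything the coder labelled with this code
    relevant = true_positives + false_negatives   # everything that should have received it
    precision = true_positives / assigned if assigned else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall
```

With the invented counts `precision_recall(931, 69, 77)`, precision is 931 / 1000 and recall is 931 / 1008, which round to the 0.931 and 0.924 quoted above.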

Diagnostics of professional competence of IT students based on digital footprint data

Informatics and Education ◽

10.32517/0234-0453-2020-35-4-4-11 ◽

2020 ◽

pp. 4-11

Author(s):

I. G. Zakharova ◽

Yu. V. Boganyuk ◽

M. S. Vorobyova ◽

E. A. Pavlova

Keyword(s):

Information Technology ◽

Educational Program ◽

Professional Competence ◽

Objective Data ◽

Text Data ◽

Data Set ◽

Job Requirements ◽

Digital Footprint ◽

Graduate Employment ◽

The University

The goal of the article is to demonstrate the possibilities of an approach to diagnosing the level of IT graduates’ professional competence based on analysis of the student’s digital footprint and the content of the corresponding educational program. We describe methods for extracting indicators of a student’s professional level from digital footprint text data: course descriptions and graduation qualification works. We show methods of comparing these indicators with the formalized requirements of employers, as reflected in the texts of vacancies in the field of information technology. The proposed approach was applied at the Institute of Mathematics and Computer Science of the University of Tyumen. We performed diagnostics using a data set that included texts of course descriptions for IT areas of undergraduate studies, 542 graduation qualification works in these areas, 879 descriptions of job requirements and information on graduate employment. The presented approach allows us to evaluate the relevance of the educational program as a whole and the level of professional competence of each student based on objective data. The results were used to update the content of some major courses and to include new elective courses in the curriculum.
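
The abstract does not say how indicators from student texts are compared with vacancy texts. One common way to compare two short texts, offered here purely as an assumption and not as the authors' method, is the cosine similarity of their term-count vectors. The function name and the example token lists are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors (0.0 if either is empty)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For instance, comparing the tokens of a course description such as `"python databases sql testing".split()` with a vacancy such as `"python sql docker".split()` gives a score strictly between 0 (no shared terms) and 1 (identical term distribution).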
