Multilingual Lexicon based Approach for Real-Time Sentiment Analysis

The volume of information on the WWW has grown enormously, prompting a surge of sentiment analysis research using Artificial Intelligence. Sentiment Analysis deals with the computational study of sentiments, opinions and subjectivity. In this paper, multilingual tweets are analyzed to identify the polarities of various political parties such as AAP, BJP, Samajwadi Party, BSP and Congress, so that users get an indication of which party they may wish to vote for. The data is analyzed using Natural Language Processing: noise is removed with different smoothing techniques, the tweets are classified with machine learning algorithms, and the accuracy of the system is gauged with standard evaluation measures such as precision. The central aim of this research is to benefit both common people and politicians. Common people can use the results to decide which party their precious vote would best serve, for themselves and for the nation; politicians, after seeing the polarities of the different parties, gain an idea of which party is preferred and which is not, and can adjust their work accordingly. The system compares the VADER lexicon with an SVM classifier; the SVM achieved 90% accuracy.
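A minimal sketch of the two approaches being compared is shown below, assuming a tiny hypothetical set of tweets and labels: VADER scores each text with its lexicon, while a TF-IDF plus linear SVM pipeline is trained on annotated examples. It is illustrative only, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) contrasting a lexicon-based VADER
# score with a trained SVM classifier on short texts. The toy tweets, labels
# and hyperparameters are illustrative assumptions.
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk.download('vader_lexicon')
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = ["Great work by the new government", "Terrible policies, very disappointed",
          "The rally was peaceful", "Taxes went up again, awful decision"]
labels = ["pos", "neg", "neu", "neg"]  # hypothetical annotations

# Lexicon route: VADER assigns a compound polarity score in [-1, 1].
vader = SentimentIntensityAnalyzer()
for t in tweets:
    print(t, "->", vader.polarity_scores(t)["compound"])

# Supervised route: TF-IDF features feeding a linear SVM.
svm = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
svm.fit(tweets, labels)
print(svm.predict(["The party kept its promises"]))
```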

2021
Vol 12 (04)
pp. 01-21
Author(s):
Felipe Cujar-Rosero
David Santiago Pinchao Ortiz
Silvio Ricardo Timarán Pereira
Jimmy Mateo Guerrero Restrepo

This paper presents the final results of a research project aimed at building a tool, supported by Artificial Intelligence through an Ontology and a model trained with Machine Learning, and by Natural Language Processing, to support semantic search of research projects in the Research System of the University of Nariño. For the construction of NATURE, as this tool is called, a methodology was used that includes the following stages: appropriation of knowledge; installation and configuration of tools, libraries and technologies; collection, extraction and preparation of research projects; and design and development of the tool. The main results of the work were three: a) the complete construction of the Ontology, with classes, object properties (predicates), data properties (attributes) and individuals (instances) in Protégé, SPARQL queries with Apache Jena Fuseki, and the corresponding coding with Owlready2 in a Jupyter Notebook with Python inside an Anaconda virtual environment; b) the successful training of the model, using Machine Learning and specifically Natural Language Processing tools such as SpaCy, NLTK, Word2vec and Doc2vec, also in a Jupyter Notebook with Python inside an Anaconda virtual environment and with Elasticsearch; and c) the creation of NATURE by managing and unifying the queries to the Ontology and to the Machine Learning model. The tests showed that NATURE performed successfully in all the searches that were carried out, with satisfactory results.
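As an illustration of the Doc2vec component of such a semantic search, the sketch below (an assumption, not the project's code) trains gensim's Doc2Vec on a toy corpus of project titles and retrieves the most similar projects for a free-text query; the ontology, SPARQL and Elasticsearch pieces are omitted.

```python
# A minimal sketch of Doc2vec-based semantic search over project titles.
# The toy corpus and parameters are illustrative assumptions.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

projects = [
    "machine learning for coffee crop disease detection",
    "ontology based semantic search of research projects",
    "natural language processing for student feedback analysis",
]
corpus = [TaggedDocument(words=p.split(), tags=[i]) for i, p in enumerate(projects)]

model = Doc2Vec(vector_size=50, min_count=1, epochs=100)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Embed a free-text query and retrieve the most similar project descriptions.
query_vec = model.infer_vector("semantic search with ontologies".split())
print(model.dv.most_similar([query_vec], topn=2))
```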


Author(s):  
Kirti Jain

Sentiment analysis, also known as opinion mining, is a machine learning task where we want to determine the overall sentiment of a particular document. With machine learning and natural language processing (NLP), we can extract information from a text and try to classify it as positive, neutral, or negative according to its polarity. In this project, we try to classify Twitter tweets into positive, negative, and neutral sentiments by building a model based on probabilities. Twitter is a microblogging website where people can quickly and spontaneously share their feelings by sending tweets limited to 140 characters. Because of this, Twitter is an excellent source of data for gauging the latest public opinion on almost anything.
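A minimal sketch of a probability-based tweet classifier, assuming a toy labeled corpus and a multinomial naive Bayes model (one common choice; the author's exact model is not specified):

```python
# A minimal sketch of a probability-based three-class tweet classifier.
# The toy tweets and labels are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["I love this phone", "This update is terrible", "It is an average day",
          "Best concert ever", "Worst service I have seen"]
labels = ["positive", "negative", "neutral", "positive", "negative"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(tweets, labels)

# predict_proba exposes the class probabilities the decision is based on.
print(clf.predict(["what a wonderful morning"]))
print(clf.predict_proba(["what a wonderful morning"]))
```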


Author(s):  
Evrenii Polyakov
Leonid Voskov
Pavel Abramov
Sergey Polyakov

Introduction: Sentiment analysis is a complex problem whose solution essentially depends on the context, field of study and amount of text data. Analysis of publications shows that the authors often do not use the full range of possible data transformations and their combinations. Only a part of the transformations is used, limiting the ways to develop high-quality classification models. Purpose: Developing and exploring a generalized approach to building a model, which consists in sequentially passing through the stages of exploratory data analysis, obtaining a basic solution, vectorization, preprocessing, hyperparameter optimization, and modeling. Results: Comparative experiments conducted using a generalized approach for classical machine learning and deep learning algorithms in order to solve the problem of sentiment analysis of short text messages in natural language processing have demonstrated that the classification quality grows from one stage to another. For classical algorithms, such an increase in quality was insignificant, but for deep learning, it was 8% on average at each stage. Additional studies have shown that the use of automatic machine learning which uses classical classification algorithms is comparable in quality to manual model development; however, it takes much longer. The use of transfer learning has a small but positive effect on the classification quality. Practical relevance: The proposed sequential approach can significantly improve the quality of models under development in natural language processing problems.
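The staged idea can be sketched with scikit-learn, under the assumption of a TF-IDF vectorizer, a logistic-regression baseline and a small grid search standing in for the vectorization, preprocessing and hyperparameter-optimization stages; this is not the authors' exact pipeline.

```python
# A minimal sketch of a staged text-classification pipeline with a
# hyperparameter-optimization step. Toy data and parameter grid are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

texts = ["good product", "bad delivery", "great support", "awful quality",
         "nice experience", "terrible app"]
labels = [1, 0, 1, 0, 1, 0]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True)),   # preprocessing + vectorization stage
    ("clf", LogisticRegression(max_iter=1000)),   # baseline classifier
])

# Hyperparameter optimization stage: search over n-gram range and regularization.
grid = GridSearchCV(pipe, {"tfidf__ngram_range": [(1, 1), (1, 2)],
                           "clf__C": [0.1, 1.0, 10.0]}, cv=2)
grid.fit(texts, labels)
print(grid.best_params_, grid.best_score_)
```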


Author(s):  
Rashida Ali
Ibrahim Rampurawala
Mayuri Wandhe
Ruchika Shrikhande
Arpita Bhatkar

The Internet provides a medium to connect with individuals of similar or different interests, creating a hub. Since a huge number of users participate on these platforms, a user can receive a high volume of messages from different individuals, creating chaos and unwanted messages. These messages sometimes contain true information and sometimes false, which leads to a state of confusion in the minds of the users and is the first step towards spam messaging. A spam message is an irrelevant and unsolicited message sent by a known or unknown user, which may create a sense of insecurity among users. In this paper, different machine learning algorithms were trained and tested with natural language processing (NLP) to classify whether messages are spam or ham.
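A minimal sketch of such a spam/ham classifier, with a hypothetical toy message set and a single algorithm (naive Bayes) standing in for the several compared in the paper:

```python
# A minimal sketch of training and testing an NLP-based spam/ham classifier.
# The toy messages and labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = ["Win a free prize now", "Are we still meeting tomorrow?",
            "Claim your lottery reward", "Lunch at noon?",
            "Free entry in a weekly contest", "Can you send the report?"]
labels = ["spam", "ham", "spam", "ham", "spam", "ham"]

X_train, X_test, y_train, y_test = train_test_split(messages, labels,
                                                    test_size=0.33, random_state=0)
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```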


Author(s):  
Irene Li
Alexander R. Fabbri
Robert R. Tung
Dragomir R. Radev

Recent years have witnessed the rising popularity of Natural Language Processing (NLP) and related fields such as Artificial Intelligence (AI) and Machine Learning (ML). Many online courses and resources are available even for those without a strong background in the field. Often the student is curious about a specific topic but does not quite know where to begin studying. To answer the question of "what should one learn first," we apply an embedding-based method to learn prerequisite relations for course concepts in the domain of NLP. We introduce LectureBank, a publicly available dataset containing 1,352 English lecture files collected from university courses, each classified according to an existing taxonomy, along with 208 manually labeled prerequisite relation topics. The dataset will be useful for educational purposes such as lecture preparation and organization as well as for applications such as reading list generation. Additionally, we experiment with neural graph-based networks and non-neural classifiers to learn these prerequisite relations from our dataset.
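One way to picture an embedding-based method is the sketch below, which is an assumption rather than the LectureBank authors' model: each (concept A, concept B) pair is represented by concatenated concept vectors and a logistic regression predicts whether A is a prerequisite of B; the vectors and labels are random placeholders.

```python
# A minimal sketch of an embedding-based prerequisite-relation classifier.
# Concept vectors and pair labels are random placeholders, not real data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_concepts, dim = 20, 16
concept_vecs = rng.normal(size=(n_concepts, dim))  # stand-ins for learned topic embeddings

# Hypothetical labeled pairs: 1 means "A is a prerequisite of B".
pairs = [(0, 1, 1), (2, 3, 0), (4, 5, 1), (6, 7, 0), (8, 9, 1), (10, 11, 0)]
X = np.array([np.concatenate([concept_vecs[a], concept_vecs[b]]) for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])

clf = LogisticRegression().fit(X, y)
print(clf.predict([np.concatenate([concept_vecs[12], concept_vecs[13]])]))
```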


IoT
2020
Vol 1 (2)
pp. 218-239
Author(s):
Ravikumar Patel
Kalpdrum Passi

In the derived approach, an analysis is performed on Twitter data for World Cup soccer 2014, held in Brazil, to detect the sentiment of people throughout the world using machine learning techniques. By filtering and analyzing the data with natural language processing techniques, sentiment polarity was calculated based on the emotion words detected in the user tweets. The dataset is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the textual data of each tweet. This approach is implemented using the Python programming language and the Natural Language Toolkit (NLTK). A derived algorithm extracts emotional words using WordNet with the POS of each word in a sentence that has a meaning in the current context, and assigns sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resulting polarity assignments are further analyzed using naïve Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17%, whereas random forest gives the best area under the receiver operating characteristic curve (AUC) of 0.97.
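A minimal lexicon-based sketch of the SentiWordNet scoring step is given below; it tokenizes and POS-tags a tweet with NLTK and sums the positive minus negative scores of the first sense of each word. The NER, parsing and Weka classification stages of the full approach are omitted, and the helper names are illustrative.

```python
# A minimal sketch of lexicon-based polarity scoring with NLTK and SentiWordNet.
# Requires NLTK data: punkt, averaged_perceptron_tagger, wordnet, sentiwordnet.
from nltk import pos_tag, word_tokenize
from nltk.corpus import sentiwordnet as swn, wordnet as wn

def wn_pos(tag):
    # Map Penn Treebank tags to WordNet POS categories.
    return {"J": wn.ADJ, "N": wn.NOUN, "V": wn.VERB, "R": wn.ADV}.get(tag[0])

def tweet_polarity(text):
    score = 0.0
    for word, tag in pos_tag(word_tokenize(text.lower())):
        pos = wn_pos(tag)
        if pos is None:
            continue
        synsets = list(swn.senti_synsets(word, pos))
        if synsets:  # use the first (most common) sense as a rough estimate
            score += synsets[0].pos_score() - synsets[0].neg_score()
    return score

print(tweet_polarity("What an amazing goal, brilliant match!"))
print(tweet_polarity("Terrible refereeing ruined the game"))
```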


Author(s):  
Anurag Langan

Grading student answers is a tedious and time-consuming task. One study found that, on average, around 25% of a teacher's time is spent scoring students' answer sheets. This time could be used in much better ways if computer technology could score the answers instead. This system aims to grade student answers using the Natural Language Processing techniques and Machine Learning algorithms available today.
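One plausible (assumed) baseline for such a system is to score a student answer by its TF-IDF cosine similarity to a model answer and scale that to the maximum marks, as sketched below; the function and example answers are hypothetical.

```python
# A minimal sketch of similarity-based answer grading; not the author's final system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def grade(student_answer, model_answer, max_marks=5.0):
    # Represent both answers as TF-IDF vectors and compare them.
    vec = TfidfVectorizer(stop_words="english")
    tfidf = vec.fit_transform([model_answer, student_answer])
    similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    return round(similarity * max_marks, 2)

model_answer = "Photosynthesis converts light energy into chemical energy stored in glucose."
print(grade("Plants use light to make glucose, storing chemical energy.", model_answer))
```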


JAMIA Open
2019
Vol 2 (1)
pp. 139-149
Author(s):
Meijian Guan
Samuel Cho
Robin Petro
Wei Zhang
Boris Pasche
...  

Objectives: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at the Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural networks (RNNs), namely the gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs to five machine learning algorithms: naive Bayes, K-nearest neighbor, support vector machine, random forest, and logistic regression. Results: Our results suggested that, overall, the RNNs outperformed the traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embeddings can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
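A minimal sketch of a bidirectional LSTM document classifier of the kind compared in the study is shown below, using randomly generated stand-ins for the tokenized clinical notes and omitting the pretrained embeddings; shapes and hyperparameters are assumptions.

```python
# A minimal sketch of a bidirectional LSTM binary document classifier.
# The token IDs and labels are random placeholders, not clinical data.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, max_len = 5000, 200                            # assumed vocabulary and note length
X = np.random.randint(1, vocab_size, size=(32, max_len))   # toy tokenized notes
y = np.random.randint(0, 2, size=(32,))                    # treatment change: yes/no

model = models.Sequential([
    layers.Embedding(vocab_size, 100),        # pretrained word vectors could be loaded as weights here
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
print(model.evaluate(X, y, verbose=0))
```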


Author(s):  
Roy Rada

The techniques of artificial intelligence include knowledge-based, machine learning, and natural language processing techniques. The discipline of investing requires data identification, asset valuation, and risk management. Artificial intelligence techniques apply to many aspects of financial investing, and published work has shown an emphasis on the application of knowledge-based techniques for credit risk assessment and of machine learning techniques for stock valuation. In the future, however, knowledge-based, machine learning, and natural language processing techniques will be integrated into systems that simultaneously address data identification, asset valuation, and risk management.

