Emotion Detection using Social Media Data

Abstract: Previous research on emotion recognition of Twitter users has centered on lexicons and simple classifiers over bag-of-words models, despite the recent successes of deep learning in many areas of natural language processing. The study's main question is whether deep learning can improve on their performance. Emotion analysis remains difficult because most posts offer little contextual information. By projecting emoticons and words into an emoticon space, the proposed method can capture more emotional semantics than existing models, which improves the performance of emotion analysis and, in a microblog setting, aids the detection of subjectivity, polarity, and emotion. The study uses hashtags to create three large emotion-labeled data sets, each corresponding to a different emotion classification scheme. It then compares several word- and character-based recurrent and convolutional neural networks against bag-of-words and latent semantic indexing models. It further examines the transferability of the final hidden state representations across the different emotion classifications, and whether a unified model can predict all of them from a shared representation. Recurrent neural networks, especially character-based ones, are shown to outperform the bag-of-words and latent semantic indexing models. The semantics of each token must be considered when classifying the emotion of a tweet, and the token semantics stored in the hash map can be looked up easily. Although these models transfer poorly, the newly proposed training heuristic produces a unified model whose performance is comparable to that of the three single models.
Keywords: Hashtags, Sentiment Analysis, Facial Recognition, Emotions.
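
The hashtag-based labelling mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the hashtag-to-emotion map, the function names `label_by_hashtag` and `bag_of_words`, and the sample tweets are all hypothetical, since the abstract does not list the hashtags or emotion classes that were used.

```python
import re
from collections import Counter

# Hypothetical hashtag-to-emotion map (an assumption for illustration only).
EMOTION_HASHTAGS = {"#happy": "joy", "#joy": "joy", "#sad": "sadness", "#angry": "anger"}

HASHTAG_RE = re.compile(r"#\w+")


def label_by_hashtag(tweet):
    """Return (text_without_emotion_hashtags, label), or None if there is no unique label."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    labels = {EMOTION_HASHTAGS[t] for t in tags if t in EMOTION_HASHTAGS}
    if len(labels) != 1:
        return None  # no emotion hashtag, or conflicting emotion hashtags
    # Strip the labelling hashtags so a classifier cannot simply read the label.
    text = HASHTAG_RE.sub(
        lambda m: "" if m.group().lower() in EMOTION_HASHTAGS else m.group(), tweet
    )
    return " ".join(text.split()), labels.pop()


def bag_of_words(text):
    """Unordered token counts: the baseline representation named in the abstract."""
    return Counter(text.lower().split())


tweets = [
    "Finally got my results #happy",
    "Missed the last train again #sad #monday",
    "No emotion tag here",
    "Mixed feelings #happy #sad",
]
dataset = [r for r in map(label_by_hashtag, tweets) if r is not None]
for text, label in dataset:
    print(label, dict(bag_of_words(text)))
```

Tweets without an emotion hashtag, or with conflicting ones, are dropped here; that filtering rule is a design choice of this sketch rather than something the abstract specifies.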

Enhanced Latent Semantic Indexing Using Cosine Similarity Measures for Medical Application

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/5/7 ◽

2020 ◽

Vol 17 (5) ◽

pp. 742-749

Author(s):

Fawaz Al-Anzi ◽

Dia AbuZeina

Keyword(s):

Language Processing ◽

Search Engines ◽

Dimensional Space ◽

Similarity Measures ◽

Medical Application ◽

Latent Semantic Indexing ◽

Arabic Language ◽

Cosine Similarity ◽

Semantic Indexing ◽

Cosine Similarity Measures

The Vector Space Model (VSM) is widely used in data mining and Information Retrieval (IR) systems as a common document representation model. However, the technique faces challenges such as its high-dimensional space and the semantic looseness of the representation. Consequently, Latent Semantic Indexing (LSI) was proposed to reduce the feature dimensions and to generate semantically rich features that can represent conceptual term-document associations. LSI has been employed effectively in search engines and many other Natural Language Processing (NLP) applications, and researchers continue to seek better performance. In this paper, we propose an innovative method that search engines can use to find better-matched content among the retrieved documents. The proposed method introduces a new extension of the LSI technique based on cosine similarity measures. The performance evaluation was carried out on an Arabic-language data collection that contains 800 medical documents with more than 47,222 unique words. The proposed method was assessed using a small test set of five medical keywords. The results show that the proposed method outperforms standard LSI.
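
As a point of reference for the cosine similarity measure the abstract builds on, the following minimal sketch ranks documents against a query in the raw term space of the VSM. It does not implement LSI or the paper's extension (LSI would first project the vectors into a reduced space via a truncated SVD). The English toy documents and the function names are assumptions for illustration; the paper used an Arabic medical collection.

```python
import math
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector of a whitespace-tokenised text."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document_index) pairs, best match first."""
    q = tf_vector(query)
    scored = [(cosine_similarity(q, tf_vector(d)), i) for i, d in enumerate(documents)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Hypothetical toy collection.
docs = [
    "diabetes insulin treatment",
    "heart surgery recovery",
    "insulin dosage for diabetes patients",
]
ranking = rank("diabetes insulin", docs)
for score, idx in ranking:
    print(f"{score:.3f}  {docs[idx]}")
```

Because cosine similarity normalises by vector length, the shorter first document scores higher than the longer third one even though both contain the two query terms.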

Investigation on Deep Learning Approach for Big Data

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Pattern Engineering System Development for Big Data Analytics ◽

10.4018/978-1-5225-3870-7.ch002 ◽

2018 ◽

pp. 25-38

Author(s):

Dharmendra Singh Rajput ◽

T. Sunil Kumar Reddy ◽

Dasari Naga Raju

Keyword(s):

Big Data ◽

Deep Learning ◽

Data Analytics ◽

Probabilistic Models ◽

Big Data Analytics ◽

Data Sets ◽

Semantic Indexing ◽

Learning Approaches ◽

Data Sampling ◽

Learning Mechanisms

In recent years, big data analytics has become a major area of research focus. Complex structures are trained at each level to simplify the data abstractions. Deep learning algorithms are a promising direction for automating the extraction of complex data from large data sets. Deep learning mechanisms produce better results in machine learning tasks such as computer vision, improved classification modelling, probabilistic models of data samples, and invariant data sets. The challenges addressed in big data include fast information retrieval, semantic indexing, extraction of complex patterns, and data tagging. Some investigations concentrate on integrating deep learning approaches with big data analytics, which poses severe challenges such as scalability, high dimensionality, data streaming, and distributed computing. Finally, the chapter concludes by posing questions to guide future work in semantic indexing, active learning, semi-supervised learning, domain adaptation modelling, data sampling, and data abstractions.

Classification of Twitter Vaping Discourse Using BERTweet: Comparative Deep Learning Study (Preprint)

10.2196/preprints.33678 ◽

2021 ◽

Author(s):

Alycia Noel Carey ◽

William Baker ◽

Jason B. Colditz ◽

Huy Mai ◽

Shyam Visweswaran ◽

...

Keyword(s):

Deep Learning ◽

Language Processing ◽

Short Term Memory ◽

Characteristic Curve ◽

Data Sets ◽

Learning Approaches ◽

Sensitive Data ◽

Twitter Data ◽

Traditional Natural

BACKGROUND Twitter provides a valuable platform for the surveillance and monitoring of public health topics; however, manually categorizing large quantities of Twitter data is labor intensive and presents barriers to identifying major trends and sentiments. Additionally, while machine and deep learning approaches have been proposed with high accuracy, they require large, annotated data sets. Public pre-trained deep learning classification models, such as BERTweet, produce higher quality models while using smaller annotated training sets. OBJECTIVE This study aims to derive and evaluate a pre-trained deep learning model based on BERTweet that can identify tweets relevant to vaping, tweets (related to vaping) of commercial nature, and tweets with pro-vape sentiment. Additionally, the performance of the BERTweet classifier will be compared against a long short-term memory (LSTM) model to show the improvements a pre-trained model has over traditional deep learning approaches. METHODS Twitter data were collected from August to October 2019 using vaping-related search terms. From this set, a random subsample of 2,401 English tweets was manually annotated for relevance (vaping related or not), commercial nature (commercial or not), and sentiment (positive, negative, neutral). Using the annotated data, three separate classifiers were built using BERTweet with the default parameters defined by the Simple Transformer API. Each model was trained for 20 iterations and evaluated with a random split of the annotated tweets, reserving 10% of tweets for evaluation. RESULTS The relevance, commercial, and sentiment classifiers achieved an area under the receiver operating characteristic curve (AUROC) of 94.5%, 99.3%, and 81.7%, respectively. Additionally, the weighted F1 scores of each were 97.6%, 99.0%, and 86.1%. We found that BERTweet outperformed the LSTM model in classification of all categories.
CONCLUSIONS Large, open-source deep learning classifiers, such as BERTweet, can provide researchers the ability to reliably determine if tweets are relevant to vaping, include commercial content, and include positive, negative, or neutral content about vaping with a higher accuracy than traditional Natural Language Processing deep learning models. Such enhancement to the utilization of Twitter data can allow for faster exploration and dissemination of time-sensitive data than traditional methodologies (e.g., surveys, polling research).
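
For readers unfamiliar with the weighted F1 score reported in the results, the following sketch shows one common definition: the per-class F1 scores are averaged with weights equal to each class's share of the true labels. The function name and the six toy labels are hypothetical and have no connection to the study's data; the abstract does not state which implementation the authors used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp  # true members of the class that were predicted as something else
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(y_true)


# Hypothetical toy labels for a three-way sentiment task.
y_true = ["pos", "pos", "neg", "neu", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neu", "neu", "pos"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))
```

Weighting by support means that frequent classes dominate the score, which is worth keeping in mind when the annotated classes are imbalanced.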

Investigation on Deep Learning Approach for Big Data

Deep Learning and Neural Networks ◽

10.4018/978-1-7998-0414-7.ch056 ◽

2020 ◽

pp. 1016-1029

Author(s):

Dharmendra Singh Rajput ◽

T. Sunil Kumar Reddy ◽

Dasari Naga Raju

Keyword(s):

Big Data ◽

Deep Learning ◽

Data Analytics ◽

Probabilistic Models ◽

Big Data Analytics ◽

Data Sets ◽

Semantic Indexing ◽

Learning Approaches ◽

Data Sampling ◽

Learning Mechanisms

Adaptive Framework for Deep Learning based Dynamic and Temporal Topic Modeling from Big Data

Recent Patents on Engineering ◽

10.2174/1872212113666190329234812 ◽

2019 ◽

Vol 13 ◽

Cited By ~ 5

Author(s):

Ajeet Ram Pathak ◽

Manjusha Pandey ◽

Siddharth Rautaray

Keyword(s):

Big Data ◽

Deep Learning ◽

Sentiment Analysis ◽

Adaptive Learning ◽

Topic Modeling ◽

Latent Semantic Indexing ◽

Streaming Data ◽

Semantic Indexing ◽

Current Trends ◽

Adaptive Framework

Background: The large amount of data emanating from social media platforms needs scalable topic modeling in order to capture the current trends and themes of events discussed on such platforms. Topic modeling plays a crucial role in many natural language processing applications such as sentiment analysis, recommendation systems, event tracking, and summarization. Objectives: The aim of the proposed work is to adaptively extract dynamically evolving topics from streaming data, infer the current trends, and track how topics trend over time. Because of various world-level events, many uncorrelated streaming channels tend to start discussing similar topics. We aim to find the effect of uncorrelated streaming channels on topic modeling when they tend to start discussing similar topics. Method: An adaptive framework for dynamic and temporal topic modeling using deep learning is put forth in this paper. The framework approximates online latent semantic indexing constrained by regularization on streaming data using an adaptive learning method. The framework is designed using deep layers of a feedforward neural network. Results: This framework supports dynamic and temporal topic modeling. The proposed approach is scalable to large collections of data. We have performed exploratory data analysis and correspondence analysis on a real-world Twitter dataset. Results indicate that our approach works well for extracting the topics associated with a given hashtag. Given a query, the approach is able to extract both implicit and explicit topics associated with the terms mentioned in the query. Conclusion: The proposed approach is a suitable solution for performing topic modeling over Big Data. We approximate the Latent Semantic Indexing model with regularization using deep learning with differentiable ℓ1 regularization, which makes the model work adaptively on streaming data in real time. The model also supports the extraction of aspects from sentences based on the interrelation of topics and thus supports aspect modeling in aspect-based sentiment analysis.

An Improved Semidiscrete Matrix Decomposition and its Application in Chinese Information Retrieval

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.241-244.3121 ◽

2012 ◽

Vol 241-244 ◽

pp. 3121-3124 ◽

Cited By ~ 1

Author(s):

Yang Luo

Keyword(s):

Information Retrieval ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Matrix Decomposition ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Storage Space ◽

Important Direction ◽

The Difference

Information retrieval is an important direction in the area of natural language processing. This paper introduces semidiscrete matrix decomposition (SDD) in latent semantic indexing. We address its disadvantage in storage space and present SSDD, and then compare the performance of SVD, SDD, and SSDD.
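
For orientation, the standard forms of the two baseline decompositions can be written as below. This is a sketch of the textbook definitions for a term-document matrix $A \in \mathbb{R}^{m \times n}$ at rank $k$; the abstract does not define SSDD, so it is not reproduced here, and the storage counts assume 2 bits per ternary entry.

```latex
% Truncated SVD: real-valued factors
A \approx U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V_k \in \mathbb{R}^{n \times k}

% Semidiscrete decomposition (SDD): ternary factors, positive diagonal weights
A \approx X_k D_k Y_k^{\top},
\qquad X_k \in \{-1, 0, 1\}^{m \times k},\;
D_k = \operatorname{diag}(d_1, \dots, d_k),\; d_i > 0,\;
Y_k \in \{-1, 0, 1\}^{n \times k}

% Storage at rank k
\text{SVD: } k\,(m + n + 1) \text{ floating-point numbers}
\qquad
\text{SDD: } k \text{ floating-point numbers} + 2k\,(m + n) \text{ bits}
```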

Transductive Learning for Short-Text Classification Problems Using Latent Semantic Indexing

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001405003971 ◽

2005 ◽

Vol 19 (02) ◽

pp. 143-163 ◽

Cited By ~ 20

Author(s):

Sarah Zelikovitz ◽

Finella Marquez

Keyword(s):

Text Classification ◽

Latent Semantic Indexing ◽

Training Data ◽

Data Sets ◽

Semantic Indexing ◽

Classification Problems ◽

Transductive Learning ◽

Short Text ◽

Series Of Experiments ◽

Value Decomposition

This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI's singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-document matrix that includes both the labeled data as well as any available test examples. We report the performance of LSI on data sets both with and without the inclusion of the test examples, and we show that tailoring the SVD process to the test examples can be even more useful than adding additional training data. This method can be especially useful to combat possible inclusion of unrelated data in the original corpus, and to compensate for limited amounts of data. Additionally, we evaluate the vocabulary of the training and test sets and present the results of a series of experiments to illustrate how the test set is used in an advantageous way.
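
The central idea, running the SVD on a term-by-document matrix that includes the unlabeled test examples as extra columns, can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: it assumes NumPy is installed, and the documents, labels, rank `k=2`, and nearest-neighbour decision rule are all hypothetical choices.

```python
import numpy as np


def term_doc_matrix(docs, vocab):
    """Raw term counts: one row per vocabulary term, one column per document."""
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d.lower().split():
            A[index[t], j] += 1.0
    return A


def lsi_doc_vectors(A, k):
    """Rank-k document coordinates (one row per document) from a truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical short-text data.
train_docs = ["cheap flights paris", "paris hotel deals", "python list sort", "sort array python"]
train_labels = ["travel", "travel", "code", "code"]
test_docs = ["hotel flights", "array list"]

# Transductive step: the test examples are added as columns before the SVD.
all_docs = train_docs + test_docs
vocab = sorted({t for d in all_docs for t in d.split()})
A = term_doc_matrix(all_docs, vocab)
Z = lsi_doc_vectors(A, k=2)
Z_train, Z_test = Z[: len(train_docs)], Z[len(train_docs) :]

# Label each test document with the label of its nearest training document.
predictions = []
for z in Z_test:
    best = max(range(len(train_docs)), key=lambda i: cosine(z, Z_train[i]))
    predictions.append(train_labels[best])
print(predictions)
```

In the non-transductive variant, `A` would be built from `train_docs` only and the test documents would be folded into the reduced space afterwards; the abstract's claim is that letting the test examples shape the SVD can help more than adding further training data.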

EDLm6APred: ensemble deep learning approach for mRNA m6A site prediction

BMC Bioinformatics ◽

10.1186/s12859-021-04206-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Lin Zhang ◽

Gangshen Li ◽

Xiuyu Li ◽

Honglei Wang ◽

Shutao Chen ◽

...

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Language Processing ◽

High Throughput Sequencing ◽

Prediction Models ◽

Receiver Operating Curve ◽

Data Sets ◽

Methylation Site ◽

Site Prediction ◽

A Site

Background: As a common and abundant RNA methylation modification, N6-methyladenosine (m6A) is widespread in the transcriptomes of various species and is closely related to the occurrence and development of various life processes and diseases. Thus, accurate identification of m6A methylation sites has become a hot topic. Most biological methods rely on high-throughput sequencing technology, which places great demands on sequencing library preparation and data analysis. Thus, various machine learning methods have been proposed that extract various types of sequence-based features and then employ conventional classifiers, such as SVM and RF, for m6A methylation site identification. However, the identification performance relies heavily on the extracted features, which still need to be improved. Results: This paper studies feature extraction and classification of m6A methylation sites in a natural language processing manner, integrating feature extraction and classification while taking the upstream and downstream information of m6A sites into account. One-hot, RNA word embedding, and Word2vec are adopted to depict sites from the perspectives of the base as well as its upstream and downstream sequence. The BiLSTM model, a well-known sequence model, was then constructed to discriminate the sequences with potential m6A sites. Since the three feature extraction methods focus on different perspectives of m6A sites, an ensemble deep learning predictor (EDLm6APred) was finally constructed for m6A site prediction. Experimental results on human and mouse data sets show that EDLm6APred outperforms the single-feature models, indicating that base, upstream, and downstream information are all essential for m6A site detection. Compared with the existing m6A methylation site prediction models without genomic features, EDLm6APred achieves an area under the receiver operating characteristic curve of 86.6% on the human data sets, indicating the effectiveness of sequential modeling on RNA. To maximize user convenience, a web server implementing EDLm6APred was developed and made publicly available at www.xjtlu.edu.cn/biologicalsciences/EDLm6APred. Conclusions: Our proposed EDLm6APred method is a reliable predictor for m6A methylation sites.
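
Of the three encodings named in the abstract, one-hot is the simplest to illustrate. The sketch below encodes a window of bases around a candidate site; the flank size, the example sequence, and the function names `window` and `one_hot` are assumptions for illustration, since the abstract does not state the window length or preprocessing that EDLm6APred uses.

```python
BASES = "ACGU"


def one_hot(sequence):
    """Encode an RNA string as a list of 4-dimensional one-hot rows (A, C, G, U)."""
    rows = []
    for base in sequence.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def window(sequence, center, flank):
    """Return the bases from `flank` positions upstream to `flank` downstream of `center`."""
    if center - flank < 0 or center + flank >= len(sequence):
        raise ValueError("window extends past the sequence")
    return sequence[center - flank : center + flank + 1]


# Hypothetical sequence with a candidate adenosine at index 2.
seq = "GGACUGAACU"
site = window(seq, center=2, flank=2)
encoded = one_hot(site)
for base, row in zip(site, encoded):
    print(base, row)
```

Each row of `encoded` would correspond to one time step of a sequence model such as the BiLSTM mentioned in the abstract.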

Adam Deep Learning With SOM for Human Sentiment Classification

International Journal of Ambient Computing and Intelligence ◽

10.4018/ijaci.2019070106 ◽

2019 ◽

Vol 10 (3) ◽

pp. 92-116 ◽

Cited By ~ 25

Author(s):

Md. Nawab Yousuf Ali ◽

Md. Golam Sarowar ◽

Md. Lizur Rahman ◽

Jyotismita Chaki ◽

Nilanjan Dey ◽

...

Keyword(s):

Deep Learning ◽

Social Network ◽

Language Processing ◽

Research Work ◽

Principal Component ◽

Data Sets ◽

Self Organizing Map ◽

Significant Information ◽

Social Network Data ◽

Traditional Natural

Nowadays, with the improvement in communication through social network services, a massive amount of data is being generated from users' perceptions, emotions, posts, comments, reactions, etc., and extracting significant information, such as sentiment, from that data has become a complex and convoluted task. On the other hand, traditional Natural Language Processing (NLP) approaches are less feasible to apply; therefore, this research work proposes an approach that integrates unsupervised machine learning (Self-Organizing Map), dimensionality reduction (Principal Component Analysis), and computational classification (Adam Deep Learning) to overcome the problem. Moreover, for further clarification, a comparative study between various well-known approaches and the proposed approach was conducted. The proposed approach was also applied to social network data sets of different sizes to verify its superior efficiency and feasibility, mainly in the case of Big Data. Overall, the experiments and their analysis suggest that the proposed approach is very promising.

Combining Deep Learning and Argumentative Reasoning for the Analysis of Social Media Textual Content Using Small Data Sets

Computational Linguistics ◽

10.1162/coli_a_00338 ◽

2018 ◽

Vol 44 (4) ◽

pp. 833-858 ◽

Cited By ~ 5

Author(s):

Oana Cocarascu ◽

Francesca Toni

Keyword(s):

Social Media ◽

Deep Learning ◽

Deception Detection ◽

Contextual Information ◽

Small Data ◽

Data Sets ◽

Supervised Classifiers ◽

Small Data Sets ◽

Use Of Social Media ◽

Textual Content

The use of social media has become a regular habit for many and has changed the way people interact with each other. In this article, we focus on analyzing whether news headlines support tweets and whether reviews are deceptive by analyzing the interaction or the influence that these texts have on the others, thus exploiting contextual information. Concretely, we define a deep learning method for relation-based argument mining to extract argumentative relations of attack and support. We then use this method for determining whether news articles support tweets, a useful task in fact-checking settings, where determining agreement toward a statement is a useful step toward determining its truthfulness. Furthermore, we use our method for extracting bipolar argumentation frameworks from reviews to help detect whether they are deceptive. We show experimentally that our method performs well in both settings. In particular, in the case of deception detection, our method contributes a novel argumentative feature that, when used in combination with other features in standard supervised classifiers, outperforms the latter even on small data sets.
