scholarly journals Deep Learning for emotion analysis in Arabic tweets

Author(s):  
ENAS ABDEL HAKIM KHALIL ◽  
Enas .M.F. El Houby ◽  
Hoda .k. Mohamed

Abstract Expressing our emotions using text and emojis expressions became widespread through social media such as Facebook, Instagram, Twitter, Weibo, and LinkedIn. Nowadays, both organizations and individuals are interested in using social media to analyze people's opinions and extract sentiments and emotions. We proposed a model for multilabel emotion classification, using a bidirectional Long Short-term Memory BiLSTM deep network. It is evaluated on the Arabic tweets' dataset provided by SemEval 2018 for the E-c task. Several preprocessing steps, including ARLSTEM with some modifications, replacing emojis with corresponding text meaning from a manually built lexicon, and feature vector representation using Aravec word embedding is applied. The novelty in our research that it examines the effect of hyperparameter tuning on model performance, and it uses BiLSTM in all of its deep neural network layers. The proposed model achieves a comparable performance with state-of-the-art models using different machine learning and deep learning techniques. The system achieves about 9% enhancement in validation accuracy compared with the last best model in the same task using Support Vector classifier SVC; it outperforms the other deep neural networks (UNCCTeam) based on fully connected layers in micro F1 metric of about 4.4%.

Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 4 ◽  
Author(s):  
Jurgita Kapočiūtė-Dzikienė ◽  
Robertas Damaševičius ◽  
Marcin Woźniak

We describe the sentiment analysis experiments that were performed on the Lithuanian Internet comment dataset using traditional machine learning (Naïve Bayes Multinomial—NBM and Support Vector Machine—SVM) and deep learning (Long Short-Term Memory—LSTM and Convolutional Neural Network—CNN) approaches. The traditional machine learning techniques were used with the features based on the lexical, morphological, and character information. The deep learning approaches were applied on the top of two types of word embeddings (Vord2Vec continuous bag-of-words with negative sampling and FastText). Both traditional and deep learning approaches had to solve the positive/negative/neutral sentiment classification task on the balanced and full dataset versions. The best deep learning results (reaching 0.706 of accuracy) were achieved on the full dataset with CNN applied on top of the FastText embeddings, replaced emoticons, and eliminated diacritics. The traditional machine learning approaches demonstrated the best performance (0.735 of accuracy) on the full dataset with the NBM method, replaced emoticons, restored diacritics, and lemma unigrams as features. Although traditional machine learning approaches were superior when compared to the deep learning methods; deep learning demonstrated good results when applied on the small datasets.


2021 ◽  
Vol 4 (1) ◽  
pp. 121-128
Author(s):  
A Iorliam ◽  
S Agber ◽  
MP Dzungwe ◽  
DK Kwaghtyo ◽  
S Bum

Social media provides opportunities for individuals to anonymously communicate and express hateful feelings and opinions at the comfort of their rooms. This anonymity has become a shield for many individuals or groups who use social media to express deep hatred for other individuals or groups, tribes or race, religion, gender, as well as belief systems. In this study, a comparative analysis is performed using Long Short-Term Memory and Convolutional Neural Network deep learning techniques for Hate Speech classification. This analysis demonstrates that the Long Short-Term Memory classifier achieved an accuracy of 92.47%, while the Convolutional Neural Network classifier achieved an accuracy of 92.74%. These results showed that deep learning techniques can effectively classify hate speech from normal speech.


2020 ◽  
Vol 9 (2) ◽  
pp. 1049-1054

In this paper, we have tried to predict flight delays using different machine learning and deep learning techniques. By using such a model it can be easier to predict whether the flight will be delayed or not. Factors like ‘WeatherDelay’, ‘NASDelay’, ‘Destination’, ‘Origin’ play a vital role in this model. Using machine learning algorithms like Random Forest, Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), the f1-score, precision, recall, support and accuracy have been predicted. To add to the model, Long Short-Term Memory (LSTM) RNN architecture has also been employed. In the paper, the dataset from Bureau of Transportation Statistics (BTS) of the ‘Pittsburgh’ is being used. The results computed from the above mentioned algorithms have been compared. Further, the results were visualized for various airlines to find maximum delay and AUC-ROC curve has been plotted for Random Forest Algorithm. The aim of our research work is to predict the delay so as to minimize loses and increase customer satisfaction.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Enas A. Hakim Khalil ◽  
Enas M. F. El Houby ◽  
Hoda Korashy Mohamed

AbstractCurrently, expressing feelings through social media requires great consideration as an essential part of our lives; besides sharing ideas and thoughts, we share moments and good memories. Social media such as Facebook, Twitter, Weibo, and LinkedIn, are considered rich sources of opinionated text data. Both organizations and individuals are interested in using social media to analyze people's opinions and extract sentiments and emotions. Most studies on social media analysis mainly classified sentiment as positive, negative, or neutral classes. The challenge in emotion analysis arises because humans can express one or several emotions within one expression. Human beings can recognize these different emotions well; however, it is still not easy for an emotion analysis system. In most cases, the Arabic language used through social media is of a slangy or colloquial form, making it more challenging to preprocess and filter noise since most lemmatization and stemming tools are built on Modern Standard Arabic (MSA). An emotion analysis model has been implemented to categorize emotions. The model is a multiclass and multilabel classification problem. However, few studies have been adapted for this emotion classification problem in Arabic social media. Nearly the only work is the one of SemEval 2018 task1- sub-task E-c. Several machine learning approaches have been implemented in this task; a few studies were based on deep learning. Our model implemented a novel multilayer bidirectional long short term memory (BiLSTM) trained on top of pre-trained word embedding vectors. The model achieved state-of-the-art performance enhancement. This approach has been compared with other models developed in the same tasks using Support Vector Machines (SVM), random forest (RF), and fully connected neural networks. The proposed model achieved a performance improvement over the best results obtained for this task.


Author(s):  
Quyen G. To ◽  
Kien G. To ◽  
Van-Anh N. Huynh ◽  
Nhung TQ Nguyen ◽  
Diep TN Ngo ◽  
...  

Anti-vaccination attitudes have been an issue since the development of the first vaccines. The increasing use of social media as a source of health information may contribute to vaccine hesitancy due to anti-vaccination content widely available on social media, including Twitter. Being able to identify anti-vaccination tweets could provide useful information for formulating strategies to reduce anti-vaccination sentiments among different groups. This study aims to evaluate the performance of different natural language processing models to identify anti-vaccination tweets that were published during the COVID-19 pandemic. We compared the performance of the bidirectional encoder representations from transformers (BERT) and the bidirectional long short-term memory networks with pre-trained GLoVe embeddings (Bi-LSTM) with classic machine learning methods including support vector machine (SVM) and naïve Bayes (NB). The results show that performance on the test set of the BERT model was: accuracy = 91.6%, precision = 93.4%, recall = 97.6%, F1 score = 95.5%, and AUC = 84.7%. Bi-LSTM model performance showed: accuracy = 89.8%, precision = 44.0%, recall = 47.2%, F1 score = 45.5%, and AUC = 85.8%. SVM with linear kernel performed at: accuracy = 92.3%, Precision = 19.5%, Recall = 78.6%, F1 score = 31.2%, and AUC = 85.6%. Complement NB demonstrated: accuracy = 88.8%, precision = 23.0%, recall = 32.8%, F1 score = 27.1%, and AUC = 62.7%. In conclusion, the BERT models outperformed the Bi-LSTM, SVM, and NB models in this task. Moreover, the BERT model achieved excellent performance and can be used to identify anti-vaccination tweets in future studies.


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate analysis and prediction of the time-dependent data. We focus our attention on four different stocks are selected from Yahoo Finance historical database. To build up models and predict the future stock price, we consider three different machine learning techniques including Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in machine learning methods, it can be shown that the prediction accuracy is improved.


2021 ◽  
Vol 186 (Supplement_1) ◽  
pp. 445-451
Author(s):  
Yifei Sun ◽  
Navid Rashedi ◽  
Vikrant Vaze ◽  
Parikshit Shah ◽  
Ryan Halter ◽  
...  

ABSTRACT Introduction Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods Five classification methods including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply a regression method to predict the continuous MAP values using linear regression over the next 60 minutes. Results Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The prediction of regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.


Author(s):  
Lu Gao ◽  
Yao Yu ◽  
Yi Hao Ren ◽  
Pan Lu

Pavement maintenance and rehabilitation (M&R) records are important as they provide documentation that M&R treatment is being performed and completed appropriately. Moreover, the development of pavement performance models relies heavily on the quality of the condition data collected and on the M&R records. However, the history of pavement M&R activities is often missing or unavailable to highway agencies for many reasons. Without accurate M&R records, it is difficult to determine if a condition change between two consecutive inspections is the result of M&R intervention, deterioration, or measurement errors. In this paper, we employed deep-learning networks of a convolutional neural network (CNN) model, a long short-term memory (LSTM) model, and a CNN-LSTM combination model to automatically detect if an M&R treatment was applied to a pavement section during a given time period. Unlike conventional analysis methods so far followed, deep-learning techniques do not require any feature extraction. The maximum accuracy obtained for test data is 87.5% using CNN-LSTM.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Jun Meng ◽  
Qiang Kang ◽  
Zheng Chang ◽  
Yushi Luan

Abstract Background Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities and their prediction is significant for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) can automatically extract and learn the abstract information from the encoded RNA sequences to avoid complex feature engineering. An ensemble model learns the information from multiple perspectives and shows better performance than a single model. It is feasible and interesting that the RNA sequence is considered as sentence and image to train LSTM and CNN respectively, and then the trained models are hybridized to predict lncRNAs. Up to present, there are various predictors for lncRNAs, but few of them are proposed for plant. A reliable and powerful predictor for plant lncRNAs is necessary. Results To boost the performance of predicting lncRNAs, this paper proposes a hybrid deep learning model based on two encoding styles (PlncRNA-HDeep), which does not require prior knowledge and only uses RNA sequences to train the models for predicting plant lncRNAs. It not only learns the diversified information from RNA sequences encoded by p-nucleotide and one-hot encodings, but also takes advantages of lncRNA-LSTM proposed in our previous study and CNN. The parameters are adjusted and three hybrid strategies are tested to maximize its performance. Experiment results show that PlncRNA-HDeep is more effective than lncRNA-LSTM and CNN and obtains 97.9% sensitivity, 95.1% precision, 96.5% accuracy and 96.5% F1 score on Zea mays dataset which are better than those of several shallow machine learning methods (support vector machine, random forest, k-nearest neighbor, decision tree, naive Bayes and logistic regression) and some existing tools (CNCI, PLEK, CPC2, LncADeep and lncRNAnet). Conclusions PlncRNA-HDeep is feasible and obtains the credible predictive results. It may also provide valuable references for other related research.


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 495
Author(s):  
Imayanmosha Wahlang ◽  
Arnab Kumar Maji ◽  
Goutam Saha ◽  
Prasun Chakrabarti ◽  
Michal Jasinski ◽  
...  

This article experiments with deep learning methodologies in echocardiogram (echo), a promising and vigorously researched technique in the preponderance field. This paper involves two different kinds of classification in the echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) has been done, using 2D echo images, 3D Doppler images, and videographic images. Secondly, based on different types of regurgitation, namely, Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), and a combination of the three types of regurgitation are classified using videographic echo images. Two deep-learning methodologies are used for these purposes, a Recurrent Neural Network (RNN) based methodology (Long Short Term Memory (LSTM)) and an Autoencoder based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguished this work from the existing work using SVM (Support Vector Machine) and also application of deep-learning methodologies is the first of many in this particular field. It was found that deep-learning methodologies perform better than SVM methodology in normal or abnormal classification. Overall, VAE performs better in 2D and 3D Doppler images (static images) while LSTM performs better in the case of videographic images.


Sign in / Sign up

Export Citation Format

Share Document