Malicious Text Identification: Deep Learning from Public Comments and Emails

Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 312 ◽  
Author(s):  
Asma Baccouche ◽  
Sadaf Ahmed ◽  
Daniel Sierra-Sosa ◽  
Adel Elmaghraby

Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate strategy for filtering messages is difficult to achieve, as these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying text once it has been preprocessed. In particular, Long Short-Term Memory (LSTM) networks are among the models that perform well on binary and multi-label text classification problems. In this paper, an approach merging two different data sources, one intended for Spam in social media posts and the other for Fraud classification in emails, is presented. We designed a multi-label LSTM model and trained it on the joint datasets, including text with common bigrams extracted from each independent dataset. The experimental results show that our proposed model is capable of identifying malicious text regardless of the source. The LSTM model trained with the merged dataset outperforms the models trained independently on each dataset.
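The "common bigrams" selection mentioned in the abstract above can be pictured with a minimal sketch. It assumes a simple regex tokenizer and keeps only texts that contain at least one bigram present in both corpora; the function names and toy sentences are hypothetical, and the paper's actual preprocessing may differ.

```python
import re


def bigrams(text):
    """Return the set of adjacent lowercase word pairs in a text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return set(zip(tokens, tokens[1:]))


def common_bigrams(corpus_a, corpus_b):
    """Bigrams that appear in at least one text of each corpus."""
    seen_a = set().union(*(bigrams(t) for t in corpus_a))
    seen_b = set().union(*(bigrams(t) for t in corpus_b))
    return seen_a & seen_b


def filter_by_bigrams(corpus, shared):
    """Keep texts that contain at least one shared bigram."""
    return [t for t in corpus if bigrams(t) & shared]


# Toy data (hypothetical, not from the paper's datasets).
comments = ["Click here to win a prize", "Great video, thanks"]
emails = ["You can win a prize today", "Meeting moved to Monday"]

shared = common_bigrams(comments, emails)
merged = filter_by_bigrams(comments, shared) + filter_by_bigrams(emails, shared)
```

In this toy case only the two texts that share "win a" and "a prize" survive, which is the kind of overlap a jointly trained classifier could exploit across sources.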

2019 ◽  
Vol 20 (1) ◽  
pp. 129-139 ◽  
Author(s):  
Zahra Bokaee Nezhad ◽  
Mohammad Ali Deihimi

With the growing number of social media users, people tend to share their views about everything online. It is a convenient way to convey their messages to end users on a specific subject. Sentiment Analysis is a subfield of Natural Language Processing (NLP) that refers to the identification of users’ opinions toward specific topics. It is used in several fields such as marketing and customer service. However, little work has been done on Persian Sentiment Analysis. Deep learning, meanwhile, has recently become popular because of its success in several Natural Language Processing tasks. The objective of this paper is to propose a novel hybrid deep learning architecture for Persian Sentiment Analysis. In the proposed model, local features are extracted by Convolutional Neural Networks (CNN) and long-term dependencies are learned by Long Short Term Memory (LSTM). The model can therefore harness the strengths of both CNN and LSTM. Furthermore, Word2vec is used for word representation as an unsupervised learning step. To the best of our knowledge, this is the first attempt to use a hybrid deep learning model for Persian Sentiment Analysis. We evaluate the model on a Persian dataset that is introduced in this study. The experimental results show the effectiveness of the proposed model, with an accuracy of 85%.


2021 ◽  
pp. 1-10
Author(s):  
Hye-Jeong Song ◽  
Tak-Sung Heo ◽  
Jong-Dae Kim ◽  
Chan-Young Park ◽  
Yu-Seop Kim

Sentence similarity evaluation is a significant task used in machine translation, classification, and information extraction in the field of natural language processing. Given two sentences, an accurate judgment should be made as to whether their meanings are equivalent even if their words and contexts differ. To this end, existing studies have measured the similarity of sentences by focusing on the analysis of words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, as well as morpheme word embedding. Vectors representing words are input to a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and to a bidirectional long short-term memory (Bi-LSTM). Self-attention is applied to the features transformed through the Bi-LSTM. Subsequently, the vectors from the 1D-CNN and from self-attention are converted through global max pooling and global average pooling, respectively, to extract specific values. The vectors generated through the above process are concatenated with the vector generated through Sent2Vec and are represented as a single vector. This vector is input to a softmax layer, and finally, the similarity between the two sentences is determined. The proposed model can improve accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
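The pooling-and-concatenation step in the abstract above can be sketched in plain Python. This is only an illustration under stated assumptions: feature maps are lists of time steps, each holding one value per channel, and all numbers and variable names are hypothetical stand-ins for the outputs of the 1D-CNN, the self-attention layer, and Sent2Vec.

```python
def global_max_pool(features):
    """Max over the time axis for each feature channel."""
    return [max(channel) for channel in zip(*features)]


def global_avg_pool(features):
    """Mean over the time axis for each feature channel."""
    return [sum(channel) / len(channel) for channel in zip(*features)]


# Toy inputs (hypothetical values): rows are time steps, columns are channels.
cnn_features = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
attention_features = [[1.0, 2.0], [3.0, 4.0]]
sent2vec_vector = [0.7, -0.1, 0.25]

# Concatenate the pooled vectors with the sentence embedding.
combined = (
    global_max_pool(cnn_features)
    + global_avg_pool(attention_features)
    + sent2vec_vector
)
```

Both pooling operations collapse a variable-length sequence into a fixed-size vector, which is what lets the result be concatenated with a fixed-size sentence embedding before the final classifier.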


2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Kazi Nabiul Alam ◽  
Md Shakib Khan ◽  
Abdur Rab Dhruba ◽  
Mohammad Monirujjaman Khan ◽  
Jehad F. Al-Amri ◽  
...  

The COVID-19 pandemic has had a devastating effect on many people, creating severe anxiety, fear, and complicated feelings or emotions. After the initiation of vaccinations against coronavirus, people’s feelings have become more diverse and complex. Our aim is to understand and unravel their sentiments in this research using deep learning techniques. Social media is currently the best way to express feelings and emotions, and with the help of Twitter, one can have a better idea of what is trending and going on in people’s minds. Our motivation for this research was to understand the diverse sentiments of people regarding the vaccination process. In this research, the timeline of the collected tweets was from December 21 to July 21. The tweets contained information about the most common vaccines recently available across the world. The sentiments of people regarding vaccines of all sorts were assessed using the natural language processing (NLP) tool Valence Aware Dictionary for sEntiment Reasoner (VADER). Grouping the polarities of the obtained sentiments into three classes (positive, negative, and neutral) helped us visualize the overall scenario; our findings included 33.96% positive, 17.55% negative, and 48.49% neutral responses. In addition, we included our analysis of the timeline of the tweets in this research, as sentiments fluctuated over time. A recurrent neural network- (RNN-) oriented architecture, including long short-term memory (LSTM) and bidirectional LSTM (Bi-LSTM), was used to assess the performance of the predictive models, with LSTM achieving an accuracy of 90.59% and Bi-LSTM achieving 90.83%. Other performance metrics such as precision, F1-score, and a confusion matrix were also used to validate our models and findings more effectively. This study improves understanding of the public’s opinion on COVID-19 vaccines and supports the aim of eradicating coronavirus from the world.
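The three-way polarity grouping described in the abstract above might look like the following sketch. It assumes each tweet already has a VADER-style compound score in [-1, 1] and uses the commonly cited ±0.05 cut-offs; the abstract does not state the thresholds the authors used, and the scores below are hypothetical.

```python
from collections import Counter


def polarity_label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to one of three polarity groups."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def polarity_shares(compound_scores):
    """Percentage of scores falling in each polarity group."""
    counts = Counter(polarity_label(c) for c in compound_scores)
    total = len(compound_scores)
    return {label: 100.0 * counts[label] / total
            for label in ("positive", "negative", "neutral")}


# Hypothetical compound scores, not taken from the study's tweets.
scores = [0.62, 0.0, -0.41, 0.03, 0.2, -0.01, 0.0, 0.9]
shares = polarity_shares(scores)
```

Shares computed this way sum to 100%, as do the 33.96%, 17.55%, and 48.49% figures reported in the abstract.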


2018 ◽  
Vol 10 (11) ◽  
pp. 113 ◽  
Author(s):  
Yue Li ◽  
Xutao Wang ◽  
Pengjian Xu

Text classification is important in natural language processing, as massive amounts of valuable text information need to be classified into different categories for further use. In order to better classify text, our paper tries to build a deep learning model which achieves better classification results in Chinese text than those of other researchers’ models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) methods were selected as deep learning methods to classify Chinese text. LSTM is a special kind of recurrent neural network (RNN), which is capable of processing serialized information through its recurrent structure. By contrast, CNN has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated into our new model, the BLSTM-C model (BLSTM stands for bi-directional long short-term memory, while C stands for CNN). LSTM was responsible for obtaining a sequence output based on past and future contexts, which was then input to the convolutional layer for extracting features. In our experiments, the proposed BLSTM-C model was evaluated in several ways. In the results, the model exhibited remarkable performance in text classification, especially in Chinese texts.


Energies ◽  
2020 ◽  
Vol 13 (15) ◽  
pp. 4017 ◽  
Author(s):  
Dukhwan Yu ◽  
Wonik Choi ◽  
Myoungsoo Kim ◽  
Ling Liu

The problem of Photovoltaic (PV) power generation forecasting is becoming crucial as the penetration level of Distributed Energy Resources (DERs) increases in microgrids and Virtual Power Plants (VPPs). In order to improve the stability of power systems, a fair amount of research has been proposed for increasing prediction performance in practical environments through statistical, machine learning, deep learning, and hybrid approaches. Despite these efforts, forecasting PV power generation remains challenging in power system operations, since existing methods show limited accuracy and thus are not sufficiently practical to be widely deployed. Many existing methods using long historical data suffer from the long-term dependency problem and are not able to produce high prediction accuracy due to their failure to fully utilize all features of long sequence inputs. To address this problem, we propose a deep learning-based PV power generation forecasting model called Convolutional Self-Attention based Long Short-Term Memory (LSTM). By using the convolutional self-attention mechanism, we can significantly improve prediction accuracy by capturing the local context of the data and generating keys and queries that fit the local context. To validate the applicability of the proposed model, we conduct extensive experiments on both PV power generation forecasting using a real-world dataset and power consumption forecasting. The experimental results of power generation forecasting using the real-world datasets show that the Mean Absolute Percentage Errors (MAPEs) of the proposed model are lower by 7.7%, 6%, and 3.9% compared to the Deep Neural Network (DNN), LSTM, and LSTM with the canonical self-attention, respectively. As for power consumption forecasting, the proposed model exhibits 32%, 17%, and 44% lower MAPE than the DNN, LSTM, and LSTM with the canonical self-attention, respectively.
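For readers unfamiliar with the metric reported above, here is a minimal sketch of the standard MAPE definition, the mean of |actual − predicted| / |actual| expressed in percent. The readings and forecasts are hypothetical, and the paper may apply additional conventions (for example, for zero-output periods) that are not described in the abstract.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes every actual value is non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical PV output readings (kW) and forecasts.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
error = mape(actual, predicted)  # per-point errors are 10%, 10%, and 0%
```

Because each term divides by the actual value, the metric is undefined when an actual reading is zero, so such intervals need separate handling.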


Author(s):  
Tao Gui ◽  
Qi Zhang ◽  
Lujun Zhao ◽  
Yaosong Lin ◽  
Minlong Peng ◽  
...  

In recent years, long short-term memory (LSTM) has been successfully used to model sequential data of variable length. However, LSTM can still experience difficulty in capturing long-term dependencies. In this work, we tried to alleviate this problem by introducing a dynamic skip connection, which can learn to directly connect two dependent words. Since there is no dependency information in the training data, we propose a novel reinforcement learning-based method to model the dependency relationship and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which provides a dynamic skipping advantage over RNNs that always tackle entire sentences sequentially. Our experimental results on three natural language processing tasks demonstrate that the proposed method can achieve better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM with respect to accuracy by nearly 20%.


Author(s):  
Satish Tirumalapudi

Abstract: Chatbots are software applications that help users communicate with a machine and get the required result; this is where Natural Language Processing (NLP) comes into the picture. Natural language processing is based on deep learning, which enables computers to acquire meaning from inputs given by users. Natural language processing techniques make it possible to use natural language to express ideas, thus drastically increasing accessibility. NLP engines rely on the elements of intent, utterance, entity, context, and session. In this project, we use deep learning techniques trained on a dataset that contains categories, patterns, and responses. Long Short-Term Memory (LSTM) is a Recurrent Neural Network (RNN) that is capable of learning order dependence in sequence prediction problems. LSTM is one of the most popular RNN approaches for identifying and controlling a dynamic system. We use an RNN to classify the category a user’s message belongs to and then give a response from the list of responses for that category. Keywords: NLP – Natural Language Processing, LSTM – Long Short Term Memory, RNN – Recurrent Neural Networks.


Author(s):  
Yudi Widhiyasana ◽  
Transmissia Semiawan ◽  
Ilham Gibran Achmad Mudzakir ◽  
Muhammad Randi Noor

Text classification has become a widely studied field, particularly in relation to Natural Language Processing (NLP). Many methods can be used for text classification, one of which is deep learning. RNN, CNN, and LSTM are among the deep learning methods commonly used to classify text. This paper analyzes the application of a combination of two deep learning methods, CNN and LSTM (C-LSTM). The combination is used to classify Indonesian-language news text. The data consist of Indonesian news texts collected from Indonesian-language news portals. The collected data are grouped into three news categories according to scope: “National”, “International”, and “Regional”. Experiments are carried out on three research variables: the number of documents, the batch size, and the learning rate of the C-LSTM model. The experimental results show that the F1-score obtained from classification with the C-LSTM method is 93.27%. This is higher than the F1-scores of CNN, at 89.85%, and LSTM, at 90.87%. It can therefore be concluded that the combination of the two deep learning methods, CNN and LSTM (C-LSTM), performs better than CNN or LSTM alone.


2021 ◽  
Vol 4 (1) ◽  
pp. 121-128
Author(s):  
A Iorliam ◽  
S Agber ◽  
MP Dzungwe ◽  
DK Kwaghtyo ◽  
S Bum

Social media provides opportunities for individuals to anonymously communicate and express hateful feelings and opinions from the comfort of their rooms. This anonymity has become a shield for many individuals or groups who use social media to express deep hatred for other individuals or groups, tribes or races, religions, genders, and belief systems. In this study, a comparative analysis is performed using Long Short-Term Memory and Convolutional Neural Network deep learning techniques for Hate Speech classification. This analysis demonstrates that the Long Short-Term Memory classifier achieved an accuracy of 92.47%, while the Convolutional Neural Network classifier achieved an accuracy of 92.74%. These results show that deep learning techniques can effectively distinguish hate speech from normal speech.


2019 ◽  
Vol 19 (01) ◽  
pp. 1940005 ◽  
Author(s):  
ULAS BARAN BALOGLU ◽  
ÖZAL YILDIRIM

Background and objective: Deep learning structures have recently achieved remarkable success in the field of machine learning. Convolutional neural networks (CNN) in image processing and long-short term memory (LSTM) in time-series analysis are commonly used deep learning algorithms. Healthcare applications of deep learning algorithms provide important contributions to computer-aided diagnosis research. In this study, a convolutional long-short term memory (CLSTM) network was used for automatic classification of EEG signals and automatic seizure detection. Methods: A new nine-layer deep network model consisting of convolutional and LSTM layers was designed. The signals processed in the convolutional layers were given as input to the LSTM network, whose outputs were processed in densely connected neural network layers. The EEG data is appropriate for a model having 1-D convolution layers. A bidirectional model was employed in the LSTM layer. Results: The Bonn University EEG database, with five different datasets, was used for experimental studies. In this database, each dataset contains 100 single-channel EEG segments of 23.6 s duration, each consisting of 4097 samples (173.61 Hz). Eight two-class and three three-class clinical scenarios were examined. When the experimental results were evaluated, it was seen that the proposed model had high accuracy on both binary and ternary classification tasks. Conclusions: The proposed end-to-end learning structure showed good performance without using any hand-crafted feature extraction or shallow classifiers to detect the seizures. The model does not require a separate filtering step, as it automatically learns to filter the input. As a result, the proposed model can process long-duration EEG signals without applying segmentation, and can detect epileptic seizures automatically by using the correlation of ictal and interictal signals of raw data.
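The segment figures quoted in the abstract above are mutually consistent, as a short arithmetic check suggests. The variable names are illustrative, and the per-dataset total is a derived value rather than a number stated in the abstract.

```python
samples_per_segment = 4097
sampling_rate_hz = 173.61
segments_per_dataset = 100

# Duration of one segment: number of samples divided by samples per second.
duration_s = samples_per_segment / sampling_rate_hz

# Total recording time in one dataset of 100 segments.
total_s = segments_per_dataset * duration_s
```

Rounded to one decimal place, `duration_s` matches the 23.6 s stated for each segment, which puts one dataset at roughly 39 minutes of single-channel EEG.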

