Detecting Cyber Threat Event from Twitter Using IDCNN and BiLSTM

With cyber threats and attacks on the rise, timely and effective monitoring and analysis of network security incidents is key to securing network infrastructure. Twitter is one of the world’s most popular social media sites, and its users post all kinds of messages, from daily life to global news and political strategy. Twitter can therefore aggregate a large number of network security-related events promptly and provide a stream of information about cyber threats. In this paper, to detect cyber threat events on Twitter, we present a multi-task learning approach that combines natural language processing techniques with two machine learning models, the Iterated Dilated Convolutional Neural Network (IDCNN) and Bidirectional Long Short-Term Memory (BiLSTM), to build a highly accurate network model. Furthermore, we assemble a cyber threat-related Twitter dataset from public datasets to verify our model’s performance. The results show that the proposed model detects cyber threat events in tweets well and significantly outperforms several baselines.
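The IDCNN component rests on dilated convolutions, whose receptive field grows quickly as the dilation increases. The sketch below is a minimal, dependency-free illustration and not the authors' implementation: the dilation schedule `[1, 2, 4]`, the ReLU between layers, and reuse of one kernel stack across iterations are assumptions in the spirit of iterated dilated CNNs, and all function names are hypothetical.

```python
def dilated_conv1d(x, kernel, dilation):
    """'Same'-length 1-D dilated convolution (cross-correlation), zero padded.

    Output t sees x[t + (i - c) * dilation] for each tap i, where c is the
    centre tap, so an odd-sized kernel stays centred on t.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    c = k // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + (i - c) * dilation
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def idcnn_block(x, kernels, dilations, iterations):
    """Apply the same stack of dilated convolutions `iterations` times.

    Reusing one set of kernels per iteration mimics parameter sharing in an
    iterated dilated CNN; the ReLU between layers is an assumption.
    """
    h = list(x)
    for _ in range(iterations):
        for kernel, d in zip(kernels, dilations):
            h = [max(0.0, v) for v in dilated_conv1d(h, kernel, d)]
    return h


def receptive_field(kernel_size, dilations, iterations=1):
    """Tokens visible to one output: 1 + iterations * sum((k - 1) * d)."""
    return 1 + iterations * sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    n = 101
    impulse = [0.0] * n
    impulse[n // 2] = 1.0
    dil = [1, 2, 4]  # hypothetical dilation schedule
    out = idcnn_block(impulse, [[1.0, 1.0, 1.0]] * len(dil), dil, iterations=2)
    # Count of positions an impulse reaches vs. the closed-form receptive field.
    print(sum(1 for v in out if v > 0), receptive_field(3, dil, 2))  # 29 29
```

Feeding an impulse through the block with all-positive kernels makes the receptive field visible: each token-level output can draw on 29 neighbouring tokens after two iterations, while a plain stack of three undilated width-3 convolutions would see only 7. In the full model, these features would sit alongside BiLSTM states; that part is not sketched here.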


Adaptive particle swarm optimization algorithm based long short-term memory networks for sentiment analysis

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-201644

2021

pp. 1-17

Author(s):

J. Shobana

M. Murali

Keyword(s):

Particle Swarm Optimization

Sentiment Analysis

Language Processing

Short Term Memory

Contextual Information

Particle Swarm

PSO Algorithm

Swarm Optimization

Adaptive Particle Swarm Optimization

Proposed Model

Text sentiment analysis is the process of predicting whether a segment of text contains opinionated or objective content and determining the polarity of its sentiment. Understanding the needs and behavior of target customers plays a vital role in business success, so sentiment analysis can help marketers improve product quality and help shoppers buy the right product. Owing to its automatic learning capability, deep learning is a current research focus in natural language processing. The proposed model uses the Skip-gram architecture to better extract the semantic relationships and contextual information of words, and an LSTM to understand complex patterns in textual data. The main contribution of this work, however, is an Adaptive Particle Swarm Optimization (APSO) algorithm based LSTM for sentiment analysis: to improve the LSTM's performance, its weight parameters are optimized with the APSO algorithm. APSO combines the opposition-based learning (OBL) method with the PSO algorithm and helps the LSTM select optimal weights in fewer iterations. The ability of APSO-LSTM to tune attributes such as the weights and learning rate, combined with good hyperparameter choices, improves accuracy and reduces loss. Extensive experiments on four datasets showed that the proposed APSO-LSTM model achieves higher accuracy than classical methods such as traditional LSTM, ANN, and SVM. According to the simulation results, the proposed model outperforms other existing models.
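To make the optimization step concrete, here is a minimal sketch of particle swarm optimization with opposition-based initialization. It is an illustration under stated assumptions, not the paper's APSO: the "adaptive" part is taken to be a linearly decreasing inertia weight, the velocity clamp and coefficients are conventional choices, and the objective `f` is a toy function standing in for the LSTM loss as a function of its weights. `opposition_init` and `apso` are hypothetical names.

```python
import random


def opposition_init(n, dim, lo, hi, f, rng):
    """Opposition-based initialisation: sample n points, form their opposites
    lo + hi - x, and keep the n best of the 2n candidates under f."""
    pts = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    opp = [[lo + hi - v for v in p] for p in pts]
    return sorted(pts + opp, key=f)[:n]


def apso(f, dim, lo, hi, n_particles=20, iters=100, c1=1.5, c2=1.5,
         w_max=0.9, w_min=0.4, seed=0):
    """Minimise f with PSO plus OBL initialisation.

    Assumption: adaptivity = inertia weight decreasing linearly from w_max
    to w_min; the paper's actual schedule may differ.
    """
    rng = random.Random(seed)
    x = opposition_init(n_particles, dim, lo, hi, f, rng)
    v_max = 0.2 * (hi - lo)
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in x]
    pbest_val = [f(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / max(1, iters - 1)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])   # pull to own best
                           + c2 * r2 * (gbest[d] - x[i][d]))     # pull to swarm best
                v[i][d] = max(-v_max, min(v_max, v[i][d]))       # velocity clamp
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))    # stay in bounds
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(x[i]), val
                if val < gbest_val:
                    gbest, gbest_val = list(x[i]), val
    return gbest, gbest_val


if __name__ == "__main__":
    sphere = lambda p: sum(v * v for v in p)  # toy stand-in for an LSTM loss
    best, val = apso(sphere, 2, -5.0, 5.0, iters=200)
    print(best, val)
```

Evaluating both each random point and its opposite is what lets OBL start the swarm closer to good regions, which is the mechanism the abstract credits for reaching good weights in fewer iterations.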


Sentence similarity evaluation using Sent2Vec and siamese neural network with parallel structure

Journal of Intelligent & Fuzzy Systems

10.3233/jifs-189593

2021

pp. 1-10

Author(s):

Hye-Jeong Song

Tak-Sung Heo

Jong-Dae Kim

Chan-Young Park

Yu-Seop Kim

Keyword(s):

Neural Network

Language Processing

Short Term Memory

Parallel Structure

Short Term

Similarity Estimation

Accurate Judgment

Proposed Model

Sentence Similarity

Long Short Term Memory

Sentence similarity evaluation is an important natural language processing task used in machine translation, classification, and information extraction. Given two sentences, a model must judge accurately whether their meanings are equivalent even when their words and contexts differ. To this end, existing studies have measured sentence similarity by analyzing words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence-embedding method, together with morpheme word embeddings. The word vectors are fed into a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and into a bidirectional long short-term memory (Bi-LSTM) network, and self-attention is applied to the features produced by the Bi-LSTM. The outputs of the 1D-CNN and of the self-attention layer are then reduced with global max pooling and global average pooling, respectively. The resulting vectors are concatenated with the Sent2Vec vector into a single vector, which is passed to a softmax layer that determines the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
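The pooling and fusion steps described above can be sketched in plain Python. This is a hedged illustration, not the authors' code: self-attention here uses Q = K = V with no learned projections, the CNN and Bi-LSTM outputs are taken as given lists of per-step feature vectors, and because the abstract does not say how the two sentences' vectors meet before the softmax, the usage example simply concatenates them. All function names and weights are hypothetical.

```python
import math


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def self_attention(h):
    """Scaled dot-product self-attention with Q = K = V = h.

    Learned query/key/value projections are omitted (an assumption).
    """
    d = len(h[0])
    out = []
    for q in h:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in h])
        out.append([sum(w[j] * h[j][c] for j in range(len(h))) for c in range(d)])
    return out


def global_max_pool(seq):
    """Max over time, per feature dimension."""
    return [max(col) for col in zip(*seq)]


def global_avg_pool(seq):
    """Mean over time, per feature dimension."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def fused_features(cnn_out, bilstm_out, sent2vec):
    """[max-pooled 1D-CNN ; avg-pooled self-attention over Bi-LSTM ; Sent2Vec]."""
    return (global_max_pool(cnn_out)
            + global_avg_pool(self_attention(bilstm_out))
            + list(sent2vec))


def similarity_probs(features, weights, bias):
    """Softmax layer over {not similar, similar}; weights are hypothetical."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy inputs: 3 time steps with 2 features each (hypothetical values).
    cnn = [[0.1, 0.7], [0.4, 0.2], [0.3, 0.9]]
    lstm = [[0.5, -0.1], [0.2, 0.3], [-0.4, 0.6]]
    s1 = fused_features(cnn, lstm, [0.05, 0.1, 0.2])
    s2 = fused_features(lstm, cnn, [0.0, 0.1, 0.3])
    pair = s1 + s2  # assumption: concatenate both sentences' vectors
    weights = [[0.1] * len(pair), [-0.1] * len(pair)]
    print(similarity_probs(pair, weights, [0.0, 0.0]))
```

Max pooling keeps the strongest local n-gram signal from the CNN, while average pooling summarizes the attention-weighted context, so the two pooled vectors carry complementary information before the Sent2Vec vector is appended.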


An Ensemble Hybrid forecasting Model for Annual Runoff Based on Sample Entropy, Secondary Decomposition, and Long Short-Term Memory Neural Network

10.21203/rs.3.rs-269127/v1

2021

Author(s):

Wenchuan Wang

Yu-jin Du

Kwok-wing Chau

Dong-mei Xu

Chang-jun Liu

...

Keyword(s):

Water Resources

Short Term Memory

Learning Algorithm

Sample Entropy

Annual Runoff

Intrinsic Mode Functions

Proposed Model

Runoff Series

Runoff Prediction

Quantitative Indexes

Accurate and consistent annual runoff prediction is a hot topic in the management, optimization, and monitoring of regional water resources. This study presents a novel prediction model, ESMD-SE-WPD-LSTM. First, extreme-point symmetric mode decomposition (ESMD) decomposes the original runoff series into several intrinsic mode functions (IMFs) and a residual (Res). Second, the sample entropy (SE) method measures the complexity of each IMF. Third, wavelet packet decomposition (WPD) further decomposes the IMF with the maximum SE into several approximation and detail components. Next, an LSTM model, a recurrent deep learning approach, predicts all of the resulting components. Finally, the forecasts of all components are aggregated to produce the final prediction. The proposed model is applied to five annual runoff series from different areas of China and evaluated with four quantitative indexes (R, NSEC, MAPE, and RMSE). The results indicate that ESMD-SE-WPD-LSTM outperforms the other benchmark models on all four indexes. The proposed model can therefore provide more accurate and consistent annual runoff predictions, making it an efficient instrument for the scientific management and planning of water resources.
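The second and third steps hinge on ranking IMFs by sample entropy. The sketch below computes SampEn(m, r) = -ln(A/B) in plain Python, where B counts pairs of length-m templates within Chebyshev distance r and A counts the same for length m + 1, excluding self-matches. The defaults m = 2 and r = 0.2 × standard deviation are common conventions, not values reported in the abstract, and `most_complex` is a hypothetical helper for selecting the IMF that would go on to WPD.

```python
import math
import random
import statistics


def sample_entropy(series, m=2, r_factor=0.2):
    """SampEn(m, r) = -ln(A / B) with tolerance r = r_factor * std(series).

    Both template lengths use the same n - m starting points, and a template
    is never compared with itself.
    """
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def matches(length):
        templates = [series[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance: largest element-wise difference.
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined when no templates match
    return -math.log(a / b)


def most_complex(components, m=2, r_factor=0.2):
    """Index of the component with the largest sample entropy."""
    return max(range(len(components)),
               key=lambda k: sample_entropy(components[k], m, r_factor))


if __name__ == "__main__":
    rng = random.Random(0)
    smooth = [math.sin(0.3 * t) for t in range(200)]   # regular, low SampEn
    noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]  # irregular, high SampEn
    print(sample_entropy(smooth), sample_entropy(noisy), most_complex([smooth, noisy]))
```

A low value means patterns that repeat for m points tend to keep repeating for m + 1 points; the most irregular IMF is the one the method singles out for further decomposition before LSTM forecasting. The forecasts of all components would then be summed to give the final prediction.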


Long Short-Term Memory with Dynamic Skip Connections

Proceedings of the AAAI Conference on Artificial Intelligence

10.1609/aaai.v33i01.33016481

2019

Vol 33

pp. 6481-6488

Cited By ~ 3

Author(s):

Tao Gui

Qi Zhang

Lujun Zhao

Yaosong Lin

Minlong Peng

...

Keyword(s):

Language Processing

Short Term Memory

Training Data

Sequential Data

Short Term

Term Memory

Transition Functions

Proposed Model

Long Short Term Memory

In recent years, long short-term memory (LSTM) has been used successfully to model variable-length sequential data. However, LSTM can still have difficulty capturing long-term dependencies. In this work, we alleviate this problem by introducing a dynamic skip connection that can learn to connect two dependent words directly. Because the training data contain no dependency information, we propose a novel reinforcement learning-based method to model the dependency relationships and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which gives it a dynamic skipping advantage over RNNs that always process entire sentences sequentially. Experimental results on three natural language processing tasks demonstrate that the proposed method achieves better performance than existing methods. In the number prediction experiment, the proposed model outperformed LSTM in accuracy by nearly 20%.
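The forward pass of a dynamically skipping recurrence can be sketched as follows. This is a simplified illustration, not the paper's model: a plain tanh RNN cell stands in for the LSTM cell, the policy score `<W_q x, h>` over the last `k` hidden states is a hypothetical choice, and only the forward pass is shown. In a REINFORCE-style setup, the returned summed log-probability would be multiplied by (reward minus a baseline) to form the policy-gradient signal.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [v / z for v in e]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rnn_cell(x, h_prev, p):
    """Plain tanh RNN cell standing in for the LSTM cell (a simplification)."""
    pre = [a + b + c for a, b, c in
           zip(matvec(p["w_x"], x), matvec(p["w_h"], h_prev), p["b"])]
    return [math.tanh(v) for v in pre]


def forward_with_skips(xs, p, k, rng):
    """Let a stochastic policy pick which of the last k hidden states feeds
    each step. Returns hidden states, chosen skip distances, and the summed
    log-probability of the choices (the term REINFORCE would scale)."""
    hidden = len(p["b"])
    history = [[0.0] * hidden]  # h_0
    outputs, skips, log_prob = [], [], 0.0
    for x in xs:
        candidates = history[-k:][::-1]  # candidates[j] lies j + 1 steps back
        query = matvec(p["w_q"], x)      # hypothetical policy: score = <W_q x, h>
        probs = softmax([sum(q * h for q, h in zip(query, c)) for c in candidates])
        j = rng.choices(range(len(candidates)), weights=probs)[0]
        log_prob += math.log(probs[j])
        h = rnn_cell(x, candidates[j], p)
        history.append(h)
        outputs.append(h)
        skips.append(j + 1)
    return outputs, skips, log_prob


if __name__ == "__main__":
    params = {  # hypothetical toy weights: 2 inputs, 2 hidden units
        "w_x": [[0.5, -0.2], [0.1, 0.3]],
        "w_h": [[0.4, 0.0], [-0.3, 0.2]],
        "b": [0.0, 0.1],
        "w_q": [[1.0, 0.0], [0.0, 1.0]],
    }
    xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
    _, skips, lp = forward_with_skips(xs, params, k=3, rng=random.Random(1))
    print(skips, lp)
```

With `k=1` the only candidate is the immediately preceding state, so the recurrence reduces to an ordinary RNN; larger `k` is what lets a step connect directly to an earlier, dependent word.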


A Tweet Sentiment Classification Approach Using a Hybrid Stacked Ensemble Technique

Information

10.3390/info12090374

2021

Vol 12 (9)

pp. 374

Author(s):

Babacar Gaye

Dezheng Zhang

Aziguli Wulamu

Keyword(s):

Machine Learning

Logistic Regression

Deep Learning

Sentiment Analysis

Language Processing

Short Term Memory

State Of The Art

Accuracy Score

Learning Models

Proposed Model

With the extensive availability of social media platforms, Twitter has become a significant tool for gathering people’s views, opinions, attitudes, and emotions towards certain entities. Within this frame of reference, sentiment analysis of tweets has become one of the most fascinating research areas in natural language processing. A variety of sentiment analysis techniques have been devised, but there is still room to improve their accuracy and efficacy. This study proposes a novel approach that exploits the advantages of lexical dictionaries, machine learning, and deep learning classifiers. We classify tweets according to the sentiment labels extracted by TextBlob, using a stacked ensemble with three long short-term memory (LSTM) networks as base classifiers and logistic regression (LR) as the meta-classifier. The proposed model proved effective and time-saving because it requires no manual feature extraction: the LSTMs learn features without human intervention. We compared the proposed approach with conventional machine learning models such as logistic regression, AdaBoost, and random forest, as well as with state-of-the-art deep learning models. Experiments were conducted on the Sentiment140 dataset and evaluated in terms of accuracy, precision, recall, and F1 score. Empirical results showed that the proposed approach achieved state-of-the-art results, with an accuracy of 99%.
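The stacking step can be illustrated with the meta-classifier alone. In this minimal sketch, the three base LSTMs are represented only by their predicted positive-class probabilities (one column per base model), and a logistic regression meta-classifier is fitted on those columns with full-batch gradient descent. The function names, learning rate, epoch count, and toy probabilities are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_lr(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-classifier probabilities by
    full-batch gradient descent on the mean log loss."""
    n, k = len(base_probs), len(base_probs[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(k):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_meta(w, b, base_probs):
    """Final ensemble label: 1 if the meta-classifier's probability >= 0.5."""
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
            for x in base_probs]


if __name__ == "__main__":
    # Rows: [p_lstm1, p_lstm2, p_lstm3] for one tweet (hypothetical values).
    X = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1], [0.8, 0.6, 0.9], [0.1, 0.2, 0.3]]
    y = [1, 0, 1, 0]
    w, b = fit_meta_lr(X, y)
    print(predict_meta(w, b, X))
```

In practice, the meta-classifier is usually trained on out-of-fold predictions of the base models, so that it learns how far to trust each base model on unseen data instead of on data the base models have already fitted.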


Deep Learning based Semantic Similarity Detection using Text Data

Information Technology And Control

10.5755/j01.itc.49.4.27118

2020

Vol 49 (4)

pp. 495-510

Author(s):

Muhammad Mansoor

Zahoor ur Rehman

Muhammad Shaheen

Muhammad Attique Khan

Mohamed Habib

Keyword(s):

Deep Learning

Language Processing

Short Term Memory

Main Task

Detection Algorithms

Similarity Detection

Novel Approach

Proposed Model

Memory Network

Numeric Data

Similarity detection in text is the main task of many natural language processing (NLP) applications. Because textual data are far larger in volume than numeric data, measuring textual similarity is an important problem. Most similarity detection algorithms are based on word-to-word matching, sentence or paragraph matching, or matching of whole documents. In this research, a novel deep learning approach is proposed that combines a long short-term memory (LSTM) network with a convolutional neural network (CNN) to measure the semantic similarity between two questions. The proposed model takes sentence pairs as input and measures the similarity between them. The model was tested on Quora’s publicly available dataset, where it achieved 87.50% accuracy, better than previous approaches.


Malicious Text Identification: Deep Learning from Public Comments and Emails

Information

10.3390/info11060312

2020

Vol 11 (6)

pp. 312

Cited By ~ 1

Author(s):

Asma Baccouche

Sadaf Ahmed

Daniel Sierra-Sosa

Adel Elmaghraby

Keyword(s):

Social Media

Deep Learning

Language Processing

Short Term Memory

Good Alternative

Classification Problems

Short Term

Independent Dataset

Proposed Model

Long Short Term Memory

Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments on social media or fraudulent emails. However, an adequate filtering strategy is difficult to achieve because these messages resemble genuine communications. From the natural language processing (NLP) perspective, deep learning models are a good alternative for classifying preprocessed text. In particular, long short-term memory (LSTM) networks perform well on binary and multi-label text classification problems. This paper presents an approach that merges two data sources, one intended for spam classification of social media posts and the other for fraud classification of emails. We designed a multi-label LSTM model and trained it on the joint dataset, which includes text with common bigrams extracted from each independent dataset. The experimental results show that the proposed model can identify malicious text regardless of its source, and the LSTM model trained on the merged dataset outperforms the models trained independently on each dataset.
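The data-merging step can be sketched under one plausible reading of the abstract: extract the bigrams shared by the two sources, and merge the labelled texts into a single multi-label set with targets `[spam, fraud]`. This is an assumption-laden illustration rather than the authors' pipeline. Whitespace tokenization, lower-casing, and setting the label a source does not annotate to 0 are all assumptions, and every function name is hypothetical.

```python
def bigrams(text):
    """Set of adjacent lower-cased word pairs (whitespace tokenisation)."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def common_bigrams(texts_a, texts_b):
    """Bigrams that occur somewhere in each of the two corpora."""
    a = set().union(*(bigrams(t) for t in texts_a))
    b = set().union(*(bigrams(t) for t in texts_b))
    return a & b


def has_common_bigram(text, common):
    """True if the text contains at least one of the shared bigrams."""
    return not bigrams(text).isdisjoint(common)


def merge_multilabel(spam_data, fraud_data):
    """spam_data: (text, is_spam) pairs; fraud_data: (text, is_fraud) pairs.

    Returns (text, [spam, fraud]) pairs. Assumption: the label a source does
    not annotate is set to 0.
    """
    return ([(t, [int(y), 0]) for t, y in spam_data]
            + [(t, [0, int(y)]) for t, y in fraud_data])


if __name__ == "__main__":
    spam = [("Click here to win", 1), ("meeting at noon", 0)]
    fraud = [("please click here now", 1), ("wire the money", 1)]
    shared = common_bigrams([t for t, _ in spam], [t for t, _ in fraud])
    print(shared)
    print(merge_multilabel(spam, fraud))
```

A multi-label model trained on such targets typically ends in one sigmoid output per label, not a single softmax, so that a message can be flagged as both spam and fraud at once.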


An efficient sentiment analysis methodology based on long short-term memory networks

Complex & Intelligent Systems

10.1007/s40747-021-00436-4

2021

Author(s):

J. Shobana

M. Murali

Keyword(s):

Sentiment Analysis

Language Processing

Short Term Memory

Contextual Information

Short Term

Good Decision

Term Memory

Proposed Model

Long Short Term Memory

Current Research Interest

Sentiment analysis is the process of determining the sentiment polarity (positive, neutral, or negative) of a text. As online markets have grown more popular over the past decades, online retailers and merchants have asked buyers to share their opinions about the products they purchased. As a result, millions of reviews are generated daily, making it difficult for consumers to decide whether to buy a product. Analyzing this enormous volume of reviews is also difficult and time-consuming for product manufacturers. Deep learning is a current research focus in natural language processing. In the proposed model, the Skip-gram architecture is used for better extraction of the semantic and contextual features of words, and an LSTM (long short-term memory) network is used to understand complex patterns in textual data. To improve the LSTM's performance, its weight parameters are optimized with the adaptive particle swarm optimization (APSO) algorithm. Extensive experiments on four datasets showed that the proposed APSO-LSTM model achieves higher accuracy than classical methods such as traditional LSTM, ANN, and SVM. According to the simulation results, the proposed model outperforms other existing models on several metrics.
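The Skip-gram architecture learns word vectors by predicting the words around each centre word. The sketch below shows only how its (centre, context) training pairs are generated; the symmetric window and the default size of 2 are illustrative assumptions, and training the embedding itself (for example with negative sampling) is not shown.

```python
def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs: every word within `window` positions of the
    centre word, excluding the centre itself."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


if __name__ == "__main__":
    review = "battery life is great".split()
    for centre, context in skipgram_pairs(review, window=1):
        print(centre, "->", context)
```

Words that occur in similar contexts produce similar training pairs and so end up with nearby vectors, which is the semantic and contextual information the LSTM then consumes.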


Automatic Word Spacing of Korean Using Syllable and Morpheme

Applied Sciences

10.3390/app11020626

2021

Vol 11 (2)

pp. 626

Author(s):

Jeong-Myeong Choi

Jong-Dae Kim

Chan-Young Park

Yu-Seop Kim

Keyword(s):

Language Processing

Short Term Memory

Sequence Information

Morphological Pattern

Word Level

Proposed Model

Correction Problem

Long Short Term Memory

N Gram

Pattern Information

In Korean, correct spacing is essential for the readability of sentences and for understanding their context. Moreover, when sentences with incorrect spacing are used in Korean natural language processing, the sentence structure changes, which degrades performance. Earlier studies corrected spacing errors with n-gram-based statistical methods and morphological analyzers, and many recent studies have used deep learning. In this study, we address spacing error correction at both the syllable level and the morpheme level. The proposed model combines a convolutional neural network layer, which learns syllable and morphological pattern information in sentences, with a bidirectional long short-term memory layer, which learns forward and backward sequence information. The proposed model was evaluated with accuracy at the syllable level and with precision, recall, and F1 score at the word level. The experiments confirmed that performance improved over the previous study.
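Spacing correction is naturally framed as syllable-level sequence labelling, and the two evaluation levels described above can be computed from the same labels. In this minimal sketch, label 1 means "a space follows this syllable". It assumes precomposed Hangul (one character per syllable) and counts a predicted word as correct only if both of its boundaries match the gold standard. The labelling scheme and function names are illustrative assumptions, not the paper's exact setup.

```python
def to_labels(sentence):
    """Syllables plus per-syllable labels: 1 = a space follows (last gets 0)."""
    words = sentence.split()
    syllables = "".join(words)
    labels = []
    for w in words:
        labels.extend([0] * (len(w) - 1) + [1])
    if labels:
        labels[-1] = 0
    return syllables, labels


def apply_labels(syllables, labels):
    """Rebuild a spaced sentence from syllables and (predicted) labels."""
    return "".join(s + (" " if lab == 1 else "")
                   for s, lab in zip(syllables, labels)).rstrip()


def word_spans(labels):
    """(start, end) syllable spans of the words implied by the labels."""
    spans, start = [], 0
    for i, lab in enumerate(labels):
        if lab == 1 or i == len(labels) - 1:
            spans.append((start, i + 1))
            start = i + 1
    return spans


def evaluate(gold, pred):
    """Syllable-level accuracy and word-level precision, recall, and F1."""
    acc = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    gs, ps = set(word_spans(gold)), set(word_spans(pred))
    tp = len(gs & ps)
    precision = tp / len(ps) if ps else 0.0
    recall = tp / len(gs) if gs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return acc, precision, recall, f1


if __name__ == "__main__":
    syl, gold = to_labels("아버지가 방에 들어가신다")
    _, pred = to_labels("아버지 가방에 들어가신다")  # a classic mis-spacing
    print(apply_labels(syl, pred), evaluate(gold, pred))
```

The example shows why both levels are reported: misplacing a single space changes only 2 of 11 syllable labels, yet it breaks two of the three words, so word-level scores drop much further than syllable accuracy.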


Automatic sentiment analysis of public opinion on nuclear energy

Kerntechnik

10.1515/kern-2021-0034

2022

Vol 0 (0)

Author(s):

Hong Xu

Tao Tang

Baorui Zhang

Yuechan Liu

Keyword(s):

Machine Learning

Social Media

Sentiment Analysis

Nuclear Energy

Opinion Mining

Short Term Memory

Learning Algorithm

Machine Learning Algorithms

Innovative Technology

The Public

Opinion mining and sentiment analysis based on social media have developed rapidly in recent years, driven by the popularity of social media and advances in machine learning. In nuclear engineering and technology, however, sentiment analysis is seldom studied, let alone automated analysis with machine learning algorithms. This work mines public sentiment toward nuclear energy in German-speaking countries from public comments on nuclear news in social media, using an automatic methodology, because comments reflect the public’s real opinions more closely than the news itself. The results showed that most comments were neutral in sentiment. Positive comments made up 23%, approximately four times the share of negative ones. The issues of greatest concern to the public are innovative technology development, safety, nuclear waste, accidents, and the cost of nuclear power. Decision tree, random forest, and long short-term memory (LSTM) networks are adopted for automatic sentiment analysis. The results show that all of the proposed methods can be applied in practice to some extent, but LSTM, as a deep learning algorithm, achieves the highest accuracy (approximately 85.6%) as well as the best robustness.
