A two-phase plagiarism detection system based on multi-layer long short-term memory networks

Author(s):  
Nguyen Van Son ◽  
Le Thanh Huong ◽  
Nguyen Chi Thanh

Finding plagiarism strings between two given documents is the main task of the plagiarism detection problem. Traditional approaches based on string matching are not very useful in cases of semantically similar plagiarism. Deep learning approaches address this problem by measuring the semantic similarity between pairs of sentences. However, these approaches still face the following challenges. First, they cannot handle cases where only part of a sentence belongs to a plagiarism passage. Second, measuring sentential similarity without considering the context of surrounding sentences reduces accuracy. To solve the above problems, this paper proposes a two-phase plagiarism detection system based on a multi-layer long short-term memory network model and a feature extraction technique: (i) a passage-phase to recognize plagiarism passages, and (ii) a word-phase to determine the exact plagiarism strings. Our experimental results on the PAN 2014 corpus reached a 94.26% F-measure, higher than existing research in this field.
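The two-phase pipeline can be illustrated with a toy sketch: phase (i) flags candidate sentence pairs, and phase (ii) narrows each pair down to the exact shared string. The paper scores similarity with multi-layer LSTMs; the word-set similarity and longest-common-word-run functions below are hand-made stand-ins, and the threshold value is illustrative only.

```python
def sentence_similarity(a, b):
    """Jaccard similarity over word sets; a stand-in for the LSTM similarity score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def passage_phase(src_sents, susp_sents, threshold=0.5):
    """Phase (i): flag candidate plagiarism passages at sentence level."""
    return [(i, j)
            for i, s in enumerate(src_sents)
            for j, t in enumerate(susp_sents)
            if sentence_similarity(s, t) >= threshold]

def word_phase(src_sent, susp_sent):
    """Phase (ii): extract the longest exactly shared word string."""
    src, susp = src_sent.split(), susp_sent.split()
    best = []
    for i in range(len(src)):
        for j in range(len(susp)):
            k = 0
            while (i + k < len(src) and j + k < len(susp)
                   and src[i + k] == susp[j + k]):
                k += 1
            if k > len(best):
                best = src[i:i + k]
    return " ".join(best)
```

Running `passage_phase` first keeps the expensive word-level search restricted to sentence pairs that already look similar, which is the point of the two-phase design.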

2020 ◽  
Vol 26 (11) ◽  
pp. 1422-1434
Author(s):  
Vibekananda Dutta ◽  
Michał Choraś ◽  
Marek Pawlicki ◽  
Rafał Kozik

Artificial Intelligence plays a significant role in building effective cybersecurity tools. Security has a crucial role in the modern digital world and has become an essential area of research. Network Intrusion Detection Systems (NIDS) are among the first security systems that encounter network attacks and facilitate attack detection to protect a network. Contemporary machine learning approaches, like novel neural network architectures, are succeeding in network intrusion detection. This paper tests modern machine learning approaches on a novel cybersecurity benchmark IoT dataset. Among other algorithms, a Deep AutoEncoder (DAE) and a modified Long Short-Term Memory network (mLSTM) are employed to detect network anomalies in the IoT-23 dataset. The DAE is employed for dimensionality reduction, and a host of ML methods, including Deep Neural Networks and Long Short-Term Memory networks, is used to classify its outputs as normal or malicious. The applied method is validated on the IoT-23 dataset. Furthermore, the results of the analysis in terms of evaluation metrics are discussed.
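The DAE's role can be sketched in miniature: an autoencoder squeezed through a bottleneck reconstructs normal traffic well, so a large reconstruction error signals an anomaly. The two-feature encoder/decoder and the threshold below are fixed by hand purely for illustration; in the paper the mapping is learned from IoT-23 data.

```python
def encode(x):
    """2-D feature -> 1-D bottleneck. Hand-fixed here; a DAE learns this mapping."""
    return (x[0] + x[1]) / 2.0

def decode(z):
    """Bottleneck -> reconstructed 2-D feature."""
    return (z, z)

def reconstruction_error(x):
    """L1 distance between a sample and its reconstruction."""
    r = decode(encode(x))
    return abs(x[0] - r[0]) + abs(x[1] - r[1])

def classify_flow(x, threshold=0.5):
    """Flag a flow as malicious when the autoencoder cannot reconstruct it."""
    return "malicious" if reconstruction_error(x) > threshold else "normal"
```

Samples resembling the training distribution (here, points near the line x0 = x1) reconstruct almost perfectly, while anomalous samples do not, which is the standard autoencoder anomaly-detection principle.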


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Hyejin Cho ◽  
Hyunju Lee

Abstract Background In biomedical text mining, named entity recognition (NER) is an important task used to extract information from biomedical articles. Previously proposed methods for NER are dictionary- or rule-based methods and machine learning approaches. However, these traditional approaches rely heavily on large-scale dictionaries, target-specific rules, or well-constructed corpora, and they have been superseded by deep learning-based approaches that are independent of hand-crafted features. However, although such NER methods employ additional conditional random fields (CRF) to capture important correlations between neighboring labels, they often do not incorporate all the contextual information from the text into the deep learning layers. Results We propose herein an NER system for biomedical entities that incorporates n-grams with a bi-directional long short-term memory (BiLSTM) network and a CRF; this system is referred to as a contextual long short-term memory network with CRF (CLSTM). We assess the CLSTM model on three corpora: the disease corpus of the National Center for Biotechnology Information (NCBI), the BioCreative II Gene Mention corpus (GM), and the BioCreative V Chemical Disease Relation corpus (CDR). Our framework was compared with several deep learning approaches, such as BiLSTM, BiLSTM with CRF, GRAM-CNN, and BERT. On the NCBI corpus, our model recorded an F-score of 85.68% for the NER of diseases, an improvement of 1.50% over previous methods. Moreover, although BERT used transfer learning by incorporating more than 2.5 billion words, our system showed performance similar to BERT, with an F-score of 81.44% for gene NER on the GM corpus, and outperformed it with an F-score of 86.44% for the NER of chemicals and diseases on the CDR corpus. We conclude that our method significantly improves performance on biomedical NER tasks. Conclusion The proposed approach is robust in recognizing biological entities in text.
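The CRF layer captures correlations between neighboring labels (for example, an I tag should not follow O in BIO tagging) through Viterbi decoding over emission and transition scores. Below is a minimal pure-Python decoder; the scores used in the test are toy values, not the paper's learned parameters.

```python
def viterbi(emissions, transitions, labels):
    """CRF decoding: pick the label path maximizing emission plus transition scores.

    emissions: list of {label: score} dicts, one per token.
    transitions: {(prev_label, label): score}.
    """
    n = len(emissions)
    best = [{} for _ in range(n)]   # best[i][y]: top score of a path ending in y at i
    back = [{} for _ in range(n)]   # back-pointers for path recovery
    for y in labels:
        best[0][y] = emissions[0][y]
    for i in range(1, n):
        for y in labels:
            score, prev = max(
                (best[i - 1][p] + transitions[(p, y)], p) for p in labels
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    y = max(labels, key=lambda lab: best[n - 1][lab])
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]
```

Giving the (O, I) transition a strongly negative score is how the CRF discourages an I label from directly following O, even when the emission score alone would prefer it.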


2018 ◽  
Vol 7 (2.27) ◽  
pp. 88 ◽  
Author(s):  
Merin Thomas ◽  
Latha C.A

Sentiment analysis has been an important topic of discussion for two decades, since Lee published his first paper on sentiment analysis in 2002. Apart from sentiment analysis in English, it has spread its wings to other natural languages, which is very important in a multilingual country like India. Traditional machine learning approaches have yielded good accuracy for such analysis, and deep learning approaches have gained momentum in sentiment analysis in recent years. Deep learning mimics human learning, so the expectation is to reach higher levels of accuracy. In this paper we have implemented sentiment analysis of tweets in the South Indian language Malayalam. The model used is a recurrent neural network with long short-term memory (LSTM), a deep learning technique, to predict sentiment. The achieved accuracy was found to increase with the quality and depth of the datasets.
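The LSTM underlying such a model can be written out at cell level: gates control what the cell state keeps, forgets, and exposes at each token. A single-unit, pure-Python step (the weights in the test are placeholders, not trained values):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One step of a single-unit LSTM cell.

    w maps each gate ("i" input, "f" forget, "o" output, "g" candidate)
    to a (input weight, recurrent weight, bias) triple.
    """
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])   # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])   # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2]) # candidate state
    c_new = f * c + i * g          # new cell state
    h_new = o * math.tanh(c_new)   # new hidden state
    return h_new, c_new
```

Iterating this step over a token sequence and feeding the final hidden state to a classifier head is, in essence, how an LSTM sentiment model produces its prediction.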


2021 ◽  
Author(s):  
Ashwini Bhaskar Abhale ◽  
S S Manivannan

Abstract Because of the ever-increasing number of Internet users, Internet security is becoming more essential. To identify and detect attackers, many researchers have utilized data mining methods, but existing data mining techniques are unable to provide a sufficient degree of detection precision. An intrusion detection system for wireless networks is being developed to ensure data transmission security. The proposed network intrusion detection system (NIDS) uses a deep classification model to classify network connections as good or harmful. Deep Convolution Neural Network (DCNN), Deep Recurrent Neural Network (DRNN), Deep Long Short-Term Memory (DLSTM), Deep Convolution Neural Network Long Short-Term Memory (DCNN-LSTM), and Deep Gated Recurrent Unit (DGRU) methods, trained on NSL-KDD data records, are proposed. The experiments were carried out for a total of 1000 epochs. During the experiments, we achieved a model accuracy of more than 98 percent. We also found that as the number of layers in a model grows, so does its accuracy.
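Before training, NSL-KDD records are typically preprocessed: categorical fields are one-hot encoded and numeric columns scaled into a fixed range. The abstract does not spell out its preprocessing, so the helpers below are a common, assumed recipe rather than the authors' exact pipeline.

```python
def one_hot(value, vocabulary):
    """Encode a categorical NSL-KDD field (e.g. protocol_type) as a one-hot vector."""
    return [1 if v == value else 0 for v in vocabulary]

def min_max_scale(column):
    """Scale a numeric feature column into [0, 1] before feeding the network."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]
```

Concatenating the one-hot blocks with the scaled numeric columns yields the fixed-length vectors that the DCNN/DRNN/DLSTM variants consume.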


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Najla M. Alharbi ◽  
Norah S. Alghamdi ◽  
Eman H. Alkhammash ◽  
Jehad F. Al Amri

Consumer feedback is highly valuable in business to assess performance and is also beneficial to customers, as it gives them an idea of what to expect from new products. In this research, the aim is to evaluate different deep learning approaches to accurately predict the opinion of customers based on mobile phone reviews obtained from Amazon.com. The prediction is based on analysing these reviews and categorizing them as positive, negative, or neutral. Different deep learning algorithms have been implemented and evaluated, such as the simple RNN with its four variants, namely Long Short-Term Memory Networks (LRNN), Group Long Short-Term Memory Networks (GLRNN), gated recurrent units (GRNN), and update recurrent units (UGRNN). All evaluated algorithms are combined with word embeddings as the feature extraction approach for sentiment analysis, namely GloVe, word2vec, and FastText with skip-grams. The five algorithms with the three feature extraction methods are evaluated based on accuracy, recall, precision, and F1-score for both balanced and unbalanced datasets. For the unbalanced dataset, the GLRNN algorithm with FastText feature extraction scored the highest accuracy of 93.75%, the highest accuracy achieved on this dataset compared with other methods reported in the literature. For the balanced dataset, the highest achieved accuracy was 88.39%, by the LRNN algorithm.
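Skip-gram embedding training, as used by FastText here, starts from (target, context) pairs drawn from a sliding window over each review. A minimal pair generator (the window size is a tunable hyperparameter; 2 is only an example default):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as used by skip-gram embeddings."""
    pairs = []
    for i, target in enumerate(tokens):
        # every token within `window` positions of the target is a context word
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs
```

The embedding model is then trained to predict each context word from its target, and the learned vectors become the input features for the recurrent classifiers.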


2021 ◽  
Author(s):  
Naresh Kumar Thapa K ◽  
N. Duraipandian

Abstract Malicious traffic classification is the initial and primary step for any network-based security system. Such traffic classification systems include behavior-based anomaly detection systems and intrusion detection systems. Existing methods rely on conventional techniques and process the data in a fixed sequence, which may lead to performance issues. Furthermore, conventional techniques require proper annotation to process volumetric data, and relying on data annotation for efficient traffic classification may lead to network loops and bandwidth issues within the network. To address the above-mentioned issues, this paper presents a novel solution from an artificial intelligence perspective. The key idea of this paper is to propose a novel malicious traffic classification system using a Long Short-Term Memory (LSTM) model. To validate the efficiency of the proposed model, an experimental setup along with experimental validation is carried out. The experimental results prove that the proposed model is better in terms of accuracy and throughput than the state-of-the-art models; its accuracy outperforms the existing state-of-the-art models by 5%, reaching 99.5% overall.
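Accuracy comparisons of this kind are usually derived from a confusion matrix over the malicious/benign labels. The helper below is a generic sketch of that computation, not the paper's own evaluation code.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of flows flagged malicious, how many truly were
    recall = tp / (tp + fn)      # of truly malicious flows, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Reporting precision and recall alongside accuracy matters for traffic classification, since malicious flows are usually a small minority and plain accuracy can look high even for a weak detector.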


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Hasan Alkahtani ◽  
Theyazn H. H. Aldhyani

Smart grids, built on advanced information technology, have become favored intrusion targets because the Internet of Things (IoT) uses sensor devices to collect data from the smart grid environment. These data are sent to the cloud, a huge network of super servers that provides different services to different smart infrastructures, such as smart homes and smart buildings, and this creates a large space for attackers to launch destructive cyberattacks. The novelty of this research is the development of a robust framework for detecting intrusions in an IoT environment. The IoTID20 attack dataset, newly generated from IoT infrastructure, was employed to develop the proposed system. In this framework, three advanced deep learning algorithms were applied to classify intrusions: a convolutional neural network (CNN), a long short-term memory network (LSTM), and a hybrid convolutional neural network with long short-term memory (CNN-LSTM) model. To reduce the dimensionality of the network dataset and improve the proposed system, the particle swarm optimization (PSO) method was used to select relevant features, which were then processed by the deep learning algorithms. The experimental results showed that the proposed systems achieved the following accuracies: CNN = 96.60%, LSTM = 99.82%, and CNN-LSTM = 98.80%. The proposed framework attained the desired performance on a new variable dataset, and the system will be implemented in our university's IoT environment. Comparative predictions between the proposed framework and existing systems showed that the proposed system more efficiently and effectively enhances the security of the IoT environment against attacks.
The experimental results confirmed that the proposed framework, based on deep learning algorithms for intrusion detection, can effectively detect real-world attacks and is capable of enhancing the security of the IoT environment.
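The PSO feature-selection step can be sketched as binary PSO: each particle is a 0/1 mask over the features, and the sigmoid of its velocity gives the probability of selecting each feature. Everything below (particle count, the simplified inertia-free update, and the toy fitness in the test) is illustrative; the paper's fitness would involve the actual IoTID20 classifier.

```python
import math
import random

def binary_pso(fitness, n_features, n_particles=8, iters=30, seed=0):
    """Minimal binary PSO: returns the best 0/1 feature mask found."""
    rng = random.Random(seed)

    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best masks
    pscore = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pscore[i])
    gbest, gscore = pbest[g][:], pscore[g]      # swarm-wide best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # pull each particle toward its own best and the swarm best
                vel[i][d] += (2 * r1 * (pbest[i][d] - pos[i][d])
                              + 2 * r2 * (gbest[d] - pos[i][d]))
                # sigmoid(velocity) is the probability of selecting feature d
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            s = fitness(pos[i])
            if s > pscore[i]:
                pbest[i], pscore[i] = pos[i][:], s
                if s > gscore:
                    gbest, gscore = pos[i][:], s
    return gbest
```

In a real pipeline the fitness would train or score the CNN/LSTM on the masked feature subset; the mask returned by PSO then fixes which IoTID20 columns the deep models consume.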


Author(s):  
Aishwarya R. Verma

Abstract: Words are the core components of meaning and can be expressed through speech, writing, or signals. It is important that the actual message or meaning of the words sent conveys the same meaning to the one who receives them. The evolution from manual language translation to digital machine translation has helped us greatly in finding meanings such that each translated word comes at least close to the exact actual meaning. To make machine translators feel more human-friendly, natural language processing (NLP) combined with machine learning (ML) makes the best combination. The main challenges in machine-translated sentences involve ambiguities, lexical divergence, and syntactic, lexical, and semantic mismatches, which can be seen in grammar, spellings, punctuation, spaces, etc. After analyzing different algorithms, we have implemented two different machine translators using two different Long Short-Term Memory (LSTM) approaches and performed a comparative study of the quality of the translated text based on their respective accuracies. We have used two different training approaches with encoding-decoding techniques on the same datasets, which translate source English text to target Hindi text. To detect whether the entered text is English or Hindi, we have used a sequential LSTM training model, which has also been analyzed based on its accuracy. As a result, the first LSTM trained model is 84% accurate and the second LSTM trained model is 71% accurate in translating English to Hindi text, while the detection LSTM trained model is 78% accurate in detecting English text and 81% accurate in detecting Hindi text. This study has helped us analyze the appropriate machine translation approach based on its accuracy. Keywords: Accuracy, Decoding, Machine Learning (ML), Detection System, Encoding, Long Short-Term Memory (LSTM), Machine Translation, Natural Language Processing (NLP), Sequential
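The encoding-decoding pipeline can be pictured with a toy stand-in: a real system trains an encoder LSTM to compress the English sentence into a state vector and a decoder LSTM to emit Hindi tokens from that state. The lookup table below replaces both learned networks, and its word pairs are illustrative transliterations only.

```python
# Hand-made English -> Hindi lexicon standing in for a trained decoder.
TOY_LEXICON = {"hello": "namaste", "world": "duniya"}

def encode(tokens):
    """A real encoder LSTM would compress the sentence into a fixed-size state;
    this stand-in simply passes the token list through."""
    return tokens

def decode(state):
    """A real decoder LSTM would generate tokens one by one from the state;
    this stand-in looks each token up, emitting <unk> for unknown words."""
    return [TOY_LEXICON.get(t, "<unk>") for t in state]

def translate(sentence):
    return " ".join(decode(encode(sentence.lower().split())))
```

Even this toy shows why vocabulary coverage drives accuracy: any source word outside the training vocabulary degrades the output, which is one reason the two trained models in the study score differently.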

