Analysis of Text Feature Extractors using Deep Learning on Fake News

Social media and easy internet access have allowed the instant sharing of news, ideas, and information on a global scale. However, rapid spread and instant access to information/news can also enable rumors or fake news to spread very easily and rapidly. In order to monitor and minimize the spread of fake news in the digital community, fake news detection using Natural Language Processing (NLP) has attracted significant attention. In NLP, different text feature extractors and word embeddings are used to process the text data. The aim of this paper is to analyze the performance of a fake news detection model based on neural networks using 3 feature extractors: TD-IDF vectorizer, Glove embeddings, and BERT embeddings. For the evaluation, multiple metrics, namely accuracy, precision, F1, recall, AUC ROC, and AUC PR were computed for each feature extractor. All the transformation techniques were fed to the deep learning model. It was found that BERT embeddings for text transformation delivered the best performance. TD-IDF has been performed far better than Glove and competed the BERT as well at some stages.

Download Full-text

Medical Text Classification Using Hybrid Deep Learning Models with Multihead Attention

Computational Intelligence and Neuroscience ◽

10.1155/2021/9425655 ◽

2021 ◽

Vol 2021 ◽

pp. 1-16

Author(s):

Sunil Kumar Prabhakar ◽

Dong-Ok Won

Keyword(s):

Deep Learning ◽

Language Processing ◽

Text Classification ◽

Patient Information ◽

Classification Accuracy ◽

Learning Model ◽

Training Data ◽

Machine Learning Techniques ◽

Medical Text ◽

Deep Learning Model

To unlock information present in clinical description, automatic medical text classification is highly useful in the arena of natural language processing (NLP). For medical text classification tasks, machine learning techniques seem to be quite effective; however, it requires extensive effort from human side, so that the labeled training data can be created. For clinical and translational research, a huge quantity of detailed patient information, such as disease status, lab tests, medication history, side effects, and treatment outcomes, has been collected in an electronic format, and it serves as a valuable data source for further analysis. Therefore, a huge quantity of detailed patient information is present in the medical text, and it is quite a huge challenge to process it efficiently. In this work, a medical text classification paradigm, using two novel deep learning architectures, is proposed to mitigate the human efforts. The first approach is that a quad channel hybrid long short-term memory (QC-LSTM) deep learning model is implemented utilizing four channels, and the second approach is that a hybrid bidirectional gated recurrent unit (BiGRU) deep learning model with multihead attention is developed and implemented successfully. The proposed methodology is validated on two medical text datasets, and a comprehensive analysis is conducted. The best results in terms of classification accuracy of 96.72% is obtained with the proposed QC-LSTM deep learning model, and a classification accuracy of 95.76% is obtained with the proposed hybrid BiGRU deep learning model.

Download Full-text

Text Data Augmentation for Deep Learning

10.21203/rs.3.rs-650804/v1 ◽

2021 ◽

Author(s):

Connor Shorten ◽

Taghi M. Khoshgoftaar ◽

Borko Furht

Keyword(s):

Deep Learning ◽

Language Processing ◽

Data Augmentation ◽

Early Stage ◽

Practical Implementation ◽

Text Data ◽

Training Strategy ◽

Local Decision ◽

Decision Boundaries

Abstract Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.

Download Full-text

FastPacket: Towards Pre-trained Packets Embedding based on FastText for next-generation NIDS

10.21203/rs.3.rs-555961/v1 ◽

2021 ◽

Author(s):

Khloud Al Jallad

Keyword(s):

Big Data ◽

Deep Learning ◽

Language Processing ◽

Intrusion Detection Systems ◽

Dynamic Feature ◽

Statistical Features ◽

Text Data ◽

New Approach ◽

New Era ◽

Detection Systems

Abstract New Attacks are increasingly used by attackers every day but many of them are not detected by Intrusion Detection Systems as most IDS ignore raw packet information and only care about some basic statistical information extracted from PCAP files. Using networking programs to extract fixed statistical features from packets is good, but may not enough to detect nowadays challenges. We think that it is time to utilize big data and deep learning for automatic dynamic feature extraction from packets. It is time to get inspired by deep learning pre-trained models in computer vision and natural language processing, so security deep learning solutions will have its pre-trained models on big datasets to be used in future researches. In this paper, we proposed a new approach for embedding packets based on character-level embeddings, inspired by FastText success on text data. We called this approach FastPacket. Results are measured on subsets of CIC-IDS-2017 dataset, but we expect promising results on big data pre-trained models. We suggest building pre-trained FastPacket on MAWI big dataset and make it available to community, similar to FastText. To be able to outperform currently used NIDS, to start a new era of packet-level NIDS that can better detect complex attacks

Download Full-text

Detecting Fake News Using Deep Learning and NLP

Advances in Digital Crime, Forensics, and Cyber Terrorism - Confluence of AI, Machine, and Deep Learning in Cyber Forensics ◽

10.4018/978-1-7998-4900-1.ch007 ◽

2021 ◽

pp. 117-133

Author(s):

Uma Maheswari Sadasivam ◽

Nitin Ganesan

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Reliable Information ◽

Citizen Journalism ◽

Fake News ◽

Social Unrest ◽

Safe Place ◽

The Social

Fake news is the word making more talk these days be it election, COVID 19 pandemic, or any social unrest. Many social websites have started to fact check the news or articles posted on their websites. The reason being these fake news creates confusion, chaos, misleading the community and society. In this cyber era, citizen journalism is happening more where citizens do the collection, reporting, dissemination, and analyse news or information. This means anyone can publish news on the social websites and lead to unreliable information from the readers' points of view as well. In order to make every nation or country safe place to live by holding a fair and square election, to stop spreading hatred on race, religion, caste, creed, also to have reliable information about COVID 19, and finally from any social unrest, we need to keep a tab on fake news. This chapter presents a way to detect fake news using deep learning technique and natural language processing.

Download Full-text

Performance of Optimization Algorithms in Attention-Based Deep Learning Model for Fake News Detection System

10.1007/978-981-16-3802-2_9 ◽

2021 ◽

pp. 113-126

Author(s):

S. P. Ramya ◽

R. Eswari

Keyword(s):

Deep Learning ◽

Detection System ◽

Optimization Algorithms ◽

Learning Model ◽

Fake News ◽

Deep Learning Model

Download Full-text

Content Noise Detection Model Using Deep Learning in Web Forums

Sustainability ◽

10.3390/su12125074 ◽

2020 ◽

Vol 12 (12) ◽

pp. 5074

Author(s):

Jiyoung Woo ◽

Jaeseok Yun

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Model ◽

Detection Model ◽

Proposed Model ◽

Web Forum ◽

Web Forums ◽

Conventional Machine ◽

Text Features ◽

Deep Learning Model

Spam posts in web forum discussions cause user inconvenience and lower the value of the web forum as an open source of user opinion. In this regard, as the importance of a web post is evaluated in terms of the number of involved authors, noise distorts the analysis results by adding unnecessary data to the opinion analysis. Here, in this work, an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning is proposed. To automatically differentiate between normal posts and spam, evaluators were asked to recognize spam posts in advance. To construct the machine learning-based model, text features from posted content using text mining techniques from the perspective of linguistics were extracted, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text including and excluding special characters was utilized. A comparison analysis on deep neural networks using the two different recurrent neural network (RNN) models of the simple RNN and long short-term memory (LSTM) network was also performed. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model affords significant improvements over the accuracy of conventional machine learning associated with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively.

Download Full-text

Ship Detection from X-Band SAR Images Using M2Det Deep Learning Model

Applied Sciences ◽

10.3390/app10217751 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7751

Author(s):

Seong-Jae Hong ◽

Won-Kyung Baek ◽

Hyung-Sup Jung

Keyword(s):

Deep Learning ◽

Learning Process ◽

Intensity Difference ◽

Sar Images ◽

Ship Detection ◽

Detection Model ◽

Learning Techniques ◽

Vessel Detection ◽

Intensity Images ◽

Deep Learning Model

Synthetic aperture radar (SAR) images have been used in many studies for ship detection because they can be captured without being affected by time and weather. In recent years, the development of deep learning techniques has facilitated studies on ship detection in SAR images using deep learning techniques. However, because the noise from SAR images can negatively affect the learning of the deep learning model, it is necessary to reduce the noise through preprocessing. In this study, deep learning vessel detection was performed using preprocessed SAR images, and the effects of the preprocessing of the images on deep learning vessel detection were compared and analyzed. Through the preprocessing of SAR images, (1) intensity images, (2) decibel images, and (3) intensity difference and texture images were generated. The M2Det object detection model was used for the deep learning process and preprocessed SAR images. After the object detection model was trained, ship detection was performed using test images. The test results are presented in terms of precision, recall, and average precision (AP), which were 93.18%, 91.11%, and 89.78% for the intensity images, respectively, 94.16%, 94.16%, and 92.34% for the decibel images, respectively, and 97.40%, 94.94%, and 95.55% for the intensity difference and texture images, respectively. From the results, it can be found that the preprocessing of the SAR images can facilitate the deep learning process and improve the ship detection performance. The results of this study are expected to contribute to the development of deep learning-based ship detection techniques in SAR images in the future.

Download Full-text

Fake News Detection using Deep Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i7059.079920 ◽

2020 ◽

Vol 9 (9) ◽

pp. 226-228

Keyword(s):

Neural Network ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Deep Neural Network ◽

Negative Impact ◽

Research Area ◽

Fake News ◽

The World

News is a routine in everyone's life. It helps in enhancing the knowledge on what happens around the world. Fake news is a fictional information madeup with the intension to delude and hence the knowledge acquired becomes of no use. As fake news spreads extensively it has a negative impact in the society and so fake news detection has become an emerging research area. The paper deals with a solution to fake news detection using the methods, deep learning and Natural Language Processing. The dataset is trained using deep neural network. The dataset needs to be well formatted before given to the network which is made possible using the technique of Natural Language Processing and thus predicts whether a news is fake or not.

Download Full-text

Deep learning model for metagenome fragment classification using spaced k-mers feature extraction

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13407 ◽

2020 ◽

Vol 8 (3) ◽

pp. 234-238

Author(s):

Nur Choiriyati ◽

Yandra Arkeman ◽

Wisnu Ananta Kusuma

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Deep Learning ◽

Language Processing ◽

Computational Time ◽

Genus Level ◽

Computational Resources ◽

Learning Architectures ◽

Deep Learning Model

An open challenge in bioinformatics is the analysis of the sequenced metagenomes from the various environments. Several studies demonstrated bacteria classification at the genus level using k-mers as feature extraction where the highest value of k gives better accuracy but it is costly in terms of computational resources and computational time. Spaced k-mers method was used to extract the feature of the sequence using 111 1111 10001 where 1 was a match and 0 was the condition that could be a match or did not match. Currently, deep learning provides the best solutions to many problems in image recognition, speech recognition, and natural language processing. In this research, two different deep learning architectures, namely Deep Neural Network (DNN) and Convolutional Neural Network (CNN), trained to approach the taxonomic classification of metagenome data and spaced k-mers method for feature extraction. The result showed the DNN classifier reached 90.89 % and the CNN classifier reached 88.89 % accuracy at the genus level taxonomy.

Download Full-text

Deep Learning Based Truth Discovery Algorithm for Research the Genuineness of Given Text Corpus

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1112.0782s319 ◽

2019 ◽

Vol 8 (2S3) ◽

pp. 605-611

Keyword(s):

Deep Learning ◽

Language Processing ◽

Learning Algorithm ◽

Multiple Sources ◽

Text Data ◽

Text Corpus ◽

Automatic Feature Extraction ◽

Deep Learning Algorithm ◽

Truth Discovery ◽

Improved Accuracy

Lot of research has gone into Natural language processing and the state of the art algorithms in deep learning that unambiguously helps in converting an English text into a data structure without loss of meaning. Also with the advent of neural networks for learning word representations as vectors has helped a lot in revolutionizing the automatic feature extraction from text data corpus. A combination of word embedding and the use of a deep learning algorithm like a convolution neural network helped in better accuracy for text classification. In this era of Internet of things and the voluminous amounts of data that is overwhelming the users determining the veracity of the data is a very challenging task. There are many truth discovery algorithms in literature that help in resolving the conflicts that arise due to multiple sources of data. These algorithms help in estimating the trustworthiness of the data and reliability of the sources. In this paper, a convolution based truth discovery with multitasking is proposed to estimate the genuineness of the data for a given text corpus. The proposed algorithm has been tested on analysing the genuineness of Quora questions dataset and experimental results showed an improved accuracy and speed over other existing approaches.

Download Full-text