Detection of Fake News Text Classification on COVID-19 Using Deep Learning Approaches

2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Waqas Haider Bangyal ◽  
Rukhma Qasim ◽  
Najeeb ur Rehman ◽  
Zeeshan Ahmad ◽  
Hafsa Dar ◽  
...  

A vast amount of data is generated every second by microblogs, content sharing on social media sites, and social networking. Twitter is a popular microblogging platform where people voice their opinions about daily issues. Analyzing these opinions is the primary concern of sentiment analysis (SA), or opinion mining. Efficiently capturing, gathering, and analyzing sentiments has been challenging for researchers. To deal with these challenges, in this research work, we propose a highly accurate approach for SA of fake news on COVID-19. The dataset contains fake news on COVID-19; we started with data preprocessing (replacing missing values, noise removal, tokenization, and stemming). We applied a semantic model with term frequency and inverse document frequency (TF-IDF) weighting for data representation. In the measuring and evaluation step, we applied eight machine learning algorithms (Naive Bayes, AdaBoost, K-nearest neighbors, random forest, logistic regression, decision tree, neural networks, and support vector machine) and four deep learning models (CNN, LSTM, RNN, and GRU). Afterward, based on the results, we built a highly efficient prediction model in Python, and we trained and evaluated the classification model according to the performance measures (confusion matrix, classification rate, true positive rate...), then tested the model on a set of unclassified fake news on COVID-19 to predict the sentiment class of each item. The obtained results demonstrate high accuracy compared to the other models. Finally, a set of recommendations is provided with future directions for this research to help researchers select an efficient sentiment analysis model on Twitter data.
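The term frequency and inverse document frequency weighting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the formula below (relative term frequency times log(N / document frequency)), the whitespace tokenizer, and the function name `tfidf_weights` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_weights(corpus):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Tokenization is plain lowercased whitespace splitting (an assumption).
    """
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical two-document corpus, not taken from the paper's dataset.
example = tfidf_weights(["covid cure found", "covid vaccine news"])
```

With this variant, a term that occurs in every document (here "covid") receives weight zero, while terms unique to one document receive positive weight. Library implementations often add smoothing, so their values will differ.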

2021 ◽  
pp. 016555152110065
Author(s):  
Rahma Alahmary ◽  
Hmood Al-Dossari

Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior works have used machine learning algorithms. Recently, SA research has shown promising performance using the deep learning approach. However, deep learning is data-hungry and requires large datasets to learn, so more time is needed for data annotation. In this research, we proposed a semiautomatic approach using Naïve Bayes (NB) to annotate a new dataset in order to reduce the human effort and time spent on the annotation process. We created a dataset for training and testing the classifier by collecting Saudi dialect tweets. The accuracy achieved by the NB classifier was 83%. The trained semiautomatic model was used to annotate the new dataset, which was then used to train and test deep learning classifiers to perform Saudi dialect SA. The three deep learning classifiers tested in this research were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the performance of the deep learning classifiers exceeded that of SVM. The results showed that CNN reported the highest performance. Bi-LSTM outperformed both LSTM and SVM, and LSTM in turn outperformed SVM. The proposed semiautomatic annotation approach is usable and promising for increasing speed and saving time and effort in the annotation process.
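One way to picture NB-based semiautomatic annotation is sketched below. This is not the authors' pipeline: the multinomial model with add-one smoothing, the confidence threshold of 0.8, the routing of low-confidence items to human review, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (word counts per class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_proba(model, doc):
    """Return {label: posterior probability} using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][token] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}


def annotate(model, docs, threshold=0.8):
    """Auto-label confident predictions; queue the rest for human review.

    The threshold is a hypothetical parameter, not from the paper.
    """
    auto_labeled, needs_review = [], []
    for doc in docs:
        probs = predict_proba(model, doc)
        label, prob = max(probs.items(), key=lambda kv: kv[1])
        target = auto_labeled if prob >= threshold else needs_review
        target.append((doc, label))
    return auto_labeled, needs_review
```

In this sketch, a small hand-labeled seed set trains the model, and only predictions below the threshold would need a human annotator, which is where the time saving would come from.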


TEM Journal ◽  
2020 ◽  
pp. 1663-1668
Author(s):  
Shorouq Fathi Eletter

The exponential growth of unstructured data and the ability of businesses to utilize such data in decision-making have led to competitive advantages. The knowledge provided by analyzing unstructured data is crucial for product developers or service providers because it might affect the sustainability of the business. Sentiment analysis is used to gain an understanding of the attitudes, opinions, and emotions expressed within an online review. Naïve Bayes (NB), logistic regression (LR), decision trees (DT), deep learning (DL), and support vector machines (SVM) were used to build a classification model. In the data mining settings, the classification accuracy is the best metric to highlight the best classifier. The DL classifier outperformed other models in terms of accuracy rate. Classifying customers' feelings toward a product or service is critical for providing actionable insights. Utilizing such models will help to analyze huge volumes of reviews, saving both time and costs.


In this digitized world, the Internet has become a prominent source from which to glean various kinds of information. Today, people prefer virtual interaction to one-to-one communication. The majority of the population prefers social networking sites for expressing themselves through posts, blogs, comments, likes, and dislikes. Their sentiments can be traced using opinion mining or sentiment analysis. Sentiment analysis of social media text is a useful technique for identifying people's positive, negative or neutral emotions, sentiments, and opinions. Sentiment analysis has gained special attention from researchers over the last few years. Traditionally, many machine learning algorithms have been used to implement it, such as Naive Bayes, Support Vector Machine and many more. To overcome the drawbacks of ML in terms of complex classification algorithms, different deep learning-based algorithms have been introduced, such as CNN, RNN, and HNN. In this paper, we have studied different deep learning algorithms and intend to propose a deep learning-based model to analyze the behavior of an individual using social media text. Results given by the proposed model can be utilized in a range of different fields like business, education, industry, politics, psychology, security, etc.


Ethiopia is the leading producer of chickpea in Africa and among the top ten producers of chickpea in the world. Debre Zeit Agriculture Research Center is a research center in Ethiopia that is mandated with the improvement of chickpea and other crops. Genome-enabled prediction technologies are trying to transform the classification of chickpea types and upgrade the existing identification paradigm. The identification of chickpea types in Ethiopia is still done manually. Domain experts try to recognize every chickpea type, and the manner and efficiency of identifying each type depend mainly on the skills and experience of the experts, which frequently causes errors and inaccuracy. Most research on crop classification and identification has been done outside Ethiopia; for local and emerging varieties, there is a need to design a classification model that assists the selection mechanisms of chickpea, and even the accuracy of an existing algorithm should be verified and optimized. The main aim of this study is to design a chickpea type classification model using machine learning algorithms. This research work used a total of 8303 records with 8 features, with 80% used for training and 20% for testing. Data preprocessing was done to prepare the dataset for experiments. ANN, SVM and DT were used to build the model. To evaluate the performance of the model, a confusion matrix with accuracy, recall and precision was used. The experimental results show that the best-performing algorithm was the decision tree, which achieved 97.5% accuracy. The results of this research work benefit agriculture research centers and companies. The chickpea type classification model will be applied in Debre Zeit Agriculture Research Center in Ethiopia as a base to support the experts during the chickpea type identification process. In addition, it enables the experts to save time, effort and cost with the support of the identification model. Moreover, this research can also be used as a cornerstone in the area and will be referred to by future researchers in the domain.
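The evaluation measures named in this abstract (accuracy, recall and precision derived from a confusion matrix) can be sketched as follows. For brevity this assumes a binary problem with one designated positive class, whereas the chickpea task presumably has several classes; the function name and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and derived metrics for one positive class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels for a held-out test split (not data from the study).
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
```

For a multi-class problem, the same counts would typically be computed once per class and then averaged.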


2020 ◽  
Vol 9 (2) ◽  
pp. 1049-1054

In this paper, we have tried to predict flight delays using different machine learning and deep learning techniques. Such a model makes it easier to predict whether a flight will be delayed or not. Factors like ‘WeatherDelay’, ‘NASDelay’, ‘Destination’, and ‘Origin’ play a vital role in this model. Using machine learning algorithms like Random Forest, Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), the f1-score, precision, recall, support and accuracy have been computed. To add to the model, a Long Short-Term Memory (LSTM) RNN architecture has also been employed. In the paper, the dataset from the Bureau of Transportation Statistics (BTS) for ‘Pittsburgh’ is used. The results computed from the above-mentioned algorithms have been compared. Further, the results were visualized for various airlines to find the maximum delay, and the AUC-ROC curve has been plotted for the Random Forest algorithm. The aim of our research work is to predict the delay so as to minimize losses and increase customer satisfaction.


2021 ◽  
Vol 7 ◽  
pp. e437
Author(s):  
Arushi Agarwal ◽  
Purushottam Sharma ◽  
Mohammed Alshehri ◽  
Ahmed A. Mohamed ◽  
Osama Alfarraj

In today’s cyber world, the demand for the internet is increasing day by day, increasing the concern for network security. The aim of an Intrusion Detection System (IDS) is to provide approaches against many fast-growing network attacks (e.g., DDoS attack, Ransomware attack, Botnet attack, etc.), as it blocks the harmful activities occurring in the network system. In this work, three classification machine learning algorithms, Naïve Bayes (NB), Support Vector Machine (SVM), and K-nearest neighbor (KNN), were used on the UNSW-NB15 dataset to assess accuracy and reduce processing time, and to find the best-suited algorithm that can efficiently learn the pattern of suspicious network activities. The data gathered from the feature set comparison was then applied as input to the IDS as data feeds to train the system for future intrusion behavior prediction and analysis, using the best-fit algorithm chosen from the above three based on the performance metrics found. Also, the classification reports (Precision, Recall, and F1-score) and confusion matrix were generated and compared to finalize the support-validation status found throughout the testing phase of the model used in this approach.


2021 ◽  
pp. 1-28
Author(s):  
Aakanksha Sharaff ◽  
Ramya Allenki ◽  
Rakhi Seth

Sentiment analysis works on the principle of categorizing and identifying the text-based content and the process of classifying documents into one of the predefined classes commonly known as text classification. Hackers deploy a strategy by sending malicious content as an advertisement link and attack the user system to gain information. For protecting the system from this type of phishing attack, one needs to classify the spam data. This chapter is based on a discussion and comparison of various classification models that are used for phishing SMS detection through sentiment analysis. In this chapter, SMS data is collected from Kaggle, which is classified as ham or spam; while implementing the deep learning techniques like Convolutional Neural Network (CNN), CNN with 7 layers, and CNN with 11 layers, different results are generated. For evaluating these results, different machine learning techniques are used as a baseline algorithm like Naive Bayes, Decision Trees, Support Vector Machine (SVM), and Artificial Neural Network (ANN). After evaluation, CNN showed the highest accuracy of 99.47% as a classification model.


Author(s):  
Ayesha Rafique ◽  
Kamran Malik ◽  
Zubair Nawaz ◽  
Faisal Bukhari ◽  
Akhtar Hussain Jalbani

The majority of online comments/opinions are written in free-text format. Sentiment analysis can be used as a measure to express the polarity (positive/negative) of comments/opinions. These comments/opinions can be in different languages, e.g. English, Urdu, Roman Urdu, Hindi, Arabic etc. Most work has focused on sentiment analysis of the English language. Very limited research work has been done in Urdu or Roman Urdu, even though Hindi/Urdu is the third largest language in the world. In this paper, we focus on the sentiment analysis of comments/opinions in Roman Urdu. There is no publicly available Roman Urdu public opinion dataset. We prepare a dataset by taking comments/opinions of people in Roman Urdu from different websites. Three supervised machine learning algorithms, namely NB (Naive Bayes), LRSGD (Logistic Regression with Stochastic Gradient Descent) and SVM (Support Vector Machine), have been applied to this dataset. From the experimental results, it can be concluded that SVM performs better than NB and LRSGD in terms of accuracy. With SVM, an accuracy of 87.22% is achieved.


Author(s):  
Asha. J ◽  
Meenakowshalya. A

Detection of fake news is difficult due to limited publicly available resources (datasets). Fake news is false information presented in news, stories, blogs, and so on. Fake news spreads easily and damages the reputation of a person or an organisation; therefore, detection of fake news is important. This project work detects fake news using unsupervised and deep learning algorithms. For the unsupervised learning method, One-Class SVM (Support Vector Machine) is implemented, and for the deep learning method, a hybrid CNN-RNN is implemented. Experimental results with the NEWS dataset showed an accuracy of 58% for One-Class SVM and 96.4% for the hybrid CNN-RNN. The proposed method performs better in terms of application performance compared to existing machine learning algorithms. This project can be further extended by exploiting high-dimensional datasets in future.


2020 ◽  
Vol 23 (4) ◽  
pp. 274-284 ◽  
Author(s):  
Jingang Che ◽  
Lei Chen ◽  
Zi-Han Guo ◽  
Shuaiqun Wang ◽  
Aorigele

Background: Identification of drug-target interaction is essential in drug discovery. It is beneficial to predict unexpected therapeutic or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are prompt and low-cost compared with traditional wet experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins. A multi-label classification model was presented to assign drugs into correct target groups. To make full use of the known drug properties, five networks were constructed, each of which represented drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from above-mentioned networks, based on which several machine learning algorithms, including RAndom k-labELsets (RAKEL) algorithm, Label Powerset (LP) algorithm and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded the accuracy of 0.839, exact match of 0.816 and hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the network model with multiple networks was found to be superior to the one with a single network and classic model, indicating the superiority of the proposed model.
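Two of the multi-label measures reported in this abstract, exact match and hamming loss, can be sketched as follows. The definitions used are the common ones (fraction of samples whose predicted label set is entirely correct, and fraction of individual label slots predicted wrongly); the abstract does not spell out its formulas, so treat these as assumptions. Label sets are represented as Python sets of hypothetical group indices.

```python
def exact_match(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true set."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if set(t) == set(p))
    return hits / len(y_true)


def hamming_loss(y_true, y_pred, n_labels):
    """Fraction of (sample, label) slots that are predicted incorrectly.

    A slot is wrong if the label is in exactly one of the two sets,
    i.e. it belongs to their symmetric difference.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(len(set(t) ^ set(p)) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


# Hypothetical example: 3 drugs, 4 possible target groups (indices 0..3).
true_groups = [{0, 1}, {2}, {1, 3}]
pred_groups = [{0, 1}, {2, 3}, {1}]
em = exact_match(true_groups, pred_groups)
hl = hamming_loss(true_groups, pred_groups, n_labels=4)
```

Exact match is strict (one wrong label fails the whole sample), while hamming loss gives partial credit per label, which is why the two are usually reported together.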

