Examination of fake news from a viral perspective: an interplay of emotions, resonance, and sentiments

2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Krishnadas Nanath ◽  
Supriya Kaitheri ◽  
Sonia Malik ◽  
Shahid Mustafa

Purpose The purpose of this paper is to examine the factors that significantly affect the prediction of fake news from the virality theory perspective. The paper looks at a mix of emotion-driven content, sentimental resonance, topic modeling and linguistic features of news articles to predict the probability of fake news. Design/methodology/approach A data set of over 12,000 articles was chosen to develop a model for fake news detection. Machine learning algorithms and natural language processing techniques were used to handle big data with efficiency. Lexicon-based emotion analysis provided eight kinds of emotions used in the article text. The cluster of topics was extracted using topic modeling (five topics), while sentiment analysis provided the resonance between the title and the text. Linguistic features were added to the coding outcomes to develop a logistic regression predictive model for testing the significant variables. Other machine learning algorithms were also executed and compared. Findings The results revealed that positive emotions in a text lower the probability of news being fake. It was also found that sensational content like illegal activities and crime-related content were associated with fake news. The news title and the text exhibiting similar sentiments were found to be having lower chances of being fake. News titles with more words and content with fewer words were found to impact fake news detection significantly. Practical implications Several systems and social media platforms today are trying to implement fake news detection methods to filter the content. This research provides exciting parameters from a viral theory perspective that could help develop automated fake news detectors. Originality/value While several studies have explored fake news detection, this study uses a new perspective on viral theory. It also introduces new parameters like sentimental resonance that could help predict fake news. This study deals with an extensive data set and uses advanced natural language processing to automate the coding techniques in developing the prediction model.

Author(s):  
Rashida Ali ◽  
Ibrahim Rampurawala ◽  
Mayuri Wandhe ◽  
Ruchika Shrikhande ◽  
Arpita Bhatkar

Internet provides a medium to connect with individuals of similar or different interests creating a hub. Since a huge hub participates on these platforms, the user can receive a high volume of messages from different individuals creating a chaos and unwanted messages. These messages sometimes contain a true information and sometimes false, which leads to a state of confusion in the minds of the users and leads to first step towards spam messaging. Spam messages means an irrelevant and unsolicited message sent by a known/unknown user which may lead to a sense of insecurity among users. In this paper, the different machine learning algorithms were trained and tested with natural language processing (NLP) to classify whether the messages are spam or ham.


IoT ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 218-239 ◽  
Author(s):  
Ravikumar Patel ◽  
Kalpdrum Passi

In the derived approach, an analysis is performed on Twitter data for World Cup soccer 2014 held in Brazil to detect the sentiment of the people throughout the world using machine learning techniques. By filtering and analyzing the data using natural language processing techniques, sentiment polarity was calculated based on the emotion words detected in the user tweets. The dataset is normalized to be used by machine learning algorithms and prepared using natural language processing techniques like word tokenization, stemming and lemmatization, part-of-speech (POS) tagger, name entity recognition (NER), and parser to extract emotions for the textual data from each tweet. This approach is implemented using Python programming language and Natural Language Toolkit (NLTK). A derived algorithm extracts emotional words using WordNet with its POS (part-of-speech) for the word in a sentence that has a meaning in the current context, and is assigned sentiment polarity using the SentiWordNet dictionary or using a lexicon-based method. The resultant polarity assigned is further analyzed using naïve Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17% whereas random forest gives the best area under the receiver operating characteristics curve (AUC) of 0.97.


Author(s):  
Anurag Langan

Grading student answers is a tedious and time-consuming task. A study had found that almost on average around 25% of a teacher's time is spent in scoring the answer sheets of students. This time could be utilized in much better ways if computer technology could be used to score answers. This system will aim to grade student answers using the various Natural Language processing techniques and Machine Learning algorithms available today.


JAMIA Open ◽  
2019 ◽  
Vol 2 (1) ◽  
pp. 139-149 ◽  
Author(s):  
Meijian Guan ◽  
Samuel Cho ◽  
Robin Petro ◽  
Wei Zhang ◽  
Boris Pasche ◽  
...  

Abstract Objectives Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who have undergone a clinical next generation sequencing (NGS) testing in Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN) namely, gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi) were applied to classify documents to the treatment-change and no-treatment-change groups. Further, we compared the performances of RNNs to 5 machine learning algorithms including Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random forest, and Logistic Regression. Results Our results suggested that, overall, RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embedding can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.


2010 ◽  
Vol 3 ◽  
pp. BII.S4706 ◽  
Author(s):  
John Pestian ◽  
Henry Nasrallah ◽  
Pawel Matykiewicz ◽  
Aurora Bennett ◽  
Antoon Leenaars

Suicide is the second leading cause of death among 25–34 year olds and the third leading cause of death among 15–25 year olds in the United States. In the Emergency Department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient's thoughts, as represented by suicide notes. We focus on developing methods of natural language processing that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data used are comprised of suicide notes from 33 suicide completers and matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide if a note was genuine or elicited. Their decisions were compared to nine different machine-learning algorithms. The results indicate that trainees accurately classified notes 49% of the time, mental health professionals accurately classified notes 63% of the time, and the best machine learning algorithm accurately classified the notes 78% of the time. This is an important step in developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicidal notes.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ivano Lauriola ◽  
Fabio Aiolli ◽  
Alberto Lavelli ◽  
Fabio Rinaldi

Abstract Background Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step of these applications is the choice of the representation which describes data. Several representations have been proposed in the literature, some of which are based on a strong knowledge of the domain, and they consist of features manually defined by domain experts. Usually, these representations describe the problem well, but they require a lot of human effort and annotated data. On the other hand, general-purpose representations like word-embeddings do not require human domain knowledge, but they could be too general for a specific task. Results This paper investigates methods to learn the best representation from data directly, by combining several knowledge-based representations and word embeddings. Two mechanisms have been considered to perform the combination, which are neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score. Conclusions Our experiments show that the principled combination of general, domain specific, word-, and character-level representations improves the performance of entity recognition. We also discussed the contribution of each representation in the final solution.


2020 ◽  
Vol 30 (2) ◽  
pp. 155-174
Author(s):  
Tim Hutchinson

Purpose This study aims to provide an overview of recent efforts relating to natural language processing (NLP) and machine learning applied to archival processing, particularly appraisal and sensitivity reviews, and propose functional requirements and workflow considerations for transitioning from experimental to operational use of these tools. Design/methodology/approach The paper has four main sections. 1) A short overview of the NLP and machine learning concepts referenced in the paper. 2) A review of the literature reporting on NLP and machine learning applied to archival processes. 3) An overview and commentary on key existing and developing tools that use NLP or machine learning techniques for archives. 4) This review and analysis will inform a discussion of functional requirements and workflow considerations for NLP and machine learning tools for archival processing. Findings Applications for processing e-mail have received the most attention so far, although most initiatives have been experimental or project based. It now seems feasible to branch out to develop more generalized tools for born-digital, unstructured records. Effective NLP and machine learning tools for archival processing should be usable, interoperable, flexible, iterative and configurable. Originality/value Most implementations of NLP for archives have been experimental or project based. The main exception that has moved into production is ePADD, which includes robust NLP features through its named entity recognition module. This paper takes a broader view, assessing the prospects and possible directions for integrating NLP tools and techniques into archival workflows.


Sign in / Sign up

Export Citation Format

Share Document