A Stylistic Features Based Approach for Author Profiling

Author(s):  
Karunakar Kavuri ◽  
M. Kavitha
2020 ◽  
Vol 24 (3) ◽  
Author(s):  
Miguel Á. Álvarez Carmona ◽  
Esaú Villatoro Tello ◽  
Manuel Montes y Gómez ◽  
Luis Villaseñor Pineda

2021 ◽  
Author(s):  
Shloak Rathod

The proliferation of online media allows for the rapid dissemination of unmoderated news, unfortunately including fake news. The extensive spread of fake news poses a potent threat to both individuals and society. This paper focuses on designing author profiles to detect authors who are primarily engaged in publishing fake news articles. We build on the hypothesis that authors who write fake news repeatedly write only fake news articles, at least in short-term periods. Fake news authors have a distinct writing style compared to real news authors, who naturally want to maintain trustworthiness. We explore the potential to detect fake news authors by designing authors’ profiles based on writing style, sentiment, and co-authorship patterns. We evaluate our approach using a publicly available dataset with over 5000 authors and 20000 articles. For our evaluation, we build and compare different classes of supervised machine learning models. We find that the K-NN model performed the best, and it could detect authors who are prone to writing fake news with an 83% true positive rate with only a 5% false positive rate.
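To make the two reported metrics concrete, the following minimal sketch shows how a true positive rate and a false positive rate are computed from binary labels. The function name `rates` and the toy labels are assumptions for illustration only; they are not taken from the paper or its dataset.

```python
def rates(y_true, y_pred):
    """Return (true_positive_rate, false_positive_rate) for binary labels.

    Label 1 stands for "fake news author", label 0 for "real news author".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake authors flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake authors missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real authors flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real authors cleared
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy labels (hypothetical, not the paper's data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
tpr, fpr = rates(y_true, y_pred)
```

Read this way, an 83% true positive rate means 83 of every 100 fake news authors are flagged, and a 5% false positive rate means 5 of every 100 real news authors are flagged by mistake.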


Author(s):  
Ch. Swathi ◽  
K. Karunakar ◽  
G. Archana ◽  
T. Raghunadha Reddy

Author(s):  
Michael P. Oakes

Author profiling is the analysis of people’s writing in an attempt to find out which classes they belong to, such as gender, age group or native language. Many of the techniques for author profiling are derived from the related task of author identification, so we will look at this topic first. Author identification is the task of finding out who is most likely to have written a disputed document, and there are a number of computational approaches to this. The three main subtasks are the compilation of corpora of texts known to be written by the candidate authors, the selection of linguistic features to represent those texts, and the use of statistics to find the features that are most indicative of a particular author’s writing style. Plagiarism is the unacknowledged use of another author’s original work, and we will look at software for its detection. The chapter will cover the types of text obfuscation strategies used by plagiarists, commercial plagiarism detection software and its shortcomings, and recent research systems. Strategies have been developed for both external plagiarism detection (where the original source is searched for in a large document collection) and intrinsic plagiarism detection (where the source text is not available, necessitating a search for inconsistencies within the suspicious document). The specific problems of plagiarism by translation of an original in another language, and the unauthorized copying of sections of computer code, are described. Evaluation forums and publicly available test data sets are covered for each of the main topics of this chapter.
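The "selection of linguistic features" step can be illustrated with a small sketch. The feature set below (average word length, type-token ratio, average sentence length, and relative frequencies of a few function words) is an assumption chosen because such measures are common in stylometry; the chapter does not prescribe this particular list, and the names `FUNCTION_WORDS` and `stylistic_features` are hypothetical.

```python
import re

# Assumed, very short function-word list; real studies use much longer ones.
FUNCTION_WORDS = ("the", "of", "and", "to", "in")


def stylistic_features(text):
    """Map a text to a dictionary of simple stylistic measurements."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words)
    if n == 0 or not sentences:
        return {}
    features = {
        "avg_word_length": sum(len(w) for w in words) / n,
        "type_token_ratio": len(set(words)) / n,  # vocabulary richness
        "avg_sentence_length": n / len(sentences),  # words per sentence
    }
    for fw in FUNCTION_WORDS:
        features["freq_" + fw] = words.count(fw) / n
    return features


profile = stylistic_features("The cat sat. The dog ran to the cat.")
```

Vectors of this kind, computed for texts of known authorship, are what a statistical method or classifier would then compare against the vector of a disputed document.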


2016 ◽  
Vol 2016 ◽  
pp. 1-13 ◽  
Author(s):  
Helena Gómez-Adorno ◽  
Ilia Markov ◽  
Grigori Sidorov ◽  
Juan-Pablo Posadas-Durán ◽  
Miguel A. Sanchez-Perez ◽  
...  

We introduce a lexical resource for preprocessing social media data. We show that a neural network-based feature representation is enhanced by using this resource. We conducted experiments on the PAN 2015 and PAN 2016 author profiling corpora and obtained better results when performing the data preprocessing using the developed lexical resource. The resource includes dictionaries of slang words, contractions, abbreviations, and emoticons commonly used in social media. Each of the dictionaries was built for the English, Spanish, Dutch, and Italian languages. The resource is freely available.
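A rough sketch of dictionary-based preprocessing of this kind is given below. The tiny dictionaries, the placeholder tokens such as `<smile>`, and the function name `normalize` are all hypothetical stand-ins; the actual resource, its entries, and its file format are not described in the abstract.

```python
# Hypothetical miniature dictionaries standing in for the real resource.
SLANG = {"u": "you", "gr8": "great"}
CONTRACTIONS = {"can't": "cannot", "i'm": "i am"}
ABBREVIATIONS = {"btw": "by the way"}
EMOTICONS = {":)": "<smile>", ":(": "<sad>"}

# Later dictionaries win on key clashes; there are none in this toy example.
LOOKUP = {**SLANG, **CONTRACTIONS, **ABBREVIATIONS, **EMOTICONS}


def normalize(text):
    """Lowercase the text and replace whitespace-separated tokens
    that appear in any of the dictionaries."""
    tokens = text.lower().split()
    return " ".join(LOOKUP.get(tok, tok) for tok in tokens)


cleaned = normalize("I'm sure u can't lose :)")
```

Normalizing social media text in this way before feature extraction reduces spelling variation, which is one plausible reason a learned representation could benefit from it.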

