Autonomic Author Identification in Internet Relay Chat (IRC)

How to classify short texts effectively remains an important question in computational stylometry. This study presents the results of an experiment involving authorship attribution of ancient Greek texts. These texts were chosen to explore the effectiveness of digital methods as a supplement to the author’s work on text classification based on traditional stylometry. Here it is crucial to avoid confounding effects of shared topic, etc. Therefore, this study attempts to identify authorship using only morpho-syntactic data without regard to specific vocabulary items. The data are taken from the dependency annotations published in the Ancient Greek and Latin Dependency Treebank. The independent variables for classification are combinations generated from the dependency label and the morphology of each word in the corpus and its dependency parent. To avoid the effects of the combinatorial explosion, only the most frequent combinations are retained as input features. The authorship classification (with thirteen classes) is done with standard algorithms: logistic regression and support vector classification. During classification, the corpus is partitioned into increasingly smaller ‘texts’. To explore and control for the possible confounding effects of, e.g., different genre and annotator, three corpora were tested: a mixed corpus of several genres of both prose and verse, a corpus of prose including oratory, history, and essay, and a corpus restricted to narrative history. Results are surprisingly good as compared to those previously published. Accuracy for fifty-word inputs is 84.2–89.6%. Thus, this approach may prove an important addition to the prevailing methods for small text classification.
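
The feature construction described in this abstract can be sketched in a few lines. This is a minimal illustration, not the study's code: the `Token` record, the tag strings, and the helper names `combos`, `top_features`, and `vectorize` are assumptions made for the example, and the classification step (logistic regression or support vector classification) is left out.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    """Hypothetical record for one annotated word."""
    relation: str  # dependency label, e.g. "SBJ"
    morph: str     # morphological tag (toy values here)
    head: int      # index of the parent token in the sentence, -1 for the root


def combos(sentence):
    """Yield (relation, own morphology, parent morphology) for each token."""
    for tok in sentence:
        parent_morph = sentence[tok.head].morph if tok.head >= 0 else "ROOT"
        yield (tok.relation, tok.morph, parent_morph)


def top_features(sentences, k):
    """Keep only the k most frequent combinations in the whole corpus."""
    counts = Counter(c for s in sentences for c in combos(s))
    return [c for c, _ in counts.most_common(k)]


def vectorize(sentences, features):
    """Relative frequency of each retained combination in one text chunk."""
    counts = Counter(c for s in sentences for c in combos(s))
    total = sum(counts.values()) or 1
    return [counts[f] / total for f in features]
```

Because no word forms enter the tuples, the resulting vectors carry no vocabulary or topic information, and capping the list at the k most frequent combinations keeps the feature space small.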

Author Identification of Micro-Messages via Multi-Channel Convolutional Neural Networks

2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC) ◽ 10.1109/smc42975.2020.9283214 ◽ 2020

Author(s): Sarp Aykent ◽ Gerry Dozier

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Author Identification

ICodeNet - A Hierarchical Neural Network Approach For Source Code Author Identification

2021 13th International Conference on Machine Learning and Computing ◽ 10.1145/3457682.3457709 ◽ 2021

Author(s): Pranali Bora ◽ Tulika Awalgaonkar ◽ Himanshu Palve ◽ Raviraj Joshi ◽ Purvi Goel

Keyword(s): Neural Network ◽ Source Code ◽ Network Approach ◽ Neural Network Approach ◽ Author Identification ◽ Hierarchical Neural Network

Disentangling Chat

Computational Linguistics ◽ 10.1162/coli_a_00003 ◽ 2010 ◽ Vol 36 (3) ◽ pp. 389-409 ◽ Cited By ~ 21

Author(s): Micha Elsner ◽ Eugene Charniak

Keyword(s): Internet Relay Chat ◽ Clustering Model ◽ Highly Correlated ◽ Graph Based Clustering

When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately. We refer to this task as disentanglement. We present a corpus of Internet Relay Chat dialogue in which the various conversations have been manually disentangled, and evaluate annotator reliability. We propose a graph-based clustering model for disentanglement, using lexical, timing, and discourse-based features. The model's predicted disentanglements are highly correlated with manual annotations. We conclude by discussing two extensions to the model, specificity tuning and conversation start detection, both of which are promising but do not currently yield practical improvements.
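
To make the idea of disentanglement concrete, here is a toy sketch that links each utterance to its best-scoring earlier utterance. It is not the authors' model: the `Utterance` record, the `link_score` function (word overlap damped by the time gap), and the `threshold` and `max_gap` values are all assumptions chosen for illustration, and discourse-based features are omitted.

```python
from typing import NamedTuple


class Utterance(NamedTuple):
    """Hypothetical record for one chat line."""
    time: float   # seconds since the start of the log
    speaker: str
    text: str


def link_score(a, b, max_gap=60.0):
    """Toy pairwise score: lexical overlap scaled by closeness in time."""
    gap = abs(b.time - a.time)
    if gap > max_gap:
        return 0.0
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    union = wa | wb
    overlap = len(wa & wb) / len(union) if union else 0.0
    return overlap * (1.0 - gap / max_gap)


def disentangle(utterances, threshold=0.1):
    """Give each utterance the thread label of its best-scoring predecessor,
    or open a new thread when no predecessor scores above the threshold."""
    labels = []
    for i, current in enumerate(utterances):
        best_j, best = -1, threshold
        for j in range(i):
            score = link_score(utterances[j], current)
            if score > best:
                best_j, best = j, score
        if best_j >= 0:
            labels.append(labels[best_j])
        else:
            labels.append(max(labels, default=-1) + 1)
    return labels
```

Calling `disentangle` on a list of `Utterance` records returns one integer thread label per utterance, in log order.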

Strangers in a Strange Land: Interaction Management on Internet Relay Chat

Human Communication Research ◽ 10.1111/j.1468-2958.1997.tb00408.x ◽ 1997 ◽ Vol 23 (4) ◽ pp. 507-534 ◽ Cited By ~ 59

Author(s): E. Sean Rintel ◽ Jeffery Pittam

Keyword(s): Internet Relay Chat ◽ Interaction Management

Electronic Chat: Social Issues on Internet Relay Chat

Media Information Australia ◽ 10.1177/1329878x9306700108 ◽ 1993 ◽ Vol 67 (1) ◽ pp. 62-70 ◽ Cited By ~ 15

Author(s): Elizabeth Reid

Keyword(s): Social Issues ◽ Internet Relay Chat

Author Identification on Literature in Different Languages: A Systematic Survey

2018 International Conference On Advances in Communication and Computing Technology (ICACCT) ◽ 10.1109/icacct.2018.8529635 ◽ 2018 ◽ Cited By ~ 1

Author(s): Kale Sunil Digamberrao ◽ Rajesh S. Prasad

Keyword(s): Systematic Survey ◽ Author Identification

Evaluation of the Performance and Efficiency of the Automated Linguistic Features for Author Identification in Short Text Messages Using Different Variable Selection Techniques

Studies in Media and Communication ◽ 10.11114/smc.v6i2.3892 ◽ 2018 ◽ Vol 6 (2) ◽ pp. 83

Author(s): Refat Aljumily

Keyword(s): Variable Selection ◽ Text Messages ◽ Sentence Length ◽ Function Word ◽ Linguistic Features ◽ Linguistic Feature ◽ Short Text ◽ Parts Of Speech ◽ Author Identification ◽ Type Frequency

This paper evaluates the discriminating power of automatically extracted linguistic features as style markers for author identification in short Facebook text messages. The corpus comprised 221 English Facebook texts (each about 2 to 3 lines, or 35-40 words), written in the same genre and on the same topic and posted in the same year group, totaling 7530 words. To build the evaluation dataset, frequency values were collected for 16 linguistic feature types: parts of speech, function words, word bigrams, character trigrams, average sentence length in words, average sentence length in characters, Yule’s K measure, Simpson’s D measure, average word length, FW/CW ratio, average characters, content-specific keywords, type/token ratio, total number of short words (fewer than four characters), contractions, and total number of characters in words. These were selected from five corpora, totaling 328 test features. The evaluation differs from other analyses in that it applied several variable selection methods: feature type frequency, variance, term frequency/inverse document frequency (TF.IDF), signal-noise ratio, and Poisson term distribution. The relationships between known and anonymous text messages were examined using hierarchical linear and non-hierarchical nonlinear clustering methods, taking into account nonlinear patterns in the data. With the TF.IDF technique, the anonymous text messages resembled the authors of the non-anonymous text messages in function word usage (efficiency = 60%) and parts-of-speech usage (efficiency = 50%). With the feature type frequency and variance techniques, no similarities were found for the other features, whose efficiency in the corpus was below 40%. Identification performance using parts-of-speech and function word frequencies with the TF.IDF technique improved as the length of the text messages increased (N ≥ 100). Thus, the performance and efficiency of syntactic features and function word usage in identifying anonymous authors or text messages improve with longer text messages when TF.IDF variable selection is used, but decrease when the feature type frequency and variance techniques are applied.
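
Three of the lexical measures named in this abstract (type/token ratio, Yule's K, and Simpson's D) can be computed from a token list as follows. The sketch uses the standard textbook definitions, which is an assumption: the abstract does not state which exact formulation the paper applied. The function name `lexical_measures` is hypothetical.

```python
from collections import Counter


def lexical_measures(tokens):
    """Return (type/token ratio, Yule's K, Simpson's D) for a token list.

    With N tokens and V_i = number of word types occurring exactly i times:
      TTR = (number of types) / N
      K   = 10^4 * (sum_i i^2 * V_i - N) / N^2
      D   = sum_i V_i * (i / N) * ((i - 1) / (N - 1))
    """
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens")
    freqs = Counter(tokens)             # word type -> occurrences
    spectrum = Counter(freqs.values())  # i -> V_i
    ttr = len(freqs) / n
    yule_k = 1e4 * (sum(i * i * v for i, v in spectrum.items()) - n) / (n * n)
    simpson_d = sum(v * (i / n) * ((i - 1) / (n - 1)) for i, v in spectrum.items())
    return ttr, yule_k, simpson_d
```

On messages of only 35-40 words all three values rest on very few repeated tokens, which is one reason to treat them with caution on short texts.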

Using Text Mining and Random Forests for Author Identification. The Case of Clarivate Web of Science Database

10.12948/ie2019.01.01 ◽ 2019

Author(s): Marin Fotache

Keyword(s): Text Mining ◽ Random Forests ◽ Web Of Science ◽ Author Identification

A robust authorship attribution on big period

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v9i4.pp3167-3174 ◽ 2019 ◽ Vol 9 (4) ◽ pp. 3167 ◽ Cited By ~ 1

Author(s): Mubin Shoukat Tamboli ◽ Rajesh Prasad

Keyword(s): Identification Problem ◽ Authorship Attribution ◽ Support Vector ◽ Writing Style ◽ Author Identification ◽ Time Period ◽ N Gram ◽ Corpus Selection ◽ Writing Sample ◽ Small Period

Authorship attribution is the task of identifying the writer of an unknown text by assigning it to a known writer. Each author's writing style is distinct and can be used for discrimination. Different parameters are responsible for rectifying such changes. When an author's writing samples all belong to a short period, they contribute effectively to identifying an unknown sample. This paper considers the author identification problem when writing samples are not available from the same time period, so that the evidence has been collected over a long period of time. Character n-gram, word n-gram, and PoS n-gram features are used to build the model, since they capture a writer's style in terms of both content and the statistical characteristics of the writing. A support vector machine algorithm was applied for classification, and the experiments produced effective results. When discriminating among multiple authors, corpus selection and construction was the most tedious task, and it was implemented effectively. Accuracy varied with feature type: word and character n-grams showed better accuracy than PoS n-grams.
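
The character and word n-gram features mentioned in this abstract can be extracted with a few lines of code. This is a minimal sketch under assumptions: the helper names `char_ngrams`, `word_ngrams`, and `profile` are hypothetical, whitespace tokenization stands in for whatever tokenizer the paper used, and both PoS n-grams (which need a tagger) and the support vector machine step are omitted.

```python
from collections import Counter


def char_ngrams(text, n):
    """All overlapping character n-grams of the text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """All overlapping word n-grams, using whitespace tokenization."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def profile(ngrams):
    """Map each n-gram to its relative frequency in the sample."""
    counts = Counter(ngrams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}
```

A relative-frequency profile like this, computed per writing sample, is the kind of vector a classifier could then be trained on.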
