The coming age of adversarial social bot detection

First Monday ◽  
2021 ◽  
Author(s):  
Stefano Cresci ◽  
Marinella Petrocchi ◽  
Angelo Spognardi ◽  
Stefano Tognazzi

Social bots are automated accounts often involved in unethical or illegal activities. Academic research has shown how these accounts evolve over time, becoming increasingly skilled at hiding their true nature by disguising themselves as genuine accounts. Whenever bots evade detection, bot hunters adapt their solutions to find them again: a cat-and-mouse game. Inspired by adversarial machine learning and computer security, we propose an adversarial and proactive approach to social bot detection, and we call scholars to arms, to shed light on this open and intriguing field of study.

2020 ◽  
Vol 62 (5-6) ◽  
pp. 279-286
Author(s):  
Christian Wressnegger

Abstract Detecting and fending off attacks on computer systems is an enduring problem in computer security. In light of a plethora of different threats and the growing automation used by attackers, we are in urgent need of more advanced methods for attack detection. Manually crafting detection rules is by no means feasible at scale, and automatically generated signatures often lack context, such that they fall short in detecting slight variations of known threats. In the thesis “Efficient Machine Learning for Attack Detection” [35], we address the necessity of advanced attack detection. For the effective application of machine learning in this domain, periodic retraining over time is crucial. We show that with the right data representation, efficient algorithms for mining substring statistics, and implementations based on probabilistic data structures, training the underlying model for establishing a higher degree of automation for defenses can be achieved in linear time.
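As an illustration of how substring statistics can be gathered with a probabilistic data structure in a single pass, the sketch below counts byte n-grams with a count-min sketch. This is an assumption made for illustration only: the abstract does not say which data structure or n-gram length the thesis uses, and the names `CountMinSketch` and `count_ngrams`, the table sizes, and the sample payload are all hypothetical.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _slots(self, item: bytes):
        # One salted hash per row, so each row places the item independently.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item: bytes) -> None:
        for row, col in self._slots(item):
            self.rows[row][col] += 1

    def estimate(self, item: bytes) -> int:
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.rows[row][col] for row, col in self._slots(item))


def count_ngrams(payload: bytes, n: int = 3) -> CountMinSketch:
    """Single left-to-right pass over the payload: one update per n-gram."""
    sketch = CountMinSketch()
    for i in range(len(payload) - n + 1):
        sketch.add(payload[i:i + n])
    return sketch


# Hypothetical payload; "GET" occurs twice, so the estimate is at least 2.
sketch = count_ngrams(b"GET /index.php?id=1 GET /index.php?id=2")
print(sketch.estimate(b"GET"))
```

For a fixed n and a fixed number of hash rows, each n-gram costs a constant amount of work, so the pass is linear in the payload length, and memory stays bounded regardless of how many distinct substrings appear.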


By the late second century, early Christian gospels had been divided into two groups by a canonical boundary that assigned normative status to four of them while consigning their competitors to the margins. The project of this volume is to find ways to reconnect these divided texts. The primary aim is not to address the question whether the canonical/non-canonical distinction reflects substantive and objectively verifiable differences between the two bodies of texts—although that issue may arise at various points. Starting from the assumption that, in spite of their differences, all early gospels express a common belief in the absolute significance of Jesus and his earthly career, the intention is to make their interconnectedness fruitful for interpretation. The approach taken is thematic and comparative: a selected theme or topic is traced across two or more gospels on either side of the canonical boundary, and the resulting convergences and divergences shed light not least on the canonical texts themselves as they are read from new and unfamiliar vantage points. The outcome is to demonstrate that early gospel literature can be regarded as a single field of study, in contrast to the overwhelming predominance of the canonical four characteristic of traditional gospels scholarship.


2021 ◽  
Vol 5 (1) ◽  
pp. 5
Author(s):  
Ninghan Chen ◽  
Zhiqiang Zhong ◽  
Jun Pang

The outbreak of COVID-19 led to a burst of information in major online social networks (OSNs). In this constantly changing situation, OSNs have become an essential platform for people to express opinions and seek up-to-the-minute information. Thus, discussions on OSNs may become a reflection of reality. This paper aims to determine how Twitter users in the Greater Region (GR) and related countries react differently over time, through a data-driven exploratory study of COVID-19 information using machine learning and representation learning methods. We find that tweet volume and COVID-19 cases in GR and related countries are correlated, but this correlation only exists in a particular period of the pandemic. Moreover, we plot how topics change in each country and region from 22 January 2020 to 5 June 2020, identifying the main differences between GR and related countries.


2020 ◽  
Vol 46 (Supplement_1) ◽  
pp. S290-S291
Author(s):  
Johannes Lieslehto ◽  
Erika Jääskeläinen ◽  
Jouko Miettunen ◽  
Matti Isohanni ◽  
Dominic Dwyer ◽  
...  

Abstract Background Previous machine learning studies using structural MRI (sMRI) have been able to separate schizophrenia from controls with relatively high (about 80%) sensitivity and specificity (Kambeitz et al. Neuropsychopharmacology 2015). Interestingly, prediction accuracy in first-episode psychosis is lower compared to older and probably more chronic patients. One possibility is that the neurodiagnostic fingerprints (NF) derived from the schizophrenia vs. controls classifier become more visible over time in schizophrenia due to the progressive nature of the disorder. Methods Using the Cobre sample (70 schizophrenia and 74 controls), we trained a support vector machine (SVM) to differentiate schizophrenia from controls using sMRI. Next, we utilized the Northern Finland Birth Cohort 1966 (NFBC 1966) sample of 29 schizophrenia and 61 non-psychotic controls who participated in the nine-year follow-up. We applied the Cobre-trained SVM models at the baseline (participants 34 years old) and the follow-up (participants 43 years old) using out-of-sample cross-validation without any in-between retraining. Two independent schizophrenia datasets (the Neuromorphometry by Computer Algorithm Chicago [NMorphCH] and the Consortium for Neuropsychiatric Phenomics [CNP]) were utilized for replication analyses of the SVM generalizability. To address the possibility that the NF mainly capture some general psychopathology, we tested whether the NF generalize to depression using two independent MDD samples from Munich and Münster, Germany. Results Using the Cobre-trained SVM models for schizophrenia vs. controls differentiation in the NFBC 1966, we found balanced accuracy (BAC, i.e. the mean of sensitivity and specificity) of 72.8% (sensitivity=58.6%, specificity=86.9%) at the baseline and BAC of 79.7% (sensitivity=75.9%, specificity=83.6%) at the follow-up. 
In the NFBC 1966 schizophrenia patients, we found that SVM decision scores varied as a function of timepoint in the direction of more schizophrenia-likeness at the follow-up (paired t-test, Cohen’s d=0.58, P=0.004). The same was not true in controls (Cohen’s d=0.09, P=0.49). The SVM decision score difference*timepoint interaction related to the decrease of hippocampus and medial prefrontal cortex. The SVM models’ performance was also validated in the two replication samples (BAC of 77.5% in the CNP and BAC of 69.1% in the NMorphCH). In the NFBC 1966 the strongest clinical variable correlating with the trajectory of SVM decision scores over the follow-up was poor performance in the California Verbal Learning Test. This finding was also replicated in the CNP dataset. Further, in the NFBC 1966, those schizophrenia patients with a low degree of SVM decision scores had a higher probability of being in remission, being able to work, and being without antipsychotic medication at the follow-up. The generalization of the SVM models to MDD was worse compared to schizophrenia classification (DeLong’s tests for the two ROC curves: P<0.001). Discussion The degree of schizophrenia-related neurodiagnostic fingerprints appears to magnify over time in schizophrenia. By contrast, the discernibility of these fingerprints in controls does not change over time. This indicates that the NF capture some schizophrenia-related progressive neural changes, and not, e.g., normal aging-related brain volume loss. The fingerprints were also generalizable to other schizophrenia samples. Further, the fingerprints seem to have some disorder specificity as the SVM models do not generalize to depression. Lastly, it appears that a low degree of schizophrenia-related NF in schizophrenia might possess some value in predicting patients’ future remission and recovery-related factors.
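The balanced accuracy figures in this abstract follow directly from its definition as the mean of sensitivity and specificity. A minimal sketch, recomputing BAC from the rounded percentages reported above (the function name `balanced_accuracy` is ours, not the authors'):

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Balanced accuracy: the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


# Percentages as reported for the NFBC 1966 sample.
baseline = balanced_accuracy(58.6, 86.9)
follow_up = balanced_accuracy(75.9, 83.6)

print(f"baseline BAC  ~ {baseline:.2f}%")
print(f"follow-up BAC ~ {follow_up:.2f}%")
```

Because the inputs are already rounded to one decimal place, the recomputed values can differ from the reported 72.8% and 79.7% in the last digit; they agree to within 0.1 percentage points.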


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 179.2-179
Author(s):  
G. Robinson ◽  
J. Peng ◽  
P. Dönnes ◽  
L. Coelewij ◽  
M. Naja ◽  
...  

Background: Juvenile-onset systemic lupus erythematosus (JSLE) is a complex and heterogeneous disease characterised by diagnosis and treatment delays. An unmet need exists to better characterise the immunological profile of JSLE patients and investigate its links with the disease trajectory over time. Objectives: A machine learning (ML) approach was applied to explore new diagnostic signatures for JSLE based on immune-phenotyping data and stratify patients by specific immune characteristics to investigate longitudinal clinical outcome. Methods: Immune-phenotyping of 28 T-cell, B-cell and myeloid-cell subsets in 67 age- and sex-matched JSLE patients and 39 healthy controls (HCs) was performed by flow cytometry. A balanced random forest (BRF) ML predictive model was developed (10,000 decision trees). 10-fold cross validation, Sparse Partial Least Squares-Discriminant Analysis (sPLS-DA) and logistic regression were used to validate the model. Longitudinal clinical data were related to the immunological features identified by ML analysis. Results: The BRF model discriminated JSLE patients from healthy controls with 91% prediction accuracy, suggesting that JSLE patients could be distinguished from HCs with high confidence using immunological parameters. The top-ranked immunological features from the BRF model were confirmed using sPLS-DA and logistic regression and included CD19+ unswitched memory B-cells, naïve B-cells, CD14+ monocytes and total CD4+, CD8+ and memory T-cell subsets. K-means clustering was applied to stratify patients using the validated signature. Four groups were identified, each with a distinct immune and clinical profile. Notably, CD8+ T-cell subsets were important in driving patient stratification while B-cell markers were similarly expressed across the JSLE cohort. 
JSLE patients with elevated effector memory CD8+ T-cell frequencies had more persistently active disease over time, and this was associated with increased treatment burden and prevalence of lupus nephritis. Finally, network analysis identified specific clinical features associated with each of the top JSLE immune-signature variables. Conclusion: Using a combined ML approach, a distinct immune signature was identified that discriminated between JSLE patients and HCs and further stratified patients. This signature could have diagnostic and therapeutic implications. Further immunological association studies are warranted to develop data-driven personalised medicine approaches for JSLE. Acknowledgments: Lupus UK, Rosetrees Trust, Versus Arthritis. Disclosure of Interests: George Robinson: None declared, Junjie Peng: None declared, Pierre Dönnes: None declared, Leda Coelewij: None declared, Meena Naja: None declared, Anna Radziszewska: None declared, Chris Wincup: None declared, Hannah Peckham: None declared, David Isenberg Consultant of: Study Investigator and Consultant to Genentech, Yiannis Ioannou: None declared, Ines Pineda Torra: None declared, Coziana Ciurtin Grant/research support from: Pfizer, Consultant of: Roche, Modern Biosciences, Elizabeth Jury: None declared
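The "balanced" part of a balanced random forest is commonly implemented by giving each tree a bootstrap sample with equal numbers of cases from each class. The sketch below shows only that sampling step for the cohort sizes stated in the abstract (67 JSLE, 39 HCs). It is an assumption about one common balancing scheme, not a description of the authors' pipeline; `balanced_bootstrap` and the seed are hypothetical.

```python
import random


def balanced_bootstrap(labels, rng):
    """Return sample indices with equal draws (with replacement) per class."""
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    # Every class is sampled down to the size of the smallest class.
    n_minority = min(len(indices) for indices in by_class.values())

    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=n_minority))
    return sample


# Cohort sizes from the abstract; the labels carry no patient data.
labels = ["JSLE"] * 67 + ["HC"] * 39
sample = balanced_bootstrap(labels, random.Random(0))

counts = {
    label: sum(labels[i] == label for i in sample) for label in set(labels)
}
print(counts)
```

Each tree in the forest would draw its own sample of this kind, so the majority class cannot dominate any individual tree even though it dominates the cohort.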


2021 ◽  
Vol 41 (2) ◽  
pp. 99-181
Author(s):  
Ann Blair ◽  
Maryam Patton

Abstract We study the paratexts in Erasmus’ imprints with Johann and then Hieronymus Froben of Basel between 1514 and 1536. From Valentina Sebastiani’s bibliography of Johann Froben we observe that Erasmus was a more abundant paratexter than other authors who published with Johann Froben. We supplement that work with a bibliography of Erasmus’ imprints with Hieronymus Froben. We note trends across the Erasmus-Froben corpus, including: a remarkable number of imprints, equally balanced between new editions and re-editions; abundant dedications without correlation to format; indexes, especially in folio volumes; and growing attention to errata lists over time. These patterns shed light on one author-printer partnership but also on more general trends in learned publishing in the early 16th century.


2020 ◽  
Author(s):  
Murad Megjhani ◽  
Kalijah Terilli ◽  
Ayham Alkhachroum ◽  
David J. Roh ◽  
Sachin Agarwal ◽  
...  

Abstract Objective To develop a machine learning based tool, using routine vital signs, to assess delayed cerebral ischemia (DCI) risk over time. Methods In this retrospective analysis, physiologic data for 540 consecutive acute subarachnoid hemorrhage patients were collected and annotated as part of a prospective observational cohort study between May 2006 and December 2014. Patients were excluded if (i) no physiologic data was available, (ii) they expired prior to the DCI onset window (< post bleed day 3) or (iii) early angiographic vasospasm was detected on admitting angiogram. DCI was prospectively labeled by consensus of treating physicians. Occurrence of DCI was classified using various machine learning approaches including logistic regression, random forest, support vector machine (linear and kernel), and an ensemble classifier, trained on vitals and subject characteristic features. Hourly risk scores were generated as the posterior probability at time t. We performed five-fold nested cross validation to tune the model parameters and to report the accuracy. All classifiers were evaluated for good discrimination using the area under the receiver operating characteristic curve (AU-ROC) and confusion matrices. Results Of 310 patients included in our final analysis, 101 (32.6%) patients developed DCI. We achieved maximal classification of 0.81 [0.75-0.82] AU-ROC. We also predicted 74.7% of all DCI events 12 hours before typical clinical detection with a ratio of 3 true alerts for every 2 false alerts. Conclusion A data-driven machine learning based detection tool offered hourly assessments of DCI risk and incorporated new physiologic information over time.
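Two of the reported figures can be checked with simple arithmetic. The sketch below recomputes the DCI incidence from the stated counts and converts the ratio of 3 true alerts to 2 false alerts into the share of alerts that were true. Reading that share as alert-level precision is our interpretation, not a term used in the abstract, and the function name is hypothetical.

```python
def precision_from_alert_ratio(true_alerts: int, false_alerts: int) -> float:
    """Fraction of all raised alerts that were true alerts."""
    return true_alerts / (true_alerts + false_alerts)


# Counts as reported: 101 of 310 analysed patients developed DCI.
dci_rate = 101 / 310

# "3 true alerts for every 2 false alerts"
precision = precision_from_alert_ratio(3, 2)

print(f"DCI incidence: {dci_rate:.1%}")
print(f"share of alerts that were true: {precision:.0%}")
```

Under this reading, roughly three in five alerts corresponded to a real DCI event, while the separately reported 74.7% describes how many DCI events were flagged 12 hours ahead.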


2020 ◽  
Vol 2 (4) ◽  
pp. 554-568
Author(s):  
Chris Graf ◽  
Dave Flanagan ◽  
Lisa Wylie ◽  
Deirdre Silver

Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge now is to use what we have learned to present researchers with relevant and easy options that help them to share and make an impact with new research data.

