Efficient machine learning for attack detection

Detecting and fending off attacks on computer systems is an enduring problem in computer security. In light of a plethora of different threats and the growing automation used by attackers, we are in urgent need of more advanced methods for attack detection. Manually crafting detection rules is by no means feasible at scale, and automatically generated signatures often lack context, such that they fall short in detecting slight variations of known threats. In the thesis “Efficient Machine Learning for Attack Detection” [35], we address the necessity of advanced attack detection. For the effective application of machine learning in this domain, periodic retraining is crucial. We show that with the right data representation, efficient algorithms for mining substring statistics, and implementations based on probabilistic data structures, training the underlying model for establishing a higher degree of automation for defenses can be achieved in linear time.


The coming age of adversarial social bot detection

First Monday ◽

10.5210/fm.v26i7.11474 ◽

2021 ◽

Author(s):

Stefano Cresci ◽

Marinella Petrocchi ◽

Angelo Spognardi ◽

Stefano Tognazzi

Keyword(s):

Machine Learning ◽

Computer Security ◽

Field Of Study ◽

True Nature ◽

Proactive Approach ◽

Bot Detection ◽

Illegal Activities ◽

Shed Light ◽

Over Time

Social bots are automated accounts often involved in unethical or illegal activities. Academia has shown how these accounts evolve over time, becoming increasingly smart at hiding their true nature by disguising themselves as genuine accounts. When bots evade detection, bot hunters adapt their solutions to find them: a cat-and-mouse game. Inspired by adversarial machine learning and computer security, we propose an adversarial and proactive approach to social bot detection, and we call scholars to arms, to shed light on this open and intriguing field of study.


Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

Journal of Artificial Intelligence Research ◽

10.1613/jair.453 ◽

1998 ◽

Vol 8 ◽

pp. 67-91 ◽

Cited By ~ 93

Author(s):

A. Moore ◽

M. S. Lee

Keyword(s):

Machine Learning ◽

Data Structures ◽

Rule Learning ◽

Worst Case ◽

Sufficient Statistics ◽

Frequent Sets ◽

Efficient Machine ◽

Real World Datasets ◽

Selection Algorithms ◽

New Algorithms

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets.
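The counting task described in this abstract can be illustrated with a minimal sketch of the direct counting baseline that ADtree methods are compared against. This is not the ADtree itself: the function names `contingency_table` and `count_conjunctive` and the sample records are hypothetical, chosen only for illustration. The sketch does mirror one idea from the abstract, namely that combinations with a count of zero are never stored.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Record = Dict[str, str]


def contingency_table(records: Sequence[Record],
                      attrs: Tuple[str, ...]) -> Counter:
    """Direct counting: one pass over the records per table.

    Only value combinations that actually occur get an entry,
    so zero counts take no memory.
    """
    return Counter(tuple(rec[a] for a in attrs) for rec in records)


def count_conjunctive(records: Sequence[Record],
                      query: Dict[str, str]) -> int:
    """Count records that match every attribute=value pair in `query`."""
    return sum(
        all(rec[a] == v for a, v in query.items()) for rec in records
    )


# Hypothetical toy dataset (not taken from the paper).
records = [
    {"proto": "tcp", "flag": "syn", "label": "attack"},
    {"proto": "tcp", "flag": "ack", "label": "benign"},
    {"proto": "udp", "flag": "none", "label": "benign"},
    {"proto": "tcp", "flag": "syn", "label": "attack"},
]

table = contingency_table(records, ("proto", "label"))
syn_over_tcp = count_conjunctive(records, {"proto": "tcp", "flag": "syn"})
```

Each call here scans every record, so the cost grows with the dataset size. The abstract's claim is that a cached structure can, under certain assumptions, answer the same queries at a cost independent of the number of records.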


Password Similarity Using Probabilistic Data Structures

Journal of Cybersecurity and Privacy ◽

10.3390/jcp1010005 ◽

2020 ◽

Vol 1 (1) ◽

pp. 78-92

Author(s):

Davide Berardi ◽

Franco Callegati ◽

Andrea Melis ◽

Marco Prandini

Keyword(s):

Data Structures ◽

Bloom Filters ◽

Substantial Decrease ◽

Probabilistic Data ◽

Frequent Change ◽

Creative Methods ◽

Over Time

Passwords should be easy to remember, yet expiration policies mandate their frequent change. Caught in the crossfire between these conflicting requirements, users often adopt creative methods to perform slight variations over time. While easily fooling the most basic checks for similarity, these schemes lead to a substantial decrease in actual security, because leaked passwords, albeit expired, can be effectively exploited as seeds for crackers. This work describes an approach based on Bloom Filters to detect password similarity, which can be used to discourage password reuse habits. The proposed scheme intrinsically obfuscates the stored passwords to protect them in case of database leaks, and can be tuned to be resistant to common cryptanalytic techniques, making it suitable for usage on exposed systems.
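As a rough illustration of how a Bloom filter can support a similarity check without storing the password itself, the sketch below inserts the character bigrams of an old password into a filter and then measures what fraction of a candidate's bigrams the filter reports as present. The bigram decomposition, the filter size, the number of hash functions, and all names (`BloomFilter`, `filter_for`, `similarity`) are assumptions made for this example and are not taken from the paper; the authors' actual scheme and its hardening against cryptanalysis may differ substantially.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bigrams(text: str) -> Set[str]:
    """Set of adjacent character pairs in `text`."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def filter_for(password: str) -> BloomFilter:
    """Build the filter that would be stored instead of the password."""
    bf = BloomFilter()
    for gram in bigrams(password):
        bf.add(gram)
    return bf


def similarity(stored: BloomFilter, candidate: str) -> float:
    """Fraction of the candidate's bigrams that the stored filter contains."""
    grams = bigrams(candidate)
    if not grams:
        return 0.0
    return sum(gram in stored for gram in grams) / len(grams)


old = filter_for("Summer2020!")       # hypothetical expired password
score = similarity(old, "Summer2021!")  # slight variation of the old one
```

A policy check could reject a new password whose score exceeds a chosen threshold. Because Bloom filters admit false positives, the score is an upper-biased estimate of the true bigram overlap, and the threshold would need tuning against the filter parameters.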


Anomaly Detection in Market Data Structures Via Machine Learning Algorithms

SSRN Electronic Journal ◽

10.2139/ssrn.3516028 ◽

2020 ◽

Author(s):

Dirk Röder ◽

Henning Mueller

Keyword(s):

Machine Learning ◽

Anomaly Detection ◽

Data Structures ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Market Data


A Machine Learning Application for Raising WASH Awareness in the Times of COVID-19 Pandemic (Preprint)

10.2196/preprints.25320 ◽

2020 ◽

Cited By ~ 1

Author(s):

Rohan Pandey ◽

Vaibhav Gautam ◽

Ridam Pal ◽

Harsh Bandhey ◽

Lovedeep Singh Dhingra ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

User Feedback ◽

Who Guidelines ◽

The Times ◽

The Right ◽

Local Languages

BACKGROUND: The COVID-19 pandemic has uncovered the potential of digital misinformation in shaping the health of nations. The deluge of unverified information that spreads faster than the epidemic itself is an unprecedented phenomenon that has put millions of lives in danger. Mitigating this ‘Infodemic’ requires strong health messaging systems that are engaging, vernacular, scalable, and effective, and that continuously learn new patterns of misinformation. OBJECTIVE: We created WashKaro, a multi-pronged intervention for mitigating misinformation through conversational AI, machine translation and natural language processing. WashKaro provides the right information matched against WHO guidelines through AI, and delivers it in the right format in local languages. METHODS: We theorize (i) an NLP based AI engine that could continuously incorporate user feedback to improve relevance of information, (ii) bite sized audio in the local language to improve penetrance in a country with skewed gender literacy ratios, and (iii) conversational but interactive AI engagement with users towards an increased health awareness in the community. RESULTS: A total of 5026 people downloaded the app during the study window; among those, 1545 were active users. Our study shows that 3.4 times more females engaged with the App in Hindi as compared to males, the relevance of AI-filtered news content doubled within 45 days of continuous machine learning, and the prudence of the integrated AI chatbot “Satya” increased, thus proving the usefulness of an mHealth platform to mitigate health misinformation. CONCLUSIONS: We conclude that a multi-pronged machine learning application delivering vernacular bite-sized audios and conversational AI is an effective approach to mitigate health misinformation. CLINICALTRIAL: Not Applicable


An efficient machine learning model for malicious activities recognition in water‐based industrial internet of things

Security and Privacy ◽

10.1002/spy2.154 ◽

2021 ◽

Author(s):

Gamal E. I. Selim ◽

Ezz El‐Din Hemdan ◽

Ahmed M. Shehata ◽

Nawal A. El‐Fishawy

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Learning Model ◽

Industrial Internet Of Things ◽

Industrial Internet ◽

Machine Learning Model ◽

Water Based ◽

Efficient Machine


Analyzing the Interplay Between Random Shuffling and Storage Devices for Efficient Machine Learning

2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ◽

10.1109/ispass51385.2021.00050 ◽

2021 ◽

Author(s):

Zhi-Lin Ke ◽

Hsiang-Yun Cheng ◽

Chia-Lin Yang ◽

Han-Wei Huang

Keyword(s):

Machine Learning ◽

Storage Devices ◽

Efficient Machine ◽

And Storage


Cyber-attack detection in healthcare using cyber-physical system and machine learning techniques

Soft Computing ◽

10.1007/s00500-021-05926-8 ◽

2021 ◽

Author(s):

Ahmad Ali AlZubi ◽

Mohammed Al-Maitah ◽

Abdulaziz Alarifi

Keyword(s):

Machine Learning ◽

Physical System ◽

Attack Detection ◽

Machine Learning Techniques ◽

Cyber Physical System ◽

Cyber Attack ◽

Learning Techniques


An Efficient Machine Learning Framework for Stress Prediction via Sensor Integrated Keyboard Data

IEEE Access ◽

10.1109/access.2021.3094334 ◽

2021 ◽

pp. 1-1

Author(s):

P.B. Pankajavalli ◽

G.S. Karthick ◽

R. Sakthivel

Keyword(s):

Machine Learning ◽

Learning Framework ◽

Stress Prediction ◽

Efficient Machine


Improvement in prefrontal thalamic connectivity during the early course of the illness in recent-onset psychosis: a 12-month longitudinal follow-up resting-state fMRI study

Psychological Medicine ◽

10.1017/s0033291720004808 ◽

2020 ◽

pp. 1-9

Author(s):

Daniel Bergé ◽

Tyler A. Lesh ◽

Jason Smucny ◽

Cameron S. Carter

Keyword(s):

Resting State ◽

Resting State Fmri ◽

Functional Changes ◽

Recent Onset ◽

Fmri Study ◽

Clinical Meaning ◽

Mixed Pattern ◽

The Right ◽

Over Time

Background: Previous research in resting-state functional magnetic resonance imaging (rs-fMRI) has shown a mixed pattern of disrupted thalamocortical connectivity in psychosis. The clinical meaning of these findings and their stability over time remain unclear. We aimed to study thalamocortical connectivity longitudinally over a 1-year period in participants with recent-onset psychosis. Methods: To this end, 129 individuals with recent-onset psychosis and 87 controls were clinically evaluated and scanned using rs-fMRI. Among them, 43 patients and 40 controls were re-scanned and re-evaluated 12 months later. Functional connectivity between the thalamus and the rest of the brain was calculated using a seed-to-voxel approach, and then compared between groups and correlated with clinical features cross-sectionally and longitudinally. Results: At baseline, participants with recent-onset psychosis showed increased connectivity (compared to controls) between the thalamus and somatosensory and temporal regions (k = 653, T = 5.712), as well as decreased connectivity between the thalamus and left cerebellum and right prefrontal cortex (PFC; k = 201, T = −4.700). Longitudinal analyses revealed increased connectivity over time in recent-onset psychosis (relative to controls) in the right middle frontal gyrus. Conclusions: Our results support the concept of abnormal thalamic connectivity as a core feature in psychosis. In agreement with a non-degenerative model of illness in which functional changes occur early in development and do not deteriorate over time, no evidence of progressive deterioration of connectivity during early psychosis was observed. Indeed, regionally increased connectivity between thalamus and PFC was observed.
