Mining Statistically Significant Substrings based on the Chi-Square Measure

With the tremendous expansion of reservoirs of sequence data stored worldwide, efficient mining of large string databases in various domains including intrusion detection systems, player statistics, texts, and proteins, has emerged as a practical challenge. Searching for an unusual pattern within long strings of data is one of the foremost requirements for many diverse applications. Given a string, the problem is to identify the substrings that differ the most from the expected or normal behavior, i.e., the substrings that are statistically significant (or, in other words, less likely to occur due to chance alone). We first survey and analyze the different statistical measures available to meet this end. Next, we argue that the most appropriate metric is the chi-square measure. Finally, we discuss different approaches and algorithms proposed for retrieving the top-k substrings with the largest chi-square measure.
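As an illustration of the problem stated in the abstract, here is a minimal sketch of top-k substring retrieval by exhaustive enumeration. It assumes the standard Pearson form of the chi-square statistic, X² = Σ (Y_i − l·p_i)² / (l·p_i), where Y_i is the observed count of symbol i in a substring of length l and p_i is its probability under a known null model. The function names `chi_square`, `scored_substrings`, and `top_k_substrings` are hypothetical, and this quadratic-time baseline is not one of the algorithms the chapter discusses.

```python
from collections import Counter
import heapq


def chi_square(substring, probs):
    """Pearson chi-square of observed symbol counts against expected counts l * p_i.

    Assumes every symbol of `substring` is a key of `probs` with nonzero probability.
    """
    length = len(substring)
    counts = Counter(substring)
    return sum(
        (counts.get(sym, 0) - length * p) ** 2 / (length * p)
        for sym, p in probs.items()
    )


def scored_substrings(text, probs):
    """Yield (score, start, end) for every non-empty substring text[start:end].

    Counts are updated incrementally as the substring grows to the right,
    so each score costs O(alphabet size) rather than O(length).
    """
    symbols = list(probs)
    for start in range(len(text)):
        counts = dict.fromkeys(symbols, 0)
        for end in range(start, len(text)):
            counts[text[end]] += 1  # KeyError if a symbol is missing from probs
            length = end - start + 1
            score = sum(
                (counts[s] - length * probs[s]) ** 2 / (length * probs[s])
                for s in symbols
            )
            yield (score, start, end + 1)


def top_k_substrings(text, probs, k):
    """Return the k highest-scoring substrings as (score, start, end) tuples."""
    return heapq.nlargest(k, scored_substrings(text, probs))
```

For example, `top_k_substrings("aaaab", {"a": 0.5, "b": 0.5}, 1)` scores the run `"aaaa"` highest under a uniform two-symbol model. The enumeration visits all O(n²) substrings, which is the cost that more refined approaches aim to reduce.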



Pattern Discovery Using Sequence Data Mining, 2012, pp. 73-82, DOI: 10.4018/978-1-61350-056-9.ch004

Cited By ~ 1

Author(s): Sourav Dutta, Arnab Bhattacharya

Keyword(s): Intrusion Detection, Sequence Data, Long Strings, Intrusion Detection Systems, Chi Square, Detection Systems, Statistical Measures, Normal Behavior, Practical Challenge, String Databases


Comparative Analysis of Architectures for Intrusion Detection Systems against DoS Attacks in MANETs based on Chi-Square Test

International Journal of Computer Applications, 2014, Vol 87 (4), pp. 27-33, DOI: 10.5120/15198-3580

Author(s): A. AnnaLakshmi, S. Anandkumar, G. Nagarajan, K. R. Valluvan

Keyword(s): Comparative Analysis, Intrusion Detection, Intrusion Detection Systems, DoS Attacks, Chi Square, Detection Systems, Chi Square Test


An Approach for the Application of a Dynamic Multi-Class Classifier for Network Intrusion Detection Systems

Electronics, 2020, Vol 9 (11), pp. 1759, DOI: 10.3390/electronics9111759

Author(s): Xavier Larriva-Novo, Carmen Sánchez-Zas, Víctor A. Villagrá, Mario Vega-Barbas, Diego Rivera

Keyword(s): Machine Learning, Intrusion Detection, Intelligent Systems, Research Work, Machine Learning Algorithms, Intrusion Detection Systems, Detection Range, Detection Systems, Network Intrusion, Normal Behavior

Currently, the use of machine learning models for developing intrusion detection systems is a technology trend whose benefits have been demonstrated. These intelligent systems are trained with labeled datasets that include different types of attacks as well as the normal behavior of the network. Most studies use a single machine learning model to identify anomalies related to possible attacks. In other cases, machine learning algorithms are used to identify certain types of attacks. However, recent studies show that certain models are more accurate than others at identifying certain classes of attacks. Thus, this study tries to identify which model best fits each kind of attack in order to define a set of reasoner modules. In addition, this research work proposes organizing these modules to feed a selection system, that is, a dynamic classifier. Finally, the study shows that the proposed dynamic classifier model increases the detection range and improves on the accuracy achieved by each individual model.
