Dynamic and Advanced Data Mining for Progressing Technological Development
Latest Publications


Total documents: 17 (five years: 0)
H-index: 2 (five years: 0)

Published by IGI Global
ISBN: 9781605669083, 9781605669090

Author(s):  
Tich Phuoc Tran ◽  
Pohsiang Tsai ◽  
Tony Jan ◽  
Xiangjian He

Most currently available network security techniques cannot cope with the dynamic and increasingly complex nature of cyber attacks on distributed computer systems, so an automated and adaptive defensive tool is imperative for computer networks. Alongside existing prevention techniques such as encryption and firewalls, the Intrusion Detection System (IDS) has established itself as an emerging technology able to detect unauthorized access and abuse of computer systems by both internal users and external offenders. Most novel approaches in this field adopt Artificial Intelligence (AI) technologies such as Artificial Neural Networks (ANNs) to improve the performance and robustness of IDS. The true power of ANNs lies in their ability to represent both linear and non-linear relationships and to learn these relationships directly from the data being modeled. However, ANNs demand substantial processing power and are prone to overfitting, i.e. the network cannot extrapolate accurately once an input falls outside the range of the training data. These limitations leave IDSs with low detection rates, high false alarm rates and excessive computational cost. This chapter proposes a novel Machine Learning (ML) algorithm to alleviate these difficulties of existing AI techniques in the area of computer network security. The Intrusion Detection dataset provided by Knowledge Discovery and Data Mining (KDD-99) is used as a benchmark to compare the proposed model with other existing techniques. Extensive empirical analysis suggests that the proposed method outperforms other state-of-the-art learning algorithms in terms of learning bias, generalization variance and computational cost. It is also reported to significantly improve the overall detection capability for difficult-to-detect novel attacks that are unseen, or occur only irregularly, in the training phase.
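The two headline figures the abstract uses to judge an IDS — detection rate and false alarm rate — come from a standard confusion matrix. A minimal sketch (the confusion counts are hypothetical, not taken from the chapter's KDD-99 results):

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute the two headline IDS evaluation metrics.

    detection rate   = attacks correctly flagged / all actual attacks
    false alarm rate = benign events incorrectly flagged / all benign events
    """
    detection_rate = tp / (tp + fn)
    false_alarm_rate = fp / (fp + tn)
    return detection_rate, false_alarm_rate

# Hypothetical confusion counts for an IDS evaluated on labeled traffic:
# 950 attacks detected, 50 missed; 30 false alarms out of 10,000 benign events.
dr, far = detection_metrics(tp=950, fn=50, fp=30, tn=9970)
print(f"detection rate = {dr:.3f}, false alarm rate = {far:.3f}")
```

The trade-off between these two rates is exactly what the overfitting problem described above worsens: a network that memorizes the training data flags unseen-but-benign inputs, inflating the false alarm rate.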


Author(s):  
J.L. van Velsen ◽  
R. Choenni

The authors describe a process for extracting a cointegrated model from a database. An important part of the process is a model generator that automatically searches for cointegrated models and orders them according to an information criterion. They build and test a non-heuristic model generator that mines for common factor models, a special kind of cointegrated model. An outlook on potential future developments is given.
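The abstract does not say which information criterion the generator uses; a common choice for ordering candidate least-squares models is the AIC. A hedged stdlib sketch of ranking candidate models this way (the candidate models, residual sums of squares and parameter counts are all hypothetical):

```python
import math

def aic(rss, n, k):
    """Akaike information criterion for a least-squares fit:
    n observations, k estimated parameters, residual sum of squares rss."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical candidate common-factor models: (name, rss, number of params).
candidates = [("1-factor", 42.0, 3), ("2-factor", 30.0, 6), ("3-factor", 29.5, 9)]
n_obs = 200

# Order candidates by AIC, lowest (best) first: the extra fit of the
# 3-factor model is not worth its extra parameters here.
ranked = sorted(candidates, key=lambda m: aic(m[1], n_obs, m[2]))
for name, rss, k in ranked:
    print(name, round(aic(rss, n_obs, k), 2))
```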


Author(s):  
Mohammed M. Mazid ◽  
A. B.M. Shawkat Ali ◽  
Kevin S. Tickle

Intrusion detection has received enormous attention since the beginning of computer network technology. It is the task of detecting attacks against a network and its resources. To detect and counteract any unauthorized activity, network and system administrators need to monitor the activities in their networks. Over the last few years a number of intrusion detection systems have been developed and put to use in commercial and academic institutions, but some challenges remain to be solved. This chapter provides a review, demonstration and future directions for intrusion detection. The authors' emphasis is on various kinds of rule-based intrusion detection techniques. The chapter also aims to summarize the effectiveness and limitations of intrusion detection technologies in medical diagnosis, control and model identification in engineering, decision making in marketing and finance, web and text mining, and some other research areas.


Author(s):  
Fu Xiao ◽  
Xie Li

Intrusion Detection Systems (IDSs) are widely deployed as unauthorized activities and attacks increase. However, they often overload security managers by triggering thousands of alerts per day, and up to 99% of these alerts are false positives (i.e. alerts triggered incorrectly by benign events). This makes it extremely difficult for managers to analyze the security state correctly and react to attacks. In this chapter the authors describe a novel system for reducing false positives in intrusion detection, called ODARM (an Outlier Detection-Based Alert Reduction Model). Their model is based on a new data mining technique, outlier detection, which needs no labeled training data, no domain knowledge and little human assistance. The main idea of their method is to use frequent attribute values mined from historical alerts as the features of false positives, and then to filter false alerts by a score calculated from these features. In order to filter alerts in real time, they also design a two-phase framework consisting of a learning phase and an online filtering phase. They have finished a prototype implementation of the model, and through experiments on DARPA 2000 they have shown that it can effectively reduce false positives in IDS alerts. On a real-world dataset, the model achieves an even higher reduction rate.
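The core idea — frequent attribute values mark routine (likely false-positive) alerts, rare values mark outliers worth keeping — can be sketched in a few lines. This is a simplified stand-in for ODARM's scoring, not the authors' actual formula; the alert attributes and values are hypothetical:

```python
from collections import Counter

def attribute_frequencies(history):
    """Learning phase: mine (attribute, value) pair counts from historical alerts."""
    freq = Counter()
    for alert in history:
        freq.update(alert.items())
    return freq

def score(alert, freq, total):
    """Filtering phase: score an alert by the average relative frequency of
    its attribute values. High scores mark 'usual' alerts, i.e. likely
    false positives; low scores mark outliers for the analyst."""
    return sum(freq[item] for item in alert.items()) / (len(alert) * total)

history = [
    {"sig": "icmp-ping", "src": "10.0.0.5"},
    {"sig": "icmp-ping", "src": "10.0.0.5"},
    {"sig": "icmp-ping", "src": "10.0.0.7"},
    {"sig": "shellcode", "src": "172.16.0.9"},
]
freq = attribute_frequencies(history)
routine = {"sig": "icmp-ping", "src": "10.0.0.5"}
novel = {"sig": "shellcode", "src": "172.16.0.9"}
print(score(routine, freq, len(history)))  # high -> filter as false positive
print(score(novel, freq, len(history)))    # low -> keep for the analyst
```

Splitting the Counter construction (learning phase) from scoring (online filtering phase) mirrors the two-phase framework the chapter describes.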


Author(s):  
G.M. Shafiullah ◽  
Adam Thompson ◽  
Peter J. Wolfs ◽  
A.B.M. Shawkat Ali

Emerging wireless sensor networking (WSN) and modern machine learning techniques have encouraged interest in the development of vehicle health monitoring (VHM) systems that ensure secure and reliable operation of rail vehicles. The performance of rail vehicles running on railway tracks is governed by the dynamic behaviour of railway bogies, especially in cases of lateral instability and track irregularities. To ensure railway safety and reliability, this chapter develops a forecasting model that investigates the vertical acceleration behaviour of railway wagons attached to a moving locomotive using modern machine learning techniques. Initially, an energy-efficient data acquisition model is proposed for WSN applications using popular learning algorithms. Later, a prediction model is developed to investigate both front and rear body vertical acceleration behaviour. Different types of models can be built on a uniform platform to evaluate their performance, estimating the correlation coefficient (CC), root mean square error (RMSE), mean absolute error (MAE), root relative squared error (RRSE), relative absolute error (RAE) and computational complexity of each algorithm. Finally, spectral analysis of the front and rear body vertical condition is produced from the predicted data using the Fast Fourier Transform (FFT) and used to generate precautionary signals and system status reports that the locomotive driver can use to decide upon necessary actions.
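The five accuracy measures listed (CC, RMSE, MAE, RRSE, RAE) have standard definitions; the relative measures compare the model's errors against the naive predictor that always outputs the mean. A stdlib sketch (the acceleration values are made up for illustration):

```python
import math
from statistics import mean

def regression_metrics(actual, predicted):
    """CC, RMSE, MAE, RRSE and RAE, the measures used to compare algorithms."""
    n = len(actual)
    a_bar = mean(actual)
    p_bar = mean(predicted)
    errors = [p - a for p, a in zip(predicted, actual)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # Relative errors are measured against the "always predict the mean" baseline.
    rrse = math.sqrt(sum(e * e for e in errors) /
                     sum((a - a_bar) ** 2 for a in actual))
    rae = sum(abs(e) for e in errors) / sum(abs(a - a_bar) for a in actual)
    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    cc = cov / math.sqrt(sum((a - a_bar) ** 2 for a in actual) *
                         sum((p - p_bar) ** 2 for p in predicted))
    return {"CC": cc, "RMSE": rmse, "MAE": mae, "RRSE": rrse, "RAE": rae}

actual = [0.10, 0.40, 0.35, 0.80]     # hypothetical vertical accelerations (g)
predicted = [0.15, 0.38, 0.30, 0.75]
print(regression_metrics(actual, predicted))
```

An RRSE or RAE below 1.0 means the model beats the mean-only baseline, which is the usual sanity check before comparing algorithms against each other.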


Author(s):  
Philip L.H. Yu ◽  
Edmond H.C. Wu ◽  
W.K. Li

As a data mining technique, independent component analysis (ICA) is used to separate mixed data signals into statistically independent sources. In this chapter, we apply ICA to model the multivariate volatility of financial asset returns, a useful tool in portfolio selection and risk management. In the finance literature, the generalized autoregressive conditional heteroscedasticity (GARCH) model and its variants, such as the EGARCH and GJR-GARCH models, have become popular standard tools for modeling the volatility processes of financial time series. Although univariate GARCH models are successful in modeling the volatility of a single financial time series, modeling multivariate time series has always been challenging. Recently, Wu, Yu, & Li (2006) suggested using ICA to decompose multivariate time series into statistically independent components and then modeling each independent component separately with a univariate GARCH model. In this chapter, we extend this class of ICA-GARCH models to allow more flexible univariate GARCH-type models. We also apply the proposed models to compute value-at-risk (VaR) for risk management applications. Backtesting and out-of-sample tests suggest that the ICA-GARCH models have a clear-cut advantage over some other approaches in value-at-risk estimation.
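In the ICA-GARCH approach, each independent component extracted by ICA is fitted with its own univariate GARCH model. The sketch below shows only the univariate building block, a GARCH(1,1) variance recursion plus a normal-quantile VaR, with made-up parameters and returns; the ICA decomposition step and parameter estimation are omitted:

```python
import math

def garch11_vol(returns, omega, alpha, beta):
    """Filter a return series through the GARCH(1,1) variance recursion:
    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
    sigma2 = omega / (1 - alpha - beta)  # start at the unconditional variance
    vols = []
    for r in returns:
        vols.append(math.sqrt(sigma2))
        sigma2 = omega + alpha * r * r + beta * sigma2
    return vols

def var_95(sigma):
    """One-day 95% value-at-risk under conditional normality."""
    return 1.645 * sigma

returns = [0.01, -0.02, 0.015, -0.03, 0.005]  # hypothetical daily returns
vols = garch11_vol(returns, omega=1e-6, alpha=0.08, beta=0.90)
print([round(var_95(s), 4) for s in vols])
```

In a full ICA-GARCH pipeline the component volatilities would be mixed back through the ICA loading matrix to obtain the multivariate (portfolio) volatility used for VaR.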


Author(s):  
Ming Xu ◽  
Hong-Rong Yang ◽  
Ning Zheng

It is a pivotal task for a forensic investigator to search a hard disk for interesting evidence. Currently, most search tools in the digital forensics field, which rely on text string matching and indexing technology, produce high recall (100%) but low precision, so investigators often waste vast amounts of time on irrelevant search hits. In this chapter, an improved method for ranking search results is proposed to reduce the human effort of locating interesting hits. The K-UIH (keyword and user interest hierarchy) is constructed from both investigator-defined keywords and user interests learned adaptively from the electronic evidence, and is then used to re-rank the search results. Experimental results indicate that the proposed method is feasible and valuable in the digital forensic search process.
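The re-ranking idea — scoring each raw hit against both investigator-defined keywords and learned user-interest terms — can be illustrated with a flat weighted-term stand-in for the K-UIH (the real K-UIH is hierarchical; the hits, terms and weights below are all hypothetical):

```python
def rerank(hits, keyword_weights, interest_weights):
    """Re-rank raw search hits by a combined score of investigator-defined
    keywords and learned user-interest terms (a flat stand-in for the K-UIH)."""
    def score(hit):
        text = hit.lower()
        kw = sum(w for term, w in keyword_weights.items() if term in text)
        ui = sum(w for term, w in interest_weights.items() if term in text)
        return kw + ui
    return sorted(hits, key=score, reverse=True)

hits = [
    "system log rotated at 03:00",
    "email draft: transfer funds to offshore account",
    "browser history: online-banking login",
]
keywords = {"transfer": 2.0, "account": 1.5}   # investigator-defined
interests = {"banking": 1.0, "email": 0.5}     # learned from the evidence
for h in rerank(hits, keywords, interests):
    print(h)
```

Because recall stays at 100% (no hit is discarded, only reordered), the method trades none of the original tools' coverage for much better precision at the top of the list.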


Author(s):  
Kwok Pan Pang

Most research on time series analysis and forecasting assumes no structural change, which implies that the mean and variance of the parameters in the time series model are constant over time. When structural change occurs in the data, however, methods based on this assumption are no longer appropriate: any analysis result or forecast drawn from the full data set will be misleading when a structural change occurs in the middle of the series. Structural change is quite common in the real world. In a study of a very large set of macroeconomic time series representing the 'fundamentals' of the US economy, Stock and Watson (1996) found evidence of structural instability in the majority of the series. Ignoring structural change also reduces prediction accuracy: Pesaran and Timmermann (2003), Hansen (2001) and Clements and Hendry (1998, 1999) showed that structural change is pervasive in time series data, and that ignoring the structural breaks which often occur significantly reduces forecast accuracy and leads to misleading or wrong conclusions. This chapter introduces the most common time series methods. The author highlights the problems of applying them to real situations with structural changes, briefly introduces some existing structural change methods, and demonstrates how to apply structural change detection in time series decomposition.
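A simple way to see what break detection involves: compare the residual sum of squares of a single constant-mean fit against the best two-segment fit, which is the intuition behind Chow-type tests. This is an illustrative stdlib sketch, not any of the chapter's specific methods; the series is synthetic with an obvious mean shift:

```python
def best_break(series, min_seg=3):
    """Locate the single mean-shift break point that most reduces the
    residual sum of squares relative to a no-break (constant-mean) fit."""
    def rss(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs)
    full = rss(series)
    best = None
    for k in range(min_seg, len(series) - min_seg + 1):
        split = rss(series[:k]) + rss(series[k:])
        if best is None or split < best[1]:
            best = (k, split)
    k, split = best
    return k, full, split

# Synthetic series whose mean jumps from ~1.0 to ~3.0 after observation 5.
series = [1.0, 1.2, 0.9, 1.1, 1.0, 3.1, 2.9, 3.0, 3.2, 2.8]
k, full, split = best_break(series)
print(f"break after observation {k}: RSS {full:.2f} -> {split:.2f}")
```

The large drop in RSS when the series is split is exactly why forecasts built on the full, unbroken sample are misleading: the single mean fits neither regime.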


Author(s):  
Shyamala G. Nadathur

Large datasets are regularly collected in biomedicine and healthcare (here referred to as the 'health domain'). These datasets have some unique characteristics and problems, so there is a need for methods that allow modelling in spite of this uniqueness: methods capable of dealing with missing data, of integrating data from various sources, of explicitly indicating statistical dependence and independence, and of modelling with uncertainty. These requirements have given rise to an influx of new methods, especially from the fields of machine learning and probabilistic graphical models; in particular Bayesian Networks (BNs), a type of graphical network model with directed links that offers a general and versatile approach to capturing and reasoning with uncertainty. In this chapter, some background mathematics and statistics, a description of BNs, and the relevant aspects of building the networks are given so that the reader can better understand and appreciate their potential. There are also brief discussions of their applications, their unique value, and the challenges this modelling technique faces in the domain. As will be seen in this chapter, with the additional advantages BNs can offer, it is not surprising that they are becoming an increasingly popular modelling tool in the health domain.
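The reasoning-with-uncertainty that makes BNs attractive in the health domain can be shown with the smallest possible network, Disease → Test, queried by exact enumeration. The conditional probability tables below are hypothetical illustration values, not from the chapter:

```python
def posterior(prior, likelihood, evidence):
    """Exact inference in the two-node BN Disease -> Test by enumeration:
    compute P(Disease | Test = evidence) via Bayes' rule."""
    joint = {d: prior[d] * likelihood[d][evidence] for d in prior}
    z = sum(joint.values())  # P(Test = evidence), the normalizing constant
    return {d: p / z for d, p in joint.items()}

# Hypothetical CPTs: 1% disease prevalence; test is 90% sensitive, 95% specific.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {
    "disease": {"positive": 0.90, "negative": 0.10},
    "healthy": {"positive": 0.05, "negative": 0.95},
}
post = posterior(prior, likelihood, "positive")
print(round(post["disease"], 3))  # a positive test is still far from certain
```

Even this toy network exhibits the behaviour that makes BNs valuable for diagnosis: the posterior combines prevalence with test characteristics, a calculation clinicians routinely get wrong when done informally.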


Author(s):  
Savo Kordic ◽  
Chiou Peng Lam ◽  
Jitian Xiao ◽  
Huaizhong Li

The productivity of chemical plants and petroleum refineries depends on the performance of alarm systems. Alarm history collected from distributed control systems (DCS) provides useful information about past plant alarm system performance. However, discovering patterns and relationships in such data can be very difficult and costly. Due to factors such as the high volume of alarm data (especially during plant upsets), the large number of nuisance alarms, and the very large number of individual alarm tags, manual identification and analysis of alarm logs is usually a labor-intensive and time-consuming task. This chapter describes a data mining approach for analyzing alarm logs in a chemical plant. The main idea of the approach is to investigate dependencies between alarms effectively by considering the temporal context and time intervals between different alarm types, and then to employ a data mining technique capable of discovering patterns associated with these time intervals. A prototype has been implemented to allow active exploration of the alarm grouping data space relevant to the tags of interest.
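The temporal-context idea — which alarm tags tend to fire shortly after which others — can be sketched by counting ordered tag pairs that co-occur within a fixed time window. This is a simplified stand-in for the chapter's mining technique; the alarm log and tag names are hypothetical:

```python
from collections import Counter

def cooccurring_pairs(log, window):
    """Count ordered alarm-tag pairs (a, b) where b fires within `window`
    seconds after a. `log` must be a time-ordered list of (time, tag)."""
    pairs = Counter()
    for i, (t_a, tag_a) in enumerate(log):
        for t_b, tag_b in log[i + 1:]:
            if t_b - t_a > window:
                break  # log is time-ordered, so no later entry can qualify
            if tag_b != tag_a:
                pairs[(tag_a, tag_b)] += 1
    return pairs

# Hypothetical DCS alarm log: (timestamp in seconds, alarm tag).
log = [(0, "PUMP_LOW"), (2, "FLOW_LOW"), (3, "TEMP_HI"),
       (60, "PUMP_LOW"), (61, "FLOW_LOW"), (200, "LEVEL_HI")]
pairs = cooccurring_pairs(log, window=10)
print(pairs.most_common(2))
```

Pairs with high counts across many upsets are candidates for dependent (and possibly redundant, nuisance) alarms, which is what the grouping exploration in the prototype is after.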

