Adaptive Drift Detection Mechanism for Non-Stationary Data Stream

Mining non-stationary data streams is a challenging and important task, with applications in the financial sector, web log analysis, sensor networks, network traffic management, and other areas. In this environment, the data distribution may change over time, a phenomenon called concept drift. It is therefore necessary to identify such changes and address them to keep the model relevant to the incoming data. Many researchers have used the Drift Detection Method (DDM). However, DDM is slow to detect gradual drift, so its detection delay is high. In this paper, we propose the Adaptive Drift Detection Method (ADDM), which improves the performance of the drift detection mechanism. ADDM uses a new parameter to detect gradual drift in order to reduce the detection delay. ADDM is evaluated on six synthetic datasets and four real-world datasets. Experimental results confirm that ADDM reduces the drift detection delay and false-positive rate (FPR) while preserving high classification accuracy.
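
For orientation, the baseline DDM that the abstract above builds on can be sketched as follows. This is a minimal illustration of the classic DDM rule (track the running error rate and its standard deviation, then compare against the best values seen so far), not the ADDM proposed in the paper; the class name, the `min_samples` default, and the 2-sigma and 3-sigma levels are assumptions chosen for the example.

```python
import math


class DDM:
    """Minimal sketch of the classic Drift Detection Method (not ADDM)."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        # All three defaults are illustrative assumptions.
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0               # samples seen in the current concept
        self.p = 0.0             # running error rate
        self.p_min = math.inf    # error rate at the best point so far
        self.s_min = math.inf    # standard deviation at the best point so far

    def update(self, error):
        """error is 1 for a misclassification, 0 otherwise.

        Returns "stable", "warning", or "drift".
        """
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the point where p + s was lowest.
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning the new concept from scratch
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

Feeding the detector one 0/1 error flag per classified instance is enough to drive it. The detection delay discussed in the abstract is the number of instances between the true change point and the first "drift" result.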

Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems

Sensors ◽

10.3390/s20072131 ◽

2020 ◽

Vol 20 (7) ◽

pp. 2131 ◽

Cited By ~ 3

Author(s):

Affan Ahmed Toor ◽

Muhammad Usman ◽

Farah Younas ◽

Alvis Cheuk M. Fong ◽

Sajid Ali Khan ◽

...

Keyword(s):

Data Streams ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Health Data ◽

Smart Devices ◽

Detection Delay ◽

Medical Sensors ◽

Synthetic Datasets ◽

Almost All

With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams are being generated. This study addresses concept drift, a major challenge in processing voluminous data streams. Concept drift refers to a change in data distribution over time. It may occur in the medical domain, for example when medical sensors that measure for general healthcare or rehabilitation switch roles for ICU emergency operations when required. Detecting concept drift becomes trickier when the class distributions in the data are skewed, which is often true for e-health data from medical sensors. The Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate, and it does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experiments were run in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the characteristics of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.

Concept drift detection with False Positive rate for multi-label classification in IoT data stream

2020 International Conference on UK-China Emerging Technologies (UCET) ◽

10.1109/ucet51115.2020.9205421 ◽

2020 ◽

Author(s):

Pingfan Wang ◽

Nanlin Jin ◽

Gerhard Fehringer

Keyword(s):

False Positive ◽

Data Stream ◽

Concept Drift ◽

False Positive Rate ◽

Positive Rate ◽

Concept Drift Detection

PRATD: A Phased Remote Access Trojan Detection Method with Double-Sided Features

Electronics ◽

10.3390/electronics9111894 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1894

Author(s):

Chun Guo ◽

Zihua Song ◽

Yuan Ping ◽

Guowei Shen ◽

Yuhei Cui ◽

...

Keyword(s):

False Positive ◽

Detection Method ◽

False Positive Rate ◽

True Positive Rate ◽

Remote Access ◽

Detection Methods ◽

Security Threats ◽

True Positive ◽

Trojan Detection ◽

Positive Rate

The Remote Access Trojan (RAT) is one of the most serious security threats that organizations face today. At present, the two major RAT detection approaches are host-based and network-based detection. To combine their complementary strengths, this article proposes a phased RAT detection method with double-sided features (PRATD). In PRATD, both host-side and network-side features are combined to build detection models, which helps distinguish RATs from benign programs because RATs not only generate traffic on the network but also leave traces on the host at run time. In addition, PRATD trains two different detection models for the two runtime states of RATs to improve the True Positive Rate (TPR). Experiments on the network and host records collected from five kinds of benign programs and 20 well-known RATs show that PRATD can effectively detect RATs. It achieves a TPR as high as 93.609% with a False Positive Rate (FPR) as low as 0.407% for known RATs, and a TPR of 81.928% with an FPR of 0.185% for unknown RATs, which suggests it is a competitive candidate for RAT detection.
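
The TPR and FPR figures quoted in the abstract above follow the usual confusion-matrix definitions, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch of that arithmetic is shown below; the function name and the 0/1 label encoding (1 for malicious, 0 for benign) are assumptions for illustration and are not taken from the paper.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels.

    Labels are assumed to be 1 for the positive class and 0 for the negative class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr
```

A detector is only useful when both rates are reported together: a high TPR can be bought trivially by flagging everything, which drives the FPR up as well.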

Bhattacharyya Distance based Concept Drift Detection Method For evolving data stream

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115303 ◽

2021 ◽

pp. 115303

Author(s):

Ishwar Baidari ◽

Nagaraj Honnikoll

Keyword(s):

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Bhattacharyya Distance ◽

Concept Drift Detection ◽

Evolving Data

Gene set enrichment for reproducible science: comparison of CERNO and eight other algorithms

Bioinformatics ◽

10.1093/bioinformatics/btz447 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5146-5154 ◽

Cited By ~ 19

Author(s):

Joanna Zyla ◽

Michal Marczyk ◽

Teresa Domaszewska ◽

Stefan H E Kaufmann ◽

Joanna Polanska ◽

...

Keyword(s):

False Positive Rate ◽

R Package ◽

Supplementary Information ◽

Computational Time ◽

P Value ◽

Gene Set ◽

Related Data ◽

Novel Approach ◽

Positive Rate ◽

Real World Datasets

Motivation: Analysis of gene set (GS) enrichment is an essential part of functional omics studies. Here, we complement the established evaluation metrics of GS enrichment algorithms with a novel approach to assess the practical reproducibility of scientific results obtained from GS enrichment tests when applied to related data from different studies. Results: We evaluated eight established and one novel algorithm for reproducibility, sensitivity, prioritization, false positive rate and computational time. In addition to eight established algorithms, we also included Coincident Extreme Ranks in Numerical Observations (CERNO), a flexible and fast algorithm based on modified Fisher P-value integration. Using real-world datasets, we demonstrate that CERNO is robust to ranking metrics, as well as sample and GS size. CERNO had the highest reproducibility while remaining sensitive, specific and fast. In the overall ranking, Pathway Analysis with Down-weighting of Overlapping Genes, CERNO and over-representation analysis performed best, while CERNO and GeneSetTest scored high in terms of reproducibility. Availability and implementation: The tmod package implementing the CERNO algorithm is available from CRAN (cran.r-project.org/web/packages/tmod/index.html) and an online implementation can be found at http://tmod.online/. The datasets analyzed in this study are widely available in the KEGGdzPathwaysGEO, KEGGandMetacoreDzPathwaysGEO R package and GEO repository. Supplementary information: Supplementary data are available at Bioinformatics online.
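
The "modified Fisher P-value integration" mentioned in the abstract above can be illustrated with a small sketch. It assumes the commonly described form of the statistic, in which each gene's rank divided by the total number of genes stands in for a p-value, so that F = -2 Σ ln(rank_i / N) is compared against a chi-square distribution with 2k degrees of freedom for a gene set of size k. This is not the tmod implementation, and all function names are hypothetical.

```python
import math


def cerno_statistic(ranks, n_genes):
    """Fisher-style statistic over ranks: -2 * sum(ln(rank / n_genes)).

    ranks are 1-based positions of the gene set members in the ordered gene list.
    """
    return -2.0 * sum(math.log(r / n_genes) for r in ranks)


def chi2_sf_even(x, k):
    """Survival function of a chi-square distribution with 2k degrees of freedom.

    For even degrees of freedom it has the closed form
    exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!, so no external library is needed.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return math.exp(-half) * total


def cerno_p_value(ranks, n_genes):
    """P-value for the enrichment of a gene set near the top of the ranking."""
    return chi2_sf_even(cerno_statistic(ranks, n_genes), len(ranks))
```

Gene sets whose members cluster near the top of the ranked list produce a large statistic and therefore a small p-value. Because the test uses only ranks, any monotone transformation of the ranking metric leaves the result unchanged.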

Detection of DoS Attacks Using ARFIMA Modeling of GOOSE Communication in IEC 61850 Substations

Energies ◽

10.3390/en13195176 ◽

2020 ◽

Vol 13 (19) ◽

pp. 5176

Author(s):

Ghada Elbez ◽

Hubert B. Keller ◽

Atul Bohara ◽

Klara Nahrstedt ◽

Veit Hagenmeyer

Keyword(s):

False Positive ◽

False Positive Rate ◽

Denial Of Service ◽

Statistical Hypothesis ◽

Physical Security ◽

Dos Attacks ◽

Iec 61850 ◽

Detection Delay ◽

Positive Rate ◽

Arfima Model

Integration of Information and Communication Technology (ICT) in modern smart grids (SGs) offers many advantages, including the use of renewables and an effective way to protect, control and monitor energy transmission and distribution. To reach an optimal operation of future energy systems, availability, integrity and confidentiality of data should be guaranteed. Research on the cyber-physical security of electrical substations based on IEC 61850 is still at an early stage. In the present work, we first model the network traffic data in electrical substations; then we present a statistical Anomaly Detection (AD) method to detect Denial of Service (DoS) attacks against the Generic Object Oriented Substation Event (GOOSE) network communication. Based on the self-similarity and the Long-Range Dependency (LRD) of the data, an Auto-Regressive Fractionally Integrated Moving Average (ARFIMA) model was shown to describe the GOOSE communication in the substation process network well. Based on this ARFIMA model and in view of cyber-physical security, an effective model-based AD method is developed and analyzed. Two variants of the statistical AD, using statistical hypothesis testing based on the Generalized Likelihood Ratio Test (GLRT) and the cumulative sum (CUSUM), are presented to detect flooding attacks that might affect the availability of the data. Our work presents a novel AD method, with two different variants, tailored to the specific features of the GOOSE traffic in IEC 61850 substations. The statistical AD is capable of detecting anomalies at unknown change times under the realistic assumption of unknown model parameters. The performance of both variants of the AD method is validated and assessed using data collected from a simulation case study. We perform several Monte-Carlo simulations under different noise variances. The detection delay is reported for each detector and represents the number of discrete time samples after which an anomaly is detected. Our statistical AD method with both variants (CUSUM and GLRT) has around half the false positive rate and a smaller detection delay when compared with two of the closest works found in the literature. Our AD approach based on the GLRT detector has the smallest false positive rate among all considered approaches, whereas our AD approach based on the CUSUM test has the lowest false negative rate and thus the best detection rate. Depending on the requirements as well as the costs of false alarms or missed anomalies, both variants of our statistical detection method can be used and are further analyzed using composite detection metrics.
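
The CUSUM test and the detection delay metric described in the abstract above can be sketched in their simplest textbook form. The sketch assumes a known pre-change mean and a fixed slack and threshold, whereas the paper applies the test to an ARFIMA-based model with unknown parameters; the function names and default values are hypothetical.

```python
def cusum_detect(samples, mean0, slack=0.5, threshold=5.0):
    """One-sided CUSUM for an upward shift in the mean.

    Accumulates deviations above mean0 + slack and returns the index of the
    first sample at which the cumulative sum exceeds the threshold, or None
    if no alarm is raised.
    """
    g = 0.0
    for t, x in enumerate(samples):
        # Reset to zero whenever the evidence for a shift becomes negative.
        g = max(0.0, g + (x - mean0 - slack))
        if g > threshold:
            return t
    return None


def detection_delay(alarm_index, change_index):
    """Number of discrete time samples between the true change and the alarm."""
    if alarm_index is None or alarm_index < change_index:
        return None  # missed detection, or a false alarm before the change
    return alarm_index - change_index
```

Raising the threshold lowers the false positive rate at the cost of a longer detection delay, which is the trade-off the abstract refers to when it weighs false alarms against missed anomalies.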

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience ◽

10.1155/2021/8813806 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yange Sun ◽

Meng Li ◽

Lei Li ◽

Han Shao ◽

Yi Sun

Keyword(s):

Data Streams ◽

Data Stream ◽

Learning Strategy ◽

Concept Drift ◽

Class Imbalance ◽

Data Preprocessing ◽

Cost Information ◽

Detection Mechanism ◽

Stream Classification ◽

Data Stream Classification

Class imbalance and concept drift are two primary challenges that exist concurrently in data stream classification. Although the two issues have each drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, the class imbalance issue is further complicated when data streams exhibit concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to overcome the two issues simultaneously. CSDS considers cost information during data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate class imbalance at the data level. In the classification process, a cost-sensitive weighting scheme is devised to enhance the overall performance of the ensemble. In addition, a change detection mechanism is embedded in our algorithm, which ensures that the ensemble can capture and react to drift promptly. Experimental results validate that our method obtains better classification results under different imbalanced concept-drifting data stream scenarios.

Outlier Detection Method for Flash Flood Disaster Monitoring Data based on Information Entropy

Journal of Physics Conference Series ◽

10.1088/1742-6596/2138/1/012013 ◽

2021 ◽

Vol 2138 (1) ◽

pp. 012013

Author(s):

Yongzhi Chen ◽

Ziao Xu ◽

Chaoqun Niu

Keyword(s):

Outlier Detection ◽

Information Entropy ◽

Detection Method ◽

Flash Flood ◽

False Positive Rate ◽

Flood Disaster ◽

Detection Methods ◽

Positive Rate ◽

Disaster Monitoring ◽

Local Outlier

In flash flood disaster monitoring and early warning research, the Internet of Things is widely used for real-time information collection. The large volumes of data collected by sensors contain anomalies such as noise, repetition and errors, which lead to false alarms, lower prediction accuracy and other problems. Exploiting the fact that outliers in sensor data streams cause obvious fluctuations in information entropy, this paper proposes a local outlier detection method based on information entropy and optimized with a sliding window and LOF (Local Outlier Factor). This method can be used to improve data quality, thus improving the accuracy of disaster prediction. The method is applied to data stream processing for water sensors, and the experimental results show that it can accurately detect outliers. Compared with existing detection methods that rely only on data distance, the test positive rate is improved and the false positive rate is reduced.
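
The observation in the abstract above, that outliers cause a visible fluctuation in information entropy, can be illustrated with a short sliding-window sketch. It covers only the entropy part and omits the LOF step of the paper; the binning scheme, the `bin_width` and `window` parameters, and the function names are assumptions made for the example.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width=1.0):
    """Shannon entropy (in bits) of values after discretizing into fixed-width bins."""
    n = len(values)
    if n == 0:
        return 0.0
    bins = Counter(math.floor(v / bin_width) for v in values)
    return -sum((c / n) * math.log2(c / n) for c in bins.values())


def window_entropies(stream, window=10, bin_width=1.0):
    """Entropy of every full sliding window over the stream, in order."""
    return [
        shannon_entropy(stream[i:i + window], bin_width)
        for i in range(len(stream) - window + 1)
    ]
```

A steady sensor reading keeps the window entropy near zero, while a single spike spreads the readings across more bins and raises it. Windows whose entropy jumps are then candidates for a closer, distance-based check.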

Physical Tampering Detection Using Single COTS Wi-Fi Endpoint

Sensors ◽

10.3390/s21165665 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5665

Author(s):

Poh Yuen Chan ◽

Alexander I-Chi Lai ◽

Pei-Yuan Wu ◽

Ruey-Beei Wu

Keyword(s):

False Positive Rate ◽

True Positive Rate ◽

Relative Orientation ◽

True Positive ◽

Channel State ◽

Detection Mechanism ◽

Tampering Detection ◽

State Information ◽

Positive Rate ◽

Commercial Off The Shelf

This paper proposes a practical physical tampering detection mechanism using inexpensive commercial off-the-shelf (COTS) Wi-Fi endpoint devices with a deep neural network (DNN) on channel state information (CSI) in the Wi-Fi signals. Because the DNN identifies physical tampering events from the multi-subcarrier characteristics in CSI, our methodology works with only one COTS Wi-Fi endpoint with a single embedded antenna to detect changes in the relative orientation between the Wi-Fi infrastructure and the endpoint, in contrast to previous sophisticated, proprietary approaches. Preliminary results show that our detectors achieve a 95.89% true positive rate (TPR) with no worse than a 4.12% false positive rate (FPR) in detecting physical tampering events.

Machine Learning Based Technique for Detection of Rank Attack in RPL based Internet of Things Networks

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i3044.0789s319 ◽

2019 ◽

Vol 8 (9S3) ◽

pp. 244-248

Keyword(s):

Machine Learning ◽

Internet Of Things ◽

Nearest Neighbor ◽

False Positive Rate ◽

Security And Privacy ◽

Weak Links ◽

K Nearest Neighbor ◽

Detection Mechanism ◽

Wormhole Attack ◽

Positive Rate

The Internet of Things (IoT) is a new paradigm in network technology. It has applications in almost every field, such as retail, industry, and healthcare. It faces challenges such as security and privacy, robustness, weak links, and limited power. A major challenge among these is security. Because of weak connectivity links, Internet of Things networks are exposed to many attacks at the network layer. RPL is a routing protocol that establishes paths specifically for the constrained nodes in Internet of Things based networks. RPL-based networks are exposed to many attacks, such as the black hole attack, wormhole attack, sinkhole attack, and rank attack. This paper proposes a detection technique for the rank attack based on a machine learning approach called MLTKNN, which is built on the K-nearest neighbor algorithm. The proposed technique was simulated in the Cooja simulator with 30 motes, and the true positive rate and false positive rate of the detection mechanism were calculated. The results show that the proposed technique is efficient in terms of delay, packet delivery rate, and detection of the rank attack.
