Anomaly Detection via Mining Numerical Workflow Relations from Logs

10.36227/techrxiv.12570926.v2 ◽

2020 ◽

Author(s):

Bo Zhang ◽

Hongyu Zhang ◽

Pablo Moscato ◽

Aozhong Zhang

Keyword(s):

Distributed Systems ◽

Anomaly Detection ◽

Text Messages ◽

Distributed File System ◽

Log Data ◽

Sliding Windows ◽

Novel Approach ◽

Hadoop Distributed File System ◽

Blue Gene ◽

Online Anomaly Detection

Complex software-intensive systems, especially distributed systems, generate logs for troubleshooting. The logs are text messages recording system events, which can help engineers determine the system's runtime status. This paper proposes a novel approach named ADR (Anomaly Detection by workflow Relations) that uses the matrix nullspace to mine numerical relations from log data. The mined relations can be used for both offline and online anomaly detection and can facilitate fault diagnosis. We have evaluated ADR on log data collected from two distributed systems, HDFS (Hadoop Distributed File System) and BGL (the IBM Blue Gene/L supercomputer system). ADR successfully mined 87 and 669 numerical relations from the HDFS and BGL logs, respectively, and used them to detect anomalies with high precision and recall. For online anomaly detection, ADR employs PSO (Particle Swarm Optimization) to find the optimal sliding-window size and achieves fast anomaly detection. The experimental results confirm that ADR is effective for both offline and online anomaly detection.
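To make the nullspace idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each log session has already been reduced to a vector of event counts; the event names (`open`, `write`, `close`), the sample data, and the helper names `nullspace` and `violations` are all hypothetical. Any vector `w` in the nullspace of the count matrix satisfies `row . w == 0` for every normal session, so it encodes a numerical relation among event counts; a new session that breaks such a relation can be flagged.

```python
from fractions import Fraction

def nullspace(rows):
    """Basis of {w : row . w == 0 for every row}, via exact row reduction.

    Assumes `rows` is a non-empty list of equal-length integer count vectors.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot_value = m[r][c]
        m[r] = [v / pivot_value for v in m[r]]
        # Clear column c in every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    # Each free (non-pivot) column yields one basis vector.
    basis = []
    for fc in (c for c in range(n_cols) if c not in pivots):
        w = [Fraction(0)] * n_cols
        w[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            w[pc] = -m[row_idx][fc]
        basis.append(w)
    return basis

def violations(counts, relations):
    """Relations whose dot product with the session's counts is nonzero."""
    return [w for w in relations
            if sum(a * b for a, b in zip(w, counts)) != 0]

# Hypothetical event counts per normal session: [open, write, close].
normal_sessions = [
    [1, 3, 1],
    [2, 5, 2],
    [1, 1, 1],
    [3, 2, 3],
]
relations = nullspace(normal_sessions)  # encodes close - open == 0
```

With this data, a session such as `[2, 4, 1]` (two opens but only one close) violates the mined relation, while `[2, 4, 2]` does not. Exact rational arithmetic is used here so that the zero test is not affected by floating-point rounding; a real system would likely need a tolerance-based numerical method for large matrices.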


Anomaly Detection via Mining Numerical Workflow Relations from Logs

10.36227/techrxiv.12570926 ◽

2020 ◽

Author(s):

Bo Zhang ◽

Hongyu Zhang ◽

Pablo Moscato ◽

Aozhong Zhang

Keyword(s):

Distributed Systems ◽

Anomaly Detection ◽

Text Messages ◽

Distributed File System ◽

Log Data ◽

Sliding Windows ◽

Novel Approach ◽

Hadoop Distributed File System ◽

Blue Gene ◽

Online Anomaly Detection

Complex software-intensive systems, especially distributed systems, generate logs for troubleshooting. The logs are text messages recording system events, which can help engineers determine the system's runtime status. This paper proposes a novel approach named ADR (Anomaly Detection by workflow Relations) that uses the matrix nullspace to mine numerical relations from log data. The mined relations can be used for both offline and online anomaly detection and can facilitate fault diagnosis. We have evaluated ADR on log data collected from two distributed systems, HDFS (Hadoop Distributed File System) and BGL (the IBM Blue Gene/L supercomputer system). ADR successfully mined 87 and 669 numerical relations from the HDFS and BGL logs, respectively, and used them to detect anomalies with high precision and recall. For online anomaly detection, ADR employs PSO (Particle Swarm Optimization) to find the optimal sliding-window size and achieves fast anomaly detection. The experimental results confirm that ADR is effective for both offline and online anomaly detection.


A Novel Approach of Fair Scheduling to Enhance Performance of Hadoop Distributed File System

2019 International Conference on Electrical, Computer and Communication Engineering (ECCE) ◽

10.1109/ecace.2019.8679252 ◽

2019 ◽

Author(s):

Rubayet Hussain ◽

Mostafijur Rahman ◽

Khawja Imran Masud ◽

Sheikh Md Roky ◽

Md. Nasim Akhtar ◽

...

Keyword(s):

File System ◽

Distributed File System ◽

Fair Scheduling ◽

Novel Approach ◽

Hadoop Distributed File System


A Comprehensive Survey for Hadoop Distributed File System

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v11i230260 ◽

2021 ◽

pp. 46-57

Author(s):

Karwan Jameel Merceedi ◽

Nareen Abdulla Sabry

Keyword(s):

Distributed Systems ◽

Data Storage ◽

File System ◽

Low Cost ◽

File Systems ◽

Cost Effective ◽

Distributed File System ◽

Software Frameworks ◽

Hadoop Distributed File System ◽

Basic Ideas

In recent years, data volumes and internet use have grown rapidly, giving rise to big data. To cope with this growth, many software frameworks have been developed to improve the performance of distributed systems and to provide large-scale data storage. One of the most useful frameworks for handling data in distributed systems is Hadoop, which groups machines into a cluster and coordinates the work among them. Hadoop consists of two major components: the Hadoop Distributed File System (HDFS) and MapReduce (MR). With Hadoop, we can, for example, split a large file across machines, process it, and count how often each word occurs. HDFS is designed to store very large data sets and stream them at high bandwidth to user applications. Its differences from other distributed file systems are significant: HDFS is intended for low-cost hardware and is highly fault-tolerant. In a large cluster, thousands of computers both host directly attached storage and run user programs. By distributing storage and computation across many servers, the resource scales with demand while remaining cost-effective at every size. Because of these characteristics, many researchers have worked to enhance the performance and efficiency of HDFS, making it one of the most active cloud systems. This paper reviews the essential studies in this area as a guide for researchers who wish to work with such a system. The basic ideas and features of the reviewed experiments are taken into account to allow a robust comparison, which simplifies the choice for future researchers in this subject. Drawing on many authors, this paper explains what Hadoop is, its architecture, how it works, and how it performs in distributed systems. In addition, it assesses each reviewed work and compares them with one another.
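The word-counting use case mentioned in the abstract is the classic MapReduce example. The sketch below imitates the map, shuffle, and reduce phases in plain Python on a single machine; it does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. In a real cluster, each chunk would be an HDFS block processed on a different node.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each word."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))

# Two hypothetical chunks standing in for blocks of a large file.
counts = word_count(["big data big", "data store"])
```

Here `counts` maps each word to its number of occurrences across both chunks.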


Improving downloading performance in hadoop distributed file system

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.02060 ◽

2010 ◽

Vol 30 (8) ◽

pp. 2060-2065 ◽

Cited By ~ 4

Author(s):

Ning CAO ◽

Zhong-hai WU ◽

Hong-zhi LIU ◽

Qi-xun ZHANG

Keyword(s):

File System ◽

Distributed File System ◽

Hadoop Distributed File System


A Technique For Big Statistics Security Based on Hadoop Distributed File System

SSRN Electronic Journal ◽

10.2139/ssrn.3508526 ◽

2019 ◽

Author(s):

Sindhu D M ◽

DR.Ravikumar G.K ◽

Manu Y.M

Keyword(s):

File System ◽

Distributed File System ◽

Hadoop Distributed File System


A flexible framework for anomaly Detection via dimensionality reduction

Neural Computing and Applications ◽

10.1007/s00521-021-05839-5 ◽

2021 ◽

Author(s):

Alireza Vafaei Sadr ◽

Bruce A. Bassett ◽

M. Kunz

Keyword(s):

Anomaly Detection ◽

Dimensionality Reduction ◽

Dimensional Space ◽

High Dimensions ◽

Detection Algorithms ◽

Latent Space ◽

Wide Range ◽

Flexible Framework ◽

Online Anomaly Detection ◽

Python Package

Anomaly detection is challenging, especially for large datasets in high dimensions. Here, we explore a general anomaly detection framework based on dimensionality reduction and unsupervised clustering. DRAMA is released as a general Python package that implements the general framework with a wide range of built-in options. This approach identifies the primary prototypes in the data with anomalies detected by their large distances from the prototypes, either in the latent space or in the original, high-dimensional space. DRAMA is tested on a wide variety of simulated and real datasets, in up to 3000 dimensions, and is found to be robust and highly competitive with commonly used anomaly detection algorithms, especially in high dimensions. The flexibility of the DRAMA framework allows for significant optimization once some examples of anomalies are available, making it ideal for online anomaly detection, active learning, and highly unbalanced datasets. Besides, DRAMA naturally provides clustering of outliers for subsequent analysis.
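The prototype-distance idea can be illustrated with a small sketch. This is not the DRAMA package or its API: it skips the dimensionality-reduction step entirely, uses a plain k-means loop as a stand-in for the clustering stage, and all names (`kmeans`, `anomaly_score`) and data points are hypothetical. The score of a point is its distance to the nearest prototype, so points far from every cluster centre rank as most anomalous.

```python
import math

def kmeans(points, k, iters=20):
    """Very small k-means; returns k prototype (centre) vectors.

    Initialisation with the first k points is an assumption made here
    for reproducibility, not a recommended strategy.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centre if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def anomaly_score(point, prototypes):
    """Distance from a point to its nearest prototype."""
    return min(math.dist(point, proto) for proto in prototypes)

# Two tight hypothetical clusters plus one point far from both.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (5, 25)]
prototypes = kmeans(points, k=2)
most_anomalous = max(points, key=lambda p: anomaly_score(p, prototypes))
```

In this toy data the isolated point `(5, 25)` receives the largest score. Because the outlier is included when fitting, it pulls one prototype toward itself; fitting on data known to be mostly normal would reduce that effect.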


Unsupervised Online Anomaly Detection to Identify Cyber-Attacks on Internet Connected Photovoltaic System Inverters

2021 IEEE Power and Energy Conference at Illinois (PECI) ◽

10.1109/peci51586.2021.9435234 ◽

2021 ◽

Author(s):

C. Birk Jones ◽

Adrian Chavez ◽

Shamina Hossain-McKenzie ◽

Nicholas Jacobs ◽

Adam Summers ◽

...

Keyword(s):

Anomaly Detection ◽

Cyber Attacks ◽

Photovoltaic System ◽

Online Anomaly Detection


Online Anomaly Detection of Streaming Data for Space Payloads Based on Improved GNG Algorithm

Proceedings of the 2019 4th International Conference on Multimedia Systems and Signal Processing - ICMSSP 2019 ◽

10.1145/3330393.3330412 ◽

2019 ◽

Author(s):

Taisheng Zheng ◽

Lei Song ◽

Haoran Liang ◽

Bingjun Guo ◽

Lili Guo

Keyword(s):

Anomaly Detection ◽

Streaming Data ◽

Online Anomaly Detection


Data protection on hadoop distributed file system by using encryption algorithms: a systematic literature review

Journal of Physics Conference Series ◽

10.1088/1742-6596/1444/1/012012 ◽

2020 ◽

Vol 1444 ◽

pp. 012012

Author(s):

Meisuchi Naisuty ◽

Achmad Nizar Hidayanto ◽

Nabila Clydea Harahap ◽

Ahmad Rosyiq ◽

Agus Suhanto ◽

...

Keyword(s):

Literature Review ◽

Systematic Literature Review ◽

Data Protection ◽

File System ◽

Distributed File System ◽

Hadoop Distributed File System ◽

Encryption Algorithms
